Dense time-course gene expression profiling of the Drosophila melanogaster innate immune response

Background Immune responses need to be initiated rapidly, and maintained as needed, to prevent establishment and growth of infections. At the same time, resources need to be balanced with other physiological processes. On the level of transcription, studies have shown that this balancing act is reflected in tight control of the initiation kinetics and shutdown dynamics of specific immune genes. Results To investigate genome-wide expression dynamics and trade-offs after infection at a high temporal resolution, we performed an RNA-seq time course on D. melanogaster with 20 time points post Imd stimulation. A combination of methods, including spline fitting, cluster analysis, and Granger causality inference, allowed detailed dissection of expression profiles, lead-lag interactions, and functional annotation of genes through guilt-by-association. We identified Imd-responsive genes and co-expressed, less well characterized genes, with an immediate-early response and sustained up-regulation up to 5 days after stimulation. In contrast, stress response and Toll-responsive genes, among which were Bomanins, demonstrated early and transient responses. We further observed a strong trade-off with metabolic genes, which strikingly recovered to pre-infection levels before the immune response was fully resolved. Conclusions This high-dimensional dataset enabled the comprehensive study of immune response dynamics through the parallel application of multiple temporal data analysis methods. The well annotated data set should also serve as a useful resource for further investigation of the D. melanogaster innate immune response, and for the development of methods for analysis of a post-stress transcriptional response time-series at whole-genome scale. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-021-07593-3.


Background
Upon microbial infection, Drosophila launch rapid and efficient immune responses that are crucial to survival. However, immune responses are energetically costly [1] because they draw resources from other physiological processes [2,3] such as metabolism, reproduction, and environmental stress responses. An excessive or overly prolonged immune response can lead to metabolic dysregulation, causing wasting in mammals and flies [4]. Furthermore, it has been shown that allocating resources to the immune system reduces resources for mating [5,6], and the opposite is also true, where mating reduces survivorship after infection and decreases resistance to infection [7][8][9]. This represents a trade-off where limited resources need to be allocated to either the immune response or reproduction [10]. Therefore, we expect that natural selection will act to tune the immune response to strike a balance between the advantage of a rapid and robust ability to fight infection, and the costly sideeffects of an over-prolonged immune response. This tuning is likely to be mediated through a series of regulatory and feedback properties of the immune system of the fly.
While gene expression has been examined at several time points after infection in Drosophila [11][12][13], the dynamics of this immune response have not yet been studied with high temporal resolution. A high-resolution time-course analysis can help profile with more certainty the types of expression dynamics that different genes and pathways undergo after infection. Dense and extended time-course sampling of gene expression of the immune response can allow us to distinguish between transient and sustained expression patterns, where expression of genes with a transient response to perturbation will return back to normal after a certain period of time, while expression of genes with a sustained response will remain at a different level of expression compared to pre-perturbation levels. This kind of temporal profiling of the immune response, coupled with computational modeling of gene interaction networks, can also suggest candidates to examine for possible interactions and trade-offs between the immune response and other physiological processes.
Statistical analysis of such high-dimensional longitudinal time-course omics data is not straightforward. While the problems of detecting differentially expressed (DE) genes and inferring gene interaction networks from gene expression data are common in genomics, computational methods have focused primarily on crosssection rather than time-course data. Most popular methods to analyze static RNA-seq datasuch as edgeR [14] or DESeq2 [15] are not ideal for dealing with time-course RNA-seq data since they do not directly model the correlation of genes between successive time points [16,17]. Smooth polynomial or spline based models of temporal dependence in gene expression, such as those employed in Limma-Voom [18] and maSigPro [19,20], can fail to capture early impulses in stress response situations, as we highlight in this paper. Also, joint network inference of temporal associations among many genes requires tackling high-dimensionality, an aspect that has not received much attention in the literature. Because there is not one consensus method for the analysis of time-course RNA-seq data, it is important to ensure robustness of findings across different types of computational modeling techniques.
In this study, we performed a dense time-course RNAseq analysis of the Drosophila transcriptional response to commercial E. coli-derived crude lipopolysaccharide (LPS), which activates the Imd pathway [21], to better understand the dynamics of activation and resolution of the innate immune response. Flies were sampled over 5 days generating a total of 20 time points post-LPS injection with an additional time point pre-injection as a baseline control. We analyzed the resulting longitudinal RNA-seq dataset using a broad range of statistical methods, including a cross-sectional and a dynamic method for differential expression (DE), clustering, and multivariate Granger causality [22], a method to investigate lead-lag relationships among DE genes. We found that commercial LPS exposure has a major impact on the expression of not only immune genes, but also genes involved in metabolism and replication stress. Clustering analysis showed that both the onset and persistence of expression changes varied across these DE genes. Clustering analysis further suggested a role in the immune response and circadian rhythm for several previously uncharacterized genes. Finally, throughout our analyses we observed a theme of interplay and trade-off between the immune response and metabolism.

High-resolution profiling of gene expression after Imd stimulation
To generate a full transcriptional profile of gene expression dynamics in Drosophila melanogaster after Imd stimulation, we injected adult male flies with commercial E. coli-derived crude lipopolysaccharide (LPS). While pure E.coli LPS does not induce an immune response in Drosophila, the peptidoglycan contamination in crude LPS preparations consistently activates the Imd pathway [21]. This was also confirmed using qPCR (see Methods). In the remaining text, we refer to this treatment as "Imd stimulation". Using commercial LPS instead of living bacteria gives the advantage of avoiding the confounding effects of a growing and changing population of pathogens, and of the mechanisms the bacteria use to circumvent immune responses [23].
Flies were sampled in duplicate for a total of 21 time points throughout the course of 5 days, which includes an uninfected un-injected baseline sample as control at time zero, and 20 time points after injection. Since this is a perturbation-response experiment, denser sampling occurred at early time points [16], with the first 13 time points taken within the first 24 h (1, 2, 3, 4, 5, 6, 8, 10, 12, 14, 16, 20, and 24 h). Sampling is also essential at later time points to know how long it takes to return to baseline, and to differentiate between transient and sustained responses [16]. For this reason, sampling continued until day 5 after Imd stimulation, although more sparsely (30,36,42,48,72,96, 120 h) (Fig. 1a). For this dataset we obtained 41 high-quality libraries with an average of 23.5 million mapped reads per sample. After normalization of libraries, only genes with more than 5 counts in at least 2 samples were kept, leaving 12,657 genes for further analysis.
Principal components analysis (PCA) on the 500 genes with highest row variance across all time points revealed a horseshoe temporal trend, with the control samples clustering in the middle, and the post-injection time points following a horseshoe-shaped track, consistent with a pattern of many genes displaying a coordinated change over the five-day interval (Fig. 1b). This type of "horseshoe" or arch temporal trend in PCA has been seen in other time-series experiments [18,[24][25][26], and is commonly seen in spatial population genetic variation [27] and in ecological gradient data that varies in a nonlinear manner [28]. PC1, PC2, and PC3 captured 35, 15, and 14.5% of the variance in gene expression respectively, and the first six PCs account for over 80% of the total variance in the data.
Proper normalization of the data was assessed by confirming the behavior of known Drosophila housekeeping genes across time (Qiagen Housekeeping Genes RT 2 Profiler PCR Array and [29]). As expected, housekeeping genes showed little change across time ( Figure S1A). The success of the Imd stimulation was confirmed by the immediate up-regulation of known immune response genes within the first time points ( Figure S1B).
Spline modeling and pairwise comparisons identify 951 genes that are differentially expressed over time following Imd stimulation To identify genes whose expression levels were significantly altered across the time course, we employed two methods. First, we used gene-wise linear models to fit cubic splines with time, on both the first 8 h and first 48 h after commercial LPS exposure. Second, because we noticed that certain expression patterns were not adequately described using cubic splines (as discussed below), we also characterized the temporal patterns of expression by estimating the differential expression of every gene at each time point, from 1 to 48 h, compared to the un-infected un-injected control samples at time zero (also referred to as baseline).
Cubic spline fits identified a total of 411 DE genes, based on a 5% False Discovery Rate (FDR) using the Benjamini-Hochberg method [30] (Table S1). Of these 411 genes, 31 genes were detected only using short spline fits on the first 8 h post-injection. Long spline fits on the first 48 h post-injection identified 363 genes, and 17 genes were identified using both short and long spline fits (Fig. 2a). Long spline fits excelled at identifying gradual changes and global patterns, such as the ones shown by the dynamics of two genes involved in galactose metabolism (Gale and Galk, Fig. 2b) which are gradually downregulated after stimulation but later recover. However, long spline fits failed to detect early impulse patterns, such as those observed in the known immune response genes AttA and DptB (Fig. 2c), which were better captured by short spline fits on the first 8 h post-injection. Still, even short spline fits failed to identify additional known immune genes with early impulse patterns, such as CecC and CecB. We also fit third degree polynomials using the R package maSigPro [19,20]. This approach identified many DE genes that had been selected using spline modeling ( Figure S2), but similarly failed to adequately describe early impulse patterns. Based on these observations, we also used pairwise comparisons to identify additional DE genes whose trajectories were not well described using cubic splines. Pairwise comparisons identified 729 DE genes that were significantly (FDR < 0.05) up-or down-regulated by a log 2 fold change (log 2 FC) of at least 1 (i.e.: 2 times higher than baseline) in at least one time point throughout the first 48 h after injections (Table S1). Within this gene set, there were 214 genes that were up-or down-regulated by at least 2 log 2 FC (4 times higher than baseline), in at least one time interval after injection ( Figure S3, Table  S1). Of these 214 genes, 91 "core" DE genes underwent at least a 2 log 2 FC in expression in at least two time intervals after injection, with a more stringent FDR < 0.01 (Fig. 3a). Among the most strongly induced genes were known immune genes DptB, AttC, Mtk, Dro, DptA, and edin (bottom of Fig. 3a and Figure S4A). These genes underwent an expression change of approximately 5 log 2 FC (32 times higher than baseline) and remained elevated up until 48 h after Imd stimulation. Further investigation of the 91 core genes showed that the number of up-regulated genes was much higher than the number of down-regulated genes across all time points (Fig. 3b). Eleven of the up-regulated genes at each time point were known immune genes, as identified by a list of immune genes curated in Early et al. [31]. Within these 91 core DE genes, we also found circadian rhythm genes period  (per), timeless (tim), takeout (to), and vrille (vri), which when plotted against time exhibit the classic 24 h periodic expression of the circadian rhythm ( Figure S4B). Of a total of 951 DE genes, 189 genes were identified as differentially expressed using both pairwise methods and spline modeling, but 762 out of 951 genes were identified using only one of these methods, indicating the importance of using complementary methods for the analysis of time course RNA-seq data (Fig. 2d).
Gene ontology and gene set analysis demonstrate a divergence in expression between immune and metabolic processes after Imd stimulation To understand the biological functions of genes whose expression is influenced by Imd stimulation, we performed both a Gene Ontology (GO) and Gene Set Analysis. GO analysis is a useful tool to illustrate the functions of genes with significant differential expression over time, in this case 951 DE genes selected using spline fitting and/or pairwise contrasts. However, focusing only on the top-scoring genes can lead to missing biologically relevant signals from genes with modest expression changes. Furthermore, GO analysis does not take into account expression changes over time. Both of these limitations are addressed by Gene Set Analysis, which searches for enriched pathways (Gene Sets) across all 12,657 genes in the dataset, guided by their log 2 fold changes for all available time points.
Of the 951 DE genes, we identified 20 genes that encode known or putative transcription factors, based on the FlyTF database [32] (Table S2). Seven of these twenty genes have a fast impulse of up-regulation, reaching their maximum expression in the first two hours following Imd stimulation (Rel, Dif, CrebA, luna, Ets21C, Hr38, and stripe; Fig. 4a). Rel and Dif encode downstream components of the Imd and Toll pathways respectively, both involved in the activation of the immune response [33][34][35][36]. Of these two transcription factors, Rel undergoes the strongest up-regulation, consistent with activation of the Imd pathway by Gram-negative peptidoglycan in commercial LPS, and remains up-regulated at 1 log 2 FC even after 48 h. Ets21C encodes a stress-inducible transcription factor, and Hr38 and stripe are the two most robust activity-regulated genes (ARGs, defined as genes that are rapidly induced upon stimulation of neurons, mostly within an hour) in Drosophila [37]. Three genes encoding transcription factors had oscillating expression patterns over time and are involved in the regulation of the circadian clock (vri, clk, Pdp1; Figure S5A [38,39];). One gene of interest, p53, involved in the response to genotoxic stress ( Figure S5B [40];), reached its maximum upregulation later, at 6 h after injection.
Overall, the GO analysis indicates that the flies manifest a robust immune response, as the gene expression changes are consistent with known expression profiles of immune response deployment in Drosophila [11,12]. In addition, the GO analysis demonstrates that the response to Imd stimulation also affects metabolic homeostasis.
Supporting these results, Gene Set analysis across all 12, 657 genes and all time points showed that the top upregulated pathways were all related to immune response, defense response to bacteria, and peptidoglycan functions (Fig. 4b). Within these we found pathways related to defense response against both Gram-negative and Grampositive bacteria. While the commercial LPS used for injections is derived from the outer membrane of Gramnegative bacteria, which activates the Imd pathway, the injections themselves also result in septic injury, which is known to activate both Gram-positive and Gram-negative immune pathways (Toll and Imd pathways correspondingly) [41]. Among down-regulated pathways we found many metabolism-related functions. Three of these pathways (glycogen metabolic process, triglyceride biosynthetic process, and gluconeogenesis) are highlighted in Fig. 4c-d and S5. The glycogen pathway down-regulation pattern was driven by genes Fatty acid synthase 1 (FASN1), and UGP, which encodes a UTP--glucose-1-phosphate uridylyltransferase (Fig. 4c). Down-regulation of the triglyceride pathway was driven by FASN1 and minotaur (mino), a glycerol-3-phosphate 1-O-acyltransferase (Fig. 4d). Finally, the gluconeogenesis pathway down-regulation was driven by fructose-1,6-bisphosphatase (fbp), a rate limiting enzyme for gluconeogenesis [42] ( Figure S6). These metabolic genes reached their lowest expression within the first 6 h after Imd stimulation, and mostly recovered to baseline levels by hours 12-24.
Clustering of temporal profiles highlights differences in the initiation and shutdown of immune and metabolic genes and demonstrates a regular rhythm of circadian clock genes GO and Gene Set Analysis illuminated functions of genes that respond to Imd stimulation, and indicated a trade-off between immune and metabolic processes. However, both GO and Gene Set Analysis are based on prior knowledge of gene function. Clustering of genes based only on their expression profiles is not influenced by prior annotations. Such an unbiased approach can thus identify responses of poorly annotated genes. In addition, clustering can illustrate how gene expression trajectories differ over time. We performed three analyses to characterize temporal profiles. First, we performed hierarchical clustering based on Pearson correlation on a set of 551 predominant time-dependent genes to identify major expression patterns over time.
These 551 genes included the 411 genes identified using spline modeling, and 214 genes with at least a 2 log 2 fold change in expression as identified using pairwise comparisons (Table S1). Second, we performed clustering based on autocorrelation on these 551 genes. As opposed to Pearson correlation or Euclidean distance, an autocorrelation function takes the ordering of time points into account, allowing us to identify more detailed characteristics of gene expression profiles in time series. Third, because circadian rhythm genes were not apparent in the clusters identified using the previous methods, but were expected to be present in our dataset, we used the R package JTK_Cycle [43] to identify genes with 24 h cycling patterns among all genes in the dataset. We were interested in these patterns since the circadian clock is known to regulate the expression of immune genes [44], and in turn, infections are known to influence the flies' circadian rhythm [45]. First, expression profiles of the 551 predominant timedependent genes fell into four main hierarchical clusters (Fig. 5a). Clusters 1 and 2 both had a strong increase in expression after Imd stimulation (Fig. 5b). Cluster 1 had a more immediate increase in expression, reaching a maximum within the first 2 h. Cluster 2, on the other hand, reached a maximum expression later, at around 9 h. Cluster 1 showed significant enrichment of GO terms for immune and stress response related processes, and contained Attacins and Cecropins, as well as Heat Shock protein family genes. Cluster 2 was enriched for GO terms for abiotic stimulus response, and contained among others Bomanins, Daisho genes, as well as genes from the Turandot family (Fig. 5c). Clusters 3 and 4 were characterized by an initial decrease in expression followed by an increase after 3 h and 6 h respectively (Fig. 5b), with cluster 4 showing a stronger decrease in expression in the early hours after Imd stimulation. These clusters had a significant enrichment of GO terms for biosynthetic, catabolic, and metabolic processes (Fig.  5c), and their down-regulation again indicates a tradeoff between metabolism and the initiation of an immune response.
Our second clustering analysis based on autocorrelation revealed additional differences regarding the initiation and resolution of gene expression after Imd stimulation. First, we identified a cluster of genes with an immediate and sustained up-regulation. This cluster was characterized by a strong early induction with an up-regulation of~2.5 to 6 log 2 FC within the first hour (6 to 64 times higher than baseline), reaching a maximum of 6 to 8.5 log 2 FC (64 to 362 times higher than baseline), and maintaining persistent up-regulation of2 .5 to 6 log 2 FC throughout 5 days (Fig. 6a). This cluster contained canonical immune response genes known to be activated downstream of Imd, such as AttA, AttB, AttC, DptA, DptB, Dro, edin, Mtk, PGRP-SB1, PGRP-SD. The cluster further contained IBIN (Induced by Infection), whose exact mode of action is unknown, but whose up-regulation stimulates starch catabolism as part of an immune-induced metabolic switch, likely to make free glucose available to circulating immune cells [46]. Finally, this immediate-response cluster also contained Mtk-like, CG43920, and CG45045, which are less characterized transcripts known to be up-regulated after bacterial infection [47].
Autocorrelation-based analysis also identified clusters of genes with transient responses to infection (Fig. 6b-c). One of these clusters was composed of Bomanins (BomS1, BomS2, BomS3, BomBc1), Daisho genes (Dso1, Dso2), and two Baramicin gene family members, CG33470 (BaraA1) and IMPPP (BaraA2) (Fig. 6b). The Bomanins are located in the 55C4 region of chromosome 2R, confer resistance against certain bacteria and fungi and are regulated by Toll signaling [48]. Dso1 and I were also reported to be regulated by Toll signaling and aid defense against specific filamentous fungi [49]. Finally, CG33470 (BaraA1) and IMPPP (BaraA2) were also shown to be regulated by Toll signaling [50][51][52]. These two genes show nearly identical gene counts in our dataset, which fits with the finding that they likely arose through gene duplication [52]. This cluster was characterized by an early induction (but not as immediate as the cluster containing known Imd-responsive  Fig. 6a) of~2.5 to 3.5 log 2 FC (6 to 11 times higher than baseline) within the first two hours, reaching a max of~2.5 to 5 log 2 FC, and returning to a steady state after 3-5 days. Thus, clustering analysis identified effector immune genes partitioned by Imd vs Toll signaling: Imd-regulated genes showed an immediate early sustained up-regulation even after 5 days (Fig. 6a), while Toll-regulated Bomanins and Daisho genes had an early up-regulation that eventually returned to steady state levels (Fig. 6b).
A final cluster illustrated a more complex expression pattern: many genes in this cluster were down-regulated immediately, 1-2 h after injection, after which they were up-regulated, reaching their maximum expression after 8-12 h, followed by a return to baseline after 2-3 days (Fig. 6c). This cluster was composed of genes from the stress-induced Turandot family (TotA, TotB, TotC, and TotX) [53] as well as Diedel, Grik, lectin-24A, NimB3, BomT2, CG11459, and CG30287. Diedel encodes an immunomodulatory cytokine known to down-regulate the Imd pathway. Grik encodes a glutamate receptor, and Lectin-24A encodes a pattern recognition receptor that mediates pathogen encapsulation by hemocytes [54]. Lectin-24A has been shown to be down-regulated in the first 2 h following septic injury and then up-regulated 9 h after [55], consistent with the pattern we see in our data. NimB3 is part of the Nimrod gene family, which is involved in phagocytosis [56]. BomT2 is part of the Bomanin gene family (like BomS1, BomS2, BomS3, and BomBc1, expressed in the previous cluster, Fig. 6b). CG11459 encodes a predicted cathepsin-like peptidase induced by bacterial infection and injury [57]. CG30287 encodes a predicted serine protease, a class of proteins that plays roles in immune response proteolytic cascades [58].
Finally, using JTK_cycle, we identified 22 periodic genes with a 24 h cycle, using a cutoff of Benjamini-Hochberg corrected Q-value < 0.05 and amplitude > 0.5 (Fig. 7). Among them were four well characterized circadian genes, suggesting that their periodicity was not affected by Imd stimulation: period (per), takeout (to), vrille (vri), and PAR-domain protein 1 (Pdp1), as well as eight genes which do not have assigned circadian functions but have evidence of cyclic behavior in previous literature (Table 1), and 10 genes, of which 8 are uncharacterized, that have not yet been reported to have cyclic expression outside this study (Table 1; CG10560, Sgroppino, CG15253, CG15254, CG18493, CG31321, CG33511, CG34134, CG42329, salt).
Overall, the combination of clustering methods augmented by GO analysis allowed us to identify strong temporal patterns that correspond to early and late induction of immune processes, as well as both transient and sustained responses to infection, which point to a trade-off between the immune response and metabolism. We found that genes that share functions often have similar temporal expression patterns, suggesting coregulation. This observation further allowed us to assign Fig. 6 Clusters of genes identified using autocorrelation. Network nodes represent genes; network edges represent the distance between gene autocorrelations, based on ACF analysis using TSclust. a A cluster containing multiple Imd-responsive genes shows sustained expression after Imd stimulation throughout 5 days (120 h). b Toll-responsive genes show a transient response to Imd stimulation. c Other stress response genes return to steady state by day 5 (120 h) post Imd stimulation putative functions to previously uncharacterized genes that cluster together with well-studied genes.

Gene interaction modeling of lead-lag patterns using Granger causality
Clustering methods based on a single gene's autocorrelation or cyclicity patterns can detect genes with similar expression profiles. However, these methods are not suitable for seeking causal relationships between genes that manifest in a lead-lag relationship, for instance when high expression of gene A results in a high expression of gene B shortly afterwards. Detecting such lead-lag patterns (Fig. 8a) is a unique advantage of dense time-course experiments. Granger causality (GC), a statistical method popular in analysis of macroeconomic time series, provides an ideal framework for modeling such patterns and building directed networks among genes. The concept of GC is based on predictability. If the knowledge of the past of one time series improves the prediction of a second one, the first is said to be Granger causal (GC) for the second. Bivariate GC analysis between two genes A and B, as described above, does not account for possible confounding effects of other genes C, D, E which can also influence genes A and B (Fig.  8b). Multivariate GC analysis alleviates this problem by explicitly accounting for the effects of the confounding genes by a joint modeling [59,60], but does not account for high-dimensionality and consequently cannot jointly model hundreds of genes based on tens of data points. We used modern high-dimensional methods (viz LASSO [61] and de-biased LASSO [62,63]) to address this problem and build lead-lag network models among 258 genes.  We constructed directed GC edges and networks of putative interactions among a subset of 258 genes (Table  S1). These genes had a |log 2 FC| > 1 (at least 2 times higher or lower than baseline) across the time course and had available functional annotations. We performed Granger causality analysis on sliding windows of 6 time points on the normalized counts of both replicates using bivariate and multivariate methods (see Methods). We investigated both positive and negative edges, reflecting positive and negative lagged correlations between genes. The overall unfiltered GC network has a multitude of relationships worth exploring, but limitations in the ability to distinguish different types of causality make widespread conclusions from the network challenging. Here, we discuss several examples of subnetworks which illustrate putative functional relationships among genes whose expression changes in response to Imd stimulation.
Based on our interest in identifying trade-offs between biological processes in infected animals, we first constructed a high-quality set of consistently significant GC edges of divergent expression (negative edges). To this end we first filtered the subnetwork by (a) removing all edges with a positive weight, (b) removing all nodes corresponding to cyclic genes identified earlier through the JTK_Cycle method, (c) using only pairs of nodes with significant edges (Benjamini Hochberg FDR < 0.05%) in at least 3 consecutive windows within the first 24 h of the time course. After filtering, the resulting high-quality GC network contained 51 nodes and 35 edges in 16 connected components ( Figure S7). This network, by design, should include the most interesting examples of divergent expression changes from our full dataset.
The largest connected component in this network (Component #1) is a multifunctional chain of 6 genes, which connects the down-regulation of four metabolic genes with the up-regulation of two genes that are involved in regulating proliferation and repair (Fig. 9a). Two of the metabolic genes, Sorbitol dehydrogenase 1 (Sodh-1) and UGP, both lead the divergent expression of Claspin (both 4 consecutive windows, 2 to 10 and 4 to 12, respectively) ( Fig. 9c and S8A). Claspin plays a role Fig. 8 Diagram describing the process of constructing directed networks from Granger causality. a Lagged correlated expression between two genes (Granger causality) leads to the construction of a directed edge between two genes (nodes), which in turn is used to build directed leadlag network models of putative interactions among genes. Edges can be positive or negative, based on the sign of lead-lag correlation between the two genes. b Bivariate associations are calculated between two genes at a time, while multivariate associations adjust for potential indirect association from all other genes in the gene set in DNA replication stress [64]. It is known that there is an interplay between host immune systems and replication stress [65]. The immune system can detect and respond to replication stress, which is an important feedback loop necessary to remove defective cells [66]. Furthermore, the activation of the immune response generates reactive oxygen species (ROS) and reactive nitrogen species (RNS), and can promote chronic inflammation, all of which can trigger DNA damage [67]. UGP and fbp were identified earlier during Gene Set Analysis to drive the down-regulation of metabolic pathways ( Fig.  4b and d), and in this cluster they are both negatively directed by LpR2 (3 consecutive windows, 6 to 13) (Fig. 9d  and S8B). LpR2 is a lipophorin receptor, known to regulate the innate immune response by clearing serpin protease complexes from the hemolymph through endocytosis [68]. Lipophorin is a known humoral factor that contributes to clot formation [69,70]. Finally, LpR2 is also shown to negatively direct juvenile hormone acid methyltransferase (jhamt) (4 consecutive windows, 1 to 9) (Fig. 9e). JHAMT is an enzyme that activates juvenile hormone (JH) precursors at the final step of the JH biosynthesis pathway in insects [71]. JH is a known hormonal immunosuppressor in Drosophila [72][73][74].
Interestingly, Claspin was identified to be part of the same pathway as Orc1 in our previous Gene Set Analysis, showing similar patterns and window of upregulation (mitotic DNA replication checkpoint pathway, Figure S9). In our network, Orc1 is part of an isolated edge with metabolic gene ABGE (Component #2, 4 Fig. 9 High-quality GC network components and their edges. a Components #1 and #2 from GC network ( Figure S4). b Diagram summarizes interplay between main represented pathways on the selected components. c-f Selected edges from the components plotted against time. Significant windows colored in blue, non-significant colored in grey. Resulting overall consecutive windows are labeled in blue dashed rectangles. Individual windows represent 6 consecutive time points, but because time points are not at regular intervals, the windows have different time ranges, but identical numbers of samples consecutive windows, 4 to 12) (Fig. 9a and e). These prioritized subnetwork components suggest an interplay between metabolic pathways and other pathways such as proliferation and repair (Fig. 9b), motivating follow-up studies to determine which pathways might be regulating and trading off with each other in the hours following Imd stimulation.
In addition to these purely negative edges, we detected highly significant positive and negative edges among circadian rhythm genes. These included cryptochrome and Smvt (6 consecutive windows, 6 to 16) ( Figure S10A), vrille and takeout (4 consecutive windows, 9 to 17) (Figure S10B), period and takeout (4 consecutive windows, 9 to 17) ( Figure S10C), and Smvt and takeout (4 consecutive windows, 9 to 17) ( Figure S10D). Smvt is predicted to encode a sodium-dependent multivitamin transporter, and takeout influences feeding behavior [75,76]. Metabolic processes and feeding are known to be under circadian control [77]. In addition, So et al. [78] reported that takeout is regulated by the circadian clock, but with a phase shift relative to period. This pattern is clearly visible in our dataset and was correctly identified using Granger causality. This shows that Granger causality can be used to infer gene dependencies/interactions using global gene expression behavior.
Finally, among genes connected only by positive edges, we identified an edge from period, a regulator of the circadian clock [79,80], to Rhodopsin 5, which encodes a G-protein-coupled receptor involved in phototransduction ( Figure S11A). Rh5 mRNA levels are known to demonstrate a cyclic pattern [81], indicating regulation by the circadian clock. We further identified positive edges between genes that are likely co-regulated. These included edges between up-regulated genes that respond to NF-κB signaling, such as edges from genes encoding peptidoglycan recognition receptors PGRP-SD and PGRP-SB1 (regulated by Imd signaling), to DptB and AttC (also regulated by Imd signaling) and to BomS1, Dso1, and BomBc1 (regulated by Toll signaling) ( Figure  S11B-F). We also observed edges between downregulated genes, such as from Hao (predicted to play a role in lactate oxidation) to AGBE (a predicted hydrolase involved in glycogen synthesis) ( Figure S11G). While the expression of these genes likely responds to similar signals, the observed lags between these genes' expression profiles suggest that there are differences in their transcriptional control, such as regulation by cofactors, differences in promoter affinity for certain transcription factors, or other variables that influence the rate of mRNA accumulation.

Discussion
We have produced a dense and high-quality time-course profiling of the Drosophila transcriptome response after Imd stimulation through commercial LPS injection, using RNA-seq sampling over 20 time points spanning five days. This profiling provides a high-dimensional dataset, which is available as a resource for the community. We analyzed this dataset using a broad range of statistical methods, including Granger causality, to investigate lead-lag relationships between genes. Because of the high dimensionality, it is not straightforward to analyze a time series, as illustrated by the partially distinct results of spline fitting and pairwise comparisons. However, using a combination of analytical methods allowed us to identify distinct patterns with high confidence, specifically responses to Imd stimulation with divergent initiation and resolution dynamics, as well as cyclic patterns of gene expression, and patterns of coregulation and trade-offs. Below, we describe and discuss the main insights from these analyses, as well as limitations and future steps.

Immune and stress response genes vary in their initiation and resolution dynamics after Imd stimulation
Clusters of genes demonstrated distinct activation kinetics after stimulation of the immune response. This phenomenon has been observed both in fly [11] and mammalian cells [82], but as a result of the dense sampling, our dataset provides a highly detailed view of these initiation dynamics. In addition, because we sampled up to 5 days post-injection, we could also observe longterm responses to Imd stimulation.
First, a cluster of 13 genes showed the fastest upregulation within the first 1-2 h and remained upregulated during the entire five-day time course (Fig.  6a). This cluster contained 10 genes known to be regulated by Imd signaling (3 Attacins, 2 Diptericins, Mtk, Dro, IBIN, PGRP-SB1, and PGRP-SD), as well as CG43920, CG45045, and Mtk-like. Imd-regulation of these 3 genes has not been experimentally validated, but their co-clustering pattern suggests that they are regulated by the Imd pathway.
Second, a cluster containing Toll-regulated Bomanins and Daisho genes [83] reached its highest point of expression at 5-8 h and recovered to a baseline state after 2 to 5 days (Fig. 6b). The later initiation of Tollresponsive genes relative to Imd-responsive genes is consistent with Tanji et al. [84], who reported earlier peak expression of an Attacin and Diptericin (Imd-responsive) versus Drosomycin (mostly Toll-responsive). The up-regulation of Toll-responsive genes might be a response to the wounding that occurred during LPS injection, as Irving et al. [85] reported that septic injury with Gram-negative E. coli induced responses typical of infection by Gram-negative and/or Gram-positive bacteria, and Boutros et al. [11] reported that clean injury experiments induced a set of genes that overlapped with those induced by a septic injury experiment, but with a lower response magnitude, consistent with our observations ( Fig. 6a-b; Imd-responsive genes reach a maximum of 6-8 log 2 FC while Toll-responsive genes reach a maximum of 2.5-4.5 log 2 FC).
Third, stress response genes, among which were members of the Turandot family, reached their highest point of expression at 10-12 h, following a pattern of delayed response in line with observations by Ekengren et al. [53]. Similar to the Toll-regulated gene cluster, stress response genes returned to a baseline state after 2 to 5 days (Fig. 6c).
It is striking that the expression of Imd-regulated genes did not recover to pre-injection levels even after 5 days. Since we stimulated the Imd pathway by injecting Gram-negative peptidoglycan within commercial LPS, it is possible that the prolonged up-regulation of AMPs and other Imd-regulated genes was due to remaining peptidoglycan in the flies. Alternatively, it might be typical for these genes to be expressed at a higher level for a certain time even after the peptidoglycan has been cleared. Troha et al. [47] also observed prolonged AMP up-regulation after bacterial infection, even when levels of bacteria were below detection threshold. On the other hand, Duneau et al. [86] reported that 7 days after an initial infection with E. coli, none of the surviving flies were completely free of bacteria -suggesting suppression rather than clearance of the infection. These observations raise the question of whether one should expect to see a return to the baseline gene expression levels. Rather than returning to a pre-infection state, the prolonged up-regulation of Imd-responsive genes and the transcription factor Rel (Fig. 4a) might be required to prevent damage from dormant bacteria.

Initiation of the immune response coincides with a downregulation of metabolic processes
Our dataset showed distinct global dynamics pointing to a divergence in expression between immune and metabolic processes (both carbohydrate and lipid metabolism), with the most up-regulated pathways related to immune and stress responses, and the most down-regulated pathways related to metabolic functions (Fig. 5).
Metabolic genes reached their maximum downregulation at 5-8 h (Fig. 4b-d, 5). FASN1, which showed the strongest down-regulation in both glycogen metabolic process and triglyceride biosynthetic process (Fig.  4c-d), is a lipogenic gene whose down-regulation might indicate a need to have easily accessible nutrients instead of storing them. Indeed, infections in mammals are known to induce adipose tissue lipolysis [87] and bacterial peptidoglycan is a ligand that stimulates lipolysis as well [88]. The gene with the strongest down-regulation in the gluconeogenesis pathway was fbp ( Figure S6), which codes for fructose-1,6-bisphosphatase, the rate limiting enzyme for gluconeogenesis. This gene was significantly down-regulated in a study that reported that Listeria monocytogenes infection in Drosophila causes a decrease in energy stores, with reduced levels of triglycerides and glycogen [89]. Krejcova et al. [90] reported that an infection-induced switch to aerobic glycolysis within macrophages coincides with a systemic depletion of glycogen stores and increased blood sugar levels. The divergent dynamics detected in our dataset are thus in agreement with known individual mechanisms characterized in the immune response.
We also observed that expression of metabolic genes and pathways recovered quickly to a baseline state around 12-24 h after Imd stimulation (Fig. 4b-d, 5). The speedy recovery of metabolic genes, despite sustained expression of Imd-regulated immune response genes, suggests that the early stages of infection likely involve the greatest trade-offs, at least in response to the commercial LPS injections. These dynamics might differ in flies infected with live bacteria and might also differ depending on the strain of bacteria [47].
We further saw implications of functional interplays using the Granger Causal (GC) network analysis. Main subnetwork components showed significant GC directional edges between down-regulated metabolic genes (such as Sodh-1, UGP, fbp, and AGBE) and up-regulated genes with cell proliferation and repair functions (Claspin, LpR2, and Orc1) (Fig. 9). These results further suggest an underlying interplay between metabolic pathways and proliferation and repair mechanisms such as regulation of DNA replication stress, endocytosis, and clot formation. While functional genetics studies have demonstrated such trade-offs previously [3,91], our dataset reveals the extent and dynamics of these tradeoffs on a genome-wide scale.

Predicting function by association
Using clustering analysis, we identified several genes that lack a well-established function, and that clustered tightly with well-studied genes. We can use this coclustering to suggest shared functions [92].
Temporal clustering analysis identified Mtk-like, CG43920, IBIN, and CG45045, which shared similar expression dynamics with Imd-regulated AMPs (Fig. 6a). Suggesting AMP-like functions to these less characterized genes can be supported by observations from literature: All four genes were previously found to respond to infection [47,57], and Mtk-like and CG43920 have been shown to encode small proteins predicted to be cationic [93], properties shared by known AMPs [94]. IBIN and CG45045 were also predicted to physically interact with antimicrobial peptide transcripts [93]. On the other hand, Valanne et al. [46] found that IBIN overexpression increased levels of hemocytes and hemolymph glucose, suggesting that IBIN functions as a link between immunity and metabolism, instead of acting as an AMP itself.
The dense sampling nature of this time course allowed us to discern the clear cycling patterns of differentially expressed genes such as period, timeless, takeout, vrille, and cryptochrome, all of which have well-characterized circadian rhythm functions [38,39,78,95,96]. Clustered with these genes, our analysis identified other genes that showcase cyclic behavior but are not canonically circadian-associated genes. This includes eight genes which do not have assigned circadian functions but do have some evidence of cyclic behavior in previous literature. It also includes ten genes that had not been reported to exhibit any cyclic expression before this study ( Table 1). The identification of canonical circadian rhythm patterns both validates our methods of data normalization and differential expression analysis, and increases the certainty that we are accurately profiling novel temporal dynamics. It is important to note, however, that proper validation of the cycling behavior of our novel cyclic genes should be performed under normal Drosophila conditions, as we do not know whether Imd stimulation affected their expression.
Overall, we were able to implicate these uncharacterized genes as potential members of specific functional pathways due to the strong similarity of their expression dynamics. This is impactful both in the functional implication of these genes, but also in demonstrating the potential of this "guilt-by-association" method to assign putative function to other uncharacterized genes through RNA expression time-course experiments.

Limitations and future steps
Limitations of our experimental design can be used as a guide to design future experiments. This time-course design lacks time-matched controls to account for expression changes associated with phenomena outside the Imd stimulation, such as aging. However, it is still highly valuable to develop and improve methods for analyzing time-course transcriptional data lacking time-matched controls, since these methods are needed to analyze processes such as development, where such controls are inherently not possible.
This study also lacks a control for the wounding injury caused by the injection itself. As discussed above, specific expression patterns such as those from the Tollresponse genes could be caused by the injection wounding rather than the Gram-negative peptidoglycan in commercial LPS. Another important experimental design aspect for time series is choosing the time frame within which sampling needs to occur, which can be difficult to establish in advance. In our study, Imd-regulated gene expression was sustained until 5 days after infection, and a more prolonged sampling would have been necessary to determine the full duration of these genes' up-regulation after Imd stimulation.
Further, our experiment sampled only males and future experiments would be needed to make direct comparisons between male-and female-specific responses to Imd challenge. In addition, multiple studies have reported a trade-off between immunity and reproduction, which occurs in singly-mated females [9] and multiply mated males [97]. Mating activity was outside the scope of our design but can be included in future designs.
In our dataset, Granger causality analysis excelled at showcasing the relationships between divergent gene pairs, but was overly sensitive to the extreme temporal correlation between large groups of genes when analyzing positive edges. To avoid a prohibitively dense network for analysis, we relied on heuristic network trimming criteria, which was effective, but is likely not generalizable to other similar experiments. Developing co-integration methods that take into account the specific bias found in high-dimensional RNA-seq datasets would provide a more robust statistical analysis of the causal relationships observed in this type of data. The biological interpretation of relationships identified using Granger causality also warrants further investigation: Granger causality was successful at identifying what are likely the downstream results of divergent regulation, and it was successful at identifying positive lead-lag relationships between genes that likely respond to similar signals, but might differ in their exact (post-)transcriptional control. However, these statistical causal relationships provide only hypotheses that should be tested with direct experimental disruptions of a system to demonstrate biological causality.

Conclusions
Our combination of analytical methods provides robust profiling of the innate immune response in Drosophila melanogaster after Imd stimulation at the highest temporal resolution to date, and serves as a proof of concept for high-density time-course RNA-seq analyses in other systems. Further, it motivates innovation in computational and statistical methods for longitudinal omics data, both to account for their inherent highdimensionality and the complex underlying architecture that contains both causal and spurious coordination. Specifically, the development and application of multivariate Granger causality analysis highlights the potential of time-course data to evaluate coordinated gene expression changes through lags and trade-offs. While the immune response in D. melanogaster has been well studied, our research using dense time-course gene expression data reveals genome-wide dynamic expression patterns at higher temporal detail. Specifically, we reveal responses to Imd stimulation with divergent initiation and resolution dynamics, cyclic patterns of gene expression, and patterns of co-regulation and functional tradeoffs, while also assigning putative gene functions to uncharacterized genes through a temporal guilt-byassociation method.

Fly lines, injections, and sample collection
Male adult Drosophila of about 4 days old were sampled from an F1 cross from two Drosophila melanogaster Genetic Reference Panel (DGRP) lines: line 379, which was shown to have low bacterial resistance, and line 360, which has high bacterial resistance [98]. Offspring from these two DGRP lines were used to allow investigation of allele-specific expression after Imd stimulation. Analysis indicated significant allele-specific differential expression across multiple loci, including loci that contain immune genes, when aggregating data across all time points (data not shown). This may reflect differential rates of transcription initiation, processivity, RNA processing, or decay. However, allele-specific expression did not significantly change over time after infection (data not shown). Because of the absence of time-dependent effects, this analysis was not included in the manuscript.
Flies were kept on a 12:12 dark-light cycle, on standard yeast/glucose food. Flies were injected in the abdomen with 9.2 μl of commercial lipopolysaccharide (LPS) (Escherichia coli 055:B5 Sigma) derived from the outer membrane of Gram-negative bacteria. Flies were injected using a Nanoinjector (Nanoject II, catalog #3-000-204, Drummond), which allows high-throughput fly injections with a constant injection volume. Injections were performed in the abdomen, as it has been shown to be less detrimental to the fly compared to thorax injury [99].
Flies were sampled for a total of 21 time points throughout the course of 5 days, which included an uninfected un-injected baseline sample as control at time zero, and 20 time points after LPS injection (1, 2, 3, 4, 5, 6, 8, 10, 12, 14, 16, 20, 24, 30, 36, 42, 48, 72, 96, 120 h). This sampling was performed twice, using flies from the same stock, on two consecutive days. Therefore, all time points have two replicates, giving a total of 42 samples. Since flies were sampled from the same stock for both replicates, we consider the second replicate to control for any effects of the injection technique. This sampling strategy was informed by experimental data and time series theoretical analysis that show that under reasonable assumptions, sampling time points at higher resolution is preferred over having more replicates [100], an important strategy to consider when having a limited experimental budget.
During collection, a group of~10 pooled flies corresponding to the sampled time point were flash frozen in dry ice and stored at − 80 C for later RNA extraction.

Experimental validation using qPCR
The Imd inducibility of commercial LPS was confirmed using qPCR. Adult male Drosophila were injected with 9.2 μl or 40 μl of 1 mg/mL LPS and flash frozen at 8 and 24 h for RNA extraction. Uninfected un-injected flies were used as control. Each sampled time point consisted of a group of~10 pooled flies. Each sample had two replicates. Genes AttA and DptB were measured to confirm Imd stimulation. Gene Rp49 was used as a baseline for expression normalization. Results showed a significant up-regulation of AttA and DptB at both volumes (9.2 μl and 40 μl) for both time points (8 and 24 h). We decided to use 9.2 μl so as to cause the least amount of disruption to flies during infections, while still eliciting an immune response.
RNA extraction, RNA sequencing, and quality control filtering RNA extraction was performed using Trizol (Life Technologies) following the manufacturer's instructions. cDNA libraries were prepared using the TruSeq RNA Sample Preparation Kit (Illumina). RNA purity was assessed using a Nanodrop instrument. RNA concentration was determined using a Qubit (Life Technologies) instrument. Sequencing was performed on an Illumina Hi-Seq 2500, single-end, and a read length of 75 bp, at Cornell Biotechnology Resource Center Genomics Facility.
Samples had an average of 24.8 M raw reads. Samples went through quality control using FastQC (v0.11.5). Truseq adapter sequences were removed from any sample that showed any level of adapter contamination using cutadapt (v1.14). Low quality bases in the beginning and end of the reads were trimmed using fastx_trimmer (v0.0.13, http://hannonlab.cshl.edu/fastx_toolkit/). Reads were mapped to the Drosophila melanogaster genome (r6.17) using STAR (v2.5.2b). BAM files were generated using SAMtools (v1.3.2). Only one sample (4B, at 3 h) out of the original 42 failed to pass the quality thresholds, and all subsequent analysis used the remaining 41 samples. An average of 92.97% reads per library mapped uniquely to the Drosophila melanogaster genome. We ended up with an average of 23.4 million uniquely mapped reads per library.
Reads mapping to genes were counted using the R package GenomicAlignments. Genes with zero counts across all samples were removed (923 genes out of 17,736). Samples were normalized to library size. A "+ 1" count number was added to all genes before performing log 2 transformation, to make sure values after transformation are finite, and stabilize the variance at low expression end. After normalization and log 2 transformation, only genes with more than 5 counts in at least 2 samples were kept (removing 4156 genes). We ended up with 12,657 genes for downstream analysis.
A heatmap of the row Z-scores of normalized counts for all 12,657 genes indicated that sample 6A (5 h after commercial LPS injection) was an outlier for a subset of the 12,657 genes ( Figure S12A), even though sample 6A did not appear as an outlier using Principal Component Analysis (see below and Fig. 1b). We identified outlier genes in sample 6A by subtracting replicate A row Zscores of normalized counts from replicate B. While for most time points the difference between samples A and B varied between − 4 and 4, samples 6A and 6B demonstrated larger differences for 1439 genes ( Figure S12B). These genes were enriched for GO terms related to neuron signaling and development (based on PANTHER GO statistical overrepresentation test with FDR 0.05). Only 40 of the 1439 genes were annotated as immune genes by Early et al. [31], and none of the 1439 genes were among the DE genes detected as differentially expressed using pairwise comparisons or spline fitting. Library sizes for samples 6A and 6B were similar (26,256,507 for replicate A and 22,980,006 for replicate B). We think the difference between the replicates at 5 h post-injection could be caused by unknown variation in the flies' environment. It did not influence our data analysis and conclusions, but it is necessary to be aware of when using this dataset for other analyses.

Principal component analysis
Principal components analysis (PCA) was performed using function plotPCA from the R package DESeq2 [15] after regularized-logarithm transformation of raw counts, using the design~time + time:time to create the DEseqDataSet. Genes with zero counts across all samples were first removed. The default number of 500 top genes with highest row variance was used to calculate the principal components.

Differential expression analysis
In order to identify genes that had differential expression over the time course, we adopted the linear model-based methodology proposed in Law et al. [18] and available in the R package limma. We first transformed the normalized RNA-seq read counts (before log 2 transformation) using the voom transformation, which estimates the heteroscedastic mean variance relationships of log-counts and adds a precision weight to each observation to make them amenable to the usual linear modeling pipelines that rely on normality. We used gene-wise linear models to fit cubic splines (with 3 degrees of freedom) with time, TMM normalization method [101], and standard empirical Bayes F-tests to select genes whose expression levels were significantly altered across the time course in both replicates.
We also fit 3 degree polynomials across the first 48 h using the R package maSigPro [19,20]. We used default parameters, including counts = F to model the data based on a normal distribution, since we ran maSigPro on counts that were normalized using Limma-Voom. We selected 169 genes as significantly time dependent if they had alfa (Benjamini-Hochberg corrected) < 0.05 and a goodness of fit R 2 value of at least 0.6.
Next, we checked for differential expression of every gene between time point 0 (control) and time point t, for t = 1, 2, …, 48 h. For each test, a multiple testing correction at 5% False Discovery Rate (FDR) using the Benjamini-Hochberg method [30] was adopted. Venn diagrams to compare results were adapted from those generated using web tool Venny (http://bioinfogp.cnb.csic. es/tools/venny/).

Functional annotation
Gene Ontology (GO) enrichment analysis was performed using PANTHER Statistical Overrepresentation Test (http://pantherdb.org/, v14.1, released 2019/04/29) [102] using default settings ("GO biological process complete" as annotation dataset, Fisher's Exact test, FDR < 0.05). Transcription factors were identified among differentially expressed genes based on a list of 753 putative site-specific transcription factors available via the FlyTF database (v1) [32]. Gene set analysis was done using the R package GSA, which uses a Gene Set Analysis algorithm [103] that improves the GSEA algorithm [104] by allowing testing for associations between gene sets and time-dependent variables [103,105]. Gene set membership was assigned from GO data downloaded from FlyBase.org in January 2019 (version FB2019_01). Normalized counts for both replicates at each time point from 1 to 120 h were compared against both control replicates (0 h), using a two-class paired vector (− 1, 1, − 2, 2) which corresponds to (control_replicateA, time-pointX_replicateA, control_replicateB, timepointX_repli-cateB). We used 100,000 permutations to estimate false discovery rates. Only pathways with P-values below 0.05 and with 5 or more genes from our full dataset were kept. A subset of most relevant pathways was compiled by selecting pathways that had more than one gene from the subset of 551 most predominant time-dependent genes, and had a score of 2.5 or more in at least one time point from 1 to 48 h. This gave us 41 unique pathways as shown in Fig. 9.

Cluster analyses
Hierarchical clustering of 91 core genes was performed using R package hclust, using Euclidean correlation as a distance metric. Hierarchical clustering of 551 predominant time-dependent genes was done using the default Pearson correlation with R package pheatmap. Cluster membership assignment and mean patterns of expression across time for genes within each cluster was performed as done in White et al. [26]. Temporal clustering was performed using the R package TSclust [106]. Normalized counts of both replicates were clustered using dissimilarity measures from Autocorrelation-based method (ACF), which computes the dissimilarity between two time series as the distance between their estimated simple autocorrelation coefficients [107]. This method was used with a P-value cutoff of 0.05, and only top 1% correlation edges were further explored.
Cyclic gene patterns were identified using the JTK_ Cycle algorithm [43] available in R package JTK_Cycle. Nine regularly distributed time points were subset from both replicates every 6 h (0, 6, 12, 18, 24, 30, 36, 42, 48 h). The time point corresponding to 18 h was approximated by averaging normalized gene counts between time points 16 and 20 h. We looked for rhythms between 18 and 30 h (4 to 6 time points) with a cutoff of BH Q-value < 0.05 and amplitude > 0.5.

Network inference
Granger causality-based methods [22] were used to construct putative interaction networks among genes in the form of directed graphs with individual genes as nodes. A directed edge from gene A to gene B is added if the time course of gene A Granger-causes the time course of gene B. The notion of 'Granger causality' is popular in learning lead-lag relationships among two or more time series. Formally, if the time series of gene A, given by x t , has some power in predicting the expression of gene B at time t + 1, called y t + 1 , over and above y t and conditioned on an information set I t , then gene A is said to exert a Granger causal effect on gene B. Bivariate Granger causality uses a small information set I t = {x 1 : t , y 1 : t } and captures Granger causal relationship from gene A to gene B by testing whether the regression coefficient β in the following bivariate regression is different from zero: y tþ1 ¼ α y t þ β x t þ error tþ1 A master set of 258 genes was constructed from the 551 predominant time-dependent genes by picking those that had available functional annotation and that had differential expression of at least 2 log 2 fold change. Using linear regression (function lm() in R), we conducted bivariate (pairwise) Granger causality tests for every pair of genes among this set of 258 genes using data on sliding windows of t = 6 consecutive time points and the two replicates (sample size = 12), and ranked them in order of increasing P-values (BH method used for calculating FDR), keeping the top resulting edges (BHFDR < 5%).
A well-known critique of bivariate Granger causality is its use of a small information set that does not contain any other factors except genes A and B [108]. This failure to account for other potential confounding variables can give rise to many spurious edges in our network [108], where Granger causal effects from gene A to gene B is an artefact of gene C, which is causal for one or both genes. To address this, we adopted multivariate (or network) Granger causality [109], allowing us to avoid such spurious inferences through multiple linear regression. In this framework, we start with p genes, and Granger causal relationship of Gene A on Gene B is tested by regressing y t + 1 on y t , x t and the time courses of the other p -2 genes z 1t , z 2t , …, z pt . y tþ1 ¼ α y t þ β x t þ γ 1 z 1t þ γ 2 z 2t þ … þ γ p−2 z p−2;tt þ error tþ1 For small sample size and large p, the above regression is not possible to run using ordinary least squares (OLS), so we use LASSO [61] regression. To test if the regression coefficient β in the above regression is different from zero, we used two different variants of de-biased LASSO [62,63], each of which corrects the bias of lasso and allows quantifying uncertainty of regression coefficients one at a time. A non-zero coefficient in the above multivariate regression suggests that gene A is Granger causal for gene B, even after accounting for the effects of the other p-2 genes. Using this method on the master set of 258 genes, we reconstructed putative directed networks of multivariate Granger causality and ranked the edges in increasing order of P-values, following the same parameters used in the bivariate (pairwise) Granger causality method (sliding window of 6 consecutive time points in both replicates, keeping the top resulting edges (BHFDR < 5%)).
Additional file 2 Table S1. List of differentially expressed genes identified using spline fitting and pairwise comparisons.
Additional file 3 Table S2. Genes that encode transcription factors and respond to commercial LPS injection.