Diverse biological processes coordinate the transcriptional response to nutritional changes in a Drosophila melanogaster multiparent population

Background Environmental variation in the amount of resources available to populations challenge individuals to optimize the allocation of those resources to key fitness functions. This coordination of resource allocation relative to resource availability is commonly attributed to key nutrient sensing gene pathways in laboratory model organisms, chiefly the insulin/TOR signaling pathway. However, the genetic basis of diet-induced variation in gene expression is less clear. Results To describe the natural genetic variation underlying nutrient-dependent differences, we used an outbred panel derived from a multiparental population, the Drosophila Synthetic Population Resource. We analyzed RNA sequence data from multiple female tissue samples dissected from flies reared in three nutritional conditions: high sugar (HS), dietary restriction (DR), and control (C) diets. A large proportion of genes in the experiment (19.6% or 2471 genes) were significantly differentially expressed for the effect of diet, and 7.8% (978 genes) for the effect of the interaction between diet and tissue type (LRT, Padj. < 0.05). Interestingly, we observed similar patterns of gene expression relative to the C diet, in the DR and HS treated flies, a response likely reflecting diet component ratios. Hierarchical clustering identified 21 robust gene modules showing intra-modularly similar patterns of expression across diets, all of which were highly significant for diet or diet-tissue interaction effects (FDR Padj. < 0.05). Gene set enrichment analysis for different diet-tissue combinations revealed a diverse set of pathways and gene ontology (GO) terms (two-sample t-test, FDR < 0.05). GO analysis on individual co-expressed modules likewise showed a large number of terms encompassing many cellular and nuclear processes (Fisher exact test, Padj. < 0.01). Although a handful of genes in the IIS/TOR pathway including Ilp5, Rheb, and Sirt2 showed significant elevation in expression, many key genes such as InR, chico, most insulin peptide genes, and the nutrient-sensing pathways were not observed. Conclusions Our results suggest that a more diverse network of pathways and gene networks mediate the diet response in our population. These results have important implications for future studies focusing on diet responses in natural populations.


Background
Individuals can withstand changing nutritional conditions by flexibly adjusting the allocation of resources to competing life history traits, allowing populations to adapt and thrive. Individual ability to partition available nutrients and optimize fitness gains requires complex cooperation at multiple levels of functional and structural organization in tandem with prevailing conditions dictating nutrient availability. Changes in diet are associated with many phenotypic changes across the tree of life. For example, in many metazoan species, moderate nutrient limitation extends lifespan and delays agerelated physiological decline [1][2][3][4]. In fluctuating resource conditions, this effect, in which the individual often shifts nutrients away from reproduction and towards somatic maintenance and repair may be adaptive, ensuring survival in bad conditions and reproduction when good conditions return [5,6]. On the other hand, constant dietary excess such as diets high in sugar, promote hyperglycemia in many genetic backgrounds, accelerate the rate of aging, and reduce lifespan [7][8][9][10].
A large and growing body of literature points to endocrine pathways being involved in nutrient perception and balance in order to coordinate organismal response to diet change. Nutrient sensing pathways are associated with aging and longevity from yeast to mammals [11][12][13][14], reviewed in [15][16][17][18][19]. The insulin/insulin-like signaling (IIS) together with the target of rapamycin (TOR) are among the most studied pathways. These pathways jointly regulate multiple metabolic processes affecting growth, reproduction, lifespan, and resistance to stress [20][21][22]. In insects, IIS/TOR signaling determines body size by coordinating nutrition with cell growth, and steroid and neuropeptide hormones to cede feeding when the target mass is attained [23]. Mutations, including experimental gene knockouts, that reduce IIS/TOR signaling reduce growth and reproduction, and increase stress resistance and lifespan [12,24,25], and apparently coordinates nutrient status with metabolic processes. For example, lack of nutrients blocks insulin production [26] and mimics the effects of a down-regulated IIS/TOR [27], while a hyperactivated IIS/TOR pathway can exclude the necessity for nutrients [27]. Fruit flies raised on excess sugar diets as larvae become hyperglycemic, fat and insulin resistant, and show increased expression of genes associated with gluconeogenesis, lipogenesis, β-oxidation, and FOXO effectors [8,9]. Additionally, modulating TOR signaling slows aging by affecting downstream processes including mRNA translation, autophagy, endoplasmic reticulum stress signaling, and metabolism (reviewed in [28]) .
Specific examples on the role of nutrient sensors abound in literature. Briefly, the forkhead transcription factor foxo in Drosophila melanogaster (D. melanogaster) and foxo orthologs in the nematode Caenohabditis elegans (daf-16) and vertebrates (FoxO) is the main transcription factor target of IIS/TOR, and is required for lifespan extension by a reduced IIS, reviewed in [18]. An activated foxo represses production of insulin-like peptides (ILPs) which in turn reduces IIS signaling [29,30]. In a related mechanism, resveratrol-mediated activation of sirtuin genes mimic the effect of dietary restriction and promote lifespan in many metazoan species [1]. For example, in the cotton bollworm Helicoverpa armigera, Sirt2 extends lifespan by its role in cellular energy production and amino acid metabolism [31,32]. Further, the regulation of appetite which has a major effect on plastic nutrient allocation (reviewed in [33]), depends on leptin signaling together with the AMP-activated protein kinase (AMPK), influencing nutrient intake and subsequent production of ILPs [34][35][36]. Lastly, the hormones ecdysone and juvenile hormone also bear on the IIS to regulate ovary size and influence dispersal-reproduction trade-offs in D. melanogaster and sand crickets, Gryllus firmus, respectively [21,[37][38][39][40], reviewed in [33]. In spite of these and other examples that demonstrate the effect of genetic variation on the metabolic response to nutrition, the underlying genetic basis diet effects in natural populations remain elusive [41].
Much of the current focus on how endocrine mechanisms affect phenotypic response to nutrition proceed in one-gene-at-a-time knockout strategies to elucidate function. This approach has been informative, largely in model species, but also supported to some extent in wild species. Endocrine pathways have been shown to affect plastic and adaptive resource allocation in wild D. melanogaster [42,43], sexual selection of horn size in rhinoceros beetles [44], sex-specific mandible development in staghorn beetles [45,46] and morph determination in wing dimorphic sand crickets [38,[47][48][49], leading to the conclusion that endocrine pathways mediate the evolution of resource allocation strategies [50][51][52]. However, natural populations have not consistently revealed these same genetic mechanisms [53][54][55][56] suggesting that large effect studies in mutants capture only the tails of effect distributions that occur in the wild [57], or that different mechanisms overlapping with endocrine pathways may be involved [58,59], reviewed in [33]. This disconnect means that our understanding of the specific genetic mechanisms that govern the response to diet in natural populations remains limited.
The majority of the studies that have characterized changes in gene expression in response to diet have controlled for the genetic background by using one or a few inbred lines [60][61][62]. However, previous studies have shown that different inbred lines can vary widely in how they respond to diet changes [61,63,64], meaning that the findings from a single genotype could represent a highly specific response and thus not be broadly applicable. One approach to improve the chances that ecologically relevant mechanisms are identified is to start with experimental panels that include greater levels of standing genetic diversity available in a species in the wild. Multi-way advancedintercross populations founded from multiple geographical inbred lines (i.e. multiparent populations -MPPs) typically integrate a greater subset of genetic diversity, and increase the ability to identify genetic variants underlying complex traits. These resources have gained traction in the past two decades in both plants and animals for the purposes of genetic mapping [65][66][67][68][69][70]. A study characterizing the overall transcriptional response to diet in a multiparent population would better capture the average response of the population and have the potential to be more broadly applicable than those characterized by only a few genotypes. In addition, MPPs are being used widely to map different complex traits, including responses to nutrition, and gaining a more complete picture of the changes in gene expression with diet could help identify possible candidate genes underlying mapped QTL in those studies.
In this study, our goal is to understand the transcriptional response in different nutritional environments in an outbred multiparent population of D. melanogaster. We use an admixed population derived from the Drosophila Synthetic Population Resources (DSPR). The DSPR is a large two-replicate set of advanced recombinant inbred lines (RILs), each derived from 8 inbred lines originating from several continents. The promise of this resource over traditional laboratory populations for characterizing the genetic mechanisms for complex traits is discussed in depth elsewhere [71,72]. We analyze RNA-seq data sequenced from pooled samples of female D. melanogaster exposed to multiple diet conditions differing in the proportion of protein and carbohydrate sources: dietary restriction (DR), control (C) and high sugar (HS). Here, we profile gene expression for three tissues: heads (H), bodies (B) and ovaries (O), in high replication, and ask: 1) How does gene expression change in response to nutritional environment? 2) What specific biological processes and pathways are significantly perturbed by diet treatment? 3) Which sets of genes show similar expression patterns across diets and tissues, and what biological processes are involved in these specific patterns?

Global expression patterns
We use a replicate population of the DSPR comprising >800 RILs. This population was developed from eight inbred founder lines that have been fully genetically characterized (full sequences, the haplotype structure inferred,~1.2 million SNPs identified, and the RILs genotyped at >10,000 SNPs). We generated a single outbred panel from 835 RILs by intercrossing the lines for five generations. Resulting flies were reared on three experimental diets (DR, C, and HS) for 10 days post-eclosion before isolation of total RNA from pools of 100 female fly tissues (head, body and ovary pair) in six replicates for each tissue-diet combination (Fig. 1). These 54 RNA samples (18 for each diet) were sequenced single end, generating a total of 35,572 transcripts, out of which 18,678 remained for analysis after removal of transcripts with a variance across samples of less than one [73]. Overall expression levels were generally consistent across diet treatments and tissues (Fig. 2). One sample (bodies, B2) in the DR treatment showed slightly lower median expression compared to the rest, but was similar enough to the others and was retained in the analysis.
To assess global expression patterns over tissues and diets we performed principal components analysis (PCA) on all samples using an expression matrix from which batch effects had been removed (Fig. 2). A similar figure prior to batch removal is shown in Additional file 1. As expected, tissue effects strongly dominated variance in the first two components which jointly accounted for 94% of the total variance. PC1 which explains 65% of the variance in expression presents non-overlapping separation of tissue expression, although body and head expression appear somewhat similar compared to the ovaries. PC2 (29%) distinguishes expression in bodies from that in heads and ovaries.

Differential gene expression in response to diet
We used DESeq2 to quantify differential gene expression in head, ovary and body samples obtained from adult flies held on C, DR, and HS diet treatments. We obtained lists of genes significantly differentially expressed due to the main effect of diet. After filtering out genes with a low overall count, a total of 12,614 genes remained in the experiment based on which we report all subsequent results. Of these, 2475 genes (19.6%, Additional file 2) were differentially expressed in response to diet treatment, and 978 (7.8%, Additional file 3) for the interaction between diet and tissue (LRT, P adj < 0.05). The overall expression differences are visualized for each tissue and diet pair in Fig. 3. Overall, relative to the C diet, many genes in all organs were expressed in the same direction in the DR and HS diets, meaning that the genes that have increased expression in the DR diet tend to also have increased expression in HS, and vice versa. This is indicated by the positive relationship between the fold changes for each of these diets (bodies: r = 0.64; heads: r = 0.59; ovaries: r = 0.59) and the proportion of genes that trend in the same direction for these two diets (i.e. number upregulated in both + number downregulated in both divided by the total number of genes; bodies: 0.70; heads: 0.82; ovaries: 0.66). However, this observed relationship between fold changes could be a result of comparing two ratios that are both calculated relative to the same reference diet (C), as randomly generated data will produce a positive relationship between these quantities and greater than 50% would be expected to show a fold change in the same direction. Several lines of evidence suggest this trend is biologically meaningful and not simply a result of comparing ratios. First, PCAs performed for each tissue separately show that clusters for DR and HS diets overlap for both bodies and heads, while the C diet forms its own cluster (Fig. 4). For ovaries, all three diets form separate clusters. Second, we calculated fold changes using both other diets as the reference diet and compared the correlation and proportion of genes trending in the same direction. In all cases, the correlation we observe between the DR and HS fold changes relative to C are higher than the correlations we observe for the other pairs of diets (Additional file 4). This also held true when comparing the proportions of genes that trend in the same direction for bodies and heads. In ovaries, the highest proportion trending in the same direction was observed for HS and C relative to DR (Additional file 4). Third, we performed 100 permutations of our expression data randomizing across the diets but constraining this to two randomly selected samples from each diet to ensure we obtained null datasets with no expectation of a diet effect and calculated pairwise fold changes, which allowed us to calculate empirical p-values (see Methods for details; Additional file 1). Only the comparison between DR and HS showed significant relationships, with no other comparison yielding a p-value less than 0.1 for either the correlation or the proportion trending in the same direction (Additional file 4. For heads, the proportion trending in the same direction is significantly greater than expected by chance (empirical p = 0.01). For ovaries, the correlation is significantly greater (empirical p = 0.04) and for bodies, the correlation is marginally significant (empirical p = 0.08). This general trend suggests a similar change in global transcription pattern in response to both the DR and HS diets relative to the C diet, despite their very different compositions by weight and subsequently their caloric content. Further, the 2475 DEGs for the main treatment effect were distributed across all diet-tissue combinations (Fig. 5), making it challenging to narrow down to a smaller list of genes for further examination.

Gene set enrichment analysis
We performed gene set enrichment analysis (GSEA) on the significantly differentially expressed genes (i.e. 2475 DEGs) for the main effect of diet, using the fold changes for each diet-tissue combination to identify pathways and gene sets which were significantly perturbed relative  to all DEGs in the model. Of these pairwise comparisons, only DR versus HS in bodies and DR versus C in bodies showed evidence for significantly enriched gene sets/pathways at an FDR P adj. < 0.05 (Benjamini & Hochberg procedure). We identified four pathways showing gene set level changes for bodies in DR relative to HS: Metabolic pathways (two-sample t-test, mean change = 5.38, FDR = 2.94e − 06 ), Carbon metabolism (two-sample t-test, mean change = 3.31, FDR = 2.26e − 02 ), Oxidative phosphorylation (two-sample t-test, mean change = 2.95, FDR = 4.52e − 02 ), and Protein processing in endoplasmic reticulum (two-sample t-test, mean change = 2.83, FDR = 4.52e − 02 , Additional file 1). Notably, metabolic pathways (dme01100), which was most significantly enriched, is a large group of pathways in the KEGG database (https:// www.genome.jp/kegg-bin/show_pathway?dme01100). At the default threshold (FDR P adj. < 0.1) in GAGE, ten more pathways appeared for DR relative to HS in bodies (Additional file 5). These additional pathways encompass three main metabolic themes: carbohydrate, amino acid and protein, and drug/xenobiotics. For the comparison of DR vs C in bodies, oxidative phosphorylation (dme00190) was significantly enriched (two-sample ttest, mean change = 3.2, FDR P adj. = 7.36e − 02 ).
Further, we examined GO term gene set enrichment for biological process (BP) to capture significant dietrelated differences occurring below the level of pathway. Four terms were enriched at an FDR P adj < 0.01. Small molecule metabolic process was enriched for the DR vs HS comparison in bodies (mean change = 4.49; P adj = 5.84e − 3 ). Cell communication (mean change = 5.10; P adj = 1.83e − 4 ), signaling (mean change = 5.06; P adj = 1.83e − 4 ), and signal transduction (mean change = 4.56; P adj = 1.37e − 3 ) were all enriched for the HS vs C comparison in heads. At an FDR P adj. < 0.05, 41 unique enriched terms were observed, of these, 34 terms were enriched for HS relative to C diet in heads (Additional file 5). These terms highlighted a broad range of  themes including signaling, metabolism, growth, cytoskeleton, gene expression and development. Three terms were enriched for HS relative to C in bodies, including cell communication, signaling, and system process. The remaining six terms were all for the HS diet relative to DR in bodies, all within one theme of metabolism (acid, small molecule, carbohydrate). No terms were enriched for the comparisons in ovaries. To understand broader inclusive processes represented by these GO terms, we evaluated our list for ancestral terms using QuickGO (EMBL-EBI https://www.ebi.ac.uk/QuickGO/). Nine ancestral terms at the same hierarchical level immediately below category BP were observed (metabolic process, biological regulation, cellular process, localization, response to stimulus, cellular component organization, multicellular organismal process, growth, and developmental process). Among these, metabolic process, cellular process, and developmental process had the most connections to child terms. Our GSEA analysis therefore highlights multiple pathways and biological processes Volcano plots (a-i) for differentially expressed genes showing genes with large fold changes that are also statistically significant. Horizontal lines indicate -log 10 (P adj. ) = 0.05, and points above the line represent genes with statistically significant differential expression. Vertical lines differential expression with the value of log 2 fold change of 1 (i.e. absolute fold change = 2) and FDR = 0.05. Upregulated and downregulated genes are on the right side and left side of the vertical lines, respectively, and statistically significant genes are above horizontal lines. Rows in the panel top to bottom are bodies, heads, and ovaries; columns left to right are DR vs C, HS vs C, DR, vs HS; color of points represent log 10 of base mean expression triggered by diet changes, especially in bodies and heads, and encompassing broad themes from metabolism to signaling to homeostasis, but none of the canonical nutrient sensing pathways such as IIS/TOR and FOXO signaling pathways. Notably, our results do not show particular enrichment of diet-associated terms in ovaries, at least for biological processes.

Diet-induced gene coexpression
Next, we asked how diet treatment affected the correlation patterns among genes (i.e. co-expression) across samples. To identify sets of genes that are highly correlated in their expression patterns (or modules), we performed hierarchical clustering on a batch-controlled, rlog transformed expression data including all replicate samples over all treatments using WGCNA [74]. We first computed a matrix of pairwise correlations for all genes on which we performed hierarchical clustering to produce module assignments. We then used a resampling procedure to determine if genes were correctly assigned to modules (see Methods for details and literature). Setting the minimum module size to 30 genes, a total of 31 modules were detected (range gene number 39-3240), with 1049 unassigned genes (grey module). After merging highly similar modules (i.e. eigengene correlation r > 0.9, see methods), 21 modules were identified with an additional module holding all unassigned genes (Additional file 5).
To appreciate module-level effects of diet and tissue on coexpression, we visualized eigengene expression across diets (Fig. 6, Additional file 6). It is clear from these plots that some modules showed greater diet by tissue interaction effects than others (e.g. e, f, m, q, s and v). These modules show either reduced or increased expression for one or two tissues in one or two diets. To gain better insight into these intra-modular effects of diet and diettissue interaction, we fit an analysis of variance model (ANOVA) to module eigengenes. For the main effect of diet, all modules turned up significant (FDR P adj. < 0.05), except modules c (Fig. 6). Similarly, for the effect of the interaction between diet and tissue, all modules showed a significant effect (FDR P adj. < 0.05), except module a.
Focusing on the modules showing a statistically significant interaction effect, and divergent expression profiles in one or more diets for a given tissue (), several distinct patterns became apparent: 1) generally reduced expression in the DR diet for ovaries and bodies unlike the rest of diets (Fig. 6e, f, k and s), 2) increased expression in the DR diet for ovaries and bodies (i, m), 3) elevated expression in bodies in both DR and HS diets (v), and 4) different responses in all three diets (g, r). An attempt to isolate specific diet-tissue combinations driving the interaction effect using post hoc tests revealed large numbers of highly significant combinations. We therefore explored the modules further via functional enrichment to identify the processes driving these coexpression patterns.
We conducted functional analysis on all modules to identify enriched GO terms (Bonferroni corrected enrichment P values, Additional file 7). Of 12,614 Entrez identifiers in our experiment, 10,334 mapped in current GO categories (see methods), and therefore used as a background list for enrichment analysis in WGCNA. A large number of terms were obtained across CC, MF and BP categories: 658 terms (P < 0.01), and 791 terms (Bonferroni corrected P < 0.05) (Additional file 7). A visual inspection of enriched terms in the 21 robustly assigned modules confirmed a large diversity of highly significantly enriched biological processes in most modules, ranging from nuclear processes to membrane and cytosolic processes; from structural to signaling and immune response processes; and from pigmentation to homeostatic processes (Additional file 7).
The first module (Fig. 6a) which included 2956 showed 291 GO terms (Bonferroni corrected, P adj. < 0.01), and had the most significantly enriched terms (i.e. > 60 terms ranged between P adj. < e − 156 -< e − 15 ). This module was characterized by greater eigengene expression in ovaries compared to heads and bodies, although the diet effect was subtle but significant. Nuclear and intracellular organelle processes including gene expression, and RNA processing were key tissue (ANOVA, P < 2e-16) and diet (ANOVA, P < 0.002) effects independently regulated (i.e. no interaction effect). With reference to the trends described above (Fig. 6), those modules showing generally reduced expression in the DR diet for ovaries and bodies (e, f, k and s), are associated with biological processes including signaling (e, P adj. < 1.1e − 10 ), cellular component organization (k, P adj. < 5.8e − 09 ), nervous system development (f, P adj. <1.3e − 14 ), signaling and protein localization on Golgi apparatus (s, P adj. < 3.0e − 06 ). Interestingly, expression increase in DR in bodies and heads compared to ovaries is related to ubiquitindependent proteolytic processes in the proteasome (i, P adj. <1.8e − 08 ), and cytosolic vesicle transport/mitochondrial activities (m, P adj. <8.9e − 156 ). Module (v, P adj. <1.1e − 21 ) was interesting because bodies show monotonic increase in expression from C to DR to HS, a trend that may relate to the GO term chitin-based cuticle structure development (P adj. = 5.78e − 30 ), indicating cuticular remodeling in stressful diets (DR and HS), presumably to accommodate gain or loss of body mass.
Analysis of our modules therefore revealed a large number of biological processes (BP), molecular function (MF) and cellular components (CC) (Additional file 7), suggesting that response to diet changes in natural D. melanogaster involves a multi-system response rather than one or a few signaling pathways that can be very different in different tissues.

Previously implicated pathways
Several pathways have been well-studied in the context of diet-induced phenotypic changes (see Background). We specifically examined these pathways in order to characterize the transcriptional response. We obtained a precomputed list of genes for key metabolic pathways (fbgn_annotation_ID_fb_2019_02.tsv): IIS (55 genes), TOR (152 genes), and FOXO (110 genes, which includes the sirtuins). Of 317 pathway genes, 47 (14.8%) were significantly differentially expressed for the diet effect (Additional file 8), representing fewer pathway genes compared to 2475 of 12,614 of all genes, genome-wide (i.e. 19.6%). However, our GSEA results presented above did not show pathway level enrichment of any of these pathways as defined in KEGG Pathway Database (https://www.genome.jp/kegg/pathway.html), although as mentioned above, a handful of member genes were differentially expressed. These genes that were differentially expressed however, showed mixed modular membership in our clustering results. Similarly, GO analysis  Figure 7 shows pattern of expression across diets, of the insulin peptide and sirtuin genes that are widely cited to act regulate IIS/TOR and AMPK. From these results we could not link any of these pathways with particular modules within our data. Importantly, we did not find evidence for significant whole pathway perturbation of IIS/TOR and the downstream FOXO signaling in our co-expressed modules.
Parsing previously identified QTL for the response to diet A useful application of genome-wide expression data is to identify possible regulatory variants underlying QTL. In a previous study, we used a multivariate approach to identify a global expression QTL for the response to diet treatment of 52 genes in the IIS/TOR pathways [64] that we hypothesized was influencing the expression of many of the genes in these pathways. After performing a discriminant function analysis predicting diet (DR or C) from expression measured on whole flies of 52 genes in the IIS/ TOR pathways, we mapped the difference in discriminant function to identify this eQTL. The eQTL interval, as defined by the Bayesian credible interval, contains 327 genes, making it challenging to narrow to possible candidates. We therefore searched for differentially expressed genes identified in this study that fall within this eQTL region of interest. Of these, we find 49 genes are differentially expressed in the different diet treatments and 13 show a diet by tissue interaction, with 5 of these genes showing both a main treatment effect and an interaction effect (i.e. Odc1, Dgat2, CG12822, CG12159, and Obp44a, Additional file 9). The patterns of expression across tissues for this set of genes is shown in Fig. 8. While our expression results don't allow us to narrow to a single candidate, they do significantly reduce the list and provide detailed expression data for the possible causal genes.

Discussion
We sequenced pooled RNA samples from a three-diet by three-tissue by six-replicate experiment of outbred mated female D. melanogaster in the DSPR. Our aim was to understand diet-induced patterns of gene expression influencing plastic nutrient allocation in different diet conditions in a multiparent population resource. Our results suggest that: Fig. 7 Individual gene plots (a-m) representing a sample of key genes hypothesized to underlie canonical nutrient sensing pathways, particularly the IIS/TOR and sirtuin deacetylases which are thought to regulate growth, reproduction, lifespan and stress resistance in many species including D. melanogaster. Ilp -insulin peptide gene, Sirt -sirtuin gene. There are 8 insulin peptides and 5 sirtuins known in D. melanogaster 1) global expression patterns are dominated by tissue and diet-tissue interaction effects, while the effects of diet alone are subtle but significant, 2) patterns of gene expression are generally similar in low-protein and carbohydrate-rich diets relative to the control diet, 3) multiple pathways, coexpressed gene modules, and biological processes are invoked that affect transcription in different diet conditions, especially in the head and body tissues, and 4) expression results did not suggest a single regulatory variant underlying QTL, but narrowed down to a few possible causal genes. Overall, our results suggest that multiple networks are involved in phenotypic changes in response to nutrient availability, rather than just a few key genes. We advocate a broader, genome-wide approach to studying the genetic mechanisms underlying diet effects on phenotypic change.
It is established that nutrient signaling pathways and hub genes in those pathways play a crucial role in how organisms adjust to changing conditions in availability and quality of nutrients to optimize fitness traits [50][51][52]. Analysis of differential gene expression in a population presented with diets differing in nutritional richness provides an 'omic' alternative to study intermediate processes that connect genetic architectures to phenotypic outcomes such as allocation patterns. In fruit flies, studies typically focus on whole-body or head tissue transcription (e.g. [75]); one or a few gene pathways known to affect diet responses at a time (e.g. [64]); one or two diet manipulations (e.g. [75][76][77]), but scarcely integrate over more than two organs and conditions at a time, or explore expression outside a few gene pathways. Further, despite costs trending downwards recently, sequencing of more highly replicated experiments remains unaffordable for many laboratories. By and large, studies in model organisms focus on genes in a few endocrine pathways, so called nutrient sensors, as critical players in coordinating growth, reproduction, stress resistance and somatic maintenance responses to changing diet conditions. Components of the IIS/TOR, growth, and ecdysone hormones; and sirtuin deacetylases are deemed some of the major players in this respect. Our results suggest an expanded scope of mechanisms underlying flexible responses to nutrient limitation (DR studies) or oversupply (high sugar and high fat diet studies) in natural populations, which has also been suggested by [78][79][80] and reviewed by [58,81]. We discuss our major results and their implications below.
First, we observed a large global effect of tissue type and a more subtle, but significant, effect of diet treatment. Previous studies in flies have also found relatively small effects of diet on transcription. Previously, we characterized the genetic basis of standing genetic variation for 55 genes of the IIS/TOR pathway following treatment with the same diets used here [64], and found only small changes in gene expression associated with diet treatment although most of those genes were differentially expressed. Similarly, Reed et al., [82] [83] investigated expression differences in D. melanogaster fed with two different forms of sugar and found small but significant changes in gene expression. Overall, diet seems to produce fairly small magnitude changes in expression in many genes across the genome, which in concert presumably can lead to large phenotypic changes.
Secondly, comparisons between DR and HS diets relative to C revealed a similar pattern of expression. This result mirrors our earlier finding in an eQTL mapping experiment using DSPR lines in which gene expression in DR and HS relative to C generally trended in the same direction [64] . This result is in spite of the fact that the DR and HS diets lead to very different outcomes in median lifespan and fecundity in our population [64,84]. Nutritional geometry studies which measure traits in a series of concentrations of liquid media suggest that traits such as lifespan and reproduction (which differ significantly across our diets) are influenced primarily by the diet protein to carbon (P:C) ratio, not its caloric content (e.g., [8,85,86]. Thus, calorie limitation alone does not drive phenotypic patterns in these studies. While our HS diet has a high concentration of sugar per liter of food, the P:C ratio is a Fig. 8 Differentially expressed genes under a previously identified eQTL interval [64]. The patterns of expression of the genes within the Bayesian credible interval that are differentially expressed in different diet treatments in this study are shown here lot lower (i.e.~1:2 yeast to sugar ratio). Lifespan and fecundity in nutritional geometry studies are maximized at a much higher P:C ratio (i.e. 2-to 4-fold higher than our HS diet) [85,86]. Thus, our results are consistent with similarity in expression pattern between DR and HS. It is also possible that diet macromolecules serve only as a cue that lead to optimal allocation of resources in natural populations. Thus, it is possible for expression measures to trend in the same direction in DR and HS treatments. Further, Dobson et al. [87] found that excess sugar diets in young adult flies inhibited foxo and reduced survival in middle and old age. While we did not measure lifespan and fecundity here, our data showed mixed pattern of foxo activation across diet tissue combinations, and these mapped to many co-expression modules which may be due to the relatively young age (i.e. 21 days) of our experimental flies.
Thirdly, an overall take away from our GSEA on the full dataset, and also on gene modules coming from hierarchical clustering was that diet effects could not be attributed to a particular genetic mechanism. GSEA highlighted metabolism, oxidative phosphorylation and protein processing at pathway level, but showed a broad spectrum of processes in GO term enrichment encompassing metabolism, cell signaling, structural development and organization, and defence. Similarly, GO analysis of gene modules yielded a broad range of biological processes. From both these analyses, IIS, TOR, and FOXO signaling were not significantly enriched. However, several genes had significantly reduced expression in the IIS/TOR (e.g. Ilp5, Rheb, Atg1, Myc, and eIF4E1) and significantly higher expression in FoxO downstream effectors (e.g. AMPK, orct2, Gadd45, cdk2 and p38) in most DRtissue combinations consistent with indicating canonical activity. With the exception of Ilp5 [41] however, undetectable differential expression of hub genes (such as Ilp2, S6K, chico, InR, Akt1, Torc, Thor, and foxo) suggest that diet induced effects may involve many more pathways/genes than have been traditionally studied in this context. This argument is further strengthened in this study by relatively fewer genes belonging to nutrient sensing pathways, (IIS, TOR and AMPK) (i.e. 14.8%) compared to genes differentially expressed for the effect of diet (i.e. 19.6%). However, this last point should be interpreted with caution because sets of genes defined in different databases as belonging to these pathways, and their relative importance in those pathways may differ and we may not have included all genes.
Evidence in C. elegans suggest that the worm ortholog of foxo, daf-16 is not required in the DR response [88][89][90][91]. On the other hand, sir-2.1 is a worm ortholog of the fruit fly Sirt 2 (which was significantly differentially expressed in this study), and is required for lifespan extension in adult worms by diet deprivation was independent of the daf-2/insulin-like signaling [91]. In D. melanogaster, similar evidence is emerging that suggest that foxo is not required for the response to DR [92], but is involved in the normal response to DR [93]. When dFOXO was removed, DR treatment still resulted in significant lifespan extension in null flies [92]. Another study testing a novel DR assay in C. elegans found that DAF-16, but not DAF-2 (the worm lnR) was required when DR was performed on solid media, and concluded that AMPK-FOXO signaling resulted in lifespan extension on solid food [94]. Our data provides further evidence supporting these studies in suggesting a broader mechanism in which IIS, TOR and FOXO play a role, but in concert with other pathways.
A potential limitation of our study is the heterogeneity in tissue types present in our samples, which may affect the level and nature of gene expression [95]. For instance, assuming fewer cell types are available in the ovaries or head samples compared to body samples, we could expect the range of biological processes triggered by nutrient levels to scale with cell types to some degree. In addition, our use of an outbred population also means that our samples encompass heterogeneity stemming from genetic variation for the response to diet, and averaging over this heterogeneity might obscure some patterns. Ideally, to fully capture the genetic variation for the transcriptional response to diet, one would perform RNA-seq separately in hundreds of the DSPR lines reared in multiple diets, but this would obviously be a much larger experiment requiring a much larger investment of experimental resources. Further, DR and HS protocols vary tremendously across laboratories which can result in different studies detecting only subsets of the gene network distribution which responds to nutrient change [92]. Patterns of phenotypic expression (fecundity and lifespan) and patterns of gene expression have held stable across several studies using the same set of diets in our population, suggesting that the effects we find are biologically relevant in the DSPR. However, in addition to a broader view of the potential mechanisms causing phenotypic changes in response to diet, a broader consideration of different diets would also benefit the field.

Conclusion
We have characterized the genome-wide transcriptional response to diet composition in multiple tissues in D. melanogaster, providing a comprehensive picture of potential genetic mechanisms underlying phenotypic changes. We found that the general pattern of expression was similar in DR and HS diets relative to C diet, probably reflecting specific nutrient ratios. In addition, we identified a large set of co-expression networks, pathways and gene ontology terms that were enriched in response to diet. Our data did not show enrichment of canonical nutrient sensing pathways and key genes, although some genes in those pathways were significantly perturbed. Our results therefore support the hypothesis that in natural populations, multiple gene networks and pathways are invoked to respond to environmental differences in nutrient availability, and not just a few pathways as it is often assumed. Our study has potential implications for future studies focusing on the effects of stressful diets in natural populations, including our own. We therefore urge future studies to look beyond traditional genetic mechanisms governing diet effects and to move beyond the use of just a single genotype when characterizing these responses.

Experimental population
We used 835 recombinant inbred lines (RILs) of the B sub-population of the DSPR as source of our experimental population. Five young females that had mated intra-line were randomly selected from each RIL and placed in six custom-made cages each measuring 20.3 cm × 20.3 cm × 20.3 cm. Eggs were collected (twice on successive days) after 22 h of oviposition in milk bottles (250 ml) containing 40 ml media. Resulting flies were introduced to six cages in large population sizes (i.e. > 2000 per cage) and allowed to mate for five generations (21-day cycle). To ensure a genetically homogenous base population was generated, eggs from each cage equally seeded each of the 5 other cages at each generation. We refer to these six cages as the base population. All life stages during generation of the base population were reared on a cornmeal-dextroseyeast maintenance diet. Proportions of ingredients in our maintenance diet are presented in Additional file 10. Additional culture practices including equipment and supplies are described in [96].

Study design
We used one random cage from the base population for phenotyping to generate RNA-seq data we present here. This population is referred to as the "synthetic population". We adopted a factorial design comprising three dietary treatments, three tissues, and six replicates per each treatment-tissue combination. To obtain flies for each diet treatment, we placed two plates (100 mm × 15 mm, Fisherbrand®) of maintenance diet in the synthetic population and allowed for 24 h of oviposition, repeating egg collection three times. From the two egg plates, a thin slice of media bearing 50-90 eggs (visually estimated) was cut out and grafted to each of 60 new vials (25 mm × 95 mm, Polystyrene Reload, cat. no. 32-109RL, Genesee Scientific, USA) of maintenance food containing 10 ml media. At 12 days post-oviposition, eclosed flies were released into 9 cages, 3 cages for each of three diet treatments, such that treatments are equally represented across egg collection dates. For each cage, we used 20 vials to seed the cage with adult flies. Flies were held on treatment diet for 10 days, and provided with new food every 2-3 days. Each replicate was started on a new week to provide for time that would be needed to dissect a large number of flies per replicate later on. All flies were reared in a growth chamber at 23°C, ≥50% relative humidity, and a 24:0 light:dark cycle, which are the typical maintenance conditions for the DSPR flies. Details of our study design are schematically shown in Fig. 1.

Experimental diets
We extracted RNA for sequencing from 22 day-old (post-oviposition) female flies held on three experimental diets adapted from Bass et al, [97] and Skorupa et al, [8]: C, DR, and HS for 10 days, as described above. These are all cornmeal-sucrose-yeast diets which differ in two ways: relative to C, DR contains 50% less yeast, and HS contains about seven times sucrose (Additional file 10). We have used these diets in past studies to map expression QTL in the DSPR RILs [64], and to estimate quantitative genetic parameters in an outbred DSPR population [84]. To prevent desiccation and preserve quality, diets were covered with multipurpose sealing wrap (Press'n Seal®) sealed and stored at 4°C and used within 2 weeks of preparation. To prevent food degradation, individuals were moved to vials with fresh food three times per week.

Sample preparation and sequencing
At day 22, we collected 100-104 females per diet treatment under mild CO 2 anesthesia. We replicated this process six times over 6 days, thus creating six same-age sample batches with each batch including all three treatments (Additional file 11). This yielded assays of at least 100 females per treatment diet (HS, C and DR), three tissues within each diet (heads, ovaries and bodies), biologically replicated 6 times. Our experiment therefore comprised 54 samples (18 per diet). We immediately dissected tissues from these females and promptly flashfroze them in liquid nitrogen and held them in a closed styrofoam box on dry ice before storage at − 80°C. We purified whole RNA from each of 54 samples using a protocol modified from Life Technologies TRIzol RNA extraction protocol. Tissues were homogenized in a tissue lyser using steel beads in 1 ml TRIzol reagent, and subsequent RNA extraction by following the TRIzol protocol. RNA quality was evaluated on a Nanodrop 2000 spectrophotometer (Thermo Scientific) and concentration on a Qubit 2.0 fluorometer (Invitrogen, Life Technologies). An additional clean-up step with a QIAGEN RNeasy Plus Mini Kit (Hilden, Germany) was used for gDNA elimination following manufacturer's protocol to achieve high quality RNA for library preparation. After the RNA cleaning step, yields typically ranged between 400 and 800 ng/μl; and absorbance values (260/280 and 260/230) above 2.0 (Additional file 12). Fifty-four libraries were each prepared from 1.5 μg total mRNA, RNA integrity (RIN) ≥ 8.0 using Illumina's TruSeq Stranded mRNA (poly A enrichment) and single-end read sequencing on Illumina NextSeq 500. Barcoded libraries were combined in a single 54-plex library, which we sequenced on three lanes of a NextSeq 1 × 75 bp run for an average of 23 million reads per sample. The resulting reads were trimmed of adaptor sequences and FASTQC was used to assess quality. Sequencing was performed at the University of Missouri DNA Core Facility.

Data analysis Read alignment and quantification of expression
We employed the 'new Tuxedo' suite analysis pipeline [73] for read assembly and transcript quantification. We mapped single-end reads to the reference genome of D. melanogaster, bdgp6_tran (ftp://ftp.ccb.jhu.edu/pub/ infphilo/hisat2/data/, updated March, 2016,), using HISAT2 (version 2.1.0, Kim et al, [98]. Alignments were converted to BAM file format and runs combined using SAMtools [99]. Then, StringTie [73,100] was used to assemble RNA-seq alignment into full-length transcripts, and to quantify levels transcript expression. We intended our downstream analysis to focus on genelevel differential expression of known genes in the D. melanogaster transcriptome. Therefore we used the alternative workflow given by Pertea et al. [73] skipping steps 3-5 in that protocol, (i.e. we skipped the individual assembly of each sample and the merge step, documented here http://ccb.jhu.edu/software/stringtie/index.shtml?t=man-ual#de, last accessed May 20, 2019). To do this, we ran 'stringtie -eB' directly on the output of 'samtools -sort'. In order to extract more gene (FBgn) ids from reference gene annotations into StringTie output, we ran a Perl postprocessing script, 'mstrg_prep.pl' (Pertea, https://gist. github.com/gpertea/b83f1b32435e166afa92a2d388527f4b) that appends reference gene ids to the MSTRG gene ids used in StringTie. In order to perform differential expression analysis on row counts using DESeq2 [101], we processed the output from StringTie with a Python script 'prepDE.py' (Pertea M, http://ccb.jhu.edu/software/stringtie/index.shtml?t=manual#de) with the -e parameter to extract read count matrices (one for transcripts, one for genes) directly from the files generated by StringTie.

Controlling for batch effects
As described earlier, our samples were processed in six groups on separate days (all treatments and tissues represented equally in each) including setup, dissection, and RNA extraction and preparation (Additional file 11). We sought to control for this obvious batch effect and any unknown underlying batch effects. We therefore used surrogate variable analysis (SVA, [102] implemented in the R library svaseq [103]. We used the DEseq2 package [101] to obtain counts that are normalized for library size (i.e. counts divided by size factors). Eliminating genes with zero counts, we used the svaseq function comparing the full model with a known batch to a null model with batch only. A single surrogate variable (SV) was identified and included in all subsequent models.

Differential gene expression
Next, we estimated differential gene expression with DESeq2 for DR and HS treatment conditions relative to the C treatment. We fit two generalized linear models (GLM), with parameter fitType = 'local' and only included genes with at least 10 reads in at least 2 samples. We compared the following GLMs: (1) Expression~SV + batch + tissue (2)  From these model comparisons, we identified sets of significantly differentially expressed genes using an FDR threshold of 0.05. To visualize and rank the genes we used the function lfcShrink, which performs shrinkage on log 2 (Fold Changes), which have been shown to produce better estimates. All log 2 (Fold Changes) reported here are the shrinkage estimated values using the "normal" estimator [101]. To identify global patterns of expression across diets and tissues, we performed principal components analysis (PCA) on expression values before and after removal of batch effects. We also performed PCAs for each tissue separately to better identify the global diet effects and the relationships between each diet. Interpreting fold changes can be challenging when trying to compare the diet treatments because they are essentially a ratio. Thus, when comparing two-fold changes that are both calculated with the same diet as the reference diet (in our case the C diet), we expect to see a positive relationship even for randomly generated data, making it challenging to interpret these patterns. To determine if our observed patterns were significantly different than those expected by chance, we performed 100 permutations of our dataset to eliminate the true diet effect. For each permutation, we constructed each of our three diet categories by randomly choosing two samples from each diet, thereby eliminating the diet effect and allowing us to obtain a set of null fold changes for each diet pair. We performed the permutation with a random set of two from each diet instead of simply permuting across all samples because with only 6 samples per diet in each tissue, it is not unlikely to randomly construct a treatment that contains samples from mostly one diet. For each permuted dataset, we calculated fold changes as described above. We then calculated the correlation between each pair of fold changes (e.g. fold change in HS relative to C vs. fold change in DR relative to C) and the proportion of genes that changed in the same direction in each pair (i.e. number upregulated in both + number downregulated in both/total number of genes). We then compared these values to our observed data to calculate empirical p-values [104].

Gene set enrichment in relation to diet
We considered our list of DEGs for the effect of diet treatment and used the R Bioconductor GAGE package [105] to infer gene sets and pathways that were significantly perturbed relative to all DEGs. Briefly, GAGE takes an expression matrix with log-based fold changes as input. It assumes that a particular gene set or canonical pathway (from literature or database) comes from a different distribution than the experiment-derived background list. Therefore, a two-sample t-test is applied to account for the gene set specific variance and the background variance and derives 1) a global p-value using a meta-test on p-values from gene set comparisons, and 2) a FDR q-value adjustment. Using GAGE to access the Fly annotation database 'org. Dm.eg.db', we generated current KEGG (Kyoto Encyclopedia of Genes and Genomes [106]) pathway gene sets, limiting our search to metabolic and signaling functional annotations. Similarly, we obtained up-to-date gene ontology (GO) gene sets specifying biological processes. We mapped our FlyBase gene (FBgn) IDs to ENTREZ ids, and performed pathway and gene set enrichment, and GO analysis on the resulting gene list within GAGE with the gage() function. We tested for perturbation of each gene set relative to all genes in the experiment by calculating the mean individual statistics (stat.mean) from multiple gene set tests using a two-sample t-test implemented in GAGE as gs.tTest(), and obtain a global p-value from individual pvalues. The global p-values were then adjusted for multiple testing using the Benjamini & Hochberg procedure [107], and refer to these as FDR [105].

Gene co-expression analysis
Next, we sought to identify sets of genes (modules) showing high correlation in their pattern of expression across tissues with respect to treatment condition. We used the removeBatchEffect() in the R limma package to correct for known (batch) and identified (SV) effects. We applied a regularized log transformation (rlog) to the batchcorrected matrix to minimize differences between samples for rows with low counts as well as normalize to library size. The rlog transformation is recommended if, as in this study (0.52-1.92), size factors vary widely [101]. The resulting expression matrix was used as input in the R package, Weighted Gene Co-expression Network Analysis, WGCNA, [74] to identify co-expressed gene modules showing similar patterns of expression across tissues and treatments. We built the initial network from samples over all treatments (N = 54) using a signed adjacency matrix with power 23 (i.e. function pickSoftThreshold()) to construct the topological overlap matrix (TOM) from all 12,614 genes (Additional file 1). This power represents the lowest power for which the scale-free topology fit index curve flattens out after reaching a high value of r 2 = 0.90. We performed hierarchical clustering of the TOM using the flashClust() function (method = "average") which implements hclust() clustering more efficiently for large datasets [108]. We used the cutTreeDynamic() function to identify initial the initial set of modules at the following thresholds: cutHeight = 0.95 (default), deepSplit = 2, min-ClusterDendro = 30, pamRespectsDendro = FALSE. A module eigengene is defined as the first principal component of a given set of co-expressed genes, and can be considered as a representative of the gene expression profiles in that set [74]. By convention, modules are referred to by their color labels, and, the grey module is used by default to contain genes not assigned to any module [109].
We then evaluated module membership using a resampling strategy described in Sikkink et al., [110]. Randomly, we chose four of six replicates for each dietsample combination to produce 100 new datasets, each with 36 samples. Using the same parameters as in the full dataset, WGCNA was re-run for each new dataset and resulting modules examined for gene membership. First, a resampled module was accepted if it included at least 10% of the genes in the corresponding module in the full dataset. Secondly, a gene was considered to be strongly supported to belong to a module in the full dataset if it appeared in that module in at least 50% of resampled networks. We identified genes that failed to meet these criteria and placed them in the unassigned module. e merged highly correlated modules (r ⪰ 0.9) during network construction for each resampled dataset to be consistent with how the full dataset was analyzed.
To summarize major patterns of within-module expression with respect to diet condition, we extracted the first principal component of expression of genes in each module, called the module eigengene. We then performed an ANOVA for each module eigengenes to assess the effects of diet, tissue, and diet-tissue interactions on total module expression. Because all modules had a significant effect of either diet or diet-tissue interaction, we examined module enrichment for diet-related functional annotations. We therefore accessed Bioconductor annotation libraries AnnotationDbi and org. Dm.eg.db using the GOenrichmentAnalysis() function within WGCNA, and calculated the Fisher's Exact test with the Bonferroni correction to identify significantly enriched GO terms in each module, providing all genes available in our experiment as a background list. We reviewed all terms significantly enriched (P adj. < 0.01) for the BP category (to view terms for CC, MF, see Additional file 7). For this discussion, we restricted the analysis to the BP category to focus only on biological function and not biochemical activities (MF) or subcellular location where a gene product is active (CC), at the same time reducing the number of tests for enrichment as a way to limit the number of terms for interpretation.