- Research article
- Open Access
Contribution of trans regulatory eQTL to cryptic genetic variation in C. elegans
BMC Genomics volume 18, Article number: 500 (2017)
Cryptic genetic variation (CGV) is the hidden genetic variation that can be unlocked by perturbing normal conditions. CGV can drive the emergence of novel complex phenotypes through changes in gene expression. Although our theoretical understanding of CGV has thoroughly increased over the past decade, insight into polymorphic gene expression regulation underlying CGV is scarce. Here we investigated the transcriptional architecture of CGV in response to rapid temperature changes in the nematode Caenorhabditis elegans. We analyzed regulatory variation in gene expression (and mapped eQTL) across the course of a heat stress and recovery response in a recombinant inbred population.
We measured gene expression over three temperature treatments: i) control, ii) heat stress, and iii) recovery from heat stress. Compared to control, exposure to heat stress affected the transcription of 3305 genes, whereas 942 were affected in recovering animals. These affected genes were mainly involved in metabolism and reproduction. The gene expression pattern in recovering animals resembled both the control and the heat-stress treatment. We mapped eQTL using the genetic variation of the recombinant inbred population and detected 2626 genes with an eQTL in the heat-stress treatment, 1797 in the control, and 1880 in the recovery. The cis-eQTL were highly shared across treatments. A considerable fraction of the trans-eQTL (40–57%) mapped to 19 treatment specific trans-bands. In contrast to cis-eQTL, trans-eQTL were highly environment specific and thus cryptic. Approximately 67% of the trans-eQTL were only induced in a single treatment, with heat-stress showing the most unique trans-eQTL.
These results illustrate the highly dynamic pattern of CGV across three different environmental conditions that can be evoked by a stress response over a relatively short time-span (2 h) and that CGV is mainly determined by response related trans regulatory eQTL.
Many organisms can respond to sudden changes in the ambient environmental conditions by adjusting their gene expression levels . In particular, invertebrates are prone to environment-induced rapid gene-expression changes. For instance, gene expression in the nematode Caenorhabditis elegans can swiftly change due to exposure to pathogens, temperature, and toxicants [2,3,4,5,6]. A common denominator of many of these studies is that they provide snapshots in time of the responses elicited by these environments. As such, they provide static profiles of gene expression at a given moment. Further insight into the dynamics of gene expression and gene-expression regulation can be achieved by following responses, such as development or aging, over time [7,8,9].
Gene-expression regulation can be studied by investigating the transcriptional response in the context of natural variation using genetical genomics . In this approach, a genetically segregated population (e.g. recombinant inbred lines, RILs) is used in a transcriptomics experiment to determine the genetic architecture of gene expression. Genetical genomics has been used in many species, including C. elegans, and many environments. One of the marked observations resulting from a comparison of different environments is the change in genetic architectures, revealing additional variation (as conceptualized by ). For example, the transcriptional architecture of C. elegans grown at 16 °C or 24 °C differs markedly . Such studies show that the genetic architecture of gene expression is a dynamic process affected by relatively long-term differences in environment or age.
Gene expression is also highly dynamic during development and growth. For example, development in C. elegans is a tightly regulated process that has strongly correlated patterns of gene expression [8, 9, 13]. These developmental expression dynamics can be affected by natural genetic variation, for example between two commonly used divergent strains N2 and CB4856 . Depending on the developmental stage, up to 10% of the genes in these strains show differential expression linked to genotype. This enabled the mapping of QTLs for gene expression dynamics during development .
Environmental changes can unlock genetic variation that remains hidden when in one condition but becomes apparent in another, a phenomenon called cryptic genetic variation (CGV) . CGV has received quite some renewed interest over the past decade and it is suggested that CGV provides the raw material of evolution and adaptation under different environmental conditions . Unlocking CGV can be achieved by altered gene-expression regulation, such as the transcriptional response to changing ambient conditions. Gene regulatory networks play an important role in understanding how environmental cues affect cryptic genetic variation . Here we aim to investigate the CGV of the genetic architecture over the course of a strong environmental stimulus across a recombinant inbred line (RIL) population in the nematode C. elegans. As stimulus we used a heat stress, since C. elegans strongly reacts to temperature differences . Furthermore, the two parental strains of the RIL population, Bristol (N2) and Hawaii (CB4856), display extensive variation in response to heat [12, 19,20,21]. First, we obtained the transcriptomes of the RILs over three treatments: (i) control, 48 h at 20 °C; (ii) heat stress, 46 h at 20 °C followed by 2 h exposure to 35 °C; (iii) recovery, same as the heat stress, followed by an additional 2 h at 20 °C. The transcriptomes were used for comparing the transcriptional differences as well as the identified eQTL between the three treatments. The treatments resulted in strong differences in gene expression, whereby the recovery treatment showed characteristics of both the control and the heat-stress treatment. Comparative analysis over the identified eQTL per treatment showed that cis-eQTL were strongly conserved over treatments. Trans-eQTL were more dynamic and display little overlap between treatments. We show that CGV is mainly manifested by trans-eQTL of specific sets of genes in specific environments. This makes the genetic architecture of gene expression variation an even more complex and cryptic phenomenon than previously thought.
The wild-types N2 and CB4856 and 54 RILs derived from a CB4856 x N2 cross were used (strains generated in ). For 49/54 of these strains low-coverage sequencing was applied to construct a more detailed genetic map (see also ). A matrix with the strain names and the genetic map can be found in Additional file 1.
The strains were kept on 6-cm Nematode Growth Medium (NGM) dishes containing Escherichia coli strain OP50 as food source . Strains were kept in maintenance culture at 12 °C, the standard growing temperature for experiments was 20 °C. Fungal and bacterial infections were cleared by bleaching . The strains were cleared of males prior to the experiments by selecting L2 larvae and placing them individually in a well in a 12-wells plate at 20 °C. Thereafter, the populations were screened for male offspring after 3 days and only the 100% hermaphrodite populations were transferred to fresh 9-cm NGM dishes containing E. coli OP50 and grown until starved.
Control, heat stress, and recovery from heat stress experiments for transcriptomics
The experiments were started by transferring a starved population to a fresh 9-cm NGM dish. This population was grown for 60 h at 20 °C to obtain egg-laying adults, which were bleached in order to synchronize the population. The eggs were transferred to a fresh 9-cm NGM dish. Three growing conditions were applied: (i) the control treatment was grown for 48 h at 20 °C, (ii) the heat-stress treatment was grown for 46 h at 20 °C followed by 2 h at 35 °C, and (iii) the recovery treatment was grown for 46 h at 20 °C, followed by 2 h at 35 °C and thereafter 2 h at 20 °C. Before the start of the treatment, the developmental stage of the population was determined by observing the developmental stage of the vulva in multiple individuals. Populations not consisting of L4 larvae were not isolated. Directly at the end of the treatment, the population was washed off the plate with M9 buffer and collected in an Eppendorf tube, which was flash frozen in liquid nitrogen. In this manner, 48 RILs per condition were assayed.
Genotypes and genetic map construction
Previously, 49 lines were sequenced and aligned. The single-nucleotide polymorphism (SNP) calls per strain were taken for constructing the genetic map . The SNP density was determined per 10 kb bins and recombination events were recognized as transition of an area where there were no CB4856 SNPs in 10 consecutive bins into an area where there were CB4856 SNPs and the other way around. It was not allowed to have two recombination events within 10 consecutive bins (100 kb). The 10 kb bin where the first SNPs were detected was marked as the recombination event. Before use in mapping, the map was filtered for informative markers – that is - markers indicating a recombination event in at least one of the lines. This resulted in a map of 729 informative markers, each indicating the location of the recombination events within 10 kb (see the figure in Additional file 2).
The genetic map was investigated by correlation analysis to assess the linkage between markers. Markers on the centers of the chromosomes showed strong linkage (see also ). No strong in between chromosome correlations were found (see the figure in Additional file 3).
The RNA of the RIL samples was isolated using the RNeasy Micro Kit from Qiagen (Hilden, Germany). The ‘Purification of Total RNA from Animal and Human Tissues’ protocol was followed, with a modified lysing procedure; frozen pellets were lysed in 150 μl Rneasy Lysis Tissue buffer, 295 μl RNAse-free water, 800 μg/ml proteinase K and 1% ß-mercaptoethanol. The suspension was incubated at 55 °C at 1000 rpm in a Thermomixer (Eppendorf, Hamburg, Germany) for 30 min or until the sample was clear. After this step the manufacturer’s protocol was followed.
cDNA synthesis, labelling and hybridization
The ‘Two-Color Microarray-Based Gene Expression Analysis; Low Input Quick Amp Labeling’ -protocol, version 6.0 from Agilent (Agilent Technologies, Santa Clara, CA, USA) was followed, starting from step five. The C. elegans (V2) Gene Expression Microarray 4X44K slides, manufactured by Agilent were used. Before starting cDNA synthesis, quality and quantity of the RNA were measured using the NanoDrop-1000 spectrophotometer (Thermo Scientific, Wilmington DE, USA) and RNA integrity was determined by agarose gel electrophoresis (3 μL of sample RNA on 1% agarose gel).
Data extraction and normalization
The microarrays were scanned by an Agilent High Resolution C Scanner with the recommended settings. The data was extracted with Agilent Feature Extraction Software (version 10.5), following manufacturers’ guidelines. For normalization, “R” (version 3.3.1 × 64) with the Limma package was used. The data was not background corrected before normalization (as recommended by ). Within-array normalization was done with the Loess method and between-array normalization was done with the Quantile method . The obtained single channel normalized intensities were log2 transformed and used for further analysis.
The transcriptional response to heat stress was determined by explaining the gene expression over the treatment with a linear model,
where y is the log2-normalized intensity as measured by microarray of spot i (i = 1, 2, ..., 45,220), and T is the treatment (either control, heat stress, or recovery from heat stress). This analysis ignored genotype.
The significances were corrected for multiple testing by applying the Benjamini Yekutieli method in p.adjust (R, version 3.3.1 Windows ×64) at FDR = 0.05 . Thresholds of -log10(p) ≥ 2.87 for the control versus heat-stress treatment, −log10(p) ≥ 3.09 for the control versus recovery treatment, and -log10(p) ≥ 3.02 for the heat-stress versus recovery treatment were determined.
Due to the setup of our experiment, potential variation in development could exist among the RILs and the treatments. The recovery animals were sampled two hours later than the control and heat-stress animals, furthermore, heat stress slows the developmental rate . We estimated the relative age by using a set of ~100 genes that show a strong, positive, linear response during development . By setting the average age of the control RILs to 48 h we could estimate and compare the RILs in all treatments (Additional file 8).
Principal component analysis
A principal component analysis was conducted on the gene-expression data of the RILs over the three treatments. For this purpose, the data was transformed to a log2 ratio with the mean, using
where R is the log2 relative expression of spot i (i = 1, 2, ..., 45,220) in strain j (RIL) over all three conditions (n = 48 per condition), and y is the intensity (not the log2-transformed intensity) of spot i in strain j.
The transformed data was used in a principal component analysis, where the first six axes were further examined.
Expression quantitative trait locus analysis
eQTL mapping and threshold determination
The eQTL mapping was done in “R” (version 3.3.1 Windows ×64). The gene-expression data was fitted to the linear model,
where y is the log2-normalized intensity as measured by microarray of spot i (i = 1, 2, ..., 45,220) of RIL j. This is explained over the genotype (either CB4856 or N2) on marker location x (x = 1, 2, ..., 729) of RIL j.
The genome-wide significance threshold was determined via permutation, where the log2-normalized intensities were randomly distributed per gene over the genotypes. The randomized data was tested using the same model as for the eQTL mapping. This was repeated for ten randomized datasets. A false discovery rate was used to determine the threshold (as recommended for multiple testing under dependency) ,
where FDS (false discoveries) is the outcome of the permutations and RDS (real discoveries) is the outcome of the eQTL mapping at a specific significance level. The value of m0, the number of true null hypotheses tested, was 45,220-RDS, and for the value of m, the number of hypotheses tested, the number of spots (45220) was taken. The q-value was set at 0.05. This yielded a threshold of –log10 (p) > 3.9 for the control, −log10 (p) > 3.5 for the heat stress, and–log10 (p) > 3.9 for the recovery treatment. For the analyses we used the most conservative thresholds measured, −log10 (p) > 3.9, for all the sets.
Statistical power calculations
In order to determine the statistical power at the set FDR threshold, QTL were simulated using the genetic map of the strains used per condition (n = 48 per condition). For each marker location, ten QTL were simulated that explained 20–80% of the variation (in increments of 5%). Random variation was introduced based on a normal distribution with sigma = 1 and mu = 0 and a peak of the corresponding size (e.g. a peak size of 1 corresponds to 20% explained variation) was simulated in this random variation. From the simulation, the number of correctly detected QTL, the number of false positives and the number of undetected QTL were counted. This was based on the thresholds determined in the permutations, −log10 (p) > 3.9. Furthermore, the precision of the effect-size estimation and the precision of the QTL location (based on a –log10(p) drop of 1.5 compared to the peak) were determined. A table summarizing the results can be found in Additional file 9.
The distinction between cis- and trans-eQTL was made on the distance between the physical location of the gene and the location of the eQTL-peak. For cis-eQTL the gene lies within 1 Mb of the peak or within the confidence interval of the eQTL. The confidence interval was based on a –log10(p) drop of 1.5 compared to the peak.
The amount of variation explained per microarray spot with an eQTL was calculated by ANOVA, by analysis of the gene expression explained over the peak-marker. For spots with multiple peaks, this analysis was conducted per peak, not using a full model, since a single-marker model was used in the analysis.
In order to identify trans-bands (an enrichment of trans-eQTL), a Poisson distribution of the mapped trans-eQTL was assumed (as in ). Therefore the number of trans-eQTL per 0.5 Mb bin were counted. Since trans-eQTL peaks were mapped to 107, 106, and 103 bins (respectively in control, heat stress, and recovery), it was expected that 9.16, 20.64, and 9.01 spots with a trans-eQTL were to be found at each of these markers. Based on a Poisson distribution, it was calculated how many trans-eQTL needed to be found to represent an overrepresentation. For example, for p < 0.001 there should be 20, 36, or 20 spots with a trans-eQTL at a specific marker (respectively in control, heat stress, and recovery).
To test for polymorphisms in genes with eQTL, we used the data from the CB4856 reference genome . The genes with eQTL were matched to the polymorphisms. The frequencies of polymorphisms in each of the groups (genes with cis-eQTL, genes with trans-eQTL, and genes without eQTL) were counted and compared versus each other by a chi-squared test in “R” (version 3.3.1, ×64).
Detection of eQTL across treatments
Two criteria were used to detect the occurrence of eQTL over multiple treatments.
In the first criterion, it was tested whether or not an eQTL was mapped in treatment one versus treatment two, by simply comparing the tables listing the eQTL. This allowed for comparison of the actual mapped peaks and for comparison of eQTL effects of trans-eQTL regulated from different loci. In order to estimate the false-discovery rate associated with this comparison, the same analysis was applied to ten permutated datasets per condition, using the –log10(p) > 3.9 for eQTL discovery.
The second criterion compared the occurrence of eQTL at the exact same marker location. In this comparison, the eQTL mapped in one treatment were taken as lead for the occurrence of the same eQTL in the other two treatments. This comparison allowed for direct comparison of the eQTL effect at the locus. Based on observations on the effect distribution, this approach was used to estimate the number of trans-eQTL not detected due to statistical power or not detected due to absence of the eQTL in a treatment (see also text in Additional file 15).
Functional enrichment analysis
Gene group enrichment analysis was done using a hypergeometric test and several databases with annotations. The databases used were: the WS220 gene class annotations, the WS256 GO-annotation, anatomy terms, phenotypes, RNAi phenotypes, developmental stage expression, and disease related genes (www.wormbase.org) ; the MODENCODE release 32 transcription factor binding sites (www.modencode.org) [30, 31], which were mapped to transcription start sites (according to ); and the KEGG pathway release 65.0 (Kyoto Encyclopedia of Genes and Genomes, www.genome.jp/kegg/) .
Enrichments were selected based on the following criteria: size of the category n > 3, size of the overlap n > 2. The overlap was tested using a hypergeometric test, of which the p-values were corrected for multiple testing using Bonferroni correction (as provided by p.adjust in R, 3.3.1, ×64). Enrichments were calculated based on gene names, not on spots.
Transcriptional response over the course of heat stress
To better understand the transcriptional response to heat stress, we obtained the transcriptomes of 48 recombinant inbred lines (RILs) at the L4 stage in each of three treatments: control, heat stress, and recovery from heat stress (Fig. 1a). The effects of the treatments on gene-expression levels were analyzed using a linear model for pairwise comparisons between each of the conditions (see volcano plots in Additional file 4 and a list of affected spots in Additional file 5). In this way, we identified 7720 differentially expressed genes over the course of the three treatments (FDR = 0.05; Fig. 1b). We found that both control and heat stress had many unique differentially expressed genes: 2321 genes were only differently expressed in the comparisons of the control treatment to the other two treatments and 3305 genes were only differently expressed in the comparisons of the heat-stress treatment with the other two treatments. In the comparisons with the recovery treatment, only 942 genes were found unique for that treatment. Furthermore, many differentially expressed genes were shared in the comparison of the recovery treatment versus the control and heat-stress treatment (2251 genes). Again, the control and recovery (1152) and heat stress and recovery (1189) shared fewer genes. There were 1255 genes that were differentially expressed between all three conditions and were therefore highly treatment dependent. These results indicate that the control and heat-stress treatments are strongly contrasting in gene expression, whereas the recovery treatment shares characteristics with both other treatments.
To follow up on this interpretation, a principal component analysis (PCA) was conducted on the gene-expression data, transformed to the log2 ratio with the mean. The first axis (20.0% of the variation) captures variation mostly related to the heat-shock treatment, which is expected as this treatment specifically affect the largest number of genes (Additional file 6). The second axis (11.4% of the variation) captures variation mostly related to the control treatment, which also fits the analysis with the linear model as the control treatment was the second most distinct treatment. Together, these two axes also place the recovery treatment in between the control and heat-stress treatment, showing the contrast with the other two treatments was lower and possibly indicating transcript levels in the recovery treatment are returning to normal. The third principal component (10.1% of the variation) captures variation that sets the heat-stress treatment completely apart from the other two treatments, which is as expected since the heat-stress treatment has the most unique differentially expressed genes.
In order to gain further insight into the functional differences between the treatments, an enrichment analysis was conducted on genes belonging to the different overlap groups as shown in Fig. 1 (For example genes differentially expressed in only the control treatment, see the list in Additional file 7). Each of the groups was enriched for many processes, showing that the treatments had a profound impact on gene expression. Interestingly, we found that genes specific for each of the three treatments were enriched for genes expressed in the intestine. Furthermore, the control and heat-stress treatments were strongly enriched for genes expressed in the germline. These enrichments indicated that the expression of genes involved in metabolism and reproduction (or development of the reproductive organs) were strongly altered during the heat-stress response. As this may be caused by a developmental difference in the sampled populations, we estimated the developmental age using a transcriptional ruler (see Methods for details) . It was found that the control population was transcriptionally slightly younger than the heat shock (estimated ~1 h older) and the recovery population (estimated 1.3 h older; Additional file 8).
Gene expression linked to genetic variation
Linkage mapping was performed using 48 RILs for each of the three treatments. Statistical power analysis showed that this population has the power to detect 80% of the eQTL that explain at least 35% of the variation (see Methods and the table in Additional file 9). Identified eQTL (FDR ≤ 0.05; Fig. 2 ; a table with all eQTL is given in Additional file 10) were compared between the treatments (Table 1). Most genes with an eQTL were found in the heat-stress treatment (2626), whereas the control (1797) and recovery (1880) had similar numbers. This increase in genes with eQTL was primarily caused by the larger number of genes with trans-eQTL in the heat-stress treatment (1560; ~57% of total eQTL) compared to the control (751; ~40% of total) and recovery (739; ~38% of total). The number of cis-eQTL was almost identical among conditions (Table 1).
The cis-eQTL showed a bias for higher expression if the regulatory locus had the N2 allele (on average 70% of the genes with a cis-eQTL). This bias was absent in the trans-eQTL where, on average, 43% of the genes with a trans-eQTL was more highly expressed if the locus had the N2 allele (illustrated by the figure in Additional file 11). This discrepancy was to be expected, since the microarray platform used to measure the transcripts was designed for the N2 genotype. Therefore, part of this variation is likely due to mis-hybridization. However, previous studies have shown that this does not explain all the variation (see ). In congruency, we found genes with a cis-eQTL to be more polymorphic than genes with a trans-eQTL and genes without an eQTL (summarized in Additional file 12). For example, genes with a cis-eQTL were more likely to be fully deleted in CB4856 (9.5% of the genes, compared to 0.9% in genes without an eQTL or 1.1% in genes with a trans-eQTL; Chi-squared test, P < 1*10−34). Furthermore, supporting that not all cis-eQTL stem from mis-hybridization, polymorphisms in the flanking 3′ and 5′ regions were about two times more likely to occur near cis-eQTL compared to genes without a cis-eQTL (Chi-squared test, P < 1*10−6). When these enrichments for 3′ and 5′ polymorphisms were compared between cis-eQTL with an N2 or CB4856 higher effect, we found no significant difference.
The trans-eQTL were mainly found in 19 treatment-specific trans-bands, loci that regulate the abundance of many transcripts. The 19 trans-bands were identified by analysis of the occurrence of trans-eQTL across the genome (Poisson distribution, P < 0.001; listed in Additional file 13). The seven trans-bands detected in the heat-stress treatment affected most genes (1141; ~73% of all trans-eQTL). In the control treatment, five trans-bands were found (325 genes; ~43% of all trans-eQTL) and in recovery treatment seven trans-bands were identified (343 genes; ~46% of all trans-eQTL). Six out of the 19 trans-bands individually affected >100 genes, one located at chromosome X: 0.5–2.0 Mb in control treatment; four in heat-stress treatment at chromosome I: 2.0–3.5 Mb, II: 12.0–13.5 Mb, IV: 1.0–2.5 Mb, and V:1.0–3.0 Mb; and one in recovery treatment at chromosome I: 1.5–3.0 Mb. Importantly, the distribution across the genome of trans-bands and eQTL is treatment-specific. To further investigate this, we determined the overlap in mapped eQTL over the course of the heat-stress response.
In contrast to cis-eQTL, trans-eQTL were environment-specific
Comparing the genes with a cis-eQTL among the three treatments, we found 1086 out of 1789 unique genes (~61%) with a cis-eQTL in more than one treatment and 664 (~37%) in all three treatments (Fig. 3a). Because cis-eQTL can be caused by mis-hybridizations, we also calculated the overlap for cis-eQTL with an N2 higher effect. In that selection, 303 out of 615 unique genes (~49%) had a cis-eQTL in more than one treatment and 157 (~26%) in all three treatments. For the whole set of genes with a trans-eQTL the overlap was much smaller (Fig. 3b); 360 out of 2610 genes (~14%) were found in more than one treatment and only 80 genes (~3%) in all three treatments. By definition, the locus at which cis-eQTL were detected was the same between treatments. Furthermore, the cis-eQTL effect sizes and directions were highly comparable (Pearson correlation coefficients between 0.94–0.96; shown in a figure in Additional file 14). The cis-eQTL that were detected in only one treatment were probably missed due to the small amount of variation explained by these eQTL (see the text in Additional file 15). Interestingly, there were only three genes with cis-eQTL showing genotypic plasticity: C52E2.4, C54D10.9, and nhr-226. This meant that we observed allelic variation acting in opposite directions between environments (Fig. 3c).
The effect sizes and direction of genes with a trans-eQTL found in more than one treatment were very similar (Pearson correlation coefficients between 0.90–0.93; figure in Additional file 14). However, only a few genes with a trans-eQTL were found over multiple treatments, far lower than the overlap in cis-eQTL between conditions (Fig. 3a and b). Since trans-eQTL are not by definition regulated from the same location, the overlap between treatments declines even further if location is taken into account. Taking the trans-eQTL from the three treatments together, only ~38% of the multi-treatment trans-eQTL were located at a different locus from one treatment to another (~28% if only loci at different chromosomes were counted), see Additional file 16 : (Fig. A). Interestingly, although the regulatory locus was located elsewhere, the genotypic effect of the trans-eQTL was almost identical (Additional file 16: Fig. B), yet most genes with trans-eQTL only displayed an eQTL in one treatment (text in Additional file 15). The likely reason that the majority of trans-eQTL was not detected across treatments is that most trans-eQTL are environment-specific and therefore highly cryptic.
Trans-eQTL display two types of cryptic variation
For treatment-specific, cryptic trans-eQTL, we found that a regulator is active when the eQTL is detected and it is not active when no eQTL is detected. With on and off switching regulators between treatments, genes can have dynamic trans-eQTL, which appear as a treatment-specific switch of regulatory loci (Fig. 3-5). Since the majority of genes with a trans-eQTL have one unique trans-eQTL in only one treatment (Fig. 3b), a switch in regulatory loci seems to occur less frequently compared to the on/off switch.
As the majority of the trans-eQTL was treatment-specific, we investigated whether detection in only one treatment was a result of the statistical power of our study or if it was a biological phenomenon. As trans-eQTL explain 34.2% of variation on average, compared to 52.1% of variation for cis-eQTL (text in Additional file 15), it is possible that the detection of trans-eQTL was more affected by lack of statistical power. By simulations, we estimated that this only affected 20.6% of the trans-eQTL that were not detected in multiple treatments (for the detailed analysis, see the text in Additional file 15), which argues for the cryptic nature of trans-eQTL. Another line of evidence for this is the low overlap in affected genes in co-locating trans-bands across treatments (13/19 trans-bands co-locate). We only found significant overlap in three pairs of trans-bands, where 6.5–11.1% of the affected genes overlap (text in Additional file 15 ; hypergeometric test p < 1*10−4). For example, one of these, a major trans-band at chromosome IV:1–2.5 Mb in the heat-stress treatment, affected 244 genes of which 22 overlapped with the 31 genes in the recovery trans-band on chromosome IV:1–2 Mb. Together, these results show that the majority (67.3%) of trans-eQTL indeed are cryptic. For example, the transcript levels of flp-22, pqm-1, and sod-5 were affected by treatment (Fig. 4a) and showed a trans-eQTL in only one treatment (Fig. 4b).
As mentioned before, only a minority of the genes with a trans-eQTL have different eQTL across treatments. Of those genes with different trans-eQTL over treatments, the genotypic effects of the eQTL were similar across treatments, even if the loci were different (Additional file 16: Fig. B). Only between chromosome IV and V were eQTL with changes in effect directions observed. One of the genes displaying this pattern was gei-7 (also known as icl-1). This gene was represented by three different micro-array probes, all showing the same pattern: a primary eQTL on chromosome V at ~12.0 Mb in all three conditions and a secondary eQTL in the heat-stress treatment at chromosome IV at ~1.5 Mb (Fig. 5).
Functional enrichment of eQTL
To find which biological processes were affected by genetic variation on a gene-expression level we looked for enrichment in gene classes, phenotypes, KEGG pathways, GO terms, and anatomy terms (see the list in Additional file 17). For cis-eQTL enriched categories were similar in all three treatments, as expected by the consistent nature of cis-eQTL. For cis-eQTL, we found enrichments for the gene classes bath, math, btb, and fbxa, which were previously found to be highly polymorphic between CB4856 and N2 . Moreover, we found enrichment for genes involved in the innate immune response and protein homo-oligomerization. It should be noted that these enrichments are likely due to hybridization differences for the polymorphic genes (as cis-eQTL with a positive N2 effect are enriched for exactly these categories, Additional file 17).
The trans-eQTL were also enriched for genes functioning in the innate immune response, especially for genes where the N2 allele leads to higher expression. Furthermore, genes expressed in the intestine were enriched in the trans-eQTL found in control and heat-stress conditions. Contrasting to the genes with cis-eQTL, the genes with trans-eQTL were enriched for many different transcription factor binding sites, indicating active regulation of trans-eQTL.
Consistent trans-eQTL were found in all three treatments for the enriched NSPC (nematode specific peptide family, group C) gene class. This was remarkable as only a very small part of the trans-eQTL were shared over the three treatments. For the heat stress and recovery trans-eQTL, genes expressed in the dopaminergic neuron were enriched, with the strongest enrichment in the heat-stress treatment. These genes were also enriched in the control treatment, however, in a group of trans-eQTL mapping to a different trans-band. In the heat stress and recovery treatment the dopaminergic neuron-specific genes showed trans-eQTL at the IV:1-2 MB locus, whereas in the control they showed trans-eQTL at the X:4–6 locus (Fig. 6).
Transcriptional response over the course of a heat stress
Here we present a comprehensive study of the effects of an induced heat-stress treatment on the genetic architecture of gene expression. The obtained transcriptomes were analyzed in the context of the treatments and in the context of the genetic variation present in the strains used in the experiments.
We found that many of the genes that are affected over the heat-stress course are associated with expression in the germline and intestine. These findings are partially in line with findings from an investigation on the heat-shock regulatory network using a genome-wide RNAi screen in C. elegans . Their heat-stress conditions were 31.5 °C for two hours followed by 24 h of recovery at 20 °C. The authors found that genes associated with the proteasome induced heat-stress-specific gene expression only in the intestine and spermatheca, which corroborates with our results. Differences with Guisbert et al. (2013), could be explained by the differences in larval stage used (L4 vs L2) as well as differences in temperature and duration of the heat stress (and recovery). Another study exposed C. elegans at the L4 stage to a heat stress of 30 min at 33 °C, and measured transcriptome differences using RNA-seq . In support of our study, they also detected genes associated with metabolism and reproduction, whereas, in contrast to our findings, they found a strong link with cuticle specific genes. This contrast could be due to the different experimental conditions of Bunquell et al. (2016), as they used RNAi treatments (empty vector or against hsf-1), a different heat shock duration, method of synchronization (additional L1 arrest by ), and rearing temperature before heat shock (23 °C versus 20 °C in this study).
We hypothesize that ultimately these discrepancies are likely to result from developmental differences. The transcriptional program in C. elegans differs strongly during development [8, 9]. Therefore, application of a heat shock on L2 larvae has a very different developmental (and therefore transcriptional) starting point compared to a heat shock applied on L4 larvae. Furthermore, within the L4 stage there is a strong difference in gene expression in early-stage L4 and late-stage L4. We estimated that the heat-stress and recovery RILs were one hour older than the control RILs suggesting that the former had an accelerated development as these populations were physically the same age as the control population. The main processes that affect transcription in the L4 stage are reproduction and development . These are exactly the processes that are halted upon induction of a heat shock [5, 34]. Therefore, it is likely that the state in which these processes are strongly affects the possible routes for down regulation. It would therefore be very interesting to study the effect of heat shock in relation to the developmental dynamics.
If the effect of a heat shock is indeed dependent on developmental status, then the transcriptomes of the RILs presented in this study could be indicative of the phenotypic outcome (e.g. heat-stress survival). The reason being the developmental gradient generated by RILs (as shown by  within a single experiment, or explicitly by  between different stages), which mainly affects trans-eQTL. Alternatively, our experiment could also be analysed by including the variation among the estimated age of the RILs in the mapping model together with treatment effects. A thorough interpretation of these models requires a better understanding of the transcriptional dynamics over the heat-shock and recovery response, a goal we are currently actively pursuing .
In contrast to trans-eQTL, cis-eQTL are directly linked to polymorphisms
The cis-eQTL over the three treatments, which strongly overlapped, are highly enriched for polymorphic genes. This has been reported before in C. elegans, but also in A. thaliana, Mus musculus, and for human cis-eQTL [12, 28, 37,38,39,40,41]. This can result in the detection of transcriptional variation that is actually caused by hybridization differences [28, 40]. Analysis of the bias in cis-eQTL with higher expression in N2 (the strain for which the microarray was developed) versus CB4856 indeed shows that a proportion of the cis-eQTL is likely to stem from hybridization differences. Also apparent from the gene-enrichment analysis, cis-eQTL were overrepresented for polymorphic gene classes such as bath, math, btb, and fbxa, which are also divergent among other wild strains [42, 43]. Other experimental methods could limit such ‘false positives’, for example, RNA sequencing is expected not to suffer from such biases .
Interestingly, genes with a cis-eQTL were also strongly enriched for polymorphisms in regulatory regions. For these cis-eQTL, it could be true that the expression is affected by transcription factor (TF) binding sites , yet we did not detect any enrichment for such sites as mapped by ModEncode [30, 31]. An explanation for this is that genes with a cis-eQTL are regulated by different TFs, therefore the affected TF binding site is different per gene with a cis-eQTL making an overrepresentation among all cis-eQTL unlikely.
CGV of transcriptional architecture is determined by trans-eQTL
Although previous studies in C. elegans have focused on continuous (thermal or developmental) treatments or gradual change over time [12, 15, 35], only few genetical genomics experiments have measured the effect of acute perturbation, where an organism is suddenly exposed to a different environment [39, 46]. This acutely affected the transcriptional architecture of gene expression, which consists of cis- and trans-acting eQTL. We found that cis-eQTL were robust across all three treatments, including heat stress, which was found before in studies on C. elegans and other species where eQTL patterns have been studied in different conditions. For example, Li et al. (2006) found that more than 50% of all trans-eQTL were affected by temperature compared to cis-eQTL , and Smith and Kruglyak (2008) also reported that cis-eQTLs were hardly affected by external conditions as compared to trans-eQTL . Also, in other species, like Arabidopsis thaliana, it was reported that cis-eQTL were robust to light perturbation, whereas trans-eQTL were more affected by different light regimes . In humans, cis-eQTL showed a very high correlation between tissues compared to trans-eQTL, and are replicated across populations in lymphoblastoid cell lines [48, 49].
By inducing a strong environmental perturbation in the form of a heat shock, a high amount of different trans-eQTL were detected across the control, heat-stress, and recovery treatments. One of the things we noted is the increase in the number of trans-eQTL and trans-bands in the heat-stress treatment compared to the other two treatments. This strong transcriptional variation could underlie – or result from - the phenotypic trait differences observed between N2 and CB4856 in temperature experiments. However, it should be noted that also developmental differences due to timing between the three treatments can contribute [9, 15]. Temperature-affected trait differences have been observed for age at maturity, fertility, body size, vulval induction, and lifespan [19, 50,51,52,53]. Furthermore, these strains also display behavioural differences in heat avoidance and thermal preference [20, 21]. Likely candidates for these trait differences could be found in loci affecting the expression of many trans-eQTL. For example, the left arm of chromosome IV harbours a trans-band affecting the expression of 244 genes and coincides with a QTL affecting lifespan after heat shock . Analogously, the laboratory allele of npr-1 affects many trait differences between N2 and CB4856 (as reviewed in ). Most importantly, npr-1 affects the behaviour of the animal, which probably results in gene-expression differences, as part of the expression differences can be mimicked by starving nematodes . These gene-expression differences can be picked up as a trans-band . The latter example illustrates both a link between gene-expression and classical traits and that caution is required for inferring the direction of causality.
Why is CGV mainly affecting trans-eQTL? We hypothesize that this is due to the versatile nature of trans-eQTL. First, trans-eQTL are loci that are statistically associated with variation in transcript abundance from genes elsewhere on the genome. The ultimate causes for this association can be manifold, from direct interactions such as polymorphic transcription factors that affect gene expression to indirect interactions such as receptor-kinase interactions , receptors [46, 55, 56], or effects at the behavioural level that result in expression differences . Therefore, an environment can require the organism to respond, thereby requiring specific polymorphic genes to react, ultimately leading to the expression of cryptic variation.
The trans-eQTL architecture is comprised of treatment-specific genes
The trans-eQTL architecture is remarkably unique over the three treatments tested. We only observed 3% overlap in trans-eQTL in the three treatments, for which the main cause was treatment specificity of trans-eQTL. Surprisingly, genetic variation affects the expression of genes in only one direction; only in rare cases does allelic variation change the sign of the effect and this was only observed for cis-eQTL (C52E2.4, C54D10.9, and nhr-226). On the one hand, this is in congruency with other eQTL studies comparing different environments; trans-eQTL are strongly affected by different environments (for example, see [12, 39, 47]). On the other hand, it raises questions about the genetic architecture of trans-bands; co-localizing trans-bands are generally not affecting the expression of the same genes (see text in Additional file 15), which can imply the involvement of multiple regulators (causal genes).
However, it seems unlikely that an abundance of novel trans-eQTL also implies an abundance of novel causal genes. First of all, over all three treatments, the majority of trans-eQTL are located in trans-bands, which are mostly non-overlapping between treatments. This is an observation that extends to other studies and other species in which eQTL have been mapped: the majority of trans-eQTL are found on a few regulatory hotspots (for example, see [28, 38, 57]). Therefore, it is logical to assume that a small set of causal genes ultimately explains the majority of trans-eQTL. Second, the allelic effect only has one direction, which is easily aligned with the notion of a few regulators instead of many. Together, these observations can help in further dissecting loci to identify causal genes. Trans-band regulators might play a role in the dynamic response, which could aid in narrowing down candidate genes.
However, it should be reiterated that the route from genetic variation resulting in transcriptional effect can be manifold, which can also obscure the ultimate cause of the observed trait variation. An organism is an intricate web of interdependencies leading to the phenotype. Therefore, upon further dissection of the loci, a single eQTL might prove to be many. Furthermore, it should be noted that trans-eQTL explain less variation compared to cis-eQTL, which has been established across species [28, 58]. Therefore, it is more likely that trans-eQTL are not detected. However, the treatment specificity and direction of allelic effects of trans-eQTL across three treatments robustly show trans-eQTL architecture is comprised of treatment-specific genes.
Here we present the contribution of CGV on eQTL across three treatments in the nematode C. elegans. We find that mainly trans-eQTL are affected by CGV, in contrast to cis-eQTL, which are highly similar across treatments. Furthermore, we show that most CGV results in unique genes with a trans-eQTL, instead of different allelic effects and/or different eQTL for the same genes. This shows the highly dynamic nature of CGV.
Cryptic genetic variation
Expression quantitative trait locus
False discovery rate
Logarithm (base 10) of odds
Nematode growth medium
Nematode specific peptide family, group C
Principal component analysis
Recombinant inbred lines
Single nucleotide polymorphism
Grishkevich V, Yanai I. The genomic determinants of genotype x environment interactions in gene expression. Trends Genet. 2013;29(8):479–87.
GuhaThakurta D, Palomar L, Stormo GD, Tedesco P, Johnson TE, Walker DW, et al. Identification of a novel cis-regulatory element involved in the heat shock response in Caenorhabditis elegans using microarray gene expression and computational methods. Genome Res. 2002;12(5):701–12.
Bakowski MA, Desjardins CA, Smelkinson MG, Dunbar TL, Lopez-Moyado IF, Rifkin SA, et al. Ubiquitin-mediated response to microsporidia and virus infection in C. elegans. PLoS Pathog. 2014;10(6):e1004200.
Nakad R, Snoek LB, Yang W, Ellendt S, Schneider F, Mohr TG, et al. Contrasting invertebrate immune defense behaviors caused by a single gene, the Caenorhabditis elegans neuropeptide receptor gene npr-1. BMC Genomics. 2016;17:280.
Guisbert E, Czyz DM, Richter K, McMullen PD, Morimoto RI. Identification of a tissue-selective heat shock response regulatory network. PLoS Genet. 2013;9(4):e1003466.
Vinuela A, Snoek LB, Riksen JA, Kammenga JE. Gene expression modifications by temperature-toxicants interactions in Caenorhabditis elegans. PLoS One. 2011;6(9):e24676.
Golden TR, Melov S. Microarray analysis of gene expression with age in individual nematodes. Aging Cell. 2004;3(3):111–24.
Kim D, Grun D, van Oudenaarden A. Dampening of expression oscillations by synchronous regulation of a microRNA and its target. Nat Genet. 2013;45(11):1337–44.
Snoek LB, Sterken MG, Volkers RJ, Klatter M, Bosman KJ, Bevers RP, et al. A rapid and massive gene expression shift marking adolescent transition in C. elegans. Sci Rep. 2014;4:3912.
Jansen RC, Nap JP. Genetical genomics: the added value from segregation. Trends Genet. 2001;17(7):388–91.
Li Y, Breitling R, Jansen RC. Generalizing genetical genomics: getting added value from environmental perturbation. Trends Genet. 2008;24(10):518–24.
Li Y, Alvarez OA, Gutteling EW, Tijsterman M, Fu J, Riksen JA, et al. Mapping determinants of gene expression plasticity by genetical genomics in C. elegans. PLoS Genet. 2006;2(12):e222.
Hendriks GJ, Gaidatzis D, Aeschimann F, Grosshans H. Extensive oscillatory gene expression during C. elegans larval development. Mol Cell. 2014;53(3):380–92.
Capra EJ, Skrovanek SM, Kruglyak L. Comparative developmental expression profiling of two C. elegans isolates. PLoS One. 2008;3(12):e4055.
Francesconi M, Lehner B. The effects of genetic variation on gene expression dynamics during development. Nature. 2014;505(7482):208–11.
Paaby AB, Rockman MV. Cryptic genetic variation: evolution's hidden substrate. Nat Rev Genet. 2014;15(4):247–58.
Ledon-Rettig CC, Pfennig DW, Chunco AJ, Dworkin I. Cryptic genetic variation in natural populations: a predictive framework. Integr Comp Biol. 2014;54(5):783–93.
Iwasaki WM, Tsuda ME, Kawata M. Genetic and environmental factors affecting cryptic variations in gene regulatory networks. BMC Evol Biol. 2013;13:91.
Rodriguez M, Snoek LB, Riksen JA, Bevers RP, Kammenga JE. Genetic variation for stress-response hormesis in C. elegans lifespan. Exp Gerontol. 2012;47(8):581–7.
Gaertner BE, Parmenter MD, Rockman MV, Kruglyak L, Phillips PC. More than the sum of its parts: a complex epistatic network underlies natural variation in thermal preference behavior in Caenorhabditis elegans. Genetics. 2012;192(4):1533–42.
Glauser DA, Chen WC, Agin R, Macinnis BL, Hellman AB, Garrity PA, et al. Heat avoidance is regulated by transient receptor potential (TRP) channels and a neuropeptide signaling pathway in Caenorhabditis elegans. Genetics. 2011;188(1):91–103.
Thompson OA, Snoek LB, Nijveen H, Sterken MG, Volkers RJ, Brenchley R, et al. Remarkably Divergent Regions Punctuate the Genome Assembly of the Caenorhabditis elegans Hawaiian Strain CB4856. Genetics. 2015;200(3):975–89.
Brenner S. The genetics of Caenorhabditis elegans. Genetics. 1974;77(1):71–94.
Rockman MV, Kruglyak L. Recombinational landscape and population genomics of Caenorhabditis elegans. PLoS Genet. 2009;5(3):e1000419.
Zahurak M, Parmigiani G, Yu W, Scharpf RB, Berman D, Schaeffer E, et al. Pre-processing Agilent microarray data. BMC bioinf. 2007;8:142.
Smyth GK, Speed T. Normalization of cDNA microarray data. Methods. 2003;31(4):265–73.
Benjamini Y, Yekutieli D. The control of the false discovery rate in multiple testing under dependency. Ann Stat. 2001;29(4):1165–88.
Rockman MV, Skrovanek SS, Kruglyak L. Selection at linked sites shapes heritable phenotypic variation in C. elegans. Science. 2010;330(6002):372–6.
Harris TW, Baran J, Bieri T, Cabunoc A, Chan J, Chen WJ, et al. WormBase 2014: new views of curated biology. Nucleic Acids Res. 2014;42(Database issue):D789–93.
Gerstein MB, Lu ZJ, Van Nostrand EL, Cheng C, Arshinoff BI, Liu T, et al. Integrative analysis of the Caenorhabditis elegans genome by the modENCODE project. Science. 2010;330(6012):1775–87.
Niu W, Lu ZJ, Zhong M, Sarov M, Murray JI, Brdlik CM, et al. Diverse transcription factor binding features revealed by genome-wide ChIP-seq in C. elegans. Genome Res. 2011;21(2):245–54.
Tepper RG, Ashraf J, Kaletsky R, Kleemann G, Murphy CT, Bussemaker HJ. PQM-1 Complements DAF-16 as a Key Transcriptional Regulator of DAF-2-Mediated Development and Longevity. Cell. 2013;154(3):676–90.
Ogata H, Goto S, Sato K, Fujibuchi W, Bono H, Kanehisa M. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 1999;27(1):29–34.
Brunquell J, Morris S, Lu Y, Cheng F, Westerheide SD. The genome-wide role of HSF-1 in the regulation of gene expression in Caenorhabditis elegans. BMC Genomics. 2016;17:559.
Vinuela A, Snoek LB, Riksen JA, Kammenga JE. Genome-wide gene expression regulation as a function of genotype and age in C. elegans. Genome Res. 2010;20(7):929–37.
Jovic K, Sterken MG, Grilli J, Bevers RPJ, Rodriguez M, Riksen JAG, Allesina S, Kammenga JE, Snoek LB: A Point Of No Return Leading To Death During Heat-Shock In <em>C. elegans</em>. bioRxiv 2017. https://doi.org/10.1101/135988.
Gan X, Stegle O, Behr J, Steffen JG, Drewe P, Hildebrand KL, et al. Multiple reference genomes and transcriptomes for Arabidopsis thaliana. Nature. 2011;477(7365):419–23.
Keurentjes JJ, Fu J, Terpstra IR, Garcia JM, van den Ackerveken G, Snoek LB, et al. Regulatory network construction in Arabidopsis by using genome-wide gene expression quantitative trait loci. Proc Natl Acad Sci U S A. 2007;104(5):1708–13.
Snoek LB, Terpstra IR, Dekter R, Van den Ackerveken G, Peeters AJ. Genetical Genomics Reveals Large Scale Genotype-By-Environment Interactions in Arabidopsis thaliana. Front Genet. 2012;3:317.
Alberts R, Terpstra P, Li Y, Breitling R, Nap JP, Jansen RC. Sequence polymorphisms cause many false cis eQTLs. PLoS One. 2007;2(7):e622.
Li Y, Breitling R, Snoek LB, van der Velde KJ, Swertz MA, Riksen J, et al. Global genetic robustness of the alternative splicing machinery in Caenorhabditis elegans. Genetics. 2010;186(1):405–10.
Thompson O, Edgley M, Strasbourger P, Flibotte S, Ewing B, Adair R, et al. The million mutation project: a new approach to genetics in Caenorhabditis elegans. Genome Res. 2013;23(10):1749–62.
Volkers RJ, Snoek LB, Hubar CJ, Coopman R, Chen W, Yang W, et al. Gene-environment and protein-degradation signatures characterize genomic and phenotypic diversity in wild Caenorhabditis elegans populations. BMC Biol. 2013;11:93.
Panousis NI, Gutierrez-Arcelus M, Dermitzakis ET, Lappalainen T. Allelic mapping bias in RNA-sequencing is not a major confounder in eQTL studies. Genome Biol. 2014;15(9):467.
Chen K, van Nimwegen E, Rajewsky N, Siegal ML. Correlating gene expression variation with cis-regulatory polymorphism in Saccharomyces cerevisiae. Genome Biol Evol. 2010;2:697–707.
Terpstra IR, Snoek LB, Keurentjes JJ, Peeters AJ, van den Ackerveken G. Regulatory network identification by genetical genomics: signaling downstream of the Arabidopsis receptor-like kinase ERECTA. Plant Physiol. 2010;154(3):1067–78.
Smith EN, Kruglyak L. Gene-environment interaction in yeast gene expression. PLoS Biol. 2008;6(4):e83.
Grundberg E, Small KS, Hedman AK, Nica AC, Buil A, Keildson S, et al. Mapping cis- and trans-regulatory effects across multiple tissues in twins. Nat Genet. 2012;44(10):1084–9.
Stranger BE, Montgomery SB, Dimas AS, Parts L, Stegle O, Ingle CE, et al. Patterns of cis regulatory variation in diverse human populations. PLoS Genet. 2012;8(4):e1002639.
Gutteling EW, Doroszuk A, Riksen JA, Prokop Z, Reszka J, Kammenga JE. Environmental influence on the genetic correlations between life-history traits in Caenorhabditis elegans. Heredity. 2007;98(4):206–13.
Gutteling EW, Riksen JA, Bakker J, Kammenga JE. Mapping phenotypic plasticity and genotype-environment interactions affecting life-history traits in Caenorhabditis elegans. Heredity. 2007;98(1):28–37.
Kammenga JE, Doroszuk A, Riksen JA, Hazendonk E, Spiridon L, Petrescu AJ, et al. A Caenorhabditis elegans wild type defies the temperature-size rule owing to a single nucleotide polymorphism in tra-3. PLoS Genet. 2007;3(3):e34.
Duveau F, Felix MA. Role of pleiotropy in the evolution of a cryptic developmental variation in Caenorhabditis elegans. PLoS Biol. 2012;10(1):e1001230.
Sterken MG, Snoek LB, Kammenga JE, Andersen EC. The laboratory domestication of Caenorhabditis elegans. Trends Genet. 2015;31(5):224–31.
Andersen EC, Bloom JS, Gerke JP, Kruglyak L. A variant in the neuropeptide receptor npr-1 is a major determinant of Caenorhabditis elegans growth and physiology. PLoS Genet. 2014;10(2):e1004156.
Yvert G, Brem RB, Whittle J, Akey JM, Foss E, Smith EN, et al. Trans-acting regulatory variation in Saccharomyces cerevisiae and the role of transcription factors. Nat Genet. 2003;35(1):57–64.
Brem RB, Yvert G, Clinton R, Kruglyak L. Genetic dissection of transcriptional regulation in budding yeast. Science. 2002;296(5568):752–5.
West MA, Kim K, Kliebenstein DJ, van Leeuwen H, Michelmore RW, Doerge RW, et al. Global eQTL mapping reveals the complex genetic architecture of transcript-level variation in Arabidopsis. Genetics. 2007;175(3):1441–50.
The authors are grateful for the discussions and support from the GRAPPLE project partners: Olga Valba, Sreenivas Chavali, Benjamin Lang, Mirko Francesconi, Sergei Nechaev, Olga Vasieva, M. Madan Babu, and Ben Lehner. We are also grateful for Katharina Jovic, Lisa van Sluijs and two anonymous reviewers for feedback on the manuscript. We also want to thank Harm Nijveen for making our data available in EleQTL.
LBS was funded by ERASysbio-plus ZonMW project GRAPPLE - Iterative modelling of gene regulatory interactions underlying stress, disease and ageing in C. elegans (project 90.201.066) and The Netherlands Organization for Scientific Research (project no. 823.01.001). The funding bodies had no role in the design nor the collection, analysis, and interpretation of the data, nor the writing of the manuscript.
Availability of data and materials
All strains used can be requested from the authors. The transcriptome datasets generated and the mapped eQTL profiles can be interactively accessed via (http://www.bioinformatics.nl/EleQTL). Moreover, the transcriptome datasets are also deposited at ArrayExpress (E-MTAB-5779).
Ethics approval and consent to participate
Ethics approval and consent does not apply because this study is based on non-regulated invertebrates.
Consent for publication
The authors declare this not relevant.
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Strains and genotypes. Matrix with the strain names and genotypes of the recombinant inbred lines used in this study. The genotypes are based on genome sequencing (see Methods). (XLSX 160 kb)
A figure of the location of the 729 markers. The marker locations are plotted across the genome. Locations are based on WS256. (PDF 20 kb)
A figure of the marker correlation analysis. Correlations between the 729 markers in the sequenced RIL population. The markers are plotted at their physical locations across the chromosomes. (PDF 1279 kb)
Figure of the treatment comparisons. Volcano plots of the expression comparisons per treatment (n = 48 RILs per treatment). The horizontal line in the plots indicates the FDR = 0.05 threshold. The colored dots indicate spots that are significantly different between treatments. Blue indicates spots more highly expressed in the control treatment, red indicates spots more highly expressed in the heat-stress treatment, and green indicates spots more highly expressed in the recovery treatment. (A). The comparison between control and heat stress, threshold: -log10(p) ≥ 2.87. (B). The comparison between control and recovery, threshold: -log10(p) ≥ 3.09. (C). The comparison between heat stress and recovery, threshold: -log10(p) ≥ 3.02. (TIFF 1329 kb)
Table of treatment comparison results. A table with the spots that are significantly different between treatments. The spot number, the comparison in which the spot was different, and the characteristics of the difference (significance and effect) are given. Also information about the gene represented by the spot is shown (WormBase identifier, sequence name, public name, and the location on the genome). The last column indicates to which group the spot belongs (e.g. specific for heat-stress treatment, or significantly different between all three treatments). (XLSX 2485 kb)
Figure of the principal component analysis. The first six axes of the principal component analysis are shown, plotted against each other. The axis capturing most variation (PC1), captures 20% and the sixth axis captures 4.7%. The dots represent individual samples (n = 48 RILs per treatment) and are colored according to treatment: blue for control, red for heat stress, and green for recovery. The first axis separates the heat-stress treatment from the control and recovery treatment, but also captures some technical variation (19 samples that fall to the right of the plot). The second axis (11.4%) places the recovery treatment in between the other two treatments, whereas the third axis (10.1%) separates the heat stress from the control and recovery treatment. (TIFF 3473 kb)
A list of the enrichment analysis treatment responsive genes. Enrichment analysis on the genes with transcriptional responses by heat stress. The database used for enrichment (Annotation) and the category (Group), and the number of genes on the array that are in the group (Genes_in_group) are also indicated. Furthermore, the overlap with the cluster (Overlap) and the Bonferroni-corrected significance of that overlap are shown. (XLSX 59 kb)
Developmental ruler applied to the RIL populations. The age of the samples was estimated using the transcriptional ruler from , the reported ages are relative, where the average age of the control samples were set to 48 h. Each point represents one sample, whereof the relative age was determined by assessing the expression of about 100 genes. (TIFF 969 kb)
Table summarizing the statistical power calculations. Outcome of the statistical power calculations conducted for the RIL population of each treatment (n = 48 RILs per treatment). The outcomes are ordered per treatment population, and per simulated QTL peak size. All peaks were simulated in random variation generated by a standard normal distribution. The simulation reports on (i) QTL detection, e.g. how many of the simulated QTL were detected, how many false QTL were reported; (ii) QTL effect size estimation, mapped effect/simulated effect (reported in quantiles); (iii) QTL location estimation, |mapped location – simulated location|. (XLSX 15 kb)
Table of the mapped eQTL in the control, heat stress, and recovery treatment. The eQTL are given per trait (Spot) and treatment. The QTL type, location and confidence interval is listed, as is the significance and effect. The effect is higher in N2 (positive numbers) or higher in CB4856 (negative numbers) loci. Furthermore, information about the affected gene represented by the microarray spot is shown (name, and location). (XLSX 1110 kb)
Figure of eQTL effect distribution. (A) Volcano plots of the eQTL mapped per type (cis or trans) per treatment. On the x-axis the effect is plotted and on the y-axis the significance of the association is plotted (−log10(p)). Each dot represents a microarray spot and only the significant associations are shown (FDR ≤ 0.05, −log10(p) > 3.9 in all three treatments). (B) A histogram of the eQTL effect sizes, per type (cis or trans) per treatment. Again, the number of significantly associated spots are counted. (TIFF 738 kb)
Table listing polymorphisms in eQTL. The polymorphism between and CB4856 and N2 were taken from Thompson et al., 2015 and counted in the genes with cis-, trans-, or no eQTL. Furthermore, the cis- and trans-eQTL have also been compared based on effect (higher expression in CB4856 or N2). The occurrences of polymorphic genes in the three sets were compared by a chi-squared test. (XLSX 18 kb)
A list of trans-bands. This table lists the identified trans-bands per treatment. The number of affected genes and the number of spots with a trans-eQTL these genes were represented by are shown. (XLSX 8 kb)
Figure comparing allelic effects of genes with an eQTL between treatments. A scatter plot of the effects of the eQTL of genes with an eQTL found in multiple treatments. Each dot represents a spot. The Pearson correlation values between the different comparisons are: R = 0.94 and 0.91 for cis- and trans-eQTL in control versus heat stress, R = 0.96 and 0.93 for cis- and trans-eQTL in control versus recovery, and R = 0.94 and 0.89 for cis- and trans-eQTL in heat stress versus recovery. The striped diagonal lines are shown as an optical reference. (TIFF 454 kb)
Text detailing the calculations on overlap in cis- and trans-eQTL. (DOCX 23 kb)
Figure comparing genes with a trans-eQTL in different treatments. (A). A plot of the eQTL location in treatment 1 versus the eQTL-location in treatment 2. Treatment 1 is the first treatment listed in the legend, for example: the magenta dots represent eQTL where the first treatment is control and the second treatment is heat stress. The grey lines represent the confidence interval of the eQTL based on a 1.5 drop in –log10(p). The diagonal band in this plot represents trans-eQTL that are regulated from the same location across treatments. (B) The eQTL effects of the trans-eQTL shown in (A), ordered per chromosome. The striped diagonal lines are shown as an optical reference. (TIFF 1633 kb)
A list of the enrichment analysis of genes with an eQTL. Enrichment analysis on the genes with eQTL across different conditions. The sets were also analyzed for genes with an eQTL with a specific allelic effect, belonging to a specific trans-band and found, and belonging to a specific treatment. The allelic effects are indicated based on direction (e.g. cis-CB4856 means cis-eQTL higher expressed in CB4856). The database used for enrichment (Annotation) and the category (Group), and the number of genes on the array that are in the group (Genes_in_group) are indicated. Furthermore, the overlap with the cluster (Overlap) and the Bonferroni-corrected significance of that overlap are shown. (XLSX 24 kb)
About this article
Cite this article
Snoek, B.L., Sterken, M.G., Bevers, R.P.J. et al. Contribution of trans regulatory eQTL to cryptic genetic variation in C. elegans . BMC Genomics 18, 500 (2017) doi:10.1186/s12864-017-3899-8
- Caenorhabditis elegans
- Genetical genomics
- Heat stress
- Cryptic genetic variation
- eQTL hotspot