Expression variation in connected recombinant populations of Arabidopsis thaliana highlights distinct transcriptome architectures
- Francisco A Cubillos†1,
- Jennifer Yansouni†1, 2,
- Hamid Khalili1, 3,
- Sandrine Balzergue2,
- Samira Elftieh2,
- Marie-Laure Martin-Magniette2, 4,
- Yann Serrand1,
- Loïc Lepiniec1,
- Sébastien Baud1,
- Bertrand Dubreucq1,
- Jean-Pierre Renou2, 5,
- Christine Camilleri1Email author and
- Olivier Loudet1Email author
© Cubillos et al; licensee BioMed Central Ltd. 2012
Received: 13 November 2011
Accepted: 27 March 2012
Published: 27 March 2012
Expression traits can vary quantitatively between individuals and have a complex inheritance. Identification of the genetics underlying transcript variation can help in the understanding of phenotypic variation due to genetic factors regulating transcript abundance and shed light into divergence patterns. So far, only a limited number of studies have addressed this subject in Arabidopsis, with contrasting results due to dissimilar statistical power. Here, we present the transcriptome architecture in leaf tissue of two RIL sets obtained from a connected-cross design involving 3 commonly used accessions. We also present the transcriptome architecture observed in developing seeds of a third independent cross.
The utilisation of the novel R/eqtl package (which goal is to automatize and extend functions from the R/qtl package) allowed us to map 4,290 and 6,534 eQTLs in the Cvi-0 × Col-0 and Bur-0 × Col-0 recombinant populations respectively. In agreement with previous studies, we observed a larger phenotypic variance explained by eQTLs in linkage with the controlled gene (potentially cis-acting), compared to distant loci (acting necessarily indirectly or in trans). Distant eQTLs hotspots were essentially not conserved between crosses, but instead, cross-specific. Accounting for confounding factors using a probabilistic approach (VBQTL) increased the mapping resolution and the number of significant associations. Moreover, using local eQTLs obtained from this approach, we detected evidence for a directional allelic effect in genes with related function, where significantly more eQTLs than expected by chance were up-regulated from one of the accessions. Primary experimental data, analysis parameters, eQTL results and visualisation of LOD score curves presented here are stored and accessible through the QTLstore service database http://qtlstore.versailles.inra.fr/.
Our results demonstrate the extensive diversity and moderately conserved eQTL landscape between crosses and validate the utilisation of expression traits to explore for candidates behind phenotypic variation among accessions. Furthermore, this stresses the need for a wider spectrum of diversity to fully understand expression trait variation within a species.
KeywordseQTL Natural variation Selection RILs
Most traits vary quantitatively between individuals and are significantly influenced by genetic variation and its interaction with the environment. In general, transcript abundance of a gene can be considered as a quantitative trait since it differs between individuals with respect to genetic factors [1, 2]. The utilisation of technologies such as microarray and, lately, next generation sequencing, allowed the simultaneous quantification of thousands of transcripts, shedding light into the genetics behind gene expression variation [3–5]. Comparison in Arabidopsis thaliana showed that at least 46% of the genes in the genome are differentially expressed between a pair of accessions . Variation in gene expression levels is highly heritable and specific genomic regions can be mapped, underpinning transcript abundance variation [1, 3] and underlying phenotypic diversity, causing for example differential disease susceptibility in humans  or variation in metabolite accumulation in plants ... Previous studies in experimental segregating populations have proved the polygenic nature of gene expression variation among lines and mapped thousands of expression quantitative trait loci (eQTLs) in different model organisms, e.g.: Arabidopsis thaliana[3, 9–12], Saccharomyces cerevisiae[13, 14], and Caenorhabditis elegans. No less important is the interaction between genotypes and the environment . For instance, a third of the budding yeast genes showed a significant strain x condition interaction effect, demonstrating the complexity of the control of transcript abundance.
Transcript level differences can originate from cis- and/or trans-regulatory changes. Cis- acting regulations represent polymorphisms in physical linkage to the gene, affecting the transcription in an allele-specific manner. Consequently, cis-eQTLs are usually detected as local-eQTLs, being mapped in the vicinity of the gene; whereas trans-acting regulations act distantly in a non allele-specific way and can be located anywhere in the genome, most likely detected as distant-eQTL [1, 11, 17]. Although linkage or association mapping approaches allow the identification of local- and distant-eQTLs, cis and trans effects should be directly assessed by allele-specific expression assays . In yeast, most expression differences mapped as major trans-acting loci, where few modifiers regulate the expression of hundreds of genes [1, 14]. Contrasting with these results, experiments in C. elegans revealed an opposite pattern, with an over-representation of local-eQTLs compared to distant regulators . Only few similar studies have addressed this subject in A. thaliana. Using microarray technology in recombinant inbred lines (RILs) from a cross between accessions Bayreuth (Bay) and Shahdara (Sha), more than 36,000 eQTLs were described using whole rosettes . In contrast, a parallel study using Landsberg (Ler) and Cape Verde Island (Cvi) accessions as RIL progenitors, detected 9 times less eQTLs than those mapped in the Bay x Sha population . Discrepancies between the two studies were also evident in the ratio between local- and distant-eQTLs, likely due to differences in statistical power and significance thresholds . Different growth conditions, stages and the lack of similar analytical strategies in these studies impairs fine comparisons between crosses, stressing the need for a systematic approach to better understand the role of natural variation in expression traits among Arabidopsis accessions.
The identification of thousands of local eQTLs allows testing for marks of concerted evolution in transcript abundance from genes with related function . Several studies have addressed this issue and demonstrated the presence of signatures of selection associated with expression traits in a number of systems [5, 20]. For example, several pathways were identified in a yeast hybrid between two Saccharomyces species exhibiting polygenic directional divergence in cis-regulatory variants . Similarly, evidence for selection in expression traits involved in processes such as growth, locomotion and memory was detected in two subspecies of Mus musculus. Thus far, evidence for a directional allelic effect between Arabidopsis accessions is scarce. Transcriptome analyses for 18 natural accessions determined an overrepresentation of differentially expressed genes within GO categories, involving response to the biotic environment, among others .
In order to identify genomic regions underlying transcript variation in A. thaliana and signatures of selection between genes with related function, we performed eQTL mapping in two well-established recombinant populations. The analysis was essentially based on a connected-cross design between three accessions : the reference Columbia (Col-0), Cape Verde Island (Cvi-0) and Burren (Bur-0). Expression level variation was monitored in particular for more than 26,166 annotated nuclear genes in 314 RILs and eQTLs were mapped using a novel R/eqtl package, accurately designed for such a study. Here, we show that the eQTL landscape is moderately conserved across populations and many of the eQTLs identified are cross-specific. Moreover, we show that accounting for confounding sources in one of the crosses increases the sensitivity and the power to detect eQTLs, allowing the identification of a potentially coordinated transcriptional evolution in genes with related functions.
eQTL analyses were performed in three displays: two on young rosette transcripts in Cvi-0 × Col-0 (hereafter 'CviCol') and Bur-0 × Col-0 (hereafter 'BurCol') RIL sets, and one on developing seed transcripts in Bay-0 × Shahdara (hereafter 'BaySha') RIL set. We will henceforth focus on and compare the results obtained for rosette samples, while the BaySha data is briefly presented as Additional file 1.
Transcript abundance can be treated as a highly heritable phenotype affected by natural genetic diversity . In order to understand the genetics underlying gene expression variation among A. thaliana accessions, we utilised the CATMA array technology , which has the advantage of not being sensitive to punctual polymorphisms (SNPs) between accessions' transcripts thanks to the length of the probes used (Gene-specific Sequence Tags: GSTs). We measured expression levels for almost 35,000 traits (= GSTs) in three week-old rosettes of 314 RILs obtained from a connected-cross set between three parental accessions (Col-0 as the common male parent, Cvi-0, Bur-0 ), including 158 RILs from CviCol and 156 RILs from BurCol. Initially, in order to characterise expression divergence between the accessions selected in this study, the number of differentially expressed genes between pairs of accessions was estimated (Cvi versus Col and Bur versus Col). A total of 1,709 differentially-expressed genes between Col-0 and Cvi-0 and 1,083 between Col-0 and Bur-0 (P < 5 × 10-2; Additional file 2: Table S1) were detected. This cut-off value represents, at least, 1.3 fold expression differences between the two allelic variants. From this differentially expressed set, none of the parents showed a significant excess of up- or down-regulated genes (49.0% and 54.3% of the probes revealed up-regulation in Cvi and in Bur, respectively, relative to Col). About half of the genes (48.3%) showing significant expression contrast between Col-0 and Bur-0 were also found when comparing the other pair (Additional file 3: Figure S1). Interestingly, 92.3% of this overlap was similarly up- or down-regulated in Col-0 relative to the other two accessions, suggesting that many of the polymorphisms contributing to these differential expression patterns may arise from Col-0 or be shared by Cvi-0 and Bur-0.
Local and distant eQTL distributions are highly divergent between crosses
Linkage analysis allows the examination of local and distant eQTLs influencing transcript abundance [1, 14]. To detect local associations, we set a 1Mb cut-off distance between the eQTL peak and the CATMA GST physical position. Any eQTL reaching a significance peak within 1Mb from the location of the probe it controls was considered as potentially acting in cis and hence classified as a local eQTL. In both populations we observed that the distributions of local and distant eQTLs were not uniform throughout the LOD scale (Figure 1c-d). Distant eQTLs were enriched at lower LODs, compared to local eQTLs, which were over-represented at high significance values. Hence, 50% (25.5%) of all eQTLs detected at 5% FDR mapped locally in the CviCol (BurCol) population, while this number would rise to 69.6% (53.6%) if using a much more conservative FDR of 0.1% (P < 1 × 10-4 in CviCol and P < 5 × 10-5 in BurCol). Moreover, compared to distant eQTLs, local eQTLs explained a significantly higher fraction of the phenotypic variance per trait (Kolmogorov-Smirnov test, P < 2.2 × 10-16; Additional file 4: Table S2). These results can lead to contrasting conclusions as we vary threshold values (Figure 1c-d). The higher phenotypic variance explained by local eQTLs underlies the fact that most of the phenotypic variation in expression levels is due to local associations, and trans-acting factors would have essentially minor effects on single transcripts, but overall a widespread effect over many genes.
Distant eQTL hotspots are cross-specific
We found little overlap between hotspots in the two populations (Spearman correlation test P = 0.21). From the 41 hotspots detected in total, only three were collinear between sets, suggesting little conservation in the major determinants of distant eQTLs across the genome. None of the hotspots co-localised with any of the major-effect developmental QTLs known to segregate in A. thaliana (i.e. ERECTA, FRIGIDA, CRY2, ...), except in BurCol where a marginally significant hotspot was found on chromosome 5 (bin #93), which contains FLC. Although 62 traits with a distant eQTL mapping to this region were observed, only 2 have been shown to interact with FLC, suggesting potentially a minor contribution from this locus. The gene-poor centromeres rich in transposons and pseudogenes were not detected in any of the hotspot intervals. The lack of a significant count of eQTLs extended for at least 1Mb from the centromere until the nearest hotspot, except in the BurCol set where we detected distant eQTLs-rich regions immediately downstream and upstream the centromeres on chromosome 1 and 2, respectively (Figure 3).
Controlling for confounding factors increases eQTL detection in the CviCol dataset
We focused the utilisation of this approach on the CviCol population. Linkage analysis detected a total of 8,156 eQTLs (5% FDR; 7,817 at 1% FDR; Additional file 7: Table S5), which is almost twice as many eQTLs as with the standard method, demonstrating the sensitivity of the approach. Interestingly, the fraction of local and distant eQTLs remained almost unchanged. At the less conservative threshold (5% FDR) we mapped 44.6% of the eQTLs in the vicinity of the transcript, slightly less than from the analysis of the raw expression values. Moreover, the contribution to the total number of local and distant eQTLs was increased by 1,461 and 1,025 eQTLs respectively. Many of the local associations mapped by the standard approach were also mapped by VBQTL (92.6%) and, contrarily to the BurCol data set, 56.2% of the distant associations were also retained (Figure 4). Approximately one third of the peaks that had remained only marginally significant (= significant at a lower threshold of 10% FDR) in the standard analysis, were now detected as significant at 5% FDR after VBQTL analysis, likely due to the negative effect of confounding sources on power in the original approach.
Evidence for directional allelic effect in Col-0 versus Cvi-0 accessions
Overrepresented GO categories and terms among CviCol local-eQTLs
Expected Number of eQTLs
Observed Number of eQTLs
cell organization and biogenesis
DNA or RNA metabolism
electron transport or energy pathways
other biological processes
other cellular processes
other metabolic processes
response to abiotic or biotic stimulus
7.70 × 10-11 (S*)
response to stress
2.19 × 10-11 (S*)
unknown biological processes
Directional allelic effect in CviCol
Col up-regulated alleles
Cvi up-regulated alleles
P-value (directional allelic effect)
response to far-red light
plant-type hypersensitive response
response to fungus
response to oxidative stress
response to hypoxia
We have performed quantitative genetic analyses for genome-wide expression traits in a connected-cross set between three accessions in the model organism A. thaliana. Utilising the R/eqtl package in both crosses, we observed a 52.3% difference in the number of significant eQTLs between sets, with the Bur-0 × Col-0 cross being the one with the highest count (Figure 1). Regardless of this difference, the conservative figures of eQTLs obtained in both sets resemble the one previously described in RILs between accessions Ler-0 and Cvi-0 , but they significantly differ from a previous study involving Bay-0 and Shahdara accessions, where more than 36,000 eQTLs were detected . In our study, the use of a common approach for both sets allowed us to perform straightforward comparisons between them, drawing conclusions not necessarily affected by differences in statistical power . We observed that 17.4% of all local eQTLs detected were shared between crosses, likely due to polymorphisms solely present in Col-0 or shared by Cvi-0 and Bur-0 (although independent SNPs or some allelic heterogeneity pattern could also lead to this observation). An argument to support this hypothesis is the high level of shared eQTLs with an additive effect in the same direction (88.4%) and of similar strength (Figure 2). However, the low level of overlapping eQTLs demonstrates the greater number of cross-specific eQTLs, possibly due to the great complexity of expression traits [1, 11, 29] and their high potential for evolution . Our results are consistent with a recent survey in several Arabidopsis accessions that identified high allelic heterogeneity within local regulatory eQTLs [6, 12], suggesting the existence of many private polymorphisms between all three accessions extending the eQTL landscape.
Although cis-acting regulation among local eQTLs remains to be confirmed, the co-localisation of the eQTL and the controlled gene provides a robust probabilistic way of classifying eQTLs into their mode of action. We observed that, albeit the two crosses shared a common accession, the eQTL distribution among these types is very dissimilar. The BurCol set shows three times more distant than local eQTLs. At the same FDR, the CviCol set has a very different balance, with an equivalent number of both types of eQTLs (50% distant and 50% local, Figure 1c) and a lower number of hotspots (Figure 3) encompassing 44.2% of the trans eQTLs. Nevertheless, the distribution pattern of local eQTLs on each chromosome was conserved between crosses, likely due to a strong correlation with gene density and chromosome structural features. Also, we found that local eQTLs were overrepresented at higher significance in all crosses (Figure 1c-d), explaining a greater phenotypic variance per trait compared to distant eQTLs. Our result agrees with another study in Mouse, where many highly significant local-eQTLs and moderately significant trans-acting associations were observed . Furthermore, the BurCol recombinant population exhibited a higher number of distant eQTLs, concentrating 75% of them in 24 bins classified as hotspots (Figure 3). One of the most interesting intervals was the hotspot at bin #8, which contained the highest number of genes whose function is related to the regulation of transcription (52 genes), including the strong candidate GIGANTEA, a gene known to be involved in diverse developmental processes . It is difficult to guess whether a hotspot could be the expression of a locus actually controlling the transcription of many genes, or that of a major (for example, developmental) player that has indirect effects on many genes' expression (as secondary consequences of a strong phenotype for example). The presence of pleiotropic hotspots affecting the expression of many transcripts has already been described in yeast , where no more than 200 trans-eQTLs explained transcript variation for 1,716 traits. Similarly, many trans-hotspots were described in previous A. thaliana studies [3, 10]. In agreement with these studies, the hotspots detected here explained approximately 10% of the individual traits' variation, demonstrating their milder effect on transcript abundance compared to local eQTLs . Interestingly, only ~7% of the hotspots overlapped between crosses, suggesting either alternative master regulators or little conservation in complex regulatory networks . These conserved regions could also suggest the presence of polymorphisms in Col-0 as major contributors for the large number of trans eQTLs within these shared regions.
To increase the power in detecting eQTLs and identify milder genetic variations, we have utilised the bayesian framework VBQTL, a model designed to dissect gene expression variation in order to account for confounding factors . The implementation of this model in the CviCol set allowed us to detect twice as many eQTLs compared to the standard approach (Figure 4), without affecting the ratio of local versus distant eQTLs. Subsequently, we used this extended set and focused the posterior analyses on local eQTLs since they likely represent independent alleles, only affecting a single transcript and explaining a greater phenotypic variance, in contrast to distant eQTLs that may affect many genes and show minor phenotypic effects. The use of this new set of local eQTLs allowed us to look for potential traces of selection where Col alleles were overexpressed in a set of genes with related function. Assuming neutrality, no over-representation of either up-regulated or down-regulated alleles from the same accession is expected within a cluster of genes, unless a directional allelic drift has occurred, which may represent selection . We detected an overrepresentation of eQTLs in several GO functional categories and a significant skew within GO terms (Table 1, 2). The accumulation of local regulatory polymorphisms up-regulating Col alleles in all cases, suggests dissimilar patterns of responses to the environment and an advantage in a particular niche  or, else, a specific lack of cost in loosing this response. For example, one of the clusters of genes globally up-regulated in Col with respect to Cvi relates to hypoxia response, which is a major determinant of submergence tolerance. It was indeed recently shown that Col and Cvi contrasts for this trait, with Cvi being one of the least submergence-tolerant accessions . Interestingly, the entire set of significant categories here described are related to response to stress or biotic and abiotic response, highlighting its potential link with environmental and niche-adaptation changes.
Our results demonstrate the importance of extending the number of mapping populations in order to understand the eQTL landscape within a species. We found that the distribution of local and distant eQTLs is moderately conserved between recombinant populations, demonstrating the complex inheritance of transcript abundance. Moreover, the identification of candidate pathways for signatures of selection is a significant step towards understanding accession diversification and their adaptation to the environment. Further studies using novel technologies, such as next generation sequencing (RNA-seq), association mapping, or the combination of both, will help elucidating and validating transcript abundance divergence with a greater resolution within A. thaliana species.
Plant material, sample preparation and microarray hybridization
We used 158 RILs from the core-population of the Cvi-0 × Col-0 set and 156 RILs from the core-population of the Bur-0 × Col-0 set , along with their respective parents. Plants were cultivated in a greenhouse in typical long day conditions (16h photoperiod) at 20°C and whole plants were collected above the roots 20 days after sowing, which corresponds on average to growth stage '1.08' from Boyes et al. . Total RNA was extracted using RNeasy Plant mini Kit (Qiagen kit #74904) according to the supplier's instructions. For each biological sample (RIL), total RNAs were obtained by pooling RNAs from three randomized plants within a single experiment.
Transcript abundance estimates were carried out using the bicolour CATMA microarrays version 5, containing 34,529 spots (including multiple internal controls) allowing, among others, to uniquely interrogate 26,166 nuclear genes (predicted from TAIR and EUGENE algorithms) in A. thaliana[22, 35]. For each comparison of two RILs (random pair design), one technical replicate with fluorochrome reversal was performed (i.e. four hybridisations -2 arrays- per comparison of two RILs). Labelling of cRNAs with Cy3-dUTP or Cy5-dUTP (Perkin-Elmer-NEN Life Science Products), hybridisation to the slides, and scanning procedures were performed as previously described .
Microarray data analysis
For each array, the raw data comprised the logarithm of the median feature pixel intensity at wavelengths 635 nm (red) and 532 nm (green) and no background was subtracted. An array-by-array normalisation was performed to remove systematic biases. First, spots considered as badly formed features were excluded. Then a global intensity-dependent normalisation using the loess procedure was performed to correct the dye bias . Finally, for each block, the log-ratio median calculated over the values for the entire block was subtracted from each individual log-ratio value to correct print tip effects. Differential analysis was based on the log ratios averaged on the dye-swap: the technical replicates were averaged to get one log-ratio per biological replicate and these values were used to perform a paired t-test. A trimmed variance is calculated from spots which do not display extreme variance . The raw P-values were adjusted by the Bonferroni method, which controls the Family Wise Error Rate in order to keep a strong control of the false positives in a multiple-comparison context. We considered probes as being differentially expressed after Bonferroni correction using a P < 0.05. For the eQTL analysis, normalised intensity per probe × RIL has been calculated from the normalised log-ratio by sharing the correction value between both samples . To be specific, after the normalization of each array, we get a log-ratio denoted M and a mean intensity denoted A for each probe. First, we average M and A across the two arrays taking into account the dye switching and then calculate the normalised intensity of the two co-hybridized RILs which are equal to (2A-M)/2 and (2A + M)/2. The intensity (2A-M)/2 corresponds to the RIL labelled in green on the first array of the dye-swap. Then a between-array normalisation was performed to rescale the mean intensity of each slide at an arbitrary value equal to 8.5. This second round of normalisation makes probe × RIL intensity comparable and gives us a single expression level for each probe × RIL that can be used further for the eQTL mapping strategy.
Microarray data was deposited at Gene Expression Omnibus http://www.ncbi.nlm.nih.gov/geo/, accession no. GSE28791 and at CATdb (http://urgv.evry.inra.fr/CATdb/; Project: GNP07_RILKIT) according to the 'Minimum Information About a Microarray Experiment' standards.
R/eqtl package and eQTL mapping
eQTL mapping was performed utilising the normalised microarray data using our R package R/eqtl http://cran.r-project.org/web/packages/eqtl/, which exploits functions from R/qtl . R/eqtl is designed specifically for the mapping of thousands of traits in parallel. It includes functions to systematically and automatically identify and select QTLs with supporting intervals from R/qtl simple genome scan results (interval mapping), defining QTL as covariates for a composite interval mapping, calculating additive effect, estimating heritability for single QTL and for QTL × QTL interactions, classifying eQTL as candidate for trans- (distant) and cis-acting (local) eQTL (according to the position of the controlled gene with respect to the eQTL support interval and/or an arbitrary physical window), projecting and plotting the QTL estimated location relative to the gene location and summarising eQTL data at the genome scale with various generic plots and tabular files. R/eqtl is a free and open-source multi-platform package developed under the statistical language R, and is available under the GPLv3 license. R/qtl and its extension R/eqtl are hosted and can be downloaded from the Comprehensive R Archive Network (CRAN) at http://cran.r-project.org.
Interval Mapping (IM) and Composite Interval Mapping (CIM) were performed on each cross for all of the 32,300 non-technical and non-repetitive probes on the CATMA array, as implemented in the R/eqtl package. The Cvi × Col and Bur × Col RIL populations have been previously genotyped for 90 and 87 markers, respectively . For each probe, IM was first applied to identify a primary set of eQTLs and then this set was used as co-factor in CIM to detect less significant QTLs. Taking into account population size and, hence, the number of observed informative recombinants that conditions the ability to distinguish two linked QTL, a 15 cM exclusionary window on each side of all QTL peaks was applied to refrain from trying to detect too closely linked eQTLs or interpreting large peaks into multiple eQTLs. The analysis was completed with the estimation of the additive effect by averaging the phenotype value for each allele at the QTL marker (or pseudomarker) and estimating the relative contribution of each allele as Xxx minus Col (so that a negative allelic effect indicates that Col is up-regulated with respect to Xxx). We also estimated the proportion of the phenotypic variation explained by the segregation of each individual eQTL or significant eQTL × eQTL interaction (R2) by analysis of variance. QTLs were defined using a LOD significance threshold computed by permuting the phenotypes while maintaining the genotype across the RIL set  and a 1.5 LOD drop-support interval . In order to obtain a genome-wide threshold we determined the 95-percentile permutation threshold among 500 randomly chosen traits and estimated the 95% upper-bound from the traits distribution. We called an eQTL significant if the LOD score was above the threshold (genome-wide False Discovery Rate, FDR = 0.05)[3, 16]. To correct for multiple-traits testing, we estimated the minimum P-value from each randomly chosen trait and estimated the corresponding q-value . P-values for transcriptome-wide FDR 5% (P = 6 × 10-3 in CviCol and P = 9 × 10-3 in Burcol) and 1% (P = 1 × 10-3 in CviCol and BurCol) were also used as significance threshold.
All results are recorded in QTLstore, a comprehensive web service which allows to store, manage, explore and cross QTL experiment results from the single-trait scan to genome-wide expression QTL experiments. QTLstore is hosted at http://qtlstore.versailles.inra.fr/ and is composed of an extensive SQL database and a web interface, which are available under the GPLv2 license. The SQL database is convenient for all kinds of QTL experiments and analysis, by being able to store all the experiments primary data separately from all subsequent analyses results. It typically stores and interrogates QTL parameters such as location, significance, effect and type, among others.
Local and distant eQTL distribution
Local and distant eQTLs were classified in 1Mb bins as in . Briefly, we divided the genome into 116 physical bins (independent of the number of markers on each population) of 1Mb each (the chromosome ends were included into the previous 1Mb bin if the remaining region was smaller than 500 Kb). For distant eQTLs we performed a permutation test as previously described . Bins containing a higher number than the maximum expected by chance (α = 0.05) were considered as trans-hotspots. For local eQTLs, we performed a regression analysis against SNP (for Cvi-0 and Bur-0, SNP data was downloaded from the 1001genomes project website http://1001genomes.org/) and gene densities as described in . Significance was estimated using a one-way ANOVA. Significance for GO enrichment was assessed utilising a P < 2 × 10-5 after Bonferroni correction.
VBQTL and test for directional allelic effect
We applied the VBQTL (Variational Bayesian QTL Mapper) approach as previously described  on the dataset obtained in the Cvi-0 × Col-0 population, using one known factor to model hidden confounding sources. Residuals of the estimated effect were used as phenotypic values and eQTLs were detected again using R/eqtl and the statistical procedure implemented for the standard approach (P < 1 × 10-2, FDR 5%). We tested for 5, 10, 20 and 30 factors and chose the model with the highest number of additional linkages (FDR 5%). Next, potential directional selection was tested as previously described with modifications . Briefly, only local eQTLs were retained after the VBQTL analysis and tabulated based on their gene ontology (GO) classification. Initially, we selected functional categories with a significant over-representation of eQTLs within the 'Biological process' set. For this purpose, we assessed significance utilising the hypergeometric test (P < 8 × 10-4 after Bonferroni) considering the number of genes within each category at the whole-genome level. Within the selected functional categories, we focused the subsequent analysis on GO categories containing more than 20 genes. For each category we tested departure from the 1.6:1 ratio (Col:Cvi) using a hypergeometric test.
This work was supported by ANR grant RILKIT/07-GPLA-07003G and ANR grant ARABIDOSEED/GPLA-007-01. FAC was supported by funding from the European Commission Framework Programme 7, ERC Starting Grant DECODE/ERC-2009-StG-243359 to OL. We thank Michel Caboche and David Bouchez for their support with the 'Arabidoseed' grant and discussion of the results and Leopold Parts for the implementation of VBQTL.
- Brem RB, Yvert G, Clinton R, Kruglyak L: Genetic dissection of transcriptional regulation in budding yeast. Science. 2002, 296 (5568): 752-755. 10.1126/science.1069516.View ArticlePubMedGoogle Scholar
- Cubillos FA, Coustham V, Loudet O: Lessons from eQTL mapping studies: non-coding regions and their role behind natural phenotypic variation in plants. Curr Opin Plant Biol. 2012,Google Scholar
- West MA, Kim K, Kliebenstein DJ, van Leeuwen H, Michelmore RW, Doerge RW, St Clair DA: Global eQTL mapping reveals the complex genetic architecture of transcript-level variation in Arabidopsis. Genetics. 2007, 175 (3): 1441-1450.PubMed CentralView ArticlePubMedGoogle Scholar
- Pickrell JK, Marioni JC, Pai AA, Degner JF, Engelhardt BE, Nkadori E, Veyrieras JB, Stephens M, Gilad Y, Pritchard JK: Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature. 2010, 464 (7289): 768-772. 10.1038/nature08872.PubMed CentralView ArticlePubMedGoogle Scholar
- Bullard JH, Mostovoy Y, Dudoit S, Brem RB: Polygenic and directional regulatory evolution across pathways in Saccharomyces. Proc Natl Acad Sci USA. 2010, 107 (11): 5058-5063. 10.1073/pnas.0912959107.PubMed CentralView ArticlePubMedGoogle Scholar
- Gan X, Stegle O, Behr J, Steffen JG, Drewe P, Hildebrand KL, Lyngsoe R, Schultheiss SJ, Osborne EJ, Sreedharan VT: Multiple reference genomes and transcriptomes for Arabidopsis thaliana. Nature. 2011, 477 (7365): 419-423. 10.1038/nature10414.View ArticlePubMedGoogle Scholar
- Chen Y, Zhu J, Lum PY, Yang X, Pinto S, MacNeil DJ, Zhang C, Lamb J, Edwards S, Sieberts SK: Variations in DNA elucidate molecular networks that cause disease. Nature. 2008, 452 (7186): 429-435. 10.1038/nature06757.PubMed CentralView ArticlePubMedGoogle Scholar
- Wentzell AM, Rowe HC, Hansen BG, Ticconi C, Halkier BA, Kliebenstein DJ: Linking metabolic QTLs with network and cis-eQTLs controlling biosynthetic pathways. PLoS Genet. 2007, 3 (9): 1687-1701.View ArticlePubMedGoogle Scholar
- DeCook R, Lall S, Nettleton D, Howell SH: Genetic regulation of gene expression during shoot development in Arabidopsis. Genetics. 2006, 172 (2): 1155-1164.PubMed CentralView ArticlePubMedGoogle Scholar
- Keurentjes JJ, Fu J, Terpstra IR, Garcia JM, van den Ackerveken G, Snoek LB, Peeters AJ, Vreugdenhil D, Koornneef M, Jansen RC: Regulatory network construction in Arabidopsis by using genome-wide gene expression quantitative trait loci. Proc Natl Acad Sci USA. 2007, 104 (5): 1708-1713. 10.1073/pnas.0610429104.PubMed CentralView ArticlePubMedGoogle Scholar
- Kliebenstein D: Quantitative genomics: analyzing intraspecific variation using global gene expression polymorphisms or eQTLs. Annu Rev Plant Biol. 2009, 60: 93-114. 10.1146/annurev.arplant.043008.092114.View ArticlePubMedGoogle Scholar
- Zhang X, Cal AJ, Borevitz JO: Genetic architecture of regulatory variation in Arabidopsis thaliana. Genome Res. 2011, 21 (5): 725-733. 10.1101/gr.115337.110.PubMed CentralView ArticlePubMedGoogle Scholar
- Brem RB, Storey JD, Whittle J, Kruglyak L: Genetic interactions between polymorphisms that affect gene expression in yeast. Nature. 2005, 436 (7051): 701-703. 10.1038/nature03865.PubMed CentralView ArticlePubMedGoogle Scholar
- Yvert G, Brem RB, Whittle J, Akey JM, Foss E, Smith EN, Mackelprang R, Kruglyak L: Trans-acting regulatory variation in Saccharomyces cerevisiae and the role of transcription factors. Nat Genet. 2003, 35 (1): 57-64.View ArticlePubMedGoogle Scholar
- Rockman MV, Skrovanek SS, Kruglyak L: Selection at linked sites shapes heritable phenotypic variation in C. elegans. Science. 2010, 330 (6002): 372-376. 10.1126/science.1194208.PubMed CentralView ArticlePubMedGoogle Scholar
- Smith EN, Kruglyak L: Gene-environment interaction in yeast gene expression. PLoS Biol. 2008, 6 (4): e83-10.1371/journal.pbio.0060083.PubMed CentralView ArticlePubMedGoogle Scholar
- Wittkopp PJ, Haerum BK, Clark AG: Evolutionary changes in cis and trans gene regulation. Nature. 2004, 430 (6995): 85-88. 10.1038/nature02698.View ArticlePubMedGoogle Scholar
- Ronald J, Akey JM: The evolution of gene expression QTL in Saccharomyces cerevisiae. PLoS ONE. 2007, 2 (7): e678-PubMed CentralView ArticlePubMedGoogle Scholar
- Wray GA: The evolutionary significance of cis-regulatory mutations. Nat Rev Genet. 2007, 8 (3): 206-216.View ArticlePubMedGoogle Scholar
- Fraser HB, Babak T, Tsang J, Zhou Y, Zhang B, Mehrabian M, Schadt EE: Systematic detection of polygenic Cis-regulatory evolution. PLoS Genet. 2011, 7 (3): e1002023-10.1371/journal.pgen.1002023.PubMed CentralView ArticlePubMedGoogle Scholar
- Simon M, Loudet O, Durand S, Bérard A, Brunel D, Sennesal F-X, Durand-Tardif M, Pelletier G, Camilleri C: QTL mapping in five new large RIL populations of Arabidopsis thaliana genotyped with consensus SNP markers. Genetics. 2008, 178: 2253-2264. 10.1534/genetics.107.083899.PubMed CentralView ArticlePubMedGoogle Scholar
- Hilson P, Allemeersch J, Altmann T, Aubourg S, Avon A, Beynon J, Bhalerao RP, Bitton F, Caboche M, Cannoot B: Versatile gene-specific sequence tags for Arabidopsis functional genomics: transcript profiling and reverse genetics applications. Genome Res. 2004, 14 (10B): 2176-2189. 10.1101/gr.2544504.PubMed CentralView ArticlePubMedGoogle Scholar
- Deng W, Ying H, Helliwell CA, Taylor JM, Peacock WJ, Dennis ES: FLOWERING LOCUS C (FLC) regulates development pathways throughout the life cycle of Arabidopsis. Proc Natl Acad Sci USA. 2011, 108 (16): 6680-6685. 10.1073/pnas.1103175108.PubMed CentralView ArticlePubMedGoogle Scholar
- Stegle O, Parts L, Durbin R, Winn J: A Bayesian framework to account for complex non-genetic factors in gene expression levels greatly increases power in eQTL studies. PLoS Comput Biol. 2010, 6 (5): e1000770-10.1371/journal.pcbi.1000770.PubMed CentralView ArticlePubMedGoogle Scholar
- Nica AC, Parts L, Glass D, Nisbet J, Barrett A, Sekowska M, Travers M, Potter S, Grundberg E, Small K: The architecture of gene regulatory variation across multiple human tissues: the MuTHER study. PLoS Genet. 2011, 7 (2): e1002003-10.1371/journal.pgen.1002003.PubMed CentralView ArticlePubMedGoogle Scholar
- Atwell S, Huang YS, Vilhjalmsson BJ, Willems G, Horton M, Li Y, Meng D, Platt A, Tarone AM, Hu TT: Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature. 2010, 465 (7298): 627-631. 10.1038/nature08800.PubMed CentralView ArticlePubMedGoogle Scholar
- Parts L, Stegle O, Winn J, Durbin R: Joint genetic analysis of gene expression data with inferred cellular phenotypes. PLoS Genet. 2011, 7 (1): e1001276-10.1371/journal.pgen.1001276.PubMed CentralView ArticlePubMedGoogle Scholar
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000, 25 (1): 25-29. 10.1038/75556.PubMed CentralView ArticlePubMedGoogle Scholar
- Schadt EE, Monks SA, Drake TA, Lusis AJ, Che N, Colinayo V, Ruff TG, Milligan SB, Lamb JR, Cavet G: Genetics of gene expression surveyed in maize, mouse and man. Nature. 2003, 422 (6929): 297-302. 10.1038/nature01434.View ArticlePubMedGoogle Scholar
- Edwards J, Martin AP, Andriunas F, Offler CE, Patrick JW, McCurdy DW: GIGANTEA is a component of a regulatory pathway determining wall ingrowth deposition in phloem parenchyma transfer cells of Arabidopsis thaliana. Plant J. 2010, 63 (4): 651-661. 10.1111/j.1365-313X.2010.04269.x.View ArticlePubMedGoogle Scholar
- Penfield S, Hall A: A role for multiple circadian clock genes in the response to signals that break seed dormancy in Arabidopsis. Plant Cell. 2009, 21 (6): 1722-1732. 10.1105/tpc.108.064022.PubMed CentralView ArticlePubMedGoogle Scholar
- Orr HA: Testing natural selection vs. genetic drift in phenotypic evolution using quantitative trait locus data. Genetics. 1998, 149 (4): 2099-2104.PubMed CentralPubMedGoogle Scholar
- Vashisht D, Hesselink A, Pierik R, Ammerlaan JM, Bailey-Serres J, Visser EJ, Pedersen O, van Zanten M, Vreugdenhil D, Jamar DC: Natural variation of submergence tolerance among Arabidopsis thaliana accessions. New Phytol. 2011, 190 (2): 299-310. 10.1111/j.1469-8137.2010.03552.x.View ArticlePubMedGoogle Scholar
- Boyes DC, Zayed AM, Ascenzi R, McCaskill AJ, Hoffman NE, Davis KR, Gorlach J: Growth stage-based phenotypic analysis of Arabidopsis: a model for high throughput functional genomics in plants. Plant Cell. 2001, 13 (7): 1499-1510.PubMed CentralView ArticlePubMedGoogle Scholar
- Sclep G, Allemeersch J, Liechti R, De Meyer B, Beynon J, Bhalerao R, Moreau Y, Nietfeld W, Renou JP, Reymond P: CATMA, a comprehensive genome-scale resource for silencing and transcript profiling of Arabidopsis genes. BMC Bioinformatics. 2007, 8: 400-10.1186/1471-2105-8-400.PubMed CentralView ArticlePubMedGoogle Scholar
- Lurin C, Andres C, Aubourg S, Bellaoui M, Bitton F, Bruyere C, Caboche M, Debast C, Gualberto J, Hoffmann B: Genome-wide analysis of Arabidopsis pentatricopeptide repeat proteins reveals their essential role in organelle biogenesis. Plant Cell. 2004, 16 (8): 2089-2103. 10.1105/tpc.104.022236.PubMed CentralView ArticlePubMedGoogle Scholar
- Yang YH, Dudoit S, Luu P, Lin DM, Peng V, Ngai J, Speed TP: Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res. 2002, 30 (4): e15-10.1093/nar/30.4.e15.PubMed CentralView ArticlePubMedGoogle Scholar
- Gagnot S, Tamby JP, Martin-Magniette ML, Bitton F, Taconnat L, Balzergue S, Aubourg S, Renou JP, Lecharny A, Brunaud V: CATdb: a public access to Arabidopsis transcriptome data from the URGV-CATMA platform. Nucleic Acids Res. 2008, 36: D986-990.PubMed CentralView ArticlePubMedGoogle Scholar
- Normalization for two-color cDNA microarray data: IMS Lecture Notes - Monograph Series. Edited by: Yang YH, Thorne NP. 2004
- Broman KW, Wu H, Sen S, Churchill GA: R/qtl: QTL mapping in experimental crosses. Bioinformatics. 2003, 19 (7): 889-890. 10.1093/bioinformatics/btg112.View ArticlePubMedGoogle Scholar
- Churchill GA, Doerge RW: Empirical threshold values for quantitative trait mapping. Genetics. 1994, 138: 963-971.PubMed CentralPubMedGoogle Scholar
- Visscher PM, Thompson R, Haley CS: Confidence intervals in QTL mapping by bootstrapping. Genetics. 1996, 143: 1013-1020.PubMed CentralPubMedGoogle Scholar
- Storey JD, Tibshirani R: Statistical significance for genomewide studies. Proc Natl Acad Sci USA. 2003, 100 (16): 9440-9445. 10.1073/pnas.1530509100.PubMed CentralView ArticlePubMedGoogle Scholar
- Loudet O, Chaillou S, Camilleri C, Bouchez D, Daniel-Vedele F: Bay-0 × Shahdara recombinant inbred line population: a powerful tool for the genetic dissection of complex traits in Arabidopsis. Theor Appl Genet. 2002, 104: 1173-1184. 10.1007/s00122-001-0825-9.View ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.