Genome-wide eQTLs and heritability for gene expression traits in unrelated individuals
© Yang et al.; licensee BioMed Central Ltd. 2014
Received: 9 April 2013
Accepted: 22 November 2013
Published: 9 January 2014
While the possible sources underlying the so-called ‘missing heritability’ evident in current genome-wide association studies (GWAS) of complex traits have been actively pursued in recent years, resolving this mystery remains a challenging task. Studying heritability of genome-wide gene expression traits can shed light on the goal of understanding the relationship between phenotype and genotype. Here we used microarray gene expression measurements of lymphoblastoid cell lines and genome-wide SNP genotype data from 210 HapMap individuals to examine the heritability of gene expression traits.
Heritability levels for expression of 10,720 genes were estimated by applying variance component model analyses and 1,043 expression quantitative trait loci (eQTLs) were detected. Our results indicate that gene expression traits display a bimodal distribution of heritability, with one peak close to 0% and the other approaching 100%. Such a pattern of the within-population variability of gene expression heritability is common among different HapMap populations of unrelated individuals but different from that obtained in the CEU and YRI trio samples. Higher heritability levels are shown by housekeeping genes and genes associated with cis eQTLs. Both cis and trans eQTLs make comparable cumulative contributions to the heritability. Finally, we modelled gene-gene interactions (epistasis) for genes with multiple eQTLs and revealed that epistasis was not prevalent in all genes but made a substantial contribution to explaining total heritability for some of the genes analysed.
We utilised a mixed effect model analysis for estimating genetic components from population based samples. On the basis of analyses of genome-wide gene expression from four HapMap populations, we present a detailed exploration of the distribution of genetic heritabilities for expression traits from different populations, and highlight the importance of studying interaction at the gene expression level as an important source of variation underlying missing heritability.
Keywords: Microarray gene expression; eQTLs; Heritability; Mixed model; HapMap populations; Epistasis
In genome-wide association studies (GWAS) of conventional complex traits such as human complex diseases, a fundamental and yet unsolved question is that of so-called missing heritability, i.e., the significant and often numerous variants collectively explain only a small fraction of the total phenotypic variation [1, 2]. For example, recent studies show that ~50 variants explain only ~5% of the phenotypic variation for human height, a highly heritable trait with narrow sense heritability of ~80% [3, 4]. While fully resolving the missing heritability remains a challenging task, we have studied the heritability of gene expression traits to shed light on the relationship between trait phenotypic variation and genetic variation, on the basis that gene expression is the process linking genetic information to the final phenotype and is itself genetically controlled. Furthermore, gene expression is generally assayed in well controlled experiments, suggesting that it is less vulnerable to environmental variation than conventional phenotypes, and is thus an ideal choice for studying the extent to which genetic components contribute to phenotypic variation.
With the advent of DNA microarrays and more recently deep sequencing-based profiling approaches, the expression of thousands of genes can be readily measured simultaneously, creating a global snapshot of cellular activity. A number of studies have assessed the heritability of microarray gene expression traits in different species, including Arabidopsis and rat. The heritability of gene expression means it can be subject to the same quantitative trait loci (QTL) analyses as conventional trait data to reveal the so-called expression QTLs (eQTLs). For example, several studies have analysed the gene expression profile of lymphoblastoid cell lines (LCLs) from HapMap samples and reported that genetic factors make an important contribution to variation in gene expression [7–11]. These studies, however, focused on differentially expressed genes and on exploring cis and trans genetic determinants of gene regulation, either within a single ethnic group or across ethnic groups. There is not yet any report in the literature on the genome-wide distribution of heritabilities of gene expression traits and the cis and trans eQTLs across different HapMap populations. Furthermore, to the best of our knowledge, the phenotypic variation explained by interactions between eQTLs has never been explored at a genome-wide scale in humans. Recently, Price et al. analysed microarray gene expression data from blood and adipose samples of Icelandic family cohorts and began to partition the heritability into cis and trans components using a variance component model composed of polygenic effects estimated using identity by descent (IBD) for chromosome segments both proximal (cis) and distal (trans) to the gene of interest. However, their method implicitly assumed the sum of variance components to be unity after normalising gene expression values to have mean 0 and variance 1, and only genetic variance component parameters were estimated using a binary search algorithm.
While samples of related individuals were collected and hence genetic correlations among samples were expected, the assumption that the variance components sum to unity virtually neglected the variance-covariance structure and hence might introduce bias in the estimation of heritability. Moreover, a large number of negative heritability estimates were derived in that study, and such estimates are difficult to explain in a biologically meaningful way. While it is possible that noise caused the negative estimates of heritability, as discussed in Price et al., a robust statistical approach that prevents such negative heritability estimates is highly desirable.
In this study, we re-analysed gene expression microarray data from four HapMap populations using a statistically rigorous variance component model, with the aim of exploring (1) the global pattern of the distribution of heritabilities of gene expression traits in unrelated individuals, (2) the cumulative genetic contribution of cis and trans acting eQTLs to gene expression heritability, and (3) the potential of gene-gene interactions in explaining the missing heritability at the gene expression level.
Gene expression data and quality control
We analyzed the gene expression levels measured previously in LCLs from 210 unrelated HapMap individuals, using Illumina’s human whole-genome expression array (WG-6 version 1). In this experiment, each of the two in vitro transcription (IVT) reactions from the 210 samples was hybridized to each of two arrays, resulting in four replicate hybridizations for every sample. We downloaded background-corrected gene expression values from the Gene Expression Omnibus (GEO) database (accession number GSE6536), and then carried out quantile normalisation across replicates of a single individual and subsequently median normalisation across all individuals by using the R package beadarray. It should be noted that the Illumina Genome Studio software can compute detection scores for each probe and flag the presence/absence calls of expression of the features. However, the detection scores were not provided for the current microarray dataset, and hence we applied a filter to exclude microarray probes based on their expression levels, as described below.
We conducted BLAT analysis to map all 47,294 Illumina array probes onto human cDNA sequences from Ensembl (hg19). Among these, 21,152 mapped probes were retained after removing probes that mapped with over 90% identity to multiple genes or mapped to sex chromosomes or mitochondrial DNA. A further 24 probes that carried at least one genetic variant within the probe region (according to the Ensembl Variation database) were removed. This further reduced the potential bias in gene expression estimation due to mismatch between probe and transcript sequence. To exclude genes with an extremely low expression level, we additionally filtered out those probe features whose raw intensity values were smaller than background noise in more than half of the total individuals in all four replicated arrays, resulting in 12,158 probes from 10,720 genes which were regarded as expressed features or genes. Finally, for genes represented by multiple probes, we took the average over all relevant probes as the estimate of their expression levels. The subsequent heritability and eQTL analyses were based on the 10,720 expressed genes.
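The quantile normalisation step across the replicate arrays of one individual can be illustrated with a minimal sketch. This is a generic textbook version written for illustration, not the beadarray implementation; the function name `quantile_normalise` and the tie-handling rule are assumptions.

```python
from statistics import mean


def quantile_normalise(columns):
    """Quantile-normalise equal-length columns (one list per replicate array).

    Each value is replaced by the mean, across columns, of the values that
    share its rank. Ties are broken by original order (a simplification).
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all columns must have the same length")
    # Sort each column, then average across columns rank by rank.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [mean(col[r] for col in sorted_cols) for r in range(n)]
    normalised = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])  # indices by rising value
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalised.append(out)
    return normalised


# Two hypothetical replicate arrays measuring the same three probes.
replicates = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
result = quantile_normalise(replicates)
```

After normalisation every replicate shares the same set of values, so differences between replicates reflect only the ranking of probes within each array.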
Genotype data and quality control
We downloaded the phase II and III combined genotype data for HapMap individuals from the HapMap project website. In total, there are over 4 million SNPs genotyped for the present HapMap samples. Genotype quality checking was performed independently in each individual population. SNPs were removed for each of the HapMap populations if they were: i) located on sex chromosomes, ii) genotyped in fewer than 90% of individuals, iii) present at a minor allele frequency < 0.05, and/or iv) in significant departure from Hardy-Weinberg equilibrium (P < 0.001). The final dataset contained genotypes at 1,299,240 consensus SNPs from all four HapMap populations.
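The per-SNP filters listed above can be sketched as a single predicate. This is a minimal illustration under stated assumptions: genotypes are coded as 0/1/2 allele counts with `None` for missing calls, and the Hardy-Weinberg check uses a 1-df chi-square goodness-of-fit test, since the text does not say which test was applied. The function name `snp_passes_qc` is hypothetical, and the sex-chromosome filter is omitted.

```python
import math


def snp_passes_qc(genotypes, min_call_rate=0.90, min_maf=0.05, hwe_alpha=0.001):
    """Return True if a SNP passes call-rate, MAF and HWE filters.

    genotypes: list of 0/1/2 allele counts, with None for missing calls.
    The HWE check is a 1-df chi-square goodness-of-fit test (an assumption;
    the source does not say which test was used).
    """
    called = [g for g in genotypes if g is not None]
    if not genotypes or len(called) / len(genotypes) < min_call_rate:
        return False
    n = len(called)
    p = sum(called) / (2 * n)  # frequency of the counted allele
    if min(p, 1 - p) < min_maf:
        return False
    expected = [n * (1 - p) ** 2, 2 * n * p * (1 - p), n * p ** 2]
    observed = [called.count(k) for k in (0, 1, 2)]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function, 1 df
    return p_value >= hwe_alpha


in_hwe = [0] * 25 + [1] * 50 + [2] * 25      # matches HWE expectations exactly
all_het = [1] * 100                          # strong excess of heterozygotes
rare = [0] * 98 + [1] * 2                    # minor allele frequency 0.01
sparse = [None] * 20 + [0] * 20 + [1] * 40 + [2] * 20  # call rate 0.80
```

In this sketch only `in_hwe` passes; the other three hypothetical SNPs fail the HWE, frequency and call-rate filters respectively.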
Genetic relationship estimation
To estimate heritability by fitting a variance component model, it is first necessary to estimate pairwise genetic relationship coefficients for the HapMap samples. A number of statistical methods have been proposed for estimating genetic relationships from genome-wide high density marker genotypes for homogeneous (i.e., non-stratified) populations, e.g. the program PLINK. However, for the present analysis using samples from four HapMap populations, relationship estimation will be biased unless the population structure is considered. Since the population origin of each HapMap individual is clear and the different populations have been geographically isolated from each other for many generations, it is reasonable to assume that individuals from different populations are unrelated and their relationship coefficients are zero. Therefore, to adjust for population stratification we used PLINK to estimate the coefficients of genetic relationship based on autosomal marker genotypes in each HapMap population independently and then merged all four population genetic relationship matrices by setting the relationship coefficients between individuals from different populations to zero. To account for linkage disequilibrium (LD) between SNPs, we utilized the PLINK SNP pruning function to generate a subset of 36,609 SNPs that were in approximate linkage equilibrium (pairwise genotypic correlation $r^2 < 0.05$) to be used for relationship estimation. It should be noted that the genetic relationship coefficients derived from PLINK are the probabilities of genome-wide allelic identity by descent (IBD). In the following, we denote by $K_{pop}$ the genetic relationship matrix in population pop (where pop = CHB, JPT, CEU or YRI) and by $K$ the final merged genetic relationship matrix.
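The merging step described above amounts to building a block-diagonal matrix from the per-population matrices. A minimal sketch, assuming each matrix is held as a list of lists (the function name `merge_relationship_matrices` is hypothetical):

```python
def merge_relationship_matrices(blocks):
    """Stack per-population relationship matrices into one block-diagonal matrix.

    blocks: list of square matrices (lists of lists), one per population.
    Coefficients between individuals from different populations are set to 0.
    """
    total = sum(len(b) for b in blocks)
    merged = [[0.0] * total for _ in range(total)]
    offset = 0
    for block in blocks:
        size = len(block)
        for i in range(size):
            if len(block[i]) != size:
                raise ValueError("each block must be square")
            for j in range(size):
                merged[offset + i][offset + j] = block[i][j]
        offset += size
    return merged


# Two hypothetical populations with two and one individuals respectively.
k_a = [[1.0, 0.01], [0.01, 1.0]]
k_b = [[1.0]]
k_merged = merge_relationship_matrices([k_a, k_b])
```

The order of individuals in the merged matrix follows the order of the input blocks, so the expression data must be arranged in the same order before model fitting.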
Gene expression heritability estimation
where Z is a design matrix relating gene expression levels from each array to individual samples. We implemented a restricted maximum likelihood (REML) approach in the R programming language (http://www.r-project.org/) to obtain the maximum likelihood estimates (MLEs) of the model parameters. With the estimates of the genetic and residual variance components, i.e., $\hat{\sigma}_g^2$ and $\hat{\sigma}_e^2$, narrow sense heritability of the gene expression trait can be estimated by $\hat{h}^2 = \hat{\sigma}_g^2 / (\hat{\sigma}_g^2 + \hat{\sigma}_e^2)$. Gene expression heritability analysis was performed in the four HapMap populations combined, and also in each individual HapMap population using a simplified linear mixed model $y_{ij} = \mu + u_i + e_{ij}$, where $u \sim N(0, K_{pop}\sigma_g^2)$ and $e \sim N(0, I\sigma_e^2)$.
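To make the variance-component idea concrete, the following sketch estimates the two components of the simplified model (individual effect plus residual, with replicate arrays nested within individuals) in a deliberately restricted special case. It assumes that individuals are unrelated, so the relationship matrix is the identity, and that every individual has the same number of replicates. In that balanced case the REML estimates have a closed form that matches the ANOVA estimators, with the between-individual component set to zero when it would otherwise be negative. This is not the authors' R implementation, which uses the estimated relationship matrix; the function name `balanced_variance_components` is hypothetical.

```python
from statistics import mean


def balanced_variance_components(replicates):
    """Estimate variance components for y_ij = mu + u_i + e_ij.

    replicates: one list of replicate measurements per individual, all of
    equal length r >= 2. Assumes unrelated individuals (identity relationship
    matrix), which is a simplification of the model in the text.
    Returns (sigma2_g, sigma2_e, h2).
    """
    n, r = len(replicates), len(replicates[0])
    if n < 2 or r < 2 or any(len(rep) != r for rep in replicates):
        raise ValueError("need >= 2 individuals with equal replicate counts >= 2")
    ind_means = [mean(rep) for rep in replicates]
    grand = mean(ind_means)
    ss_between = r * sum((m - grand) ** 2 for m in ind_means)
    ss_within = sum(
        (y - m) ** 2 for rep, m in zip(replicates, ind_means) for y in rep
    )
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (r - 1))
    if ms_between > ms_within:
        sigma2_g = (ms_between - ms_within) / r
        sigma2_e = ms_within
    else:
        # Boundary solution: the individual-level variance cannot be negative.
        sigma2_g = 0.0
        sigma2_e = (ss_between + ss_within) / (n * r - 1)
    total = sigma2_g + sigma2_e
    h2 = sigma2_g / total if total > 0 else 0.0
    return sigma2_g, sigma2_e, h2


# Two hypothetical individuals with two replicate arrays each.
sg, se, h2 = balanced_variance_components([[1.0, 3.0], [5.0, 7.0]])
sg0, se0, h20 = balanced_variance_components([[1.0, 3.0], [3.0, 1.0]])
```

Because the individual-level component is constrained to be non-negative, the resulting ratio always lies between 0 and 1, which is the property the text emphasises for the variance component approach.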
eQTL scan and eQTL heritability estimation
where $y_{i\cdot}$ is the mean expression value for a target gene for individual $i$, $x_{ik}$ is the genotype score of individual $i$ at the $k$th SNP marker, with values 0, 1 and 2 representing the number of copies of a reference allele at the SNP locus, $\alpha_k$ is the regression coefficient at the SNP, and $e_i$ is the residual term. $\mu$, $S_i$ and $\beta$ are defined as in equation (1). A t-test is then carried out against the null hypothesis of $\alpha_k = 0$ for the target gene at the $k$th SNP. A conservative Bonferroni P value threshold ($0.05/1{,}299{,}240 = 3.85 \times 10^{-8}$) was applied to account for the large number of tests. Significant SNPs were merged into eQTLs using criteria as detailed in Results.
This single point heritability estimation approach can be readily applied to estimate aggregated heritability from multiple SNPs by simply fitting multiple SNP genotypes into the above equation (4).
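The single-SNP association test and the Bonferroni threshold used in the scan can be illustrated as follows. This is a simplified sketch: it regresses expression on genotype score alone and omits the population covariate that the model in the text includes. The function names and the toy data are hypothetical.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold controlling the family-wise error rate."""
    return alpha / n_tests


def slope_t_statistic(x, y):
    """Slope and its t statistic for the simple regression y = a + b*x + e.

    x: genotype scores (0, 1 or 2); y: mean expression values.
    The statistic has n - 2 degrees of freedom under the null of b = 0.
    """
    n = len(x)
    if n < 3 or len(y) != n:
        raise ValueError("need at least 3 paired observations")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("genotype is monomorphic in this sample")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    rss = sum((yi - my - slope * (xi - mx)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    t = slope / se if se > 0 else math.copysign(math.inf, slope)
    return slope, t


threshold = bonferroni_threshold(0.05, 1_299_240)
genotype = [0, 1, 2, 0, 1, 2]
expression = [1.0, 2.1, 2.9, 1.1, 1.9, 3.0]
slope, t_stat = slope_t_statistic(genotype, expression)
```

Converting the t statistic to a P value requires the Student t distribution, which is left out here to keep the sketch free of third-party dependencies.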
Population structure and genetic relationship
We first used multidimensional scaling (MDS) within the PLINK program to investigate population structure among the four HapMap populations (CHB, JPT, CEU and YRI) based on marker genotype data. The first two principal coordinates (PCo) clearly separate the CEU and YRI populations from each other and from the two Asian populations (CHB and JPT), which are in turn distinguished from each other by the third PCo (Additional file 1). Separation of the four geographically isolated HapMap populations indicates that it would not be appropriate to treat the four populations as a homogeneous sample; instead, the heterogeneity caused by population structure should be properly controlled in both the genetic relationship inference and the association analysis.
We used PLINK to infer the genetic relationship matrix for each of the four HapMap populations independently and then merged the resulting four matrices by setting inter-population pairwise relationship coefficients to zero. The mean (and standard deviation) of IBD coefficients were 0.0067 (0.0074), 0.0056 (0.0073), 0.0068 (0.0080), and 0.0035 (0.0145) for CHB, JPT, CEU and YRI HapMap populations, respectively. Clearly the IBD coefficients were very low, as expected since the individuals collected in each population were genetically unrelated. Consistent with previous reports, six pairs of individuals were evidently related (IBD coefficient > 0.05). Three related pairs were from the YRI population: NA18913 and NA19238 (IBD coefficient 0.5005), NA19130 and NA19192 (IBD coefficient 0.2392), and NA19092 and NA19101 (IBD coefficient 0.1231). The remaining three potentially related pairs were between CEU individuals NA06993 and NA07022 (IBD coefficient 0.0696), NA06993 and NA07056 (IBD coefficient 0.0686), and NA12155 and NA12264 (IBD coefficient 0.0679). To avoid having close relatives in the data, we excluded, from each of the three pairs of highly related individuals, the individual with the greater number of missing genotypes.
Gene expression heritability
Normalised gene expression levels were fitted into the variance component model (equations 1 and 2) to estimate the proportion of phenotypic variance explained by polygenic effects, i.e., the heritability in the present HapMap populations. A total of 10,720 genes were selected for heritability estimation using the REML technique. After REML fitting, we checked, using simple linear regression analyses, that for more than 95.3% of genes the expression values predicted from the fixed-effect estimates in the mixed model were not significantly linearly correlated with the estimates of the random and residual terms in the model.
We explored the bimodal pattern of distribution of the heritability estimates for gene expression traits. Additional file 2 shows an empirical relationship between expression levels and heritabilities of gene expression traits. It is clear from the file that lower heritability estimates of gene expression ($h^2 < 0.2$) were more likely to occur in genes with low expression levels, which is consistent with the fact that low expression levels are associated with a lower level of phenotypic variation. High heritability estimates ($h^2 > 0.5$) were present in genes with a wider range of expression levels. Together with the fact that housekeeping genes had higher expression levels (median 8.29; mean 8.52) than non-housekeeping genes (median 7.46; mean 7.98), the results suggested a possible link between the expression levels and estimates of gene expression heritabilities. We elaborate on this observation below.
Genome-wide association eQTL analysis
In the genome-wide association scan, normalised gene expression levels were averaged among the four replicated arrays for each individual and then scanned for genome-wide SNP associations using a multiple linear regression analysis with correction for population structure in the mixed HapMap populations. A total of 11,290 regression models involving 988 genes and 10,712 SNPs were declared significant at the Bonferroni-corrected P-value threshold of $3.85 \times 10^{-8}$. Note that the number of genes was not taken into account in the correction for multiple tests. At an overall false positive rate of 5%, about 550 expression-SNP models were expected to be significant at the given threshold. Because these expected false positives accounted for less than 5% of the total 11,290 discoveries, the present threshold can be regarded as conservative and appropriate for further statistical analyses. For genes with multiple associations, the significant SNPs were merged into eQTLs. An eQTL in the present analysis was defined as an independent peak in the P-value profile across a given chromosome. Following Jiang et al., any peak occurring within a chromosome region of 5 Mb in size was taken as a single eQTL peak. The eQTLs thus defined were further classified based on their physical distance from the associated gene: as cis eQTLs if the SNP lay within the region from 500 kb upstream of the transcript start to 500 kb downstream of the 3’ end of the gene, and otherwise as trans eQTLs. The 11,290 significant associations gave rise to 1,043 eQTLs, of which about two thirds (671) were trans eQTLs while only 372 were in cis, consistent with previous eQTL studies (for example, [7, 22]).
Previously, Stranger et al. found an over-representation of cis associations (803 cis and 44 trans) in the same microarray dataset analysed here. We compared the cis eQTLs mapped in the present study with those predicted in the study of Stranger et al. and found that 591 out of 803 (73.6%) cis eQTL genes detected in Stranger et al. were included in the current selected gene set. More importantly, 252 cis eQTLs were common between the two studies as shown in Additional file 3, i.e., about 70% of the cis eQTLs predicted here were also detected by Stranger et al., suggesting a high level of comparability of the present study to the previous eQTL analysis. The discrepancy in the number of eQTLs predicted between the two studies may be partly due to the fact that different genes or SNPs were selected for the analyses and partly due to the different eQTL analysis methods used. Unlike the present analysis, in which the population origin is used as a covariate to correct for structure in the linear regression analysis of the pooled sample, Stranger et al. performed within-population permutation to correct for inflated associations in simple linear regression. Because population structure is clearly present in the pooled sample, linear regression conditional on the known population structure is much simpler than, and more effective than, the permutation test in correcting for spurious associations. In fact, Stranger et al. detected far fewer trans eQTLs largely because of the way in which they selected SNPs to test for trans effects. To avoid the computational burden and statistical challenges of testing all SNPs against all candidate expression traits, Stranger et al. tested for trans effects in only ~25,000 SNPs (roughly 1 percent of the total SNPs) selected for potential functional significance. In contrast, we screened for trans associations from the full SNP genotype data without prior selection.
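The merging and classification rules above can be sketched in a few lines. The exact merging procedure is not spelled out in the text, so the greedy rule used here (a hit joins the current peak if it lies within 5 Mb of that peak's first SNP) is an assumption, as are the function names and the toy coordinates. Strand is ignored when the cis window is applied.

```python
def merge_into_eqtls(hits, window=5_000_000):
    """Group significant SNPs on one chromosome into eQTL peaks.

    hits: list of (position_bp, p_value). Hits are scanned in position order;
    a hit within `window` bp of the current peak's first SNP joins that peak
    (one possible reading of the 5 Mb rule; the exact procedure is assumed).
    Returns the lead (smallest p-value) hit of each peak.
    """
    peaks = []
    for pos, p in sorted(hits):
        if peaks and pos - peaks[-1]["start"] <= window:
            if p < peaks[-1]["lead"][1]:
                peaks[-1]["lead"] = (pos, p)
        else:
            peaks.append({"start": pos, "lead": (pos, p)})
    return [peak["lead"] for peak in peaks]


def classify_eqtl(snp_chrom, snp_pos, gene_chrom, gene_start, gene_end,
                  flank=500_000):
    """Label an eQTL 'cis' if it lies within `flank` bp of the gene, else 'trans'.

    gene_start and gene_end are the lower and upper genomic coordinates of the
    gene (strand is ignored in this sketch).
    """
    same_chrom = snp_chrom == gene_chrom
    if same_chrom and gene_start - flank <= snp_pos <= gene_end + flank:
        return "cis"
    return "trans"


hits = [(1_000_000, 1e-9), (2_000_000, 1e-12), (9_000_000, 1e-10)]
leads = merge_into_eqtls(hits)
label_near = classify_eqtl("1", 1_400_000, "1", 1_600_000, 1_700_000)
label_far = classify_eqtl("2", 1_400_000, "1", 1_600_000, 1_700_000)
```

Here the first two hits collapse into one peak led by the SNP at 2 Mb, while the hit at 9 Mb forms a separate peak.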
We were interested in how much heritability in gene expression traits could be accounted for by the predicted eQTLs. It is statistically challenging to directly estimate the proportion of phenotypic variance explained by each individual eQTL, as the status of the underlying QTL gene linked to the SNP under study is unknown, so we followed an approximation approach as in Cockram et al. by comparing the estimates of variance components between a SNP-inclusive mixed model (i.e. Equation 4) and a SNP-free model (i.e. Equation 1). The difference in the sum of variance component estimates can be used as an estimate of the phenotypic variance accounted for by the associated SNP. We refer to this estimate of the fraction of phenotypic variance explained by the eQTL as the eQTL heritability. The sum of variance components is expected to decrease with the SNP-inclusive model because incorporation of additional explanatory variables generally improves model fit, and hence it will give a non-negative estimate of SNP heritability. To validate the present eQTL heritability estimation, we further confirmed that this decrease in the sum of variance components is due to a decrease in the genetic variance component in the SNP-inclusive model relative to the SNP-free model and not due to a decrease in residual variance (Additional file 5). A stacked histogram presents the distribution of the heritability values for cis and trans eQTLs (Figure 3b). It is clear that the majority of the eQTLs individually explained very small fractions of phenotypic variance, particularly the trans eQTLs, though there were some eQTLs explaining up to 80% of the phenotypic variance. Owing to the large proportion of low heritability estimates among trans eQTLs, this group tended on average to contribute less genetic variation (mean = 15.6%) to gene expression variation than cis eQTLs (mean = 22.0%).
We calculated the heritability of expression phenotypes by regressing offspring expression values on midparent expression values for the 10,720 selected genes in the CEU and YRI trio population datasets. After removing those pairs of individuals with predicted hidden relatedness, 28 and 27 trio families were retained in the CEU and YRI populations, respectively. The microarray data from the trio families were pre-processed with the same normalisation procedure as described above for the unrelated individuals. Because every sample was replicated on four arrays, we took the average of the replicates as the gene expression estimate. In the trio-family analysis, heritability of gene expression was estimated by the regression coefficient. Additional file 6 summarizes the heritability estimates, showing that about 32-35% of genes had negative heritability estimates in the CEU and YRI trio-family populations. The negative heritability estimates reflect the nature of the regression analysis, which is highly vulnerable to environmental variation, particularly when the sample size is small. However, Fisher’s exact test demonstrated that the genes with cis and trans eQTLs predicted from the analysis above were specifically enriched for non-negative heritability estimates, suggesting a strong concordance between the trio-family based analysis and the population based eQTL analysis. We observed a highly significant positive correlation in the expression heritability estimates between the two analyses (Pearson’s r = 0.22, P value < $10^{-7}$). Moreover, when the focus was restricted to those genes detected with significant eQTL regulation, the correlation coefficient increased to 0.31 (P value < $10^{-7}$). These results may suggest that the mixed model analysis confers statistically more robust estimation of gene expression heritability, at least in the present setting.
To investigate the influence of gene-gene interactions on gene expression trait heritability, we selected genes presenting multiple eQTLs and for each selected gene we fitted a linear mixed model incorporating a SNP-SNP multiplicative interaction effect. There were 39 genes with two eQTLs and 3 genes with three or more eQTLs. For simplicity, only genes presenting two eQTLs were selected for the interaction test. As listed in Additional file 7, incorporation of interaction terms in the model did not necessarily lead to an increased proportion of phenotypic variation explained for all the 39 genes tested. In 16 of the selected genes, the interaction terms increased the explained genetic variance by as much as 35%. To assess the appropriateness of including interaction terms in the model, the Akaike information criterion (AIC) was calculated for the models with and without the interaction term. The models having a lower AIC value were regarded as statistically more appropriate for decomposing the gene expression phenotypic variation. For five of the 39 genes, the models with the interaction term showed lower AIC values and hence were preferred over the corresponding additive models (Additional file 7). This demonstrates that the analysis of gene-gene interactions may be useful for uncovering genetic components that are not explained by additive gene effects for some expression traits, but that this is not the case for all genes.
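The comparison of variance component sums between the two models can be written as a small helper. This is a sketch under an explicit assumption: the difference is expressed as a fraction of the total variance estimated under the SNP-free model, which the text implies but does not state. The function name `eqtl_heritability` and the numbers in the example are hypothetical.

```python
def eqtl_heritability(var_snp_free, var_snp_inclusive):
    """Fraction of phenotypic variance attributed to the fitted SNP(s).

    Each argument is a (genetic_variance, residual_variance) pair estimated
    from the SNP-free and SNP-inclusive mixed models. Dividing by the
    SNP-free total is an assumption of this sketch.
    """
    total_free = sum(var_snp_free)
    total_incl = sum(var_snp_inclusive)
    if total_free <= 0:
        raise ValueError("SNP-free total variance must be positive")
    # Truncate at zero so that sampling noise cannot yield a negative value.
    return max(0.0, (total_free - total_incl) / total_free)


# Hypothetical estimates: fitting the SNP lowers the genetic component only.
h2_eqtl = eqtl_heritability((0.6, 0.4), (0.4, 0.4))
```

The same helper applies unchanged when several SNPs are fitted jointly, since only the two sets of variance component estimates are needed.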
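The trio-based estimate is the slope of the regression of offspring values on midparent values. A minimal sketch with hypothetical data shows how the estimate is obtained and why it is not bounded below by zero (the function name `midparent_offspring_h2` is an assumption):

```python
def midparent_offspring_h2(father, mother, offspring):
    """Heritability estimate as the slope of offspring on midparent values.

    father, mother, offspring: equal-length lists of expression values,
    one entry per trio family. The slope can be negative in small samples.
    """
    n = len(offspring)
    if n < 2 or len(father) != n or len(mother) != n:
        raise ValueError("need at least two complete trios")
    midparent = [(f + m) / 2 for f, m in zip(father, mother)]
    mp_mean = sum(midparent) / n
    off_mean = sum(offspring) / n
    sxx = sum((x - mp_mean) ** 2 for x in midparent)
    if sxx == 0:
        raise ValueError("midparent values show no variation")
    sxy = sum((x - mp_mean) * (y - off_mean) for x, y in zip(midparent, offspring))
    return sxy / sxx


fathers = [8.0, 9.0, 10.0]
mothers = [8.0, 9.0, 10.0]
h2_pos = midparent_offspring_h2(fathers, mothers, [8.5, 9.0, 9.5])
h2_neg = midparent_offspring_h2(fathers, mothers, [9.5, 9.0, 8.5])
```

With only a few dozen trios per population, noise in either generation can easily push the slope below zero, which is consistent with the share of negative estimates reported above.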
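The model comparison by AIC can be sketched as follows, using the standard definition AIC = 2k − 2 ln L, where k is the number of estimated parameters and L the maximised likelihood. The log-likelihood values and parameter counts in the example are hypothetical, and the function names are assumptions.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def prefer_interaction(loglik_additive, k_additive,
                       loglik_interaction, k_interaction):
    """Return True if the model with the interaction term has the lower AIC."""
    return aic(loglik_interaction, k_interaction) < aic(loglik_additive, k_additive)


# Hypothetical fits: the interaction model adds one parameter.
clear_gain = prefer_interaction(-100.0, 5, -98.0, 6)   # AIC 210 vs 208
small_gain = prefer_interaction(-100.0, 5, -99.5, 6)   # AIC 210 vs 211
```

Adding one interaction parameter is favoured only when it raises the log-likelihood by more than one unit, which is why an increase in explained variance does not by itself justify the interaction model.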
Approaches that combine genome-wide gene expression profiling and genome-wide marker genotype data are offering new insights for dissecting the genetic basis of complex traits including common human diseases. In this study, we used publicly available datasets of microarray gene expression measurements and genome-wide SNPs from 210 HapMap individuals to examine the heritability of gene expression traits in LCL samples by using REML analysis of a variance component model. Differing from previous studies, which analyzed the same microarray datasets but aimed to infer differentially expressed genes and/or to detect genetic controls of expression regulation within or across ethnic groups (e.g., [7–11, 23, 25–27]), this paper presents a detailed exploration of the genome-wide distribution of heritabilities of gene expression traits. We present here for the first time the cis and trans eQTLs obtained by jointly analysing information from four HapMap populations. In contrast to a recent study on gene expression heritability, in which the heritability of expression was estimated to be negative for a large number of genes, the present study developed a statistically robust variance component approach which can ensure non-negative estimates of variance components and, in turn, non-negative estimates of gene expression heritability in the range of 0 to 100%. The mixed model analysis was originally proposed for estimating quantitative genetic parameters using pedigree information of outbred populations. The present variance component model utilises the relationship matrix inferred from genome-wide genotypes, and the method developed here can be readily implemented for a population based analysis.
In this study, we observed a common pattern in the distribution of gene expression heritabilities among four HapMap populations (CHB, JPT, CEU and YRI), suggesting that similar levels of constraint may be imposed on the expression variability of most genes in these populations, or that no apparent evolutionary divergence has affected the expression regulatory mechanisms of these genes. In this respect, our result is consistent with a previous study which also observed that the expression variability of most human genes in one population was not markedly different from that in another population. In the present study, we observed that heritability of gene expression was estimated to vary from as low as zero to nearly 100%, indicating widely varying levels of genetic contribution to variation in the gene expression phenotype. While there was an abundance of genes with very low heritability, a large number of genes clustered at heritability levels of around 90%, resulting in a bimodal distribution. The analysis also shows that genes with larger variability tend to have a larger heritability (Additional file 8). In general, housekeeping genes exhibited greater heritability than non-housekeeping genes, demonstrating that a greater level of genetic control has been preserved in this group of genes. However, caution must be taken in interpreting the bimodal pattern in the distribution of gene expression heritabilities. One possible explanation for the bimodality is that an expression trait can be regulated either in cis or in trans, and that expression traits regulated in cis tend to have a higher heritability whilst trans regulated expression traits tend to have a lower heritability. Another possibility, as suggested by Additional file 2, is that for genes with an ultra low level of expression there is very low statistical power to detect a significant genetic component in their regulation.
It needs to be stressed that the housekeeping genes listed here were derived by an independent study from multiple gene expression profiling experiments with different human tissues. Therefore the different patterns of the heritability distribution between housekeeping and non-housekeeping genes revealed in this study should not be confounded by expression level related assignment of genes into different categories. Because information was not available on the presence or absence of expression in the Illumina bead-based microarray dataset analysed, we applied a filter to exclude lowly expressed genes to avoid bias in the estimates of gene expression heritability. Classifying genes into discrete abundance classes, e.g., highly expressed, lowly expressed and non-expressed, is not a simple task because genes show broad and quantitative expression levels with no clear separation into distinct classes with the current transcriptomic profiling platforms. Many technical issues might influence the estimation of expression abundance, e.g., the sensitivity of the platform, sample preparation and the data processing algorithm. Recently, sequencing based applications such as RNA-seq have increasingly been used to quantify gene expression with more precise measurement. However, the RNA-seq technique has its own limitations. For example, sequencing depth exerts a profound impact on expression abundance estimation and on subsequent data analyses such as differential expression prediction. We argue that care must be taken in interpreting gene expression data. Nevertheless, we noted that RNA-seq data sets for LCLs of two of the present HapMap populations were publicly available [33, 34]. We downloaded the normalised RNA-seq expression data for YRI and CEU LCL samples and confirmed that about 88% and 85% of the 10,720 currently selected genes were present in the RNA-seq datasets for the YRI and CEU populations, respectively.
This result supports the appropriateness of the current strategy to select expressed genes for the eQTL analysis from the microarray data.
With a genome-wide scan of gene-SNP associations, 1,043 eQTLs were detected for 988 genes. Consistent with previous studies (e.g., [7, 22]), we observed an excess of trans over cis eQTLs. Moreover, we showed in Additional file 9 that genes with and without eQTLs had similar distributions of expression levels, which removes the concern that expression level had a substantial impact on the detection of eQTLs. We estimated the fraction of genetic variation explained by eQTLs using a simple approach that compares the variance component estimates between SNP-free and SNP-inclusive models. Compared to methods that require calculating the allelic effect and allele frequency at each individual locus (e.g., ), the present approach to estimating eQTL heritability provides a simple alternative and allows multiple eQTLs to be analysed jointly for both additive and interactive effects. We demonstrated that the two categories of eQTL cumulatively explained comparable proportions of the total heritability, even though trans eQTLs individually have weaker effects than cis eQTLs, a result consistent with other eQTL studies.
Because heritability can be taken as evidence of genetic control of phenotypic variation (here, gene expression level), it is natural to assume that higher heritability implies a greater chance of mapping the genetic variants responsible for variation in gene expression. However, we have shown that this assumption may hold for cis eQTLs but not for trans eQTLs: while genes with cis eQTLs were concentrated at high heritability levels, genes with trans eQTLs had a heritability distribution similar to that of genes with no eQTLs at all. One possible explanation is that trans effects in LCLs could be introduced by non-genetic variation in in vitro factors, which could mimic trans regulation but does not reflect true heritable genetic regulation. Another potential reason is a lack of power to detect trans eQTLs in the genes without eQTLs in the present sample, given the stringent significance threshold and the small genetic effects conferred by trans eQTLs. The implication of this finding is that heritability should not be used as a filter to screen genes for the detection of eQTLs.
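The comparison of SNP-free and SNP-inclusive models can be illustrated with a minimal Python sketch. It is not the program supplied in Additional file 10. It assumes a relationship matrix K computed from genome-wide SNPs, fits the variance components by maximum likelihood on a grid rather than by REML, and takes the fraction explained to be the relative drop in genetic variance once the eQTL genotypes enter as fixed effects. The function names and the simulated data are hypothetical.

```python
import numpy as np


def ml_variance_components(y, X, K, grid=200):
    """Fit y = X b + g + e with g ~ N(0, sg2*K) and e ~ N(0, se2*I) by ML.

    The likelihood is profiled over h = sg2 / (sg2 + se2) on a grid
    (an assumption made for brevity; REML would be the usual choice).
    Returns the pair (sg2, se2).
    """
    n = len(y)
    s, U = np.linalg.eigh(K)            # K = U diag(s) U^T
    s = np.clip(s, 0.0, None)           # guard against tiny negative eigenvalues
    yt, Xt = U.T @ y, U.T @ X           # rotate so the covariance is diagonal
    best = (-np.inf, 0.0, 0.0)
    for h in np.linspace(0.0, 0.999, grid):
        d = h * s + (1.0 - h)           # variance of each rotated value, up to scale
        w = 1.0 / np.sqrt(d)
        # Weighted least squares for the fixed effects at this value of h.
        beta, *_ = np.linalg.lstsq(Xt * w[:, None], yt * w, rcond=None)
        r = (yt - Xt @ beta) * w
        total = r @ r / n               # ML estimate of sg2 + se2 given h
        loglik = -0.5 * (n * np.log(2 * np.pi * total) + np.log(d).sum() + n)
        if loglik > best[0]:
            best = (loglik, h * total, (1.0 - h) * total)
    return best[1], best[2]


def fraction_explained(sg2_null, sg2_snp):
    """Share of the SNP-free genetic variance absorbed by the fitted eQTLs."""
    if sg2_null <= 0:
        return float("nan")             # nothing to explain
    return (sg2_null - sg2_snp) / sg2_null


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, m = 60, 500
    geno = rng.integers(0, 3, size=(n, m)).astype(float)   # simulated genotypes
    z = (geno - geno.mean(axis=0)) / geno.std(axis=0)
    K = z @ z.T / m                                        # hypothetical relationship matrix
    eqtl = geno[:, 0]                                      # one simulated eQTL
    y = 0.8 * eqtl + rng.normal(size=n)

    ones = np.ones((n, 1))
    sg_null, _ = ml_variance_components(y, ones, K)                           # SNP-free
    sg_snp, _ = ml_variance_components(y, np.column_stack([ones, eqtl]), K)   # SNP-inclusive
    print(fraction_explained(sg_null, sg_snp))
```

Several eQTLs can be assessed jointly by adding one column per SNP to the fixed-effect design matrix, and interaction terms can be added as product columns.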
A number of explanations have been suggested for the failure of associated variants to fully explain heritability in conventional association analyses of complex traits. These include a lack of statistical power to detect loci with minor effects, the existence of interactions between genetic factors and/or between genes and environment, and the possible influence of epigenetic factors; the first of these explanations has been extensively studied. In this study, we investigated gene-gene interaction (epistasis) as a potential source of the unexplained heritability in genetic association studies. Genetic studies have long identified specific instances of genetic interactions in model species [e.g., ]. A recent study in the yeast Saccharomyces cerevisiae confirmed that interactions partially explain the heritability of complex traits missed by additive genetic contributions. However, genome-wide interaction analysis in humans is still not feasible because the prevalence of interactions in human datasets is still largely unknown [38, 39]. We took advantage of the fact that, while predicting genetic interactions a priori from population data may be computationally difficult and of low power, it is much more straightforward to detect epistasis among variants a posteriori once they have been detected.
For a number of genes with multiple eQTLs, we modelled multiplicative interaction effects and evaluated the contribution of gene-gene interactions to total gene expression heritability. Epistatic effects were detected in 5 genes through AIC model selection, leading to a substantial increase of 10 to 35% in the genetic variance explained. However, it needs to be emphasised that the number of genes with multiple eQTLs could be underestimated, possibly owing to the small sample size and/or the over-stringent significance threshold applied in the association test. More such eQTLs would likely be detected if the significance threshold were relaxed, but this would raise concern about type I error. Hence the present finding about cis and trans eQTL interactions is clearly subject to sampling variation due to the small sample. Nonetheless, our results show that interaction is a crucial term for exploring and unravelling the mystery of missing heritability at the gene expression level.
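The a posteriori test for epistasis between two detected eQTLs can be sketched as an AIC comparison of an additive model against one that adds a multiplicative term. The sketch below is a simplification: it uses ordinary least squares rather than the variance component model of this study, and the function names and the 0/1/2 genotype coding are assumptions made for illustration.

```python
import numpy as np


def gaussian_aic(y, X):
    """AIC of a Gaussian linear model, up to an additive constant shared by all models."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    k = X.shape[1] + 1                  # regression coefficients plus residual variance
    return n * np.log(rss / n) + 2 * k


def compare_additive_vs_epistatic(y, snp1, snp2):
    """Compare y ~ snp1 + snp2 with y ~ snp1 + snp2 + snp1*snp2 by AIC."""
    ones = np.ones(len(y))
    additive = np.column_stack([ones, snp1, snp2])
    epistatic = np.column_stack([additive, snp1 * snp2])   # multiplicative interaction
    aic_add = gaussian_aic(y, additive)
    aic_epi = gaussian_aic(y, epistatic)
    return {
        "aic_additive": aic_add,
        "aic_epistatic": aic_epi,
        "prefers_epistasis": bool(aic_epi < aic_add),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    n = 200
    cis = rng.integers(0, 3, size=n).astype(float)     # simulated cis eQTL genotypes
    trans = rng.integers(0, 3, size=n).astype(float)   # simulated trans eQTL genotypes
    expr = 0.5 * cis + 0.2 * trans + 0.4 * cis * trans + rng.normal(size=n)
    print(compare_additive_vs_epistatic(expr, cis, trans))
```

The interaction term is retained only when it lowers the AIC, so the extra parameter has to pay for itself in improved fit.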
We implemented a variance component analysis to infer the proportion of phenotypic variation explained by genetic factors for genome-wide gene expression traits in unrelated individuals. The study reveals that the heritability of genome-wide gene expression traits ranges from almost zero to 100%, a pattern common to the four HapMap populations; that cis-regulated expression traits usually have a higher heritability than trans-regulated expression traits; and that the distribution of expression trait heritability differs between housekeeping and non-housekeeping genes. The study also illustrates that interaction between eQTLs contributes significantly to a missed fraction of heritability in expression traits.
The computer program implementing the present variance component model analysis is available in Additional file 10. The microarray gene expression data analysed in this paper were downloaded from the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo) under accession number GSE6536.
We thank two anonymous reviewers for their suggestions, which helped improve the presentation of the paper. This study was supported by research grants from the National Basic Research Program of China (2012CB316505), the Leverhulme Trust (UK) and the Biotechnology and Biological Sciences Research Council (UK). ZL is also supported by China’s National Natural Science Foundation (81172006, 9123114).
- Maher B: Personal genomes: The case of the missing heritability. Nature. 2008, 456: 18-21.
- Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, McCarthy MI, Ramos EM, Cardon LR, Chakravarti A, et al: Finding the missing heritability of complex diseases. Nature. 2009, 461: 747-753. 10.1038/nature08494.
- Visscher PM, Hill WG, Wray NR: Heritability in the genomics era - concepts and misconceptions. Nat Rev Genet. 2008, 9: 255-266.
- Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, Nyholt DR, Madden PA, Heath AC, Martin NG, Montgomery GW, et al: Common SNPs explain a large proportion of the heritability for human height. Nat Genet. 2010, 42: 565-569. 10.1038/ng.608.
- Keurentjes JJB, Fu J, Terpstra IR, Garcia JM, van den Ackerveken G, Snoek LB, Peeters AJM, Vreugdenhil D, Koornneef M, Jansen RC: Regulatory network construction in Arabidopsis by using genome-wide gene expression quantitative trait loci. Proc Natl Acad Sci. 2007, 104: 1708-1713. 10.1073/pnas.0610429104.
- Petretto E, Mangion J, Dickens NJ, Cook SA, Kumaran MK, Lu H, Fischer J, Maatz H, Kren V, Pravenec M, et al: Heritability and tissue specificity of expression quantitative trait loci. PLoS Genet. 2006, 2: e172. 10.1371/journal.pgen.0020172.
- Spielman RS, Bastone LA, Burdick JT, Morley M, Ewens WJ, Cheung VG: Common genetic variants account for differences in gene expression among ethnic groups. Nat Genet. 2007, 39: 226-231. 10.1038/ng1955.
- Storey JD, Madeoy J, Strout JL, Wurfel M, Ronald J, Akey JM: Gene-expression variation within and among human populations. Am J Hum Genet. 2007, 80: 502-509. 10.1086/512017.
- Stranger BE, Forrest MS, Dunning M, Ingle CE, Beazley C, Thorne N, Redon R, Bird CP, de Grassi A, Lee C, et al: Relative impact of nucleotide and copy number variation on gene expression phenotypes. Science. 2007, 315: 848-853. 10.1126/science.1136678.
- Veyrieras J-B, Kudaravalli S, Kim SY, Dermitzakis ET, Gilad Y, Stephens M, Pritchard JK: High-resolution mapping of expression-QTLs yields insight into human gene regulation. PLoS Genet. 2008, 4: e1000214. 10.1371/journal.pgen.1000214.
- Zhang W, Duan S, Kistner EO, Bleibel WK, Huang RS, Clark TA, Chen TX, Schweitzer AC, Blume JE, Cox NJ, et al: Evaluation of genetic variation contributing to differences in gene expression between populations. Am J Hum Genet. 2008, 82: 631-640. 10.1016/j.ajhg.2007.12.015.
- Price AL, Helgason A, Thorleifsson G, McCarroll SA, Kong A, Stefansson K: Single-tissue and cross-tissue heritability of gene expression via identity-by-descent in related or unrelated individuals. PLoS Genet. 2011, 7: e1001317. 10.1371/journal.pgen.1001317.
- Dunning MJ, Smith ML, Ritchie ME, Tavaré S: beadarray: R classes and methods for Illumina bead-based data. Bioinformatics. 2007, 23: 2183-2184. 10.1093/bioinformatics/btm311.
- Kent WJ: BLAT–the BLAST-like alignment tool. Genome Res. 2002, 12: 656-664. 10.1101/gr.229202. Article published online before March 2002.
- Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ, et al: PLINK: A tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007, 81: 559-575. 10.1086/519795.
- Johnson DL, Thompson R: Restricted maximum likelihood estimation of variance components for univariate animal models using sparse matrix techniques and average information. J Dairy Sci. 1995, 78: 449-456. 10.3168/jds.S0022-0302(95)76654-1.
- Yu JM, Pressoir G, Briggs WH, Bi IV, Yamasaki M, Doebley JF, McMullen MD, Gaut BS, Nielsen DM, Holland JB, et al: A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat Genet. 2006, 38: 203-208. 10.1038/ng1702.
- Kang HM, Zaitlen NA, Wade CM, Kirby A, Heckerman D, Daly MJ, Eskin E: Efficient control of population structure in model organism association mapping. Genetics. 2008, 178: 1709-1723. 10.1534/genetics.107.080101.
- Roberson EDO, Pevsner J: Visualization of shared genomic regions and meiotic recombination in high-density SNP data. PLoS One. 2009, 4: e6711. 10.1371/journal.pone.0006711.
- Powell JE, Henders AK, McRae AF, Wright MJ, Martin NG, Dermitzakis ET, Montgomery GW, Visscher PM: Genetic control of gene expression in whole blood and lymphoblastoid cell lines is largely independent. Genome Res. 2012, 22: 456-466. 10.1101/gr.126540.111.
- Zhu J, He F, Song S, Wang J, Yu J: How many human genes can be defined as housekeeping with current expression data?. BMC Genomics. 2008, 9: 172. 10.1186/1471-2164-9-172.
- Jiang N, Wang M, Jia T, Wang L, Leach L, Hackett C, Marshall D, Luo Z: A robust statistical method for association-based eqtl analysis. PLoS One. 2011, 6: e23192. 10.1371/journal.pone.0023192.
- Stranger BE, Nica AC, Forrest MS, Dimas A, Bird CP, Beazley C, Ingle CE, Dunning M, Flicek P, Koller D, et al: Population genomics of human gene expression. Nat Genet. 2007, 39: 1217-1224. 10.1038/ng2142.
- Cockram J, White J, Zuluaga DL, Smith D, Comadran J, Macaulay M, Luo Z, Kearsey MJ, Werner P, Harrap D, et al: Genome-wide association mapping to candidate polymorphism resolution in the unsequenced barley genome. Proc Natl Acad Sci. 2010, 107: 21611-21616. 10.1073/pnas.1010179107.
- Kudaravalli S, Veyrieras J-B, Stranger BE, Dermitzakis ET, Pritchard JK: Gene expression levels are a target of recent natural selection in the human genome. Mol Biol Evol. 2009, 26: 649-658.
- Hsiao C-L, Lian I-B, Hsieh A-R, Fann C: Modeling expression quantitative trait loci in data combining ethnic populations. BMC Bioinformatics. 2010, 11: 111. 10.1186/1471-2105-11-111.
- Gaffney D, Veyrieras J-B, Degner J, Pique-Regi R, Pai A, Crawford G, Stephens M, Gilad Y, Pritchard J: Dissecting the regulatory architecture of gene expression QTLs. Genome Biol. 2012, 13: R7. 10.1186/gb-2012-13-1-r7.
- Thompson R: Estimation of quantitative genetic parameters. Proc Roy Soc B-Biol Sci. 2008, 275: 679-686. 10.1098/rspb.2007.1417.
- Li J, Liu Y, Kim T, Min R, Zhang Z: Gene expression variability within and between human populations and implications toward disease susceptibility. PLoS Comput Biol. 2010, 6: e1000910. 10.1371/journal.pcbi.1000910.
- Hebenstreit D, Fang M, Gu M, Charoensawan V, van Oudenaarden A, Teichmann SA: RNA sequencing reveals two major classes of gene expression levels in metazoan cells. Mol Syst Biol. 2011, 7: 497.
- Wang Z, Gerstein M, Snyder M: RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009, 10: 57-63. 10.1038/nrg2484.
- Tarazona S, García-Alcalde F, Dopazo J, Ferrer A, Conesa A: Differential expression in RNA-seq: a matter of depth. Genome Res. 2011, 21: 2213-2223. 10.1101/gr.124321.111.
- Pickrell JK, Marioni JC, Pai AA, Degner JF, Engelhardt BE, Nkadori E, Veyrieras J-B, Stephens M, Gilad Y, Pritchard JK: Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature. 2010, 464 (7289): 768-772. 10.1038/nature08872.
- Montgomery SB, Sammeth M, Gutierrez-Arcelus M, Lach RP, Ingle C, Nisbett J, Guigo R, Dermitzakis ET: Transcriptome genetics using second generation sequencing in a Caucasian population. Nature. 2010, 464 (7289): 773-777. 10.1038/nature08903.
- Montgomery SB, Dermitzakis ET: From expression QTLs to personalized transcriptomics. Nat Rev Genet. 2011, 12 (4): 277-282. 10.1038/nrg2969.
- Brem RB, Storey JD, Whittle J, Kruglyak L: Genetic interactions between polymorphisms that affect gene expression in yeast. Nature. 2005, 436: 701-703. 10.1038/nature03865.
- Bloom JS, Ehrenreich IM, Loo WT, Lite T-LV, Kruglyak L: Finding the sources of missing heritability in a yeast cross. Nature. 2013, 494 (7436): 234-237. 10.1038/nature11867.
- Wu X, Dong H, Luo L, Zhu Y, Peng G, Reveille JD, Xiong M: A novel statistic for genome-wide interaction analysis. PLoS Genet. 2010, 6: e1001131. 10.1371/journal.pgen.1001131.
- Zuk O, Hechter E, Sunyaev SR, Lander ES: The mystery of missing heritability: genetic interactions create phantom heritability. Proc Natl Acad Sci. 2012, 109: 1193-1198. 10.1073/pnas.1119675109.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.