We investigated the relationship between genetic variation, DNA methylation and gene expression in a sample of 148 healthy subjects using array-based data derived from whole blood. We found both negative (levels in opposite direction) and positive (levels in same direction) associations between cis-acting DNA methylation probes and corresponding gene expression levels, confirming previous reports that DNA methylation and gene expression located within a cis-region can be both positively and negatively associated, but are predominantly negative
In this study we applied FDR correction for multiple testing for cis associations between methylation and expression, but imposed a more stringent genome-wide significance threshold for trans effects since there is a considerable debate in the literature whether such relationships are reproducible
[29, 30]. This resulted in a limited number of trans associations that do survive this threshold but with relatively strong effect sizes. It is of note that such trans associations are enriched for positive correlations, whereas traditionally it is expected that methylation and expression are inversely correlated. We hypothesised that these involve genes involved in general methylation pathways, such as genes that induce the attachment of a methyl group. However, a gene ontology analysis did not show any overrepresented pathways (data not shown).
Furthermore, we observed that methylation probes with cis-acting effects on gene expression levels are less likely to be located in CpG islands and more likely to be present outside CGIs and shores insofar they were not regulated by genetic variation. Tissue- and cell type-specific methylation occurs much more often in gene bodies (outside island and shores) than in CpG island promoters
, indicating that methylation at CpG sites in CpG islands is much more static, which could explain the underrepresentation of CpG sites associated with expression (and SNPs) in CpG islands. Only for those CpGs that were associated with SNPs, we did concur with previous studies showing more frequent associations with expression in island shores
[2, 3]. CpG sites located in shores tend to be more variable among individuals and this might lead to an increased number of association findings. In addition, trans associations are less likely to be located in island shores and more likely to be positioned outside CGIs and shores. Also, trans associations are more likely to be positive (67%).
Identification of genetic variants (SNPs) influencing the methylation and expression levels showed that in more than 12% of methylation-expression cis-pairs, the methylation and/or the expression level was associated with a SNP in cis, suggesting genetic control of these levels.
Further analysis of genetic regulators (SNPs) of methylation and expression levels investigating the causality revealed three-way causal relationships. Previous studies have attempted to identify three-way associations in various tissues, with mixed results
[6, 7]. We used local structural equation models to calculate local edge orienting (LEO) scores based on using a cis-acting SNP as causal anchor
[19, 32]. We find that the traditional model of genetic variants regulating methylation, which in turn regulates gene expression to be most common in most of the three-way associations that showed significant evidence for causality (as was hypothesized in literature
). The set of genes for which the S→M→E model fits best does not exhibit significant enrichment for specific functions or pathways. Since the S→M→E model is expected to be ubiquitous, the lack of enrichment is not surprising. However, one of the genes that fit this model, PNMA3, is located on the X chromosome. Since inactivation in females may be a confounding factor when analyzing X chromosomes, we repeated the association analysis for all significant X chromosomes in males only. We observed no significant differences when using males-only, which confirms that the PNMA3 finding is likely to be true. Strikingly, the reverse model, in which a genetic variant primarily regulates gene expression, which in turn regulates DNA methylation, was the best causal model for a number of genes (including C21ORF56, HRASLS3, TACSTD2, WBSCR27, SRXN1, GSTM3, BTN3A2), although the model p-values of these LEO scores were small, indicating poor fit. For example, one of these genes, C21ORF56, was highlighted in a previous genome-wide study where a three-way association for this gene was identified. Additional experiments indicated that genetic variation in this gene affects chromatin structure in this region
. The gene itself may be involved in inter-individual differences in response to DNA damaging agents
. These mechanisms and our data suggest that loci whereby genetic variation influences expression and in turn methylation may exist and warrants further study. The methylation and expression probes that showed a causal direction in the LEO analysis were all present within the same gene. However, we observed that of all the 798 significant cis associations, only 155 (19%) involved probes that represent the same gene. This may suggest that the strongest (detectable) causal correlations between DNA methylation and gene expression are likely to be local events.
The systems level analysis afforded by WGCNA reveals that both transcriptome and methylome can usefully be organized into modules. Many co-methylation and co-expression modules are highly significantly enriched with gene ontology categories, which provides indirect evidence that these modules are biologically meaningful. Our module preservation analysis between expression and methylation data reveals that most co-expression modules are comprised of genes that do not form a module in the methylation data and vice versa. Only the largest co-expression module shows moderate to strong preservation and overlap with the largest co-methylation module. In other words, co-expression modules and co-methylation modules are largely composed of different genes. On the other hand, several pairs of expression and methylation eigengenes show highly significant positive and negative correlations. This suggests the existence of factors that affect expression and methylation of different sets of genes, i.e., trans effects at the module level.
A limiting factor of our study may be the fact that the Illumina 27k array covers only a selection of CpG sites and is enriched for promoter regions and CpG islands near genes. Another increasingly important issue is the potential difference between hydroxymethylation and DNA methylation that cannot be distinguished with current methylation arrays
[34, 35]. To date, the role of 5-hydroxymethylation is not fully understood but it is likely that 5-hydroxymethylation plays a role in demethylation
[34–38]. Although there is no reason to assume a systematic influence of 5-hydroxymethylation on our results, we cannot rule this out and further refinement of methylation levels is warranted. A third possible limitation is the use of whole blood comprised of different cell types for our analysis. Yet, although whole blood does not provide the optimal resolution, these cell types can be used to study general genetic mechanisms. Given the sample size we suspect that effects of blood cell composition are limited and do not play a major role in the outcome. We measured gene expression and DNA methylation from the same blood sample so that the composition of different cell types should not substantially affect the overall outcome and conclusions. Moreover, studies have shown that a majority of the strongest eQTLs overlaps between different tissues and cell types