Functional complementation between transcriptional methylation regulation and post-transcriptional microRNA regulation in the human genome
© Su et al. licensee BioMed Central Ltd 2011
Published: 23 December 2011
Skip to main content
© Su et al. licensee BioMed Central Ltd 2011
Published: 23 December 2011
DNA methylation in the 5' promoter regions of genes and microRNA (miRNA) regulation at the 3' untranslated regions (UTRs) are two major epigenetic regulation mechanisms in most eukaryotes. Both DNA methylation and miRNA regulation can suppress gene expression and their corresponding protein product; thus, they play critical roles in cellular processes. Although there have been numerous investigations of gene regulation by methylation changes and miRNAs, there is no systematic genome-wide examination of their coordinated effects in any organism.
In this study, we investigated the relationship between promoter methylation at the transcription level and miRNA regulation at the post-transcription level by taking advantage of recently released human methylome data and high quality miRNA and other gene annotation data. We found methylation level in the promoter regions and expression level was negatively correlated. Then, we showed that miRNAs tended to target the genes with a low DNA methylation level in their promoter regions. We further demonstrated that this observed pattern was not attributed to the gene expression level, expression broadness, or the number of transcription factor binding sites. Interestingly, we found miRNA target sites were significantly enriched in the genes located in differentially methylated regions or partially methylated domains. Finally, we explored the features of DNA methylation and miRNA regulation in cancer genes and found cancer genes tended to have low methylation level and more miRNA target sites.
This is the first genome-wide investigation of the combined regulation of gene expression. Our results supported a complementary regulation between DNA methylation (transcriptional level) and miRNA function (post-transcriptional level) in the human genome. The results were helpful for our understanding of the evolutionary forces towards organisms' complexity beyond traditional sequence level investigation.
Epigenetics refers to the heritable changes that modify DNA or associated proteins without changing the DNA sequence itself . It has been commonly accepted that both epigenetic mechanisms - DNA methylation modification at the gene's promoter regions (5' of the gene) and microRNA (miRNA) regulation at the 3' untranslated regions (3' UTRs) - are important in gene expression regulation. DNA methylation has been popularly investigated due to its heritable epigenetic modifications of the genome and has been implicated in the regulation of most cellular processes. These include embryonic development, transcription, chromatin structure, X chromosome inactivation, genomic imprinting and chromosome stability [[2–6]]. Aberrant DNA methylation has been frequently reported to influence gene expression and subsequently cause various human diseases, especially cancer [[7–9]]. The causal relationship between variation in promoter DNA methylation and difference in gene regulation has been well recognized [10, 11]. Recent work  revealed that hypermethylation at promoter CpG sites typically results in a lower transcription level of downstream genes. When methylation was experimentally removed from a gene's promoter region, its transcription level would often be higher . Among the ~28 million CpG dinucleotide sites that are susceptible to methylation in the human genome, approximately 10% are in the promoter regions of genes, in which they may physically obstruct the binding of transcriptional proteins to the gene or may be indirectly regulated by the recruitment of methyl-CpG-binding domain proteins through cytosine methylation [[14–16]]. The repression role in gene expression regulation by methylation modification in a gene's promoter region has been reinforced by current whole genome bisulfite sequencing of the methylomes of more than 20 eukaryotes .
miRNAs are a class of small noncoding RNA molecules that regulate eukaryotic gene expression at the post-transcriptional level. They specifically bind mRNAs in their 3' UTRs based on sequence complementation and lead to translational repression and gene silencing . According to release 17 (April 26, 2011) of the miRNA database miRBase , there are 16,772 miRNA gene loci in 153 species and 19,724 distinct mature miRNA sequences . Among them, the human genome encodes 1424 miRNA sequences, which may target approximately 60% of human protein-coding genes . This huge number of miRNAs discovered so far indicates that many biological processes, including cell cycle control, cell growth and differentiation, apoptosis, and embryo development, are controlled by miRNA-mediated gene expression regulation .
Although there have been many important advances in understanding gene silencing roles at the transcriptional level through DNA methylation modification and at the post-transcriptional level through miRNA regulation, it remains unclear how these two major mechanisms cooperate at the genome-wide level to influence cellular processes. Thus, a combinatory analysis of these two mechanisms is likely to reveal many important insights into a deeper understanding of gene regulation in cells. Considering that (1) DNA methylation acts on a gene's 5' promoter region, and transcription typically depends on demethylation of the promoter region, and (2) miRNAs target on 3' UTR to suppress gene's post-transcriptional activities, we hypothesized that there exists a functional complementation between transcriptional promoter region methylation regulation and post-transcriptional miRNA regulation. If this hypothesis is valid, we would infer that (1) miRNAs preferentially target genes with a low DNA methylation level at the promoter regions; (2) genes that are controlled by more miRNAs tend to have less promoter methylation regulation. We validated our hypothesis by deeply analyzing human methylome data in two cell lines. To the best of our knowledge, this is the first report of the complementary relationship between DNA methylation regulation and miRNA regulation in a eukaryotic genome. Furthermore, we found that cancer genes tended to be silenced by miRNAs and to escape from DNA methylation suppression, thereby supporting our hypothesis.
Human and mouse gene structure data was retrieved from the Ensembl database (version 54), including the information of Ensembl gene ID, Ensembl transcript ID, transcript start (bp), transcript end (bp), Ensembl protein ID, 3' UTR start, 3' UTR end, chromosome, and strand. We extracted the promoter region and 3' UTR position information from Gene structure data. If there are multiple transcripts for a gene, the transcription start site (TSS) and 3' UTR of the major transcript were used . We retained only those genes without distant alternative TSS (> 200 bp distance from the major TSS) and without ambiguous 3' UTR regions to avoid the potential inaccurate mapping of the gene expression data and gene structures.
The single-base resolution DNA methylation data was retrieved from Lister et al. (2009) , including whole genome bisulfite sequencing data for two human cell lines: H1 human embryonic stem cells and IMR90 fetal lung fibroblasts. The methylation information for each promoter was extracted by mapping the promoter region (in a range of -1000 to +200 bp from the TSS) to the genome-wide methylation data from the H1/IMR90 cell line.
Based on single-base resolution bisulfite sequencing data, we used methylation broadness to measure the DNA methylation level in specific genome regions, which was calculated as the proportion of methylated CpG sites among the total CpG sites in a sequence (we denote it as "mCG/CG" hereafter).
We also used "normalized" CpG content, the observed over expected CpG ratio (CpGO/E) in a sequence, to infer the pattern of DNA methylation in the human genome. CpGO/E is a robust measure of the level of DNA methylation on an evolutionary time scale due to specific mutational mechanisms of methylated cytosines . Briefly, methylated cytosines are hypermutable due to their vulnerability to spontaneous deamination, which causes a gradual depletion of CpG dinucleotides from methylated regions over evolutionary time. Consequently, genomic regions that are subject to strong germline DNA methylation (hypermethylated) would decrease the extent of CpG dinucleotide content over time and, thus, have lower-than-expected CpGO/E. In contrast, regions that undergo weak germline DNA methylation (hypomethylated) maintain high CpGO/E. This measure has been successfully used to indirectly measure historical DNA methylation levels. In particular, the pattern of DNA methylation inferred from CpGO/E corresponds well to the actual pattern of DNA methylation in such diverse taxa as human and sea squirt. CpGO/E was calculated as the frequency of CpG sites divided by the frequency of C and G . The pattern of DNA methylation inferred from CpGO/E corresponds well to the actual pattern of DNA methylation in human stem cells (H1 cell line) and fetal lung fibroblasts (IMR90) [14, 15]. Since the DNA methylation level of two strands in any given genomics region are highly correlated, here we used the sense strand to represent the DNA methylation level for a given gene promoter region. Similar results were obtained in this study when we used the methylation level of anti-sense (data not shown).
The miRNAs and their predicted targets were extracted from R package RmiR.hsa , including miRNA target site prediction results from 6 sources: miRBase, targetScan, miRanda, tarBase, mirTarget2 and PicTar. In this study, we used the target site prediction results from two approaches: mirTaeget2 and PicTar.
We obtained the expression data of 409 microarray experiments from McVicker and Green (2010) , which were collected from 12 studies [[12, 13, 27–36]], representing a wide variety of germ and somatic tissues. As these studies used two different platforms (Affymetrix microarrays hgu133plus2 and hgu133A), we only considered the probe sets shared by both arrays. The methods to process the raw intensity data and to assign the probe sets to genes were described in McVicker and Green (2010) . In total, we assigned an expression intensity of 9858 genes in 409 tissues. Among the 409 tissues, 64 containing germ cells were considered as germline tissues, with the exception of germ cell tumors, embryonic stem cells, and immortalized cell lines (see additional file 1).
Because the above data sets are highly redundant in terms of tissue or cell type, we only used Gene Expression Atlas data to estimate the relative expression broadness (EB, number of tissues where a gene is expressed). This data has been widely used to estimate gene expression broadness. The Affymetrix raw data was downloaded from the website of the authors in reference . It comprised 156 human (U133A/GNF1H) microchip experiments in 79 tissues. The expression level detected by each probe set was obtained as the average difference (AD) value computed from MAS 5.0 algorithm (MAS5) . The AD values were averaged among replicates. Using the annotation tables from the original study [36, 38] and the Ensembl EnsMart tool, we mapped the probe IDs used in the microarray experiments to Ensemble gene identifiers. In approximately 20% of the cases, multiple probes in the microarray targeted onto a single gene. The expression intensities of multiple probes that corresponded to one gene were averaged after discarding all the low-confidence probe sets (indicated by a suffix of ''_x_at'' or ''_s_at'' in the Affymetrix IDs) . In this study, we used an AD value of 200 as the threshold to calculate the EB, as we did in our previous work .
The gene expression data of two human cell lines H1 and IMR90 was obtained from reference . The expression data was generated by a whole RNA sequencing (RNA-Seq) approach. The reads per kilobase of transcript per million reads (RPKM) were used to represent the expression level of each gene.
We retrieved 427 human cancer genes and their annotations from the Cancer Gene Census database (CGC, 2010-03-30 version) . Since a cancer gene may act in a dominant or recessive manner [41, 42], we classified these 427 cancer genes as two groups, i.e., dominant gene group (337 genes) and recessive gene group (85 genes), according to their annotations in the CGC database. There were 5 genes with ambiguous classification in the database and they were excluded in this analysis.
We identified the human-specific indel events in 3' UTR regions as described in . The 17-way vertebrate alignment, i.e., multiple alignments of 16 vertebrate genomes to the human genome (hg18), was obtained from .
An in-house Perl script was used to extract the orthologous 3' UTR alignment information and to identify the human-specific indel events. Human-specific insertion event rate and deletion event rate in the 3' UTR regions were calculated based on percent nucleotide difference. The indel rate equals to the sum of the lengths of all indels in the aligned human sequences divided by the total length of the aligned sequences.
Although methylation of gene's promoter regions has long been considered a suppressor of gene expression [17, 45], it still remains unclear to which extent the promoter's DNA methylation contributes to the influence of gene expression level [45, 46]. For example, most promoters having CpG islands (CGIs) remain unmethylated even in cells that do not express the corresponding gene. On the contrary, most CpG-poor promoters are hypermethylated even in somatic cells where the genes are expressed . What is equally uncertain is the contribution of promoter methylation to the tissue-specific gene expression. Although many studies have shown the tissue-specific differentially methylated regions (T-DMRs) could connect to the gene expression reprogramming in different tissues or developmental stages, others failed to demonstrate such a connection based on the analysis of a small set of genes [48, 49].To better understand the relationship between DNA methylation regulation and the gene expression regulation through miRNA targeting, we explored to what extent promoter methylation affects the gene expression level using the genome-wide data set collected in this study. We used two independent measurements, i.e., methylation broadness and normalized CpG content (CpGO/E), to test the correlation of promoter methylation and gene expression level.
We next tested the hypothesis that a functional complementation exists between transcriptional promoter region methylation regulation and post-transcriptional microRNA regulation. We retrieved unique miRNAs and their target sites for each human gene based on the predicted miRNA binding sites using mirTarget2  and PicTar  algorithms. We chose these two algorithms because most of the randomly selected miRNA targets predicted by mirTarget2 and PicTar have been validated as true targets [50, 52]. Genes that have long 3' UTRs are likely to be regulated by more miRNAs ; thus, we treated the 3' UTR length as a proxy of the number of miRNA target sites for an additional correlation analysis.
Spearman's rank correlation coefficients (ρs) and partial correlations between gene's promoter methylation level and the number of microRNA target sites
Number of microRNA target sites by mirTarget2 (Nt)
Number of microRNA target sites by PicTar (Np)
3' UTR length (UL)
Promoter methylation level ( m )
Correlation ρ s ( m, Nt )
Partial correlation ρ s ( m,Nt|gene expression level)
Correlation ρ s ( m, Np )
Partial correlation ρs ( m,Np|gene expression level)
Correlation ρ s ( m, UL )
Partial correlation ρ s ( m,UL|gene expression level)
Promoter mCG/CG (IMR90)
We further used the 3' UTR length to approximately measure the number of miRNA target sites. Consistent with the above results, we found negative correlations between 3' UTR length and promoter methylation level in both human methylomes (H1 and IMR90) (Table 1). This analysis revealed that the genes with a higher promoter methylation level tended to have shorter 3' UTRs at the genome level.
We questioned whether the observed correlations above are unique in the human genome. Thus, we investigated the relationship between promoter DNA methylation level and the number of miRNA target sites in mice. We retrieved the corresponding gene structure data from the ENSEMBL database. The data processes that included the definition of TSS and estimation of 3' UTR length were the same as in humans, as described in the Methods section. We found a highly significant correlation between promoter CpGO/E and 3' UTR length (Spearman's ρ = 0.24; P < 10-15), indicating that the negative correlation pattern between promoter region methylation and number of miRNA target sites still holds in mice. Since mammalian genomes share many CpG island features in their promoter regions , it is likely that the observed correlation is common in mammals, or even in many vertebrates.
We next specifically investigated whether the above observed enrichment of miRNA targets among genes with a lower promoter methylation level was a by-product of ancillary features of the analyzed gene sets. The results from the following analyses indicated this was not the by-product.
First, we asked whether the relationship between DNA methylation and miRNA regulation could be explained by the underlying gene expression levels since the DNA methylation of a gene's promoter regions and gene expression level is correlated in the majority of eukaryotes, and gene expression level is often positively correlated with the number of miRNA target sites. We estimated partial correlations  between DNA methylation and number of miRNA target sites after removing the contributions of gene expression level. The corresponding corrections were still highly significant, suggesting that covariance between DNA methylation (or the number of miRNA target sites) and gene expression level could not account for the observed relationships between DNA methylation and the number of miRNA target sites. As shown in Table 1. Although the partial correlations between DNA methylation and miRNA regulation decreased after removing the effects of gene expression level, they still showed high significance
Second, broadly expressed genes tended to avoid miRNA regulation [55, 56], implying that the correlation between promoter methylation and miRNA regulation could have been affected by the greater chance of higher DNA methylation level in broadly expressed genes' promoter regions.
Third, recent studies found genes with more transcription factor binding sites (TFBS) have a higher probability to be controlled by miRNAs .We examined whether the promoter methylation levels are correlated with the number of TFBS. We extracted the TFBS data from . A total of 22,067 genes had both TFBS and promoter methylation data. We found the correlation between TFBS and promoter methylation was very weak (Spearman's ρ = -0.016 for TFBS and CpGOE; ρ = -0.07 for TFBS and mCG/CG using H1 mythylome data). This observation suggested that the correlations between promoter methylation level and the number of miRNA targets was not a side effect of the correlation of TFBS site number and the number of miRNA target sites.
Finally, a previous study found that gene evolutionary rates were negatively correlated with the number of their regulatory miRNAs . Therefore, we speculated genes with stronger promoter methylation repression (tend to be regulated by fewer miRNAs) might have evolved faster in their 3' UTRs and could have insertion or deletion bias. A possible mechanism of the negative correlation between promoter methylation and the number of regulatory miRNAs is that genes with hypermethylated promoters may in turn shorten their 3' UTRs to reduce possible miRNA regulation. We tested this hypothesis by the following analyses. We extracted the human-mouse one-to-one orthologous 3' UTR sequences from PACdb  and aligned these orthologous sequences using the computer program Clustal W . We calculated the substitution rates per site (termed as K 3u ) based on the Kimura's two-parameter model . We found a weak positive correlation between K 3u and the promoter methylation level (Spearman's ρ = 0.15, P < 10-15 between K 3u and mCG/CG using H1 mythylome data; ρ = -0.1, P < 10-10 between K 3u and CpGO/E), indicating promoter hypermethylated genes tended to evolve faster in their 3' UTRs. We identified the human-specific insertion rate and deletion rate for the 3' UTRs of all genes (see Methods). However, there was no evidence to show that promoter hypermethylated genes tended to shorten their 3' UTR length (P > 0.1). Further studies of promoter methylation and 3' UTR evolution will be needed to uncover the underlying mechanisms of the connection between promoter methylation level and the number of miRNA target sites.
Summary of microRNA target sites and methylation data in gene's promoter regions
Mean of microRNA target sites (gene number)
Mean of mCG/CG (gene number*)
Mean of CpGO/E (gene number)
CGI number in promoter region (gene number)
We next compared the features in two major groups of cancer genes: dominant and recessive cancer genes. Among the 427 cancer genes, there were 337 dominant cancer genes and 85 recessive cancer genes based on their annotations in the CGC database. We analyzed their DNA methylation levels and number of miRNA target sites. For a normalized methylation level and CpGO/E, no significant difference was detected between the dominant and recessive cancer genes. However, the number of miRNA target sites in the dominant cancer genes (19.18) was larger than that of recessive cancer genes (16.16). Finally, the number of CGIs in the promoter regions of the dominant cancer genes (0.73) was significantly smaller than that of the recessive cancer genes (0.87, χ 2 test, P<10-15). These comparisons suggested the different inheritable mechanisms of the dominant and recessive cancer genes in cancer, as we recently examined in the protein-protein interaction level .
Collectively, we observed that the promoter region methylation level in cancer genes was negatively correlated with their number of miRNA target sites. This observation still held after filtering the potential confounding effects from gene expression level or expression broadness. This analysis indicated that the cancer genes tended to be silenced by miRNA genes but could escape from DNA methylation suppression.
To understand how DNA methylation and miRNA regulate the expression of their target genes, many previous exploratory studies have been reported, but all of them focused on the effect of each mechanism on the expression of target genes. In this study, we investigated the relationship between promoter methylation and miRNA regulation at the genome level by taking advantage of recently released human methylome data and high quality miRNA and other gene annotation data. Our results suggested that there is a functional complementation between promoter methylation regulation at the transcription level and miRNA regulation at the post-transcriptional level. Specifically, the genes that are under stronger promoter DNA methylation control tend to avoid miRNA regulation by having fewer miRNA target sites, and vice versa.
From an evolutionary perspective, both recruitment of DNA methylation in a gene's promoter region and the advent of new miRNA genes during the transition from invertebrate to vertebrate contributed to the high complexity of vertebrate organisms and cell types [[63–65]]. Although many recent studies have greatly improved our understanding of the evolutionary adaptations and conservation of DNA methylation and miRNA regulation, the relationship between DNA methylation and miRNA regulation, and how these two mechanisms dynamically influence each other's evolution and function, remain poorly understood. The results supporting complementary regulation between DNA methylation and miRNA function in this study provided the first attempt to uncover such an important and complex regulation system, which will help us understand the evolutionary forces towards organisms' complexity beyond traditional sequence level investigation.
We thank Graham McVicker and Phil Green (University of Washington) for their help in microarray data analysis and providing their processed data. We thank members of the Zhao lab for useful discussion and suggestions and Rebecca Posey for English polishing of the manuscript. This work was partially supported by NIH grant (LM009598) from the National Library of Medicine. Z. Zhao received additional support from Vanderbilt's Specialized Program of Research Excellence in GI Cancer grant (50CA95103) and the VICC Cancer Center Core grant (P30-CA68485).
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.