Relating gene expression evolution with CpG content changes
© Yang et al.; licensee BioMed Central Ltd. 2014
Received: 21 February 2013
Accepted: 15 August 2014
Published: 20 August 2014
Previous studies have shown that CpG dinucleotides are enriched in a subset of promoters and that the CpG content of promoters is positively correlated with gene expression levels. However, the relationship between divergence of CpG content and gene expression evolution has not been investigated. Here we calculate the normalized CpG (nCpG) content in DNA regions around the transcription start site (TSS) and transcription termination site (TTS) of genes in nine organisms, and relate it to expression levels measured by RNA-seq.
The nCpG content of TSS shows a bimodal distribution in all organisms except platypus, whereas the nCpG content of TTS has only a single peak. When nCpG contents are compared between organisms, we observe different evolution patterns for TSS and TTS: compared with TTS, TSS exhibits a faster divergence rate between closely related species but is more conserved between distant species. More importantly, we demonstrate a link between gene expression evolution and nCpG content changes: up-/down-regulation of genes in an organism is accompanied by an increase/decrease in nCpG content in their TSS and TTS proximal regions.
Our results suggest that gene expression changes between organisms are correlated with alterations in the normalized CpG content of promoters. Our analyses provide evidence for the impact of nCpG content on gene expression evolution.
In vertebrates, CpG dinucleotides are substantially depleted compared with what would be expected by chance. This depletion is caused by the relatively high mutation rate from CpG to TpG. Deamination of cytosine gives rise to uracil, which, as a “foreign” nucleotide, is readily recognized and corrected by the DNA repair system. However, when the cytosine in a CpG site is methylated, deamination of methylcytosine produces thymine, which is a normal DNA base, is not recognized as foreign and is therefore less likely to be repaired. As a consequence, hypermethylated DNA regions are more likely to lose CpG dinucleotides. In vertebrates, DNA methylation serves as an important mechanism for regulating gene expression, and a large fraction of CpG sites are methylated [3, 4], leading to an overall depletion of CpG dinucleotides in the genome. In some DNA regions, however, the CpG sites are not methylated in germline cells and are therefore preserved or even over-represented [6–8]. These regions are termed CpG islands (CGIs); they typically occur at or near the transcription start sites of genes, particularly those of housekeeping genes. In addition to DNA methylation, other evolutionary processes, such as biased gene conversion [9–11], have also been proposed to explain the evolution of GC% as well as the generation and maintenance of CGIs.
However, there is still no satisfactory definition of a CGI, and arbitrary thresholds have been used to identify them in a genome. For example, a widely applied definition of a CGI is a region of at least 200 bp with GC% > 50% and an observed-to-expected CpG ratio > 60%. Based on the presence of a CGI near the promoter, genes can be divided into CGI-associated and non-associated genes. But again, there is no satisfactory way to associate CGIs with genes. To address this issue in the context of promoter studies, Saxonov et al. defined a metric called normalized CpG (nCpG) content: the ratio of the observed number of CpG dinucleotides to the expected number within a 3 kb region around the TSS of a gene. They found that human promoters display a bimodal distribution of nCpG content and can therefore be divided into two classes: high CpG promoters (HCPs) and low CpG promoters (LCPs).
Studies of the relationship between the GC% of genes and their expression levels have found only a weak correlation [8, 14–16]. Normalized CpG content, however, has been reported to be highly predictive of promoter activities measured by systematic luciferase assays. Normalized CpG content alone predicted the activities of ‘ubiquitously’ expressed promoters with high accuracy (R = 0.75, where R is the correlation coefficient between predicted and actual activities). In our previous studies, we also found a high correlation between the nCpG content of promoters and the expression levels of TSSs quantified by Cap Analysis of Gene Expression (CAGE) in human cell lines.
To understand phenotypic evolution, gene expression changes across species have been studied using microarray data [19–22] and, more recently, RNA-seq data. It has been suggested that the divergence of gene expression is largely driven by the evolution of transcription factor binding sites [24–26]. Given the high correlation between the expression level and the normalized CpG content of genes, we hypothesize that the expression divergence of genes should be reflected in changes in the CpG content of their promoters.
To test this hypothesis, we use RNA-seq expression data from nine organisms and correlate expression changes with nCpG content differences between organisms. Our results suggest a positive correlation between them when two distantly related organisms are compared, e.g., human versus mouse. TSSs show a bimodal distribution in their nCpG contents, dividing them into high CpG and low CpG promoters, while the distribution of TTS nCpG content has only a single peak. We also observe different evolution patterns between TSS and TTS nCpG contents: TSSs diverge faster than TTSs between closely related species but are more conserved when distantly related species are compared. Our analysis provides new insights into the impact of nCpG content on gene expression evolution.
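To make the threshold-based CGI definition concrete, the Python sketch below tests whether a single sequence window meets the three criteria quoted above (length of at least 200 bp, GC% > 50%, observed-to-expected CpG ratio > 0.6). It assumes the Gardiner-Garden and Frommer form of the ratio, (CpG count × length)/(C count × G count); the function names are hypothetical, and real CGI callers slide windows along a genome rather than testing one string.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count gives the exact number of CpG sites.
    return seq.count("CG") * len(seq) / (c * g)


def passes_cgi_criteria(seq: str, min_len: int = 200,
                        min_gc: float = 0.5, min_oe: float = 0.6) -> bool:
    """True if one window satisfies the length, GC% and observed/expected thresholds."""
    seq = seq.upper()
    if len(seq) < min_len:
        return False
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    return gc > min_gc and cpg_obs_exp(seq) > min_oe


cpg_rich = passes_cgi_criteria("CCGG" * 60)   # GC-rich and CpG-rich
cpg_poor = passes_cgi_criteria("CCAGG" * 60)  # GC-rich but contains no CpG sites
```

The second example shows why the observed-to-expected criterion is needed: a window can be GC-rich without containing CpG dinucleotides.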
Normalized CpG content of promoters in nine species
The normalized CpG contents of high CpG and low CpG promoters in nine organisms
Conservation of normalized CpG content
In addition, we examine the conservation of the HCP/LCP category of genes between organisms. Specifically, for each pair of the eight organisms (excluding platypus), we select orthologous gene pairs with only a single TSS in both organisms and count the number of pairs that are HCP in both (HH), LCP in both (LL), and HCP in one but LCP in the other (HL and LH). Our results indicate that the HCP/LCP category is highly conserved during evolution (Additional file 3). For example, for human versus mouse there are 277 HH pairs and 132 LL pairs, but only 54 HL pairs and 18 LH pairs. That is, the majority of genes (85%, 409 of 481) have a conserved HCP/LCP category between human and mouse (P = 7e-50, χ2 test).
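The χ2 test on the human-mouse counts can be approximately reproduced with the plain-Python sketch below. It assumes the four counts form a 2 × 2 contingency table with the human category in rows and the mouse category in columns (HL read as HCP in human, LCP in mouse) and applies Yates' continuity correction; neither choice is stated in the text, although swapping rows and columns does not change the statistic. The p-value uses the identity that the upper tail of a χ2 variable with one degree of freedom equals erfc(√(x/2)). The function name is hypothetical.

```python
import math


def chi2_2x2(a: int, b: int, c: int, d: int, yates: bool = True):
    """Pearson chi-square test of independence for the table [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            diff = abs(observed[i][j] - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)  # continuity correction
            stat += diff * diff / expected
    # Upper tail of chi-square(1): P(X > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Rows: human HCP, human LCP; columns: mouse HCP, mouse LCP (single-TSS orthologs).
hh, hl, lh, ll = 277, 54, 18, 132
stat, p = chi2_2x2(hh, hl, lh, ll)
conserved = (hh + ll) / (hh + hl + lh + ll)
print(f"chi2 = {stat:.1f}, P = {p:.1e}, conserved fraction = {conserved:.2f}")
```

With the correction, the p-value lands on the order of 1e-50, in line with the reported value; `scipy.stats.chi2_contingency` applies the same correction by default for 2 × 2 tables.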
Correlation between normalized CpG contents and gene expression levels
It has been reported previously that nCpG content is correlated with the expression level of genes [13, 18]. The availability of gene expression data in nine organisms enables us to investigate this issue more systematically. We compare the expression levels of HCP and LCP genes in all tissues of the eight organisms (excluding platypus) and confirm that HCP genes have significantly higher expression levels than LCP genes (Additional file 4). Compared with the HCP class, the LCP class has a larger fraction of non-expressed genes (genes whose expression is not detected by RNA-seq). Even after non-expressed genes are excluded from the comparison, HCP genes still show significantly higher expression levels than LCP genes.
Correlation of gene expression levels with nCpG contents of TSSs and TTSs
We next extend our correlation analysis to human and mouse microarray data. Again, we observe positive correlations between TSS nCpG content and gene expression levels in all 79 human tissues and all 61 mouse tissues. Compared with the RNA-seq data, however, the correlations in the microarray data are much lower, with the largest correlation coefficients being r = 0.287 in human (Additional file 5) and r = 0.346 in mouse (Additional file 6). This might reflect the difference in quality between RNA-seq and microarray expression data: RNA-seq data are known to be more sensitive and more accurate than microarray data [31, 32].
Relationship between normalized CpG difference and gene expression evolution
We perform the same analysis for all pairs of organisms and confirm the relationship between nCpG content change and gene expression divergence (Additional file 7). The relationship is observed for both TSS and TTS in all distantly related organism pairs. However, when the two organisms are closely related (e.g., within the primate group), the trend is hardly detectable, presumably because of the short divergence time. When we identify differentially expressed genes between human and mouse using a two-fold threshold, we find that genes up-regulated in human have a significantly larger nCpG content difference (human versus mouse) for both TSS and TTS (Figures 5C and 5D), which again confirms the relationship between CpG divergence and gene expression change. Note that because of a global increase in TSS nCpG content in human relative to mouse, even genes down-regulated in human tend to have higher nCpG content in their TSS proximal regions (dCpG > 0).
We also perform the trend analysis shown in Figure 5A by comparing human and mouse microarray expression data in matched tissues. However, with microarray data we cannot detect the relationship between nCpG content difference and gene expression change described above (Additional file 8): the groups up-regulated and down-regulated in human versus mouse, identified from microarray data, do not differ significantly in their normalized CpG contents.
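The grouping step of the two-fold comparison can be sketched as below. This is a minimal illustration, assuming expression values are on the log2 RPKM scale (so a two-fold change is a log2 difference of at least 1) and taking dCpG as human nCpG minus mouse nCpG, the sign convention implied by "dCpG > 0" in the text. The `OrthologPair` record, the function name and the toy values are hypothetical, and no significance test is included because the text does not name the test used.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class OrthologPair:
    """Hypothetical record for one human-mouse ortholog pair."""
    log2_expr_human: float
    log2_expr_mouse: float
    ncpg_tss_human: float
    ncpg_tss_mouse: float


def split_by_fold_change(pairs, log2_threshold=1.0):
    """Return dCpG values (human minus mouse) for genes up- and down-regulated in human.

    A two-fold change corresponds to |log2 human - log2 mouse| >= 1.
    Pairs below the threshold are left out of both groups.
    """
    up, down = [], []
    for p in pairs:
        d_expr = p.log2_expr_human - p.log2_expr_mouse
        d_cpg = p.ncpg_tss_human - p.ncpg_tss_mouse
        if d_expr >= log2_threshold:
            up.append(d_cpg)
        elif d_expr <= -log2_threshold:
            down.append(d_cpg)
    return up, down


# Toy values for illustration only, not data from the study.
pairs = [
    OrthologPair(6.0, 4.5, 0.70, 0.55),  # up in human
    OrthologPair(3.0, 5.0, 0.30, 0.35),  # down in human
    OrthologPair(5.0, 5.2, 0.60, 0.60),  # below the two-fold threshold
]
up, down = split_by_fold_change(pairs)
print(f"mean dCpG up = {mean(up):.3f}, down = {mean(down):.3f}")
```

The same function applies unchanged to TTS values by passing TTS nCpG contents in the record fields.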
To study the impact of CpG islands (CGIs) on gene expression, most previous studies associated genes with nearby CGIs to divide them into two categories: CGI-associated and non-associated. Determining the cut-off values for identifying CGIs and for associating them with genes is often tricky and arbitrary. Here, we choose a different strategy by focusing on the TSS and TTS proximal DNA regions of genes. Generally, regulatory elements are highly enriched in TSS regions but not in TTS regions. We include TTS as a control for TSS, since TTS and TSS often share similar sequence features, as shown by the high correlation in nCpG content between TSS and TTS in platypus. In eight of the nine organisms we observe a bimodal distribution of TSS nCpG content, suggesting that there are two different promoter classes: HCPs and LCPs. HCPs are enriched for CpG dinucleotides and in most cases are associated with a nearby CGI. In contrast, the distribution of TTS nCpG content has only a single peak. In addition, we observe quite different evolution patterns between TSS and TTS in their nCpG content (Figure 3): between closely related species, TSSs diverge at a higher rate than TTSs, while between distantly related species TSSs are more conserved. These results reveal a dual character of promoters during evolution: they exert more impact on gene divergence and, at the same time, are subject to stronger selective constraints. This idea may extend to CGIs, since they are the major contributors to the high CpG content of HCPs. In line with this, CGIs have been shown to harbor many regulatory elements and to act as active regulators of transcription.
Among the nine organisms, platypus exhibits a very different evolutionary pattern. First, the nCpG content of platypus TSSs does not show a bimodal distribution: the HCP peak is missing. Second, the correlation of nCpG content between TSS and TTS in platypus is 0.689, much higher than in any other organism. Third, in platypus, TSS and TTS nCpG contents have comparable correlations with gene expression levels, whereas in the other organisms TSS shows a much higher correlation than TTS. Together with the fact that platypus has a markedly higher G + C content (45.5%) and a smaller number of CGIs, this may suggest that the regulatory function and mechanism of DNA methylation in platypus differ from those in other species.
Our analysis shows a clear relationship between gene expression change and nCpG content divergence in two distantly related species, such as human and mouse. Compared with down-regulated genes, genes up-regulated in human tend to have higher nCpG content relative to mouse in both TSS and TTS proximal DNA regions. This relationship is observed when gene expression levels are measured by RNA-seq, but the same analysis using microarray data fails to reveal it. Moreover, the correlation between microarray expression levels of genes and the nCpG content of promoters is very weak. Expression changes of orthologous genes between species are often subtle and are complicated by many confounding factors, such as cross-species normalization. For this reason, the relationship between gene expression change and nCpG divergence can be revealed only by RNA-seq data, which are more sensitive and precise than microarray data. On the other hand, nCpG divergence between two species requires a long time for mutations to accumulate, so the relationship can be observed only between distantly related species.
If the occurrence of CGIs and HCPs were merely a consequence of the low DNA methylation rate of these regions in germline cells, one might expect the correlation between nCpG content and gene expression levels to be observed only in germline cells. However, our study shows that such a correlation can be observed in all six tissues. This can be explained by two factors: (1) expression profiles in different tissues are highly correlated, so gene expression in non-germline tissues is broadly similar to expression in germline cells; and (2) more importantly, CGIs and HCPs are enriched for functional elements that directly affect the expression level of genes. For example, the CpG binding protein CFP1 regulates histone modification by binding to DNA containing unmethylated CpG motifs and consequently affects gene expression. CGIs are associated with specific DNA sequence features that are critical for their roles in regulating gene expression. On the one hand, these sequence features facilitate the formation of a transcriptionally permissive chromatin state at CGI-associated promoters by destabilizing nucleosomes and attracting proteins. In fact, most housekeeping genes have CGIs in their promoters, and these CGIs are generally unmethylated, whereas tissue-specific promoters are usually not associated with CGIs. On the other hand, CGI-associated promoters can be silenced through dense CpG methylation or polycomb recruitment [29, 38], again owing to their distinctive DNA sequence composition.
It has been suggested that DNA methylation in promoter regions represses gene expression. We calculated the correlation coefficients between gene expression and promoter methylation (from the TSS to 200 bp upstream) across all transcribed genes in hESC and IMR90 cells using ENCODE data. We observed weak correlations, with r = −0.37 in hESC and r = −0.22 in IMR90, much weaker than the correlation between TSS nCpG content and gene expression levels in human. Many genes with highly methylated promoters are nevertheless highly expressed. Consistent with our observations, Du et al. reported a weak negative correlation between gene expression and promoter methylation in the H1 cell line (r = −0.24). In addition, more recent studies have demonstrated that associations between methylation and gene expression across individuals can be either positive or negative, even for DNA methylation sites in promoter regions [41, 42]. Despite the correlation between gene expression and DNA methylation, it remains unclear whether DNA methylation is a cause or a consequence of altered gene expression. In fact, recent studies showed that DNA methylation might be a passive reflection of transcription factor binding or a consequence of gene repression [43, 44]. This is supported by the negative correlation between transcription factor expression and the methylation levels of their binding sites, and by the depletion of cytosines within transcription factor binding sites. In this study, we demonstrate a correlation between gene expression change and nCpG content divergence between distant species. It would be interesting to investigate whether and how DNA methylation is involved in this relationship.
In conclusion, comparative analysis in nine vertebrate organisms suggests that gene expression changes between organisms are correlated with alterations in the normalized CpG content of promoters. These results provide evidence supporting an impact of nCpG content change on gene expression evolution.
Gene expression data and DNA sequences
RNA-seq gene expression data were downloaded from Brawand et al., who measured transcript and gene expression levels in six tissues (brain, cerebellum, heart, liver, kidney and testis) of nine organisms: human, chimpanzee, gorilla, orangutan, macaque, mouse, opossum, platypus and chicken. Gene expression levels were represented as RPKM (reads per kilobase per million mapped reads) and were normalized so that levels of orthologous genes in different organisms are directly comparable. For most tissues, expression levels from multiple samples were available in each organism; in these cases, we averaged them on the log scale (log2 RPKM) to obtain the final expression levels. Microarray gene expression data for human and mouse were obtained from Su et al., comprising expression levels of genes in 79 human tissues and 61 mouse tissues.
DNA sequences around the TSS and TTS (−1.5 kb to +1.5 kb) of genes were extracted from whole-genome sequences. The genomic locations of transcripts in the nine organisms were determined from gene annotations in the Ensembl database; Ensembl release 57 was used. Orthologous gene pairs were taken from Brawand et al.
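Averaging replicates "at the log scale" can be sketched as follows; the result is the log2 of the geometric mean of the replicate RPKM values. The source does not say how RPKM values of zero were handled, so the pseudocount of 1 is an assumption, as is the function name.

```python
import math
from statistics import mean


def average_log2_rpkm(replicate_rpkms, pseudocount=1.0):
    """Average replicate RPKM values on the log2 scale.

    The pseudocount (an assumption, not stated in the source) keeps
    log2 finite when a replicate has RPKM = 0.
    """
    if not replicate_rpkms:
        raise ValueError("need at least one replicate")
    return mean(math.log2(x + pseudocount) for x in replicate_rpkms)


level = average_log2_rpkm([3.0, 15.0])  # log2(4) and log2(16) average to 3
```

Averaging on the log scale keeps a single outlying replicate from dominating the estimate, which a plain arithmetic mean of RPKM would allow.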
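The window extraction can be sketched as below. This is an illustration under stated assumptions, not the authors' pipeline: `site` is a 0-based coordinate on the forward strand, minus-strand windows are reverse-complemented so that "upstream" follows the direction of transcription, and windows are clipped at chromosome ends. The function names are hypothetical; in practice the chromosome sequence would come from a genome FASTA file and the TSS/TTS coordinates from the Ensembl annotation.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def flanking_window(chrom_seq: str, site: int, strand: str, flank: int = 1500) -> str:
    """Sequence from `flank` bp upstream to `flank` bp downstream of `site`.

    The site itself is the first base of the downstream half. For "-" strand
    genes, upstream lies at higher forward-strand coordinates, so the slice is
    shifted by one and reverse-complemented.
    """
    if strand == "+":
        start, end = site - flank, site + flank
    elif strand == "-":
        start, end = site - flank + 1, site + flank + 1
    else:
        raise ValueError("strand must be '+' or '-'")
    window = chrom_seq[max(start, 0):min(end, len(chrom_seq))]
    return window if strand == "+" else reverse_complement(window)


chrom = "A" * 10 + "C" + "G" * 10  # toy chromosome, site of interest at index 10
plus = flanking_window(chrom, 10, "+", flank=3)
minus = flanking_window(chrom, 10, "-", flank=3)
```

With the default `flank=1500`, each window spans 3 kb, matching the region used for the nCpG calculation.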
Calculation of normalized CpG content
For each transcript, the normalized CpG contents (nCpG) of the TSS and TTS were calculated from the 3 kb DNA sequence spanning 1.5 kb upstream to 1.5 kb downstream of the TSS/TTS. Normalized CpG content was defined as the ratio of the observed CpG dinucleotide frequency (observed CpG) to the expected frequency (expected CpG), following the method described in Saxonov et al. Expected CpG was calculated as $(\text{GC content}/2)^2$. Some genes possess multiple transcripts, which may have different TSSs and/or TTSs. In these cases, we used the average nCpG of these TSSs/TTSs to represent the nCpG of the gene. Alternatively, we used the maximum nCpG of these TSSs/TTSs. The two definitions gave consistent results and conclusions.
With the exception of platypus, TSS nCpG contents show a bimodal distribution in all organisms. To separate high CpG promoters (HCPs) from low CpG promoters (LCPs), we set the threshold for each organism at the nCpG content corresponding to the lowest density between the two peaks of the distribution; promoters above the threshold were classified as HCPs and those below it as LCPs.
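The nCpG calculation and the per-gene summary can be sketched in Python as follows. It assumes a Saxonov-style definition in which the observed CpG frequency is the number of CpG dinucleotides divided by the window length and the expected frequency is (GC fraction/2)^2; returning 0 for windows with no G or C is an arbitrary convention of this sketch, and the function names are hypothetical.

```python
from statistics import mean


def normalized_cpg(seq: str) -> float:
    """nCpG = observed CpG frequency / (GC fraction / 2) ** 2."""
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        raise ValueError("sequence too short")
    gc = (seq.count("G") + seq.count("C")) / n
    if gc == 0:
        return 0.0  # arbitrary convention for windows without G or C
    observed = seq.count("CG") / n  # CpG count per base
    expected = (gc / 2) ** 2        # CpG frequency expected from base composition
    return observed / expected


def gene_ncpg(transcript_windows, summary="mean"):
    """Combine nCpG over a gene's transcript windows ('mean' or 'max')."""
    values = [normalized_cpg(w) for w in transcript_windows]
    if summary == "mean":
        return mean(values)
    if summary == "max":
        return max(values)
    raise ValueError("summary must be 'mean' or 'max'")


gene_value = gene_ncpg(["CG" * 10, "AT" * 10])  # one CpG-rich, one CpG-free window
```

A random sequence with no CpG depletion scores close to 1, whereas CpG-depleted bulk genomic DNA in vertebrates scores well below 1.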
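One way to locate the density minimum between the two peaks is a Gaussian kernel density estimate evaluated on a grid, sketched below in plain Python. The source does not say how the density was estimated: the Gaussian kernel, the normal reference bandwidth 1.06·σ·n^(−1/5), the 512-point grid and the function names are all assumptions, and the demo uses simulated values rather than real nCpG data.

```python
import math
import random
from statistics import stdev


def kde(values, grid, bandwidth):
    """Gaussian kernel density estimate evaluated at each grid point."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
            for x in grid]


def hcp_lcp_threshold(values, n_grid=512, bandwidth=None):
    """nCpG value at the density minimum between the two highest density peaks."""
    if bandwidth is None:
        # Normal reference rule of thumb (an assumption; the source gives no bandwidth).
        bandwidth = 1.06 * stdev(values) * len(values) ** (-1 / 5)
    lo, hi = min(values), max(values)
    grid = [lo + (hi - lo) * k / (n_grid - 1) for k in range(n_grid)]
    dens = kde(values, grid, bandwidth)
    # Interior local maxima of the estimated density.
    peaks = [k for k in range(1, n_grid - 1)
             if dens[k - 1] < dens[k] >= dens[k + 1]]
    if len(peaks) < 2:
        raise ValueError("density is not bimodal at this bandwidth")
    left, right = sorted(sorted(peaks, key=lambda k: dens[k])[-2:])
    valley = min(range(left, right + 1), key=lambda k: dens[k])
    return grid[valley]


# Simulated bimodal sample: an LCP-like mode near 0.25 and an HCP-like mode near 0.75.
random.seed(0)
sample = ([random.gauss(0.25, 0.08) for _ in range(300)]
          + [random.gauss(0.75, 0.10) for _ in range(200)])
threshold = hcp_lcp_threshold(sample)
```

The threshold depends on the bandwidth; a much wider kernel can merge the two modes, which is why the function raises an error instead of guessing when only one peak is found.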
Calculation of tissue specificity score for genes
The tissue specificity score (TSPS) of a gene was calculated as

$$\mathrm{TSPS} = \sum_{i=1}^{n} f_i \log_2 \frac{f_i}{p_i},$$

where $f_i$ is the ratio of the gene's expression level in tissue $i$ to its total expression level across all tissues, and $p_i = 1/n$ for all tissues (n = 79 for human and n = 61 for mouse, the total number of tissues) is the fractional expression of a gene under a null model assuming uniform expression across all tissues. A larger TSPS value indicates more specific expression of a gene in one or a few tissues, whereas a TSPS value of zero indicates uniform expression.
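Assuming TSPS takes the relative-entropy form Σ f_i log2(f_i/p_i) with p_i = 1/n, which matches the properties described here (zero for uniform expression, larger for more specific expression), it can be computed as sketched below. The function name is hypothetical, and the log base of 2 is an assumption that only rescales the score.

```python
import math


def tsps(expression):
    """Tissue specificity score: sum_i f_i * log2(f_i / p_i) with p_i = 1/n.

    `expression` holds non-negative expression levels of one gene across
    n tissues. Tissues with f_i = 0 contribute 0 (the limit of f * log f).
    """
    total = sum(expression)
    if total <= 0:
        raise ValueError("gene must be expressed in at least one tissue")
    p = 1.0 / len(expression)
    score = 0.0
    for x in expression:
        f = x / total
        if f > 0:
            score += f * math.log2(f / p)
    return score


uniform = tsps([5.0] * 79)            # same level in all 79 tissues
specific = tsps([10.0] + [0.0] * 78)  # expressed in a single tissue
```

The score ranges from 0 (uniform) to log2(n) (expression confined to one tissue), so values for human (n = 79) and mouse (n = 61) have slightly different maxima.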
Calculation of correlation coefficients
The Spearman correlation coefficient between TSS/TTS nCpG content and gene expression levels was calculated using all genes in each organism. Similarly, the correlation coefficient r between TSS nCpG content and TTS nCpG content was calculated for each organism. The cross-organism Spearman correlation coefficient of TSS or TTS nCpG content was calculated using all orthologous gene pairs between the two organisms.
The significance of a given correlation coefficient r was estimated using the Fisher z-transformation. Specifically, we calculated

$$z = \frac{\sqrt{N-3}}{2} \ln \frac{1+r}{1-r},$$

in which N is the total number of samples (e.g., the total number of genes when calculating the correlation coefficient between TSS nCpG content and gene expression level in an organism). The p-value was then obtained by referring z to a standard normal distribution.
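Spearman's coefficient is the Pearson correlation of ranks. The plain-Python sketch below computes it with average ranks for ties, the usual convention (also used by `scipy.stats.spearmanr`). The helper names and toy values are hypothetical; in practice a library routine would normally be used.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Toy nCpG contents and log2 expression levels for five genes (illustrative only).
rho = spearman([0.2, 0.3, 0.5, 0.7, 0.9], [1.0, 0.5, 3.0, 4.0, 8.0])
```

Because it uses only ranks, Spearman's coefficient is unaffected by the choice of log transformation or pseudocount applied to expression values.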
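Assuming the test statistic is z = √(N − 3)·artanh(r), which equals (√(N − 3)/2)·ln((1 + r)/(1 − r)), the p-value can be computed with the standard library as sketched below. Whether a one- or two-sided p-value was reported is not stated, so the two-sided default is an assumption, as is the function name. For Spearman coefficients, a variance of 1.06/(N − 3) is sometimes used instead of 1/(N − 3); the source does not indicate this adjustment.

```python
import math


def fisher_z_pvalue(r: float, n: int, two_sided: bool = True) -> float:
    """P-value for a correlation r from n samples via the Fisher z-transformation.

    z = sqrt(n - 3) * atanh(r), referred to a standard normal distribution.
    """
    if n <= 3:
        raise ValueError("need n > 3")
    if abs(r) >= 1:
        return 0.0
    z = math.sqrt(n - 3) * math.atanh(r)
    # Upper-tail probability of N(0, 1) beyond |z|.
    one_tail = 0.5 * math.erfc(abs(z) / math.sqrt(2))
    return 2 * one_tail if two_sided else one_tail


p = fisher_z_pvalue(0.3, 103)  # z = 10 * atanh(0.3), about 3.1
```

With the many thousands of genes typical of a genome-wide analysis, even modest correlations give very small p-values, so the magnitude of r is the more informative quantity.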
This work was supported by the Centers of Biomedical Research Excellence (COBRE) grant GM103534, and the start-up funding package provided to CC by the Geisel School of Medicine at Dartmouth College. HY was supported by the National Natural Science Foundation of China (Grant No. 81300467). We thank Dr. Zhigang Li at Dartmouth College for constructive discussion. We thank the anonymous reviewers for the very useful comments.
- Bird AP: DNA methylation and the frequency of CpG in animal DNA. Nucleic Acids Res. 1980, 8 (7): 1499-1504.
- Duncan BK, Miller JH: Mutagenic deamination of cytosine residues in DNA. Nature. 1980, 287 (5782): 560-561.
- Jones PA, Takai D: The role of DNA methylation in mammalian epigenetics. Science. 2001, 293 (5532): 1068-1070.
- Fazzari MJ, Greally JM: Epigenomics: beyond CpG islands. Nat Rev Genet. 2004, 5 (6): 446-455.
- Sved J, Bird A: The expected equilibrium of the CpG dinucleotide in vertebrate genomes under a mutation model. Proc Natl Acad Sci U S A. 1990, 87 (12): 4692-4696.
- Yoder JA, Walsh CP, Bestor TH: Cytosine methylation and the ecology of intragenomic parasites. Trends Genet. 1997, 13 (8): 335-340.
- Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, Funke R, Gage D, Harris K, Heaford A, Howland J, Kann L, Lehoczky J, LeVine R, McEwan P, McKernan K, Meldrim J, Mesirov JP, Miranda C, Morris W, Naylor J, Raymond C, Rosetti M, Santos R, Sheridan A, Sougnez C, et al: Initial sequencing and analysis of the human genome. Nature. 2001, 409 (6822): 860-921.
- Ponger L, Duret L, Mouchiroud D: Determinants of CpG islands: expression in early embryo and isochore structure. Genome Res. 2001, 11 (11): 1854-1860.
- Cohen NM, Kenigsberg E, Tanay A: Primate CpG islands are maintained by heterogeneous evolutionary regimes involving minimal selection. Cell. 2011, 145 (5): 773-786.
- Meunier J, Duret L: Recombination drives the evolution of GC-content in the human genome. Mol Biol Evol. 2004, 21 (6): 984-990.
- Galtier N, Piganeau G, Mouchiroud D, Duret L: GC-content evolution in mammalian genomes: the biased gene conversion hypothesis. Genetics. 2001, 159 (2): 907-911.
- Gardiner-Garden M, Frommer M: CpG islands in vertebrate genomes. J Mol Biol. 1987, 196 (2): 261-282.
- Saxonov S, Berg P, Brutlag DL: A genome-wide analysis of CpG dinucleotides in the human genome distinguishes two distinct classes of promoters. Proc Natl Acad Sci U S A. 2006, 103 (5): 1412-1417.
- Semon M, Mouchiroud D, Duret L: Relationship between gene expression and GC-content in mammals: statistical significance and biological relevance. Hum Mol Genet. 2005, 14 (3): 421-427.
- Urrutia AO, Hurst LD: The signature of selection mediated by expression on human genes. Genome Res. 2003, 13 (10): 2260-2264.
- Versteeg R, van Schaik BD, van Batenburg MF, Roos M, Monajemi R, Caron H, Bussemaker HJ, van Kampen AH: The human transcriptome map reveals extremes in gene density, intron length, GC content, and repeat pattern for domains of highly and weakly expressed genes. Genome Res. 2003, 13 (9): 1998-2004.
- Landolin JM, Johnson DS, Trinklein ND, Aldred SF, Medina C, Shulha H, Weng Z, Myers RM: Sequence features that drive human promoter function and tissue specificity. Genome Res. 2010, 20 (7): 890-898.
- Cheng C, Alexander R, Min R, Leng J, Yip KY, Rozowsky J, Yan KK, Dong X, Djebali S, Ruan Y, Davis CA, Carninci P, Lassman T, Gingeras TR, Guigo R, Birney E, Weng Z, Snyder M, Gerstein M: Understanding transcriptional regulation by integrative analysis of transcription factor binding data. Genome Res. 2012, 22 (9): 1658-1667.
- Khaitovich P, Hellmann I, Enard W, Nowick K, Leinweber M, Franz H, Weiss G, Lachmann M, Paabo S: Parallel patterns of evolution in the genomes and transcriptomes of humans and chimpanzees. Science. 2005, 309 (5742): 1850-1854.
- Liao BY, Zhang J: Evolutionary conservation of expression profiles between human and mouse orthologous genes. Mol Biol Evol. 2006, 23 (3): 530-540.
- Liao BY, Zhang J: Low rates of expression profile divergence in highly expressed genes and tissue-specific genes during mammalian evolution. Mol Biol Evol. 2006, 23 (6): 1119-1128.
- Yang J, Su AI, Li WH: Gene expression evolves faster in narrowly than in broadly expressed mammalian genes. Mol Biol Evol. 2005, 22 (10): 2113-2118.
- Brawand D, Soumillon M, Necsulea A, Julien P, Csardi G, Harrigan P, Weier M, Liechti A, Aximu-Petri A, Kircher M, Albert FW, Zeller U, Khaitovich P, Grutzner F, Bergmann S, Nielsen R, Paabo S, Kaessmann H: The evolution of gene expression levels in mammalian organs. Nature. 2011, 478 (7369): 343-348.
- Dermitzakis ET, Clark AG: Evolution of transcription factor binding sites in Mammalian gene regulatory regions: conservation and turnover. Mol Biol Evol. 2002, 19 (7): 1114-1121.
- Borneman AR, Zhang ZD, Rozowsky J, Seringhaus MR, Gerstein M, Snyder M: Transcription factor binding site identification in yeast: a comparison of high-density oligonucleotide and PCR-based microarray platforms. Funct Integr Genomics. 2007, 7 (4): 335-345.
- Odom DT, Dowell RD, Jacobsen ES, Gordon W, Danford TW, MacIsaac KD, Rolfe PA, Conboy CM, Gifford DK, Fraenkel E: Tissue-specific transcriptional regulation has diverged significantly between human and mouse. Nat Genet. 2007, 39 (6): 730-732.
- Pask AJ, Papenfuss AT, Ager EI, McColl KA, Speed TP, Renfree MB: Analysis of the platypus genome suggests a transposon origin for mammalian imprinting. Genome Biol. 2009, 10 (1): R1.
- Warren WC, Hillier LW, Marshall Graves JA, Birney E, Ponting CP, Grutzner F, Belov K, Miller W, Clarke L, Chinwalla AT, Yang SP, Heger A, Locke DP, Miethke P, Waters PD, Veyrunes F, Fulton L, Fulton B, Graves T, Wallis J, Puente XS, Lopez-Otin C, Ordonez GR, Eichler EE, Chen L, Cheng Z, Deakin JE, Alsop A, Thompson K, Kirby P, et al: Genome analysis of the platypus reveals unique signatures of evolution. Nature. 2008, 453 (7192): 175-183.
- Mikkelsen TS, Wakefield MJ, Aken B, Amemiya CT, Chang JL, Duke S, Garber M, Gentles AJ, Goodstadt L, Heger A, Jurka J, Kamal M, Mauceli E, Searle SM, Sharpe T, Baker ML, Batzer MA, Benos PV, Belov K, Clamp M, Cook A, Cuff J, Das R, Davidow L, Deakin JE, Fazzari MJ, Glass JL, Grabherr M, Greally JM, Gu W, et al: Genome of the marsupial Monodelphis domestica reveals innovation in non-coding sequences. Nature. 2007, 447 (7141): 167-177.
- Dunham I, Kundaje A, Aldred SF, Collins PJ, Davis CA, Doyle F, Epstein CB, Frietze S, Harrow J, Kaul R, Khatun J, Lajoie BR, Landt SG, Lee BK, Pauli F, Rosenbloom KR, Sabo P, Safi A, Sanyal A, Shoresh N, Simon JM, Song L, Trinklein ND, Altshuler RC, Birney E, Brown JB, Cheng C, Djebali S, Dong X, Dunham I, et al: An integrated encyclopedia of DNA elements in the human genome. Nature. 2012, 489 (7414): 57-74.
- Ozsolak F, Milos PM: RNA sequencing: advances, challenges and opportunities. Nat Rev Genet. 2011, 12 (2): 87-98.
- Wilhelm BT, Landry JR: RNA-Seq-quantitative measurement of expression through massively parallel RNA-sequencing. Methods. 2009, 48 (3): 249-257.
- Consortium EP, Bernstein BE, Birney E, Dunham I, Green ED, Gunter C, Snyder M: An integrated encyclopedia of DNA elements in the human genome. Nature. 2012, 489 (7414): 57-74.
- Deaton AM, Bird A: CpG islands and the regulation of transcription. Genes Dev. 2011, 25 (10): 1010-1022.
- Lee JH, Skalnik DG: CpG-binding protein (CXXC finger protein 1) is a component of the mammalian Set1 histone H3-Lys4 methyltransferase complex, the analogue of the yeast Set1/COMPASS complex. J Biol Chem. 2005, 280 (50): 41725-41731.
- Ramirez-Carrozzi VR, Braas D, Bhatt DM, Cheng CS, Hong C, Doty KR, Black JC, Hoffmann A, Carey M, Smale ST: A unifying model for the selective regulation of inducible transcription by CpG islands and nucleosome remodeling. Cell. 2009, 138 (1): 114-128.
- Mohn F, Weber M, Rebhan M, Roloff TC, Richter J, Stadler MB, Bibel M, Schubeler D: Lineage-specific polycomb targets and de novo DNA methylation define restriction and potential of neuronal progenitors. Mol Cell. 2008, 30 (6): 755-766.
- Ku M, Koche RP, Rheinbay E, Mendenhall EM, Endoh M, Mikkelsen TS, Presser A, Nusbaum C, Xie X, Chi AS, Adli M, Kasif S, Ptaszek LM, Cowan CA, Lander ES, Koseki H, Bernstein BE: Genomewide analysis of PRC1 and PRC2 occupancy identifies two classes of bivalent domains. PLoS Genet. 2008, 4 (10): e1000242.
- Jones PA: Functions of DNA methylation: islands, start sites, gene bodies and beyond. Nat Rev Genet. 2012, 13 (7): 484-492.
- Du X, Han L, Guo AY, Zhao Z: Features of methylation and gene expression in the promoter-associated CpG islands using human methylome data. Comp Funct Genomics. 2012, 2012: 598987.
- Gutierrez-Arcelus M, Lappalainen T, Montgomery SB, Buil A, Ongen H, Yurovsky A, Bryois J, Giger T, Romano L, Planchon A, Falconnet E, Bielser D, Gagnebin M, Padioleau I, Borel C, Letourneau A, Makrythanasis P, Guipponi M, Gehrig C, Antonarakis SE, Dermitzakis ET: Passive and active DNA methylation and the interplay with genetic variation in gene regulation. eLife. 2013, 2: e00523.
- Ung M, Ma X, Johnson KC, Christensen BC, Cheng C: Effect of estrogen receptor alpha binding on functional DNA methylation in breast cancer. Epigenetics. 2014, 9 (4): 523-532.
- Medvedeva YA, Khamis AM, Kulakovskiy IV, Ba-Alawi W, Bhuyan MS, Kawaji H, Lassmann T, Harbers M, Forrest AR, Bajic VB, Consortium F: Effects of cytosine methylation on transcription factor binding sites. BMC Genomics. 2014, 15 (1): 119.
- Thurman RE, Rynes E, Humbert R, Vierstra J, Maurano MT, Haugen E, Sheffield NC, Stergachis AB, Wang H, Vernot B, Garg K, John S, Sandstrom R, Bates D, Boatman L, Canfield TK, Diegel M, Dunn D, Ebersol AK, Frum T, Giste E, Johnson AK, Johnson EM, Kutyavin T, Lajoie B, Lee BK, Lee K, London D, Lotakis D, Neph S, et al: The accessible chromatin landscape of the human genome. Nature. 2012, 489 (7414): 75-82.
- Su AI, Wiltshire T, Batalov S, Lapp H, Ching KA, Block D, Zhang J, Soden R, Hayakawa M, Kreiman G, Cooke MP, Walker JR, Hogenesch JB: A gene atlas of the mouse and human protein-encoding transcriptomes. Proc Natl Acad Sci U S A. 2004, 101 (16): 6062-6067.
- Flicek P, Amode MR, Barrell D, Beal K, Brent S, Carvalho-Silva D, Clapham P, Coates G, Fairley S, Fitzgerald S, Gil L, Gordon L, Hendrix M, Hourlier T, Johnson N, Kahari AK, Keefe D, Keenan S, Kinsella R, Komorowska M, Koscielny G, Kulesha E, Larsson P, Longden I, McLaren W, Muffato M, Overduin B, Pignatelli M, Pritchard B, Riat HS, et al: Ensembl 2012. Nucleic Acids Res. 2012, 40 (Database issue): D84-D90.
- Ravasi T, Suzuki H, Cannistraci CV, Katayama S, Bajic VB, Tan K, Akalin A, Schmeier S, Kanamori-Katayama M, Bertin N, Carninci P, Daub CO, Forrest AR, Gough J, Grimmond S, Han JH, Hashimoto T, Hide W, Hofmann O, Kamburov A, Kaur M, Kawaji H, Kubosaki A, Lassmann T, van Nimwegen E, MacPherson CR, Ogawa C, Radovanovic A, Schwartz A, Teasdale RD, et al: An atlas of combinatorial transcriptional regulation in mouse and man. Cell. 2010, 140 (5): 744-752.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.