Prediction of promoters and enhancers using multiple DNA methylation-associated features
© Hwang et al.; licensee BioMed Central Ltd. 2015
Published: 11 June 2015
Regulatory regions (e.g. promoters and enhancers) play an essential role in human development and disease. Many computational approaches have been developed to predict regulatory regions using genomic features such as sequence motifs and evolutionary conservation. However, these DNA sequence-based approaches do not reflect the tissue-specific nature of regulatory regions. In this work, we propose to predict regulatory regions using multiple features derived from DNA methylation profiles.
We discovered several interesting features of the methylated CpG (mCpG) sites within regulatory regions. First, the hypomethylation of CpGs within regulatory regions, relative to the genomic background methylation level, extended more than 1000 bp from the center of the regulatory regions, demonstrating a high degree of correlation between the methylation statuses of neighboring mCpG sites. Second, when a regulatory region was inactive, as determined by histone mark differences between cell lines, the methylation level of its mCpG sites increased from a hypomethylated state to a hypermethylated state, reaching a level even higher than the genomic background. Third, a distinct set of sequence motifs was overrepresented around mCpG sites within regulatory regions. Using five types of features derived from DNA methylation profiles, we were able to predict promoters and enhancers with a machine-learning approach (support vector machine). The predictions performed well, with an area under the ROC curve (AUC) of 0.992 for promoters and 0.817 for enhancers, which is better than prediction based on methylation level alone, especially for enhancers.
Our study suggests that DNA methylation features of mCpG sites can be used to predict regulatory regions.
Transcriptional regulation plays an important role in most biological processes. The interactions between transcription factors and regulatory regions, such as promoters and enhancers, are essential in transcriptional regulation. Therefore, identifying regulatory regions will provide mechanistic insight into various biological processes. Experimental and computational approaches have been developed to identify regulatory regions on a genome-wide scale. For example, evolutionary conservation, sequence motifs, and clustering of transcription factor binding motifs (or cis-regulatory modules) can be used to predict regulatory regions [1–4]. However, these approaches are based purely on DNA sequence and therefore do not reflect the tissue-specific nature of regulatory regions.
Recently, histone marks have been measured on a genome-wide scale using the ChIP-seq technique [5–7]. These histone marks are good predictors of regulatory regions. For example, H3K4me1 is associated with active enhancers, while H3K4me3 is related to active promoters. In addition, DNase I hypersensitivity sites (DHS) localize to open chromatin regions, which are likely regulatory regions [9, 10]. The ENCODE project has generated histone mark and DHS profiles in multiple cell lines and tissues [8, 9, 11, 12], which provide valuable information for understanding the organization and dynamics of regulatory regions in cells.
In this work, we propose to predict regulatory regions by utilizing DNA methylation patterns. DNA methylation, the addition of a methyl group to the fifth carbon of a cytosine residue adjacent to a guanine (a CpG site), is a well-studied epigenetic modification. While DNA methylation is considered stable and heritable in differentiated somatic cells, it can also change dynamically during the lifespan of a cell and is susceptible to diet and other environmental influences [13, 14]. DNA methylation is known to play a role in gene regulation. It is well accepted that DNA methylation in promoter regions represses the expression of the genes. High-throughput technologies, such as whole genome bisulfite sequencing and array-based methods, have enabled the mapping of DNA methylation patterns on a genomic scale, identifying hundreds of millions of methylated cytosines [16–19].
A DNA methylation-based approach to predicting regulatory regions has been developed in recent work [20, 21]. In these studies, low-methylation regions were associated with distal regulatory elements, as they were enriched for active histone marks (e.g. H3K4me1), DNase-hypersensitive sites, and transcription factor binding sites (TFBS). In this study, we performed a comprehensive survey of mCpGs in regulatory regions and explored whether we could extract a set of DNA methylation-dependent features, besides the methylation level, to improve the prediction. By examining the properties of the mCpG sites across different cell lines, we discovered that these sites demonstrate specific genomic properties. Using these genomic features, we were able to predict regulatory regions with a support vector machine approach. The paper is organized as follows. We first define the positive and negative sets for regulatory region prediction. We then describe the novel features derived from DNA methylation profiles. Finally, we use a machine-learning approach (support vector machine) to predict regulatory regions (promoters and enhancers separately) based on the features we obtained. Our results demonstrate that prediction based on multiple DNA methylation-associated features performs better than prediction based solely on methylation level.
Selection and assessment of positive and negative datasets
We used the previously established definition of a regulatory region as determined by genome-wide histone modification signatures. For example, H3K4me3 is known to be associated with promoters, while H3K4me1 is associated with enhancers. Ernst et al. predicted different types of regulatory regions using a hidden Markov model. In this paper, we focus on two major types of regulatory regions: promoters and enhancers.
We also selected the same number of mCpG sites in random genomic regions as negative datasets. To exclude the effect of differential methylation due to genomic location, we chose random genomic regions with the same relative distance to the nearest transcription start site (TSS) or exon-intron boundary for each type of regulatory region (see Methods for details).
Compared to the mCpG sites in random genomic regions, the methylation level of the mCpG sites in the regulatory regions was significantly more negatively correlated with the expression of the target genes (p < 1.0E-15; Kolmogorov-Smirnov test) (Figure 1B and Figure 1C). This is consistent with the notion that DNA methylation represses gene expression. The conclusion still holds if we use Spearman's rank correlation rather than the Pearson correlation coefficient (Additional file 1: Figure S1A, B).
Overall, our result suggests that the regulatory regions predicted by histone marks and the random genomic regions with similar relative genomic locations are of reasonable quality and are suitable to serve as the positive and negative sets of this study.
mCpGs in regulatory regions are hypomethylated
Hypomethylation in regulatory regions extends across a long range
Methylation levels of neighboring regulatory mCpG sites are highly correlated
One explanation for the extended hypomethylation is correlation between neighboring regulatory mCpG sites. To test this hypothesis, we calculated the autocorrelation of methylation profiles within regulatory regions. Specifically, we computed the correlation of methylation levels at two methylation sites as a function of their genomic distance (Figure 3D, see Methods for details). As expected, mCpG sites in close proximity showed higher correlation than those located farther apart. Based on the autocorrelation of the methylation profile, we observed that the correlation of methylation between mCpG sites was significantly stronger in regulatory regions than in non-regulatory regions (p < 1.0E-15 based on Kolmogorov-Smirnov test; Figure 3D). In regulatory regions, the correlation of methylation levels extended across distances of up to 326 and 231 bp for promoters and enhancers, respectively, whilst in random genomic regions the correlation disappeared by 41 bp (see also Additional file 1: Figure S1C).
We speculated that a high density of CpG sites could be the underlying mechanism for correlation between methylation levels, as sites in close proximity might be co-regulated by DNA methyltransferases. We examined the CpG densities in the regulatory and random regions and found that regulatory CpG sites were much denser than those in random regions, especially within active promoters (Figure 3E). This observation was true even if the CpG islands in these regions were excluded (Additional file 1: Figs S1D, E). Thus, the hypomethylation in regulatory regions and/or TF binding sites can be attributed to both the high correlation between methylation levels and the high CpG density within these regions.
Inactive regulatory regions are hypermethylated
Using our comparative analysis between H1 and GM12878 cells, we found that the methylation levels at mCpG sites within inactive promoters and enhancers resembled neither the active nor the background (random region) levels (Figure 4B). Interestingly, the methylation levels in inactive regulatory regions were even higher than the background levels obtained from random regions (p < 1.0E-15; Kolmogorov-Smirnov test) (Figure 4B). This phenomenon was robust, as it was also present when we compared H1 cells with other cell lines (Additional file 2: Figure S2A-E). Furthermore, the finding still held if we grouped the regulatory regions based on whether they overlap with CpG islands (Figure 4B; Additional file 2: Figure S2A-E; Additional file 3: Figure S3). Finally, when we used the raw histone marks (e.g. H3K9ac) rather than the predicted active and inactive regulatory regions, we made the same observation in multiple cell lines (Additional file 4: Figure S4). Our findings demonstrate that inactive regulatory regions are hypermethylated relative to the genomic background, which distinguishes them from background regions.
Methylation level in regulatory regions can vary considerably between cells
The difference in methylation levels between active and inactive regions suggested that regulatory mCpG sites might have a greater range of possible methylation levels than other mCpGs. To test this hypothesis, we calculated the variability of each mCpG site across the cell lines whose genome-wide methylation profiles are publicly available. The variance of the methylation levels for the regulatory regions predicted in H1 cells was then compared to that from random genomic regions (Figure 4C, 4D). About 40% of the regulatory mCpG sites had a variance larger than 0.2, whereas only 15% of mCpG sites in other genomic regions showed a variance this large, suggesting that the regulatory mCpG sites have a larger range of potential methylation levels.
We then directly related the methylation level of mCpGs to the regulatory state (active versus inactive). For those mCpG sites that were present in an active regulatory region in both cell lines, the difference in methylation level between the two cell lines was small (Additional file 2: Figure S2F). In fact, the vast majority (~75%) of these regulatory mCpG sites showed very small methylation level differences, in the range of -0.1 to 0.1, between the two cell lines. This difference was similar to the difference in methylation levels of mCpG sites in random genomic regions. In contrast, the regulatory mCpG sites located in a region that was active in one cell line and inactive in the other showed a greater methylation difference; less than half (42%) of the active-inactive regulatory regions had methylation level differences between -0.1 and 0.1. Taken together, the large range of potential methylation levels in the regulatory regions can be associated with changes in the status of the regulatory regions.
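The variability comparison described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the text does not state which variance estimator was used, so the population variance is an assumption here, and the function names `site_variance` and `fraction_above` as well as the toy data are hypothetical.

```python
from statistics import pvariance

def site_variance(levels):
    """Variance of one mCpG site's methylation levels across cell lines.

    Assumption: population variance; the paper does not name the estimator.
    """
    return pvariance(levels)

def fraction_above(sites, threshold):
    """Fraction of sites whose across-cell-line variance exceeds threshold."""
    if not sites:
        raise ValueError("no sites given")
    n_high = sum(1 for levels in sites if site_variance(levels) > threshold)
    return n_high / len(sites)

# Hypothetical toy data: one row per mCpG site, one column per cell line.
toy_sites = [
    [0.0, 1.0, 0.0, 1.0],    # switches between unmethylated and methylated
    [0.8, 0.9, 0.85, 0.9],   # stably methylated
]
print(fraction_above(toy_sites, threshold=0.2))
```

Applied separately to regulatory and random regions, the two fractions correspond to the kind of comparison reported in the paragraph above.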
Distinct sequence motifs are associated with regulatory mCpG sites
Interestingly, the nucleotide composition of the overrepresented 8-mers was different from that of the underrepresented 8-mers. For example, the GC contents of the overrepresented 8-mers obtained from the promoters and enhancers were 0.68 and 0.60, respectively. In contrast, the GC contents of the underrepresented 8-mers from the promoters and enhancers were only 0.47 and 0.31, respectively. This difference in nucleotide composition became more pronounced when considering only the 2 bases directly adjacent to the CpG sites. Over 77% and 68% of the overrepresented 8-mers in the promoters and enhancers, respectively, had a cytosine or guanine as a direct neighbor of the CpG site. In contrast, only 31% and 28% of the underrepresented 8-mers in these two types of regulatory regions had a cytosine or guanine at these positions. Cytosines or guanines were observed at both direct neighbor positions in 56% and 45% of the overrepresented motifs in the promoters and enhancers, respectively, which occurred much less frequently in the underrepresented motifs (13% and 11% in the promoters and enhancers, respectively).
Regulatory regions are predictable
Distinct genomic features of the regulatory mCpGs, which distinguish them from other mCpGs in negative sets, were used to predict regulatory regions. These features were methylation level, CpG density, autocorrelation of methylation levels, variance of methylation levels among different cell lines, and sequence motifs (the significance, -log(P), of the occurrence of 8-mer sequence motifs surrounding the mCpGs). The computation of these features is described in detail in the Methods section. For comparison, we also predicted the regulatory regions based solely on methylation level.
Some features contributed more to the prediction than others. Therefore, the ability of each feature to discriminate regulatory from non-regulatory regions was analyzed using the information gain of each feature (Figure 6B). The information gain (IG) of a given feature F with respect to the classes (e.g., regulatory or random regions) is the reduction in entropy of the sample set when the feature F is known (see Methods for details). Interestingly, the most informative features differ between promoters and enhancers. For promoters, the features showing the largest information gain are methylation level and methylation variance, while for enhancers, the most useful features are CpG density and methylation autocorrelation (Figure 6B). Some features contribute strikingly differently to the prediction of promoters and enhancers. For example, methylation variance is quite informative in promoter prediction, with an information gain of 0.67, while it contributes little to enhancer prediction, with an information gain of 0.06. This result suggests that a set of methylation-associated features is needed for predicting regulatory regions and that these features have different predictive power for promoters and enhancers.
Although low methylation has been found to be associated with regulatory regions [16, 21], we found that a low methylation level was not sufficient to predict regulatory regions, especially for enhancers. Additional features of mCpG sites were required to predict regulatory regions, and we were able to elucidate some of these features and successfully use them for prediction. For example, in regulatory regions, we found that a low methylation level extended across a range that was often longer than 1000 bp, and that the correlation of methylation levels between methylation sites was much stronger than outside of regulatory regions. Furthermore, we found a larger variation in methylation levels within regulatory regions compared to non-regulatory regions. Therefore, our work provides novel insights regarding the DNA methylation status in regulatory regions.
Since CpG islands are often weakly methylated or unmethylated and are considered to play an important role in gene regulation, the hypomethylated state is regarded as important in positive gene regulation. In this study, we found that regulatory mCpG sites demonstrate distinct features besides hypomethylation. Specifically, we found that the methylation levels are highly correlated between neighboring mCpG sites in regulatory regions. Furthermore, when regulatory regions were inactive in other cell types, their methylation levels did not simply return to the background level but were instead hypermethylated, suggesting that in regulatory regions, a higher level of methylation is required to maintain an inactive state.
Correlation of the DNA methylation status of neighboring CpG sites has been observed previously [25, 26]; however, in these studies, the correlation was not analyzed in the context of regulatory regions. For example, Eckhardt et al. found an overall correlation between neighboring CpG sites in the human genome. Our work revealed that the correlation stems primarily from the regulatory regions, as the correlation in regulatory regions was much stronger than that in random genomic regions.
The overrepresented 8-mer motifs in regulatory regions are predicted to be potential transcription factor binding sites. Our prediction suggests that a distinct set of transcription factors might interact with these motifs in a methylation-dependent fashion, since the overrepresented motifs had a higher GC content. While chromatin structure was previously considered one mediator of transcription factor-DNA interaction, our finding indicates that DNA methylation can also serve as a "switch" for protein-DNA interactions [27, 28]. Indeed, chromatin structure and DNA methylation can influence each other. For example, nucleosome occupancy can direct DNA methylation [29, 30], and in some cases, DNA methylation can determine nucleosome occupancy.
In this study, we demonstrated that regulatory regions are predictable from their methylation patterns; however, our prediction did not perfectly separate regulatory and non-regulatory regions, especially for enhancers. One possible reason is that we did not have sufficient methylome datasets for the prediction model. Since some of the features require methylation levels from multiple cell types, additional methylation data from a range of cell types should help to better describe these distinctive behaviors. We expect such datasets to become available in the near future and to enable better prediction of enhancers. Another possibility is that DNA methylation is also associated with functional elements other than promoters and enhancers. For example, recent studies suggested that DNA methylation is also involved in the regulation of alternative splicing [32–34]. Additional information is needed to distinguish different types of functional elements and thereby improve our prediction.
We proposed a set of novel methylation-associated features that are informative for predicting regulatory regions. These features greatly improve the prediction compared to prediction based solely on methylation level. Our results suggest that the regulatory "grammar" is encoded in complex DNA methylation patterns and that identification of these features will provide biological insight into methylation-mediated gene regulation.
Human DNA methylation data measured by bisulfite sequencing or the Illumina 450k array, available for 15 cell lines, were obtained from the SALK and ENCODE databases. The regulatory regions predicted from chromatin marks were downloaded from the ENCODE database. Gene expression data from RNA-Seq, available for 4 cell lines, were imported from the SALK database. RPKM (reads per kilobase per million mapped reads) values were used for transcript expression. Transcription factor binding was based on ChIP-seq or computational predictions [23, 35]. Gene annotation from the UCSC Genome database was used.
CpG sites in the study
The methylation level of a CpG site was measured as the ratio of the number of methylated cytosines to the total number of sequences covering that position in the bisulfite sequencing data. To include only sites with reliable measurements, we considered only the CpG sites covered by more than 4 sequences.
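As a minimal sketch of this definition, the function below computes the methylation level from read counts and applies the coverage filter. The function name and its parameters are hypothetical; "more than 4 sequences" is interpreted as a coverage of at least 5 reads.

```python
def methylation_level(methylated_reads, total_reads, min_coverage=5):
    """Ratio of methylated reads to all reads covering a CpG site.

    Returns None when the site is covered by fewer than min_coverage reads,
    i.e. when it fails the "more than 4 sequences" reliability filter.
    """
    if total_reads < min_coverage:
        return None
    return methylated_reads / total_reads

print(methylation_level(3, 10))  # reliable site
print(methylation_level(2, 4))   # filtered out: only 4 reads
```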
Random region selection
Autocorrelation of methylation profile
In the autocorrelation calculation, $x_i$ is the methylation level of an mCpG at position i, $x_{i+k}$ is the methylation level of the mCpG located k nucleotides from position i, and $\bar{x}$ is the mean methylation level of the mCpGs in all regions of interest. We considered the autocorrelation to have disappeared when its value dropped to 0.05.
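The exact formula is not shown in this text, so the sketch below assumes the standard sample autocorrelation: the mean of $(x_i - \bar{x})(x_{i+k} - \bar{x})$ over all pairs of sites k bp apart, divided by the variance of the methylation levels. The function names, the dictionary input format, and the toy data are hypothetical.

```python
def autocorrelation(levels_by_pos, k):
    """Autocorrelation of methylation levels at genomic distance k (bp).

    levels_by_pos maps a genomic coordinate to the methylation level of the
    mCpG at that coordinate. Returns None if no two sites are k bp apart or
    if the levels have zero variance. Assumed form, not the paper's formula.
    """
    values = list(levels_by_pos.values())
    if not values:
        return None
    mean = sum(values) / len(values)
    variance = sum((x - mean) ** 2 for x in values) / len(values)
    if variance == 0:
        return None
    products = [
        (x - mean) * (levels_by_pos[pos + k] - mean)
        for pos, x in levels_by_pos.items()
        if pos + k in levels_by_pos
    ]
    if not products:
        return None
    return (sum(products) / len(products)) / variance

def correlation_range(levels_by_pos, max_k, cutoff=0.05):
    """Smallest distance at which the autocorrelation falls to the cutoff."""
    for k in range(1, max_k + 1):
        r = autocorrelation(levels_by_pos, k)
        if r is not None and r <= cutoff:
            return k
    return None

# Hypothetical toy profile: two low sites followed by two high sites.
toy = {0: 0.1, 1: 0.1, 2: 0.9, 3: 0.9}
print(autocorrelation(toy, 1), correlation_range(toy, max_k=3))
```

The `cutoff=0.05` default mirrors the threshold stated above for deciding when the correlation has disappeared.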
CpG density and CG content
CpG density was calculated as the number of CpGs in a region normalized by its length. CG content in a region was measured as the number of cytosines and guanines in the region normalized by its total length.
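Both quantities follow directly from the sequence of a region. The sketch below is illustrative only; the function names are hypothetical, and it counts CpGs on the given strand of an upper- or lower-case DNA string.

```python
def cpg_density(sequence):
    """Number of CpG dinucleotides in a region divided by its length."""
    if not sequence:
        raise ValueError("empty sequence")
    return sequence.upper().count("CG") / len(sequence)

def gc_content(sequence):
    """Number of cytosines and guanines in a region divided by its length."""
    if not sequence:
        raise ValueError("empty sequence")
    seq = sequence.upper()
    return (seq.count("C") + seq.count("G")) / len(seq)

print(cpg_density("ACGTCGAA"), gc_content("ACGTCGAA"))
```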
Sequence motif discovery
In the overrepresentation test, p is the probability that an 8-mer is found in the random regions, k is the number of occurrences of the 8-mer of interest, and n is the number of all 8-mers in the regulatory regions. P-values were corrected for multiple testing using the Bonferroni method.
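The test statistic itself is not shown in this text. Given the definitions of p, k, and n, a one-sided binomial test for overrepresentation is a natural reading, and the sketch below assumes that form: the probability of observing at least k occurrences among n 8-mers when each occurs with probability p. The function names are hypothetical. Exact summation is only practical for small n; genome-scale counts would need a log-space or library implementation.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p); assumed overrepresentation test.

    Exact summation, suitable only for small n.
    """
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

def bonferroni(p_value, n_tests):
    """Bonferroni correction: multiply by the number of tests, cap at 1."""
    return min(1.0, p_value * n_tests)

# Hypothetical example: an 8-mer seen 2 times among 3, background rate 0.5.
raw = binomial_upper_tail(k=2, n=3, p=0.5)
print(raw, bonferroni(raw, n_tests=4))
```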
Regulatory region prediction
A support vector machine (SVM) was used to predict regulatory regions based on the genomic features of the mCpGs in the regions. To apply an SVM to our dataset, a number of features that represent the entities (regions) in the dataset must be identified and transformed into feature vectors, i.e. multi-dimensional vectors in which each element is a selected feature. The SVM builds a set of hyperplanes that separate the entities into the specified classes using the provided feature vectors. In this research, the dataset for prediction includes the predicted regulatory regions and the same number of random regions, generated as described in the previous section. Five features were used to form the feature vector: mean methylation level, mean methylation variance among 15 cell lines, mean methylation level autocorrelation between two mCpGs, CpG density, and the 8-mer sequence motif P-value around mCpGs in a genomic region. 10-fold cross validation was used to measure the prediction accuracy. In k-fold cross validation, the dataset is randomly partitioned into k subsets of equal size. k-1 subsets are used to train the prediction model and the remaining subset is used to test the model. This process is repeated k times so that each subset is used once for testing. For the SVM, a polynomial kernel of degree 2 with a soft margin of 10 was used. The area under the ROC curve (AUC) was used to evaluate the prediction performance.
In the entropy calculation, $p_i$ is the probability that a sample in S is classified as class i. Discretization was applied to continuous attributes.
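The evaluation procedure (k-fold partitioning and AUC) can be sketched without any machine-learning library. The code below is a minimal illustration with hypothetical function names; it does not implement the SVM itself, and the AUC is computed through its rank interpretation (the probability that a positive example scores higher than a negative one, with ties counted as one half).

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Randomly partition sample indices into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def train_test_splits(n_samples, k, seed=0):
    """Yield (train, test) index lists; each fold is the test set once."""
    folds = k_fold_indices(n_samples, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

def auc(labels, scores):
    """Area under the ROC curve for binary labels (1 = regulatory region)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical classifier scores for two regulatory and two random regions.
print(auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))
```

In practice each training split would be passed to an SVM implementation. With scikit-learn, for instance, one plausible configuration is `SVC(kernel="poly", degree=2, C=10)`, assuming the "soft margin of 10" refers to the C penalty parameter; the software actually used is not stated here.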
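The information gain described in the text can be sketched as follows, assuming the usual Shannon entropy $H(S) = -\sum_i p_i \log_2 p_i$ and $IG(S, F) = H(S) - \sum_v \frac{|S_v|}{|S|} H(S_v)$, where $S_v$ is the subset of samples with feature value v. The base-2 logarithm, the single-threshold discretization, and all function names are assumptions made for illustration.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(labels, feature_values):
    """Entropy reduction of the labels once a discrete feature is known."""
    n = len(labels)
    groups = {}
    for label, value in zip(labels, feature_values):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def discretize(values, threshold):
    """Hypothetical two-bin discretization of a continuous attribute."""
    return ["high" if v > threshold else "low" for v in values]

# Toy example: 1 = regulatory, 0 = random; methylation separates them fully.
labels = [1, 1, 0, 0]
methylation = [0.1, 0.2, 0.8, 0.9]
print(information_gain(labels, discretize(methylation, 0.5)))
```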
This work was supported by funding from the National Institutes of Health [R01EY024580, R01GM111514 to J.Q., R21EY018703 to S.M., R01EY023188 to J.Q. and S.M.]; the Wilmer Core Grant [5P30EY001765 to J.Q.]; unrestricted funding from the Research to Prevent Blindness; the National Research Foundation of Korea grant [2011-0030810]; and the generosity of A. Nixon.
The publication costs for this article were funded by the corresponding author.
This article has been published as part of BMC Genomics Volume 16 Supplement 7, 2015: Selected articles from The International Conference on Intelligent Biology and Medicine (ICIBM) 2014: Genomics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcgenomics/supplements/16/S7.
1. Lee D, Karchin R, Beer M: Discriminative prediction of mammalian enhancers from DNA sequence. Genome Research. 2011, 21: 2167-2180. 10.1101/gr.121905.111.
2. Berman B, Pfeiffer B, Laverty T, Salzberg S, Rubin G, Eisen M, Celniker S: Computational identification of developmental enhancers: conservation and function of transcription factor binding-site clusters in Drosophila melanogaster and Drosophila pseudoobscura. Genome Biology. 2004, 5: R61. 10.1186/gb-2004-5-9-r61.
3. Pennacchio LA, Ahituv N, Moses AM, Prabhakar S, Nobrega MA, Shoukry M, Minovitsky S, Dubchak I, Holt A, Lewis KD, et al: In vivo enhancer analysis of human conserved non-coding sequences. Nature. 2006, 444: 499-502. 10.1038/nature05295.
4. Berman BP, Nibu Y, Pfeiffer BD, Tomancak P, Celniker SE, Levine M, Rubin GM, Eisen MB: Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome. Proc Natl Acad Sci USA. 2001, 99 (2): 757-762.
5. Johnson DS, Mortazavi A, Myers RM, Wold B: Genome-Wide Mapping of in Vivo Protein-DNA Interactions. Science. 2007, 316 (5830): 1497-1502. 10.1126/science.1141319.
6. Park PJ: ChIP-seq: advantages and challenges of a maturing technology. Nature Reviews Genetics. 2009, 10: 669-680. 10.1038/nrg2641.
7. Landt SG, Marinov GK, Kundaje A, Kheradpour P, Pauli F, Batzoglou S, Bernstein BE, Bickel P, Brown JB, Cayting P, et al: ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Research. 2012, 22: 1813-1831. 10.1101/gr.136184.111.
8. Ernst J, Kheradpour P, Mikkelsen TS, Shoresh N, Ward LD, Epstein CB, Zhang X, Wang L, Issner R, Coyne M, et al: Mapping and analysis of chromatin state dynamics in nine cell types. Nature. 2011, 473: 43-51. 10.1038/nature09906.
9. Thurman RE, Rynes E, Humbert R, Vierstra J, Maurano MT, Haugen E, Sheffield NC, Stergachis AB, Wang H, Vernot B, et al: The accessible chromatin landscape of the human genome. Nature. 2012, 489 (7414): 75-82. 10.1038/nature11232.
10. Sheffield NC, Thurman RE, Song L, Safi A, Stamatoyannopoulos JA, Lenhard B, Crawford GE, Furey TS: Patterns of regulatory activity across diverse human cell-types predict tissue identity, transcription factor binding, and long-range interactions. Genome Research. 2013, 23: 777-788. 10.1101/gr.152140.112.
11. Ram O, Goren A, Amit I, Shoresh N, Yosef N, Ernst J, Kellis M, Gymrek M, Issner R, Coyne M, et al: Combinatorial Patterning of Chromatin Regulators Uncovered by Genome-wide Location Analysis in Human Cells. Cell. 2011, 147 (7): 1628-1639. 10.1016/j.cell.2011.09.057.
12. Mikkelsen TS, Ku M, Jaffe DB, Issac B, Lieberman E, Giannoukos G, Alvarez P, Brockman W, Kim TK, Koche RP, et al: Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature. 2007, 448: 553-560. 10.1038/nature06008.
13. Pelizzola M, Ecker J: The DNA methylome. FEBS Letters. 2011, 585 (13): 1994-2000. 10.1016/j.febslet.2010.10.061.
14. Reik W, Dean W, Walter J: Epigenetic Reprogramming in Mammalian Development. Science. 2001, 293: 1089-1093. 10.1126/science.1063443.
15. Baylin S: DNA methylation and gene silencing in cancer. Nature Clinical Practice Oncology. 2005, 2: s4-s11.
16. Lister R, Pelizzola M, Dowen RH, Hawkins D, Hon G, Tonti-Filippini J, Nery JR, Lee L, Ye Z, Ngo QM, et al: Human DNA methylomes at base resolution show widespread epigenomic differences. Nature. 2009, 462: 315-322. 10.1038/nature08514.
17. Meissner A, Gnirke A, Bell GW, Ramsahoye B, Lander ES, Jaenisch R: Reduced representation bisulfite sequencing for comparative high-resolution DNA methylation analysis. Nucleic Acids Research. 2005, 33: 5868-5877. 10.1093/nar/gki901.
18. Irizarry RA, Ladd-Acosta C, Wen B, Wu Z, Montano C, Onyango P, Cui H, Gabo K, Rongione M, Webster M, et al: The human colon cancer methylome shows similar hypo- and hypermethylation at conserved tissue-specific CpG island shores. Nature Genetics. 2009, 41 (2): 178-186. 10.1038/ng.298.
19. Touleimat N, Tost J: Complete pipeline for Infinium Human Methylation 450k BeadChip data processing using subset quantile normalization for accurate DNA methylation estimation. Future Medicine. 2012, 4 (3): 325-341.
20. Burger L, Gaidatzis D, Schübeler D, Stadler MB: Identification of active regulatory regions from DNA methylation data. Nucleic Acids Research. 2013, 41 (16): e155. 10.1093/nar/gkt599.
21. Stadler MB, Murr R, Burger L, Ivanek R, Lienert F, Schöler A, Nimwegen Ev, Wirbelauer C, Oakeley EJ, Gaidatzis D, et al: DNA-binding factors shape the mouse methylome at distal regulatory regions. Nature. 2011, 480: 490-495.
22. Lister R, Pelizzola M, Kida YS, Hawkins D, Nery JR, Hon G, Antosiewicz-Bourget J, O'Malley R, Castanon R, Klugman S, et al: Hotspots of aberrant epigenomic reprogramming in human induced pluripotent stem cells. Nature. 2011, 471: 68-75. 10.1038/nature09798.
23. Pique-Regi R, Degner JF, Pai AA, Gaffney DJ, Gilad Y, Pritchard JK: Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data. Genome Research. 2010, 21: 447-455.
24. Choy MK, Movassagh M, Goh HG, Bennett MR, Down TA, Foo RS: Genome-wide conserved consensus transcription factor binding motifs are hyper-methylated. BMC Genomics. 2010, doi: 10.1186/1471-2164-1111-1519
25. Eckhardt F, Lewin J, Cortese R, Rakyan VK, Attwood J, Burger M, Burton J, Cox TV, Davies R, Down TA, et al: DNA methylation profiling of human chromosomes 6, 20 and 22. Nature Genetics. 2006, 38: 1378-1385. 10.1038/ng1909.
26. Hodges E, Smith AD, Kendall J, Xuan Z, Ravi K, Rooks M, Zhang MQ, Ye K, Bhattacharjee A, Brizuela L, et al: High definition profiling of mammalian DNA methylation by array capture and single molecule bisulfite sequencing. Genome Research. 2009, 19: 1593-1605. 10.1101/gr.095190.109.
27. Spruijt CG, Gnerlich F, Smit AH, Pfaffeneder T, Jansen PWTC, Bauer C, Münzel M, Wagner M, Müller M, Khan F, et al: Dynamic Readers for 5-(Hydroxy)Methylcytosine and Its Oxidized Derivatives. Cell. 2013, 152 (5): 1146-1159. 10.1016/j.cell.2013.02.004.
28. Hu S, Wan J, Su Y, Song Q, Zeng Y, Nguyen HN, Shin J, Cox E, Rho HS, Woodard C, et al: DNA methylation presents distinct binding sites for human transcription factors. eLife. 2013, 2: e00726.
29. Chodavarapu RK, Feng S, Bernatavichute YV, Chen PY, Stroud H, Yu Y, Hetzel JA, Kuo F, Kim J, Cokus SJ, et al: Relationship between nucleosome positioning and DNA methylation. Nature. 2010, 466: 388-392. 10.1038/nature09147.
30. Felle M, Hoffmeister H, Rothammer J, Fuchs A, Exler JH, Längst G: Nucleosomes protect DNA from DNA methylation in vivo and in vitro. Nucleic Acids Res. 2011, 39 (16): 6956-6696. 10.1093/nar/gkr263.
31. Portela A, Liz J, Nogales V, Setién F, Villanueva A, Esteller M: DNA methylation determines nucleosome occupancy in the 5′-CpG islands of tumor suppressor genes. Oncogene. 2013, doi:10.1038/onc.2013.1162
32. Shukla S, Kavak E, Gregory M, Imashimizu M, Shutinoski B, Kashlev M, Oberdoerffer P, Sandberg R, Oberdoerffer S: CTCF-promoted RNA polymerase II pausing links DNA methylation to splicing. Nature. 2011, 479: 74-79. 10.1038/nature10442.
33. Wan J, Oliver V, Zhu H, Zack D, Qian J, Merbs S: Integrative analysis of tissue-specific methylation and alternative splicing identifies conserved transcription factor binding motifs. Nucleic Acids Research. 2013, 41 (18): 8503-8514. 10.1093/nar/gkt652.
34. Maunakea AK, Chepelev I, Cui K, Zhao K: Intragenic DNA methylation modulates alternative splicing by recruiting MeCP2 to promote exon recognition. Cell Research. 2013, 23: 1256-1269. 10.1038/cr.2013.110.
35. Raney BJ, Cline MS, Rosenbloom KR, Dreszer TR, Learned K, Barber GP, Meyer LR, Sloan CA, Malladi VS, Roskin KM, et al: ENCODE whole-genome data in the UCSC genome browser (2011 update). Nucleic Acids Research. 2011, 39: D871-D875. 10.1093/nar/gkq1017.
36. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods. 2008, 5: 621-628. 10.1038/nmeth.1226.
37. Meyer LR, Zweig AS, Hinrichs AS, Karolchik D, Kuhn RM, Wong M, Sloan CA, Rosenbloom KR, Roe G, Rhead B, et al: The UCSC Genome Browser database: extensions and updates 2013. Nucleic Acids Research. 2012, 41 (D1): D64-D69.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.