Analysis method of epigenetic DNA methylation to dynamically investigate the functional activity of transcription factors in gene expression
© Feng et al.; licensee BioMed Central Ltd. 2012
Received: 18 May 2012
Accepted: 28 September 2012
Published: 5 October 2012
DNA methylation is a fundamental component of epigenetic modification and is intimately involved in the regulation of gene expression. One important DNA methylation pathway reduces the ability of transcription factors to bind to gene promoter regions. Although many experiments have been designed to measure genome-wide DNA methylation levels at high resolution, the effect of these different DNA methylation levels on transcription factor binding ability remains poorly understood. We have, therefore, developed a method to quantitatively explore the extent to which DNA methylation levels can significantly reduce or even abolish the binding of certain transcription factors, resulting in reduced or absent expression of flanking genes. This method allows transcription factors that are functionally active in gene expression to be investigated.
The method is based on a general model that describes the relationship between DNA methylation and transcription factor binding ability in terms of intrinsic component properties. The model parameters can be optimized through relative analysis of recognized transcription factor binding status and gene expression profiling. Once the models are fixed, transcription factors that are functionally active in the regulation of gene expression and affected by epigenetic DNA methylation can be identified and subsequently confirmed. The method identified eleven apparently functionally active transcription factors in SH-SY5Y neuroblastoma cells.
Unlike gene regulatory elements, epigenetic modifications can change dynamically in response to signals from physical, biological and social environments. Our proposed method is therefore designed to provide a dynamic assessment of functionally active transcription factors. With the information deduced from our method, we can predict transcription factor binding status in promoter regions to further investigate how a particular gene is regulated by a specific group of transcription factors organized in a particular pattern. This will be helpful in the diagnosis and development of treatment for numerous diseases, including cancer. Although the method only investigates DNA methylation, it has the potential to be applied to more epigenetic factors, such as histone modification.
Keywords: Method, Epigenetics, DNA methylation, Activity, Transcription factor, Gene expression
Epigenetics is defined as the study of heritable modifications to gene function that occur without alterations in DNA sequences. Epigenetic modifications consist mainly of DNA methylation, histone modifications, chromatin reconstruction, and expression of non-coding RNA. Epigenetic modifications are widely recognized to regulate tissue-specific gene expression, genomic imprinting and X-chromosome inactivation. In addition, the key role of epigenetic modifications during cellular differentiation, development and organogenesis has been highlighted by the identification of many epigenetic biomarkers in human diseases [1, 2], such as neuroblastic tumors [3].
The occurrence of many human cancers results from the accumulation of both genetic and epigenetic alterations. While genetic alterations are nearly impossible to reverse, epigenetic alterations can dynamically respond to signals from physical, biological and social environments [4–6]. This characteristic underlies the importance of epigenetic research in various cellular processes, especially in gene expression regulation. Although epidemiological data provide evidence that epigenetic modifications and the environment interact directly to influence gene expression, the mechanism by which epigenetic changes modulate gene expression is still poorly understood.
Regulation of gene expression by transcription factors is a fundamental mechanism. Through the interplay with transcription factors, epigenetic modifications such as DNA methylation are able to regulate gene expression [7–11]. For example, high methylation levels in promoter regions tend to weaken the binding ability of associated transcription factors and cause reduced expression of adjacent genes [12, 13]. Although there are many qualitative observations about the effect of DNA methylation on gene regulation, few methods have been developed to assess the effect in a measurable way. Here, we propose a method to evaluate how each transcription factor affects gene expression under a specific pattern of epigenetic DNA methylation levels, which is then used to determine the functional activity of the transcription factor. We describe a general model of how epigenetic DNA methylation affects transcription factor binding ability, in which several model parameters provide sufficient freedom for different circumstances. Through the relative analysis of recognized transcription factor binding status and gene expression profiling, a model for each transcription factor can be fixed with concrete parameter values. Then, with the deduced models, transcription factors affected by DNA methylation and functionally active in gene expression can be investigated. The proposed method has the capacity to dynamically reflect functions of transcription factors in a temporal and spatial manner.
In addition to gene sequence-driven gene regulatory mechanisms, epigenetic modifications, such as DNA methylation, also participate in the regulation of gene expression induced by signals from the environment. Here, based on genome-wide DNA methylation profiling in gene promoter regions, we present a method to investigate transcription factors that are affected by DNA methylation and that are functionally active in gene expression.
Transcription factor match score
As a functional protein, a transcription factor has an intrinsic tendency to bind to specific DNA sequences, and we define a value termed ‘transcription factor match score’ to evaluate this binding ability for each transcription factor in the promoter region of each gene. The TRANSFAC database produced by BIOBASE provides position weight matrices (PWMs) for every transcription factor. In these matrices, each row consists of four weights representing the different capabilities to bind nucleotides A, C, G and T, respectively. Using these PWMs, each gene promoter region can be scanned nucleotide by nucleotide with a sliding window to compute transcription factor match scores.
Hence, for one transcription factor, a collection of match scores can be calculated with respect to every gene promoter region.
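The scanning step can be illustrated with a minimal sketch. The PWM values, the log-odds scoring against a uniform background, and the choice of the best window as the promoter-level score are all assumptions made for illustration; they are not taken from TRANSFAC or from the method itself.

```python
import math

# Hypothetical 4-position PWM: one row per motif position, weights for A, C, G, T.
# The numbers are illustrative only and are not TRANSFAC data.
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]

def window_score(pwm, window):
    """Sum of log-odds against a uniform 0.25 background (assumed scoring rule)."""
    return sum(math.log2(row[base] / 0.25) for row, base in zip(pwm, window))

def scan_promoter(pwm, promoter):
    """Slide the PWM along the promoter one nucleotide at a time."""
    width = len(pwm)
    return [
        (start, window_score(pwm, promoter[start:start + width]))
        for start in range(len(promoter) - width + 1)
    ]

def match_score(pwm, promoter):
    """Best-scoring window, used here as the promoter-level match score (assumption)."""
    return max(scan_promoter(pwm, promoter), key=lambda hit: hit[1])

promoter = "TTAGCTGG"  # toy promoter sequence
best_start, best_score = match_score(PWM, promoter)
print(best_start, round(best_score, 3))
```

Repeating `match_score` over every promoter gives the collection of match scores for one transcription factor.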
Although match scores can approximate the opportunity for a transcription factor to bind to a gene promoter region, it is also meaningful to determine a threshold for match scores to evaluate whether the transcription factor binds and regulates the transcription of specific genes.
Transcription factor match score threshold
As described in the method proposed by Hertzberg et al. [14], for a given transcription factor, a Z-score, which considers the relationship between transcription factor match scores and gene expression levels, was calculated to infer the match score threshold.
The Z-score reflects the extent to which the average expression of the selected target genes differs from the average expression of all genes. In other words, a larger absolute Z-score value means a stronger relationship between transcription factor match scores and expression of the selected genes, and that these genes are more likely to be regulated by the same transcription factor. Therefore, with different thresholds for transcription factor match scores, we can obtain different groups of transcription factor target genes and subsequently different Z-score values. Finally, the best threshold is the one that yields the maximal Z-score value (if positive) or the minimal Z-score value (if negative); the corresponding Z-score for the particular transcription factor is called Z m .
However, because it ignores epigenetic modifications, the match score defined above relies on DNA sequence alone to decide whether a transcription factor binds and regulates the expression of certain genes. This makes the resulting Z-score values inaccurate for evaluating transcription factor binding status in gene promoter regions. We have therefore improved the calculation of the transcription factor match score by adding the effect of epigenetic modifications. Among the many epigenetic modifications, only DNA methylation was considered, because of the requirement for high precision and high resolution data.
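The threshold search can be sketched as follows. The exact Z-score formula of the cited method is not reproduced in this text, so the sketch assumes a standardized difference between the mean expression of the target genes and the mean of all genes; the function names, the `min_targets` cutoff and the toy data are hypothetical.

```python
import math
import statistics

def z_score(target_expr, all_expr):
    """Assumed form: (target mean - global mean) / (global sd / sqrt(n_targets))."""
    mu = statistics.mean(all_expr)
    sigma = statistics.pstdev(all_expr)
    n = len(target_expr)
    return (statistics.mean(target_expr) - mu) / (sigma / math.sqrt(n))

def best_threshold(scores, expr, min_targets=2):
    """Try each observed score as a threshold; keep the Z-score of largest magnitude."""
    best = None
    for t in sorted(set(scores)):
        targets = [e for s, e in zip(scores, expr) if s >= t]
        # Skip thresholds that select too few genes or every gene.
        if len(targets) < min_targets or len(targets) == len(expr):
            continue
        z = z_score(targets, expr)
        if best is None or abs(z) > abs(best[1]):
            best = (t, z)
    return best  # (threshold, Z_m)

# Toy data: genes with the two highest match scores are the most expressed.
scores = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
expr = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2]
threshold, z_m = best_threshold(scores, expr)
print(threshold, round(z_m, 3))
```

Taking the largest absolute value covers both cases in the text: the maximum when Z is positive and the minimum when it is negative.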
General model of DNA methylation effect
Transcription factor binding score
The binding score combines two quantities: A ijk , the sequence match value of the ith transcription factor, and E ijk , the effect of DNA methylation on the binding ability of the ith transcription factor, both evaluated at the kth putative binding site in the promoter region of the jth gene.
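The model's formula is not reproduced in this text. As a rough illustration only, the sketch below assumes that the methylation effect is an inverse logistic (S-shaped) function of the methylation level with a center C and a steepness S, and that the binding score is the product of the sequence match value and that effect. Both the functional form and the product rule are assumptions, as is the mirrored "antisense" curve.

```python
import math

def methylation_effect(m, center, steepness, sense=True):
    """Assumed inverse S-shaped (logistic) effect in [0, 1].

    sense=True : the effect falls as the methylation level m rises.
    sense=False: mirrored curve, the effect rises with m (antisense control).
    """
    x = (m - center) / steepness
    if not sense:
        x = -x
    # Numerically stable evaluation of 1 / (1 + exp(x)).
    if x >= 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))

def binding_score(match_value, m, center, steepness, sense=True):
    """Assumed combination rule: match value scaled by the methylation effect."""
    return match_value * methylation_effect(m, center, steepness, sense)

# A weakly methylated site keeps its match score; a highly methylated one loses it.
print(binding_score(8.0, -1.0, center=-0.25, steepness=0.01))
print(binding_score(8.0, 1.0, center=-0.25, steepness=0.01))
```

With a small steepness the curve is close to a step at the center, so the center dominates the behaviour of the model.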
Similar to the transcription factor match score, by threshold analysis of the transcription factor binding score, a maximal Z-score value (if positive) or a minimal Z-score value (if negative), known as Z m , can also be calculated based on the relative analysis of transcription factor binding scores and gene expression profiles. However, in contrast to only one Z m value based on the match score, there are many Z m values for a transcription factor when different compositions of parameters C and S are selected in the model to calculate different binding scores. Then, when parameters C and S of the model are fixed to obtain an optimized Z m value, the effect of methylation on transcription factor binding ability can be quantitatively determined.
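The parameter search can be sketched as a plain grid search. This is a minimal illustration under stated assumptions: one putative site per gene, an inverse logistic methylation effect, a product rule for the binding score, and a standardized mean difference as the Z-score. None of these forms is confirmed by the text, and the steepness values and toy data are hypothetical.

```python
import math
import statistics

def effect(m, c, s):
    """Assumed inverse logistic effect of methylation level m (center c, steepness s)."""
    x = min((m - c) / s, 700.0)  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(x))

def z_m(scores, expr):
    """Extreme Z-score over all score thresholds (assumed Z-score form)."""
    mu, sigma = statistics.mean(expr), statistics.pstdev(expr)
    best = 0.0
    for t in sorted(set(scores)):
        targets = [e for sc, e in zip(scores, expr) if sc >= t]
        if 2 <= len(targets) < len(expr):
            z = (statistics.mean(targets) - mu) / (sigma / math.sqrt(len(targets)))
            if abs(z) > abs(best):
                best = z
    return best

def grid_search(match, meth, expr, centers, steepnesses):
    """Return the (C, S, Z_m) combination with the largest absolute Z_m."""
    best = None
    for c in centers:
        for s in steepnesses:
            scores = [a * effect(m, c, s) for a, m in zip(match, meth)]
            z = z_m(scores, expr)
            if best is None or abs(z) > abs(best[2]):
                best = (c, s, z)
    return best

# Toy data: genes 0, 2 and 4 have strong motifs but are methylated and silent.
match = [9.0, 8.0, 7.0, 6.0, 5.0, 4.0]
meth = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
expr = [1.0, 3.0, 1.1, 3.2, 0.9, 2.9]

centers = [round(-2 + 0.05 * i, 2) for i in range(81)]  # -2 to 2 in steps of 0.05
steepnesses = [0.01, 0.1, 1.0]                          # hypothetical values

z_o = z_m(match, expr)  # sequence-only score, no methylation effect
best_c, best_s, z_p = grid_search(match, meth, expr, centers, steepnesses)
print(round(z_o, 3), best_c, best_s, round(z_p, 3))
```

In this toy case the sequence-only scores rank silent genes highly, and the extreme Z-score grows in magnitude once methylated sites are down-weighted.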
Functionally active transcription factors
Depending on how the effect of DNA methylation on transcription factor binding ability is described, three Z m values can be calculated to investigate functionally active transcription factors. Without considering a DNA methylation effect, Z m-o is computed from transcription factor match scores. In contrast, when a DNA methylation effect is considered using our proposed model, Z m-p is calculated from transcription factor binding scores from the sense orientation and Z m-q is calculated from transcription factor binding scores from the antisense orientation. Furthermore, with different compositions of model parameters, a group of Z m-p and Z m-q values can be calculated for each transcription factor. Then, if absolute Z m-p values are found to be clearly larger than the absolute Z m-o value and absolute Z m-q values are always less than the absolute Z m-o value, the transcription factor is considered to be affected by DNA methylation and functionally active.
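The decision rule can be written as a short predicate. The sketch reads "absolute Z m-p values are larger" as a condition on the extreme Z m-p value, and the `margin` parameter is a hypothetical stand-in for how much larger it must be, since the text gives no numerical cutoff.

```python
def is_functionally_active(z_o, z_p_values, z_q_values, margin=0.0):
    """Apply the rule described in the text.

    z_o        : Z m-o, from sequence-only match scores
    z_p_values : Z m-p values over the parameter grid (sense model)
    z_q_values : Z m-q values over the parameter grid (antisense model)
    margin     : hypothetical cutoff for how much larger |Z m-p| must be
    """
    sense_improves = max(abs(z) for z in z_p_values) > abs(z_o) + margin
    antisense_never_improves = all(abs(z) < abs(z_o) for z in z_q_values)
    return sense_improves and antisense_never_improves

# 12.30 and 13.923 are the E2F1 values reported in the text;
# every other number here is made up for illustration.
print(is_functionally_active(12.30, [11.8, 13.923, 9.5], [12.1, 11.0, 10.2]))
```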
Results and discussion
Rett syndrome, a condition frequently seen in cases of developed neuroblastoma, is caused by abnormal interactions between binding proteins and methylated DNA in promoter regions. To evaluate the utility of our proposed method for the investigation of active transcription factors with respect to DNA methylation, a dataset from the SH-SY5Y thrice-cloned neuroblastoma cell line (made by ATCC) was used. As indicated, the dataset includes two parts:
Part 1: DNA methylation levels in promoter regions of SH-SY5Y neuroblastoma cells, assayed with the NimbleGen-1500b-Promoter-Array in the MeDIP experiment, were retrieved (GSE9568). The promoter regions covered by the array span 1200 bp upstream and 300 bp downstream of gene transcriptional start sites. Log2-ratios of the Cy5-labeled test sample versus the Cy3-labeled reference sample were calculated to represent DNA methylation levels. Then, methylation levels of every specific transcription factor binding site (about 10 bp) in all promoter regions were calculated using the Batman algorithm [24].
Part 2: Gene expression levels in the same SH-SY5Y cell line under the same conditions, measured using Affymetrix-HG-U133-plus2.0 GeneChips, were retrieved (GSE4600). Using human RefSeq gene annotations downloaded from the server at UCSC, 10065 available gene expression levels were identified. To enhance observation of the interaction between DNA methylation and transcription factors in the regulation of gene expression, we filtered out genes with very low expression levels.
Differences in transcription factor binding abilities with and without consideration of a methylation effect
PWMs of 459 human transcription factors were extracted from the TRANSFAC database. With these PWMs, match scores for each transcription factor in all gene promoter regions were calculated on human DNA sequences from UCSC. Then, based on DNA methylation levels and gene expression data in SH-SY5Y cells, we calculated for each transcription factor the Z m-o value, without consideration of a DNA methylation effect, and the Z m-p and Z m-q values, with consideration of a DNA methylation effect, using our proposed sense and antisense models, respectively. Across all transcription factors, the extreme Z m-p values differed significantly from the Z m-o values (Wilcoxon test, P < 2.2e-16), indicating that the distribution of extreme Z m-p values shifts when a DNA methylation effect on transcription factor binding ability is considered.
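The text does not say which variant of the Wilcoxon test was used. Since each transcription factor contributes one Z m-o value and one extreme Z m-p value, the sketch below assumes a paired signed-rank test with a normal approximation, without tie or continuity corrections. The input values are made up and are not results from the study.

```python
import math

def wilcoxon_signed_rank(x, y):
    """Paired Wilcoxon signed-rank test, two-sided, normal approximation."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank for the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return w_plus, z, p

# Made-up paired values standing in for extreme Z m-p and Z m-o per factor.
z_p_values = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
z_o_values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
w_plus, z_stat, p_value = wilcoxon_signed_rank(z_p_values, z_o_values)
print(w_plus, round(z_stat, 3), p_value < 0.01)
```

For a real analysis, a statistics package with exact p-values and tie handling would be the safer choice.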
Investigation of functionally active transcription factors
Among the 459 human transcription factors, E2F1 was reported to be abundant in SH-SY5Y cells and to be affected by DNA methylation; therefore, we first show the analysis of E2F1 in detail.
The effect of methylation was then considered with the sense part of the model, and Z m-p values were calculated while the model parameters were adjusted. The extreme Z m-p value was found to be 13.923 when the model parameters were C=-0.25 and S=0.01. The search over Z m-p values is shown in Figure 2b (solid line).
In Figure 2b, the X axis is the center C of the general model, from -2 to 2 and stepped by 0.05, and the Y axis is the corresponding Z m values (as steepness S of the general model contributes little to the effect of the model compared with C, to clearly exhibit the searching process of Z m values, S is set at a fixed value of 0.01). The horizontal solid line at 12.30 indicates the Z m-o value without consideration of a methylation effect.
In Figure 2b, as the value of C is increased from -2 to gradually strengthen the methylation effect, the Z m-p value begins to rise and soon exceeds the Z m-o value. This means that a more reasonable result is obtained when a methylation effect on transcription factor binding ability is considered with the sense part of the model. The Z m-p value peaks at 13.92 (13% higher than the Z m-o value) when C is -0.25. After that, the Z m-p value drops rapidly as the effect of DNA methylation is further increased.
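The search grid and the reported gain can be checked with a few lines. The grid bounds, the step and the two Z values come from the text; the variable names are illustrative.

```python
# Center C from -2 to 2 in steps of 0.05; rounding avoids floating-point drift.
centers = [round(-2 + 0.05 * i, 2) for i in range(81)]

z_m_o = 12.30  # without a methylation effect
z_m_p = 13.92  # extreme value with the sense part of the model

relative_gain = (z_m_p - z_m_o) / z_m_o
print(len(centers), centers[0], centers[-1])
print(f"{relative_gain:.1%}")  # about 13%
```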
We also used the antisense part of the model to analyze Z m-q values along with adjustment of model parameters. The result is shown in Figure 2b (dashed line). While increasing the value of C from -2 to gradually weaken the methylation effect, the Z m-q value reduces and remains lower than the Z m-o value. This means the antisense part depicts the effect of methylation on E2F1 in an incorrect way. According to our proposed method, we determined that E2F1 was affected by DNA methylation and was functionally active in gene expression in SH-SY5Y cells.
Top 10 positive transcription factors in SH-SY5Y cells
Negative transcription factor in SH-SY5Y cells
In the method described here, methylation effects on binding abilities of different transcription factors need to be described for each transcription factor; a model is designed with particular independent parameters for each transcription factor. In future research, we will improve the performance of the method by considering transcription factor clustering and multiple transcription factors acting at their binding sites in modules.
In this study, we have proposed a method to detect active transcription factors in specific cell types by analyzing the interactions between epigenetic methylation patterns in gene promoter regions and the expected binding of transcription factors. In the method, we designed a general model to quantitatively analyze the effect of methylation to suppress transcription factor binding ability in promoter regions, where an inverse S-function was adopted to depict the effect of methylation and the model parameters were fixed through calculation of the relationship between transcription factor binding scores in promoter regions and gene expression levels. Based on the model, the case analysis of data from a neuroblastoma cell line successfully showed that 11 transcription factors were obviously affected by methylation of promoter regions and were functionally active in gene expression.
Besides detection of active transcription factors, information deduced from the model can indicate transcription factor binding status in promoter regions to further investigate how a particular gene is regulated by a specific group of transcription factors organized in a particular pattern. This should be helpful for diagnosis and for the development of treatments for numerous diseases, including various cancers. The prediction of transcription factor binding sites produces many false positives; however, by combining static genetic and dynamic epigenetic information together, our proposed approach is capable of effectively decreasing the false positive rate.
So far, we have only considered DNA methylation in the proposed method because of the requirement for high precision and high resolution data. But, the method has the potential to consider more epigenetic factors, such as histone modifications, when the quality of data improves with the development of experimental technology.
We thank Yunlong Liu for discussion of this research. This work was funded by grants from the National Natural Science Foundation of China (61071174) and from Fundamental Research Funds for the Central Universities (HEUCFT1102).
1. Robertson KD: DNA methylation and human disease. Nat Rev Genet. 2005, 6 (8): 597-610. 10.1038/nrg1655.
2. Guerrero-Bosagna C, Skinner MK: Environmentally induced epigenetic transgenerational inheritance of phenotype and disease. Mol Cell Endocrinol. 2012, 354 (1–2): 3-8.
3. Banelli B, Di Vinci A, Gelvi I, Casciano I, Allemanni G, Bonassi S: DNA methylation in neuroblastic tumors. Cancer Lett. 2005, 228 (1–2): 37-41.
4. Bollati V, Baccarelli A: Environmental epigenetics. Heredity. 2010, 105 (1): 105-112. 10.1038/hdy.2010.2.
5. Davies PC: The epigenome and top-down causation. Interface Focus. 2012, 2 (1): 42-48. 10.1098/rsfs.2011.0070.
6. Costenbader KH, Gay S, Riquelme ME, Iaccarino L, Doria A: Genes, epigenetic regulation and environmental factors: which is the most relevant in developing autoimmune diseases? Autoimmun Rev. in press.
7. Yamagata Y, Szabó P, Szüts D, Bacquet C, Arànyi T, Páldi A: Rapid turnover of DNA methylation in human cells. Epigenetics. 2012, 7 (2): 141-145. 10.4161/epi.7.2.18906.
8. Zilberman D, Gehring M, Tran RK, Ballinger T, Henikoff S: Genome-wide analysis of Arabidopsis thaliana DNA methylation uncovers an interdependence between methylation and transcription. Nat Genet. 2007, 39 (1): 61-69. 10.1038/ng1929.
9. Palacios D, Summerbell D, Rigby PW, Boyes J: Interplay between DNA methylation and transcription factor availability: implications for developmental activation of the mouse myogenin gene. Mol Cell Biol. 2010, 30 (15): 3805-3815. 10.1128/MCB.00050-10.
10. Lienert F, Wirbelauer C, Som I, Dean A, Mohn F, Schübeler D: Identification of genetic elements that autonomously determine DNA methylation states. Nat Genet. 2011, 43 (11): 1091-1097. 10.1038/ng.946.
11. Choy MK, Movassagh M, Goh HG, Bennett MR, Down TA, Foo RS: Genome-wide conserved consensus transcription factor binding motifs are hyper-methylated. BMC Genomics. 2010, 11: 519-528. 10.1186/1471-2164-11-519.
12. Jones PA, Takai D: The role of DNA methylation in mammalian epigenetics. Science. 2001, 293 (5532): 1068-1070. 10.1126/science.1063852.
13. Kapoor A, Agius F, Zhu JK: Preventing transcriptional gene silencing by active DNA demethylation. FEBS Lett. 2005, 579 (26): 5889-5898. 10.1016/j.febslet.2005.08.039.
14. Hertzberg L, Izraeli S, Domany E: STOP: searching for transcription factor motifs using gene expression. Bioinformatics. 2007, 23 (14): 1737-1743. 10.1093/bioinformatics/btm249.
15. Gibney ER, Nolan CM: Epigenetics and gene expression. Heredity. 2010, 105: 4-13. 10.1038/hdy.2010.54.
16. Campanero MR, Armstrong MI, Flemington EK: CpG methylation as a mechanism for the regulation of E2F activity. PNAS. 2000, 97 (12): 6481-6486. 10.1073/pnas.100340697.
17. Comb M, Goodman HM: CpG methylation inhibits proenkephalin gene expression and binding of the transcription factor AP-2. Nucleic Acids Res. 1990, 18 (13): 3975-3982. 10.1093/nar/18.13.3975.
18. Mancini DN, Singh SM, Archer TK, Rodenhiser DI: Site-specific DNA methylation in the neurofibromatosis (NF1) promoter interferes with binding of CREB and SP1 transcription factors. Oncogene. 1999, 18 (28): 4108-4119. 10.1038/sj.onc.1202764.
19. Radtke F, Hug M, Georgiev O, Matsuo K, Schaffner: Differential sensitivity of zinc finger transcription factors MTF-1, Sp1 and Krox-20 to CpG methylation of their binding sites. Biol Chem Hoppe Seyler. 1996, 377 (1): 47-56. 10.1515/bchm3.1996.377.1.47.
20. Fang F, Fan S, Zhang X, Zhang MQ: Predicting methylation status of CpG islands in the human brain. Bioinformatics. 2006, 22 (18): 2204-2209. 10.1093/bioinformatics/btl377.
21. Yin H, Blanchard KL: DNA methylation represses the expression of the human erythropoietin gene by two different mechanisms. Blood. 2000, 95 (1): 111-119.
22. Gu P, Le Menuet D, Chung AC, Cooney AJ: Differential recruitment of methylated CpG binding domains by the orphan receptor GCNF initiates the repression and silencing of Oct4 expression. Mol Cell Biol. 2006, 26 (24): 9471-9483. 10.1128/MCB.00898-06.
23. Zechel C: The Germ Cell Nuclear Factor (GCNF). Mol Reprod Dev. 2005, 72: 550-556. 10.1002/mrd.20377.
24. Thomas AD, Vardhman KR, Daniel JT, et al: A Bayesian deconvolution strategy for immunoprecipitation-based DNA methylome analysis. Nat Biotechnol. 2008, 7: 779-785.
25. Kim K, Oh M, Ki H, Wang T, Bareiss S, Fini ME: Identification of E2F1 as a positive transcriptional regulator for δ-catenin. Biochem Biophys Res Commun. 2008, 369 (2): 414-420. 10.1016/j.bbrc.2008.02.069.
26. Bozek K, Relógio A, Kielbasa SM, Heine M, Dame C, Kramer A: Regulation of clock-controlled genes in mammals. PLoS One. 2009, 4 (3): e4882. 10.1371/journal.pone.0004882.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.