Characterizing nucleosome dynamics from genomic and epigenetic information using rule induction learning
© Le et al; licensee BioMed Central Ltd. 2009
Published: 3 December 2009
Eukaryotic genomes are packaged into chromatin, a compact structure containing fundamental repeating units, the nucleosomes. The mobility of nucleosomes plays important roles in many DNA-related processes by regulating the accessibility of regulatory elements to biological machineries. Although various factors, such as DNA sequences, histone modifications, and chromatin remodelling complexes, are known to affect nucleosome stability, the mechanisms by which they regulate this stability are still unclear.
In this paper, we propose a novel computational method based on rule induction learning to characterize nucleosome dynamics using both genomic and histone modification information. When applied to S. cerevisiae data, our method produced a total of 98 rules characterizing nucleosome dynamics on chromosome III and promoter regions. Analyzing these rules, we discovered that some DNA motifs and post-translational modifications of histone proteins play significant roles in regulating nucleosome stability. Notably, these DNA motifs are strong determinants of nucleosome forming and inhibiting potential, and these histone modifications are strongly related to transcriptional activities, i.e. activation and repression. We also found some new patterns which may reflect the cooperation between these two factors in regulating the stability of nucleosomes.
DNA motifs and histone modifications can individually and, in some cases, cooperatively regulate nucleosome stability. This suggests additional insights into mechanisms by which cells control important biological processes, such as transcription, replication, and DNA repair.
The genetic material of eukaryotic organisms is packaged into chromatin inside the cell nucleus. This compact structure resembles a beads-on-a-string fiber containing fundamental repeating units, the nucleosomes. Each nucleosome is composed of 147 bp of DNA wrapped 1.65 turns around an octamer of histone proteins consisting of a central (H3-H4)2 tetramer flanked on both sides by two H2A-H2B dimers. Since it was first recognized, there has been increasing evidence that chromatin plays a role far beyond DNA compaction. By burying cis-regulatory elements under histone proteins and/or modifying related epigenetic information, chromatin imposes ubiquitous and profound effects on many DNA-based processes, including transcription, DNA repair and replication. To ensure faithful copying of both genetic and epigenetic information during replication, or to facilitate the binding of Transcription Factors (TFs) to regulatory elements during transcription in the context of chromatin, cells have developed complicated biological pathways. In these pathways, by regulating nucleosome stability, cells can control the accessibility of underlying DNA sequences to biological machineries. For example, in replication, during the process known as parental histone segregation, pre-existing nucleosomes located ahead of replication forks are transiently disrupted from parental DNA strands and later transferred onto nascent DNA [3, 5]. In transcription, moving nucleosomes to different translational positions is known as one way to change the accessibility of nucleosomal DNA to TFs. Also, promoter regions of actively transcribed genes are usually free of nucleosomes [7, 8]. Understanding how cells regulate nucleosome stability will therefore give additional insights into the mechanisms of many important biological processes.
Nucleosome stability can be regulated by many factors, such as DNA sequences, histone modifications and histone variants, and chromatin remodelling complexes. For example, DNA sequence is known to be a reliable determinant of nucleosome preference, which can be used to predict nearly 50% of nucleosome positions, so it is likely to be an important factor in favouring or disfavouring nucleosome eviction. Histone variant H2A.Z (Htz1) is found to be preferentially enriched at promoters where some nucleosomes have to be quickly removed upon transcriptional activation. Also, acetylated histones are shown to be easily dissociated from DNA [11, 12]. Chromatin remodelling complexes, such as Swi/Snf, act in concert with histone chaperones (e.g. Asf1, Nap1) to displace histones from their original positions. Although the list of factors is fairly well known, the mechanisms by which they act to mobilize nucleosomes are still unclear.
Owing to recent advanced profiling techniques, such as ChIP-on-Chip and ChIP-Seq, we now have an increasing amount of information about how nucleosomes and various kinds of histone modifications are distributed over the genomes of many organisms, including yeast, Drosophila, and human [13, 8]. This opens up a chance for thorough investigation of nucleosome organization, its regulatory mechanisms and functions. Until now, there have been many works, both experimental and computational, concentrating on revealing the effects of the factors stated above on nucleosome distribution [10, 13, 19, 20], but most of them share some common drawbacks. First, they mainly considered the effect of each factor separately while bypassing their combinatorial effects on nucleosome distribution. Second, although the distribution of destabilized nucleosomes is usually inhomogeneous throughout the genome and is known to be strongly related to transcriptional activities, it is still not well characterized compared with that of stable nucleosomes.
There have been several efforts to overcome these limitations. For example, Rippe et al. and Schnitzler investigated co-effects of DNA sequences and chromatin remodelling complexes; Widlund et al. and Yang et al. investigated co-effects of histone tails and DNA sequences on nucleosome distribution. Most of them, however, were based on experimental methods. More recently, Dai et al. used both transcriptional interaction and genomic sequence information to computationally identify dynamic nucleosome distribution, but the number of works like this is still limited.
Motivated by these facts, in this paper we propose a novel method for computationally characterizing nucleosome dynamics from both genomic sequences and histone modification profiles. Our method is based on rule induction learning adapted for subgroup discovery, which can discover sufficiently large and statistically meaningful subsets of a population as shown in , so it is well suited for characterizing the inhomogeneous distribution of destabilized nucleosomes. Moreover, by combining both genomic sequence and histone modification information, our method can discover the combinatorial nature of these two factors in regulating nucleosome stability. Our results on S. cerevisiae show that some DNA motifs, which are reliable determinants of nucleosome forming/inhibiting potential, and post-translational modifications of histone proteins, which are strongly related to transcriptional activities, are likely to be more significant to nucleosome dynamics. We also found some patterns of cooperation between these DNA motifs and histone modifications in regulating nucleosome stability. Our results give additional insights into the mechanisms by which cells regulate important biological processes, such as transcription, DNA repair and replication.
Results and discussion
Potentially significant motifs to nucleosome dynamics
Significant DNA motifs on chromosome III given by WordSpy
Significant DNA motifs on promoter regions given by WordSpy
Discriminative motifs ranked by F-scores
Significant histone modifications to nucleosome dynamics
Histone modifications ranked by F-scores
Effects of DNA sequences and histone modifications on nucleosome dynamics
Selected rules characterizing nucleosome dynamics
AA, ATT = enr ∧ H3K9Ac = neutral → State = Well
ATT = enr ∧ H3K4Me3 = hyper → State = Well
AT, GC = enr ∧ CC = low → State = Well
AT, CC = enr ∧ GC = low → State = Well
AT = low ∧ H3K9Ac = neutral ∧ H4K12Ac = hyper → State = Well
AT, TC = low ∧ ATT = enr → State = Well
CT, TG, GA, AT, CTT, GAG, ATT = low ∧ H3K18Ac, H3K4Me3 = hyper → State = Del
GA, TT, GG = low ∧ H3K9Ac = hyper ∧ H3K4Me3 = hypo → State = Del
AA = low ∧ GT, ATT = enr → State = Well
ATT = enr ∧ H3K9Ac = hyper → State = Well
GA, AG, ATT = low ∧ H2BK16Ac = neutral ∧ H4K12Ac = hypo → State = Del
AT = enr ∧ TA, TAA = low ∧ H3K9Ac = neutral ∧ H4K12Ac = hypo → State = Del
Nucleosome dynamics plays important roles in many DNA-based processes and is regulated by many factors, such as DNA sequences, post-translational modifications of histone proteins, and chromatin remodelling complexes. However, most previous works investigated only the effects of individual factors, and only on the distribution of stable nucleosomes, while bypassing their combinatorial effects. In this paper, we proposed a novel method based on rule induction learning to computationally characterize nucleosome dynamics from both genomic and histone modification information. Our method is shown to be suitable for characterizing inhomogeneous distributions like that of destabilized nucleosomes; and by combining both genomic and histone modification information, it can discover potential co-effects of these two factors on nucleosome dynamics.
Our results on S. cerevisiae show that some DNA motifs and histone modifications are more important than others in stabilizing and destabilizing nucleosomes. These DNA motifs and histone modifications are known to be strongly related to nucleosome forming/inhibiting potential and transcriptional activities, respectively. They not only act individually but also cooperate with each other in specific patterns to combinatorially affect nucleosome stability.
Although our method is efficient in characterizing nucleosome dynamics, it produces a large number of rules, many of which may be irrelevant. In the future, we need to develop a better method for filtering out these uninteresting rules.
We used experimental data from Yuan et al. and Liu et al., which covered nearly 4% of the yeast genome, including chromosome III and 223 additional promoter regions, for our experiments. The data from Yuan et al. contained 50-base DNA fragments tiled every 20 base pairs, and for each fragment we extracted its genomic sequence and its HMM-inferred state indicating whether or not it is a nucleosomal sequence. The data extracted from Liu et al. contained 12 different histone modification levels corresponding to the DNA fragments above, including acetylations of H3K9, H3K14, H3K18, H4K5, H4K8, H4K12, H4K16, H2AK7, H2BK16 and mono-, di- and tri-methylations of H3K4. To investigate whether the characteristics of nucleosome dynamics differ between regulatory regions and genomic regions, we separated the data into two datasets, corresponding to chromosome III and promoter regions. For each dataset, we filtered out data from linker regions to keep only nucleosomal data. Each nucleosome was labelled Well-positioned if it stretched over 6 to 8 fragments, or Delocalized if it stretched over more than 9 fragments. Nucleosomes which had no histone modification values, and delocalized nucleosomes longer than 350 base pairs, were treated as noise and removed. After these preprocessing steps, the chromosome III dataset contained 997 well-positioned nucleosomes and 154 delocalized nucleosomes, and the promoter-region dataset contained 995 well-positioned nucleosomes and 69 delocalized nucleosomes. These two datasets were used for further analysis.
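The labelling step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the footprint of a run of n tiled fragments is assumed to be 50 + 20 × (n − 1) base pairs, which the text does not state explicitly.

```python
from typing import Optional

FRAGMENT_LEN = 50         # bp per fragment (from the text)
TILE_STEP = 20            # bp between consecutive fragment starts (from the text)
MAX_DELOCALIZED_BP = 350  # longer delocalized nucleosomes are treated as noise


def span_bp(n_fragments: int) -> int:
    """Footprint in bp of n consecutive tiled fragments (an assumed formula)."""
    return FRAGMENT_LEN + TILE_STEP * (n_fragments - 1)


def assign_state(n_fragments: int, has_modification_data: bool = True) -> Optional[str]:
    """Return 'Well', 'Del', or None when the nucleosome is discarded."""
    if not has_modification_data:
        return None  # no histone modification values: removed as noise
    if 6 <= n_fragments <= 8:
        return "Well"
    # Read literally, the text leaves a run of exactly 9 fragments unassigned.
    if n_fragments > 9 and span_bp(n_fragments) <= MAX_DELOCALIZED_BP:
        return "Del"
    return None
```

Under the assumed footprint formula, a run of 16 fragments spans exactly 350 bp and is kept, while 17 fragments (370 bp) would be dropped.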
Feature selection with Fisher criterion
Feature selection is the process of selecting, from the features available in the data, a subset of relevant features that contribute most to distinguishing instances of different classes. In our method, significant sequence and histone modification features related to the two states of nucleosomes, Well-positioned and Delocalized, were identified and ranked by their Fisher scores (F-scores for short). This statistical criterion is simple, effective and independent of the choice of classification method. Because our method concentrated only on identifying features with high discriminative strength rather than on building any concrete classifier, we chose the F-score as the selection criterion. The discriminative strength of each feature is defined as follows:
The numerator indicates the discrimination between the two classes, and the denominator indicates the scatter within each class. The larger the F-score, the more discriminative the feature is likely to be.
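As a rough illustration of this ranking, the sketch below assumes the commonly used form of the Fisher criterion, F = (μ_Well − μ_Del)² / (σ²_Well + σ²_Del), which matches the description of the numerator and denominator above but is an assumption rather than a formula taken from this text. The function names and the toy values are hypothetical.

```python
from statistics import mean, pvariance
from typing import Dict, List, Sequence, Tuple


def fisher_score(well: Sequence[float], deloc: Sequence[float]) -> float:
    """Squared difference of class means over the sum of class variances."""
    numerator = (mean(well) - mean(deloc)) ** 2      # between-class separation
    denominator = pvariance(well) + pvariance(deloc)  # within-class scatter
    if denominator == 0:
        return float("inf") if numerator > 0 else 0.0
    return numerator / denominator


def rank_features(
    features: Dict[str, Tuple[Sequence[float], Sequence[float]]]
) -> List[Tuple[str, float]]:
    """Rank features by F-score, most discriminative first."""
    scored = [(name, fisher_score(w, d)) for name, (w, d) in features.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy values only, not data from the study.
toy = {
    "H3K9Ac": ([1.0, 3.0], [5.0, 7.0]),
    "ATT": ([1.0, 3.0], [2.0, 4.0]),
}
ranking = rank_features(toy)
```

Population variance is used here for simplicity; a sample variance would change the scores slightly but not the idea of the ranking.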
We consider this problem as a subgroup discovery problem and use a rule-based learning method for inducing rules. The problem of subgroup discovery can be defined as follows: given a population of individuals and a property of those individuals, we are interested in finding population subgroups that are interesting with respect to the property of interest. The induced rules usually have the form Cond → Class, where Class is a value of the property of interest, and Cond is a conjunction of attribute-value pairs selected from the features describing the training instances. In our work, Class has two values, Delocalized and Well-positioned. Attributes are the significant histone modifications and DNA motifs described above (Section Method overview).
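To make the Cond → Class form concrete, the following sketch represents a rule as a conjunction of attribute-value pairs and counts the nucleosomes it covers in each class. It does not reproduce CN2-SD itself; the `Rule` class, the `class_distribution` helper and the three toy nucleosomes are hypothetical, while the example rule is one of the selected rules listed earlier.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Rule:
    cond: Tuple[Tuple[str, str], ...]  # conjunction of (attribute, value) pairs
    target: str                        # "Well" or "Del"

    def covers(self, instance: Dict[str, str]) -> bool:
        """True when every attribute-value pair of Cond holds for the instance."""
        return all(instance.get(attr) == val for attr, val in self.cond)


def class_distribution(
    rule: Rule, data: Iterable[Tuple[Dict[str, str], str]]
) -> Tuple[int, int]:
    """Count covered (Well-positioned, Delocalized) nucleosomes."""
    n_well = n_del = 0
    for instance, state in data:
        if rule.covers(instance):
            if state == "Well":
                n_well += 1
            elif state == "Del":
                n_del += 1
    return n_well, n_del


# ATT = enr AND H3K4Me3 = hyper -> State = Well
rule = Rule(cond=(("ATT", "enr"), ("H3K4Me3", "hyper")), target="Well")

# Toy nucleosomes, invented for illustration only.
toy_data = [
    ({"ATT": "enr", "H3K4Me3": "hyper"}, "Well"),
    ({"ATT": "enr", "H3K4Me3": "hypo"}, "Well"),
    ({"ATT": "enr", "H3K4Me3": "hyper"}, "Del"),
]
distribution = class_distribution(rule, toy_data)
```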
Though the CN2-SD rule induction system uses a weighted covering strategy to restrict the redundancy of learned rules and guarantee the scanning of the whole search space, uninteresting rules are still produced [26, 36]. Let us assume that a rule r has the form IF [Cond] THEN [ClassDistribution], where Cond = [motif_1 = motifVal_1 ∧ ... ∧ motif_m = motifVal_m ∧ histoneMod_1 = hisVal_1 ∧ ... ∧ histoneMod_n = hisVal_n]. Here motif_i is a DNA motif, motifVal_i is enriched or low, histoneMod_j is one kind of histone modification, and hisVal_j is hyper, neutral, or hypo; ClassDistribution = [p, q], where p and q are the numbers of Well-positioned and Delocalized nucleosomes covered by r, respectively. We used several heuristics to filter out uninteresting rules: we discarded rules that cover fewer than 2 positive examples or have p/(p + q) < 0.8 if the positive class is Delocalized, and rules that cover fewer than 10 positive examples or have q/(p + q) < 0.8 if the positive class is Well-positioned (the positive class is the class characterized by the rule).
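A minimal sketch of such a filter is given below. It assumes that the 0.8 threshold applies to the share of covered nucleosomes that belong to the rule's positive class, which is one reading of the ratios above; the function name `keep_rule` and its parameters are hypothetical.

```python
def keep_rule(n_well: int, n_del: int, positive_class: str,
              min_precision: float = 0.8) -> bool:
    """Return True when a rule passes the support and precision heuristics.

    n_well / n_del are the numbers of Well-positioned / Delocalized
    nucleosomes covered by the rule.
    """
    if positive_class == "Del":
        positives, min_support = n_del, 2    # at least 2 positive examples
    elif positive_class == "Well":
        positives, min_support = n_well, 10  # at least 10 positive examples
    else:
        raise ValueError(f"unknown class: {positive_class!r}")

    covered = n_well + n_del
    if covered == 0 or positives < min_support:
        return False
    # Assumed reading: the fraction of positives among covered examples.
    return positives / covered >= min_precision
```

The lower support threshold for Delocalized rules is consistent with the much smaller number of delocalized nucleosomes in both datasets.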
Other papers from the meeting have been published as part of BMC Bioinformatics Volume 10 Supplement 15, 2009: Eighth International Conference on Bioinformatics (InCoB2009): Bioinformatics, available online at http://www.biomedcentral.com/1471-2105/10?issue=S15.
We would like to gratefully thank Prof. Nada Lavrac and Dr. Branko Kavsek for sharing CN2-SD software. The first and the third authors have been supported by Japanese Government Scholarship (Monbukagakusho) to study in Japan.
This article has been published as part of BMC Genomics Volume 10 Supplement 3, 2009: Eighth International Conference on Bioinformatics (InCoB2009): Computational Biology. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2164/10?issue=S3.
- Luger K, Mader AW, Richmond RK, Sargent DF, Richmond TJ: Crystal structure of the nucleosome core particle at 2.8 A resolution. Nature. 1997, 389: 251-260. 10.1038/38444.
- Kornberg RD, Thomas JO: Chromatin structure; oligomers of the histones. Science. 1974, 184 (4139): 865-868. 10.1126/science.184.4139.865.
- Groth A, Rocha W, Verreault A, Almouzni G: Chromatin Challenges during DNA Replication and Repair. Cell. 2007, 128 (4): 721-733. 10.1016/j.cell.2007.01.030.
- Li B, Carey M, Workman JL: The Role of Chromatin during Transcription. Cell. 2007, 128 (4): 707-719. 10.1016/j.cell.2007.01.015.
- Corpet A, Almouzni G: Making copies of chromatin: the challenge of nucleosomal organization and epigenetic information. Trends in Cell Biology. 2008, 19: 29-41. 10.1016/j.tcb.2008.10.002.
- Probst AV, Dunleavy E, Almouzni G: Epigenetic inheritance during the cell cycle. Nature Reviews Molecular Cell Biology. 2009, 10: 192-206. 10.1038/nrm2640.
- Lee CK, Shibata Y, Rao B, Strahl BD, Lieb JD: Evidence for nucleosome depletion at active regulatory regions genome-wide. Nature Genetics. 2004, 36: 900-905. 10.1038/ng1400.
- Henikoff S: Nucleosomes at active promoters: unforgettable loss. Cancer Cell. 2007, 12 (5): 407-409. 10.1016/j.ccr.2007.10.024.
- Henikoff S: Nucleosome destabilization in the epigenetic regulation of gene expression. Nature Reviews Genetics. 2008, 9: 15-26. 10.1038/nrg2206.
- Segal E, Fondufe-Mittendorf Y, Chen L, Thastrom A, Field Y, Moore IK, Wang JPZ, Widom J: A genomic code for nucleosome positioning. Nature. 2006, 442 (7104): 772-778. 10.1038/nature04979.
- Reinke H, Horz W: Histones Are First Hyperacetylated and Then Lose Contact with the Activated PHO5 Promoter. Molecular Cell. 2003, 11 (6): 1599-1607. 10.1016/S1097-2765(03)00186-2.
- Zhao J, Diaz JH, Gross DS: Domain-Wide Displacement of Histones by Activated Heat Shock Factor Occurs Independently of Swi/Snf and Is Not Correlated with RNA Polymerase II Density. Molecular and Cellular Biology. 2005, 25 (20): 8985-8999. 10.1128/MCB.25.20.8985-8999.2005.
- Yuan GC, Liu YJ, Dion MF, Slack MD, Wu LF, Altschuler SJ, Rando OJ: Genome-Scale Identification of Nucleosome Positions in S. cerevisiae. Science. 2005, 309 (5734): 626-630. 10.1126/science.1112178.
- Lee W, Tillo D, Bray N, Morse RH, Davis RW, Hughes TR, Nislow C: A high-resolution atlas of nucleosome occupancy in yeast. Nature Genetics. 2007, 39 (10): 1235-1244. 10.1038/ng2117.
- Mavrich TN, Jiang C, Ioshikhes IP, Li X, Venters BJ, Zanton SJ, Tomsho LP, Qi J, Glaser RL, Schuster SC, Gilmour DS, Albert I, Pugh BF: Nucleosome organization in the Drosophila genome. Nature. 2008, 453: 358-362. 10.1038/nature06929.
- Pokholok DK, Harbison CT, Levine S, Cole M, Hannett NM, Lee TI, Bell GW, Walker K, Rolfe PA, Herbolsheimer E, Zeitlinger J, Lewitter F, Gifford DK, Young RA: Genome-wide map of nucleosome acetylation and methylation in yeast. Cell. 2005, 122 (4): 517-527. 10.1016/j.cell.2005.06.026.
- Liu CL, Kaplan T, Kim M, Buratowski S, Schreiber SL, Friedman N, Rando OJ: Single-nucleosome mapping of histone modifications in S. cerevisiae. PLoS Biology. 2005, 3 (10): 10.1371/journal.pbio.0030328.
- Barski A, Cuddapah S, Cui K, Roh TY, Schones DE, Wang Z, Wei G, Chepelev I, Zhao K: High-Resolution Profiling of Histone Methylations in the Human Genome. Cell. 2007, 129 (4): 823-837. 10.1016/j.cell.2007.05.009.
- Peckham HE, Thurman RE, Fu Y, Stamatoyannopoulos JA, Noble WS, Struhl K, Weng Z: Nucleosome positioning signals in genomic DNA. Genome Research. 2007, 17 (8): 1170-1177. 10.1101/gr.6101007.
- Zhang Y, Shin H, Song JS, Lei Y, Liu XS: Identifying Positioned Nucleosomes with Epigenetic Marks in Human from ChIP-Seq. BMC Genomics. 2008, 9: 537. 10.1186/1471-2164-9-537.
- Rippe K, Schrader A, Riede P, Strohner R, Lehmann E, Langst G: DNA sequence- and conformation-directed positioning of nucleosomes by chromatin-remodeling complexes. Proceedings of the National Academy of Sciences of the United States of America. 2007, 104 (40): 15635-15640. 10.1073/pnas.0702430104.
- Schnitzler GR: Control of Nucleosome Positions by DNA Sequence and Remodeling Machines. Cell Biochemistry and Biophysics. 2008, 51 (2-3): 67-80. 10.1007/s12013-008-9015-6.
- Widlund HR, Vitolo M, Thiriet C, Hayes JJ: DNA sequence-dependent contributions of core histone tails to nucleosome stability: differential effects of acetylation and proteolytic tail removal. Biochemistry. 2000, 39 (13): 3835-3841. 10.1021/bi991957l.
- Yang Z, Zheng C, Hayes JJ: The core histone tail domains contribute to sequence-dependent nucleosome positioning. Journal of Biological Chemistry. 2007, 282 (11): 7930-7938. 10.1074/jbc.M610584200.
- Dai Z, Dai X, Xiang Q, Feng J, Deng Y, Wang J, He C: Transcriptional interaction-assisted identification of dynamic nucleosome positioning. BMC Bioinformatics. 2009, 10 (Suppl 1): S31. 10.1186/1471-2105-10-S1-S31.
- Lavrac N, Kavsek B, Flach P, Todorovski L: Subgroup discovery with CN2-SD. Journal of Machine Learning Research. 2004, 5: 153-188.
- Wang G, Zhang W: A steganalysis-based approach to comprehensive identification and characterization of functional regulatory elements. Genome Biology. 2006, 7 (6).
- Das MK, Dai HK: A survey of DNA motif finding algorithms. BMC Bioinformatics. 2007, 8 (Suppl 7): S21. 10.1186/1471-2105-8-S7-S21.
- Kurdistani SK, Tavazoie S, Grunstein M: Mapping global histone acetylation patterns to gene expression. Cell. 2004, 117 (6): 721-733. 10.1016/j.cell.2004.05.023.
- Hebbes TR, Thorne AW, Crane-Robinson C: A direct link between core histone acetylation and transcriptionally active chromatin. The EMBO Journal. 1988, 7 (5): 1395-1402.
- Wang A, Kurdistani SK, Grunstein M: Requirement of Hos2 histone deacetylase for gene activity in yeast. Science. 2002, 298 (5597): 1412-1414. 10.1126/science.1077790.
- de Nadal E, Zapater M, Alepuz PM, Sumoy L, Mas G, Posas F: The MAPK Hog1 recruits Rpd3 histone deacetylase to activate osmoresponsive genes. Nature. 2004, 427 (6972): 370-374. 10.1038/nature02258.
- Jiang C, Pugh BF: Nucleosome positioning and gene regulation: advances through genomics. Nature Reviews Genetics. 2009, 10: 161-172. 10.1038/nrg2522.
- Pavlidis P, Wapinski I, Noble WS: Support vector machine classification on the web. Bioinformatics. 2004, 20: 586-587. 10.1093/bioinformatics/btg461.
- Clark P, Niblett T: The CN2 induction algorithm. Machine Learning. 1989, 3: 261-283.
- Pham TH, Clemente JC, Satou K, Ho TB: Computational discovery of transcriptional regulatory rules. Bioinformatics. 2005, 21: ii101-ii107. 10.1093/bioinformatics/bti1117.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.