Map of open and closed chromatin domains in Drosophila genome
© Milon et al.; licensee BioMed Central Ltd. 2014
Received: 12 May 2014
Accepted: 23 October 2014
Published: 18 November 2014
Chromatin compactness has been considered a major determinant of gene activity and has been associated with specific chromatin modifications in studies of a few individual genetic loci. At the same time, genome-wide patterns of open and closed chromatin have been understudied and are at present largely predicted from chromatin modification and gene expression data. However, the universal applicability of such predictions is not self-evident and requires experimental verification.
We developed and implemented a high-throughput analysis of general chromatin sensitivity to DNase I, which provides a comprehensive epigenomic assessment in a single assay. Contiguous domains of open and closed chromatin were identified by computational analysis of the data and correlated with other genome annotations, including predicted chromatin “states”, individual chromatin modifications, nuclear lamina interactions, and gene expression. Although the widely trusted predictions of chromatin structure proved correct in the majority of cases, we detected diverse “exceptions” to the conventional rules. We found a profound paucity of chromatin modifications in a major fraction of closed chromatin, and identified a number of loci where chromatin configuration is opposite to that expected from modification and gene expression patterns. Further, we observed that chromatin of large introns tends to be closed even when the genes are expressed, and that a significant proportion of active genes, including their promoters, are located in closed chromatin.
These findings reveal limitations of the existing predictive models, indicate novel mechanisms of epigenetic regulation, and provide important insights into genome organization and function.
Chromatin compactness is the key feature of chromatin that reflects its accessibility to the transcription machinery. Tightly packed closed chromatin is considered a hallmark of gene silencing, and chromatin opening precedes lineage-specific gene expression, thus providing an excellent indicator of cell fate commitment [1, 2]. However, genome-wide analyses of chromatin configuration have focused not on direct assessment of chromatin compactness but on predictions based on chromatin marks such as DNA methylation and histone modifications. Predictive models recognize numerous chromatin “states” presumed to indicate regulatory elements, gene activity, and other aspects of genome biology [3, 4], thereby deducing chromatin configuration from our knowledge of gene expression and chromatin marks. Such a predictive approach intrinsically limits the discovery of novel mechanistic links between chromatin configuration and gene expression or chromatin modifications, necessitating the development of alternative, more direct means to analyze genome-wide patterns of open and closed chromatin. Moreover, although models of predicted “states” are excellent tools for basic research, they require examination of numerous chromatin marks in a multiplicity of assays and thus are not readily applicable to routine analysis of small clinical samples.
Results and discussion
To measure general chromatin sensitivity to DNase I (GCSDI) across the genome, we combined a brief DNase I treatment of permeabilized cells with random amplification of the DNase I-nicked genomic DNA, followed by high-throughput analysis of sequence representation in the amplified material. DNase I preferentially nicks DNA in open chromatin, rendering these regions inefficient templates for amplification and thus predisposing them to under-representation in the amplified material. The difference in representation between known open and closed chromatin loci can be reliably detected by the GCSDI assay after treatment with diverse amounts of DNase I; the assay outcome is therefore not overly sensitive to DNase I treatment conditions (Additional file 1: Figure S1).
To generate a GCSDI profile across the genome, amplified DNA samples from DNase I-treated and untreated control cells (n = 2) were hybridized to Affymetrix tiling microarrays; signal intensities for each probe were averaged within each experimental group, and fold differences between the groups were calculated. Positive log2 values were assigned to sequences under-represented in the DNase I-treated sample (open), and negative values to sequences over-represented in the DNase I-treated sample (closed). The identified open and closed chromatin regions were extensive and contiguous, consistent with previous low-throughput studies [5–8] and in contrast to the narrow discrete regions detected by the DNase I hypersensitive site (DHS) assay [4, 12] (Figure 1, GCSDI versus DHSs tracks).
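The per-probe GCSDI value described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes replicate intensities are already normalized (the study used CisGenome for this step), averages them on the linear scale within each group, and takes log2(control/treated) so that the sign convention matches the text. The function name is hypothetical.

```python
import numpy as np


def gcsdi_log2_signal(control, treated):
    """Per-probe log2 differential signal (hypothetical helper).

    control, treated: arrays of shape (n_replicates, n_probes) with normalized,
    positive probe intensities from amplified DNA of untreated and DNase I-treated
    cells. Returns log2(mean control / mean treated): positive where a probe is
    under-represented after DNase I treatment (open), negative where it is
    over-represented (closed).
    """
    control = np.asarray(control, dtype=float)
    treated = np.asarray(treated, dtype=float)
    # Average replicate intensities within each experimental group.
    ctrl_mean = control.mean(axis=0)
    trt_mean = treated.mean(axis=0)
    # Fold difference between groups on the log2 scale.
    return np.log2(ctrl_mean / trt_mean)
```

For example, a probe averaging 810 in controls and 405 after DNase I gives +1.0 (open), while one averaging 390 versus 795 gives a negative value (closed).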
Two segmentation models of chromatin compactness were created from the genome-wide GCSDI profiles. (i) We used a sliding window algorithm to identify transition points and to segment the genome into a contiguous series of open or closed chromatin domains, with a mean size of 15 kb and sizes ranging up to 500 kb (Additional file 2: Figure S2A). The resulting Two-Configuration Model (hereafter 2CM) agrees well with other large-scale genome features such as lamina-associated domains (LADs), but did not provide sufficient resolution for analysis of some gene-dense regions with small genes. (ii) To overcome this problem, we implemented a second type of analysis, using a hidden Markov model (HMM) to identify positive or negative peaks of differential signal and consolidating clusters of such peaks into domains. This approach to identifying closed and open chromatin domains was more selective, but at the cost of assigning about one-third of the genome to domains that are neither open nor closed, thereby defined as “neutral”. The outcome of this analysis was thus a Three-Configuration Model (3CM) of domains with a mean size of 3–10 kb (Additional file 2: Figure S2B). Further analyses showed similar results for 2CM and 3CM. We present findings for 2CM in the main figures, and most results for 3CM are shown in Additional file figures.
Intriguingly, we also found that active genes with closed promoters showed lower transcript levels than conventional active genes with open promoters, indicating that chromatin compactness may serve to modulate the expression of active genes. The mechanisms underlying this type of regulation warrant further inquiry, as do the other unexpected trends identified in our study, such as the general paucity of known chromatin marks positively identifying closed chromatin and the tendency of large introns to remain in a closed configuration even when the genes are expressed. We expect that the novel analysis of epigenomic regulation with the straightforward and sensitive assay described here will contribute an empirical approach that supplements predictive chromatin structure assessments, thereby advancing both basic and biomedical research in chromatin biology.
General chromatin sensitivity to DNase I
DNase I treatment of cells was performed as previously described, with minor modifications. S2 cells (1 × 10⁶) were permeabilized with 0.05% NP40 and resuspended in DNase I buffer (40 mM Tris–HCl, 0.4 mM EDTA, 10 mM MgCl2, 10 mM CaCl2, 0.1 mg/ml BSA). Part of each sample was set aside and later used as a non-digested control. To optimize the procedure, DNase I (Promega) digestion was performed at 37°C for 10 or 15 minutes with varying amounts of the enzyme (0.1 U, 0.5 U, 1 U, 2 U and 5 U); for further analysis, cells were treated with 0.5 U DNase I for 10 minutes. DNA was purified from the treated cells using the DNeasy Blood & Tissue Kit (Qiagen), and 20 ng was used as a template for whole genome amplification. The library preparation step using the GenomePlex WGA2 Kit was followed by amplification with the GenomePlex WGA 3 Reamplification Kit (Sigma). dUTP was incorporated at the amplification step to enable probe fragmentation according to Affymetrix recommendations. The amplification product was purified with the Wizard SV Gel and PCR Clean-Up System (Promega), fragmented and labeled using the GeneChip WT Double-Stranded DNA Labeling Kit (Affymetrix), and hybridized to the GeneChip Drosophila Tiling 2.0R Array following the manufacturer’s instructions.
GCSDI data analysis and genome segmentation
Raw data from microarray CEL files were normalized using CisGenome, and log2 differential signals were calculated for each probe.
To generate the two-configuration model (2CM), probes with a log2 differential signal within 1 standard deviation of the mean were discarded, and the signals of the remaining probes were capped at 1 for positive and -1 for negative values. A sliding window was used to determine transition points between open (positive) and closed (negative) segments: for every probe, the difference (d) between the mean log2 differential signals of its two flanking regions was calculated. Cutoff d values were established by analyzing 100 permutations of probes, with the requirement that the real genome data be significantly different from the permuted models (p < 0.05). A series of significant d values and flanking region sizes (n) were tested to determine the model most discriminating between the real genome and random permutations. The results presented here are based on analysis using n = 48 and d = 0.8, which identified 2244 transition probes in the real dataset and only 72 on average in permuted controls.
To generate the three-configuration model (3CM), all probes were analyzed for the presence of differential signal peaks using the HMM algorithm built into CisGenome, with a posterior probability greater than 0.5. An FDR of 0.1 was used to filter the detected peaks, which were further consolidated into domains as follows: two adjacent peaks were joined if the distance between them was less than a threshold value (16721 bp, the 95th percentile of inter-peak distances in the fly genome) and if they were of the same sign (either both positive or both negative). Otherwise, the segment between the peaks was assigned the neutral state. 2CM and 3CM domain coordinates in the BDGP5 genome annotation are provided as tables in Additional files 9 and 10.
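The 2CM transition-point search can be sketched as below. This is an illustrative reading of the description, with several assumptions: the mean and standard deviation used for filtering are genome-wide (population SD), "capped" is interpreted as clipping to [-1, 1], a flanking region is n consecutive retained probes on each side, and a transition probe is any probe with |d| at or above the cutoff. The text does not say whether runs of adjacent hits are collapsed into single points, so this sketch reports every passing probe. All function names are hypothetical.

```python
import numpy as np


def preprocess(signal, n_sd=1.0, cap=1.0):
    """Discard probes within n_sd SDs of the mean, then clip the rest to [-cap, cap].

    Genomic order of the retained probes is preserved.
    """
    signal = np.asarray(signal, dtype=float)
    mu, sd = signal.mean(), signal.std()  # population SD: an assumption
    keep = np.abs(signal - mu) > n_sd * sd
    return np.clip(signal[keep], -cap, cap)


def flank_differences(signal, n):
    """Return probe indices i and d[i] = mean(signal[i:i+n]) - mean(signal[i-n:i]).

    Only probes with two complete flanks are scored.
    """
    signal = np.asarray(signal, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(signal)))  # prefix sums give O(1) window means
    idx = np.arange(n, len(signal) - n + 1)
    left = (csum[idx] - csum[idx - n]) / n
    right = (csum[idx + n] - csum[idx]) / n
    return idx, right - left


def transition_probes(signal, n=48, d_cut=0.8):
    """Indices of probes whose flank difference |d| reaches the cutoff."""
    idx, d = flank_differences(signal, n)
    return idx[np.abs(d) >= d_cut]


def permuted_transition_counts(signal, n=48, d_cut=0.8, n_perm=100, seed=0):
    """Transition counts in n_perm random shuffles of probe order (null model)."""
    rng = np.random.default_rng(seed)
    signal = np.asarray(signal, dtype=float)
    return np.array([
        len(transition_probes(rng.permutation(signal), n, d_cut))
        for _ in range(n_perm)
    ])
```

On an idealized step of 100 open probes (+1) followed by 100 closed probes (-1), `transition_probes(step, n=48, d_cut=0.8)` flags probes 72 through 128, centred on the boundary at probe 100. Comparing the real count with `permuted_transition_counts` mirrors the permutation test used to choose n and d.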
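The peak-consolidation rule for 3CM can be illustrated with the following sketch. It assumes the HMM peaks (already FDR-filtered) are supplied as sorted, non-overlapping, half-open (start, end, sign) tuples on a single chromosome, and it interprets "distance between peaks" as the gap from the end of one peak to the start of the next. Function names and the tuple layout are hypothetical.

```python
import numpy as np


def gap_threshold(peaks, q=95):
    """q-th percentile of gaps between consecutive peaks (end of one to start of next)."""
    gaps = [nxt[0] - prev[1] for prev, nxt in zip(peaks, peaks[1:])]
    return float(np.percentile(gaps, q))


def merge_peaks(peaks, max_gap=16721):
    """Consolidate sorted, non-overlapping signed peaks into open/closed/neutral domains.

    peaks: list of (start, end, sign) with sign +1 for positive (open) and
    -1 for negative (closed) peaks; coordinates are half-open.
    """
    label = {1: "open", -1: "closed"}
    domains = []
    if not peaks:
        return domains
    cur_start, cur_end, cur_sign = peaks[0]
    for start, end, sign in peaks[1:]:
        if sign == cur_sign and start - cur_end < max_gap:
            cur_end = end  # same sign and close enough: extend the current domain
            continue
        domains.append((cur_start, cur_end, label[cur_sign]))
        if start > cur_end:
            domains.append((cur_end, start, "neutral"))  # segment between unjoined peaks
        cur_start, cur_end, cur_sign = start, end, sign
    domains.append((cur_start, cur_end, label[cur_sign]))
    return domains
```

With peaks `[(0, 500, 1), (2000, 2600, 1), (40000, 40500, 1), (41000, 41200, -1)]`, the first two positive peaks merge into one open domain, the long gap and the sign change each produce a neutral segment, and the final negative peak becomes a closed domain.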
Association of segments with lamina-associated domains (LADs), histone modifications, and chromatin states
LAD coordinates were downloaded from NCBI GEO (GSE20313). The coordinates of both “Binding Sites” and “Depleted” regions for 49 individual histone modifications were obtained from the modENCODE project (http://www.modencode.org); specifically, the following datasets were used: H1.S2; H2AV_9751.S2; H2BK5ac.S2; H2B.ubiq.NRO3..S2; H3 antibody2.S2; H3K9ac.S2; H3K18ac.S2; H3K23ac.S2; H3K27Ac.S2; H3K27me1.S2; H3K27me2_TJ.S2; H3K27me3.Abcam2..S2; H3K36me1.S2; H3K36me3.S2; H3K4me1.S2; H3K4me2.ab.S2; H3K4Me2.Millipore.S2; H3K4me3_S2; H3K79me1.S2; H3K79Me2.S2; H3K79Me.S2; H3K9acS10P_.new_lot..S2; H3K9ac.S2; H3K9me1_Diagenode.S2; H3K9me1.S2; H3K9me2.Ab2.new_lot.S2; H3K9me2Antibody2.S2; H3K9me3.S2; K3K9me3_clone_6F12_H4S2; H4.S2; H4K5ac.S2; H4K8ac.S2; H4K12ac.S2; H4K16ac(L).S2; H4K16ac(M).S2; H4AcTetra.S2; H4K20me.S2; Hp1a_552.S2; HP1a_hinge.S2; HP1a_wa184.S2; HP1a_wa191.S2. DHS data were obtained from the online resource http://compbio.med.harvard.edu/flychromatin/data.html. Coordinates for the 9 predicted chromatin states were obtained from the modENCODE project. All datasets were converted to the BDGP5 Drosophila genome annotation as needed. To determine the association between chromatin compactness and each of the above genome annotations, we calculated the cumulative overlap between the open, closed, and neutral segments and the previously characterized LADs, histone modification enriched/depleted regions, and predicted chromatin states. The analyses were performed for the whole genome as well as for individual chromosomes and their heterochromatic compartments.
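A cumulative-overlap calculation of the kind described above might look like the sketch below. It assumes both interval sets are sorted, internally non-overlapping, half-open, and on the same chromosome, and it measures overlap in base pairs; normalizing by the total length of each state is an added illustrative step, not something the text specifies. Function names are hypothetical.

```python
from collections import defaultdict


def cumulative_overlap(segments, features):
    """Total bp of each chromatin state covered by feature intervals.

    segments: sorted, non-overlapping (start, end, state) tuples on one chromosome
    features: sorted, non-overlapping (start, end) tuples (e.g. LADs) on the same chromosome
    """
    totals = defaultdict(int)
    i = j = 0
    while i < len(segments) and j < len(features):
        s_start, s_end, state = segments[i]
        f_start, f_end = features[j]
        overlap = min(s_end, f_end) - max(s_start, f_start)
        if overlap > 0:
            totals[state] += overlap
        # Advance whichever interval ends first (two-pointer sweep).
        if s_end <= f_end:
            i += 1
        else:
            j += 1
    return dict(totals)


def fraction_overlapping(segments, features):
    """Fraction of each state's total length that lies within the features."""
    overlap = cumulative_overlap(segments, features)
    lengths = defaultdict(int)
    for start, end, state in segments:
        lengths[state] += end - start
    return {state: overlap.get(state, 0) / length for state, length in lengths.items()}
```

The same two functions apply unchanged to histone modification enriched/depleted regions or predicted chromatin states, once each annotation is expressed as intervals in the same genome build.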
Association of segments with gene expression
We downloaded short-read (Illumina) sequences for 5 massively parallel mRNA sequencing experiments on S2 cells from two GEO datasets (GSM390063 and GSM390064), aligned these reads to the Drosophila reference genome (BDGP5) using TopHat, and calculated Reads Per Kilobase of transcript per Million mapped reads (RPKM) for each gene in the BDGP5 reference annotation. Genes with an RPKM of at least 1 in all five samples were classified as active, and all other genes as inactive. The cumulative overlaps of open, closed, and neutral segments with active and inactive genes were computed to determine the association of chromatin compactness with gene expression. A similar analysis was conducted to determine associations with gene length and with different functional regions of a gene (promoter, exon, intron, 5′UTR and 3′UTR). To analyze the link between gene/intron size and chromatin structure, genes were separated into three categories (less than 1 kb, 1–4 kb, and more than 4 kb) and introns into five categories (81 bp or less, 82–200 bp, 201 bp – 1 kb, 1–10 kb, and larger than 10 kb). Cumulative overlaps of these gene and intron categories with open, closed, and neutral segments were computed. To determine the profile of chromatin compactness within introns larger than 10 kb, the introns were divided into non-overlapping 100-bp windows. These windows were pooled from all introns according to their position relative to the 5′ and 3′ intron ends, and their cumulative overlaps with chromatin compactness segments were computed. The ratio of open to closed states was derived separately for medium-length introns (1–10 kb) and long introns (>10 kb) to illustrate the transition of DNase I sensitivity states within intronic regions of active and inactive genes.
This work was supported by NSF grant 0842797 and NIH grant GM061549. Dr. Maria Nurminskaya is currently working at the National Institutes of Health. This work was prepared while she was employed at the University of Maryland. The opinions expressed in this article are the author’s own and do not reflect the view of the National Institutes of Health, the Department of Health and Human Services, or the United States government.
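The expression and size classification steps can be sketched as below. This assumes per-gene read counts from the TopHat alignments and a per-gene length are already available; which length was used for RPKM (for example, summed exon length) is not stated in the text, so the `gene_lengths_bp` input is an assumption. The intron bins follow the five categories listed above, treating each upper bound as inclusive; that boundary handling is also an assumption. Names are hypothetical.

```python
import numpy as np

# Inclusive upper bounds (bp) of the first four intron size classes.
INTRON_BOUNDS = [81, 200, 1000, 10000]
INTRON_LABELS = ["<=81 bp", "82-200 bp", "201 bp-1 kb", "1-10 kb", ">10 kb"]


def rpkm(counts, gene_lengths_bp, total_mapped):
    """Reads Per Kilobase of transcript per Million mapped reads, for one sample."""
    counts = np.asarray(counts, dtype=float)
    length_kb = np.asarray(gene_lengths_bp, dtype=float) / 1e3
    return counts / (length_kb * (total_mapped / 1e6))


def classify_active(rpkm_matrix, min_rpkm=1.0):
    """rpkm_matrix has shape (n_samples, n_genes); a gene is active only if
    its RPKM is at least min_rpkm in every sample."""
    return np.all(np.asarray(rpkm_matrix, dtype=float) >= min_rpkm, axis=0)


def intron_size_class(lengths_bp):
    """Map intron lengths to indices into INTRON_LABELS."""
    # right=True places a length equal to a bound in the lower class.
    return np.digitize(np.asarray(lengths_bp), INTRON_BOUNDS, right=True)
```

For example, 500 reads on a 2 kb gene in a sample with 10 million mapped reads give an RPKM of 25; a gene that falls below 1 RPKM in any one of the five samples is classed as inactive.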
- Krawetz SA, Kramer JA, McCarrey JR: Reprogramming the male gamete genome: a window to successful gene therapy. Gene. 1999, 234: 1-9. 10.1016/S0378-1119(99)00147-X.
- Tollervey JR, Lunyak VV: Epigenetics: judge, jury and executioner of stem cell fate. Epigenetics. 2012, 7: 823-840.
- Ernst J, Kellis M: Discovery and characterization of chromatin states for systematic annotation of the human genome. Nat Biotechnol. 2010, 28: 817-825. 10.1038/nbt.1662.
- Kharchenko PV, Alekseyenko AA, Schwartz YB, Minoda A, Riddle NC, Ernst J, Sabo PJ, Larschan E, Gorchakov AA, Gu T, Linder-Basso D, Plachetka A, Shanower G, Tolstorukov MY, Luquette LJ, Xi R, Jung YL, Park RW, Bishop EP, Canfield TK, Sandstrom R, Thurman RE, MacAlpine DM, Stamatoyannopoulos JA, Kellis M, Elgin SC, Kuroda MI, Pirrotta V, Karpen GH, Park PJ: Comprehensive analysis of the chromatin landscape in Drosophila melanogaster. Nature. 2011, 471: 480-485. 10.1038/nature09725.
- Hebbes TR, Clayton AL, Thorne AW, Crane-Robinson C: Core histone hyperacetylation co-maps with generalized DNase I sensitivity in the chicken beta-globin chromosomal domain. EMBO J. 1994, 13: 1823-1830.
- Choudhary SK, Wykes SM, Kramer JA, Mohamed AN, Koppitch F, Nelson JE, Krawetz SA: A haploid expressed gene cluster exists as a single chromatin domain in human sperm. J Biol Chem. 1995, 270: 8755-8762. 10.1074/jbc.270.15.8755.
- Bulger M, Schübeler D, Bender MA, Hamilton J, Farrell CM, Hardison RC, Groudine M: A complex chromatin landscape revealed by patterns of nuclease sensitivity and histone modification within the mouse beta-globin locus. Mol Cell Biol. 2003, 23: 5234-5244. 10.1128/MCB.23.15.5234-5244.2003.
- Kalmykova AI, Nurminsky DI, Ryzhov DV, Shevelyov YY: Regulated chromatin domain comprising cluster of co-expressed genes in Drosophila melanogaster. Nucleic Acids Res. 2005, 33: 1435-1444. 10.1093/nar/gki281.
- Buenrostro JD, Giresi PG, Zaba LC, Chang HY, Greenleaf WJ: Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat Methods. 2013, 10: 1213-1218. 10.1038/nmeth.2688.
- Rizzo JM, Sinha S: Analyzing the global chromatin structure of keratinocytes by MNase-Seq. Methods Mol Biol. 2014, [Epub ahead of print].
- He HH, Meyer CA, Hu SS, Chen MW, Zang C, Liu Y, Rao PK, Fei T, Xu H, Long H, Liu XS, Brown M: Refined DNase-seq protocol and data analysis reveals intrinsic bias in transcription factor footprint identification. Nat Methods. 2014, 11: 73-78.
- Song L, Crawford GE: DNase-seq: a high-resolution technique for mapping active gene regulatory elements across the genome from mammalian cells. Cold Spring Harb Protoc. 2010, 2010 (2): pdb.prot5384.
- Riddle NC, Shaffer CD, Elgin SC: A lot about a little dot - lessons learned from Drosophila melanogaster chromosome 4. Biochem Cell Biol. 2009, 87: 229-241. 10.1139/O08-119.
- van Bemmel JG, Pagie L, Braunschweig U, Brugman W, Meuleman W, Kerkhoven RM, van Steensel B: The insulator protein SU(HW) fine-tunes nuclear lamina interactions of the Drosophila genome. PLoS ONE. 2010, 5: e15013. 10.1371/journal.pone.0015013.
- Shevelyov YY, Nurminsky DI: The nuclear lamina as a gene-silencing hub. Curr Issues Mol Biol. 2012, 14: 27-38.
- Brodsky AS, Meyer CA, Swinburne IA, Hall G, Keenan BJ, Liu XS, Fox EA, Silver PA: Genomic mapping of RNA polymerase II reveals sites of co-transcriptional regulation in human cells. Genome Biol. 2005, 6: R64. 10.1186/gb-2005-6-8-r64.
- Lee YC, Chang HH: The evolution and functional significance of nested gene structures in Drosophila melanogaster. Genome Biol Evol. 2013, 5: 1978-1985. 10.1093/gbe/evt149.
- Boutanaev AM, Kalmykova AI, Shevelyov YY, Nurminsky DI: Large clusters of co-expressed genes in the Drosophila genome. Nature. 2002, 420: 666-669. 10.1038/nature01216.
- Shevelyov YY, Lavrov SA, Mikhaylova LM, Nurminsky ID, Kulathinal RJ, Egorova KS, Rozovsky YM, Nurminsky DI: The B-type lamin is required for somatic repression of testis-specific gene clusters. Proc Natl Acad Sci U S A. 2009, 106: 3282-3287. 10.1073/pnas.0811933106.
- Yanai I, Benjamin H, Shmoish M, Chalifa-Caspi V, Shklar M, Ophir R, Bar-Even A, Horn-Saban S, Safran M, Domany E, Lancet D, Shmueli O: Genome-wide midrange transcription profiles reveal expression level relationships in human tissue specification. Bioinformatics. 2005, 21: 650-659. 10.1093/bioinformatics/bti042.
- Ji H, Jiang H, Ma W, Johnson DS, Myers RM, Wong WH: An integrated software system for analyzing ChIP-chip and ChIP-seq data. Nat Biotechnol. 2008, 26: 1293-1300. 10.1038/nbt.1505.
- modENCODE Consortium: Identification of functional elements and regulatory circuits by Drosophila modENCODE. Science. 2010, 330: 1787-1797.
- Trapnell C, Pachter L, Salzberg SL: TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. 2009, 25: 1105-1111. 10.1093/bioinformatics/btp120.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.