To measure GCSDI across the genome, we combined a brief DNase I treatment of permeabilized cells with random amplification of the DNase I-nicked genomic DNA, followed by analysis of sequence representation in the amplified material by a high-throughput method. DNase I preferentially nicks DNA in open chromatin, rendering these regions inefficient templates for amplification and thus predisposing them to under-representation in the amplified material. The difference in representation between known open and closed chromatin loci can be reliably detected by the GCSDI assay after treatment with a range of DNase I amounts, and is therefore a robust analysis outcome that is not overly sensitive to DNase I treatment conditions (Additional file 1: Figure S1).
To generate a GCSDI profile across the genome, amplified DNA samples from DNase I-treated and control untreated cells (n = 2) were hybridized to tiling Affymetrix microarrays, signal intensities for each probe were averaged within the experimental groups, and fold differences between the groups were calculated. Positive log2 values were assigned to sequences underrepresented in the DNase I-treated sample (open) and negative values to sequences overrepresented in the DNase I-treated sample (closed). The identified open and closed chromatin regions were extensive and contiguous, consistent with previous low-throughput studies [5–8] and in contrast with the narrow discrete regions detected by the DHS assay [4, 12] (Figure 1, GCSDI versus DHSs tracks).
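The per-probe scoring described above can be illustrated with a minimal sketch. It assumes an arithmetic mean of raw intensities within each group and no normalization step; the function name `gcsdi_log2` and the example intensities are hypothetical and are not taken from the study.

```python
import math


def gcsdi_log2(control_reps, treated_reps):
    """Return per-probe log2(control / treated) after averaging replicates.

    Each argument is a list of replicates; each replicate is a list of
    probe intensities in the same probe order. Positive scores mark
    probes under-represented in the DNase I-treated sample (open).
    """
    n_probes = len(control_reps[0])
    scores = []
    for i in range(n_probes):
        # Assumption: plain arithmetic mean within each experimental group.
        control_mean = sum(rep[i] for rep in control_reps) / len(control_reps)
        treated_mean = sum(rep[i] for rep in treated_reps) / len(treated_reps)
        scores.append(math.log2(control_mean / treated_mean))
    return scores


# Hypothetical intensities for three probes, two replicates per group.
control = [[200.0, 100.0, 50.0], [200.0, 100.0, 50.0]]
treated = [[100.0, 100.0, 100.0], [100.0, 100.0, 100.0]]
scores = gcsdi_log2(control, treated)
```

With these made-up values the first probe scores positive (open), the second scores zero, and the third scores negative (closed).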
Two segmentation models of chromatin compactness were created using the genome-wide GCSDI profiles. (i) We used a sliding window algorithm to identify transition points and to segment the genome into a contiguous series of open or closed chromatin domains, with a mean size of 15 kb and sizes ranging up to 500 kb (Additional file 2: Figure S2A). The resulting Two-Configuration Model (hereafter 2CM) was compatible with other large-scale genome features such as lamina-associated domains (LADs) [10] but did not provide sufficient resolution for analysis of some gene-dense regions with small genes. (ii) This problem was overcome by a second type of analysis, which used a hidden Markov model (HMM) to identify positive or negative peaks of differential signal and consolidated clusters of such peaks into domains. This approach to identifying closed and open chromatin domains was more selective, but at the cost of assigning about one-third of the genome to domains that are neither open nor closed and were therefore defined as “neutral”. Thus, the outcome of this analysis was a Three-Configuration Model (3CM) of domains with a mean size of 3–10 kb (Additional file 2: Figure S2B). Further analyses showed similar results for 2CM and 3CM. We present findings for 2CM in the main figures; the majority of results for 3CM are shown in the Additional file figures.
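As an illustration of the general idea behind a two-configuration segmentation, the sketch below smooths per-probe log2 scores with a centered sliding mean and merges consecutive probes of the same sign into domains. This is a generic simplification rather than the algorithm used in the study; the window size, the zero threshold, and the function name `segment_two_state` are assumptions made for the example.

```python
def segment_two_state(scores, window=3):
    """Segment ordered per-probe scores into contiguous open/closed runs.

    Returns a list of (start_index, end_index_exclusive, state) tuples.
    """
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        # Centered window, truncated at the ends of the profile.
        lo, hi = max(0, i - half), min(len(scores), i + half + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))

    domains = []
    for i, value in enumerate(smoothed):
        # Assumption: a smoothed value of exactly zero is called closed.
        state = "open" if value > 0 else "closed"
        if domains and domains[-1][2] == state:
            # Extend the current domain; a change of state is a transition point.
            domains[-1] = (domains[-1][0], i + 1, state)
        else:
            domains.append((i, i + 1, state))
    return domains


# Hypothetical profile: one noisy negative probe inside an open region.
profile = [1, 1, 1, -1, 1, 1, -1, -1, -1, -1]
domains = segment_two_state(profile, window=3)
```

In this toy profile the isolated negative probe at index 3 is absorbed by smoothing, so the result is one open domain followed by one closed domain. Every probe is forced into one of two states, which is why a model of this kind has no "neutral" class.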
Domains of open, closed, and neutral chromatin identified by both models appeared interspersed across the genome (Figure 2, Additional file 3: Figure S3). In euchromatin, 2CM detected approximately 60% of the genome in closed chromatin and 40% in open chromatin (Figure 3A), while 3CM detected 37% of the genome in closed, 23% in open, and 40% in neutral chromatin (Additional file 4: Figure S4A). Both the pericentromeric heterochromatin regions and chromosome 4 were heavily enriched with neutral chromatin in 3CM; however, only heterochromatin (not chromosome 4) showed an overabundance of closed chromatin in 2CM. Therefore, chromosome 4 appears to share overall similarities with both euchromatin and heterochromatin, consistent with the known interspersion of unique sequences and repeat clusters in this genome region [13].
Next, we analyzed the relationships between the identified open and closed chromatin domains and chromatin modifications, beginning with the nine major predicted chromatin “states” [4]. In general, the predictions were confirmed, in agreement with previous research linking chromatin opening with cis-regulation and gene expression [5–9]: states 1 through 3 (regulatory and transcribed sequences) mostly corresponded to open chromatin, and states 6, 8, and 9 (Polycomb-mediated repression, intercalated heterochromatin, and silent intergenic regions) were predominantly identified as closed (Figure 3B, Additional file 4: Figure S4B). State 7 (pericentromeric heterochromatin) was mostly identified as closed in 2CM and neutral in 3CM, consistent with the enrichment of the pericentromeric regions with these configurations (Figure 2, Additional file 3: Figure S3). In a complementary analysis, we found that a major proportion of open chromatin had been predicted as one of the “active” chromatin states 1 through 3, and the majority of closed chromatin as “inactive” states 6 through 9 (Figure 4, Additional file 5: Figure S5). Although state 5 (active genes on the X chromosome) appeared to be similarly represented in open and closed chromatin in the whole-genome study, analysis focusing on the X chromosome showed this state representing about 40% of open chromatin and a smaller fraction of closed chromatin. Thus, the overall results of our analysis of chromatin compactness were consistent with the chromatin state predictions based on chromatin modification marks, providing cross-validation of these two approaches. However, there were a number of discrepancies as well.
We were unable to find specific correlations between chromatin compactness and state 4 (active gene introns), which was equally distributed between open and closed chromatin and contributed about 10% to all chromatin configurations (Figures 3 and 4, Additional file 4: Figure S4 and Additional file 5: Figure S5). Notably, this promiscuous distribution pertained to all four distinct sub-states (18, 19, 20, and 21) that have been consolidated into state 4 [4] (Additional file 4: Figure S4C). This finding reflected a peculiar relationship between gene expression and intron chromatin structure, described in more detail below. In addition, a relatively minor portion (23%) of the closed chromatin detected in our genome-wide analysis had been predicted as “active” states 1 through 3, and a similar fraction of open chromatin (26%) had been predicted as “inactive” states 6 through 9 (Figure 4). Visual inspection of the GCSDI signal distribution showed that at least some of these mismatches represented genuine differences between the direct and the predictive approaches to chromatin structure analysis. Figure 1 provides an example: yellow arrows indicate open chromatin detected in regions predicted as “heterochromatin” (blue) and “Polycomb-repressed” (black). These findings identify cases of potentially novel, unconventional epigenetic regulation that warrant further mechanistic inquiry.
We also analyzed the distribution of individual modifications that have traditionally been linked to certain chromatin structure predictions, and intriguingly found that while open chromatin was associated with numerous abundant chromatin modifications, closed chromatin was largely unmodified. This was true in the whole-genome analysis (Figure 5) and also when the major autosome euchromatin, X chromosome, heterochromatin, and chromosome 4 were analyzed separately (Additional file 6: Figure S6). Histone acetylation (with the single exception of H3K23), ubiquitination, phosphorylation, and methylation at H3K4, H3K36, and H3K79 were indicators of open chromatin, and depletion of these modifications was characteristic of closed domains. Among the positive indicators of closed chromatin, dimethylation of H3K9 and especially trimethylation of H3K27 were prominent, but enrichment with these modifications still accounted for less than one-quarter of closed chromatin in the 2CM analysis and one-third in the 3CM analysis. Thus, the prevalent mechanisms underlying chromatin closing do not appear to rely extensively on known chromatin marks, indicating that as yet unknown chromatin compaction-related modifications may exist, or perhaps that “closed” is the default state of unmodified chromatin (note that Drosophila lacks abundant DNA methylation, which therefore contributes little directly to chromatin structure).
Because morphologically dense heterochromatin is often situated at the nuclear periphery, we hypothesized that a significant proportion of closed chromatin is included in lamina-associated domains (LADs). Indeed, comparison of our GCSDI analysis with the LAD map of the Drosophila genome [14] revealed that LADs were predominantly closed (Figure 6A, Additional file 7: Figure S7A) and that approximately one-half of the closed chromatin in the genome was included in LADs (Figure 6B, Additional file 7: Figure S7B). Considering the emerging major role of the lamina in gene repression [15], these findings were consistent with a model in which chromatin compaction is a feature of gene silencing, prompting further inquiry into the relationship between chromatin configuration and gene expression.
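The two reciprocal statements above (the share of LAD sequence that is closed, and the share of closed chromatin that lies in LADs) are both ratios of the same overlap to different denominators. A minimal sketch of that arithmetic follows, assuming both domain sets on a chromosome are given as sorted, non-overlapping half-open intervals; the coordinates and function names are hypothetical.

```python
def overlap_bp(a, b):
    """Total length shared by two sorted, non-overlapping interval lists."""
    i = j = total = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            total += end - start
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def total_bp(intervals):
    """Summed length of half-open (start, end) intervals."""
    return sum(end - start for start, end in intervals)


# Hypothetical coordinates on one chromosome, in bp.
lads = [(0, 100), (300, 400)]
closed = [(50, 150), (300, 450), (500, 600)]

shared = overlap_bp(lads, closed)
frac_lad_closed = shared / total_bp(lads)        # share of LAD sequence that is closed
frac_closed_in_lad = shared / total_bp(closed)   # share of closed chromatin inside LADs
```

The two fractions differ whenever the total lengths of the two domain sets differ, which is why both directions of the comparison are reported separately.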
While intergenic spacers were mostly closed or neutral, actively expressed genes were predominantly open and silent genes were generally closed across the genome (Figure 7A,B, Additional file 8: Figure S8A,B). However, this analysis unexpectedly identified a substantial fraction (one-third in 2CM and one-tenth in 3CM) of active gene chromatin in the closed configuration. Intriguingly, gene size appeared to be a major determinant, with larger active genes displaying more closed chromatin (Figure 7C, Additional file 8: Figure S8C). Structural elements of the active genes were predominantly open, with the single exception of introns, which were equally represented in open and closed chromatin, in line with the aforementioned promiscuous distribution of the predicted chromatin state 4. Interestingly, the proportion of introns with a closed chromatin configuration increased rapidly as intron length exceeded 1 kbp (Figure 7D, Additional file 8: Figure S8D). Within the long introns of active genes, the closed chromatin content was highest in the middle and gradually decreased over several kbp toward the exon/intron borders (Figure 7E, Additional file 8: Figure S8E). Given the rapid transit of RNA polymerase across large introns [16], we propose that chromatin in these regions can quickly condense once the transcription complex has passed. This apparent disconnect between the activity of genes with large introns and the intron chromatin structure probably underlies the regulation of interleaved gene arrangements, where the expression of small nested genes often shows little correlation with that of the larger host genes that harbor them in introns [17].
Another intriguing finding was the presence of closed chromatin in some active gene promoters (17% in 2CM and 3% in 3CM) and open chromatin in silent gene promoters (one-third in 2CM and 17% in 3CM). We first sought to rule out trivial explanations such as the frequent presence of alternative inactive promoters in active genes, imprecision of the chromatin analysis, or incorrect selection of the promoter regions. In these cases, even though a promoter may appear in an “odd” configuration, the chromatin structure of the gene body would match its expression status. We found just the opposite: the chromatin configuration of the gene body followed that of the promoter (Figure 8A), indicating that some genes can be active in closed chromatin and that some silent genes are open. The sets of genes defined as active or silent were still clearly distinct in their expression levels regardless of their promoter chromatin configuration. However, while silent genes with closed promoters showed essentially no detectable expression, a significant fraction of their counterparts with open promoters demonstrated very low but noticeable expression (Figure 8B), consistent with a model in which chromatin compaction completely shuts down expression of silenced genes while opening (“potentiation”) of chromatin exposes genes to the transcriptional machinery [1]. Closing of chromatin domains may be used for strict control of tissue-specific genes, especially those organized in large clusters on chromosomes [18, 19]. To test this suggestion, we analyzed 66 clusters of three or more testis-biased genes [19] and found that 28 of them represented uninterrupted domains of closed chromatin, 23 represented continuous domains of open chromatin, and only 15 had a transition between open and closed domains within the cluster. We further analyzed cluster genes from the uninterrupted domains of open or closed chromatin.
Genes from closed clusters (n = 122) indeed had higher tissue specificity, and thus tighter transcriptional control, than their counterparts from open clusters (n = 72), as their expression breadth metric tau [20] was significantly higher (p < 10⁻⁸, t-test and U-test) (Figure 8C).
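For readers unfamiliar with tau, the sketch below computes it under the commonly used definition, tau = Σ(1 − x_i / x_max) / (n − 1), where x_i is the expression of a gene in tissue i and n is the number of tissues. We assume this is the definition given in [20]; the function name and the expression values are hypothetical. Tau ranges from 0 (uniform expression across tissues) to 1 (expression confined to a single tissue).

```python
def tau(expression):
    """Tissue-specificity index for one gene across n >= 2 tissues.

    `expression` is a list of non-negative expression values, one per tissue.
    """
    n = len(expression)
    if n < 2:
        raise ValueError("tau needs expression values from at least two tissues")
    x_max = max(expression)
    if x_max <= 0:
        raise ValueError("tau is undefined when the gene is not expressed anywhere")
    # Each tissue contributes how far it falls below the maximally expressing tissue.
    return sum(1 - x / x_max for x in expression) / (n - 1)


# Hypothetical expression profiles across four tissues.
broad = tau([10.0, 10.0, 10.0, 10.0])    # uniformly expressed gene
specific = tau([0.0, 0.0, 0.0, 8.0])     # gene expressed in one tissue only
```

A higher tau for genes in closed clusters therefore corresponds to narrower expression breadth, which is the sense in which the text describes tighter transcriptional control.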