The ChIP protocol, which is previously described in ([10]; http://younglab.wi.mit.edu/hESRegulation/Young_Protocol.doc), is outlined in Fig. 1a (top). A ChIP experiment typically yields a complex array of DNA sequences that are selectively immunoprecipitated from entire genomes in a population of cells (‘purified ChIPed gDNA’). The abundance of each sequence reflects the frequency at which the protein or modification is found at that position in the cell population. During analysis, sequence abundance is most commonly measured using a candidate approach with specific primers, or genome-wide by quantitative sequencing. In either case, a sample of the chromatin used for ChIP is included as control (‘purified input gDNA’).
Overview of the 2C-ChIP technology
The 2C-ChIP method fills the need for a versatile, low-cost, and high-throughput approach to quantify ChIP signals at defined genomic regions. 2C-ChIP detects immunoprecipitated DNA sequences by multiplex ligation-mediated amplification (LMA; Fig. 1b). During LMA, primer pairs are annealed next to each other on the same DNA strand, and only contiguously annealed primers can be ligated to quantitatively detect target sequences. The 2C-ChIP primers feature universal tails such that products from individual libraries can be PCR-amplified in a single step. The amplification primers also contain barcode sequences, which allows multiplexing of the 2C-ChIP libraries themselves prior to sequencing. As with any other LMA-based approach ([6] and references therein), 2C-ChIP can be conducted with thousands of primers in single reactions to detect target sequences at genome-scale without compromising linearity.
To analyze target regions by 2C-ChIP, a mixture of adjacent forward and reverse primer pairs is first annealed to ChIPed material. As usual in LMA, only reverse primers are 5′ end-phosphorylated to enable ligation. The primer pairs are designed to tile the region(s) of interest at a density selected to achieve the coverage and resolution desired. The annealed primer pairs are next ligated with Taq DNA ligase, which selectively ligates hybridized oligos only when positioned next to each other without overlap or gap between them. This step effectively produces a quantitative imprint or “carbon copy” of the probed sequences in immunoprecipitated samples. The resulting 2C-ChIP libraries are then PCR-amplified with sequencing primers of the platform of choice, featuring barcodes that are designed against the universal tails of the 2C-ChIP primers (Fig. 1b).
The HOXA gene cluster as a model system
We developed and validated the 2C-ChIP approach by characterizing the HOXA gene cluster region in NT2-D1 cells (Fig. 2a, b). The HOX genes are homeobox family members that encode helix-turn-helix transcription factors [11, 12]. In vertebrates, there are 39 HOX genes organized into 4 genomic clusters (A, B, C, and D) located on different chromosomes, with the human HOXA cluster residing on chromosome 7. HOX genes are highly conserved throughout evolution and play important roles both during development and adulthood.
During development, HOX genes are master regulators of anterior-posterior (A-P) body patterning, and are involved in organogenesis and the formation of limbs [13,14,15,16]. During body plan formation, HOX spatial and temporal expression is observed to follow the order of their positions along their respective chromosomes, with those at the cluster 3′ end expressed more anteriorly and earlier than those at the 5′ end. Similarly, during limb formation, 3′ end genes are expressed earlier and in the proximal limb region compared to those within the cluster’s 5′ end, which are turned on later and more distally in the limb. This spatio-temporal collinearity appears to emerge from many different types of control mechanisms, including the modulations of chromatin landscape and of three-dimensional chromatin organization. These underlying mechanisms have been scrutinized for over three decades and still remain under intense investigation.
An interesting feature of the human HOXA gene cluster is that it resides in a bivalent domain when transcriptionally silent [17]. The HOXA cluster contains 9 developmentally regulated protein-coding genes encoded on the same DNA strand and 4 intergenic lncRNAs transcribed from the opposite strand (Fig. 2a). Bivalent chromatin regions are DNA segments featuring both activating (H3K4me3) and repressing (H3K27me3) epigenetic marks. HOX, like many other genes that regulate cellular differentiation, are thought to exist in this state to keep them inactive but poised for rapid activation [18]. Whether transcription of a gene will be turned on appears to depend on the removal of polycomb group proteins (PcG) and their modifications [19]. Bivalent domains are also often regulated by lncRNAs that recruit enzyme complexes responsible for activating or repressing histone marks [7, 20, 21].
Interestingly, at least one of these lncRNAs can regulate three-dimensional chromatin organization as a mechanism to control the expression of genes. We recently demonstrated that HOTAIRM1 is required for proper collinear activation of the proximal HOXA genes by RA [7]. HOTAIRM1 is a lncRNA encoded between HOXA1 and 2 that is rapidly induced upon cell differentiation by RA in NT2-D1 cells. It binds chromatin-modifying complexes, and prevents premature expression of HOXA genes during the early stages of induction. Indeed, preventing HOTAIRM1 accumulation by RNAi results in failure to physically uncouple the HOXA1/2 and HOXA5/6/7 subTADs, leading to the premature expression of the latter genes. Thus, analyzing HOXA induction by RA in NT2-D1 cells, which recapitulate the cluster’s temporal collinearity observed in development [22, 23], allows for the integration of 2C-ChIP results with that of three-dimensional chromatin organization, lncRNA activity, and gene expression.
Validating 2C-ChIP against ChIP-qPCR and ChIP-seq data
The NT2-D1 cell model (Fig. 2b) is used extensively to study HOX gene regulation as it recapitulates their induction pattern in developing axial systems upon RA treatment [24,25,26,27]. Others and we have previously characterized the various changes in chromatin landscape accompanying this process [7, 26, 28, 29]. To develop 2C-ChIP with this differentiation system, we first measured gene expression in sample sets using quantitative real-time RT-PCR (RT-qPCR) (Fig. 2c, d; Additional file 1: Table S1) with previously validated primers ([7, 27, 30]; Additional file 2: Table S2). As expected [7], HOXA gene levels are nominal prior to induction and the expression of proximal genes is greatly increased upon RA treatment (Fig. 2c). LncRNA levels behave similarly, with HOTAIRM1 as the most induced transcript after the 3-day treatment (Fig. 2d). Using fixed chromatin, we then measured the levels of H3K4me3 and H3K27me3 by ChIP-qPCR at the promoter and within the body of strategically selected genes to assess the extent of epigenetic changes in matched sample sets (Additional file 3: Table S3). As expected [7], we find that H3K4me3 is present preferentially at the promoter of proximal genes before and after induction, and that it increases slightly following RA treatment (Fig. 2e, top). In contrast, H3K27me3 is found both at promoters and within gene bodies, and sharply decreases specifically at the proximal genes after RA induction (Fig. 2e, bottom).
Having verified that sample sets display the expected epigenetic changes upon treatment with the morphogen, we then used these fixed chromatin preparations to measure changes in H3K4me3 and H3K27me3 levels by 2C-ChIP (Fig. 2f). We designed 160 primer pairs covering the HOXA cluster at high density, as well as surrounding upstream and downstream regions at lower density to serve as negative controls (Additional file 4: Table S4; Additional file 5: Table S5). As would be expected from the ChIP-qPCR results, we observed higher levels of activating H3K4me3 at the promoter of proximal genes, accompanied with lower levels of H3K27me3 after 3 days of RA induction (Fig. 2f; Additional file 6: BED file 1–4). To compare the ChIP-qPCR and 2C-ChIP data, we calculated the average score of all 2C-ChIP probes located within a 1-kb window centered at the corresponding ChIP-qPCR amplicon, and calculated a Spearman-ranked correlation between the two data types. We find that the changes measured by 2C-ChIP correlate well with those evaluated by ChIP-qPCR (Fig. 2g). We also obtained similar ChIP-qPCR and 2C-ChIP results for SUZ12, a component of the polycomb repressive complex 2 (PRC2) responsible for the silencing H3K27me3 mark (Additional file 7: Figure S1; Additional file 6: BED file 5, 6).
We next compared our 2C-ChIP profiles with available ChIP-seq datasets from untreated NT2-D1 cells where the HOXA cluster is transcriptionally silent, covered with H3K27me3, and gene promoters are marked with H3K4me3 (Fig. 3a, b, left). ChIP-seq data for SUZ12, which like the modification it catalyzes on histones covered the entire cluster region, was also available (Fig. 3c, left). With our primer design (Additional file 8: Figure S2a), 2C-ChIP recapitulated the distribution of histone marks and SUZ12 throughout the region. Similar patterns of peaks and valleys were observed with both approaches. In this case, to directly compare the two data types, we added all the ChIP-seq reads in the raw datasets spanning the corresponding probed 2C-ChIP sequences, and calculated the Spearman-ranked correlation. When quantified in this fashion, 2C-ChIP and ChIP-seq displayed strong positive correlations (H3K4me3: ρs = 0.74; H3K27me3: ρs = 0.68; SUZ12: ρs = 0.80), despite the fact that different antibodies were used and ChIP samples were prepared by different laboratories (Fig. 3a, b, c, right panels). Together, these results show that 2C-ChIP quantitatively recapitulates data obtained either by ChIP-qPCR or ChIP-seq, two current benchmark ChIP signal analysis approaches.
Exploring the linear detection range of 2C-ChIP
The 2C-ChIP data presented above was generated with optimal DNA amounts (see below and Methods section). While optimizing the 2C-ChIP protocol, we noticed that the amount of DNA used to perform the LMA step of 2C-ChIP could impact results. Indeed, using too much gDNA sometimes led to inconsistent yields (Additional file 8: Figure S2b). This is particularly important when quantitative results are desired, especially since input samples tend to be concentrated and are used for 2C-ChIP normalization (see Methods section). To determine the optimal detection range, we titrated the amount of gDNA in 2C-ChIP assays using our HOXA region primer set to find that linearity is observed ranging from 0.16 ng to 1.6 ng and recommend using 1–2 ng as it provides very reproducible yields as quantified by TaqMan (Additional file 8: Figure S2a, c, d). We sequenced these 2C-ChIP samples to assess how gDNA concentrations might affect the quality of libraries, and mapped the reads to our expected forward-reverse pairings. We observe that both the total read number and percentage of mapped 2C-ChIP products are maintained in this nanogram range (Additional file 8: Figure S2e), while a very low quantity of gDNA generates fewer reads and a greater percentage of non-specific ligation products between forward and reverse primers (“off-diagonal pairs”; Additional file 8: Figure S2e, f, g). Based on these results, we chose to use approximately 1.6 ng of input gDNA for each of our 2C-ChIP experiments.
Quantitative detection of epigenomic changes at the HOXA cluster by 2C-ChIP during cell differentiation
Having optimized 2C-ChIP, we then applied it to an NT2-D1 differentiation time course (Fig. 4a). Our goal was to exploit 2C-ChIP’s flexibility to analyze numerous ChIP libraries simultaneously, and explored this by probing six different chromatin components (below) across four time points. As done when developing and optimizing 2C-ChIP, we first measured gene expression at the HOXA cluster in our sample sets by RT-qPCR (Fig. 4b). We observed that treatment with RA dramatically induced the proximal HOXA and lncRNA genes, and that induction patterns in these datasets correlate well with the first datasets used to devise 2C-ChIP (Additional file 9: Figure S3). Notably, we found that gene induction proceeded in a manner that is consistent with the known collinear activation of HOXA genes during development. Indeed, while the most 3′ proximal genes (HOXA1, and HOTAIRM1) were highly and rapidly expressed after the addition of RA (6 h), other proximal genes were induced more progressively – not yet reaching a plateau even 7 days post-induction while HOXA1 and HOTAIRM1 expression had started to decline by that time. This is in contrast to distal HOXA transcripts, which accumulated over time in this dataset.
We tested whether expected epigenetic changes accompanied gene induction in these sample sets using RT-qPCR and found higher H3K4me3 at proximal gene promoters accompanied with decreased H3K27me3 levels specifically at induced genes over the time course (Additional file 10: Figure S4a). These changes correlated well with those measured by RT-qPCR in the first RA-induced datasets (Additional file 10: Figure S4b). We then used the fixed chromatin preparations to measure changes in the levels of H3K4me3, Ash2L, H3K27me3, SUZ12, CTCF, and UTX over the HOXA cluster region by 2C-ChIP (Fig. 4c-h; Additional file 6: BED file 7–30). As with our first datasets, the 2C-ChIP results obtained for H3K4me3 and H3K27me3 in the time course correlated highly with data measured by ChIP-qPCR, and with available ChIP-seq datasets from untreated NT2-D1 samples (Additional file 10: Figure S4c, d). Importantly, there was a very strong correlation between 2C-ChIP biological replicates (R2 ranging from 0.82 to 0.96), except for Ash2L and UTX, which upon closer inspection exhibited a high background within regions residing outside of the HOXA cluster serving as a negative control (Additional file 11: Figure S5; Additional file 12: Figure S6; Additional file 6: BED file 31–36).
2C-ChIP analysis identifies a HOXA cluster region slow to demethylate
From the 2C-ChIP analysis of our differentiation time course, we found that rapidly activated genes (HOXA1 and HOTAIRM1) acquire higher levels of H3K4me3 at their promoters and along gene bodies compared to those at the proximal region induced later by RA (Fig. 4c). We also observed that while the PRC2 component SUZ12 is rapidly ejected from the cluster (6 h), the removal of associated silencing H3K27me3 mark occurs at a slower pace (Fig. 4e, f). The chromatin architectural protein CTCF conversely did not display notable changes when probed with our 2C-ChIP primer design, suggesting it remains stably anchored across cellular differentiation (Fig. 4g).
Analyzing 2C-ChIP data in greater details, we noticed that a region within HOXA seemed less prone to loss of H3K27me3, (Fig. 4e, dashed black box), and even SUZ12 albeit to a lesser extent. Interestingly, sequences from this region have previously been shown to repress transcription when inserted in the drosophila genome [9]. This H3K27me3 “island” overlaps perfectly with the HOXA-AS2 lncRNA gene induced later in the time course, and after HOTAIRM1. These observations suggest a functional relationship between the repressive nature of this H3K27-methylated genomic sequence and the expression of HOXA-AS2, perhaps through combinations of chromatin structure and conformation changes (see below).
Integrating gene expression, 2C-ChIP, and 5C data
We then set to study how changes in H3K27me3 and other chromatin marks translate into three-dimensional chromatin organization. To accomplish this, we performed 5C-seq analysis of the HOXA cluster in our time course samples (Fig. 5; Additional file 13: 5C dataset 1–4; Additional file 14: Table S6). As we reported previously, RA induction changes spatial chromatin organization at the HOXA cluster [7]. As before, the proximal and distal parts of the cluster appeared physically separated and confined to their own topologically associating domains (TADs). Both regions were also organized into several partially overlapping subdomains (subTADs). We observed long-range contacts between early RA-induced proximal genes (HOXA1/2) and those destined for induction at a later time (HOXA5/6/7). However, in this dataset, the contact in untreated cells (0 h) was weaker than previously seen [7], and seemingly depends on the state of the actively growing population. Such variability in the strength of this long-range contact in untreated cells can lead to differences in whether or not it is captured more frequently before or after induction with RA (Additional file 15: Figure S7a). Nevertheless, we saw progressively less long-range contacts within the proximal HOXA region as differentiation ensued, which was accompanied by greater short-range interaction frequencies, locally where genes were most expressed (Additional file 15: Figure S7). In fact, the greatest gain of contact was within the subTAD containing HOXA1 and HOTAIRM1, which are the two most induced genes of the time course. Given that the greatest increase in H3K4me3 is at this subdomain, this result suggests that here, 5C captures more contacts when genes have high H3K4me3 levels.
While the expression of some genes began tapering off seven days after RA induction (7d), we noticed that H3K4me3 continues to increase slightly over the time course, and more noticeably at HOXA5/6/7. This coincides with the continued induction of these genes at that time, including HOXA-AS2, which resides within a subdomain that progressively gains contacts during induction despite the fact that H3K27me3 continues to decrease (Fig. 5; Additional file 15: Figure S7c). Interestingly, this subdomain seemingly avoids any major interactions with neighboring genes throughout the time course (Fig. 5a-c). Considering how this area also persists in maintaining both H3K27me3 and SUZ12 after 3 days (3d), we propose that “looping out” of this region might somehow protect it from the rapid removal of PRC2. Meanwhile, at the other end of the cluster, the distal genes are kept away from the rearrangements occurring at the 3′ end – perhaps helped by the activity of CTCF bound between HOXA6 and HOXA7 [29] – as seen by the continuous loss of long-range contacts.
HOXA-AS2 as a modulator of HOXA cluster expression and organization
We wondered what role – if any – might the HOXA-AS2 lncRNA play during HOXA gene induction by RA. We probed this question by examining gene expression, chromatin landscape, and three-dimensional chromatin organization of the HOXA cluster region in RA-induced samples depleted of the lncRNA (Fig. 6). RNAi knockdown using an siRNA against the transcript’s 3′ end considerably reduced the accumulation of HOXA-AS2 during a 5-day RA treatment. We found that this drastically reduces the expression of all the proximal HOXA genes (Fig. 6a, b, c). 5C-seq analysis of matching knockdown samples revealed that curbing HOXA-AS2 accumulation led to greater contact frequencies within its own subTAD, and much fewer long-range interactions within the proximal region, perhaps contributing to the lower HOXA levels by distancing enhancers from genes (Fig. 6d, e, top; Additional file 13: 5C dataset 5, 6). Moreover, 2C-ChIP analysis shows that lower H3K4me3 levels at all promoters accompanied these changes without H3K27me3 changes at gene bodies (Fig. 6d, e, bottom; Additional file 6: BED file 37–40). Together, these results show that HOXA-AS2 is required to enhance expression of all the cluster’s RA-responsive genes by promoting high H3K4me3 levels at promoters, perhaps by controlling the cluster’s three-dimensional chromatin organization.