Unbiased chromatin accessibility profiling by RED-seq uncovers unique features of nucleosome variants in vivo

Background Differential accessibility of DNA to nuclear proteins underlies the regulation of numerous cellular processes. Although DNA accessibility is primarily determined by the presence or absence of nucleosomes, differences in nucleosome composition or dynamics may also regulate accessibility. Methods for mapping nucleosome positions and occupancies genome-wide (MNase-seq) have uncovered the nucleosome landscapes of many different cell types and organisms. Conversely, methods specialized for the detection of large nucleosome-free regions of chromatin (DNase-seq, FAIRE-seq) have uncovered numerous gene regulatory elements. However, these methods are less successful in measuring the accessibility of DNA sequences within nucelosome arrays. Results Here we probe the genome-wide accessibility of multiple cell types in an unbiased manner using restriction endonuclease digestion of chromatin coupled to deep sequencing (RED-seq). Using this method, we identified differences in chromatin accessibility between populations of cells, not only in nucleosome-depleted regions of the genome (e.g., enhancers and promoters), but also within the majority of the genome that is packaged into nucleosome arrays. Furthermore, we identified both large differences in chromatin accessibility in distinct cell lineages and subtle but significant changes during differentiation of mouse embryonic stem cells (ESCs). Most significantly, using RED-seq, we identified differences in accessibility among nucleosomes harboring well-studied histone variants, and show that these differences depend on factors required for their deposition. Conclusions Using an unbiased method to probe chromatin accessibility genome-wide, we uncover unique features of chromatin structure that are not observed using more widely-utilized methods. We demonstrate that different types of nucleosomes within mammalian cells exhibit different degrees of accessibility. These findings provide significant insight into the regulation of DNA accessibility. Electronic supplementary material The online version of this article (doi:10.1186/1471-2164-15-1104) contains supplementary material, which is available to authorized users.


Background
Eukaryotic genomes are wrapped around histone octamers to form nucleosome arrays, which are further packaged into the nucleus. Although chromatin compaction facilitates storage of large quantities of DNA within small nuclear compartments, it drastically reduces the accessibility of genomic DNA to proteins that require access.
Nucleosomal DNA is relatively inaccessible to DNA binding proteins due to both the occlusion of approximately half of its surface by contacts with histones, as well as the distortion of the normal B-form structure that occurs when DNA is wrapped around a histone octamer [1]. Consequently, chromatin structure must be disrupted to facilitate normal cellular processes, such as DNA repair, recombination, replication, and transcription.
Although protection of DNA from nuclear factors by the formation of tight interactions with histones appears to be the major method by which DNA accessibility is regulated, many different isoforms of the histone octamer exist within most eukaryotes, each with distinct biochemical and biophysical properties [2][3][4][5][6][7][8]. These differences are mainly derived from two sources. First, most eukaryotes express several variants each of histones H2A and H3. Within each family, differences between variants can range from a few amino acid substitutions to the presence or absence of additional, non-histone domains at their amino-or carboxyl-termini. Second, all four core histone proteins are subject to a wide array of post-translational modifications, including acetylation, methylation, phosphorylation, ubiquitylation, and others. Several of these modifications and variants change the overall charge of the histone octamer and/or create or destroy binding sites for proteins, resulting in alterations in nucleosome stability [5,[9][10][11]. Together, these differences in nucleosome structure and stability conferred by histone variants and modifications raise the possibility that accessibility of nucleosomal DNA may not be a simple binary phenomenon in which nucleosome-bound DNA is completely protected and nucleosome-free DNA is completely accessible; rather, DNA within some variants of nucleosomes may be more accessible than DNA bound by other variants. For example, nucleosomes harboring histone variants H2A.Z and/or H3.3 are extractable from bulk chromatin at lower salt and, in some cases, protect smaller footprints of DNA from nucleases than canonical nucleosomes [6,[12][13][14], raising the possibility that DNA within certain nucleosome variants is more broadly accessible, due to either biophysical properties or dynamic behavior of these nucleosomes. However, this possibility remains to be directly tested in vivo.
Along with differences in chromatin structure within distinct genomic regions in individual cell types, cell typespecific chromatin structural differences facilitate gene expression patterns specific to cells of different lineages [15]. In embryonic stem cells (ESCs), chromatin structure is relatively open (less heterochromatic) compared to differentiated cells, which may be necessary for their ability to self-renew (proliferate as ESCs) while maintaining the flexibility to turn on lineage-specific genes during differentiation [16,17]. As ESCs differentiate, DNA accessibility decreases, chromatin becomes less dynamic, and larger blocks of heterochromatin form, suggesting that differentiation induced chromatin alterations may stabilize cell fates by "locking down" regions of the genome in heterochromatic blocks that are relatively insensitive to transcriptional activators.
Methods have been developed to study DNA accessibility based on either the protection of nucleosomal DNA from general endonuclease digestion or the differential solubility properties of open and closed chromatin. Deoxyribonuclease I (DNase I) [18,19] preferentially digests nucleosome-free DNA [20][21][22], and genomic regions that are more sensitive to DNase I digestioncalled DNase I hypersensitive sites (DHSs)can be identified by deep sequencing (DNase-seq) [23]. Formaldehyde-Assisted Isolation of Regulatory Elements (FAIRE) is a second method to isolate accessible genomic regions, which uses organic extractions of formaldehyde cross-linked chromatin to enrich protein-free DNA fragments that are subsequently identified by microarrays (FAIRE-chip) [24] or high-throughput sequencing (FAIRE-seq) [25]. Consistent with the requirement of most transcription factors (TFs) for accessible binding sites on DNA, DHSs and FAIRE-seq peaks are enriched for regulatory regions of active genes (enhancers and promoters). Conversely, micrococcal nuclease digestion of chromatin followed by deep sequencing of the regions of DNA protected from digestion (MNase-seq) allows inference of the positions and occupancy levels of nucleosomes in a population (when footprints of~150 bp are quantified) and TFs (when footprints less than~80 bp are considered) [22,[26][27][28]. When compared to maps of nucleosome positions, both DNase-seq and FAIRE-seq tend to identify large nucleosome-depleted regions that range from 100-300 bp in length [29]. As a result, differences in DNA accessibility that occur within or close to nucleosomes, or quantitative differences in accessibility of individual nucleosomes, are difficult to detect by these methods.
In addition, for more than three decades, restriction enzymes (REs) have been utilized to probe DNA accessibility at individual loci [30][31][32][33][34]. Since REs digest DNA at specific nucleotide sequences known as restriction sites (RSs), REs can quantitatively probe cell type-specific differences in accessibility at individual positions, when combined with Southern blotting or PCR. The accessibility of chromatin to REs can, in principle, be quantified at any genomic location that harbors an RS, including DHSs, DNA sequences within nucleosomes, and linker regions within closely-spaced nucleosome arrays. Previously, Gargiulo et al. developed a genome-wide method to probe chromatin structure using restriction enzymes, finding that chromatin accessibility correlated broadly with gene expression in hematopoietic cell lineages and became progressively restricted during differentiation [35]. Here we modified this method to reduce potential biases in library production and increase the fraction of reads within a library that directly reflect RE cleavage. We employ this modified method, termed RED-seq, to measure RE accessibility across the genome of multiple cell types.
Here we show that, as with DNase-seq and FAIRE-seq, RED-seq uncovers known regions of open chromatin, validating the method as a genome-wide probe of chromatin accessibility. Furthermore, we find that RED-seq can quantify both large differences in chromatin accessibility between different cell types and subtle changes that occur during ESC differentiation, highlighting the sensitivity of the assay. However, unlike these methods, we find that RED-seq also identifies differences in accessibility within nucleosome arrays. Consequently, we uncover significant differences in accessibility between nucleosomes containing different histone variants, showing that DNA bound by nucleosomes containing H2A.Z or H3.3 are more accessible than the genome-wide average. Consistent with this model, RNAi-mediated depletion of factors required for H2A.Z or H3.3 deposition into chromatin results in reduction of accessibility at these sites. Therefore, these results provide in vivo evidence that DNA accessibility within nucleosomes is modulated by the composition of histone proteins.

Genome-wide measurement of chromatin accessibility by RED-seq
Due to the inherent biases of standard methods of measuring chromatin accessibility, such as DNase-seq and FAIRE-seq, toward nucleosome-free regions of DNA, these methods are not well suited to examination of chromatin accessibility in the vast majority of the genome found within nucleosome arrays. A prior RE-based method of probing chromatin accessibility genome-wide (called NA-Seq) revealed that accessibility of regulatory regions of genes correlated with their gene expression patterns [35]. We therefore wished to examine the accessibility of ESC chromatin using REs, in order to probe regions of open chromatin structure that are well covered by DNase-seq and FAIRE-seq maps (to assess whether REs faithfully report known features of ESC chromatin structure), as well as examine chromatin accessibility within nucleosomes and between nucleosomes that lie within regularly-spaced nucleosome arrays.
NA-Seq was previously performed by exposing purified nuclei to REs, secondary digestion of the purified DNA with an additional RE, ligation of linkers, and 454 pyrosequencing [35]. We modified the NA-Seq method in several ways ( Figure 1A): First, we performed RE digestion on permeabilized cells without nuclear purification in order to reduce processing steps prior to chromatin digestion by REs. Second, we used an unbiased, sonication-based shearing approach after DNA purification to reduce potential biases in the library introduced by the genomic distribution of the restriction sites (RSs) specific for the post-DNA purification RE used in NA-Seq. Finally, we used two separate linker ligation steps to ensure that single-read Illumina sequencing would sequence the end of each DNA fragment cleaved by the RE (rather than the randomly sheared end), making nearly all mapped reads informative, rather than about half. We refer to this modified method as RED-seq to distinguish this modified protocol from the previous NA-Seq approach.
In principle, any RE or combination of REs could be used for RED-seq library preparation. We utilized Sau96I, an RE with a four base RS (GGNCC) that occurs frequently throughout the mouse genome and is abundant within gene regulatory sequences, in order to probe genomewide accessibility at relatively high resolution. First, we compared the differences in RE accessibility between mouse ESC chromatin and naked DNA. Because chromatin and naked DNA have identical RSs, differences in RE accessibility should result directly from the influences of chromatin proteins on accessibility at each RS (e.g., nucleosome occupancy or binding of non-histone proteins). Indeed, naked DNA was more efficiently cleaved and the digestion products were more uniformly distributed compared to ESC chromatin ( Figure 1B), as expected. Next, we prepared sequencing libraries of ESC and naked DNA samples, to quantify the digestion frequency at each Sau96I RS in the genome, and sequenced the libraries. The enrichment within the sequence reads of the expected product of Sau96I digestion (GNCC) immediately following the adapter barcode confirmed the quality of the libraries ( Figure 1C).
We developed a software package (also named REDseq; available as a Bioconductor package) to assign each read to a unique RS in the mouse genome (see Methods for details), and count the relative cut frequency per site corresponding to normalized read counts assigned to each RS. As we observed by electrophoresis of digested naked DNA or chromatin ( Figure 1B), average RE accessibility, as measured by relative cut frequency per RS, was reduced in the chromatin library relative to naked DNA at most sites ( Figure 2A). As expected, due to the fact that cutting frequency at each RS was normalized to total reads in each library, we observed fragments derived from some RSs that were more abundant in the chromatin library than the naked DNA library. In addition, cleavage within the naked DNA library was not uniform at all RSs (Figure 2A), likely due to the fact that fragments generated by two Sau96I cleavages within close proximity are selected against during library preparation, which eliminates small DNA fragments. This is less of a concern in chromatin samples, in which cleavage at most RSs is suppressed. Furthermore, we did not observe a strong correlation between the reads from chromatin DNA and naked DNA (R = 0.376), confirming that the degree of RE digestion at most sites was different between chromatin and naked DNA ( Figure 2B). Thus, RED-seq accurately reflects inhibition of RE accessibility by the presence of chromatin in vivo.
Active genes and nucleosome-free regions are highly accessible RE accessibility in promoter-proximal regions is usually correlated with gene expression [36][37][38]. Homeobox (Hox) genes encode key developmental TFs that are not expressed in ESCs [39]. We observed low levels of RE accessibility around Hox genes relative to surrounding regions and normalized naked DNA reads ( Figure 2C). In contrast, for genes that are highly expressed in ESCs (Oct4, Eef1a1), RE accessibility was elevated within upstream regulatory regions and surrounding transcriptional start sites (TSSs) ( Figure 2D). Overall, these results showed that enhanced RE accessibility was generally associated with transcriptional activity, consistent with previous data.
DNase I is frequently used to identify open chromatin/ nucleosome-free regions of the genome, and many gene regulatory elements are hypersensitive to DNase I [21,22,40,41]. Therefore, we next examined the frequency of RED-seq reads surrounding annotated DHSs in ESCs. Since RSs are non-uniformly distributed throughout the genome, we compared RE accessibility averaged over all DHSs to average RS density to test whether DHSs were generally accessible or inaccessible. We found that RE accessibility over DHSs was strongly enhanced relative to the RS density surrounding these regions ( Figure 3A). Similar results were observed in RED-seq maps of ESCs that combine Sau96I and a second RE, DdeI, validating these results (Additional file 1). Furthermore, our reanalysis of published NA-seq data from human NB-4 leukemia cells [42] revealed a similar pattern at DHSs, further confirming these results (Additional file 2). DHSs are typically nucleosome-depleted and highly transcribed, relative to DNase I-insensitive regions [21,22,40,41]. Therefore, we compared our RED-seq data to nucleosome occupancy maps previously obtained by deep sequencing of nucleosome-sized DNA fragments protected from digestion by micrococcal nuclease (MNase-seq) [43], and found that nucleosomes were strongly depleted over DHSs ( Figure 3B), consistent with the higher RE accessibility we observed.
Next, we compared RE accessibility surrounding the binding sites of two key TFs in ESCs. CTCF is a sequencespecific insulator binding protein with important roles in regulation of imprinted gene expression [44,45] and higher-order chromatin structure [46]. RE accessibility was enriched within the regions surrounding CTCF ( Figure 3C, Additional files 1 and 2). As previously reported [47,48], CTCF binding sites are depleted of nucleosomes, with well-positioned nucleosomes flanking the nucleosome-free regions ( Figure 3D), explaining the higher accessibility we observed at these sites. Interestingly, for highly abundant nucleosome-free regions such as CTCF binding sites and DHSs, RED-seq also revealed nucleosome phasing around nucleosome-depleted regions, with smaller phased peaks of RE accessibility found within linker regions ( Figure 3E-F). Since the majority of internucleosomal linkers are relatively small (averaging approximately 30 bp in ESCs [49], this phasing is not apparent using DNase-seq [29] which is specialized for identification of long stretches of nucleosome-free DNA ( Figure 3E-F). Together these results show that while the resolution of RED-seq at the level of individual loci is variable and depends on the frequency of RSs at each locus, when averaged over thousands of loci RED-seq not only identifies large nucleosome-free regions identified by DNase-seq, but can also probe DNA linker regions within nucleosome arrays.

Remodeling of chromatin accessibility during differentiation
ESC chromatin structure is relatively dynamic and is depleted of large blocks of heterochromatin, unlike many differentiated cell types, suggesting that major alterations in chromatin structure that accompany cellular differentiation may be important for lineage commitment [16]. To study chromatin accessibility during differentiation, we first tested whether RED-seq could identify distinct RE accessibility patterns in different cell types by comparing chromatin accessibility in ESCs and mouse embryonic fibroblasts (MEFs). We found that, in MEFs, nucleosome occupancy was increased and RE accessibility decreased at ESC-specific DHSs ( Figure 4A-B), consistent with the widespread differences in chromatin structure and gene expression between these two cell types. As with DHSs, RE accessibility at sites of CTCF binding in ESCs was  Figure 4E). Finally, we examined RE accessibility within regions surrounding TSSs in both cell types. TSS-proximal regions of actively transcribed genes are usually nucleosome-depleted and the degree of nucleosomedepletion correlates with transcriptional activity at many genes. As expected, RE accessibility was higher in ESCs than in MEFs surrounding the TSSs of genes that were highly expressed in ESCs ( Figure 4F), whereas genes highly expressed in MEFs were generally more accessible in MEFs ( Figure 4G). These data confirmed that RED-seq could identify differences in chromatin accessibility between two distinct cell types that reflected differences in TF binding and gene expression.
Next, to test whether we could observe more subtle changes in chromatin structure during cellular differentiation, we differentiated ESCs by RNAi-mediated knockdown (KD) of the ESC pluripotency TF Oct4. We chose this differentiation model since, unlike most other methods of differentiation that generate heterogeneous mixtures of many different cell types from all three germ layers, Oct4 KD robustly induces trans-differentiation to trophectoderm specifically [50]. Consistent with previous reports [50], Oct4 KD promoted ESC differentiation to cells with trophoblast morphology ( Figure 5A-B). Using RED-seq, we found that RE accessibility was decreased upon Oct4 KD near ESC DHSs and CTCF binding sites ( Figure 5C, E). Although the reduction in DNA accessibility upon Oct4 KD was not as severe as in MEFs, we also observed slightly increased nucleosome occupancy by MNase-seq upon Oct4 KD at ESC DHSs and CTCF binding sites ( Figure 5D, F), consistent with the decrease in RE accessibility that we observed in these regions.
To validate these results, we used quantitative PCR (qPCR) to determine the fraction of uncut (protected) DNA after RE digestion, probing several ESC DHSs and CTCF binding sites. Consistent with the RED-seq results, higher levels of uncut DNA were observed upon Oct4 KD at most sites tested ( Figure 6A-B). Furthermore, we tested CTCF binding at the same regions by ChIP-qPCR, and observed a reduction in binding upon Oct4 KD wherever chromatin accessibility decreased, whereas control CTCF binding sites that showed no difference in accessibility upon Oct4 KD showed no decrease in CTCF binding ( Figure 6C). These data indicate that CTCF binding and RE accessibility are inter-dependent. Next, we observed that RE accessibility surrounding the binding sites of the ESC TF Klf4 was also reduced upon Oct4 KD ( Figure 6D), with concomitant increases in nucleosome occupancy over these sites ( Figure 6E). Finally, we found the alterations in accessibility we observed over DHSs, CTCF binding sites, and Klf4 binding sites were consistent in two biological RED-seq replicates from each KD ( Figure 6F), further validating these results. These results suggest that, during differentiation, many enhancers that are protected from nucleosome deposition in ESCs (presumably by TF binding) become occupied by nucleosomes, leading to decreased RE accessibility. Taken together, RED-seq not only detects large differences in chromatin accessibility between distinct cell types (ESCs vs MEFs) but also tracks more subtle changes that occur during differentiation (control vs Oct4 KD ESCs).

Altered accessibility of nucleosomes harboring distinct histone variants
Genomic regions that are dynamic (i.e. experience relatively rapid exchange of chromatin proteins) are frequently marked with specific histone modifications and/or histone variants [51]. However, using traditional methods such as DNase-seq or FAIRE-seq, it is difficult to identify differences in chromatin accessibility that correlate with the presence of dynamic nucleosomes, because these regions are not nucleosome-free. In principle, RED-seq does not share these limitations, due to the fact that a single RE cleavage is all that is necessary for inclusion in a RED-seq library ( Figure 1A). Therefore, we examined the accessibility of regions enriched for dynamic histone variants/modifications using RED-seq.
To establish a baseline for the examination of different types of nucleosomes, we first determined the average accessibility of a random distribution of nucleosomes across the genome. To this end, we randomly selected 1% of all nucleosomal footprints from an MNase-seq library prepared from ESCs, and plotted the average RED-seq and MNase-seq profiles within a 2 kb window surrounding their positions. Consistent with the fact that nucleo some-bound DNA is relatively inaccessible to nuclear factors, we observed a low level of RE accessibility surrounding the peak of bulk nucleosomes, relative to RS density ( Figure 7A). Therefore, as expected, nucleosomefree DNA, like that underlying DHSs and TF binding sites, is generally more accessible than nucleosomal DNA.
Next, we tested whether the accessibility of nucleosome variants that harbor particular histone modifications or histone variants were identical to that of bulk nucleosomes. The two nucleosomes surrounding TSSs (referred to as +1 and -1 nucleosomes) are frequently marked by histone variants H2A.Z and H3.3 [6,[12][13][14]. Nucleosomes harboring these variants have been found to be extractible from chromatin at lower salt than is required for canonical nucleosomes [6,12], raising the possibility that they may be more highly accessible in general. H2A.Z is enriched surrounding the TSSs of many eukaryotic genes, and also found within active enhancers in mammalian cells [52]. Furthermore, H2A.Z-marked nucleosomes protect smaller footprints of DNA than canonical nucleosomes, in support of the hypothesis that that these nucleosomes are more intrinsically accessible [14]. In ESCs, H2A.Z is found near approximately 84% of all TSSs, including those of many silent genes [53]. Interestingly, we observed increased RE accessibility over the center of the H2A.Z peaks relative to both RS density and surrounding regions ± 1 kb from the peaks of H2A.Z enrichment ( Figure 7B), suggesting that H2A.Z-containing nucleosomes are generally more accessible than canonical nucleosomes. Next, we examined H3.3, which is enriched near the TSSs of both active and silent genes, as well as within gene bodies of highly expressed genes, and is incorporated into chromatin in a replication-independent manner [54][55][56]. Like H2A.Z, we found that RE accessibility over H3.3 peaks was elevated relative to RS density ( Figure 7C). These data suggest that DNA wrapped around H2A.Zand H3.3-marked nucleosomes is more accessible than DNA found within the majority of nucleosomes genomewide that lack these histone variants.
We considered the possibility that the elevated RE accessibilities observed over peaks of H2A.Z enrichment and broad regions surrounding H3.3 were due to reduced nucleosome occupancy at these sites. However, while the average occupancies of H2A.Z-and H3.3- containing nucleosomes were slightly lower than bulk nucleosomes (compare the peak heights in Figure 7D-F), these modest differences are insufficient to account for the greater than 5-fold increase in accessibility observed over H2A.Z and H3.3 peaks observed by RED-seq.
To validate these data, we examined chromatin accessibility upon KD of factors necessary for incorporation of H2A.Z or H3.3 into chromatin. In mammals, H2A.Z is incorporated into chromatin in part by p400 (gene name: Ep400), a homolog of the yeast Swr1 ATPase, whereas H3.3 incorporation depends in part on the HIRA (Hira) histone chaperone [57,58]. We tested whether the enhanced chromatin accessibility observed at sites of H2A.Z and H3.3 deposition was reduced upon depletion of their respective loading factor, and found that the elevated accessibility we observed within regions of H2A.Z and H3.3 enrichment was partially lost upon Ep400 KD or Hira KD, respectively ( Figure 8A-F). When we examined alterations in chromatin accessibility upon Ep400 or Hira KD over a random sampling of nucleosomes (as in Figure 7A), we observed only a modest decrease in accessibility, suggesting that the effects of Ep400 or Hira KD are specific for nucleosomes containing H2A.Z or H3.3 ( Figure 8G). Finally, we examined changes in chromatin accessibility due to Ep400 or Hira KD over CTCF binding sites, due to the reported enrichment of H2A.Z-and H3.3-containing nucleosomes surrounding CTCF [13]. Interestingly, while Hira KD resulted in significantly reduced accessibility over CTCF binding sites, Ep400 KD did not ( Figure 8H), suggesting that either H3.3 plays a more important role than H2A.Z in regulation of chromatin structure near CTCF binding sites or that H2A.Z is incorporated into chromatin at these sites independently of p400. We observed consistent differences in accessibility over H2A.Z, H3.3, and CTCF binding sites in biological replicate KDs of Ep400, Hira, and Hira, respectively ( Figure 8C, F and I), validating these data. Together, these results suggest that H2A.Z-and H3.3-containing nucleosomes are either more dynamic or more intrinsically accessible than canonical nucleosomes, consistent with their association with gene regulatory sequences.

Discussion
Utilizing an adaptation of a decades-old, quantitative technique for probing chromatin accessibility, we probed the chromatin structure of ESCs and differentiated cells, observing differences in chromatin accessibility in distinct regions of the genome, as well as in different cellular states. We found that both the level of nucleosome occupancy and the presence of specific histone variants at individual loci affected the level of chromatin accessibility we observed at each site.
Over the past several years, DNase-seq and FAIRE-seq have been used to identify regions of open chromatin structure within cells. One limitation of these methods is that only nucleosome-depleted regions of DNA are typically identified. Interestingly, while RED-seq identified nucleosome-depleted regions as well, we also observed differences in chromatin accessibility within nucleosomes that harbor specific histone variants, detecting increased RE accessibility in genomic regions enriched for histones H2A.Z and H3.3. Therefore, unlike previous methods, RED-seq not only measures general chromatin "openness" but also identifies highly dynamic regions of the genome, even if they are not nucleosome-free. We believe that this featurethe ability to quantify accessibility of DNA within nucleosome-bound regionsbest distinguishes RED-seq from complementary approaches such as MNase-seq and DNase-seq, which do not probe intranucleosomal accessibility.
The increased accessibility of DNA within H2A.Z-and H3.3-containing nucleosomes is due to the histone variants themselves rather than some unrelated feature of chromatin structure within these regions of the genome, since depletion of H2A.Z and H3.3 loading factors strongly reduced the accessibility of the underlying DNA. Although H2A.Z and H3.3 are also enriched near TSSs, these histone variants are also found within multiple other genomic domains. Indeed, we find that accessibility over CTCF binding sites was reduced upon KD of the H3.3 deposition factor, Hira, suggesting that H3.3 incorporation within nucleosomes surrounding CTCF binding sites may be important for CTCF binding and/or function.
Chromatin structure is dramatically altered during cellular differentiation. By examining regions of the genome enriched for histone modifications, TFs, or chromatin regulators, RED-seq could identify differences in chromatin structure within functionally distinct regions of the genome during ESC differentiation. We found that RE accessibility decreased at many CTCF binding sites upon Oct4 KD and that this decrease correlated with a decrease in CTCF occupancy and an increase in nucleosome occupancy. These differences were even more apparent when comparing ESCs with MEFs. Together, these results suggest that loss of TF binding during differentiation is coincident with deposition of nucleosomes at these sites, leading to loss of chromatin accessibility.
Besides chromatin structure, restriction enzymes have been widely used in biological assays for single nucleotide polymorphisms (SNPs) [59,60] and DNA methylation [61] at individual loci, by virtue of their inhibitory effect on RE cleavage. Therefore, a genome-wide method to directly quantify differences in RE cleavage would be highly desirable in these assays. Our method of directly purifying RE-digested sequences and quantifying RE cleavage at each site by high-throughput DNA sequencing could be (G) Effects of Ep400 or Hira KD on average nucleosome accessibility shown by plotting RED-seq data over the same 320,135 randomly selected nucleosomes as in Figure 7A. (H) Effects of Ep400 or Hira KD on chromatin accessibility over CTCF binding sites, as in Figure 3C. (I) Average accessibilities over CTCF-binding sites were quantified for biological replicate experiments from -200 to +200 bp with respect to the peak of CTCF-binding. P-values indicating statistical significance of accessibility between EGFP and Hira KDs are indicated. easily adapted to perform these types of studies. Thus, we believe that RED-seq will be a valuable tool for not only the measurement of chromatin accessibility and dynamics, but also the study of any other phenomena that alter RS cleavage by REs.

Conclusions
We developed RED-seq, an unbiased probe of chromatin accessibility, and utilized this technique to probe chromatin structure genome-wide in mouse ESCs and differentiated cells. Unlike more widely used methods that positively identify broad domains of open chromatin structure, RED-seq not only identifies open chromatin domains, but also uncovers differences in DNA accessibility within the vast majority of the genome that is not found within a large nucleosome-free region. By examining the accessibility of DNA wrapped within distinct nucleosome variants, we found that H2A.Z-and H3.3-containing nucleosomes were more accessible than the genomic average, providing in vivo evidence that these nucleosomes may be more dynamic than canonical nucleosomes. Therefore, RED-seq provides unique insights into chromatin structure that are missed by more widely utilized approaches.

Cells
The murine ESC line used in this study was E14 [62]. Mouse embryonic fibroblasts (MEFs) used in this study were immortalized by serial passaging, following a 3 T3 protocol, to minimize day-to-day differences in these cells due to their passage number. Mice used in derivation of MEFs were housed in a specific pathogen-free facility at the University of Massachusetts Medical School, and all experiments were performed in strict accordance with the recommendations of the Institutional Animal Care and Use Committee at the University of Massachusetts Medical School (approval #2165-13).

Preparation of RED-seq libraries
One million cells were used to construct RED-seq libraries. Cells were washed, pelleted, and resuspended in swelling buffer (10 mM Tris pH8.0, 85 mM KCl, 0.5% NP-40, 10 mM MgCl 2 ) with 100 units of Sau96I (NEB) and incubated in a thermomixer (Eppendroff) at 37°C for 1 hour, shaking at 900 rpm. (For testing whether two REs might increase coverage, in one experiment 100 units of Sau96I and 50 units of DdeI were used in digestion). Digestion was terminated by adding 40 μl of 10% SDS and 20 μl of 0.5 M EDTA and the chromatin was treated with proteinase K (Ambion) overnight at 55°C. Digested DNA was purified using phenol/chloroform/isoamyl alcohol extractions and precipitated at -80°C for 1 hour. Digested DNA samples were end-repaired and A-tailed as described [63], and ligated with biotinylated and barcoded adaptors. DNA was purified using Zymo Research DNA clean and concentrate columns following each enzyme reaction. The biotin-adaptor ligated DNA was sonicated in a Covaris sonicator (S220) to generate DNA peak fragments of 200 bp, on average. The sonicated DNA samples were then end-repaired, A-tailed, and ligated with nonbiotinylated adaptors. The ligation samples were loaded on 2% agarose gel and DNA was purified within a size range of roughly 200-350 bp in length. Gel-purified DNA was diluted to 250 μl with streptavidin binding buffer (20 mM HEPES pH 7.6, 500 mM NaCl, 1 mM EDTA, 0.02% NP-40) and incubated with 30 μl of pre-washed streptavidin magnetic beads (NEB) at room temperature for 1 hour. After magnetic separation, the supernatants were removed, and the beads were washed additional three times with streptavidin binding buffer. DNA was eluted from streptavidin magnetic beads by adding 20 μl of 0.1X TE and heating at 60°C for 3 minutes. The elution was repeated three times. The adaptor-ligated material was then PCR amplified with Phusion polymerase (NEB) using 16 cycles of PCR and its concentration was determined using a NanoDrop (Thermo). The integrity of each library was confirmed by sequencing 10-20 individual fragments per library. Libraries with different barcodes were pooled together and single-end sequencing (50 bp) was performed on an Illumina HiSeq2000 at the UMass Medical School deep sequencing core facility.
For most RED-seq libraries (GFP, Oct4, Ep400 and Hira KD), we added one further modification in which the sequence of the biotinylated adapters and the second, non-biotinylated, adapters were modified such that after PCR amplification of the libraries, only the end that was ligated to the biotinylated adapter would be sequenced in a single-end sequencing run (Additional file 3). Although this alteration makes the data analysis slightly simpler, the two methods provide essentially identical results.

Preparation of MNase-seq libraries
MNase-library preparation was adapted from Henikoff et al. [27]. Formaldehyde cross-linked cells were pelleted and washed twice with PBS. Cell pellets were resuspended in MNase lysis buffer (10 mM Tris pH 7.5, 10 mM NaCl, 3 mM MgCl 2 , 0.5% NP-40, 1 mM CaCl 2 , and protease inhibitors) and treated with 10 units/10 6 cells of microccocal nuclease (Roche) for 5 minutes at 37°C. The reaction was stopped with 10 mM EDTA. Nuclei were then incubated with RNaseA (Ambion) for 4 hours at 4°C with rotation followed by incubation with proteinase K (Ambion) overnight at 55°C. DNA was then isolated by Phenol: Chloroform:Isoamyl Alcohol (PCI) and EtOH precipitation. Equal MNase digestion was confirmed by examining DNA size fragments through electrophoresis on a 2% agarose gel and through bioanalyzer analysis. After phosphatase (NEB) treatment, digested DNA was endrepaired and A-tailed, with PCI extraction and EtOH precipitation following each enzyme reaction. Adaptors were ligated and DNA was size selected using Agencourt Ampure XP beads (Beckman Coulter), as previously described [27]. Equal library sizes were confirmed through electrophoresis on a 2% agarose gel and through bioanalyzer analysis. Sequencing of 10 fragments per library confirmed the integrity and libraries were sent for paired-end (100 bp) high throughput sequencing using an Illumina HiSeq at the UMass Medical School sequencing facility. Reads were mapped to the mouse genome (mm9) using Bowtie2 and uniquely mapped reads were used for further analysis.

Data analysis Assignment of reads to individual RSs
Sequence reads were binned according to the 4 bp barcode present at the beginning of each sequence using a custom Perl script. Sequences with barcodes removed were mapped to the mouse genome (mm9) using Bowtie-0.12.7 [64] with parameters set as -n 2 -l 28 -M 1 -best -strata (i.e. uniquely mapped with at most 2 mismatches at the left 28 bp seed region). Assignment of aligned sequences to individual restriction enzyme cut sites (REs) and differential cut analysis were performed using the Bioconductor package REDseq, developed by us. The ChIPpeakAnno package [65] was used to annotate the differentially cut sites to the nearest genes. Surprisingly, we found that the GGTCC sequence was cleaved more efficiently by Sau96I than GGACC, GGGCC, or GGCCC in digestions of chromatin or naked DNA control samples. This altered specificity may be due to the different buffer conditions used for digestion of chromatin (which are optimized for permeabilization of cells) relative to the optimal buffer conditions for Sau96I digestion recommended by the manufacturer. However, this phenomenon was observed in all samples, independent of cell type or KD, and therefore does not affect any comparisons of accessibility.

Aggregation of RED-seq data at specific genomic regions
Data for DNase I hypersensitive sites was downloaded from mouse ENCODE Project (UCSC). ChIP-seq data for H2A.Z (GSE34483), H3.3 (GSE16893), H3K4me3 (GSE12241) were downloaded from GEO datasets (NCBI) and analyzed in HOMER software suite [66]. The MNaseseq data in ESCs was obtained from Carone et al. [43]. The enrichment regions were identified by using the "find-Peaks" command in HOMER with default setting (1. fold enrichment over local tag count, default: 4.0. 2. Poisson p-value threshold relative to local tag count, default: 0.0001 3. False discovery rate, default = 0.001). For the binding sites of different TFs (CTCF and Klf4) in ESCs, the enriched regions were obtained from GEO datasets (GSE11431) and converted to mm9 by LiftOver (UCSC Genome Bioinformatics Group).

Calculation of restriction enzyme accessibility
RED-seq data was processed in HOMER by using "annota-tePeaks" command to bin the regions of interest in 50 bp windows and sum the reads within each window. Average RE accessibility was calculated by normalizing the reads in each window to total reads, dividing by the number of regions of interest, and presented in reads per million. To calculate the genome-wide distribution of restriction enzyme sites, we manually assigned one read to each site and calculated average RE accessibility as mentioned above.

Measurement of restriction enzyme accessibility at individual loci
DNA from RE-digested chromatin was prepared as above, up to the first DNA purification step (prior to library preparation). DNA was resuspended in 50 μl of 0.1X TE and 10 ng of DNA subjected was to quantitative PCR (qPCR) using SYBR FAST universal reagents (KAPA Biosystems) with specific primers (Additional files 4 and 5) flanking RSs of interest.

Chromatin immunoprecipitation
ChIP samples were prepared as described [69]. Briefly, chromatin from GFP or Oct4 KD ESCs was crosslinked, lysed and sonicated to generate 300-1000 base-pair fragments. 50 μl of Protein A Magnetic beads (NEB) were washed twice with PBS containing 5 mg/ml BSA and 10 μl of anti-CTCF antibody (Millipore) was coupled in 500 μl PBS with 5 mg/ml BSA overnight at 4°C. Immunoprecipitation was performed with antibody-coupled beads and sonicated supernatants in ChIP buffer (20 mM Tris-HCl pH8.0, 150 mM NaCl, 2 mM EDTA, 1% Triton X-100) overnight at 4°C. Magnetic beads were washed twice with ChIP buffer, once with ChIP buffer including 500 mM NaCl, 4 times with RIPA buffer (10 mM Tris-HCl pH 8.0, 0.25 M LiCl, 1 mM EDTA, 0.5% NP-40, 0.5% Na⋅Deoxycholate), and once with TE buffer (pH 8.0). Chromatin was eluted twice from washed beads by adding elution buffer (20 mM Tris-HCl pH 8.0, 100 mM NaCl, 20 mM EDTA, 1% SDS) and incubating for 15 minutes at 65°C. Crosslinking was reversed at 65°C for 6 hr and RNase A/T1 (Ambion) was added for 1 hr at 37°C followed by proteinase K (Ambion) treatment overnight at 50°C. ChIP-enriched DNA was purified using Phenol/Chloroform/Isoamyl alcohol extractions in phase-lock tubes. Then, chromatin was analyzed by qPCR as described above, using primers specific for CTCF sites of interest (Additional file 5).

Data access
The genome-wide data sets generated in this study can be obtained from GEO [GEO:GSE51821].