Isolation of genomic DNA and archaeal nucleosomes assembled in vivo
Cells from exponentially growing cultures of M. thermautotrophicus and of T. kodakarensis TS517 (ΔpyrF; ΔtrpE::pyrF; ΔTK0664), LC124 (ΔpyrF; ΔtrpE::pyrF; ΔTK0664; ΔTK1413) and LC125 (ΔpyrF; ΔtrpE::pyrF; ΔTK0664; ΔTK2289) were harvested by centrifugation and flash frozen, and genomic DNA preparations were isolated from aliquots of these cells as previously described [36, 40]. The remaining cells were ruptured by grinding in frozen micrococcal nuclease (MN) buffer [50 mM Tris (pH 8), 1 mM CaCl2, 100 mM NaCl], and the lysates were allowed to thaw at 4°C. Aliquots were incubated with MN (1 U/μl) at 37°C, and the nuclease digestion was terminated, after increasing periods of digestion, by addition of 250 mM EDTA, 1% SDS, 200 mM NaCl. Following incubation with RNase A (10 mg/ml) for 60 min at 42°C, the DNA molecules that remained were purified by phenol:chloroform extraction, concentrated by ethanol precipitation, and separated by electrophoresis through 3.5% NuSieve agarose gels (Fisher Molecular Biology, Trevose, PA) or 6% polyacrylamide gels. Gel fragments that contained DNA molecules ~60 bp in length were excised and crushed, and the DNA molecules were eluted by incubation overnight at 37°C in 300 mM sodium acetate, 1 mM EDTA (pH 8), 0.1% SDS. The DNA molecules were concentrated by ethanol precipitation and prepared for sequencing (see below).
Archaeal histone gene cloning, expression and purification of recombinant HTkA and HTkB
The genes TK1413 and TK2289, which encode HTkA and HTkB, respectively, in T. kodakarensis TS517, were PCR-amplified and cloned into plasmid pQE-80 (Qiagen, Valencia, CA), generating plasmids pTS600 (TK1413) and pTS601 (TK2289) that were transformed into Escherichia coli Rosetta 2 (EMD-Millipore, Billerica, MA). Cultures of the transformants were grown to the late exponential phase at 37°C in LB medium that contained 100 μg ampicillin/ml and 30 μg chloramphenicol/ml. Recombinant HTkA or HTkB synthesis was then induced by adding isopropyl β-D-1-thiogalactopyranoside (500 μM final concentration), and incubation was continued for 3 h at 37°C. The cells were harvested by centrifugation and resuspended (0.33 g wet cell pellet/ml) in 25 mM Tris–HCl (pH 7), 0.1 mM EDTA, 50 mM NaCl; lysozyme (100 μg/ml) was added and the mixtures were held on ice for 30 min. Phenylmethanesulfonyl fluoride (Sigma, St. Louis, MO) was added (100 μg/ml) and the cells were ruptured by repeated passage through a French press. The lysates were clarified by centrifugation at 4°C (60,000 × g, 20 min), MgCl2 (5 mM) and DNase I (40 μg/ml) were added, and the mixtures were incubated for 1 h at 37°C and then for 20 min at 85°C. Following further centrifugation (60,000 × g, 30 min, 4°C), the supernatants were loaded onto 5 ml Hi-Trap heparin columns (GE Healthcare; Pataskala, OH). Recombinant HTkA and HTkB were eluted by passage of 10 column volumes of linear 50 to 500 mM and 200 to 700 mM gradients of NaCl, respectively, dissolved in 25 mM Tris–HCl (pH 7). The eluate fractions that contained HTkA or HTkB were identified by Coomassie staining of the proteins in samples of the fractions separated by electrophoresis through 22% (w/v) denaturing polyacrylamide gels. These fractions were combined and the protein solution was concentrated (final volume of ~0.5 ml) by centrifugation through a pre-rinsed Vivaspin 6 centrifugal concentrator (5 K molecular weight cut off; Sartorius AG, Bohemia, NY).
The solutions were adjusted to contain 600 mM NaCl in 25 mM Tris–HCl (pH 7) and then passed through a Sephacryl S-100 HR 16/40 column (GE Healthcare) at a flow rate of 0.5 ml/min. Fractions that contained HTkA or HTkB, identified by Coomassie blue staining after electrophoresis of aliquots through 22% denaturing polyacrylamide gels, were pooled and concentrated (final volume of ~2 ml) by centrifugation, again through pre-rinsed Vivaspin 6 centrifugal concentrators (5 K molecular weight cut off). These protein solutions, containing >99% pure archaeal histone, were dialyzed against 25 mM Tris–HCl (pH 7), 500 mM NaCl, 50% (v/v) glycerol, and stored at −20°C.
Purification of eukaryotic histones
Chicken histone octamers were purified from erythrocytes by salt extraction and by hydroxyapatite column chromatography as previously described.
Archaeal and eukaryotic nucleosome assembly in vitro
Eukaryotic nucleosomes were assembled in vitro by salt dialysis in 200 μl reaction mixtures that contained 50 μg of genomic DNA and 30 μg of chicken histone octamers. Archaeal nucleosomes were reconstituted by mixing 50 μg of genomic DNA with 30 μg of archaeal histone tetramers. The complexes formed were dialyzed into MN digestion buffer, and aliquots containing ~2.5 μg of DNA were incubated with 0.1 U MN/μl for 5 min at 37°C. The MN digestions were stopped by addition of 125 mM EDTA, 200 mM NaCl, and the DNA molecules remaining were isolated by phenol:chloroform extraction, concentrated by ethanol precipitation and separated by electrophoresis through 6% polyacrylamide or 3.5% NuSieve agarose gels. Gel fragments that contained the ~60 bp or ~147 bp DNA molecules protected from MN digestion by incorporation into archaeal or eukaryotic nucleosomes, respectively, were excised, and the DNA molecules were extracted, purified and prepared for DNA sequencing as described above.
ABI SOLiD sequencing of DNA fragments
The ends of the ~60 bp and ~147 bp DNA fragments were repaired and 5′-phosphorylated by incubation in DNATerminator end-repair kits, as recommended by the manufacturer (Lucigen Corp., Middleton, WI). SOLiD adapters were ligated, and the DNA molecules were PCR amplified with a very low number of cycles and sequenced by using the Applied Biosystems protocol for SOLiD fragment paired-end sequencing. Sequencing generated 2 to 12 million unique reads which, depending on the experiment, equated to 60- to 800-fold coverage per 60 bp or 147 bp nucleosome footprint.
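As a rough plausibility check on the coverage figures, the following minimal sketch computes average fold coverage as (number of reads × footprint length) / genome length. The genome size used here (about 2.09 Mb, an approximate value for T. kodakarensis) is an assumption that is not stated in this section, and the function name is hypothetical; this is one way to read "fold coverage", not necessarily the calculation used for the reported values.

```python
def fold_coverage(n_reads, footprint_bp, genome_bp):
    """Average number of footprints covering each genome position."""
    return n_reads * footprint_bp / genome_bp

# Assumption: approximate genome size, not given in this section.
GENOME_BP = 2.09e6

# Fewest reads with the short footprint, most reads with the long footprint.
low = fold_coverage(2e6, 60, GENOME_BP)
high = fold_coverage(12e6, 147, GENOME_BP)
print(f"{low:.0f}-fold to {high:.0f}-fold")
```

Under these assumptions the two extremes fall close to the 60- to 800-fold range quoted above.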
Analysis of DNA reads generated by paired-end sequencing
We first selected reads of length 55–65 bp (corresponding to ~60 bp nucleosome footprints) to construct the center-weighted nucleosome occupancy scores. If a read length was odd, a Gaussian weight of $\exp(-0.5\,(d/10)^2)$ was assigned to a position d bp away from the center of the read for d ≤ 25. If a read length was even, then positions i − 1 and i were treated as the possible nucleosome centers. For example, for a 60 nucleotide sequence i = 31, and so the two potential centers were at positions 30 and 31. Each center in an even read was, in turn, assigned a weight of $0.5\,\exp(-0.5\,(d/10)^2)$ for a position d bp away from the center and the values for both positions were then divided by 2. The center-weighted occupancy score for any given position was defined as the sum of the weighted scores from all reads. We identified well-defined peaks on the read-occupancy curve as putative nucleosome centers by controlling the peak height and steepness simultaneously. To generate AA/AT/TA/TT frequency plots, after defining the nucleosome center positions based on the peaks of the center-weighted occupancy score, dinucleotide frequency scores were computed as described by Segal et al. We searched for a sequencing tag of length 60 bp nearest to the peak position within the ±5 bp region. If no such read existed, we further searched for reads of lengths 61, 59, 62 and 58 bp, sequentially, within the ±5 bp region of the peak until the first read was identified. The center of the identified read was treated as the nucleosome center to generate the AA/TT/TA/AT frequency plot. If no such read was identified in the ±5 bp region, the peak position was treated as the true nucleosome center to generate the alignment. For paired-end MNase sequencing data for 147 bp long nucleosomes, read lengths of 137–157 bp were used. We followed an approach similar to that described above, and also employed by Brogaard et al. [21, 65], to identify the nucleosome centers.
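The scoring and read-selection rules above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the analyses: reads are represented as hypothetical (start, length) tuples with 1-based start positions, the function names are invented for this example, and two points are assumptions. First, the 0.5 factor for even-length reads is applied once to each of the two candidate centers, so that an even read carries the same total weight as an odd read. Second, the center of an even-length read in the alignment step is taken as its fractional midpoint.

```python
import math
from collections import defaultdict

SIGMA = 10.0                         # kernel width from exp(-0.5 * (d/10)^2)
MAX_DIST = 25                        # weights are assigned only for |d| <= 25
LENGTH_ORDER = (60, 61, 59, 62, 58)  # search order for the alignment read

def read_centers(start, length):
    """Return (center, scale) pairs for a read spanning start..start+length-1."""
    if length % 2 == 1:
        return [(start + length // 2, 1.0)]
    i = start + length // 2          # start=1, length=60 gives i=31
    # Assumption: each of the two candidate centers carries half the weight.
    return [(i - 1, 0.5), (i, 0.5)]

def occupancy(reads, min_len=55, max_len=65):
    """Center-weighted occupancy score per position, summed over all reads."""
    scores = defaultdict(float)
    for start, length in reads:
        if not (min_len <= length <= max_len):
            continue
        for center, scale in read_centers(start, length):
            for d in range(-MAX_DIST, MAX_DIST + 1):
                scores[center + d] += scale * math.exp(-0.5 * (d / SIGMA) ** 2)
    return scores

def alignment_center(peak, reads, window=5):
    """Midpoint of the preferred read near a peak, or the peak itself."""
    for length in LENGTH_ORDER:
        mids = [s + (length - 1) / 2 for s, n in reads if n == length]
        near = [m for m in mids if abs(m - peak) <= window]
        if near:
            return min(near, key=lambda m: abs(m - peak))
    return float(peak)               # no suitable read within the window
```

Peak calling on the resulting scores is not shown, because the height and steepness thresholds are not specified in this section.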
Analyses of the DNA reads generated by single-end sequencing
For single-end reads, the start position of a read on the Watson strand is known but its end position is not. However, because the DNA inserts map nucleosomes, their lengths must be constrained to approximately one nucleosome repeat length. Thus, if we observe a single-end read on the Watson strand at position i, we can assume that its end position lies within a region, say [i + a, i + b], and follows some distribution. For practical purposes, we let a = 51 and b = 68. We further assumed that the start and end positions of the DNA inserts are independently distributed around the two edges of the nucleosome they map. Let $c_{i+51}, \ldots, c_{i+68}$ be the Crick strand tag numbers in this region. Then the relative frequency defined as $c_{i+k} / \sum_{j=51}^{68} c_{i+j}$ can be used to estimate the probability of a DNA insert ending at position i + k for k = 51, …, 68. Thus, if we observe $w_i$ single-end tags at position i from the Watson strand, then we could regard that we had observed $w_i$ paired-end tags ending at i + k for k = 51, …, 68 with respective frequency $c_{i+k} / \sum_{j=51}^{68} c_{i+j}$. Likewise, if we observe $c_i$ single-end tags at position i from the Crick strand, we would regard that there were $c_i$ paired-end tags ending at i − k for k = 51, …, 68 with respective frequency $w_{i-k} / \sum_{j=51}^{68} w_{i-j}$. By this calculation the observed data with $\sum_i (w_i + c_i)$ single-end tags are converted approximately to a pseudo data set consisting of $\sum_i (w_i + c_i)$ paired-end tags. The approach defined above for paired-end data was then used to define the center-weighted read occupancy score and the nucleosome centers.
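The conversion from single-end tags to pseudo paired-end tags can be illustrated with the following minimal Python sketch. It assumes that tag counts are held in two hypothetical dictionaries keyed by position (one per strand), and that tags with no opposite-strand tags anywhere in the 51 to 68 bp window are skipped; neither detail is specified above, and the function name is invented for this example.

```python
from collections import defaultdict

A, B = 51, 68  # the insert end is assumed to lie in [i + 51, i + 68]

def pseudo_paired(watson, crick):
    """Map (start, end) -> fractional pseudo paired-end tag count.

    watson, crick: dicts of position -> single-end tag count per strand.
    """
    pairs = defaultdict(float)
    # Watson tags at i are spread over ends i + k in proportion to Crick counts.
    for i, w in watson.items():
        total = sum(crick.get(i + j, 0) for j in range(A, B + 1))
        if total == 0:
            continue  # assumption: no Crick support in the window, tag skipped
        for k in range(A, B + 1):
            c = crick.get(i + k, 0)
            if c:
                pairs[(i, i + k)] += w * c / total
    # Crick tags at i are spread over starts i - k in proportion to Watson counts.
    for i, c in crick.items():
        total = sum(watson.get(i - j, 0) for j in range(A, B + 1))
        if total == 0:
            continue  # assumption: no Watson support in the window, tag skipped
        for k in range(A, B + 1):
            w = watson.get(i - k, 0)
            if w:
                pairs[(i - k, i)] += c * w / total
    return pairs
```

For example, with 4 Watson tags at position 100 and Crick tags at positions 155 (1 tag) and 160 (3 tags), the 8 single-end tags become 8 pseudo paired-end tags, split 2 to 6 between the intervals (100, 155) and (100, 160).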
The sequences obtained and detailed descriptions of the computational analyses are available. The M. thermautotrophicus and T. kodakarensis genome coordinates and RefSeq transcript annotations used were from the methTher1 and therKoda1 genome assemblies available on the Archaeal Genome Browser web site.