Effects of Alu elements on global nucleosome positioning in the human genome
© Tanaka et al. 2010
Received: 25 February 2009
Accepted: 17 May 2010
Published: 17 May 2010
Understanding the genome sequence-specific positioning of nucleosomes is essential for elucidating various cellular processes, such as transcriptional regulation and replication. As a typical example, the 10-bp periodicity of AA/TT and GC dinucleotides has been reported in several species, but it is still unclear whether this feature can be observed in the whole genomes of all eukaryotes.
With Fourier analysis, we found that this is not the case: 84-bp and 167-bp periodicities are prevalent in primates. The 167-bp periodicity is intriguing because it is almost equal to the sum of the lengths of a nucleosomal unit and its linker region. After masking Alu elements, these periodicities were greatly diminished. Next, using two independent large-scale sets of nucleosome mapping data, we analyzed the distribution of nucleosomes in the vicinity of Alu elements and showed that (1) there are one or two fixed slot(s) for nucleosome positioning within the Alu element and (2) the positioning of neighboring nucleosomes seems to be in phase, more or less, with the presence of Alu elements. Furthermore, (3) these effects of Alu elements on nucleosome positioning are consistent with inactivation of promoter activity in Alu elements.
Our discoveries suggest that the principle governing nucleosome positioning differs greatly across species and that the Alu family is an important factor in primate genomes.
The genomic DNA of eukaryotes forms chromatin structures with several proteins. Chromatin is composed of nucleosome cores in which 146-147 base pairs (bp) of DNA are wrapped in 1.67 turns around a histone octamer containing two copies each of four core histones: H2A, H2B, H3, and H4. Another histone (linker histone) binds to about 20 bp of DNA in the linker region flanking the nucleosome core [2, 3]. Nucleosomes are involved in various cellular processes, including transcription, because chromatin can limit the accessibility of regulatory sites. For example, it has been reported in several organisms that the nucleosome occupancy rate upstream from transcription start sites (TSSs) is lower than that in other regions [4–12]. Therefore, understanding the mechanism of nucleosome positioning is important for the analysis of transcriptional regulation and promoter functions.
It is known that nucleosome positioning can be affected by DNA sequence. Many previous studies have identified various motifs for nucleosome positioning or inhibition with in vivo and in vitro experiments [13–18]. It is also known that 10-bp periodic AA/TT or GC dinucleotides are strongly associated with nucleosome positioning in the genomes of several species and in synthetic DNAs [19–22]. Short oligonucleotides occurring at intervals of about 10 bp are associated with the positions of the major grooves or minor grooves facing the histone surface and with the bendability of DNA during nucleosome formation. Using these dependencies, some researchers have recently had partial success in the computational prediction of nucleosome positions in the genome sequences of several yeasts [24–27]. In particular, Segal et al. explained about 50% of in vivo nucleosome positions using a position weight matrix of center-aligned mononucleosome DNA in budding yeast and chicken. The 10-bp periodicity has been observed by Fourier analysis in the genomes of nematodes, plants, insects and fungi.
In recent years, high-throughput sequencing techniques and tiling array experiments have provided an avalanche of nucleosomal DNA location information in the human [8–10], fly [11, 30], nematode [7, 20], and budding yeast genomes [4–6, 12]. Schones et al. demonstrated nucleosomal reorganization during the activation of human T cells using a large number of nucleosomal DNAs, which were massively sequenced with a new-generation sequencer. Lee et al. and Shivaswamy et al. showed that about 70%-80% of the whole genome of budding yeast is occupied by nucleosomes. These large-scale experiments make it possible to analyze the sequence dependencies of global nucleosomal positioning across a wide range of organisms.
In this study, we first asked whether the reported periodic motifs can widely affect in vivo nucleosome locations through the whole genomes of all eukaryotes. Fourier analysis showed that the spectra of primate genomes do not exhibit clear peaks with a 10-bp periodicity; instead, strong and wide 84-bp and 167-bp periodicities are observed. These periodicities, which may be associated with the length of DNA wrapping core histones and the linker histone, mainly originate from Alu repetitive elements, as their strength decreased markedly in Alu-masked genomes.
The Alu family consists of primate-specific short interspersed elements (SINEs), and constitutes the most prevalent repetitive element in the human genome. Alu elements are categorized into two groups: monomers and dimers. A typical dimeric Alu is about 300 bp long, and is composed of two distinct GC- and CpG-rich monomers flanking an A-rich region and a poly(A) tract. Monomeric Alu elements consist of two classes: free left Alu monomers (FLAM) and free right Alu monomers (FRAM), corresponding to each monomer in a dimer. The left monomer is slightly shorter than the right one. Although a few Alu elements in promoters are reported to affect the expression of their downstream genes [33, 34], most of them are silent in cells. Thus, specific positioning of nucleosomes on Alu elements may be important in masking unnecessary effects of Alu elements on nearby genes. DNase I nicking analyses have demonstrated that several dimeric Alu elements have some affinity to core histones [35, 36]. According to these studies, two nucleosomes are formed, one on each side of the central A-tract of Alu elements.
Using large-scale nucleosome mapping data, we observed that such specific positioning occurs globally in the entire genome. We further showed that the nucleosomes neighboring Alu elements are also arranged at nucleosomal repeat lengths (about 170-200 bp). Our discoveries should be useful in the prediction of nucleosomal positions in primates and the analysis of transcriptional regulation.
We analyzed the genome-wide nucleotide periodicity in 16 different species (See additional file 1) under the assumption that if the 10-bp periodic nucleotides are the main factor determining nucleosome positioning, peaks of this periodicity should also be observed in the whole genome. The strength of the periodicity was calculated for 12 kinds of mono- and di-nucleotide steps using Fourier analysis [21, 37].
Because it is known that some repetitive elements are detectable by nucleotide periodicities, we next analyzed the genome-wide periodicity while masking all Alu elements, which are primate-specific SINEs detected by RepeatMasker. The strengths of the 84-bp and 167-bp periodicities in the Alu-masked primate genomes were remarkably lower than those in the nonmasked genomes (green lines in Figure 1 and additional file 3). Although weak and sharp peaks of 84-bp or 167-bp periodicity still remained in some nucleotide steps, they were possibly derived from Alu elements that had not been identified by RepeatMasker or from some other genome features. Thus, it is clear that these 84-bp and 167-bp periodicities mainly originate from Alu repetitive elements.
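The masking step can be illustrated with a minimal sketch. The text does not say how masked bases were represented, so replacing them with "N" is an assumption, as is the function name `mask_intervals`. Intervals are treated as 0-based and half-open, which is the convention of the UCSC "rmsk" coordinates.

```python
def mask_intervals(sequence, intervals, mask_char="N"):
    """Return a copy of `sequence` with every interval replaced by `mask_char`.

    `intervals` holds (start, end) pairs, 0-based and half-open.
    Using "N" as the mask character is an assumption of this sketch.
    """
    chars = list(sequence)
    for start, end in intervals:
        # Clip each interval to the sequence bounds before masking.
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = mask_char
    return "".join(chars)


# Hypothetical toy input: two annotated repeats on a 10-bp sequence.
masked = mask_intervals("ACGTACGTAC", [(2, 5), (8, 12)])
```

Fragments sampled from a sequence masked this way could then be filtered for "N" characters in the same way as assembly gaps.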
[Table: Mapping results of nucleosome DNA tags]
We also detected phased nucleosome positioning at approximately the nucleosome repeat length (about 170-200 bp) around the Alu elements (yellow ovals in Figure 2). This phasing extended in both directions and became weaker with increasing distance from Alu elements. These observations suggest that Alu elements contain nucleosomes in specific positions, and also influence the positioning of neighboring nucleosomes.
Considering these data together, it is clear that the Alu family significantly affects nucleosome positioning in the human genome.
Alu elements are distributed only in primates, and comprise about 10% of the whole human genome. In this study, using large-scale nucleosome mapping data, we showed that nucleosome positioning is significantly influenced within and around Alu elements. Englander et al. previously reported that dimeric Alu elements can accommodate two nucleosomes [35, 36]. These positions are consistent with the two peaks observed in Figure 2B. Since the DNase I nicking pattern in their experiment is less clear in the right arm than in the left arm of dimeric Alu elements, the low occupancy in the right arm observed in our analysis may be caused by the lower stability of nucleosome positioning. From a massive amount of nucleosomal DNA sequence data, we demonstrated that neighboring nucleosomes tend to be arranged at regular intervals. Nucleosome signals from the tiling array data showed some preference for nucleosome positioning just around Alu elements. The A-rich sequences observed in low nucleosome density regions are known as a motif for inhibiting nucleosome formation. From our analysis based on two independent data sets, we conclude that the lower nucleosome signals at the boundary of Alu elements affect the arrangement of nucleosome positioning around them. Although nucleosome depletions are often observed in the promoters of active genes, few TSS tags could be mapped in Alu elements and their flanking regions. It is possible that restriction of Alu elements is regulated by other epigenetic factors. This hypothesis is supported by two previous reports that 76.2% of CpG sites in Alu elements are completely methylated, and that methylation of H3 lysine 9 is enriched in Alu elements.
The genome-wide periodicity of 16 organisms showed differences in their sequence dependence. Peaks with 10-bp periodicity were observed in organisms from budding yeast to chicken. Our results are in agreement with those of Segal et al., who showed that the nucleosome locations in budding yeast can be predicted with the model constructed from chicken nucleosomal DNA. In some organisms, periodicities of about 9 bp and 11 bp have been found. These periodicities might be related to the minimum and maximum periods of the double helix of DNA in nucleosomes because its observed value varies from 9.4 bp to 10.9 bp. Furthermore, Gupta et al. previously showed that 3-bp periodicities of CG and GC dinucleotides are overrepresented in nucleosome-forming sequences in promoters and ENCODE regions. In our results, peaks of CG 3-bp periodicities are found in many species, but not in primates, suggesting that these sequence dependencies may not have an effect across whole primate genomes.
It is surprising that the sequence features governing global nucleosomal positioning are quite different among organisms, even though the core histone protein families are highly conserved among species. Although it has been reported that a set of DNA molecules with the highest affinity for the histone octamer in an in vitro selection assay showed a 10-bp periodicity of mono- or dinucleotide steps, our results suggest that primate genomes may not use these features as the major mechanism of nucleosomal positioning, which might allow flexibility in chromatin remodeling and complex transcriptional regulation. We showed that a significant part of nucleosome positioning in the primate genomes can be explained by 84-bp and 167-bp periodicities, which were previously reported in human chromosomes 21 and 22 by a similar method. Here, we confirmed that these periodicities are specific to primates because they are due to primate-specific Alu elements.
Overall, our study provides an important clue to understanding the whole chromatin composition of the human genome. We hope that our discoveries will extend our understanding of the nucleosomal organization in primate genomes, and contribute essential knowledge about the complexities of transcriptional regulation.
Genome sequences were downloaded from UCSC, The Institute for Genomic Research (TIGR), RIKEN, and the Saccharomyces Genome Database (SGD) (See additional file 1). All positions of repetitive elements and RefSeq genes were obtained from "rmsk" files and the "refGene.txt" file in the UCSC database, respectively.
The tiling array data for human promoters (GSE6385) were downloaded from Gene Expression Omnibus http://www.ncbi.nlm.nih.gov/geo/. About 40 million raw sequence reads of precise TSSs from HEK293 and MCF7 cells were obtained from DBTSS http://dbtss.hgc.jp/.
For each species, 10,000 fragments of length 8,193 (2^13 + 1) bp were randomly extracted from the whole-genome sequence without gap regions ("N" characters). As a control, we generated 10,000 random DNA sequences that had the same base composition as the genome sequence of each organism. To apply discrete Fourier transformations to DNA sequences, we transformed the nucleotide steps to binary codes. For two mononucleotide steps and 10 dinucleotide steps, the target step γ was converted to 1, and the other steps were converted to 0. For example, if γ is AA/TT, a DNA sequence "CTTGAAT" is changed to the binary sequence "010010".
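The binary encoding described above can be sketched in a few lines. The function name `encode_steps` and the "AA/TT" string notation for a step and its reverse complement are illustrative choices, not taken from the original implementation.

```python
def encode_steps(sequence, target):
    """Encode overlapping nucleotide steps as a binary string.

    `target` names the step of interest, written either as a single
    step ("GC") or as a step and its reverse complement ("AA/TT").
    Each position becomes "1" if the step starting there matches the
    target and "0" otherwise.
    """
    targets = set(target.upper().split("/"))
    step_len = len(next(iter(targets)))  # 1 for mono-, 2 for dinucleotide steps
    seq = sequence.upper()
    return "".join(
        "1" if seq[i:i + step_len] in targets else "0"
        for i in range(len(seq) - step_len + 1)
    )


# The example from the text: gamma = AA/TT on "CTTGAAT".
encoded = encode_steps("CTTGAAT", "AA/TT")
```

A sequence of length L yields L - 1 dinucleotide steps, so an 8,193-bp fragment gives 8,192 (2^13) values, a power-of-two length that suits the fast Fourier transform.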
The background noise is defined as the average power spectrum. The S/N ratio R_γ(n) is interpreted as the relative strength of the periodicity p (= N/n) [38, 48, 49]. The S/N ratios were calculated for all DNA fragments, and were averaged for each organism. The Fourier transform was calculated with the FFTW3 library.
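As a rough illustration of the S/N ratio, the sketch below assumes the standard discrete Fourier transform, takes the power at frequency index n as the squared magnitude of the coefficient, and divides it by the mean power over n = 1 to N/2. The exact formula and the range used for the average are not given in this text, so both are assumptions, and a naive transform stands in for FFTW3 to keep the example dependency-free.

```python
import cmath


def power_spectrum(signal):
    """Return the power |F(n)|^2 for n = 1 .. N//2 (n = 0 is skipped)."""
    size = len(signal)
    powers = []
    for n in range(1, size // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * n / size)
            for k, x in enumerate(signal)
        )
        powers.append(abs(coeff) ** 2)
    return powers


def signal_to_noise(signal):
    """Map each frequency index n to power(n) / mean power (assumed definition)."""
    powers = power_spectrum(signal)
    noise = sum(powers) / len(powers)
    return {n: p / noise for n, p in enumerate(powers, start=1)}


# Toy binary signal of length N = 100 with a 1 at every 10th position.
toy = [1 if k % 10 == 0 else 0 for k in range(100)]
ratios = signal_to_noise(toy)
period_of_peak = len(toy) / 10  # n = 10 corresponds to p = N/n = 10
```

In this toy signal the power is concentrated at n = 10 and its harmonics, so the ratio at n = 10 stands well above 1 while non-harmonic indices stay near 0.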
In this study, the Alu family was categorized into three classes: dimer Alu, FLAM, and FRAM, based on the "repName" field (See additional file 7). We also restricted the length of each Alu class to 280-320 bp, 110-150 bp, and 150-190 bp, respectively. Sequences that were longer or shorter than these lengths were discarded.
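The classification and length filter can be sketched as follows. The actual mapping from "repName" values to classes is given in additional file 7, which is not reproduced here, so the prefix rules below (names beginning with "FLAM", "FRAM", or "Alu") are an assumption based on common RepeatMasker names. Treating the length bounds as inclusive is also an assumption.

```python
# Length ranges in bp for each class, as stated in the text.
LENGTH_RANGES = {"dimer": (280, 320), "FLAM": (110, 150), "FRAM": (150, 190)}


def classify_alu(rep_name, start, end):
    """Return "dimer", "FLAM", "FRAM", or None if the element is discarded.

    `start` and `end` are 0-based, half-open coordinates, so the
    element length is end - start. The prefix rules are assumed.
    """
    if rep_name.startswith("FLAM"):
        alu_class = "FLAM"
    elif rep_name.startswith("FRAM"):
        alu_class = "FRAM"
    elif rep_name.startswith("Alu"):
        alu_class = "dimer"
    else:
        return None  # not an Alu family member
    low, high = LENGTH_RANGES[alu_class]
    return alu_class if low <= end - start <= high else None


# Hypothetical annotation rows: (repName, start, end).
rows = [("AluY", 1000, 1300), ("FLAM_C", 0, 130), ("FRAM", 0, 100), ("L1PA2", 0, 300)]
classes = [classify_alu(*row) for row in rows]
```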
The paired-end nucleosomal DNA tags were obtained from HEK293 cells by micrococcal nuclease (MNase) digestion followed by sequencing on Illumina GA-II sequencers (Irie et al., submitted). All 36-bp nucleosome reads were mapped to the human genome (chr1-22, chrX and chrY) using Bowtie (version 0.12.2) with the option: "-f -I 122 -X 172 -n 0 -a --best --strata --chunkmbs 2048". Because the 5' ends of nucleosomal DNA were sequenced on both strands, the midpoint of each read pair was presumed to be the position of the nucleosome core center. Since MNase digestion does not always produce complete 147-bp DNA fragments, we further calculated the nucleosome signal S(i), which was introduced previously as "coarse-grain smoothing". As background, 30 million 36-bp paired-end genome fragments, whose interval is 147 bp, were randomly extracted and were also mapped to the human genome by Bowtie. In the same way, we deduced the background signal S_back(i) from the uniquely mapped random fragments. The signals, S(i) and S_back(i), were normalized to parts per billion (p.p.b.).
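The midpoint rule can be sketched as below. The smoothing that produces S(i) is not specified in this text and is not reproduced. The p.p.b. scaling shown here (count at a position divided by the total number of fragments, times 10^9) is one plausible reading and should be treated as an assumption, as are the function names.

```python
from collections import Counter


def fragment_center(start, end):
    """Midpoint of a fragment covering [start, end), 0-based half-open."""
    return (start + end - 1) // 2


def center_counts_ppb(fragments):
    """Count fragment centers per position, scaled to parts per billion.

    The scaling (count / total fragments * 1e9) is an assumed definition.
    """
    counts = Counter(fragment_center(s, e) for s, e in fragments)
    total = sum(counts.values())
    return {pos: count * 1e9 / total for pos, count in counts.items()}


# Hypothetical mapped pairs given as fragment (start, end) coordinates.
fragments = [(100, 247), (100, 247), (300, 447)]
signal = center_counts_ppb(fragments)
```

A 147-bp fragment starting at position 100 covers positions 100 to 246, so its center falls at position 173.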
In the human promoter tiling array, we used probes whose center positions were within 600 bp of Alu elements. We also selected other repetitive elements (all repetitive elements except Alu in the UCSC annotation) that were the same size as the Alu elements, and probes within 600 bp of them were extracted similarly. Each probe was sorted by the distance from the 5' end or the 3' end of these repetitive elements. The hybridization values of the probes at each distance were averaged in each cell type or in all cells. As the background of the signals, we shuffled the hybridization values of all probes in the tiling array. This shuffling was repeated 10 times. Averages and standard deviations were calculated from the 10 randomized data sets and were represented by points and error bars, respectively.
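The averaging and shuffled background can be sketched as below. For brevity the sketch shuffles only the values it is given, whereas the text shuffles the values of all probes on the array. The function names, the fixed random seed, and the toy numbers are illustrative assumptions.

```python
import random
from statistics import mean, stdev


def average_by_distance(distances, values):
    """Average hybridization values of probes sharing the same distance."""
    grouped = {}
    for distance, value in zip(distances, values):
        grouped.setdefault(distance, []).append(value)
    return {d: mean(vs) for d, vs in sorted(grouped.items())}


def shuffled_background(distances, values, repeats=10, seed=0):
    """Mean and standard deviation per distance over shuffled data sets."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    runs = []
    for _ in range(repeats):
        shuffled = list(values)
        rng.shuffle(shuffled)
        runs.append(average_by_distance(distances, shuffled))
    return {
        d: (mean([run[d] for run in runs]), stdev([run[d] for run in runs]))
        for d in runs[0]
    }


# Hypothetical probes: distance from the element end (bp) and signal value.
observed = average_by_distance([0, 0, 50], [1.0, 3.0, 5.0])
background = shuffled_background([0, 0, 50, 50], [1.0, 2.0, 3.0, 4.0])
```

Each entry of `background` holds the pair (mean, standard deviation) that would be drawn as a point with an error bar.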
To obtain the random sites, we repeated the calculation of this rate 10 times, and calculated the means and standard deviations of the results.
We are grateful to Kengo Kinoshita, Kohji Okamura, Alexis Vandenbon, and other members of the Nakai-Kinoshita Laboratory for a great deal of valuable advice and discussions. Computational time was provided by the Super Computer System, Human Genome Center, Institute of Medical Science, The University of Tokyo. This work was supported in part by Global COE Program (Center of Education and Research for Advanced Genome-Based Medicine), MEXT, Japan.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.