Genetic and epigenetic variations contributed by Alu retrotransposition

Background De novo retrotransposition of Alu elements has been recognized as a major driver for insertion polymorphisms in human populations. In this study, we exploited Alu-anchored bisulfite PCR libraries to identify evolutionarily recent Alu element insertions, and to investigate their genetic and epigenetic variation. Results A total of 327 putatively recent Alu insertions were identified, altogether represented by 1,762 sequence reads. Nearly all such de novo retrotransposition events (316/327) were novel. Forty-seven out of forty-nine randomly selected events, corresponding to nineteen genomic loci, were sequence-verified. Alu element insertions remained hemizygous in one or more individuals in sixteen of the nineteen genomic loci. The Alu elements were found to be enriched for young Alu families with characteristic sequence features, such as the presence of a longer poly(A) tail. In addition, we documented the occurrence of a duplication of the AT-rich target site in their immediate flanking sequences, a hallmark of retrotransposition. Furthermore, we found the sequence motif (TT/AAAA) that is recognized by the ORF2P protein encoded by LINE-1 in their 5'-flanking regions, consistent with the fact that Alu retrotransposition is facilitated by LINE-1 elements. While most of these Alu elements were heavily methylated, we identified an Alu localized 1.5 kb downstream of TOMM5 that exhibited a completely unmethylated left arm. Interestingly, we observed differential methylation of its immediate 5' and 3' flanking CpG dinucleotides, in concordance with the unmethylated and methylated statuses of its internal 5' and 3' sequences, respectively. Importantly, TOMM5's CpG island and the 3 Alu repeats and 1 MIR element localized upstream of this newly inserted Alu were also found to be unmethylated. 
Methylation analyses of two additional genomic loci revealed no methylation differences in CpG dinucleotides flanking the Alu insertion sites in the two homologous chromosomes, irrespective of the presence or absence of the insertion. Conclusions We anticipate that the combination of methodologies utilized in this study, which included repeat-anchored bisulfite PCR sequencing and the computational analysis pipeline herein reported, will prove invaluable for the generation of genetic and epigenetic variation maps.


Background
Repetitive elements constitute over 50% of the human genomic sequence [1]. The most prevalent repeats are the Alu family of SINEs, which comprise approximately 10% of the human genome. A typical Alu element is approximately 300 bp long and contains two almost identical arms separated by an A-rich sequence. The ancestor of the Alu monomer is the 7SL RNA gene, which encodes the RNA component of the signal recognition particle (SRP) that is involved in the translocation of newly synthesized proteins [2,3]. Similar to the 7SL gene, Alu elements with intact promoters (namely the A and B boxes) may be transcribed by RNA polymerase III [2,4]. With the aid of the LINE-encoded retrotransposition machinery, Alu transcripts gain mobility and expand in genomes through a process involving reverse transcription and integration [5].
Alu retrotransposition has been an important molecular evolutionary force reshaping the primate genomes [6]. The expansion of the Alu elements in the primate genomes is dated to at least 60 million years ago [7]. Based on their evolutionary history, Alu elements can be classified into three major subfamilies: AluJ, AluS, and AluY [8]. Among them, the youngest Alu elements (AluY and its variants AluYa-g) remain very active, and exhibit the highest rate of retrotransposition in the human genome [9-12]. While several recent studies have shown that LINE-1 elements contribute substantially to the structural variations observed in the human genome [13-15], the retrotransposition rate of Alu elements is ten times higher than that of LINE-1, with an estimated one new insertion in every 21 births [16].
Decades of research have demonstrated that Alu elements play important roles in the genome and transcriptome [17-20]. Alu elements may contribute a large number of transcription factor binding sites [21], some of which may serve as enhancers involved in tissue development [22,23]. In addition, some Alu elements may be expressed, and Alu transcription can affect nearby gene expression, distal gene expression, and global translation. For instance, the expression of an Alu in the promoter of an epsilon-globin gene was found to negatively regulate globin gene expression by transcriptional interference [24]. Recently, Alu RNA was found to be a modular trans-acting repressor of mRNA transcription [25]. Interestingly, such transcriptional suppression was found to be specific and limited to certain genes. Alu RNAs also affect translational initiation and were found to form stable, discrete complexes with the double-stranded RNA-activated kinase PKR, and to antagonize PKR activation [26]. Transcriptional derepression of otherwise active Alu elements, which so often reside within genes, may lead to the formation of double-stranded RNA (if in antisense orientation) and ultimately to heterochromatinization and silencing of the gene [27].
One of the key mechanisms controlling Alu expression is DNA methylation. The human genome has approximately 28 million CpG dinucleotides, 7 million of which are found within Alu elements [1]. In most somatic tissues, the CpG dinucleotides within the Alu sequence are heavily methylated to suppress Alu expression [28,29]. It has been demonstrated that the A and B boxes (5-16 bp, and 75-84 bp from the 5' terminus, respectively) are critical cis-elements for Alu expression. In particular, methylation of the B box is thought to inhibit protein binding and hence block Alu transcription [30]. Although not sufficient, demethylation and the consequent transcription of Alu elements are required for de novo retrotransposition to occur [28]. Methylated CpGs can undergo deamination, and the resulting mutations can render Alu elements incapable of retrotransposition [8,9].
Although much effort has been made to identify structural variations resulting from Alu integration, much less is known with regard to the epigenetic status of newly inserted elements and of their flanking genomic sequences. Here we report the utilization of an Alu-anchored bisulfite PCR strategy to generate methylation maps for thousands of Alu elements in human cerebellum and in ependymomas [31,32]. In this approach, most of the targeted Alu elements are members of the active AluY subfamilies. In this study, we analyzed the aforementioned datasets to identify newly integrated Alu elements, to investigate sequence characteristics and commonalities of their integration sites, to uncover their methylation statuses, and to determine whether the methylation patterns of the sequences surrounding their integration sites would be altered in the alleles harboring the insertion in individuals hemizygous for the Alu retrotransposon.

Identification of recent Alu insertions
The method developed by Xie and colleagues was initially designed to generate a methylation map for a subset of young Alu elements [32]. The strategy applied a primer targeting CpG-rich Alu repeats to simultaneously amplify thousands of Alu elements and their 5' flanking sequences. Unequivocal mapping of these repeats was therefore achieved through their (most often unique) 5' flanking sequences. Eight Alu libraries were derived with this strategy, six from ependymomas and two from normal brain tissues [31,32].
In previous studies, a number of sequence reads from these libraries could only be partially mapped to the human reference sequence. In order to determine whether any of these sequence reads corresponded to a novel Alu integration event, i.e. one that was not yet documented in the UCSC database, we designed a computational pipeline to reanalyze these datasets (Figure 1). For 158,591 sequence reads partially mapped in previous studies, we first masked Alu sequences and then selected the ones containing at least 40 bp of 5' flanking sequences. A total of 24,820 sequence reads were thus identified. The Alu flanking sequences were then extracted from these reads and aligned with MegaBLAST against the in silico bisulfite-converted human reference genome sequence. Unambiguous mapping was achieved for 8,738 sequence reads. As expected, the majority of these reads (79.8%) mapped to genomic sequences adjacent to an Alu element. Further examination of the remaining 1,762 sequence reads (Additional File 1, Table S1) enabled their grouping into 327 clusters according to their genomic coordinates (Additional File 2, Table S2). It is noteworthy that due to the highly stringent mapping criteria applied in our previous studies [31,32], a few mismatches in the alignments between the reference genomic sequence and the sequences generated from the Alu libraries were sufficient to lead to their classification as "partially mapped" reads.
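The in silico bisulfite conversion of the reference mentioned above can be illustrated with a minimal sketch. It assumes full conversion (every cytosine read as thymine, CpG sites included) and models the reverse strand as G-to-A changes in forward-strand coordinates. The function name `bisulfite_convert` and the toy sequence are hypothetical, and the exact conversion rules used in the original pipeline are not stated in the text.

```python
def bisulfite_convert(seq: str, strand: str = "+") -> str:
    """Return a fully bisulfite-converted copy of a reference sequence.

    Assumption: all cytosines are treated as converted (C -> T). For the
    reverse strand, the same change appears as G -> A when written in
    forward-strand coordinates.
    """
    seq = seq.upper()
    if strand == "+":
        return seq.replace("C", "T")
    if strand == "-":
        return seq.replace("G", "A")
    raise ValueError("strand must be '+' or '-'")


# Toy reference, not taken from the study.
reference = "ACGTTCGGAC"
forward = bisulfite_convert(reference, "+")
reverse = bisulfite_convert(reference, "-")
print(forward, reverse)
```

Aligning reads against such a converted reference reduces the alphabet on each strand, so a methylated (unconverted) CpG in a read appears as a mismatch, which a sub-100% identity cutoff can tolerate.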
We examined the distribution of the 327 clusters, comprising 1,762 sequence reads, across the eight Alu-anchored bisulfite PCR libraries (Table 1). Out of 327 clusters, 163 clusters (49.8%) were found to be supported by more than one sequence read and 87 clusters (26.7%) were found to be present in more than one library. Among these 87 clusters, 56 clusters (64.4%) were found in both normal and tumor tissues. This indicates that a majority of these putative insertions are not associated with tumorigenesis and/or cancer progression. In addition, a library derived from a normal brain tissue contributed 159 clusters (48.6% of the total 327 clusters) with 692 sequence reads (39.3% of 1,762 total sequence reads), while a library derived from a relapsed aggressive ependymoma only contributed 16 clusters with 23 sequence reads. Because the number of sequence reads differed among libraries, we normalized, for each library, the number of putative insertions identified to the number of Alu repeats successfully mapped to the reference genome. No significant difference was observed in this ratio between normal and tumor tissues (p = 0.34, t-test).
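The grouping of reads into clusters and the per-cluster support counts described above can be sketched as follows. This is only an illustration: the `Read` record, the `max_gap` merging tolerance, the library names, and the toy coordinates are all assumptions, since the text states only that reads were grouped according to their genomic coordinates.

```python
from collections import namedtuple

# Hypothetical record for a mapped read: which library it came from and
# the genomic position of the putative insertion site it supports.
Read = namedtuple("Read", "read_id library chrom pos")


def cluster_reads(reads, max_gap=10):
    """Group reads whose positions lie within max_gap bp on the same chromosome."""
    clusters = []
    for read in sorted(reads, key=lambda r: (r.chrom, r.pos)):
        last = clusters[-1] if clusters else None
        if last and last[-1].chrom == read.chrom and read.pos - last[-1].pos <= max_gap:
            last.append(read)
        else:
            clusters.append([read])
    return clusters


def summarize(clusters):
    """Count clusters supported by >1 read and clusters seen in >1 library."""
    multi_read = sum(1 for c in clusters if len(c) > 1)
    multi_library = sum(1 for c in clusters if len({r.library for r in c}) > 1)
    return {"clusters": len(clusters), "multi_read": multi_read,
            "multi_library": multi_library}


# Toy reads, not data from the study.
reads = [
    Read("r1", "normal_1", "chr1", 1000),
    Read("r2", "tumor_1", "chr1", 1004),
    Read("r3", "normal_1", "chr2", 5000),
    Read("r4", "normal_1", "chr2", 5000),
    Read("r5", "tumor_2", "chr3", 9000),
]
summary = summarize(cluster_reads(reads))
print(summary)
```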
To investigate whether these putatively new Alu insertions had been identified in previous studies, we extracted 1,763 and 795 known polymorphic Alu elements from dbSNP (the Single Nucleotide Polymorphism database, NCBI) and dbRIP (the database of Retrotransposon Insertion Polymorphisms) [34], respectively. This analysis revealed that 316 of the 327 clusters were novel, i.e. they corresponded to yet undocumented de novo retrotransposition events. The putative integration sites of 140 such clusters (42.8%) were found to localize to intronic regions, except for one, which mapped to the 3'-UTR of TOMM40, a gene that codes for the translocase of the mitochondrial outer membrane (TOM) complex. We further analyzed these genes with NCBI's DAVID functional annotation tool to examine whether any specific gene category was more likely to harbor these Alu insertions. One hundred thirty-two genes were found annotated in the NCBI database. Compared to all genes annotated in the human genome, no significant enrichment was identified for this set of 132 genes in terms of biological process, cellular localization or molecular function (Additional File 2, Table S2).

Verification of recent Alu insertions
To validate the evolutionarily recent Alu de novo retrotransposition events identified in this study, we randomly selected twenty-one genomic loci encompassing such putative new Alu insertions. For each genomic locus, we designed primers based on the upstream and downstream sequences surrounding the predicted integration sites. With these primers, the PCR products were expected to be ~120 bp (without Alu insertion) or ~420 bp (with Alu insertion). Due to the diploidy of the human genome, three kinds of PCR results were expected: (1) hemizygous Alu insertion: PCR products of two different sizes were expected, one fragment with the Alu insertion and another without it (spanning ~420 bp and ~120 bp, respectively); (2) homozygous Alu insertion: only one PCR product was expected, this fragment containing an Alu element (spanning ~420 bp); (3) nullizygous Alu insertion: no Alu insertion was present in either homologous chromosome, hence just one small PCR product (spanning ~120 bp) was expected.
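The three expected PCR outcomes can be expressed as a small decision rule. The sketch below is illustrative only: the function name `call_genotype` and the `tolerance_bp` window around the expected ~120 bp and ~420 bp products are assumptions, not parameters reported in the study.

```python
def call_genotype(band_sizes_bp, empty_bp=120, filled_bp=420, tolerance_bp=40):
    """Classify a locus from the sizes (bp) of the PCR bands observed.

    empty_bp  : expected product without the Alu insertion (~120 bp)
    filled_bp : expected product with the Alu insertion (~420 bp)
    tolerance_bp is a hypothetical allowance for gel sizing error.
    """
    has_empty = any(abs(s - empty_bp) <= tolerance_bp for s in band_sizes_bp)
    has_filled = any(abs(s - filled_bp) <= tolerance_bp for s in band_sizes_bp)
    if has_empty and has_filled:
        return "hemizygous"   # one chromosome carries the insertion
    if has_filled:
        return "homozygous"   # both chromosomes carry the insertion
    if has_empty:
        return "nullizygous"  # neither chromosome carries the insertion
    return "undetermined"     # no band near either expected size


print(call_genotype([120, 420]))
```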
The Alu insertions were successfully verified for forty-seven out of forty-nine cases representing nineteen genomic loci (Figure 2). To ensure that the regions amplified by PCR were indeed new Alu insertions, for each locus, PCR products were cloned and sequence-verified. The sequences representing these nineteen genomic loci were submitted to GenBank. Their accession numbers are: [HQ709117, HQ709118, HQ709119, HQ709120, HQ709121, HQ709122, HQ709123, HQ709124, HQ709125, HQ709126, HQ709127, HQ709128, HQ709129, HQ709130, HQ709131, HQ709132, HQ709133, HQ709134, HQ709135]. Fourteen out of the nineteen insertion events were predicted to occur in more than one individual. Interestingly, we found that nine out of these fourteen insertions were hemizygous in all individuals examined, i.e., the Alu insertion only occurred in one of the two homologous chromosomes. The remaining five insertions were hemizygous for some individuals and homozygous for others. From a total of forty-seven Alu insertions, thirty-six were found to be hemizygous and eleven were found to be homozygous. The fact that the majority of the insertions have remained in hemizygosity in the genome may be interpreted as suggestive of their recent evolutionary origin. However, that will remain speculative until population studies are performed.

Genomic features and sequence characteristics of Alu elements and their flanking sequences
It has been shown that polymorphic Alu elements and their flanking sequences may share some distinct sequence features [34,35]. The Alu transcripts derived from elements with conserved structure would interact productively with SRP9/15 host proteins and gain the ability to retrotranspose [12]. The AluY subfamily and its variants Yc1, Yc2, Ya5, Ya5a2, Ya8, Yd8, Yb8, and Yb9 are considered to be very active due to the conservation of their structure. Accordingly, we classified the Alu elements identified in this study according to their family of origin. We found that the new insertions identified in this study belong to the relatively recent family of AluY elements or to the subfamilies AluYa5, AluYb8, AluYb9, and AluYg6. It has also been shown that the occurrence of a longer poly(A) tail might facilitate Alu retrotransposition [35]. Our analysis revealed that all twenty-two new Alu insertions that were sequence-verified in this study have an A-tail that ranged from 11 to 45 nucleotides, with an average length of 29 bp.
Alu retrotransposition is facilitated by LINE-1 elements. LINE-1 elements encompass two open reading frames, namely ORF1 and ORF2. ORF1 encodes a non-specific RNA binding protein, and ORF2 encodes a protein (ORF2P) with endonuclease and reverse transcriptase activities. During the process of retrotransposition ORF2P cleaves genomic DNA at a degenerate consensus sequence (TT/AAAA). Accordingly, the presence of a TT/AAAA sequence motif in the 5'-flanking region seems essential for Alu insertion [5,36,37]. The Alu insertion site is generated by a single-strand break made by ORF2P in the target DNA. The mechanism of Alu insertion is called Target Primed Reverse Transcription (TPRT) [8,37]. Indeed, we were able to document the occurrence of this sequence motif (either a perfect match or a highly similar sequence) in the 5' flanking regions of all new Alu insertions that were sequence-verified in this study. For the nineteen Alu insertions identified in this study, the characteristic sequence features of Alu and flanking sequences are summarized in Table 2.
In addition to the Alu sequence itself, the genomic sequence adjacent to the recent Alu insertions encompasses at least two typical sequences. As a hallmark of a recent retrotransposition event, the sequences immediately flanking the Alu elements corresponded to short direct repeats, ranging from 4 to 17 nucleotides. The insertion mechanism generates direct target site duplications (TSDs) flanking the newly inserted element. These TSDs have variable length and are highly suggestive of LINE-mediated endonucleolytic cleavage [12,38]. Such short direct repeats, also called AT-rich target site duplications, were present in all 19 sequence-verified genomic loci (Table 2).
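A target site duplication can be detected as a direct repeat shared by the sequence immediately upstream of the Alu element and the sequence immediately downstream of its poly(A) tail. The following minimal sketch searches for the longest exact repeat within the 4 to 17 nucleotide range reported above. The function name `find_tsd`, the exact-match requirement, and the toy flanks are assumptions; real TSDs may contain mismatches that this sketch would miss.

```python
def find_tsd(upstream: str, downstream: str, min_len: int = 4, max_len: int = 17):
    """Return the longest suffix of `upstream` that is also a prefix of `downstream`.

    upstream   : genomic sequence ending right before the Alu 5' end
    downstream : genomic sequence starting right after the poly(A) tail
    Returns None when no exact direct repeat of at least min_len is found.
    """
    upstream, downstream = upstream.upper(), downstream.upper()
    limit = min(max_len, len(upstream), len(downstream))
    for length in range(limit, min_len - 1, -1):
        if upstream[-length:] == downstream[:length]:
            return upstream[-length:]
    return None


# Toy flanks, not sequences from the study.
tsd = find_tsd("CCGTAAGAAATTCT", "AAGAAATTCTGGCA")
print(tsd)
```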

Methylation status of recent Alu elements
All sequences generated in our previous studies, encompassing Alu elements and their 5' flanking sequences, were derived from bisulfite converted genomic DNA [31,32]. Due to the high frequency of C-to-T transitions in CpG dinucleotides of Alu repeats caused by deamination of the methylated cytosines, in the absence of a reference genomic sequence, one cannot determine the methylation status of a novel Alu insertion by this method. Hence, to examine the methylation pattern of the newly integrated Alu elements, we aligned the sequences generated in this study for nineteen of such Alu elements with their bisulfite converted sequences from our previous studies [31,32] (Additional File 3, Figure S1). Our results showed that the recently inserted Alu elements are heavily methylated, with an average methylation level of 90.7%; this is similar to the average methylation level observed for evolutionarily young non-polymorphic Alu elements [31,32]. We further examined the methylation status of the two important promoter regions inside the Alu elements, the A and B boxes. Alu elements have a bipartite structure, which is similar to that of tRNA elements. It has been shown that the A box is responsible for determining the strength of the Pol III promoter and the B box is important in enabling transcription [30,39]. Also, deletion of the B box sequence completely abolished transcription of the elements, while deletion of the A box reduced the efficiency of transcription [40]. In almost all cases, these promoter sequences were methylated (Additional File 3, Figure S1). This result suggests that transcription of most newly inserted Alu elements is suppressed by DNA methylation.
Figure 2 PCR validation of putative Alu insertions (a-g). The Alu insertions were sorted based on genomic coordinates. The Alu insertions were named AI1 through AI21. N: normal brain tissue DNA; E1, E2, and E3: brain tumor tissue (ependymoma) DNA from different individuals; P and R: ependymoma DNA, where P is the primary and R is the relapsed tumor from the same individual.
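The reasoning behind the methylation calls in this section can be made concrete with a short sketch: given the unconverted sequence of a cloned Alu insertion as reference, each CpG cytosine in an aligned bisulfite read is scored as methylated when it remains C and as unmethylated when it reads as T. The function names, the assumption of gap-free reads of the same length as the reference, and the toy sequences are all hypothetical and do not reproduce the study's alignment procedure.

```python
def cpg_positions(reference: str):
    """Indices of the cytosine in every CpG dinucleotide of the reference."""
    ref = reference.upper()
    return [i for i in range(len(ref) - 1) if ref[i:i + 2] == "CG"]


def methylation_level(reference: str, reads):
    """Fraction of CpG cytosines that stayed C across all bisulfite reads.

    Assumption: each read is already aligned to the reference without gaps
    and has the same length. Bases other than C or T at a CpG are ignored.
    """
    methylated = unmethylated = 0
    for pos in cpg_positions(reference):
        for read in reads:
            base = read[pos].upper()
            if base == "C":
                methylated += 1
            elif base == "T":
                unmethylated += 1
    total = methylated + unmethylated
    return methylated / total if total else None


# Toy data: a reference with three CpGs and two bisulfite reads.
reference = "ACGTTCGGACGA"
reads = ["ACGTTCGGATGA", "ATGTTCGGACGA"]
level = methylation_level(reference, reads)
print(level)
```

Without the reference, a T at such a position could not be distinguished from a genuine C-to-T mutation, which is why the cloned genomic sequence is needed for novel insertions.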
Interestingly, we found one Alu element at chr9:37594172-37594310 with a completely unmethylated 5' end (AI19, Additional File 3, Figure S1). Since amongst all Alu elements chosen for verification this was the only element found to be unmethylated, and also because only two bisulfite sequence reads had been previously generated for this element [31,32], we designed bisulfite PCR primers to amplify the entire Alu element including the two flanking CpG dinucleotides (Figure 3). Indeed, the 5' end of this newly inserted Alu element was found to be completely unmethylated while its 3' end exhibited some degree of DNA methylation. It is noteworthy that the 5'-flanking CpG site was completely unmethylated, and the 3'-flanking CpG site was completely methylated. Importantly, the 5' terminal nucleotide of this newly inserted Alu element mapped 1,576 bp downstream from a CpG island and 1,674 bp downstream from the transcription start site of the TOMM5 gene. This result suggests that the methylation status of this Alu element is under the influence of the epigenetic environment surrounding its insertion site. Since this Alu insertion was found to be in homozygosity, i.e. it was present in the two homologous chromosomes, we were not able to investigate whether the Alu insertion exerted any influence on the methylation status of CpG dinucleotides flanking the Alu element. To test our hypothesis that the methylation of the Alu element is under the influence of the CpG island, we ascertained the methylation status of a fragment (chr9:37592324-37592701) corresponding to the 5' terminal 377 bp of the CpG island, and also of the AluJo element flanking the 3' end (chr9:37594745-37595002) of the newly identified Alu element that was partially methylated. Indeed, we found that this CpG island fragment was completely unmethylated while the AluJo sequence flanking the 3' end of the newly inserted Alu exhibited a methylation level on the order of 40%.
Interestingly, this AluJo exhibited a pattern of methylation very similar to the pattern presented by the newly inserted Alu element (Figure 3). There are 3 Alu repeats and 1 MIR element localized between the newly inserted Alu and the CpG island. The methylation levels of these elements are indeed very low (Figure 3). We conducted a similar analysis of two other genomic loci, chr10:72605338-72605440 and chr2:48276482-48276601, which were randomly chosen. The Alu insertions at these two loci were found to be in hemizygosity. This allowed us to compare the methylation status of the alleles with and without the Alu insertion (Figure 4). The sequencing results derived from bisulfite-PCR cloning demonstrated that both newly inserted Alu elements were indeed heavily methylated, as anticipated based upon our previously generated high-throughput bisulfite sequencing data (AI10 and AI11, Additional File 3, Figure S1). In addition, for the two genomic loci examined, there was no methylation difference between the alleles with and without the Alu insertion in the two homologous chromosomes, nor was there a difference in the methylation statuses of the CpG dinucleotides flanking the chr2:48276482-48276601 Alu insertion site. Furthermore, the CpG dinucleotide that is immediately downstream of the chr10:72605338-72605440 Alu insertion site was also found to be methylated. Due to the low CpG density of its 5' flanking genomic sequence, no methylation data were derived for the region upstream of chr10:72605338-72605440. To identify methylation differences among samples, we calculated the methylation level of all mapped Alu elements, and also that of the structural variants present in the 19 loci verified. This analysis revealed no methylation differences among tissues (Additional File 4, Table S3).
Figure 3 Bisulfite PCR cloning and sequencing to validate methylation status of an unmethylated Alu insertion (chr9:37594172-37594310). Asterisk indicates the CpG dinucleotides that are flanking the Alu element; the scheme shows the relative location of TOMM5 and the CpG island in relation to the Alu AI19 insertion (UCSC Genome Bioinformatics). a) methylation status of a downstream AluJo (sequence coordinates: chr9:37594745-37595002) near the newly inserted Alu element; b) newly inserted Alu element and its methylation status; c) methylation status of 2 CpGs upstream of the newly inserted Alu; d), e), f), and g) methylation statuses of 3 Alu repeats and 1 MIR element localized between the newly inserted Alu and the CpG island, respectively, AluSx, AluJo, MIRb, and AluSx; h) methylation status of the 5' end of a CpG island located 1,576 bp (sequence coordinates: chr9:37592324-37592701) upstream from the newly inserted Alu element. Note that the TOMM5 transcription unit is in opposite orientation to that of the newly inserted Alu element. The methylation levels for a, b, c, d, e, f, g, and h were 40%, 33.7%, 79.1%, 4.2%, 0%, 3%, 32%, and 0.6%, respectively.

Discussion
Recent studies demonstrated that major structural variants in the human genome are derived from retrotransposons, Alu elements in particular [16,41]. Due to the extensive sequence homology that exists among young Alu repeats, the identification of such structural variants remains a challenging task. To date, a total of 2,558 polymorphic Alu retrotransposons have been reported to occur in human populations, 1,763 of which have been deposited in dbSNP and 795 in dbRIP. In this study we implemented a computational pipeline to identify recent Alu insertions, and examined the methylation status of the newly inserted Alu retrotransposons and their flanking sequences. At the time we developed this strategy the Genome Sequencer FLX System was the most suitable alternative available, given the greater length of the sequence reads that it generates, and the fact that sequences would encompass an Alu repeat and would be derived from bisulfite-converted genomic DNA. Altogether, the longer reads generated with the FLX System greatly facilitated their mapping back to the reference genome sequence. Notwithstanding this advantage, however, we anticipate that our approach may be adapted to take advantage of competing next generation sequencing platforms that have a higher throughput and that can now generate sufficiently long sequence reads. Using this strategy a total of 327 putative Alu elements were identified. We found that 42.8% of their insertion sites fell within intronic regions, while one integration site mapped to the 3'-UTR of the TOMM40 gene. TOMM40 is a component of the preprotein translocase complex of the outer mitochondrial membrane, which consists of at least 7 different proteins (TOMM5, TOMM6, TOMM7, TOMM20, TOMM22, TOMM40, and TOMM70). These results are consistent with previous studies indicating that Alu retrotransposons tend to be inserted within intragenic regions [1,42].
Out of the twenty-one insertion events that were randomly selected for validation analysis, nineteen were successfully verified. A limitation of the Alu-anchored bisulfite PCR approach that needs to be acknowledged is the fact that only 5' flanking sequences are obtained. The right arm of the Alu retrotransposons and their 3' flanking sequences are not represented in the sequence reads that are generated. Hence, in order to design primers for the validation experiments, we used the reference sequence of the human genome as source of putative 3'-flanking sequences for the Alu insertions. Accordingly, it is possible that the two cases that could not be verified may have been caused by the utilization of an incorrect 3' flanking sequence for primer design. Notwithstanding this limitation, the lowest estimated accuracy for the analysis pipeline that we have implemented in this study for the identification of de novo Alu retrotransposition events would be of 90.5% (19/21).
The sequence features (TSD, TT/AAAA cleavage sequence, and A-rich Alu tail) that are typically observed in newly inserted Alu elements constitute hallmarks of retrotransposition [5,10,36,37]. Indeed, further analysis of the aforementioned nineteen PCR-cloned Alu elements and flanking sequences revealed the presence of both the TSD and TT/AAAA sites. Alu A-tails seem to be an important factor in enabling Alu element retrotransposition [4,35]. Roy-Engel et al. reported that the average A-tail length of active Alu elements is 26 [35]. Consistent with their finding, the A-tail sizes of the Alu elements described in this study ranged from 11 to 45, with an average of 29 bp.
Most cancer genomes are characterized by localized hypermethylation as well as by global hypomethylation [43,44]. This hypomethylation process may enable transcription and de novo retrotransposition of Alu elements which, in turn, may lead to genome instability [45]. Our previous study demonstrated that the methylation level of Alu elements decreased in ependymomas, and most significantly in recurrent tumors [31]. To examine whether some Alu insertions represented somatic events limited to recurrent ependymomas, which could have occurred in consequence of the loss of DNA methylation, we generated and compared PCR products from ten genomic loci in primary and in recurrent tumors derived from one individual. The same results were obtained in all ten genomic loci. In addition, five of the ten Alu insertions were also found in other individuals. These results suggested that such validated Alu insertions most likely represent germ-line rather than somatic events.
In this study, in addition to identifying structural variants in the genome of 6 individuals, we investigated epigenetic variations that might result from de novo retrotransposition events. The Alu elements identified in this study were heavily methylated, as was previously shown by high-throughput bisulfite sequencing and herein validated by cloning and sequencing analyses. The analysis of methylation throughout the mapped Alus and among the 19 loci verified revealed that there were no methylation differences among tissues (Additional File 4, Table S3). This result indicates that at least by the time these DNA samples were obtained most of the newly inserted Alu elements were already transcriptionally repressed. This finding is further supported by the fact that the promoters of the Alu elements, i.e. their A and B boxes, were found to be methylated. However, there was one exception. We found an Alu insertion that was partially unmethylated (chr9:37594172-37594310). Interestingly, the insertion of this Alu element occurred 1,576 bp downstream from a CpG island and 1,674 bp downstream from the transcription start site of TOMM5, a gene encoding the translocase of the outer mitochondrial membrane 5. With a completely unmethylated promoter (both the A box and the B box were unmethylated), it is conceivable that this Alu element may have remained transcriptionally active and hence have served as source for additional retrotransposition events. Another interesting finding was that a CpG island that is upstream of the element (i.e., that of TOMM5) may be influencing the methylation pattern of this Alu repeat. Indeed, the methylation status of the CpG island was similar to that of the 5' end sequences of this Alu repeat, i.e. both were unmethylated. It would be interesting to explore the functional impact of this particular Alu on the nearby TOMM5 gene.
Additionally, 3 Alu repeats and 1 MIR element that are localized between the newly inserted Alu repeat and the CpG island were found to exhibit very low methylation levels. Such striking pattern of DNA methylation may indeed be an indication of the influence exerted by the adjacent CpG island. It is also possible that other epigenetic factors might be affecting the methylation statuses of these Alu elements, such as nucleosome positioning. Two previous studies have reported the influence of nucleosome positioning, within and around Alu element, in Alu activity [46,47]. Accordingly, it is noteworthy that an AluJo that is localized downstream of this newly inserted Alu exhibits a similar pattern of DNA methylation, i.e. its 5' half is unmethylated while its 3' half is methylated. In our previous study [32], we found that genomic localization has a profound impact on Alu methylation status. In this study, the identification of both methylated and unmethylated Alu elements provided additional support to there being a positional effect on Alu methylation. Last, but not least, it is noteworthy that two of the novel Alu insertions herein reported map within or near genes encoding members of the preprotein translocase complex of the outer mitochondrial membrane, namely TOMM40 and TOMM5, respectively. It is conceivable that given their housekeeping function and ubiquitous expression pattern, hence commonly open chromatin status, these genes may be more vulnerable to uptake de novo retrotransposition events.
To explore the epigenetic impact of Alu insertion on adjacent genomic sequences, we examined the methylation statuses for two loci harboring hemizygous insertions, and, in one case, obtained the methylation patterns of CpG dinucleotides flanking the Alu insertion sites. Both alleles, irrespective of the presence of an inserted element, were found to be heavily methylated, and no significant epigenetic variation was observed in association with the presence of the additional Alu element.

Conclusions
In this work, we identified 327 putatively recent Alu insertions, of which 316 were novel. We used DNA samples from normal and tumor tissues, but the data did not show any tissue preference for these insertions. Further studies are needed to scrutinize the functional aspects of structural variants in the human genome, including epigenetic variations that might arise as a consequence of a de novo retrotransposition event.

High-throughput bisulfite sequencing datasets for Alu elements
The high-throughput bisulfite sequencing data were obtained from Alu-anchored bisulfite PCR libraries prepared from tissue samples, including a normal cerebellum, a normal 4th ventricle lining, two primary non-aggressive, two primary aggressive, and two recurrent ependymomas [31,32]. Briefly, genomic DNA is first digested with the AluI restriction enzyme, ligated to adaptors, and then subjected to bisulfite treatment. Bisulfite-treated DNA is amplified with adaptor-specific and Alu-specific primers, the latter targeting a large pool of CpG-rich Alu elements. Thus, each PCR product contains the 5' end of an Alu element and its (most often) unique flanking genomic sequence, which makes it possible for each sequence to be unambiguously mapped to the reference human genome. Primary non-aggressive ependymomas are defined as primary tumors from patients free of disease progression for more than 4 years, and primary aggressive ones are defined as primary tumors from patients with recurrent disease within 3 years or who died of the disease.

Computational pipeline for the identification of recent Alu insertions
To identify putatively recent Alu insertions, we selected sequence reads that had been rejected in previous studies. Briefly, after removal of primer and adaptor sequences, sequences longer than 40 bp were aligned to the in silico bisulfite-converted reference genome using multiple cycles of MegaBLAST. The MegaBLAST word size was set to 100 for the first cycle and decreased by 20 for every cycle thereafter until the last, for which the minimum length of the best perfect match was set to 40. In addition, the percent identity cutoff for a significant alignment was set to 100 for the last cycle and 95 for all other cycles of MegaBLAST [32]. Sequence reads that mapped to genomic loci within 10 bp of an Alu element were considered putative recent Alu insertions.
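The cycle parameters described above can be laid out as a small schedule. The following is a minimal sketch, not the pipeline's actual code: the function name `megablast_cycles` and its arguments are hypothetical, and it assumes that the word size steps from 100 down to 40 in decrements of 20, which gives four cycles.

```python
def megablast_cycles(first_word_size=100, step=20, last_word_size=40,
                     last_identity=100.0, other_identity=95.0):
    """Return a list of (word_size, percent_identity_cutoff), one per cycle.

    Assumption: the word size decreases by `step` each cycle until it
    reaches `last_word_size`; only the last cycle uses `last_identity`.
    """
    sizes = list(range(first_word_size, last_word_size - 1, -step))
    # Make sure the schedule ends on the stated final word size.
    if not sizes or sizes[-1] != last_word_size:
        sizes.append(last_word_size)
    last = len(sizes) - 1
    return [
        (size, last_identity if i == last else other_identity)
        for i, size in enumerate(sizes)
    ]


if __name__ == "__main__":
    for cycle, (word_size, identity) in enumerate(megablast_cycles(), start=1):
        print(f"cycle {cycle}: word size {word_size}, identity >= {identity}%")
```

Under these assumptions the first three cycles tolerate mismatches (95% identity) in long matches, while the last cycle accepts only perfect matches of at least 40 bp.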

PCR, cloning, and sequencing
For PCR primer design, the original (not bisulfite-converted) DNA sequences flanking the predicted Alu insertion sites were extracted from the UCSC reference human genome, based on their genomic coordinates [48]. PCR primers were designed in the region surrounding the Alu insertion sites. PCR reactions were performed using HotStartTaq® Plus Master Mix from QIAGEN. Each reaction was prepared as follows: 12.5 μL of HotStartTaq® mix, 30 ng of DNA, 14 μM of each primer, and enough water to bring the volume to 25 μL. The PCR reactions were performed on an MJ Research machine (model PTC 225). Reactions were subjected to an initial activation step of 95°C for 15 min, followed by a denaturation step of 94°C for 1 min, 40 cycles of 1 min at 94°C, 30 s at the optimal annealing temperature, and 40 s at 72°C, and a final extension step of 10 min at 72°C. Annealing temperatures (Tm) and the primers used in each reaction are listed in Additional File 5, Table S4. After the reactions were completed, the amplified fragments were separated by electrophoresis in a 1.5% agarose gel stained with ethidium bromide and visualized using a UV fluorescence system. Electrophoresis was continued until the bands were well separated. The bands were then excised from the gel and purified using the QIAquick® PCR Purification Kit from QIAGEN. The purified PCR products were cloned using the TOPO TA Cloning® System from Invitrogen. Sequencing reactions for individual colonies were conducted at the Sequencing Core Facility of the Children's Memorial Research Center of Northwestern University's Feinberg School of Medicine.
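As an illustration, the cycling program described above can be written as a simple data structure and its nominal run time computed. This is only a sketch: the `Step` class and `program_seconds` function are hypothetical names, and the calculation counts hold times only, ignoring the ramp times of the thermal cycler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    seconds: int


def program_seconds(pre, cycle, n_cycles, post):
    """Sum hold times: one-off steps plus n_cycles repeats of the cycle."""
    one_off = sum(s.seconds for s in pre) + sum(s.seconds for s in post)
    return one_off + n_cycles * sum(s.seconds for s in cycle)


# Hold times taken from the text; ramp times are not included.
PRE = [Step("activation, 95 C", 15 * 60), Step("denaturation, 94 C", 60)]
CYCLE = [
    Step("denaturation, 94 C", 60),
    Step("annealing, locus-specific temperature", 30),
    Step("extension, 72 C", 40),
]
POST = [Step("final extension, 72 C", 10 * 60)]

total = program_seconds(PRE, CYCLE, 40, POST)

if __name__ == "__main__":
    print(f"nominal run time: {total} s ({total / 60:.1f} min)")
```

Keeping the annealing step as a separate entry makes it easy to reuse the same program with the locus-specific temperature for each primer pair.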

Bisulfite PCR
Bisulfite conversion of genomic DNA was performed with the EZ DNA Methylation-Gold kit (Zymo Research Corporation, Irvine, CA) following the manufacturer's instructions. A total of 300 ng of genomic DNA was treated and eluted in 10 μL of elution solution. After this step, DNA from the chr10:72275361-72275449 genomic locus was amplified using the primer pair 5'-GGA TTA AGT TTT TTT TTT GTT T-3' and 5'-CTA CAA AAA AAA ATA ACT CAT A-3'; the chr2:48129974-48130105 genomic locus was amplified using the primer pair 5'-CCT TAC CAT TTA AAA ATA AAA AAT CAA-3' and 5'-GTT TAA GAT TTA AAG GAA TGA GTT AG-3'. PCR reactions were prepared using the same reagents and conditions described above. The PCR program consisted of an activation step of 95°C for 15 min, followed by a denaturation step of 94°C for 1 min, 40 cycles of 1 min at 94°C, 30 s at the optimal annealing temperature (49°C for locus chr10:72605338-72605440 and 42°C for locus chr2:48276482-48276601), and 40 s at 72°C, and a final extension step of 10 min at 72°C. PCR-amplified fragments were separated in a 1.5% agarose gel and excised from the gel as described above.
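Bisulfite treatment converts unmethylated cytosines to uracil, which is read as thymine after PCR, so primers that anneal to converted DNA outside CpG sites are expected to lack either C (same sense as the converted strand) or G (complementary sense). The sketch below checks this property for the four primers listed above. The helper names `bisulfite_convert`, `clean`, and `depleted_base` are hypothetical, and the conversion function assumes that no cytosine is methylated.

```python
def bisulfite_convert(seq):
    """In silico conversion of one strand, assuming no cytosine is
    methylated: every C is read as T after conversion and PCR."""
    return seq.upper().replace("C", "T")


def clean(primer):
    """Strip the spacing used in the text and upper-case the sequence."""
    return primer.replace(" ", "").upper()


def depleted_base(primer):
    """Return 'C' or 'G' if the primer lacks that base entirely, else None."""
    p = clean(primer)
    if "C" not in p:
        return "C"
    if "G" not in p:
        return "G"
    return None


# Primer sequences copied from the text, keyed by chromosome of the locus.
PRIMERS = {
    "chr10": ("GGA TTA AGT TTT TTT TTT GTT T", "CTA CAA AAA AAA ATA ACT CAT A"),
    "chr2": ("CCT TAC CAT TTA AAA ATA AAA AAT CAA",
             "GTT TAA GAT TTA AAG GAA TGA GTT AG"),
}

if __name__ == "__main__":
    for locus, pair in PRIMERS.items():
        for primer in pair:
            print(locus, clean(primer), "lacks", depleted_base(primer))
```

In each pair, one primer lacks C and the other lacks G, which is consistent with primers designed against a fully converted template that avoid CpG dinucleotides.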