Computational prediction and molecular confirmation of Helitron transposons in the maize genome

Background Helitrons represent a new class of transposable elements recently uncovered in plants and animals. One remarkable feature of Helitrons is their ability to capture gene sequences, which makes them of considerable potential evolutionary importance. However, because Helitrons lack the typical structural features of other DNA transposable elements, identifying them is a challenge. Currently, most researchers identify Helitrons manually by comparing sequences. With the maize whole genome sequencing project underway, an automated computational Helitron searching tool is needed. The characterization of Helitron activities in maize needs to be addressed in order to better understand the impact of Helitrons on the organization of the genome.
Results We developed and implemented a heuristic searching algorithm in PERL for identifying Helitrons. Our HelitronFinder program will (i) take FASTA-formatted DNA sequences as input and identify the hairpin looping patterns, and (ii) exploit the consensus 5' and 3' end sequences of known Helitrons to identify putative ends. We randomly selected five predicted Helitrons from the program's high quality output for molecular verification. Four out of the five predicted Helitrons were confirmed by PCR assays and DNA sequencing in different maize inbred lines. The HelitronFinder program identified two adjacent, head-to-tail, dissimilar Helitrons in a maize BAC sequence.
Conclusion We have identified 140 new Helitron candidates in maize with our computational tool HelitronFinder by searching maize DNA sequences currently available in GenBank. Four out of five candidates were confirmed to be real by empirical methods, thus validating the predictions of HelitronFinder. Additional points to emerge from our study are that Helitrons do not always insert at an AT dinucleotide in the host sequences, that they can insert immediately adjacent to an existing Helitron, and that their movement may cause changes in the flanking region, such as deletions.


Background
Helitrons represent a new class of transposable elements recently uncovered in animals and plants [1], including maize [2][3][4]. The first two Helitrons described in maize were the causative agents of stable mutations: one in the shrunken2 mutant sh2-7527 [2] and another in the barren stalk1 reference mutant ba1-Ref [3]. The termini of a 6525-bp Helitron in the ba1-Ref mutant share striking similarity with those of the Helitron insertion in the sh2-7527 mutant, indicating that they belong to the same family. Lai et al. [4] reported that two Helitrons, HelA and HelB, accounted for all of the genic differences distinguishing two previously described bz locus haplotypes [5]. HelA is 5.9 kb long and contains sequences for three of the four genes found only in the McC bz-locus haplotype. A nearly identical copy of HelA was isolated from a different chromosomal site in the B73 inbred. Both sites appear to be polymorphic in maize, suggesting that these Helitrons have been active recently.
Basic Helitron features include:
• Conserved TC and CTAG sequences at the 5' and 3' termini, respectively
• Palindromes (16- to 20-bp 'hairpin loops') 10-15 bp upstream of the 3' terminus
• Flanking A and T host nucleotides at the 5' and 3' termini, respectively
Figure 1 of a recent paper [4] comparing Helitron end sequences contains the 5' and 3' termini of the maize Hel- Besides the typical Helitron features they all share, there are two invariant CGs located 10 bp apart in each member of the palindromic repeat, the second one occurring just 9 bp from the 3' end. In the HelA subgroup, there is an invariant AA dinucleotide between the palindromic repeats. The 3' terminal 30 bp of HelA are highly conserved among Helitrons. In fact, of those 30 bp, HelA shares 26 and 24 bp, respectively, with the Helitrons previously identified as the causative agents of mutations at sh2 and ba1.
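The terminal features listed above can be expressed as a simple check. The following is a minimal Python sketch for illustration only (HelitronFinder itself is written in PERL); the function name and the toy sequences are hypothetical, and the hairpin requirement is deliberately left out here.

```python
def has_basic_helitron_ends(element, left_flank, right_flank):
    """Check the canonical terminal features of a candidate element.

    element     -- candidate Helitron sequence, 5' to 3'
    left_flank  -- host sequence immediately upstream of the element
    right_flank -- host sequence immediately downstream of the element
    """
    element = element.upper()
    return (
        element.startswith("TC")                  # conserved 5' terminus
        and element.endswith("CTAG")              # conserved 3' terminus
        and left_flank.upper().endswith("A")      # host A next to the 5' end
        and right_flank.upper().startswith("T")   # host T next to the 3' end
    )


# Toy sequences, invented for illustration only.
print(has_basic_helitron_ends("TCTACTAGGGCTAG", "GGCA", "TTGC"))  # True
print(has_basic_helitron_ends("TCTACTAGGGCTAG", "GGCG", "TTGC"))  # False
```

A strict check of this kind would reject elements that did not insert at an AT dinucleotide, so in practice the flanking-base test is better treated as supporting evidence than as a hard filter.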
One remarkable feature of Helitrons is their ability to capture gene sequences, a feature that makes them of considerable potential evolutionary importance. However, because Helitrons lack the typical structural features of other DNA transposable elements, identifying them is a challenge. Currently, most researchers identify Helitrons manually by comparing sequences. For example, Wang and Dooner [6] identified Helitrons by vertical comparisons of the bz regions from eight different maize inbred lines. Although very precise, this approach is time consuming. Recently, a model-based method for identifying Helitrons was introduced for Arabidopsis thaliana [7]. With the maize whole genome sequencing project underway, an automated computational Helitron searching tool is needed. The characterization of Helitron activities in the maize genome needs to be addressed in order to better understand the impact of Helitrons on the organization of the maize genome.

Identification of Helitrons by in silico Analysis
Maize Helitrons fall into two main non-autonomous categories: Hel1 (or HelA) and Hel2 (or HelB). The majority of identified Helitrons in maize are of the HelA type (listed in Table 1, which was kindly provided by Dr. S. Lal), so our HelitronFinder program is focussed exclusively on the prediction of maize HelA type Helitrons. The 'hairpin loop' and the CTAG terminus at the 3' end of known Helitrons are the key characteristics for the identification of new Helitrons. The most challenging part is to identify the 5' end. For this purpose, we selected the first 25 nucleotides from the 5' end of each known Helitron of Table 1 and aligned them using Clustal [8]. There is a strong similarity in the first 18 nucleotides among the aligned Helitrons (Fig. 2). The consensus from the alignment is our main criterion to search for the 5' end of new Helitrons.
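To illustrate how a consensus can be read off an alignment of 5' ends, the following Python sketch computes the most frequent base in each column and marks fully conserved columns with a *, in the style of a Clustal alignment. The three aligned sequences are toy data invented for this example and are not the Helitron ends of Table 1; the function name is hypothetical.

```python
from collections import Counter


def column_consensus(aligned):
    """Return (consensus, marks) for equal-length aligned sequences.

    consensus -- most frequent character in each column
    marks     -- '*' where every sequence agrees, ' ' otherwise
    """
    consensus, marks = [], []
    for column in zip(*aligned):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base)
        marks.append("*" if count == len(aligned) else " ")
    return "".join(consensus), "".join(marks)


# Toy alignment, invented for illustration only.
toy = ["TCTACTA", "TCTATTA", "TCTACTA"]
consensus, marks = column_consensus(toy)
print(consensus)  # TCTACTA
print(marks)      # **** **
```

Columns marked with a * are candidates for fixed positions in a search pattern, whereas variable columns would need a character class or a wildcard.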
We chose the first 18 nucleotides from Figure
We downloaded maize sequences from the GenBank nonredundant database to our local Sun workstation and used the HelitronFinder program to predict Helitron candidates. There are 44 and 102 predicted Helitrons in our "high quality" and "medium quality" outputs, respectively. The output files are in text format, with a GenBank accession number for each predicted Helitron. Outputs specifically identify Helitron sequences as being in a forward or reverse complement orientation. The HelitronFinder program also successfully identified all the known Helitrons listed in Table 1.

Confirmation of Helitrons by Molecular Analysis
We randomly selected five predicted Helitrons from the program's high quality output for molecular verification.
Alignment of the first 25 nucleotides of known maize Helitron 5' ends. A * means that all the sequences at that particular location are the same. There is a strong similarity in the first 18 nucleotides among the aligned Helitrons. The consensus from the alignment is our main criterion to search for the 5' end of new Helitrons.
Table 1. All the Helitrons in this table, which was kindly provided by Dr. S. Lal, have been published. The pertinent references are listed under the "Source" column. The accession numbers refer to entries in the GenBank sequence database; the Helitron coordinates in the sequence are identified under the "Start" and "End" columns.
PCR primers were designed based on the flanking sequence of each predicted Helitron. We surveyed 11 maize inbred and genetic lines for three of the five Helitron candidates and 15 lines for the other two. Four sets of primers successfully amplified either the Helitron-occupied or the Helitron-vacant site from different lines. The PCR products highlighted in bold in Table 2 were cloned and sequenced for further confirmation.
Four out of five selected predicted Helitrons were confirmed by PCR products from different maize inbred lines (Table 2). They are named Silico1, Silico2, Silico3, and Silico4, and are predicted from BSSS53, B73, B73, and McC sequences, respectively. The "occupied" and "vacant" entries denote PCR bands corresponding to the presence and absence of Helitrons, respectively. The X sign stands for no PCR amplification product. In addition to the inbreds from whose sequences they were predicted, Helitrons were detected in other inbreds. 5). This result reveals +/- polymorphism among different inbred lines and confirms that the predicted Helitron, Silico1, is genuine.
Silico3 was predicted from the B73 maize sequences. A total of 15 maize lines were used for molecular verification of this HelitronFinder prediction (Fig. 6). Both B73 and A636 show Helitron-occupied sites, whereas lines McC, A188, W22, W23, and bz-R show Helitron-vacant sites. In addition to the Helitron band amplified from B73, there was a faint band of the same size as the vacant site. We cloned and sequenced this product and confirmed it to be a vacant site.

Characterizations of Helitrons in the Maize Genome
Discovery of Two Adjacent Helitrons
The HelitronFinder program identified two adjacent, head-to-tail Helitrons in a maize BAC sequence with GenBank accession number AF466202 (Fig. 7). This is the first case of back-to-back Helitrons detected in the maize genome. A peculiarity of this head-to-tail Helitron configuration is that the TC 5' terminus of the second Helitron follows the CTAG 3' terminus of the first, creating a novel G/T junction, rather than the A/T junction normally found at a Helitron's 5' end. Pritham and Feschotte [9] reported several cases of perfect head-to-tail junctions of two Helitron elements in the genome of the bat Myotis lucifugus. They suggested that these were tandem repeats of Helitrons in the Myotis lucifugus genome. They also argued that one would expect the A of the host target site to occur between the CTAG end of the first element and the TC start of the second element if the elements had inserted independently. We aligned these two adjacent maize Helitrons and found that the sequences differed significantly and contained different genes or gene fragments. This indicates that they are not tandem repeats, but arose by consecutive insertions.
Alignment of the last 50 nucleotides of known maize Helitrons 3' end
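The junction argument can be made concrete with a small Python sketch that reports the dinucleotide spanning the 5' end of the downstream element and any host bases lying between the two elements. The function name, the coordinate convention (0-based, end-exclusive), and the toy sequences are assumptions made for this illustration.

```python
def describe_junction(seq, first_end, second_start):
    """Describe the boundary between two neighbouring elements.

    first_end    -- index just past the last base of the upstream element
    second_start -- index of the first base of the downstream element
    Returns (junction, intervening), where junction is the base preceding
    the downstream element and its first base, separated by '/'.
    """
    if not 0 < first_end <= second_start < len(seq):
        raise ValueError("coordinates must satisfy 0 < first_end <= second_start < len(seq)")
    junction = seq[second_start - 1] + "/" + seq[second_start]
    intervening = seq[first_end:second_start]
    return junction, intervening


# Toy sequences, invented for illustration only.
direct = "AA" + "TCGGGGCTAG" + "TCCCCCCTAG" + "T"        # elements abut
spaced = "AA" + "TCGGGGCTAG" + "A" + "TCCCCCCTAG" + "T"  # host A in between

print(describe_junction(direct, 12, 12))  # ('G/T', '')
print(describe_junction(spaced, 12, 13))  # ('A/T', 'A')
```

An empty intervening string together with a G/T junction corresponds to the configuration described above, in which the second element begins immediately after the CTAG of the first.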

A Putative Helitron Somatic Excision
We further cloned and sequenced the PCR products of Silico3 from lines A636, B73, McC, W22, W23, and bz-R. Fig. 9 presents the sequence alignment showing the insertion of the predicted Helitron Silico3 in A636 and B73.
There is no Helitron insertion in McC (C7053), W22, W23, or bz-R. The sequence results validate the HelitronFinder's prediction. It is interesting that, in addition to an occupied site, B73 also shows a weak Silico3 vacant-site-sized band (Fig. 6). Sequencing of this PCR product confirmed it to be an unoccupied site (Fig. 9). There are no sequence polymorphisms in the adjacent sequences to rule out the possibility that this band arose from DNA contamination in the B73 DNA preparation. Alternatively, however, this band may represent Helitron somatic excision products, which have been found at other polymorphic sites in maize (Y. Li and H.K. Dooner, unpublished data). This is a surprising result in light of the fact that Helitrons presumably transpose by a rolling circle transposition mechanism that does not generate empty sites.

Deletion of Helitron Flanking Regions
The PCR products of Silico1 (Fig. 5) from A636, B73, BSSS53, Mo17, W23, and 4Co63 were also cloned and sequenced. In addition to the BSSS53 inbred line from which Silico1 was predicted, we were able to amplify and sequence the 5' end of Silico1 from W23. The sequences of Silico1 occupied and vacant sites are aligned in Fig. 10. Silico1 is present in W23 and BSSS53 and absent from B73, A636, 4Co63, and Mo17. The 3' flanking region in B73 is identical to that in BSSS53. However, the 3' end flanking regions of Silico1 in A636, 4Co63, and Mo17 are missing 38 nucleotides. The presence of the same deletion in three different lines points to a common origin of this chromosomal segment. Possibly, the deletion arose following the imprecise excision of Silico1 from an occupied site in a common progenitor of these lines.

Discussion
Helitrons are novel transposons that have not been well characterized experimentally. Implementing our maize Helitron discovery algorithm, we found two adjacent Helitrons, which we arbitrarily named No.1 and No.2, in the r1 region of B73 (Figs. 7 and 8). Here, we consider two models for how these adjacent Helitrons arose. One hypothesis is that these are tandem repeats, which arose by the Helitron's rolling circle mechanism of replication, as postulated by Pritham and Feschotte [9]. An alternative hypothesis is that one Helitron inserted next to an existing Helitron. The sequence data support the latter model.
We detected a putative Helitron excision product in the B73 inbred (Fig. 9), but could not rule out DNA contamination because of the absence of polymorphisms in the adjacent sequences. All four predicted Helitrons are present in some inbred lines and absent in others. This polymorphism indicates that Helitrons have been active recently in the maize genome. We speculate that insertions and excisions of Helitrons can cause changes in the flanking regions, such as the 38-bp deletion shown in Fig. 10.
The heuristic algorithm for searching 3' end of Helitrons
Figure 7 Two adjacent Helitrons detected in the r1 region of B73 (GenBank accession number AF466202).

Conclusion
We have identified 140 new Helitron candidates in maize with our computational tool HelitronFinder. Four out of five candidates were confirmed to be real by empirical methods, thus validating the predictions of our program. Additional points to emerge from our study are that Helitrons may not always insert at an AT dinucleotide in the host sequences, that they can insert immediately adjacent to an existing Helitron, and that Helitron movement may cause changes in the flanking region, such as deletions.

Heuristic Search Algorithm of HelitronFinder
The HelitronFinder program is written in PERL and uses its regular expression abilities to look for the specified patterns of Helitrons in the maize genome. The update_blastdb.pl script provided by NCBI was modified to work with the HelitronFinder program to download the maize genome DNA sequences in FASTA file format when requested. HelitronFinder searches the input DNA sequences in both the forward and reverse directions. For each direction, there are two main subroutines to search for the 5' and 3' ends, respectively.
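Searching both directions amounts to scanning each sequence and its reverse complement and reporting hits in the coordinates of the input sequence. The following Python sketch shows one way to do this; it is not the PERL code of HelitronFinder, and the function names, the minimal FASTA reader, and the toy motif are assumptions made for illustration.

```python
_COMPLEMENT = str.maketrans("ACGTRYacgtry", "TGCAYRtgcayr")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_fasta(text):
    """Parse FASTA-formatted text into {record name: uppercase sequence}."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {key: "".join(parts) for key, parts in records.items()}


def find_on_both_strands(seq, motif):
    """Return (strand, start) hits, with start given on the forward strand."""
    hits = []
    for strand, target in (("+", seq), ("-", reverse_complement(seq))):
        start = target.find(motif)
        while start != -1:
            if strand == "+":
                hits.append((strand, start))
            else:
                # Convert a reverse-strand offset to forward coordinates.
                hits.append((strand, len(seq) - start - len(motif)))
            start = target.find(motif, start + 1)
    return hits


# Toy record, invented for illustration only.
records = read_fasta(">toy example\nGGTCTAAA\nTAGACC\n")
print(find_on_both_strands(records["toy"], "TCTA"))  # [('+', 2), ('-', 8)]
```

Reporting reverse-strand hits in forward coordinates keeps the output comparable with the coordinates of the GenBank entry.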
The 5' end subroutine uses the consensus derived from Figure 1 as its search criterion. This is relatively straightforward. However, the 3' end structure is more complex, requiring a search for 16- to 20-bp palindromes in the DNA sequences. More specifically, we look for palindromes containing the self-pairing CG and the double A in the middle of the HelA type Helitrons. Then, the subroutine will identify 3' CTRR termini within 20 bp downstream of the palindrome and output the sequences from the beginning of the palindrome to the 3' CTRR terminus, along with their coordinates. For each possible instance of a 5' end, the subroutine lists the closest 3' ends within 50,000 bases.
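The 3' end search and the pairing of ends can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the HelitronFinder implementation: a palindrome is taken to be two perfectly complementary arms around a loop of at most 4 bp (the max_loop parameter is an assumption), the CG and AA requirements are omitted, only the single nearest 3' end is paired with each 5' end, and all function names and toy sequences are hypothetical.

```python
import re
from bisect import bisect_right

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
CTRR = re.compile(r"CT[AG][AG]")  # R stands for a purine, A or G


def revcomp(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def find_hairpins(seq, min_len=16, max_len=20, max_loop=4):
    """Return (start, end) spans of 16- to 20-bp perfect hairpins."""
    hits = []
    for start in range(len(seq) - min_len + 1):
        for total in range(min_len, max_len + 1):
            end = start + total
            if end > len(seq):
                break
            # Loop lengths are chosen so that both arms have equal length.
            for loop in range(total % 2, max_loop + 1, 2):
                arm = (total - loop) // 2
                if seq[start:start + arm] == revcomp(seq[end - arm:end]):
                    hits.append((start, end))
                    break
    return hits


def find_3prime_ends(seq, window=20):
    """Return (hairpin start, terminus end) for CTRR within the window."""
    ends = set()
    for start, end in find_hairpins(seq):
        for match in CTRR.finditer(seq, end, end + window):
            ends.add((start, match.end()))
    return sorted(ends)


def pair_ends(five_prime_starts, three_prime_ends, max_span=50_000):
    """Pair each 5' start with the nearest downstream 3' end, if any."""
    ends = sorted(three_prime_ends)
    pairs = []
    for start in sorted(five_prime_starts):
        i = bisect_right(ends, start)
        if i < len(ends) and ends[i] - start <= max_span:
            pairs.append((start, ends[i]))
    return pairs


# Toy sequence, invented for illustration: hairpin with an AA loop, then CTAG.
toy = "TTTT" + "GACGTCCG" + "AA" + "CGGACGTC" + "TTTTTT" + "CTAG" + "T"
print((4, 32) in find_3prime_ends(toy))
print(pair_ends([10, 70_000], [500, 60_500, 200_000]))
```

Because real hairpins often contain mismatches, a production search would need a more tolerant pairing rule than the exact arm comparison used here.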
The HelitronFinder program has two levels of constraints for the searching criteria, high quality and medium quality.
The 5' end criterion of the high quality constraint is: (
The 3' end criterion of the high quality constraint is:
The double 'A' in bold is one of the characteristics of HelA type Helitrons. The high quality searching criterion mainly targets this type of Helitron.
For the medium searching criterion, we use fewer constraints than for the high quality criterion. The 5' end consensus is kept as close to the high quality consensus as possible. However, we pick the less conserved 3' end as below: This will be able to predict HelB type Helitrons as well.

Primer Design
PCR primer pairs were designed based on the 500 bp of sequences flanking each Helitron end.
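As an illustration of the first step of primer design, the following Python sketch extracts up to 500 bp of sequence on each side of a predicted Helitron, from which primers could then be chosen. The function name, the 0-based end-exclusive coordinates, and the toy sequence are assumptions; primer selection itself is not shown.

```python
def flanking_regions(seq, start, end, flank=500):
    """Return (upstream, downstream) flanks of the element seq[start:end].

    Flanks are truncated where the element lies close to a sequence end.
    """
    upstream = seq[max(0, start - flank):start]
    downstream = seq[end:end + flank]
    return upstream, downstream


# Toy sequence, invented for illustration only.
element = "TC" + "G" * 10 + "CTAG"
toy = "A" * 600 + element + "T" * 300
upstream, downstream = flanking_regions(toy, 600, 600 + len(element))
print(len(upstream), len(downstream))  # 500 300
```

Primers placed in the two flanks amplify across the insertion site, so an occupied site and a vacant site yield products of different sizes.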
Figure 9 Alignment of Silico3 sequences indicating the insertion of the predicted Helitron Silico3 in A636 and B73. There is no Helitron insertion in McC, W22, W23, or bz-R. It is interesting that, in addition to an occupied site, B73 also shows a weak Silico3 vacant-site-sized band in Fig. 4. Sequencing of this PCR product confirmed it to be an unoccupied site.
Figure 8 Location of PCR primers flanking and internal to adjacent Helitrons identified in sequence AF466202. We designed four pairs of primers for these two Helitrons: F1/R1, F3/R3, F2/R4, and F4/R4. F and R represent forward and reverse primers, respectively.

Cloning, Sequencing
PCR products were then cloned into the pGEM-T Easy vector (Promega). Sequences were obtained on a 3700 DNA Analyzer using the BigDye v3.1 terminator reaction (Applied Biosystems). Consensus sequences were used for analysis.

Availability and Requirements
The HelitronFinder program is available for public access at http://limei.montclair.edu/HT.html. A detailed description and a sample run are also provided at the website.
Figure 10 Alignment of Silico1 sequences. Silico1 is present in W23 and BSSS53 and absent from B73, A636, 4Co63, and Mo17. The 3' flanking region in B73 is identical to that in BSSS53. However, the 3' end flanking regions of Silico1 in A636, 4Co63, and Mo17 are missing 38 nucleotides.