Possible structures of HAmo SINE leading to successful proliferation
Our analysis indicates that HAmo SINE is very young and has proliferated recently, reaching estimated copy numbers of about 2 × 10⁵ and 1.7 × 10⁵ per haploid genome in silver carp and bighead carp, respectively. In fact, when we tested for the presence or absence of SINE insertions using flanking primers, most of the SINE loci isolated in this work were species-specific or not even fixed among fish populations (our group, unpublished data). HAmo SINE has therefore proliferated efficiently and recently in these genomes, which may be owing to its overall and internal structure, as described below.
First, HAmo SINE retains the overall secondary structure and the conserved A and B boxes in the tRNA-related region, which ensure recognition by RNA polymerase III and hence the transcriptional activity of the SINE. More importantly, the irregularity of the acceptor stem, as in the SmaI family, seems to help the RNA escape recognition by tRNA-processing or RNA-modifying enzymes and therefore prevents it from being cleaved by the 3'-endonuclease.
Second, HAmo SINE shares an almost identical 3' tail with HAmo LINE2 in both primary sequence and secondary structure, which allows it to make efficient use of the LINE2 enzymatic machinery. The shared stem-loop region is thought to function as a recognition site for the UnaL2 protein (UnaL2p) when this region is transcribed into RNA.
Moreover, more than one copy of the short tandem repeat TAAATG is present in most HAmo SINE copies. Mutational analyses of other LINEs of the L2 clade have shown that such repeats are necessary for successful retrotransposition and for the initiation of reverse transcription of UnaL2 RNA [25, 28].
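As an illustration of how the TAAATG repeat number could be scored across cloned copies, the following minimal Python sketch counts the longest run of back-to-back TAAATG units in a sequence. The function name `longest_tandem_run` and the three example tails are hypothetical and invented for illustration; they are not sequences from this study.

```python
import re


def longest_tandem_run(seq: str, unit: str = "TAAATG") -> int:
    """Return the largest number of back-to-back copies of `unit` in `seq`."""
    seq = seq.upper()
    # One or more consecutive copies of the repeat unit.
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(seq)]
    return max(runs, default=0)


# Hypothetical 3' tails, invented for illustration only.
examples = {
    "copy_a": "GGTTCAGCTAAATGTAAATGTAAATG",  # three tandem units
    "copy_b": "GGTTCAGCTAAATG",              # a single unit
    "copy_c": "GGTTCAGCTTTTTT",              # no unit
}

for name, tail in examples.items():
    print(name, longest_tandem_run(tail))
```

A copy would satisfy the "more than one repeat" criterion when the returned value is at least 2. The sketch scans only the strand given, so reverse-complemented clones would need to be reoriented first.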
Third, the HAmo SINE RNA is clearly composed of three parts: a tRNA-related region, a family-specific region and a LINE2-related region (Figure 10). These correspond to three parts of its RNA secondary structure: the cloverleaf structure (the 5' domain), an unstructured region, and the extended stem-loop (the 3' domain). This characteristic domain composition has been experimentally probed in salmon SmaI SINE RNA and seems to have guaranteed the successful and continuous amplification of SINEs in eukaryotic genomes during evolution. From this point of view, HAmo SINE may reveal internal structural features of SINEs that contribute to successful retrotransposition and proliferation.
Three similar but independently derived SINE families
When HAmo SINE was used as the query in a tBlastN search, it showed unexpectedly high similarity to two other SINE families in salmon: the SmaI family (77%) and the FokI family (71%). Both are young SINE families with a limited distribution in a few species of the family Salmonidae. These two SINE families share a common tail and parasitize SalL2 in the salmon genome. After detailed comparison of their consensus sequences and those of their respective partner LINE families (Figure 10) [48, 49], we found that HAmo SINE is similar to the SmaI and FokI families in the tRNA-related region (1–76 bp) and the LINE-related region (107–150 bp). However, the central region (76–107 bp) shows no similarity among the three families and is specific to each, which leads us to deduce that they were probably generated independently and evolved in their respective lineages, rather than arising by horizontal transfer.
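The region-by-region comparison described above can be sketched as a percent-identity calculation over alignment columns. The Python example below is a minimal sketch under stated assumptions: the function `percent_identity` is hypothetical, the two 150-column sequences are synthetic placeholders rather than real consensus sequences, and the boundaries 1–76, 77–106 and 107–150 are a non-overlapping approximation of the regions given in the text.

```python
def percent_identity(a: str, b: str, start: int, end: int) -> float:
    """Percent identity over alignment columns start..end (1-based, inclusive).

    Columns where both sequences carry a gap ('-') are ignored;
    a gap opposite a base counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (x, y)
        for x, y in zip(a[start - 1:end], b[start - 1:end])
        if not (x == "-" and y == "-")
    ]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y and x != "-")
    return 100.0 * matches / len(pairs)


# Synthetic placeholder sequences (NOT real SINE consensus sequences):
# identical ends, completely different central region.
trna_part = "ACGT" * 19            # 76 columns
tail_part = "TTGCA" * 8 + "TTGC"   # 44 columns
seq1 = trna_part + "A" * 30 + tail_part
seq2 = trna_part + "C" * 30 + tail_part

# Assumed, non-overlapping region boundaries (alignment columns).
regions = {
    "tRNA-related": (1, 76),
    "central": (77, 106),
    "LINE-related": (107, 150),
}

for name, (start, end) in regions.items():
    print(f"{name}: {percent_identity(seq1, seq2, start, end):.1f}%")
```

With real data, the inputs would be rows taken from a multiple alignment of the consensus sequences, so that the coordinates refer to shared alignment columns rather than positions in the individual sequences.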
As noted in the Introduction, template switching during TPRT has been proposed to explain how a SINE acquires its tail from the corresponding LINE. In this process, a short cDNA is first generated by copying the 3' terminal sequence of the LINE RNA, and the RT landing pad then jumps to another RNA, the parent of the SINE-to-be, which carries an internal pol III promoter. The above-mentioned tRNALys-derived SINEs may therefore have arisen through template switching between the respective LINE and an ancestral RNA of the SINE-to-be, containing the tRNA-derived region and the family-specific region, in the respective genomes of the three fishes. Coincidentally, the three young families are all derived from tRNALys or are structurally related to tRNALys (Figure 6). Moreover, the parental LINEs (HAmoL2 and SalL2) of these three SINE families are homologous and share a common tail.
In fact, tRNALys is the most common source of SINEs [7, 24, 51]. A possible reason is that, among the population of SINE-to-be RNAs, the ancestral tRNALys SINE RNA had a particular selective advantage in the generation process described above, or was preferentially transcribed and retrotransposed after its generation.
This finding therefore suggests that the three similar but distantly related young SINE families were generated independently, in the genomes of different hosts, by LINE families belonging to the same lineage of the LINE phylogeny.
Some aspects of the new retroposon enrichment strategy
Magnetic bead-based isolation systems have been widely used for the separation of specific targets such as cells, proteins and microsatellites. However, our work is the first report of the application of this system to the isolation of SINEs and LINEs from fish genomes, using a newly developed protocol. The results demonstrate that this protocol is technically straightforward and permits the isolation of a large number of SINEs and LINEs from an uncharacterized genome in less time, and at lower cost and effort, than the traditional protocol involving rounds of filter hybridization.
In general, if all steps work, the procedure takes only about a week from tissue to several hundred positive clones. In addition, the reagents purchased to build and screen one library by the traditional protocol are sufficient for ten or more libraries prepared by the enrichment protocol. Moreover, the protocol is easy to control and handle, since it requires little specialized equipment or technical expertise; PCR and cloning are probably the most difficult steps.
Our method relies on solution hybridization, which can greatly facilitate and speed up the interaction between probe and target DNA and give better hybridization efficiency than fixed solid supports [52, 53]. Moreover, the method can be useful for low-copy-number SINEs and LINEs, since in the end only a population of sequences enriched for specific retroposons is cloned. Generally, the frequency of positive clones can reach 50–90% if conditions are optimized. The method is therefore of great advantage when a large number of retroposon insertions need to be isolated as temporal landmarks of evolution for phylogenetic estimation.
Most steps in the protocol presented here can be readily modified to suit different experimental backgrounds and prior knowledge of the SINEs and LINEs, and can easily be combined with other protocols. Okada's group successfully isolated many SINE families from many organisms by using in vitro transcripts of total genomic DNA as probes, exploiting the fact that SINEs are repetitive in the genome and transcribed by RNA polymerase III [55, 56], whereas Kramerov's group preferred to use as a probe an AB-PCR product containing the 30–40 bp sequence located between boxes A and B of a SINE [37, 57]. All of these specific probes, as well as known SINE sequences (this paper), can be biotinylated and incorporated into this enrichment strategy.
However, several principles should be kept in mind. First, a suitable restriction enzyme should be selected so that it generates fragments of appropriate size evenly, and its recognition site should not occur within the targeted repeat elements. In our work, although we isolated SINEs and LINEs simultaneously in one isolation reaction, we obtained only partial, HaeIII-fragmented LINEs because a HaeIII site exists in the full-size LINEs. Second, the numbers of amplification cycles in the PCR enrichment and adapter PCR steps (see Methods) should be optimized to generate a smear of PCR products without specific bands. In this case, 15 and 12 cycles were used in the two steps, respectively, to preserve the complexity of the DNA population and so prevent the recovery of many identical clones. Moreover, selectivity and specificity can be adjusted by changing the specific probe and the stringency conditions (temperature and salt concentration of the washing buffer). Third, there are several methods for attaching a single biotin to one terminus of the DNA fragment, such as PCR with one of the two primers biotinylated (this paper), end-labeling with terminal transferase, ligation with a biotinylated adaptor, or ordering the labeled fragment directly from a commercial service. Whichever method is used, it is important to label only one terminus of the DNA fragment with a single biotin molecule; otherwise the magnetic beads will crosslink and clot through DNA bridges, which may result in poor reaction kinetics between beads and target molecules. In addition, the isolation of large DNA fragments may be limited by the binding capacity of the beads and by the efficiency of cloning large fragments into the T-vector. However, our procedure is mainly based on PCR, and its progress can therefore be tracked from step to step by gel electrophoresis.
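The first principle above, that the enzyme's recognition site should be absent from the targeted element, can be checked in silico before the library is built. The following Python sketch is an illustration only: the helper names `revcomp` and `site_positions` and both test sequences are hypothetical, and the code assumes unambiguous A/C/G/T input. HaeIII recognizes the palindromic site GGCC.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string containing only A, C, G, T."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def site_positions(seq: str, site: str) -> list[int]:
    """1-based start positions where `site` occurs on either strand of `seq`."""
    seq, site = seq.upper(), site.upper()
    # For a palindromic site such as GGCC both targets are the same string.
    targets = {site, revcomp(site)}
    width = len(site)
    return [
        i + 1
        for i in range(len(seq) - width + 1)
        if seq[i:i + width] in targets
    ]


HAEIII_SITE = "GGCC"

# Hypothetical element sequences, invented for illustration only.
element_with_sites = "ATGGCCTTAGGCCA"
element_without_site = "ATGCATGCAT"

print(site_positions(element_with_sites, HAEIII_SITE))    # [3, 10]
print(site_positions(element_without_site, HAEIII_SITE))  # []
```

An empty list for a candidate enzyme would indicate that the element is not cut internally, whereas any reported position predicts that only partial fragments of the element can be recovered, as was the case for the full-size LINEs with HaeIII.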