Transcribed Tc1-like transposons in salmonid fish

Background: Mobile genetic elements comprise a substantial fraction of vertebrate genomes. These genes are considered to be deleterious, and in vertebrates they are usually inactive. High-throughput sequencing of salmonid fish cDNA libraries has revealed a large number of transposons, which remain transcribed despite inactivation of translation. This article reports on the structure and potential role of these genes.
Results: A search of EST sequences showed the ratio of transcribed transposons in salmonid fish (i.e., 0.5% of all unique cDNA sequences) to be 2.4–32 times greater than in other vertebrate species, and 68% of these genes belonged to the Tc1 family of DNA transposons. A phylogenetic analysis of reading frames indicates repeated transposition of distantly related genes into the fish genome over protracted intervals of evolutionary time. Several copies of two new DNA transposons were cloned. These copies showed relatively little divergence (11.4% and 1.9%). The latter gene was transcribed at a high level in rainbow trout tissues, and was present in genomes of many phylogenetically remote fish species. A comparison of synonymous and non-synonymous divergence revealed remnants of divergent evolution in the younger gene, while the older gene evolved in a neutral mode. Extrapolation from a 1.2 MB fragment of genomic DNA suggests that the salmonid genome contains approximately 10^5 Tc1-like sequences, the major fraction of which is not transcribed. Our microarray studies showed that transcription of rainbow trout transposons is activated by external stimuli, such as toxicity, stress and bacterial antigens. The expression profiles of Tc1-like transposons gave a strong correlation (r^2 = 0.63–0.88) with a group of genes implicated in defense response, signal transduction and regulation of transcription.
Conclusion: Salmonid genomes contain a large quantity of transcribed mobile genetic elements. Divergent or neutral evolution within genomes and lateral transmission can account for the diversity and sustained persistence of Tc1-like transposons in lower vertebrates. A small fraction of transposons remains transcribed, and their transcription is enhanced in responses to acute conditions.


Background
A large fraction of repetitive sequences in eukaryotic genomes originate from mobile genetic elements (MGEs), which are grouped into 2 classes. Class I transposons require mRNA intermediates, whereas class II elements transpose directly as DNA. Tc1-like class II transposons, named after the founder gene in Caenorhabditis elegans, are probably the most widespread MGEs in nature, and are found in fungi, plants, ciliates, nematodes, arthropods, fish, amphibians and mammals (reviewed in [1]). These genes contain a single reading frame, flanked by terminal inverted repeats, that encodes the enzyme transposase. Transposition of class II MGEs is characterized by limited requirements for host cellular factors, which can account for their remarkable ability to undergo horizontal transfer across great taxonomic distances [2]. MGEs are regarded as parasitic genes, and their proliferation is deleterious for the host. Therefore, transposition is commonly followed by inactivation. MGEs could play an important role in the evolution of teleost fish, and comprise a substantial fraction of their genome. Multiple copies of Tc1-like transposons were found in several fish species from different orders [3][4][5][6]; however, transcription of teleost Tc1-like genes has not been documented. Recent high-throughput sequencing of salmonid cDNA libraries has revealed a surprisingly large number of transposon transcripts. Most if not all of these sequences contain incapacitating mutations in the reading frames, and can be regarded as transcribed pseudogenes or null-alleles. At present, the rainbow trout (Oncorhynchus mykiss) and Atlantic salmon (Salmo salar) TIGR Gene Indices [7] contain 50773 and 31341 unique cDNA sequences, respectively, among which we found several hundred MGEs, with Tc1-like genes being the most abundant. This wealth of sequence information provides insight into the structure and evolution of transposons. We also cloned several copies of two rainbow trout Tc1-like genes with complete reading frames, which adds to the understanding of the transposon life cycle. Multiple gene expression analyses with a high-density cDNA microarray indicate stimulation of rainbow trout transposon transcription in response to stress, toxicity and pathogens.

Results
To search for transcribed transposons in salmonid fish, we compared the unique cDNA sequences from TIGR gene indices with 262 metazoan transposon proteins retrieved from Swissprot. Blastx found matches in 273 rainbow trout and 163 Atlantic salmon sequences at a cutoff value of e < 10^-20 (Table 1). The ratio of transposons to all cDNA sequences in salmonids was 2.35-31.6 times greater than in other vertebrate species with available gene indices, and a large fraction (68.3%) showed similarity to 11 proteins of the Tc1 family. Tc1-like transcripts were found in the gene indices of 4 other teleost fish species and in the African clawed frog Xenopus laevis, but not in higher vertebrates. To estimate an approximate number of Tc1-like genes, 6 genomic clones of Atlantic salmon were analyzed, covering 1.2 MB [Genbank:AC148723, Genbank:AC149099, Genbank:AC148779, Genbank:AC148618, Genbank:AC148617 and Genbank:AC148616], and a blastx search found 56 matches at a cutoff value of 10^-20. The size of the haploid Atlantic salmon genome is 3 billion base pairs. Assuming a relatively homogeneous distribution of Tc1 transposons, about 140,000 copies can be expected, which is 3 orders of magnitude greater than the number of Tc1-like sequences in the salmonid fish gene indices. Note that TIGR contigs are produced by automatic assembly of EST sequences that have at least 95% identity in overlaps of at least 40 base pairs [8]. Therefore, transcripts of recently diverged transposon copies could be merged unless they were flanked by differing 5'- and 3'-untranslated sequences. The number of transcribed transposons can therefore be greater than the number estimated by searching across gene indices, but it is likely that only a minor fraction of salmonid Tc1-like genes is active. Most Tc1-like sequences from the rainbow trout gene index contained incomplete reading frames.
To analyse the structural relatedness of these genes, we used 38 fragments that encode at least 170 amino acids at the C-termini. Of these, 31 sequences were from the TIGR database and 7 were produced in this study (i.e., newly identified genes named Glan and Barb [Genbank: AY880883-AY880888]). The maximum likelihood (ML) tree consisted of 3 single genes and 11 clades, containing 2 to 5 sequences (Figure 1). Seven clades (I-VII) could be regarded as a part of the multigene family, as sequence identity with the nearest neighbors was in the range of 35-73%; the remaining 4 clades were highly divergent. Only 1 of 6 clades containing more than 3 genes (X in Figure 1) was split into clusters supported by high bootstrap values. The highest sequence identity was observed for Glan and Barb. However, divergence within other clades could in theory be overestimated, due to forced assembly of similar transcripts.
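The genome-wide copy number quoted earlier in this section (56 blastx matches in 1.2 MB, scaled to a haploid genome of 3 billion base pairs) is a simple density extrapolation. The following is a minimal sketch of that arithmetic; the function name and its parameters are illustrative and do not come from the original analysis, and the calculation assumes a uniform distribution of Tc1-like sequences, as the text does.

```python
def estimate_genome_copies(matches, sampled_bp, genome_bp):
    """Scale a match count from a sampled region to the whole genome.

    Assumes the matches are spread uniformly, so the density observed
    in the sampled region holds genome-wide.
    """
    if sampled_bp <= 0:
        raise ValueError("sampled_bp must be positive")
    return matches * genome_bp / sampled_bp


# Values taken from the text: 56 matches, 1.2 MB sampled, 3 billion bp genome.
copies = estimate_genome_copies(matches=56,
                                sampled_bp=1_200_000,
                                genome_bp=3_000_000_000)
print(f"{copies:,.0f}")  # 140,000
```

Because only 6 genomic clones were sampled, the result is best read as an order-of-magnitude figure rather than a precise count.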
Sequencing of complete reading frames for 3 copies of Glan and 4 copies of Barb allowed for the study of transposon molecular evolution within the rainbow trout genome. All 7 sequences include incapacitating mutations, which prevent translation of transposase. Barb copies have diverged up to 11.4 ± 1.4% (mean ± SD), and the accumulation of deletions (Figure 2) impeded reconstruction of the ancestral protein. The low divergence of Glan copies (1.9 ± 0.8%) suggests relatively recent transposition into the rainbow trout genome. The consensus sequence of 3 reading frames was identical to TIGR contig [TGI:TC46394], which encoded a protein with characteristic features of Tc1-like transposase, such as the presence of domains required for nuclear localization, DNA binding, cleavage and joining, and the DDE motif found in the catalytic units of diverse MGEs and retroviruses (Figure 3). Notably, all transcripts of Glan contained mutations that prevented translation of transposase; however, the consensus contig sequence that was assembled from a large number of ESTs from different cDNA libraries appeared intact. Given that the rate of spontaneous mutations in vertebrate germ cell lines is 10^-5 [9], transposition of Glan could have taken place as recently as only a few thousand years ago. We also performed a PCR screen for this gene in fish from inland reservoirs of Finland, where it was detected in 17 species from different orders (Table 2). Interestingly, three of the four species in which Glan was not found (grayling, whitefish and vendace) are more closely related to rainbow trout than most of the species carrying this gene. Low divergence of copies and discontinuous distribution are evidence for horizontal transmission.
We analysed the rates of synonymous (Ks) and non-synonymous (Ka) substitutions in newly identified rainbow trout transposons using a sequence of the nearest Swissprot protein (hypothetical transposase of plaice, with 77% homology [Genbank:CAB51372]) as a reference (Table 3). With respect to this transposase, the Ks/Ka ratio was high and significantly greater in the younger gene (4.85 ± 0.30 in Glan and 3.35 ± 0.04 in Barb). A comparison of copies indicated a probability of divergent evolution in Glan (Ks/Ka = 0.69 ± 0.05). In Barb the rates of synonymous and nonsynonymous substitutions approached unity (Ks/Ka = 1.03 ± 0.12), which is consistent with the protracted accumulation of mutations in a solely neutral mode.
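The reasoning in the paragraph above compares each Ks/Ka ratio with 1: a ratio well above 1 means synonymous changes dominate (conservation of the protein), a ratio below 1 means non-synonymous changes dominate (divergent evolution), and a ratio close to 1 is consistent with neutrality. The sketch below encodes that reading. The function, its labels and the tolerance of two error units around 1 are assumptions made for illustration only; the paper does not state a decision rule, and the reported ± values are treated here as generic uncertainties.

```python
def classify_ks_ka(ratio, error, k=2.0):
    """Label a Ks/Ka ratio relative to the neutral expectation of 1.

    Ratios within k * error of 1 are called "neutral" (hypothetical
    tolerance); larger ratios are "purifying" (Ks > Ka) and smaller
    ratios are "diversifying" (Ka > Ks).
    """
    if abs(ratio - 1.0) <= k * error:
        return "neutral"
    return "purifying" if ratio > 1.0 else "diversifying"


# Ratios and errors as reported in the text.
reported = [
    ("Glan vs plaice transposase", 4.85, 0.30),
    ("Barb vs plaice transposase", 3.35, 0.04),
    ("Glan copies", 0.69, 0.05),
    ("Barb copies", 1.03, 0.12),
]
for name, ratio, error in reported:
    print(f"{name}: {classify_ks_ka(ratio, error)}")
```

Under these assumptions the comparison among Glan copies falls below 1 and the comparison among Barb copies is indistinguishable from 1, which matches the interpretation given in the text.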
We did not find sequences of any other known proteins in the salmonid Tc1-like contigs, so the transposons are probably transcribed from their own promoters. Evidence for regulation of the transposon transcription rate was produced in microarray analyses. We used a platform designed for studies of responses to environmental stress, toxicity and pathogens in salmonid fish [10,11]. Overall, this platform included more than 1300 genes, 7 of which were similar to Tc1-like transposons. Five transposons showed marked differential expression in response to external stimuli, such as handling stress, exposure to toxic compounds and injection of cortisol or bacterial antigens; the microarray results were confirmed with real-time qPCR. A consensus profile of transposons correlated with those of 27 protein-coding genes in 35 microarray experiments (Pearson r^2 > 0.63); examples are presented in Figure 4. The highest correlation (r^2 > 0.8) was shown by classical markers of cellular stress, such as the aryl hydrocarbon receptor, MAP kinase 13 and hypoxia inducible factor. We also searched for enrichment of Gene Ontology [12] categories in this list of genes. Significant over-representation was demonstrated by functional classes that are implicated in protective reactions to acute conditions (i.e., response to stress and oxidative stress, defense and humoral immune response, receptors and regulators of transcription; Table 4).
Figure 2. Alignment of protein coding sequences of new transcribed rainbow trout Tc1-like transposases cloned in this study.
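The correlation analysis described above compares a consensus transposon profile with the profile of each protein-coding gene across the microarray experiments and reports the squared Pearson coefficient. The following is a minimal sketch of that calculation using only the standard library. The two short profiles are invented placeholders, not data from the study, and the function name is illustrative.

```python
from math import sqrt


def pearson_r(x, y):
    """Return the Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sxy / sqrt(sxx * syy)


# Hypothetical log expression ratios across five experiments (placeholders).
transposon_consensus = [0.8, -0.3, 1.2, 0.1, -0.6]
candidate_gene = [0.6, -0.2, 1.0, 0.3, -0.7]

r = pearson_r(transposon_consensus, candidate_gene)
r_squared = r ** 2
# A gene would be retained if r_squared exceeded the 0.63 cutoff used in the text.
print(r_squared > 0.63)
```

Squaring the coefficient discards its sign, so a selection on r^2 alone would also retain strongly anti-correlated genes unless the sign of r is checked separately.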

Discussion
A large number of transposons and a preponderance of Tc1-like genes are characteristic features of salmonid genomes [3][4][5]. Sequence analysis of the transcribed genes (Figure 1) suggested repeated transpositions at protracted intervals. The wide distribution of Tc1 transposons is believed to be accounted for by their limited requirements for host cellular factors. Sleeping Beauty, an artificially reconstructed salmon transposon [13], is capable of integration into genomes of a wide range of vertebrate species; however, the different efficiencies observed in various cell lines point to possible involvement of the recipient's proteins in transposition [14]. This is in line with a wide, though limited, distribution of homologs for the transcribed salmonid DNA transposons, which have not been found among ESTs of warm-blooded vertebrates. The variety of salmonid Tc1-like genes is truly remarkable. Phylogenetic analyses of 38 sequences, encoding homologous fragments of C-termini, found 14 distinct types of Tc1-like genes, and the real number of different genes is probably much greater. Our search was based on the similarity between proteins that were available from Swissprot, and many transposons could remain unidentified due to the lack of known homologs. Furthermore, the rapid decay of transposons could impede the discovery of ancient transposed genes.
Despite the widespread occurrence of Tc1-like transposons in vertebrates, not a single active gene has been identified to date [14]. Inactivation of salmonid DNA transposons could take place within a relatively short period of time after transmission. Cloning of 2 transposons having a relatively low divergence rate indicates the rapid accumulation of incapacitating mutations, such as insertions or deletions, shifts of reading frames and premature stop codons (Figure 2). Analysis of synonymous and non-synonymous substitutions suggests that inactivation of the younger transposon could be preceded by selective divergence within a limited period of time, whilst evolution of the older gene appeared entirely neutral. Results from a study on recent transpositions in insects from four different orders suggest that selective constraints operate exclusively by horizontal gene transfer [15]. A comparison of rainbow trout genes with the Tc1-like transposon from plaice confirms the conservation of functionally important domains in distantly related proteins, which is gradually obscured during the course of neutral evolution (Ks/Ka ratios in the younger Glan and older Barb genes are 4.85 ± 0.31 and 3.35 ± 0.04, respectively).
Silencing of transposons takes place at the transcriptional or post-transcriptional levels [16], and both of these mechanisms could act in salmonid fish. Based on their frequency in a 1.2 MB genomic fragment, we can assume that Tc1-like genes comprise nearly 5% of the Atlantic salmon genome and that only a minor fraction has preserved transcription after inactivation of translation. A survey of salmonid ESTs found untranslated transposons in both sense and antisense polarities, which is the main prerequisite for the formation of double-stranded RNA. RNA interference (RNAi) is implicated in the control of transposition in germ cell lines of the nematode C. elegans [17], and the existence of an RNAi pathway in rainbow trout was recently demonstrated [18]. Suppression of intact transposases by mutant genes was also reported in insects, and this control mechanism is referred to as dominant-negative complementation [19].
Given efficient protection against transposition in animals, the tenacity and variety of transposons may seem surprising. Sustained persistence of transposons can, in theory, be accounted for by their residence in unknown reservoir species; e.g., the role of parasites as potential vectors of horizontal transfer across phylogenetically remote organisms has been hypothesized [20]. However, this can hardly explain the remarkable diversity of these genes. The ML tree (Figure 1) suggests that at each transposition event, the rainbow trout genome was invaded by a new transposon, although several genes could have a common ancestor. If expression of translated genes is under the control of RNAi, successful recurring transposition of identical or highly similar genes appears unlikely. Hence, the combination of neutral or divergent evolution within a genome with transfer across phylogenetic boundaries can be the most efficient strategy for the survival and diversification of transposons. The PCR screen detected Glan in genomes of many fish species from phylogenetically remote taxonomic groups (Table 2). Clades I-VII of the ML tree (Figure 1) can correspond to genes that evolved independently. However, it is also possible that descendants of a founder gene have returned several times into the rainbow trout genome, after passage through a chain of co-evolutionary hosts.
Results of our microarray studies suggested that a large fraction of transcribed Tc1 genes can be stimulated under acute conditions, but it remains unclear whether or not the transposon transcripts have any functional importance. In theory, they can be transcribed from cryptic promoters, which are activated by the remodelling of chromatin. However, input from stress-responsive promoters is also plausible. Transcripts can be required for the control of transposition through RNAi; however, such an explanation appears unlikely for highly mutated genes that were probably silenced long ago in evolutionary time. Currently, there is a growing body of evidence to support the involvement of non-coding RNA in the regulation of gene expression at different levels. The role of small and large RNA in modification of the chromatin structure was reviewed recently [21][22][23]. Stress-induced transcription of short interspersed repeated sequences (SINE) was reported in human, mouse and silkworm [24][25][26][27], and SINE transcripts were shown to enhance translation of reporter genes [28,29]. Stress also activates the transcription of the satellite III repeat [30]. Because this large non-coding RNA is consistently associated with chromatin, it can be required for the protection of sensitive regions from stress-induced damage. Synthetic double-stranded RNA enhances the expression of anti-viral proteins in salmonid fish [31,32] and, in theory, endogenous dsRNA can mimic a viral infection by launching protective reactions. [33]), and involvement of dispersed repeated sequences in the co-ordination of expression of genes with similar functions was hypothesized more than three decades ago [34]. The role of transposon transcripts in the regulation of gene expression was recently discovered in yeast [35], where the induction of an RNAi-dependent silent chromatin configuration resulted in reduced transcription of several meiotic genes.
A possible involvement of transposon transcripts in the regulation of gene expression in salmonid fish remains to be studied.

Conclusion
Information produced by the sequencing of salmonid fish cDNA libraries and the identification of recently transmitted transposons provide new insights into the structure, diversity, molecular evolution and life cycle of mobile genetic elements. High expression levels in rainbow trout tissues and marked responses to external stimuli indicate potential functional roles of transposon pseudogenes, which require further investigation. These genes can be used as sensitive molecular biomarkers of acute conditions in salmonid fish.

Sequence analyses
The expressed transposons were analysed in the rainbow trout and Atlantic salmon TIGR Gene Indices, and sequence comparison was conducted with stand-alone BLAST [36]. Multiple sequence alignments were performed with ClustalW [37] and the conserved protein domains were searched in InterPro [38]. Synonymous and non-synonymous substitutions in newly cloned genes were determined with DnaSP [39]. Maximum likelihood (ML) phylogenetic analyses were performed with PHYLIP [40].

PCR cloning
The conserved sequences in the untranslated regions of rainbow trout Tc1-like transposases were inferred from EST sequences. RNA was extracted from rainbow trout brain and treated with RNase-free DNase (Promega). Reverse transcription with SuperScript III (Invitrogen) was primed with oligo(dT). PCR was performed with primer 5'-ATACAGTGCCTTGCGAGAGTATTC-3' using a TripleMaster kit (Eppendorf), and the product was cloned into pcDNA3.1/V5-His-TOPO (Invitrogen). Seven of nine sequenced clones contained complete reading frames.

PCR analyses of genomic DNA
The fish samples were collected from inland reservoirs in Finland, and DNA from fin clips was prepared with salt extraction [41]. In brief, fin samples were digested at 60°C in 440 µl of buffer (1.8 mM EDTA, 9 mM Tris-HCl, pH 8; 1.8% SDS) containing 160 µg of proteinase K. After addition of 300 µl of 6 M NaCl, lysates were centrifuged at 12,000 g for 30 min. DNA in the supernatants was precipitated with isopropanol, washed with 70% aqueous ethanol and dissolved in water. The 654-bp fragments of Glan were amplified by PCR using the HotMaster Taq kit (Eppendorf). Primers (5'-TGAAGAATCGACAACAAGTGGGACA-3' and 5'-GCTTTCTTCTTGCCACTCTTCCATA-3') were annealed to templates at 68°C.

Microarray analyses
Fish experiments, design of the rainbow trout cDNA microarray, hybridization protocol and data analyses are described in detail elsewhere [10,11]. In brief, the platform included 1,300 genes printed in 6 replicates each. The dye swap design was used; each sample containing RNA from 4 individuals was hybridized to slides with reverse assignment of fluorescent dyes (Cy3- and Cy5-dCTP from Amersham Pharmacia). Labels were incorporated at the stage of cDNA synthesis. The measurements in spots were filtered by the criteria I/B ≥ 3 and (I − B)/(S_I + S_B) ≥ 0.6, where I and B were the mean signal and background intensities, respectively, and S_I and S_B were the corresponding standard deviations. Lowess normalization was performed and differential expression was analysed with Student's t-test (p < 0.01). The genes were ranked by the log(p-level).
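The two spot-quality criteria stated above can be expressed as a single predicate. The sketch below assumes that the mean intensities and standard deviations have already been extracted per spot by the image-analysis software; the function and parameter names are illustrative, and the handling of non-positive background or zero spread is an added safeguard that is not described in the text.

```python
def spot_passes(i, b, s_i, s_b, min_ratio=3.0, min_separation=0.6):
    """Apply the two filters I/B >= 3 and (I - B)/(S_I + S_B) >= 0.6.

    i, b     : mean signal and mean background intensity of the spot
    s_i, s_b : standard deviations of signal and background
    """
    if b <= 0 or (s_i + s_b) <= 0:
        # Safeguard (assumption): such spots cannot be evaluated, so reject them.
        return False
    signal_to_background = i / b
    separation = (i - b) / (s_i + s_b)
    return signal_to_background >= min_ratio and separation >= min_separation


# Hypothetical spot measurements, for illustration only.
print(spot_passes(i=900, b=100, s_i=200, s_b=50))   # True: 9.0 and 3.2
print(spot_passes(i=250, b=100, s_i=40, s_b=20))    # False: ratio is 2.5
print(spot_passes(i=400, b=100, s_i=400, s_b=200))  # False: separation is 0.5
```

The first criterion removes weak spots, and the second removes spots whose signal is not clearly separated from a noisy background, so a spot must satisfy both to enter normalization.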