We originally set out to gain a better understanding of the nature and extent of genetic diversity in phytopathogenic Verticillium spp. To this end, we identified mobile elements in the sequenced V. dahliae and V. albo-atrum genomes and explored their distribution in other strains of different origins. Our genome-wide search yielded complete retroelements and “cut-and-paste” DNA transposons, whose structure we characterized in detail. Among the VdLs.17 LTR sequences that we identified, the Ty1/Copia-like VdLTRE5 has the GAG and POL ORFs in a very rare organization and uses a leaky stop codon for translation of the POL ORF, which lies downstream of the GAG ORF. VdLTRE5 did not have, downstream of the GAG stop codon, the conserved CARYYA sequence, which has previously been shown to be important for this stop codon read-through. However, the Ty1/Copia-like element Tca2 of Candida albicans also lacks this sequence, and it has been proposed that the sequences responsible for stop codon read-through of Tca2 can be multiple, remote, and scattered throughout the element. The same may therefore be true for VdLTRE5.
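The check for a CARYYA consensus downstream of a stop codon can be illustrated with a minimal sketch. It expands the IUPAC codes (R = A/G, Y = C/T) into a regular expression and scans a window after the stop codon. The function names, the 30-nt window, and the toy sequence are illustrative assumptions, not the procedure used in this study.

```python
import re

# IUPAC codes needed for the CARYYA consensus: R = A or G, Y = C or T.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]"}


def iupac_to_regex(motif):
    """Expand an IUPAC nucleotide motif into a regular expression string."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif_downstream(seq, stop_end, motif="CARYYA", window=30):
    """Return offsets (relative to stop_end) of motif matches found within
    `window` nucleotides downstream of a stop codon.

    `stop_end` is the 0-based index of the first nucleotide after the stop
    codon. The window size is an arbitrary assumption for illustration.
    """
    region = seq.upper()[stop_end:stop_end + window]
    # A lookahead reports overlapping matches as well.
    pattern = re.compile("(?=" + iupac_to_regex(motif) + ")")
    return [m.start() for m in pattern.finditer(region)]


# Toy example (hypothetical sequence): a TAG stop codon followed by CAATTA.
toy = "ATGAAATAG" + "CAATTA" + "GGG"
hits = find_motif_downstream(toy, stop_end=9)
```

An empty list for a real element would only indicate that the consensus is absent from the scanned window; as noted above for Tca2, read-through signals may lie elsewhere in the element.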
The VdLs.17 retrotransposons VdLTRE 2, 3, and 4 were identifiable as Gypsy elements because they have GAG and POL genes with the POL ORF in a −1 frameshift relative to the GAG ORF. A notable feature differentiating these three TE families, which are predicted to encode almost identical polyproteins, was the difference in their LTR lengths. LTR elements contain critical cis-acting signals that define the element borders and act as transcriptional promoters, and we indeed found up to 5-fold differences in the number of ESTs corresponding to each type of element.
Among the “cut-and-paste” TEs in VdLs.17, the Tc1/mariner superfamily, which derives its name from the founder transposons Tc1 of Caenorhabditis elegans and mariner of Drosophila mauritiana, was predominant. All 29 full-length DAHLIAE elements were approximately 2 kb in size and comprised single intronless DDE_1-encoding transposases flanked by short (41–102 bp) TIRs. The DDE_1 transposases generally act as dimers or oligomers and harbor functional domains mediating protein-protein interaction and DNA-binding, -cleavage and -joining activities. We predicted the presence of two types of N-terminal DNA-binding domains, HTH_psq and HTH_Tnp_Tc5, which are involved in the recognition of and interaction with the TIRs. The DDE_1 domain, first identified in bacterial transposases and retroviral integrases, is characterized by the conservation of either three aspartate (D) residues or two non-contiguous aspartate residues and one glutamate (E) residue, a catalytic triad that forms a pocket able to bind two divalent metal ions, most likely Mg2+, that are necessary for transposition. The DDE signature has been detected in all eukaryotic “cut-and-paste” transposase superfamilies, indicating their common evolutionary origin. In the DAHLIAE transposases, the three aspartate residues are separated by 110–112 and 35 amino acids, respectively (D110-112D35D).
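The D110-112D35D spacing can be expressed as a simple scan over a protein sequence. The sketch below assumes that "separated by N amino acids" means N residues lie between consecutive aspartates; the function name and the toy sequence are hypothetical. A spacing match alone is weak evidence, since such a loose pattern can occur by chance, so it would normally be combined with a domain search.

```python
def find_ddd_triads(protein, gap1_range=(110, 112), gap2=35):
    """Return (i, j, k) index triples of aspartates (D) such that
    gap1_range residues lie between D_i and D_j and exactly gap2
    residues lie between D_j and D_k. Indices are 0-based.

    Assumption: the spacing counts intervening residues only.
    """
    protein = protein.upper()
    hits = []
    for i, residue in enumerate(protein):
        if residue != "D":
            continue
        for gap1 in range(gap1_range[0], gap1_range[1] + 1):
            j = i + gap1 + 1          # position of the second aspartate
            k = j + gap2 + 1          # position of the third aspartate
            if k < len(protein) and protein[j] == "D" and protein[k] == "D":
                hits.append((i, j, k))
    return hits


# Toy protein (hypothetical): alanine filler with aspartates at the
# expected spacing (111 and 35 intervening residues).
toy_protein = "A" * 5 + "D" + "A" * 111 + "D" + "A" * 35 + "D" + "A" * 5
triads = find_ddd_triads(toy_protein)
```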
In vitro and in vivo trans-kingdom assays have previously demonstrated that, while host factors are not needed for transposition, intact transposase and TIRs are required for the initiation and completion of the process [53–56]. Tc1/mariner family transposon termini comprise at least three types of functional sequences involved in transposition: the 4–7 nucleotide TE cleavage sites at the outer extremities of the TIRs; the DRs within the TIRs, which are the transposase binding sites; and the UTRs, between the TIRs and ORFs, which act as enhancers of transposition efficiency [49, 57–59]. Transposons of different families generally differ in their terminus structure and length, as well as in the transposase domain architecture. Each terminus/transposase combination appears to mediate a slightly different version of the “cut-and-paste” mobilization mechanism, ensuring transposition specificity. In VdLs.17, DAHLIAE3 starts with the unique cleavage motif CCCG and does not possess recognizable repeated sequences in the TIRs like those of Tc1/mariner TEs of other ascomycete fungi, whereas DAHLIAE 1 and 2 start with the sequence ACGT- and their TIRs contain two or three DRs. In particular, DAHLIAE2 carries three internal repetitions of a 17-bp sequence at the 5’ terminus and two at the 3’ terminus. The DAHLIAE ORFs do not overlap with the TIR sequences and are flanked by asymmetric UTRs that vary in length from 33 to 125 bp, depending on the TE family. The VdLs.17 Activator and Mutator elements are either highly degenerate (VdHATs) or show limited sequence similarity (VdMULEs). Most of the elements appeared to be of the non-autonomous type due to mutations that disrupted TIR sequences and/or resulted in incomplete transposases. Although these elements are probably fossils that are unable to transpose, we were able to identify corresponding ESTs for some of them.
These sequences may therefore still play an important role in repressing transposition of the complete elements of the same family, either through transposase dilution or through dominant-negative repression by the truncated transposases.
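The terminal inverted repeats described above have a simple sequence definition: the 3’ end of the element is the reverse complement of its 5’ end. The following sketch measures the length of a perfect TIR on a toy element. Real TIRs are often imperfect, so a practical tool would tolerate mismatches; the function names and the example sequence are assumptions for illustration only.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def perfect_tir_length(element):
    """Return the number of leading bases that match the reverse
    complement of the element's 3' end without any mismatch.

    The scan stops at the first mismatch or at half the element length.
    """
    element = element.upper()
    rc = reverse_complement(element)
    limit = len(element) // 2
    n = 0
    while n < limit and element[n] == rc[n]:
        n += 1
    return n


# Toy element (hypothetical): a 9-bp terminus starting with ACGT, an
# unrelated internal region, and the matching inverted terminus.
left = "ACGTACCGA"
toy_element = left + "GGAAGG" + reverse_complement(left)
tir_len = perfect_tir_length(toy_element)
```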
Domestication is the process by which TE functional domains are incorporated into functional host proteins. In VdLs.17 we found an insertion of a fragment of a Mutator element within a nitrilase gene. The fused sequence is predicted to generate a protein carrying in-frame nitrilase and MULE domains. Although we did not find corresponding ESTs, we cannot rule out the possibility that this new protein is functional.
The EST data and expression analysis under heat stress further showed that several of the full-length Class I and II TEs, which are predicted to code for complete transposition-mediating enzymes, are still transcriptionally active and respond differentially to different stimuli.
In general, our phylogenetic analysis of both Class I and II TEs mirrored the relationships among the fungal species that were defined by the fungal genome initiative on the basis of whole-genome comparative studies. The evolutionary ancestors of the Copia, DAHLIAE, VdHAT and VdMULE TEs apparently evolved into different groups before insertion into the VdLs.17 genome. While high bootstrap values supported the monophyletic evolution of the V. dahliae Tc1/mariner and Mule elements, the Copia VdLTREs 1 and 5 and the VdHATs fell into distinct lineages. The three Gypsy families diversified after introgression into the VdLs.17 genome. In addition, the Tc1/mariner family DAHLIAE1 underwent a recent expansion in VdLs.17, generating five VdLs.17-specific subfamilies. These subfamilies differed in the sequence and length of their termini and in their ORF sequences. DAHLIAE1 a, b and d all putatively coded for intact transposases and were present in multiple, almost identical copies, comprising 74% of the total DNA TEs in VdLs.17. In fungi, the selective amplification of transposon subfamilies, such as we have detected for the DAHLIAE1 elements, has been associated with events of horizontal gene transfer; however, no definitive mechanism has yet been proven.
It has been proposed that TEs may enhance recombination to cause genetic variation, giving populations the flexibility to adapt, a phenomenon which could be especially important for species that do not have a sexual phase. TE clustering such as we observed in the VdLs.17 LS regions has been found in the genomes of other phytopathogenic fungi. In M. oryzae, for example, both Class I and II transposable elements are clustered within three regions of chromosome seven, characterized by a high rate of duplication and evolution. Similarly, TEs were also found to cluster in regions undergoing rapid reorganization in the genome of the plant vascular pathogen F. oxysporum. It has been hypothesized that these types of clusters are important for the generation and evolution of new genes, and it has been proposed that the transposon-rich LS regions in V. dahliae may impart a degree of genetic flexibility in the species and allow rapid adaptation to new host niches. The mechanism(s) by which TE clusters are generated in fungal genomes has not yet been elucidated. While the clustering could simply be the passive result of selection against TE incorporation into gene-rich zones of the genome, it could alternatively result from an active process related to TE function. Many TEs do selectively integrate into specific sequences [62, 63], which could lead to biased TE distribution. Such selectivity has, for example, been observed in S. cerevisiae, in which Ty3 LTR elements are most often found integrated into upstream regions of genes transcribed by RNA polymerase III. The parallel, non-random clustering of the Class I and II V. dahliae elements suggests that there may be synergistic interactions among different types of elements, and selective pressure in V. dahliae for TE clustering, which in turn may be important for generating the genomic diversity necessary for niche adaptation and host range expansion.
Interestingly, among the 354 predicted genes encoded within the LS regions, there were no “housekeeping” genes. Rather, the predicted genes encoded proteins of potential importance in pathogenicity, including bZIP transcription factors, ferric reductases, phospholipases, and other genes predicted to play a role in response to stress. Moreover, clusters or pairs of genes were clearly duplicated in these LS regions (and this study). It is unclear whether such duplications are the direct result of TE activity, and more studies are required to investigate the role(s) of these putative pathogenicity factors. However, the presence of solo-LTR sequences does suggest that recombination between the repeated sequences of the LTR elements could be a contributing factor in the reorganization of the Verticillium dahliae genome.
Since TEs can have a large effect on the genomes of their hosts, causing gene deletion and duplication as well as chromosomal rearrangements, host fitness could be adversely affected if TE-induced transposition and recombination events disrupted or altered the function of essential genes. However, some filamentous fungi have a unique tool, known as repeat-induced point mutation (RIP), to deal with repetitive sequences such as TEs. RIP, a process first described in Neurospora crassa, was found to occur during the sexual cycle and to subject duplicated sequences of >400 base pairs to irreversible C:G to T:A transition mutations. Although RIP is known to occur only during the fungal sexual cycle, there are several examples in asexual fungi where TEs with RIP-like mutations have been identified [23, 65, 66]. The identification of RIP-like mutations in some VdLTRE2/3/4 sequences indicated that, at some point in the evolution of Verticillium spp., a RIP-like process operated to protect the genome from infiltration by TEs. The presence in V. dahliae and V. albo-atrum of identifiable orthologs of the RID protein, which is known to be part of the RIP machinery in N. crassa, suggests that these fungi may still possess the capacity for RIP. Interestingly, in VdLs.17, RIP mutations affected members of the Gypsy but not the Copia superfamily of retrotransposons, indicating either a differential susceptibility of different types of TEs to RIP or introgression by horizontal transfer of already RIPed sequences.
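Because RIP converts C:G to T:A pairs, RIP-affected sequences show a characteristic skew in dinucleotide frequencies. One commonly used way to quantify this is a pair of dinucleotide ratios, TpA/ApT and (CpA + TpG)/(ApC + GpT). The sketch below computes both for a sequence. It is an illustration under stated assumptions: this section does not specify which indices or thresholds were applied, and the function names are hypothetical. In the literature, TpA/ApT values above roughly 0.89 and (CpA + TpG)/(ApC + GpT) values below roughly 1.03 are often taken as suggestive of RIP; treat these cut-offs as assumptions rather than results of this study.

```python
def count_dinucleotide(seq, dinuc):
    """Count overlapping occurrences of a dinucleotide in seq."""
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == dinuc)


def rip_indices(seq):
    """Return (TpA/ApT, (CpA + TpG)/(ApC + GpT)) for a DNA sequence.

    An index is None when its denominator is zero.
    """
    seq = seq.upper()
    counts = {d: count_dinucleotide(seq, d)
              for d in ("TA", "AT", "CA", "TG", "AC", "GT")}
    product = counts["TA"] / counts["AT"] if counts["AT"] else None
    denominator = counts["AC"] + counts["GT"]
    substrate = ((counts["CA"] + counts["TG"]) / denominator
                 if denominator else None)
    return product, substrate


# Toy sequence (hypothetical), far too short for a meaningful estimate;
# real analyses use full-length element copies or sliding windows.
product_index, substrate_index = rip_indices("TATAATACGT")
```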
The TE content of different organisms is variable, with TEs sometimes accounting for as much as 60–80% of the genome, as in the case of cereals. More recently, mobile elements have also been found to account for about half of the genome in the phytopathogenic oomycete Phytophthora infestans and the truffle fungus Tuber melanosporum. Notably, a high degree of variability in the number and types of transposons has been documented even in closely related species. For instance, the TE content of the fungal plant pathogens F. oxysporum and F. graminearum differs by roughly two orders of magnitude (4% and 0.03%, respectively). The observed spatial and temporal fluctuations in TE content appear to be TE-family dependent and governed by multiple mechanisms, including elimination by ectopic recombination, extinction by genetic drift, reintroduction by horizontal transfer, and environmental stress-driven expansion (reviewed by Hua-Van et al.). Despite the high level of identity (97%) and synteny identified between the genomes of the recently sequenced Vd and Va isolates, the V. albo-atrum genome assembly was distinct in containing far less repetitive DNA than that of V. dahliae. In particular, VaMs.102 lacked the highly repetitive LS regions present in VdLs.17, and contained neither full-length nor defective VdLTRE5 Copia or “cut-and-paste” DNA elements. This absence seems simply to reflect the particular isolate of V. albo-atrum that was sequenced rather than a general feature of the species. In fact, 66% of the other Va isolates we surveyed for the presence of VdLs.17 TE-like sequences were positive for the Tc1/mariner DAHLIAE2.
Extensive studies conducted on natural populations of other organisms, such as Drosophila and plant species including Arabidopsis, barley, maize, rice and wheat, have clearly demonstrated that transposon dynamics also play a central role in generating intraspecific variability [69–73]. Our findings have shown that the Copia-like VdLTRE1 and the Gypsy-like retrotransposons are almost ubiquitous in V. dahliae, whereas the Copia-like VdLTRE5 and most of the “cut-and-paste” DNA TEs appear to have a much more limited distribution. Although lack of detection of VdLs.17-like TEs could indeed reflect total absence of mobile elements, those Verticillium genomes without such elements may either contain related sequences that have diverged to a degree that prevented detection under the conditions used in this study (see Materials and Methods), or may simply harbor their own distinct arrays of transposable elements.