Identification and characterization of repetitive extragenic palindromes (REP)-associated tyrosine transposases: implications for REP evolution and dynamics in bacterial genomes

Background Bacterial repetitive extragenic palindromes (REPs) constitute a distinct group of genomic repeats. They usually occur in high abundance (>100 copies/genome) and are often arranged in composite repetitive structures - bacterial interspersed mosaic elements (BIMEs). In BIMEs, regularly spaced REPs are present in alternating orientations. BIMEs and REPs have been shown to serve as binding sites for several proteins and have been suggested to play a role in chromosome organization and transcription termination. Their origins are, at present, unknown. Results In this report, we describe a novel class of putative transposases related to the IS200/IS605 transposase family and we demonstrate that they are obligately associated with bacterial REPs. Open reading frames coding for these REP-associated tyrosine transposases (RAYTs) are always flanked by two REPs in inverted orientation and thus constitute a unit reminiscent of typical transposable elements. Besides conserved residues involved in catalysis of DNA cleavage, RAYTs carry characteristic structural motifs that are absent in typical IS200/IS605 transposases. DNA sequences flanking rayt genes are, in one third of examined cases, arranged in modular BIMEs. RAYTs and their flanking REPs apparently coevolve with each other. The rayt genes themselves are subject to rapid evolution, with a substitution rate substantially exceeding that of neighboring genes. A strong correlation was found between the presence of a particular rayt in a genome and the abundance of its cognate REPs. Conclusions In light of our findings, we propose that RAYTs are responsible for the establishment of REPs and BIMEs in bacterial genomes, as well as for their exceptional dynamics and species-specificity. Conversely, we suggest that BIMEs are in fact a special type of nonautonomous transposable elements, mobilizable by RAYTs.


Background
Transposable elements (TEs), or transposons, are a large group of mobile genetic elements with the ability to actively transfer themselves into new locations in their host's DNA. This process, called transposition, is catalyzed by transposases, which are encoded by the TEs themselves. Insertion sequences (ISs) are the simplest examples of TEs.
The IS200/IS605 family of transposable elements was first described in the genus Salmonella [1] and subsequently in many other bacterial and archaeal genomes [2]. In contrast to the majority of TEs, which transpose using transposases whose active site is composed of a triad of acidic residues (DDE transposases), known members of the IS200/IS605 family lack terminal inverted repeats and do not generate larger target site duplications upon transposition [3]. Crystal structures of two IS200/IS605 transposases have been solved (PDB IDs: 2a6o and 2f4f) [4,5]. Their fold is remarkably similar to that of proteins involved in rolling circle (RC) replication, namely conjugative plasmid relaxases and viral Rep proteins [4,5]. This similarity is further supported by a shared mechanism of DNA cleavage: a transesterification reaction takes place between the DNA strand and a conserved tyrosine residue, resulting in a covalent protein-DNA intermediate. A histidine-hydrophobic-histidine motif and a divalent metal (magnesium) cation are further mandatory components of a properly assembled active site, aiding the nucleophilic attack of the catalytic tyrosine [6,7]. Another trait common to both IS200/IS605 transposases and RC enzymes is that the cleavage of DNA depends on the recognition of stem-loop structures, present at either the origin of RC replication or the IS termini [6,7]. IS200/IS605 transposases are the smallest transposases known, with an average length below 150 amino acids. To overcome this size limitation, they work as a homodimer with two hybrid active sites, each composed of the tyrosine from one subunit and the histidine-hydrophobic-histidine motif from the other subunit [4,5].
As the determination of eukaryotic genomic sequences progressed in the last two decades, it has become obvious that their genetic information is littered with highly repetitive, "junk" DNA. More detailed analyses of these repetitive elements revealed that many of them are actually special cases of TEs. They generally retain the conserved terminal sequences (for example inverted repeats) of their corresponding full-length transposons, which are important for transposition initiation, but lack the transposase gene completely or partially. Therefore, the transposase encoded by the "parental" full-length transposons needs to be supplied in trans. These repetitive elements are thus called nonautonomous TEs. Three groups of nonautonomous TEs account for substantial fractions of eukaryotic genomes. The first group is represented by short interspersed nuclear elements (Alu-like), derived from non-LTR-retrotransposons [8]. Helitrons, the second type of nonautonomous TEs, are thought to be mobilized by Y-2 type transposases, which are homologous to RC replication relaxases [9]. The last type, miniature inverted repeat transposable elements (MITEs), is present in both eukaryotes and prokaryotes. The most studied MITEs are related to two homologous insertion sequence families, IS630 (prokaryotic) and Tc-Mariner (eukaryotic) [10], both employing a DDE catalytic mechanism. IS630-derived MITEs in prokaryotic genomes include Correia elements in Neisseria species [11] and RUP elements in Streptococcus pneumoniae [12]. Besides these, MITEs related to other IS families have been identified in prokaryotes [2].
Repetitive extragenic palindromic sequences (REPs) were originally identified in enteric bacteria [13] and later in several other bacterial taxa [14][15][16] as a class of abundant repeats with a characteristic architecture. REP elements contain an imperfect palindrome in their sequence. The majority of REPs are arranged in repeats of higher order, bacterial interspersed mosaic elements (BIMEs) [17]. In BIME-1, two oppositely orientated REPs are located close to each other. The inter-REP sequence interacts with integration host factor (IHF) [18]. BIME-2 and atypical BIMEs are composed of several tandemly repeated BIME-1-like units [19] and have been shown to strongly bind DNA gyrase [20]. REPs themselves interact with DNA polymerase I [21] and facilitate Rho-dependent transcription termination [22].
Our present results describe an intimate relationship between REP and BIME elements and one apparently monophyletic group of IS200/IS605 transposases. Because of striking similarities to known nonautonomous TEs, we propose that BIMEs are in fact nonautonomous TEs and that IS200/IS605 transposases are responsible for their mobilization.

Case study - genus Stenotrophomonas
We have studied mechanisms of high-level tetracycline resistance in bacteria from agricultural soil treated with manure from tetracycline-fed animals. Among tetracycline-resistant isolates, identified as Stenotrophomonas maltophilia, Variovorax paradoxus and Chryseobacterium balustinum, horizontal gene transfer from S. maltophilia to the other two species was detected. The transferred nucleotide sequence was 90% identical to a histidine kinase/response regulator/sodium-symporter family gene, present in both sequenced S. maltophilia strains. We investigated the region surrounding this gene in sequenced stenotrophomonads for the presence of genes known to be involved in horizontal transfer of genetic information. A putative transposase of the IS200/IS605 family was found one gene away from the histidine kinase gene in S. maltophilia R551-3. Analysis of sequences flanking the transposase gene revealed inverted repeats containing an imperfect palindrome. Further sequences identical to these inverted repeats were observed scattered in several instances between neighboring genes (Figure 1A).
We performed a BLAST search that revealed five apparent homologs of this transposase in genomes of sequenced stenotrophomonads. Their genes were all found to be delimited by inverted repeats of the same architecture (Figure 1B). The 5´-GT(A/G)G "head" is immediately followed by a perfectly complementary, GC-rich palindrome, interrupted by 2-4 bases in its middle (Table 1, bottom). Because multiple copies of these repeated sequences are present in the proximity of the transposase gene (see above), we scanned whole Stenotrophomonas genomes for additional copies of the repeats flanking each particular transposase homolog. The number of hits ranged from 37 to 427 perfect copies of a given repeat per genome (Table 1, bottom). Because of their palindromic nature and abundance, features they share with published REP sequences, they will be called REPs and their cognate transposases will be called REP-associated tyrosine transposases (RAYTs).
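The REP architecture and the genome-wide count of perfect copies described above can be illustrated with a short sketch. This is not the procedure used in the study; the function names, the minimum arm length and the toy sequences are assumptions made for illustration only, and the genome is treated as a linear, upper-case string.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def split_rep(seq, min_arm=4, loop_range=(2, 4)):
    """Try to read seq as head + left arm + spacer + right arm.

    Returns (head, left, spacer, right) when seq starts with GT(A/G)G and
    the remainder is a palindrome with perfectly complementary arms that is
    interrupted by 2-4 bases; otherwise returns None.
    min_arm is an illustrative threshold, not a value from the paper.
    """
    if not re.match(r"GT[AG]G", seq):
        return None
    head, body = seq[:4], seq[4:]
    for loop in range(loop_range[0], loop_range[1] + 1):
        arm, remainder = divmod(len(body) - loop, 2)
        if remainder or arm < min_arm:
            continue
        left = body[:arm]
        spacer = body[arm:arm + loop]
        right = body[arm + loop:]
        if reverse_complement(left) == right:
            return head, left, spacer, right
    return None


def count_exact_copies(genome, rep):
    """Count exact copies of rep on both strands (overlaps allowed)."""
    total = 0
    # A set avoids double counting if rep equals its own reverse complement.
    for pattern in {rep, reverse_complement(rep)}:
        total += len(re.findall(f"(?={re.escape(pattern)})", genome))
    return total


# Toy example (hypothetical sequences, not taken from Table 1).
toy_rep = "GTAG" + "GCCGGA" + "TTA" + "TCCGGC"
toy_genome = "AAAA" + toy_rep + "CCCC" + reverse_complement(toy_rep) + "TTTT"
parts = split_rep(toy_rep)
copies = count_exact_copies(toy_genome, toy_rep)
```

Matches spanning the origin of a circular chromosome are not counted in this sketch; appending the first few bases of the genome to its end would be one way to include them.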
We noticed that some of the REPs identified were arranged in clusters. Ten clusters composed of REPs were then analysed in detail (Figure 2). The core (basic module) of each of these compound structures consists of at least two inverted REPs, separated by two intervening segments. Several of these basic modules are connected to each other in a head-to-tail fashion. The inter-REP segments do not show any homology with each other and vary substantially in length, suggesting that these clusters arose repeatedly and independently. Because of their exceptional structural similarities with published BIMEs, they will be called BIMEs.
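As a rough illustration of how such clusters might be picked out of a list of REP hits, the following sketch groups hits by distance and then checks whether neighbouring REPs alternate in orientation. The distance threshold `max_gap`, the function names and the toy coordinates are assumptions made for this example; the text does not specify a clustering criterion.

```python
def cluster_reps(hits, max_gap=200):
    """Group REP hits into clusters of neighbouring elements.

    hits: iterable of (start, strand) tuples, strand being '+' or '-'.
    max_gap: largest start-to-start distance (bp) still joining two hits;
             an illustrative value, not one taken from the paper.
    """
    clusters = []
    for hit in sorted(hits):
        if clusters and hit[0] - clusters[-1][-1][0] <= max_gap:
            clusters[-1].append(hit)
        else:
            clusters.append([hit])
    return clusters


def is_bime_like(cluster):
    """True if the cluster has at least two REPs in alternating orientation."""
    strands = [strand for _, strand in cluster]
    return len(strands) >= 2 and all(
        a != b for a, b in zip(strands, strands[1:])
    )


# Toy coordinates (hypothetical).
toy_hits = [(100, "+"), (160, "-"), (230, "+"),
            (5000, "+"), (9000, "-"), (9050, "-")]
toy_clusters = cluster_reps(toy_hits)
bime_flags = [is_bime_like(c) for c in toy_clusters]
```

In this toy input only the first cluster qualifies: the lone hit has no partner, and the last two hits share the same orientation.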
Stenotrophomonas BIMEs show several interesting aspects. Some of them are hybrid and contain REPs from two different RAYTs. Moreover, slightly modified REPs occur in BIMEs, differing only in a few nucleotide positions. Still, in all cases, the palindromic features of REPs are preserved, suggesting selection for complementary mutations. Intriguingly, one rayt gene (Smal4) is directly associated with a BIME, its downstream REP being one of the BIME-constituting REPs.
Since all six rayt genes are flanked by two inverted REPs, this type of organization is likely to be subject to evolutionary preservation. To estimate evolutionary relationship between these elements, phylogenetic trees were constructed from RAYT amino acid sequences and REP nucleotide sequences, respectively. Both phylograms display the same topology (Figure 3), suggesting that RAYTs coevolve with their cognate REPs and that their typical organization is ancestral.

RAYTs in other bacteria
We wondered if similar RAYTs, REPs and BIMEs also occur together in other bacterial taxa. Using the Smal1 RAYT sequence as a query, an exhaustive BLAST search was performed to identify RAYT homologs in other prokaryotic organisms. Retrieved homologs, all of which contained the "Pfam01797: Transposase_17" domain (peculiar to IS200/IS605 transposases), were tested for the presence of palindrome-containing inverted repeats flanking their genes. Subsequently, the number of these putative REPs in host genomes was determined. Only RAYTs associated with abundant REPs were further analysed. Detected RAYTs are listed in Table 1. RAYT homologs meeting our criteria were only found in gammaproteobacteria.
All detected REPs consist of a GT(A/G)G head and a GC-rich imperfect palindrome with the potential to form stem-loop structures in the single-stranded state (Table 1). Importantly, in all cases where REP sequences had been determined prior to this work in bacterial species included in our analysis, the REPs identified by our approach are in agreement with these sequences. This concerns Escherichia coli [19], Salmonella sp. [23], Pseudomonas putida Pput2 [16] and Stenotrophomonas maltophilia Smal4 [24] REPs. For example, the E. coli RAYT-coding gene (yafM) is delimited by two different REPs (Table 1). These are in fact Y and Z2 palindromic units, constituents of modular BIMEs (BIME-2 and atypical BIMEs) [25]. The E. coli rayt itself is flanked by BIME-2 on both sides. A similar direct association with a BIME was observed in total for one third of detected RAYTs (Table 1) in various species.
Further, we examined distribution of identified REPs in host genomes. Analysis revealed that most REPs are arranged in clusters (Additional file 1). In some cases

Evolution of RAYTs and REPs
Since REPs share several common structural features, they are likely to represent a group of related elements. We wondered if the same is true for RAYTs. Because RAYTs were detected on the basis of similarity of their protein sequences (see above), they are thought to be structurally related. To specify this relationship, an alignment of selected RAYTs together with a reference set of "typical" IS200/IS605 transposases was constructed (Figure 4). The alignment reveals that all residues with a confirmed catalytic role (the histidine-hydrophobic-histidine motif and the nucleophilic tyrosine) are conserved in both groups. It is thus reasonable to conclude that RAYTs are capable of cleaving DNA with formation of a covalent DNA-RAYT intermediate. In contrast, several motifs and conserved residues are peculiar to RAYTs. This is true in particular for the 100% conserved threonine near the N-terminus and the NP(L/V)(R/K)xG motif that is located close to the C-terminus, adjacent to the nucleophilic tyrosine. The presence of these unique structural features could signify that RAYTs are a monophyletic group of proteins.
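The two sequence signatures named above can be written as simple patterns, which may help when screening candidate proteins. The sketch below is illustrative only: the set of residues treated as "hydrophobic" in the histidine-hydrophobic-histidine motif is an assumption, the function name is hypothetical, and the toy peptide is invented rather than taken from Figure 4.

```python
import re

# NP(L/V)(R/K)xG, where x is any residue.
RAYT_C_MOTIF = re.compile(r"NP[LV][RK].G")
# Histidine-hydrophobic-histidine; the hydrophobic set is an assumption.
HUH_MOTIF = re.compile(r"H[AVILMFWY]H")


def find_motifs(protein):
    """Return 0-based start positions of non-overlapping motif matches."""
    return {
        "HUH": [m.start() for m in HUH_MOTIF.finditer(protein)],
        "NP(L/V)(R/K)xG": [m.start() for m in RAYT_C_MOTIF.finditer(protein)],
    }


# Invented toy peptide carrying one copy of each motif.
toy_protein = "MTAAHVHLLNPLRAGY"
toy_motifs = find_motifs(toy_protein)
```

A pattern match alone says nothing about position within the protein or about the surrounding fold, so hits of this kind would only be a first filter before inspecting an alignment.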
The question therefore arises as to whether the entire RAYT clade has been evolving together with its corresponding REPs, as seen in Stenotrophomonas (Figure 3). Due to the rather high divergence of REPs, it is not possible to construct an accurate phylogram for them. However, REPs show group-specific features that correlate well with the phylogenetic grouping of their cognate RAYTs. For example, enterobacterial RAYTs are clearly monophyletic (Additional file 2) and accordingly, their REPs are rather long, substantially dimorphic and their palindrome is interrupted twice (Table 1). Furthermore, uniquely for the REPs of the monophyletic Pseudomonas and Xanthomonas RAYTs (Additional file 2), a 5´-GA-3´ dinucleotide is inserted between their GT(A/G)G head and the palindrome-forming part (Table 1). Together, these observations support long-term coevolution of RAYTs and their cognate REPs.
Next, we examined the chromosomal localization of rayt genes. Among the RAYTs listed in Table 1, three pairs of orthologous rayt genes (Pput1 and Pput2, Pput3 and Pput4, Smal3 and S_sp2), located in the same genomic context in different host species or strains, were identified (Figure 5). Owing to the shared synteny, these orthologs have unambiguously evolved from a common ancestor and allow us to trace back the changes they have gone through since the divergence event. Although orthologous rayt genes do not change their genomic position, their flanking REPs differ by up to three point mutations (Table 1) and still retain palindromicity and the inverted repeat arrangement. Evidently, strong selective pressure works to preserve these REP traits, underlining their functional importance. It is extremely improbable that the repeated changes in REP sequences flanking these orthologs result merely from random fixation of successive random mutations.
Comparison of sequence identity between orthologous rayt genes revealed an interesting phenomenon. In all three cases, the degree of identity of the RAYT amino acid sequences was significantly lower than that of the products of the flanking genes (Figure 5), suggesting that RAYTs evolve faster than the protein products of common genes. A possible explanation for this accelerated evolution is given in the Discussion section.
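For readers who wish to reproduce this kind of comparison, a minimal sketch of a percent-identity calculation on two pre-aligned sequences follows. How identity was computed for Figure 5 is not stated in the text; counting only gap-free alignment columns in the denominator is an assumption of this example, as are the function name and the toy sequences.

```python
def percent_identity(aln_a, aln_b):
    """Percent identity of two aligned sequences of equal length.

    Columns containing a gap ('-') in either sequence are ignored, so the
    denominator is the number of gap-free columns (an assumed convention).
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aln_a, aln_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# Invented toy alignment: five gap-free columns, four of them identical.
toy_identity = percent_identity("MKT-LV", "MRTALV")
```

Applying the same function to an orthologous RAYT pair and to the products of their flanking genes would give directly comparable values, provided all alignments are produced with the same method.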

Relationship of RAYTs, REPs and BIMEs
We have shown so far that RAYTs and REPs are evolutionarily and physically connected. Since REPs are known to be species (or strain)-specific and the same applies to RAYTs (Table 1), it is possible that the presence of a particular RAYT itself in one bacterium might be responsible for proliferation of corresponding REPs.
Where genome sequences suitable for comparison were available, strains differing by the presence or absence of a particular rayt gene were tested for the prevalence of REPs in their genomes. In most cases, a strong correlation between rayt presence and the total number of its cognate REPs was found (Table 2), with rayt-bearing strains containing on average ten times more REPs in their genomes than strains devoid of rayt genes. These results indeed suggest that the presence of a given RAYT is the direct cause of the proliferation of REP sequences across the host chromosome.
In search of support for this hypothesis, we found that in three marine gammaproteobacteria and one betaproteobacterium (all possessing clear RAYT homologs), the distribution of inverted palindromic repeats flanking their rayt genes is not genome-wide (as in other REP cases). Instead, REPs are accumulated proximally to a particular rayt gene (Additional file 3). The REP-containing regions span at most two hundred kilobases. In the case of the marine gammaproteobacteria, the physical association between rayt genes and REPs is very pronounced. Thauera sp. (a betaproteobacterium) is of special interest because it has obviously acquired its RAYT by horizontal transfer from gammaproteobacteria. This RAYT displays the highest sequence similarity to Pseudomonas RAYTs (56% identical residues), has no counterpart in other betaproteobacteria and its REP sequences are also Pseudomonas-like (Table 1, Additional file 3). High numbers of REPs are present in the Thauera genome. More than a third are located proximally to the rayt gene. This suggests that, following acquisition of the rayt gene, new REP copies have been preferentially produced in its vicinity.
Physical association with rayt genes was already shown for BIMEs (Table 1). Upon closer examination, we detected four cases where the 3´ end of the rayt gene, together with the sequence between the rayt stop codon and the downstream REP, is integrated into a BIME, becoming a part of the BIME's inter-REP segment (Additional file 4). This unexpected observation proves that the mechanism responsible for establishment of BIMEs is also directed to rayt genes.
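The notion of REPs lying "proximally" to a rayt gene can be made concrete by measuring distances on a circular chromosome. The sketch below is one possible way to do so; the window size, the function names and the toy coordinates are assumptions for illustration and do not reproduce the analysis behind Additional file 3.

```python
def circular_distance(a, b, genome_length):
    """Shortest distance between two positions on a circular chromosome."""
    d = abs(a - b) % genome_length
    return min(d, genome_length - d)


def fraction_near(rep_positions, rayt_position, genome_length, window=200_000):
    """Fraction of REPs lying within `window` bp of the rayt gene.

    window is an illustrative cut-off chosen for this sketch.
    """
    if not rep_positions:
        return 0.0
    near = sum(
        circular_distance(p, rayt_position, genome_length) <= window
        for p in rep_positions
    )
    return near / len(rep_positions)


# Toy coordinates (hypothetical): a 1 Mb chromosome with rayt at 950 kb.
toy_fraction = fraction_near(
    rep_positions=[10_000, 900_000, 500_000, 990_000],
    rayt_position=950_000,
    genome_length=1_000_000,
    window=100_000,
)
```

Note that the REP at position 10,000 counts as near the rayt gene because the distance is measured across the origin of the circular chromosome.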

Discussion
We have characterized a novel class of transposases, closely related to IS200/IS605 family. What makes these transposases (RAYTs) unique is the obligate delimitation of their genes by two inverted palindromic sequences (REPs), which are at the same time highly overrepresented in host genomes. We have shown that this type of organization (REP-rayt-REP, Figure 1B) has been preserved during evolution and that both RAYTs and REPs undergo long-term coevolution. Characteristic structural elements in both RAYT and REP sequences suggest that all detected RAYTs and REPs are descendants of a common ancestor. We propose that their origin dates to the period after branching of the gammaproteobacteria, since no homologs have been found in other major bacterial lineages.
The structure of a rayt gene flanked by two oppositely orientated REPs is strikingly reminiscent of the organization of a typical bacterial insertion sequence. The position of REPs as terminal sequences for RAYT-encoding genes is supported by the fact that they are in many cases located very close to or even immediately downstream of the rayt gene stop codon (Additional file 4), excluding additional terminal sequences. There are other known transposase genes associated with REPs; however, all of them are contained in bona fide ISs, complete with their own terminal sequences [27][28][29][30][31][32][33][34]. These ISs use REPs as targets for their transposition.
We have not found typical signs of IS-like mobility for RAYTs, i.e. presence of their multiple copies in host genomes and changes of chromosomal location. This might indicate that RAYTs have lost the ability to transpose their own genes. Still, there are at least two reasons to assume that RAYTs recognize REPs and cleave DNA strand in their proximity. By mere analogy, transposases always bind and cleave sequences that flank their genes during the course of transposition. This precise positioning of REPs by rayt genes is conserved. Moreover, related IS200/IS605 transposases recognize stem-loop structures [4,5] that can readily arise from imperfect palindromes like those contained in REP sequences.
One of the most interesting outcomes of this study is the previously unrecognized wide distribution of BIME elements. BIMEs were detected in most RAYT- and REP-carrying species (Figure 2 and data not shown). Apparently, there is a common mechanism of BIME formation. The mechanism is targeted to rayt genes, one third of which are directly associated with BIMEs (Table 1). Furthermore, 3´ termini of rayt genes were found captured between REPs in four rayt-adjacent BIMEs (Additional file 4). BIMEs are known to exhibit extensive interstrain differences in length and distribution [24,27] that seem unlikely to result solely from processes such as homologous recombination or DNA polymerase strand-slippage. We hypothesize that the putative RAYT-catalyzed reaction, as described further, may be the driving force behind BIME establishment and dynamics.
[Table legend] The values represent the numbers of exact copies of REP sequences, flanking identified rayt genes (as denoted in Table 1), in bacterial genomes. For dimorphic REPs, the upper value corresponds to the number of upper REP sequences from Table 1 and vice versa. In cases where the cognate rayt gene or its close homolog (flanked by the same REPs) is actually present in the given genome, the numbers are written in bold and underlined.
In the simplest case, the information contained in a REP sequence would be sufficient for its recognition and cleavage by RAYT. Because of the high level of conservation of the 5´ head sequences (Table 1), we hypothesize that they might serve as determinants of the position of the cleavage site. The presumed REP-targeted RAYT activity would then result, for example, in reversible formation of a free hydroxyl group and covalent attachment of the 5´ terminus of the REP sequence to the RAYT protein (Figure 6A). Host genomes typically harbor hundreds of REPs and all of them are potential substrates for RAYTs. The RAYT activity can thus account for various imaginable DNA rearrangements. There are two important aspects of the presumed RAYT catalysis. Firstly, the transiently present free hydroxyl group can serve as a primer for initiation of DNA replication. Secondly, in trans ligation (the reverse RAYT-catalyzed reaction) might occur relatively frequently between two RAYTs that act on different REPs. Because the assembly of the catalytic site in related IS200/IS605 transposases is achieved by dimerization (due to their limited size), RAYTs probably form dimers as well. The physical proximity of the two subunits would enhance the frequency of in trans ligations.
We suggest that REP-dependent RAYT activity is responsible for some of the unusual observations regarding REPs. For example, a high number of REPs in the host genome is conditional on the presence of their cognate RAYT (Table 2). Further, the substitution rate for rayt genes was shown to greatly exceed the rate of substitutions in surrounding host genes (Figure 5). If RAYTs cleave adjacent to their flanking REPs, the resulting OH groups may prime DNA replication into the rayt gene, leading to partial or complete replacement of one or both strands. When several rounds of such replication are performed during each cell cycle, excess mutations accumulate. Although this is a rather complicated theory, the alternatives, like strong positive selection for mutated RAYTs, are equally difficult to substantiate.
Another process that we propose to be RAYT-dependent is the preferential formation of new REPs in the vicinity of a rayt gene (Additional file 3), following its horizontal transfer into the host. In this case, the acquired RAYT obviously causes the production of new REPs, possibly through multiplication of the existing REPs flanking its gene.
A possible model of BIME formation is depicted in Figure 6B. Starting with one basic module of BIMEs (two directly repeated REPs and one REP between them in inverted orientation - Figure 2), a RAYT dimer cleaves at both top-strand REPs. Another RAYT dimer works on the bottom strand; because only a single REP is present there, only one unit of the dimer is attached to a REP after cleavage. Upon in trans ligation within the frame of the "yellow" dimer, the circularized basic module and the bottom strand hold together by their complementary parts. The circle is primed by the free OH group resulting from RAYT cleavage of the bottom strand. At this point, rolling circle replication of the basic module begins. The main replicative DNA polymerase (Pol III holoenzyme in E. coli) might accomplish the process on its own, since it was shown to possess intrinsic moderate strand-displacement activity [35]. The amplified basic module (BIME) is cut off from the rolling circle after the second unit of the "blue" RAYT dimer cleaves the newly synthesized REP. Then, a second in trans ligation within the frame of the "blue" dimer integrates the BIME into the bottom strand. Following replication of the chromosome and separation of daughter cells, one of them contains a modular BIME in its genome.
Taken together, we have gathered a considerable amount of in silico evidence to propose a significant role of transposases in the generation of bacterial intergenic repeats. If our assumptions are true, then REPs and BIMEs represent a novel class of nonautonomous TEs. To confirm this, additional experiments are needed to test the interaction between RAYTs and REPs in vivo and in vitro.