Classification and evolutionary history of the single-strand annealing proteins, RecT, Redβ, ERF and RAD52
© Iyer et al 2002
Received: 07 January 2002
Accepted: 21 March 2002
Published: 21 March 2002
The DNA single-strand annealing proteins (SSAPs), such as RecT, Redβ, ERF and Rad52, function in RecA-dependent and RecA-independent DNA recombination pathways. Recently, they have been shown to form similar helical quaternary superstructures. However, despite the functional similarities between these diverse SSAPs, their actual evolutionary affinities are poorly understood.
Using sensitive computational sequence analysis, we show that the RecT and Redβ proteins, along with several other bacterial proteins, form a distinct superfamily. The ERF and Rad52 families show no direct evolutionary relationship to these proteins and define novel superfamilies of their own. We identify several previously unknown members of each of these superfamilies and also report, for the first time, bacterial and viral homologs of Rad52. Additionally, we predict the presence of aberrant HhH modules in RAD52 that are likely to be involved in DNA binding. Using the contextual information obtained from the analysis of gene neighborhoods, we provide evidence that the bacterial members of each of these SSAP superfamilies interact with a similar set of DNA repair/recombination proteins. These include different nucleases or Holliday junction resolvases, the ABC ATPase SbcC and the single-strand-binding protein. We also present evidence of independent assembly of some of the predicted operons encoding SSAPs and of in situ displacement of functionally similar genes.
There are three evolutionarily distinct superfamilies of SSAPs, namely RecT/Redβ, ERF and RAD52, which have different sequence conservation patterns and predicted folds. All these SSAPs appear to be primarily of bacteriophage origin and have been acquired by numerous phylogenetically distant cellular genomes. They generally occur in predicted operons encoding one or more of a set of conserved DNA recombination proteins that appear to be the principal functional partners of the SSAPs.
Homologous DNA recombination is a fundamental process in the biochemistry of DNA repair and replication, which contributes to the generation of the genetic diversity critical for natural selection. An important step in the recombination process is the pairing of homologous double-stranded DNAs followed by the exchange of DNA strands between the paired molecules. Experimental studies have shown that members of the archetypal RecA family of recombinases are central to this reaction in all extant forms of life [1, 2].
Studies in Escherichia coli have shown that, although RecA is the principal protein involved in pairing and strand exchange, unrelated proteins that have a much more restricted phyletic distribution can also promote similar reactions in a RecA-dependent or RecA-independent manner . These alternative or additional mediators of homologous recombination include the well-characterized prophage RecT, phage λ Redβ and phage P22 ERF proteins [4, 5]. Similarly, in yeast and vertebrates, the RAD52 protein is involved in the pairing and strand exchange reaction and can promote recombination in a RAD51 (the eukaryotic RecA homolog)-dependent or -independent manner . The RecT protein works in conjunction with the RecE nuclease  and was initially described in genetic studies on the complementation of mutations in the RecBCD pathway of DNA repair [8–10]. Biochemically, RecT has been shown to bind single-stranded (ss) DNA 3' overhang regions generated by the RecE nuclease, and to promote strand exchange between homologous DNA partners by assisting the pairing of complementary single-stranded regions [4, 10]. The reaction catalyzed by the RecT/RecE system is similar to that described for the phage λ exonuclease (exo/Redβ) and the single-strand annealing protein Redβ. The similarity between these two systems is further extended by the observation that the RecT/E system can complement mutations in the λ exo/Redβ system [10, 12]. In eukaryotes, the RAD52 protein has been shown to exhibit properties similar to those of the RecT and Redβ proteins: it binds ssDNA and promotes strand exchange via the pairing of complementary single strands [6, 13]. In vitro studies on quaternary structures have shown that the single-strand annealing proteins (SSAPs) RecT, Redβ, ERF and RAD52 form similar helical superstructures [14–17]. This has led to the proposal that RecT, Redβ, ERF and the eukaryotic RAD52 function in an analogous fashion, and are even "structural homologs" .
However, no sequence or secondary structural similarities have been noticed between different SSAPs, and current understanding of their evolutionary history and phyletic range remains poor. Here, we describe the results of an in-depth sequence analysis of these proteins and delineate their evolutionary relationships and phyletic horizon in available genomes. We show that, in spite of the functional similarities and the similar quaternary structures, there are three distinct superfamilies of SSAPs, namely RecT/Redβ, RAD52 and ERF, that appear to be evolutionarily unrelated to each other. These superfamilies show a wide distribution in viral and cellular genomes, but appear to have originally evolved in large DNA bacteriophages. Through an analysis of the contextual information provided by the predicted operons in which the SSAPs occur, we predict several previously undetected functional connections of these proteins, which might shed new light on the corresponding DNA repair/recombination pathways.
A multiple alignment of all members of the RecT/Redβ superfamily was generated using the T_Coffee program, followed by adjustments based on the PSI-BLAST search results. This alignment was used to predict their secondary structure using the JPRED and PHD methods; these predictions pointed to an α + β domain with a core of five β-strands and five α-helices (Fig. 1). Some of the strongest conservation is concentrated in the long helices (helices 2, 3 and 4 in Fig. 1), and the pattern includes some charged or polar residues, suggesting that these residues are probably exposed and participate in the protein-protein and protein-DNA interactions that are typical of this superfamily. The conserved, regularly spaced hydrophobic residues in the RecT/Redβ superfamily are predicted to be buried, allowing these domains to assume a globular structure. Experimental studies have shown that the strand transfer reaction mediated by RecT and its binding to dsDNA are sensitive to Mg2+ concentrations, and it was proposed that the levels of free Mg2+ could regulate RecT activity . Similarly, Redβ has been shown to promote single-strand annealing in a Mg2+-dependent manner . In this context, the conservation of the two C-terminal acidic residues in the majority of members of this superfamily suggests that they might be involved in the coordination of Mg2+ and implies that metal ion-dependent conformational switching is likely to be a generic feature of this superfamily.
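The argument about conserved C-terminal acidic residues rests on finding alignment columns in which most members carry the same kind of residue. A minimal sketch of such a column scan follows, assuming the alignment is a plain list of equal-length strings with '-' for gaps and using an arbitrary 80% threshold; the function name, the threshold and the toy sequences are hypothetical and are not taken from the authors' pipeline.

```python
# Toy aligned sequences (hypothetical, for illustration only); '-' marks a gap.
TOY_ALIGNMENT = [
    "MKLVDAE",
    "MRLIEAD",
    "MKVLDSE",
    "MKL-DAE",
]

ACIDIC = frozenset("DE")  # aspartate and glutamate


def conserved_acidic_columns(alignment, min_fraction=0.8):
    """Return 0-based column indices in which at least `min_fraction`
    of the non-gap residues are acidic (D or E)."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    hits = []
    for col in range(length):
        residues = [seq[col] for seq in alignment if seq[col] != "-"]
        if not residues:
            continue  # all-gap column: nothing to score
        acidic = sum(1 for r in residues if r in ACIDIC)
        if acidic / len(residues) >= min_fraction:
            hits.append(col)
    return hits


print(conserved_acidic_columns(TOY_ALIGNMENT))  # [4, 6]
```

Counting D and E together reflects the idea that either acidic side chain could coordinate a divalent cation; a real analysis would also weight sequences to offset redundancy in the alignment.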
Secondary structure prediction based on the multiple alignment of the ERF domain suggests a globular α + β fold with five helices and three or four strands (Fig. 3). The above-mentioned motif that is typical of this family is associated with helix 4 of this domain; given the presence of conserved basic residues, it may be critical for DNA-binding and strand-transfer activity of the ERF-like proteins. Additionally, in the loop between helices 4 and 5 of the ERF domain there is a universally conserved acidic motif of the form DXD. Analogous to the RecT superfamily, this acidic dyad might coordinate a divalent cation and undergo a conformational change dependent on metal-binding. However, the average size of the core domains, the patterns of conserved residues, and the predicted secondary structures of the RecT/Redβ and ERF domains show no correspondence to each other, implying that there is no direct evolutionary link between these protein groups.
The secondary structure predictions showed that the Rad52 superfamily proteins adopt a structure with interspersed α-helices and β-strands (Fig. 5). Additionally, fold predictions using 3DPSSM (E-value = 0.0085, corresponding to 90% confidence in the prediction) and the hybrid fold method (Z-score = 19.5) predicted the presence of a potential Helix-hairpin-Helix (HhH) fold in members of the RAD52 superfamily. The HhH domain is a small nucleic acid-binding module composed of two helices joined by a central loop (hairpin), which functions as the DNA-binding moiety of numerous repair and recombination proteins [27, 28]. Two HhH modules are predicted in the core conserved domain of the RAD52 family, the first bounded by the predicted helices 2 and 3, and the second bounded by helices 5 and 6 (Fig. 5). Although these predicted HhH modules are very divergent in sequence from the typical versions, the hairpin in both HhH modules of the RAD52 family proteins is bounded by small residues, typically glycine; this conforms to the signature motif characteristic of the classical HhH modules [28, 29]. However, in the RAD52 superfamily the predicted HhH modules appear to have been welded into a large globular superstructure that has maintained its evolutionary distinctness over time. The conservation pattern and predicted structural elements of the RAD52 superfamily are distinct from those predicted for the ERF and RecT/Redβ superfamilies (Figs. 1, 3, 5), supporting the lack of a direct evolutionary relationship between these proteins.
The RAD52 superfamily shows a sporadic phyletic distribution and, even in the crown-group eukaryotes, might have been secondarily lost in certain lineages, such as plants, nematodes and insects. The sporadic distribution of this family among phylogenetically distant bacteria, along with its presence in several prophages, suggests that, like the RecT/Redβ and ERF superfamilies, at least the bacterial RAD52 proteins might be predominantly of phage origin. The core of the eukaryotic recombination system appears to have been inherited from the system present in the common ancestor shared with the archaea . However, RAD52 is thus far absent from all archaeal genomes and is restricted to a single orthologous group in the eukaryotes . Thus, it appears plausible that eukaryotic RAD52 was ultimately derived through lateral transfer, either from a bacterial genome or directly from a viral source, no later than the divergence of the crown-group eukaryotes and Entamoeba.
The clustering of functionally related genes in prokaryotic genomes into co-transcribed and co-regulated units, operons, often allows functional assignments through the principle of 'guilt by association' [30–32]. Generally, genes whose products physically interact to form a complex or are involved in successive steps in a biochemical pathway form operons that are conserved over large evolutionary distances . On previous occasions, we have used gene neighborhoods or operons to predict novel DNA repair complexes and their components . Accordingly, a similar approach was applied to the three families of SSAPs (RecT/Redβ, ERF, Rad52), to shed light on their functional links.
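The gene-neighborhood approach can be illustrated with a simple proximity rule for predicting operons. The sketch below assumes that consecutive genes on the same strand separated by at most 100 bp belong to one predicted operon; this cutoff, the `Gene` record and the toy coordinates are illustrative assumptions, not the criteria used in this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # start coordinate on the contig (bp)
    end: int     # end coordinate on the contig (bp)
    strand: str  # '+' or '-'


def predict_operons(genes, max_gap=100):
    """Group genes into putative operons: runs of consecutive genes on the
    same strand whose intergenic distance is at most max_gap bp."""
    operons = []
    for gene in sorted(genes, key=lambda g: g.start):
        if operons:
            prev = operons[-1][-1]
            if gene.strand == prev.strand and gene.start - prev.end <= max_gap:
                operons[-1].append(gene)
                continue
        operons.append([gene])
    return operons


def operon_partners(genes, query, max_gap=100):
    """Names of the genes predicted to share an operon with `query`."""
    for operon in predict_operons(genes, max_gap):
        names = [g.name for g in operon]
        if query in names:
            return [n for n in names if n != query]
    return []


# Hypothetical neighborhood: recE-recT-ssb on one strand, orfX on the other.
TOY_GENES = [
    Gene("recE", 100, 1000, "+"),
    Gene("recT", 1020, 1800, "+"),
    Gene("ssb", 1850, 2300, "+"),
    Gene("orfX", 2600, 3000, "-"),
]
print(operon_partners(TOY_GENES, "recT"))  # ['recE', 'ssb']
```

Applying the same lookup to every SSAP gene across many genomes, and tallying which partner families recur, is one way the "guilt by association" signal could be summarized.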
Notably, the genes encoding the three evolutionarily distinct SSAPs co-occurred with similar sets of DNA repair/recombination-related proteins (Figs. 2, 4). In at least one case, each of them was found adjacent to the gene for the single-strand-binding protein (SSB), an OB-fold protein that binds ssDNA (Figs. 2, 4). This association ties in with the function of the SSAPs in single-strand annealing, suggesting that they closely interact with SSB. It has been suggested in the case of RecT that it may compete with SSB for binding single-strand overhangs and thereby make them available for the annealing process . Similar interactions between other SSAPs and SSB, which probably coats the ssDNA generated by nucleases, appear likely. Genes for SSAPs from all three distinct superfamilies may also occur adjacent to, or in the vicinity of, genes encoding nucleases or Holliday junction resolvases (HJRs). Genes for RecT/Redβ superfamily proteins are associated with genes encoding a λ-type exonuclease (LE) of the type II restriction enzyme fold; RecE, which also might be a divergent member of this fold; and a nuclease of the Endonuclease VII (EndoVII) fold [7, 34] (Fig. 2). The ERF superfamily genes are associated with a RusA superfamily nuclease/HJR and EndoVII fold nucleases (Fig. 4) . Furthermore, the Borrelia plasmids that encode ERF almost always also encode a λ-type exonuclease, even if it is not the adjacent gene. In a single instance, in the Gram-positive bacterium Ruminococcus albus, the gene encoding a RAD52 superfamily protein occurs adjacent to a gene for a λ-type exonuclease. These nucleases probably contribute to the repair process in which SSAPs are involved, by introducing the initial break in the dsDNA and/or by digesting the nicked target to generate ssDNA.
The RecT and Redβ family proteins often co-occur with the sbcC gene, which encodes an ABC ATPase with a large coiled-coil segment. SbcC proteins are known to cooperate with SbcD, a nuclease of the calcineurin-like phosphoesterase superfamily, to degrade dsDNA in the 3' → 5' direction, generating ssDNA [35, 36]. It seems likely that RecT/Redβ proteins, at least in certain cases, function in conjunction with the SbcCD pathway, utilizing the single-stranded regions generated by the SbcCD nuclease. Additionally, several genes whose functions are less clear tend to co-occur with the genes coding for the SSAPs. These include DNA methyltransferases and the primosomal protein DnaD from low-GC Gram-positive bacteria , which co-occur with both ERF and RecT superfamily members (Figs. 2, 4). The poorly characterized phage- or prophage-specific genes that are frequently observed in these neighborhoods include ORF15 (Streptococcus thermophilus bacteriophage 7201), ORF86, ORF100a (Staphylococcus aureus temperate phage φSLT) and ORF364 (bacteriophage φ31.1) (Figs. 2, 4). Secondary structure predictions indicate a high α-helical content for these proteins. It is likely that these α-helical proteins are phage innovations that could function as adaptors in the recombination pathway, either as accessory protein-protein interaction domains or as DNA-binding domains.
A superposition of the gene neighborhood information upon the phylogenetic trees for the SSAP superfamilies provides insights into the evolutionary processes that led to the emergence of the operons that include the SSAP genes. As discussed above, the RecT/Redβ superfamily clearly splits into three distinct families (Fig. 2). The phylogenetic tree shows that SbcC co-occurs with the SSAP once within the Redβ family and once within the RecT family. An examination of the tree and the respective gene neighborhoods suggests that independent juxtaposition of sbcC with Redβ-like and RecT-like genes on two separate occasions is the most parsimonious explanation. The alternative explanation, namely that the gene coding for the common ancestor of Redβ and RecT already co-occurred with the sbcC gene, is far less likely because it would require over 10 independent losses of this apparently advantageous organization in different bacterial and bacteriophage lineages. Likewise, the observation that, in one or more cases, genes encoding each of the SSAPs co-occur in the same predicted operon with SSB or a λ-type exonuclease suggests that similar operon structures may also emerge independently in evolution. Thus, the same or analogous operon organizations may emerge convergently on multiple occasions, probably due to the selective pressure arising from the strong interactions between the SSAPs and their functional partners such as SbcC, SSB and LE.
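The parsimony comparison can be made concrete with a small-parsimony count. The sketch below uses the Sankoff algorithm with unit costs to score the presence (1) or absence (0) of sbcC adjacency on a hypothetical eight-leaf tree, with one sbcC-linked leaf in each family; it reports the minimum number of changes when the root lacks the linkage versus when the common ancestor already had it. The tree, leaf names and unit costs are invented for illustration and do not reproduce the tree in Fig. 2.

```python
import math


def sankoff(tree, states, alphabet=(0, 1)):
    """Unit-cost Sankoff small parsimony.

    tree: a leaf name (str) or a tuple of child subtrees.
    states: dict mapping each leaf name to its observed character state.
    Returns {state: minimum number of changes in this subtree if its root
    has that state}.
    """
    if isinstance(tree, str):
        return {s: 0 if s == states[tree] else math.inf for s in alphabet}
    children = [sankoff(child, states, alphabet) for child in tree]
    return {
        # Each child either keeps the parent's state (cost 0) or changes (cost 1).
        s: sum(min(costs[t] + (s != t) for t in alphabet) for costs in children)
        for s in alphabet
    }


# Hypothetical tree: a Redβ clade and a RecT clade, each with one sbcC-linked leaf.
TOY_TREE = (
    (("redB_a", "redB_b"), ("redB_c", "redB_sbcC")),
    (("recT_a", "recT_b"), ("recT_c", "recT_sbcC")),
)
TOY_STATES = {name: 0 for name in ("redB_a", "redB_b", "redB_c",
                                   "recT_a", "recT_b", "recT_c")}
TOY_STATES.update(redB_sbcC=1, recT_sbcC=1)

print(sankoff(TOY_TREE, TOY_STATES))  # {0: 2, 1: 4}
```

On this toy tree, two independent gains (root state 0) cost two changes, whereas an ancestral linkage (root state 1) costs four losses; on the real trees, the text notes that the ancestral scenario would require over 10 losses.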
The distribution, on the phylogenetic tree of the RecT/Redβ superfamily, of the gene neighborhoods in which a member of the RecT superfamily occurs next to RecE indicates that the RecE-RecT combination was probably the ancestral state for at least the RecT and EHAP1 families (Fig. 2). This implies that, on at least two occasions, the gene for the λ-type exonuclease displaced the functionally analogous recE gene and became the gene adjacent to recT (Fig. 2). That this displacement might have occurred by in situ insertion of a non-orthologous gene is suggested by the detection, on three separate occasions, of unusual remnants of pre-existing genes. The RecT/Redβ superfamily members EHAP1 from the enterobacteria and PF161 from Borrelia hermsii contain, C-terminal to their bona fide RecT/Redβ domains, a small fragment of the core conserved domain of the ERF superfamily. These ERF fragments are closely related to other ERF domains from related organisms and are unlikely to fold into the native conformation characteristic of the full-length ERF domain. For example, the ERF fragment fused to the EHAP1 RecT/Redβ domain is closely related to the P22 phage ERF domain. This suggests that in each of these cases a RecT superfamily gene was inserted in frame into a pre-existing ERF gene, leaving behind only a non-functional fragment of it (Fig. 2). In a very similar case, the bacterial RAD52-like protein from a Shiga toxin-encoding temperate phage is fused to an extreme C-terminal fragment that is nearly identical to the C-terminal-most portion of the P22 ERF protein. In this case, it appears that the pre-existing ERF gene was displaced through the insertion of a bacterial RAD52-like gene.
Interestingly, and in the same vein, the RecT proteins from Bacillus species contain a short C-terminal acidic module that is missing in other RecT proteins but is highly similar to the C-terminal region of SSBs, particularly those from Gram-positive bacteria (data not shown). This suggests that, at some stage in their evolution, the Bacillus recT gene recombined with the gene coding for SSB, which might even have resulted in a functional replacement of an SSB with an SSAP.
Thus it appears likely that functionally equivalent genes may displace their analogs in operons via insertion into the same position.
We show that functionally similar SSAPs belong to at least three evolutionarily distinct superfamilies. We unify the Redβ and RecT proteins and their homologs, which had not previously been reported as being related at the sequence level, into a single superfamily, supporting the notion that these proteins share a similar mechanism of action. The second superfamily, typified by the ERF proteins, is predominantly found in bacteriophages and is also present on all circular plasmids from Borrelia, suggesting a role in the recombination of these plasmids. The third superfamily, typified by the yeast RAD52 protein and previously detected only in eukaryotes, was shown to include bacterial and phage homologs and to contain a modified HhH domain. By comparing the gene neighborhoods of the SSAPs, we show that the predicted operons that include the SSAP genes evolve according to the "LEGO" principle. In these operons, the SSAP genes are linked to the genes for various DNases and DNA repair-related proteins, such as SSB and SbcC, which implies functional connections between the encoded proteins. Evidence is presented of convergent emergence of similar SSAP-encoding operons in different lineages and of in situ non-orthologous displacement of functionally similar genes in these operons.
Sequence searches of the non-redundant (NR) and unfinished genome databases were done using the gapped BLAST and PSI-BLAST programs . Iterative PSI-BLAST searches used for in-depth sequence analysis were done with the profile inclusion cutoff expectation value (E-value) set at 0.1. Multiple sequence alignments were generated using the T_Coffee program  and the output was adjusted using PSI-BLAST search results and secondary structure predictions, which were conducted using the PHD [40, 41] and Jpred  programs. Fold predictions were done using the 3-D position-specific score matrix (3DPSSM)  and the hybrid fold method . Phylogenetic analysis was carried out using the neighbor-joining algorithm, with subsequent local rearrangements using the maximum likelihood algorithm . The robustness of the tree topology was assessed with 10,000 RELL (Resampling of Estimated Log Likelihoods) bootstrap replicates. The MOLPHY and PHYLIP software packages were used for the analyses [46, 47].
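RELL avoids re-optimizing every candidate tree on every bootstrap replicate: the per-site log-likelihoods of each topology are computed once, and each replicate only resamples alignment sites with replacement and re-sums the stored values. The sketch below assumes those per-site values are already available (in this study they would come from MOLPHY, which this code does not call); the function name, seed and toy numbers are hypothetical, and the default of 10,000 replicates simply mirrors the count stated above.

```python
import random


def rell_support(site_lnl, replicates=10000, seed=1):
    """RELL bootstrap support for competing topologies.

    site_lnl[k][i] is the log-likelihood of alignment site i under
    candidate topology k, computed once beforehand. Returns the fraction
    of replicates in which each topology has the highest summed score.
    """
    n_sites = len(site_lnl[0])
    if any(len(lnl) != n_sites for lnl in site_lnl):
        raise ValueError("every topology needs one log-likelihood per site")
    rng = random.Random(seed)
    wins = [0] * len(site_lnl)
    for _ in range(replicates):
        # Bootstrap pseudo-alignment: n_sites column indices drawn with replacement.
        sample = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = [sum(lnl[i] for i in sample) for lnl in site_lnl]
        wins[totals.index(max(totals))] += 1  # ties go to the lower index
    return [w / replicates for w in wins]


# Hypothetical per-site log-likelihoods for two competing topologies over 6 sites.
TOY_SITE_LNL = [
    [-2.1, -3.0, -1.7, -2.8, -2.2, -3.3],
    [-2.4, -2.9, -2.0, -3.1, -2.5, -3.0],
]
print(rell_support(TOY_SITE_LNL, replicates=1000))
```

Because no likelihood is re-optimized per replicate, RELL is fast but approximate; it assumes the parameter estimates from the original alignment remain adequate for each resampled pseudo-alignment.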
This article is published under license to BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.