- Research article
- Open Access
Phylogenetic distribution of translational GTPases in bacteria
BMC Genomicsvolume 8, Article number: 15 (2007)
Translational GTPases are a family of proteins in which GTPase activity is stimulated by the large ribosomal subunit. Conserved sequence features allow members of this family to be identified.
To achieve accurate protein identification and grouping we have developed a method combining searches with Hidden Markov Model profiles and tree based grouping. We found all the genes for translational GTPases in 191 fully sequenced bacterial genomes. The protein sequences were grouped into nine subfamilies.
Analysis of the results shows that three translational GTPases, the translation factors EF-Tu, EF-G and IF2, are present in all organisms examined. In addition, several copies of the genes encoding EF-Tu and EF-G are present in some genomes. In the case of multiple genes for EF-Tu, the gene copies are nearly identical; in the case of multiple EF-G genes, the gene copies have been considerably diverged. The fourth translational GTPase, LepA, the function of which is currently unknown, is also nearly universally conserved in bacteria, being absent from only one organism out of the 191 analyzed. The translation regulator, TypA, is also present in most of the organisms examined, being absent only from bacteria with small genomes.
Surprisingly, some of the well studied translational GTPases are present only in a very small number of bacteria. The translation termination factor RF3 is absent from many groups of bacteria with both small and large genomes. The specialized translation factor for selenocysteine incorporation – SelB – was found in only 39 organisms. Similarly, the tetracycline resistance proteins (Tet) are present only in a small number of species.
Proteins of the CysN/NodQ subfamily have acquired functions in sulfur metabolism and production of signaling molecules. The genes coding for CysN/NodQ proteins were found in 74 genomes. This protein subfamily is not confined to Proteobacteria, as suggested previously but present also in many other groups of bacteria.
Four of the translational GTPase subfamilies (IF2, EF-Tu, EF-G and LepA) are represented by at least one member in each bacterium studied, with one exception in LepA. This defines the set of translational GTPases essential for basic cell functions.
Translational GTPases (trGTPases) are proteins in which the GTPase activity is induced by the large ribosomal subunit [1, 2]. Several members of this protein family (EF-G, EF-Tu, IF2 and RF3) bind to an overlapping site on the ribosome [1, 3–6]. This conserved region of the large subunit includes part of domain II of 23S RNA (the binding site for the antibiotic thiostreptone), part of domain VI (the sarcin-ricin loop), and proteins L11 and L7/12. This region is responsible for activating the trGTPases [1, 2].
The specific sequence features of the trGTPases allow proteins that belong to this family to be identified . In bacteria, the family includes proteins that are considered to belong to the "classical" set of translational GTPases (EF-G, EF-Tu, IF2, RF3), proteins that bind to the ribosome and have auxiliary or unidentified functions (SelB, Tet, LepA, TypA), and a group of proteins that have acquired functions in sulfur metabolism and might have lost their ability to bind to the ribosome (CysN/NodQ). Several additional GTPases with sequences that do not group them into the trGTPase family bind to, or have their activities induced by, the ribosome [8–12]. The GTPase activity of these proteins is not activated by the conserved region described above. The present work focuses on the family of trGTPases ("the classic translation factor family" according to Leipe et al., 2002), so these additional proteins are not included.
It has been shown that many members of this family are nearly ubiquitous in bacteria [13–15]. However, these studies were performed on relatively small datasets because few fully sequenced genomes were available. Moreover, there is confusion in the literature about the members of the core set of trGTPases present in all bacteria. For example, some studies find that LepA is ubiquitous [13–15] but this finding has not been confirmed by others . The number of fully sequenced bacterial genomes is now rapidly increasing and several hundred are available in the databases. This provides a basis for studying the presence of trGTPases in many different organisms. Moreover, no attempts were made in the previous studies to identify the trGTPase subfamilies missing from the organisms under investigation. Careful annotation of these missing trGTPases is essential for understanding the global distribution of this protein family. Therefore, we attempted not only to find as many trGTPases as possible but also to find all the trGTPases in the genomes we studied. This approach allows the presence or absence of genes for particular trGTPases in the genomes to be annotated.
Our study reveals the number of genes in nine subfamilies of ribosome-associated GTPases from 191 fully sequenced bacterial genomes. Four of the subfamilies (IF2, EF-Tu, EF-G and LepA) are represented at least by one member in all bacteria studied (with one exception in the case of LepA, as discussed below). The other subfamilies (Tet, RF3, SelB, TypA, CysN/NodQ) are present only in some bacteria.
To analyze the gene content of trGTPases in the fully sequenced genomes we needed to group all the trGTPases into subfamilies. This was done in several steps to ensure that all functional genes were detected and properly classified.
Creating the initial database
We started to gather genes for trGTPases by downloading all the annotated ORF sequences from the RefSeq database . However, this database might contain annotation errors and lack some ORFs. It was important to ensure that none of the trGTPases genes were missing. Therefore, we performed a BLAST search against the genomic sequences using TBLASTN with the nine known trGTPase genes from Escherichia coli and the tetracycline resistance gene from Bacillus cereus. This search resulted in 6 potential trGTPase genes. Four of these had previously been annotated as pseudogenes, but they could be functional genes. Two additional EF-Tu genes were found (one from Wolinella succinogenes and one from Clostridium acetobutylicum) that were missing from RefSeq. Interestingly, these two EF-Tu genes were present in GenBank, indicating that RefSeq had missed annotation of these genes. They were added to RefSeq to create a so-called "updated gene database". The ORF sequences in this database were translated into an "updated protein database", which was used for further studies (Fig. 1A).
Detection of all trGTPase candidates with subfamily-specific HMMs
To ensure that all trGTPases were detected, we used a set of subfamily-specific Hidden Markov Models (HMM) . Subfamily-specific HMMs should detect trGTPase candidates more specifically than the commonly-used BLAST or PSI-BLAST searches. These HMM models were created in several steps: retrieving well-conserved trGTPases from the "updated protein database" by a BLAST search with EF-Tu from Escherichia coli, computing a phylogenetic tree, dividing the proteins into nine subfamilies based on the tree and creating subfamily-specific HMMs (Fig. 1B). In addition, "outgroup" HMM profiles were created from 30 non-translational GTPases for control purposes. It is important to notice that the initial tree was calculated using the GTPase domain only, because reliable alignment of full-length sequences is not possible.
All proteins from the "updated protein database" were run against all nine HMM models using HMMSEARCH . Each protein was classified into the most similar family, decided by the HMMSEARCH score. This was done iteratively at increasing sensitivity levels until the number of proteins in all trGTPase families remained unchanged (Table 1); then we retrieved all the potential trGTPases. It is interesting to note that all trGTPases were retrieved at E-value 1e-10, and searches at lower stringency yielded no additional ones (Table 1). The classification of trGTPases was confirmed by calculating a phylogenetic tree as described below ("Final grouping of trGTPases into subfamilies"). It is also important to note that at E-value 1, any of the nine HMM profiles was able to detect members of all other subfamilies. This result indicates that in case there existed an additional trGTPase subfamily, not represented by any of the sequences on our preliminary phylogenetic tree, it would have been detected at this stage.
Validation of the trGTPases found
The results of the automatic procedures mentioned above were additionally verified by manual inspection (Fig. 1C). Although most of the proteins in our set of trGTPase candidates proved valid, there was also a small subset of proteins that cannot be GTPases because they lack the highly conserved consensus elements (G1, G3 and G4 motifs) of the GTPase domain [16, 19]. In addition, four of the proteins were very short (less than 60% of the average protein length of the subfamily). All these cases (listed in Additional file 2 as exceptions) were annotated separately.
In seven cases there was an upstream start codon that was not annotated as a functional start codon but would allow a functional protein to be produced. For example, in the current annotation, the correct start position is missed in the Photobacterium profundum SelB coding gene because it overlaps with the stop codon of the previous gene (SelA). In other cases, an alternative, non-AUG initiation codon could restore a functional protein. For example, in Borrelia burgdorferi and Bacillus licheniformis, full length lepA can be restored only if we assume that AUU is a start codon (Fig. 2F and see Additional file 1). There are two existing examples in which AUU has been shown to function as an initiation codon: in Escherichia coli infC (coding for IF-3), AUU regulates expression at the translational level ; and expression of pncB is reduced because of the AUU start codon .
In our dataset, there are also cases where a frame-shift event might restore a functional gene. In some of these, frame-shift is a probable case. For example, during translation of selB in Yersinia, frame-shift might occur at the poly(G)10 track. Homopolymeric tracks are known to be frame-shifting sites .
In conclusion, we found that in 17 cases a functional protein might be restored (Fig. 2, see Additional file 1). These examples are included in the final list of trGTPases and the correction of initiation site or frame-shift event is indicated in Figs. 4, 5, 6. After manual inspection and validation we ended up with 1314 trGTPase proteins (see Additional file 2). These proteins were classified into 9 different families.
Final grouping of trGTPases into subfamilies
The initial grouping of the trGTPases into nine subfamilies was dependent on the initial tree, which was created from a smaller subset of proteins and contained some non-functional proteins. Thus, we decided to confirm the classification of trGTPases again, (a) by dividing proteins among 9 HMMs and (b) by computing a phylogenetic tree from all 1314 validated trGTPases (Fig. 1D). The tree was calculated again using only the GTPase domain, which is universally conserved in all trGTPases. A distance-based phylogenetic tree was created and bootstrapped using PHYLIP  with PAM distances (Fig. 3). On this tree the same familiar nine branches appeared with high bootstrap support. Furthermore, all the proteins fell into the same branches as they did using the HMM classification. Thus, the phylogenetic tree supports the classification of proteins into 9 subfamilies as described in Table 1.
There appears to be an additional well-separated branch within the EF-G branch (Fig. 3). However, in quartet puzzling tree (TREE-PUZZLE ) and identity-based distance tree, this branch disappears. Therefore, we did not treat it as an independent family of trGTPases in the current study. Nevertheless, this branch may contain EF-G-like proteins that are diverging functionally, as it contains only proteins encoded in genomes with more than one gene for the EF-G subfamily.
After identifying all the genes for trGTPases in the genomes under study, we considered the presence or absence of these genes in different phylogenetic groups of bacteria. The number of genes for each trGTPase subfamily is presented in Figs. 4, 5, 6. The 16S ribosomal RNA tree and the phyla of Bergey's bacterial systematics  are also shown. We analyzed the relation between genome size (Figs. 4, 5, 6, "size") and the number of trGTPase subfamilies it codes for (Fig. 7). Smaller genomes clearly contain fewer genes for trGTPases. Many small genomes (shorter than 2 Mb) code only for the core set of four trGTPases (IF2, EF-Tu, EF-G and LepA). As the genome size increases, the number of different trGTPase genes also increases, reaching a plateau value between 7 and 8 genes. There are some notable exceptions: the Buchnera genomes, which are only 0.6–0.7 Mb, contain 6–7 trGTPase genes; Pirellula with genome size 7.2 Mb codes for only six subfamilies of trGTPases, lacking the gene for RF3.
We have annotated the genes for trGTPases in 191 fully sequenced bacterial genomes. The approach we have developed (Fig. 1) allows misannotations, possible sequencing errors, frameshifts and non-canonical translation initiation events to be identified (Fig. 2). We paid special attention to finding all the trGTPases genes in the genomes analyzed. This allows cases where certain subfamilies are not encoded in a given genome to be annotated with confidence.
Our study reveals the number of members in nine subfamilies of ribosome-associated GTPases. Four of the subfamilies (IF2, EF-Tu, EF-G and LepA) are represented by at least one member in each bacterium studied, with one exception in LepA, as discussed below (Figs. 7, 8). The other subfamilies (Tet, RF3, SelB, TypA, CysN/NodQ) are present only in some bacteria (Figs. 4, 5, 6). In the following sections the trGTPases subfamilies are discussed in detail.
Initiation factor 2
The bacterial IF2 catalyzes the binding of initiator tRNA to the initiating 30S subunit [26, 27]. In the next step the GTP-bound IF2 catalyzes formation of the 70S ribosome [28, 29]. Ribosome-stimulated GTP hydrolysis is required for rapid dissociation of the factor from the ribosome .
The gene for IF2 was recognized in all the genomes analyzed. This indicates that IF2 is absolutely conserved in all bacteria (Figs. 4, 5, 6). Moreover, previous analysis has identified the gene for IF2 as universally conserved in all domains of life [30, 31]. It is also consistent with the fact that deletion of the gene for IF2 is lethal in Escherichia coli . In contrast to several other ribosome-associated GTPases described below, the gene for IF2 has not been duplicated in any of the genomes analyzed; all bacteria contain only one copy.
Escherichia coli, other members of the family Enterobacteriaceae and Bacillus subtilis all contain two or three isoforms of IF2, resulting from the use of different in-frame start codons [33–35]. Both the longer and shorter isoforms contain the major functional domains of the protein, including the GTPase domain, and are functionally active in biochemical assays [29, 36]. However, an optimal ratio of isoforms is required to achieve maximal growth rate [37, 38]. A conserved domain (IF2N) has been described in the N-terminus of the protein . In Escherichia coli the longer isoform contains two copies of the IF2N domain and the shorter isoforms have one copy. In our collection of IF2 sequences the tandem organization of the IF2N domain was found in 134 cases out of the 191 analyzed. This suggests that these proteins are annotated as the longer isoforms. Although the presence of IF2 isoforms has been experimentally proven in several organisms [33–35], an experimental study using a wider phylogenetic range of bacteria is needed to clarify the generality of an internal initiation event occurring between the two IF2N domains. In 57 IF2 sequences, only one IF2N domain was found (marked with symbol "N" in Figs. 4, 5, 6). This suggests that in these organisms only the shorter isoform of IF2 is present.
Elongation factor Tu
EF-Tu in complex with GTP brings aminoacyl-tRNA into the A site of the ribosome . The factor is released from the ribosome after GTP hydrolysis . GTP hydrolysis separates two steps in the selection of the correct codon-anticodon interaction: initial selection occurs before hydrolysis and proofreading occurs afterwards [2, 41]. This double-stage selection of aminoacyl-tRNA allows the accuracy of translation to be increased [41–43]. Exchange of EF-Tu-bound GDP with GTP relies on a specific G-nucleotide exchange factor, EF-Ts [44–46].
We found the gene for EF-Tu in all genomes analyzed (Figs. 4, 5, 6, 8). This agrees with the previous notion that this trGTPase is universally conserved in all three domains of life [13, 14]. In our dataset, 267 proteins (from 191 organisms) belong to the EF-Tu family, encoded in 1 to 2 copies of the gene per genome. Most of the bacteria with two EF-Tu genes belong to the phylum Proteobacteria (45 species), but there are also additional genes in Firmicutes (class clostridia) (3), Deinococcus-Thermus (2), Actinobacteria (2) and Aquificae (1). For Proteobacteria, it has been argued that the observed phylogenetic distribution is best accounted for by the presence of two gene copies in the ancestral genome followed by differential loss of the second copy .
The function of EF-Tu is essential for the cell and its gene cannot be deleted . In Escherichia coli, where two EF-Tu-coding genes are present, either of them may be deleted without affecting the viability of the cell. Interestingly, if the organism has two copies of the EF-Tu gene, then the two copies are nearly identical. Gene conversion is assumed to be the mechanism behind this similarity. This was proved to be the case in Salmonella typhimurium [49–51]. A similar mechanism maintains the uniformity of sequences of different ribosomal RNA operons in some genomes [52–54].
The genomes analyzed in the current study contain between 1 and 14 ribosomal RNA operons per genome. The larger number of ribosomal RNA operons might indicate the need for more ribosomes and other components of the translational machinery, including EF-Tu. We therefore asked whether there are more rRNA operons in genomes containing two gene copies for EF-Tu than in those of bacteria with only one EF-Tu-coding gene (Figs. 4, 5, 6; data not shown). No clear correlation can be found because there are genomes with many rRNA operons and one EF-Tu gene copy (Bacillus), and genomes with only one rRNA operon and two EF-Tu gene copies (Ehrlichia, Anaplasma, Wolbachia, Nitrosomonas). The EF-Tu gene copy number rather follows the phylogenic clades: most of the Proteobacteria have two and most of the other phylogenetic groups have one.
Elongation factor G
EF-G catalyzes the translocation of peptidyl-tRNA from the ribosomal A site to the P site and of deaminoacylated tRNA from the P site to the E site [2, 55]. The exact mechanism by which GTP is used in this process is currently under discussion [56–60]. In addition to its role in translocation, EF-G is required to recycle the ribosomes from their post-termination state to a new round of initiation [61–64].
Consistent with the observation that EF-G is the third trGTPase universally conserved in all three domains of life [13, 14], we found the gene in all the genomes analyzed (Figs. 4, 5, 6, 8). In the model organisms Escherichia coli and Bacillus subtili EF-G is encoded by one essential gene. Surprisingly, we found that in 47 of the 191 genomes analyzed there are two genes for proteins of the EF-G subfamily, and in 10 genomes there are three copies (Figs. 4, 5, 64-6, 8). Multiple gene copies for EF-G are found widely in the bacterial phylogenetic tree, being observed in most of the phyla analyzed.
In contrast to EF-Tu, the copies EF-G genes in one genome differ considerably; the gene conversion mechanisms that work in case of the EF-Tu coding genes do not seem to operate in the case of EF-G. It is currently not clear whether the two copies of EF-G are functionally similar or whether one form might have acquired a different function.
In one case we know that a separate group of ribosome-associated GTPases has evolved from the EF-G subfamily. This is the subfamily of tetracycline resistance proteins [65, 66]. In the current work we use the abbreviation "Tet" for all tetracycline-resistance proteins that act by ribosomal protection. These proteins bind to the ribosome, hydrolyze GTP and cause release of tetracycline from the ribosome [67–69]. Antibiotic-free ribosomes able to translate mRNA are produced in this process.
We found Tet-coding genes in 20 genomes (Fig. 4, 5, 6). In one case (Clostridium acetobutylicum) two copies were found. The bacterial groups containing the Tet proteins include the producers of tetracyclines (Streptomyces), symbionts in the mammalian gut (Lactobacillus, Bacteroides, Bifidobacterium) and mammalian pathogens (Bacillus, Staphylococcus, Streptococcus, Clostridium). These are the groups most likely to have been in contact with tetracycline and have therefore acquired the resistance genes. The genes for Tet proteins are also present in the plant pathogen Agrobacterium tumefaciens. It is currently not clear why the genes have survived in the genome of this organism.
The function of LepA in the cell is unclear. This protein, which was originally found in association with the cell membrane fraction, exhibits considerable similarity to the translation factor GTPases . LepA crosslinks with ribosome-bound oxazolidinone antibiotics indicating that it can bind to the ribosome . LepA has the unique property of back-translocating posttranslocational ribosomes . The results suggest that it recognizes ribosomes after a defective translocation reaction and induces a back-translocation, thus giving EF-G a second chance to translocate the tRNAs correctly . The gene has been inactivated in Escherichia coli  and Staphylococcus aureus ; the knockout strains are viable. It is therefore surprising to find that the presence of LepA coding genes in bacterial genomes is highly conserved and has very similar pattern to the IF2 genes: almost every genome has one copy (Figs. 4, 5, 6, 8). However, there are two exceptions: one of the sequenced strains of Streptococcus pyogenes has no LepA gene and Pirellula has two copies. The near-universal presence of LepA in bacteria suggests that this protein has an important function.
Release factor 3
The first steps in the termination of translation utilize two types of release factor. Type I release factors (RF1 or RF2) recognize the termination codons and induce hydrolysis of the ester bond connecting the newly-made protein to the last tRNA [74, 75]. The type II release factor (RF3) catalyses a GTPase-dependent release of the type I release factor from the ribosome [76, 77].
It has been observed that an Escherichia coli strain with an inactivated RF3 gene is viable although its growth is disturbed [78, 79]. This suggests that RF3 activity is not essential for the bacterial cell. This is in agreement with the present results showing that 119 of the 191 genomes analyzed contain the gene for RF3 but 72 do not (Figs. 4, 5, 6, 8). As expected, the gene is missing from most of the small genomes (Mycoplasma, Chlamydia, Rickettsia, Wigglesworthia), where only the core set of genes for the basic processes of gene expression have been preserved. In addition, several other groups of bacteria with large genomes contain no RF3 (Bacillus, Mycobacterium, Streptomyces).
In this context it is important to note that the GTPases involved in translation termination differ among the three superkingdoms. Bacterial release factor RF-3 is derived from the translocation factor EF-G family, whereas eukaryotic release factor eRF3 is a paralog of elongation factor EF-Tu/EF-1α; there is no corresponding release factor in Archaea [16, 80]. It is currently not clear how the termination of translation works in organisms lacking RF3. The independence of the evolutionary origins of bacterial and eukaryotic RF3 suggests that loss of the gene could be compensated by duplication of a gene for another trGTPase, followed by diversion to take over the function of RF3. As in bacteria, the lack of RF3 does not correlate with the duplication of genes for other ribosome-associated GTPases (Figs. 4, 5, 6); our analysis does not support this scenario. Another possibility is suggested by the biochemical function of RF3 in the recycling of type I release factors: the weaker binding of type I release factors to the ribosome might compensate for the lack of RF3 function. The fact that weaker binding can compensate for the inactive GTPase has been demonstrated for a trGTPase: the eukaryotic homologue of IF2, eIF5B . In the case of RF3, this prediction awaits experimental investigation.
During synthesis of some proteins, co-translational incorporation of selenocysteine occurs . It has been shown that specific UGA termination codons are used for selenocysteine insertion . Incorporation of selenocysteine is directed by a specific RNA hairpin that follows the UGA codon [83–85]. This hairpin binds a ternary complex comprising translation factor SelB, GTP and selenocysteine-specific aminoacyl-tRNA [85–87]. In this way, selenocysteine-tRNA is directed to the ribosome containing a UGA codon in the A site.
Bacillus subtilis has no selenocysteine-specific tRNA or SelB protein . Therefore, co-translational incorporation of selenocysteine does not occur in this organism. There is a high concentration of selenium in soil, the natural environment of Bacillus subtilis. It has been suggested that random incorporation of selenocysteine into proteins occurs in this organism because cysteinyl-tRNA synthetase cannot distinguish between cysteine and selenocysteine .
The distribution of the selenocysteine incorporation system in different bacteria has been analyzed previously . SelB, analyzed in the current study, might be used as a marker for this system. Our analysis, in agreement with previous results , indicates that only 39 of the 191 genomes analyzed contain a gene for SelB (Figs. 4, 5, 6, 8). It is obvious that the lack of SelB is not confined to soil bacteria. In fact, many human symbionts and pathogens do not contain SelB. On the other hand, Pseudomonas putida, a soil bacterium, contains the gene.
Another surprising feature of the distribution of SelB is its sporadic presence in several bacterial groups. For example, Clostridium perfringens contains the gene but Clostridium tetani does not;Treponema denticola has the gene but Treponema pallidum does not; Mycobacterium avium has it and other Mycobacteria do not. It has been proposed  that this pattern is the result of two mechanisms, primarily speciation and differential gene loss, with some contribution from lateral gene transfer.
It has been shown that TypA regulates multiple cell surface and virulence-associated components in enteropathogenic Escherichia coli [90–92] and is required for growth at low temperatures . In Sinorhizobium meliloti, TypA is required for growth under certain stress conditions . Recently it has been proposed that TypA provides transcript-selective translational control . It has been shown to function as a translation factor required specifically for expression of the global transcriptional modulator Fis . It has been proposed that TypA destabilizes unusually strong interactions between the 5' untranslated region of fis mRNA and the ribosome . It binds to ribosomes at a site coinciding with that for EF-G and has a GTPase activity that is sensitive to high GDP:GTP ratios and is stimulated by 70S ribosomes programmed with mRNA and aminoacylated tRNAs . However, the molecular details of TypA action remain unknown.
Our analysis shows that 165 bacteria have one copy of a gene coding for TypA and 26 genomes have none (Figs. 4, 5, 6). The presence of this gene clearly correlates with genome size: it is present in all genomes larger than 2.8 Mb (Fig. 8). Indeed, if we exclude Treponema denticola, the largest genome lacking TypA is 1.5 Mb (Figs. 4, 5, 6). Genomes smaller than 1.5 Mb usually lack this gene.
In Escherichia coli, CysD and CysN are the two subunits of an ATP sulfurylase (ATPS) that produces adenosine-5'-phosphosulfate (APS) from ATP and sulfate, coupled with GTP hydrolysis. APS is then phosphorylated by an APS kinase, CysC, to produce 3'-phosphoadenosine-5'-phosphosulfate (PAPS), which is then used in amino acid biosynthesis . In addition, Sinorhizobium meliloti (old name Rhizobium meliloti) appears to carry out the same chemistry for the sulfation of nodulation factors, oligosaccharides that are active in the roots of the host plant [97, 98]. In Sinorhizobium, a heterodimeric complex comprising NodP and NodQ appears to possess ATP sulfurylase and APS kinase activities. Indeed, NodP shows strong amino acid sequence similarity to CysD, while NodQ appears to encode both CysN- and CysC-related sequences in a single ORF (the N and C termini of NodQ correspond to CysN and CysC, respectively) .
The gene for CysN/NodQ arose from an archaeal or eukaryotic elongation factor 1 α(EF-1 α) by lateral gene transfer followed by a change in the function of the gene product . The bacterial CysN has retained its GTPase activity that in this enzyme regulates production of APS. On the other hand it has lost the requirement for the ribosome to trigger its GTPase activity and probably has no function in translation .
Our analysis indicates that 74 genomes code for proteins of the CysN/NodQ subfamily (Figs. 4, 5, 6). In some genomes (Nocardia, Sinorhizobium) there are three genes for such proteins. We also used the APS kinase domain (CysC), absent from CysN but present in NodQ, to annotate the CysN and NodQ proteins separately. The NodQ coding gene was found in 21 genomes and the CysN coding gene in 56 (Figs. 4, 5, 6). Interestingly, Nocardia and Sinorhizobium have one gene for CysN and two genes for NodQ.
It is important to note that a phylogenetically-unrelated ATPS unable to hydrolyze GTP is present in many organisms [101, 102]. As this protein family is present in all three domains of life we propose that it could be called ATPS1. Consistent with this proposal, the CysN/NodQ proteins that are present only in bacteria could be called ATPS2.
We have identified the genes coding for ATPS1 and marked them by asterisks in the ATPS column of Figs. 4, 5, 6. The results show that 106 genomes out of the 191 analyzed code for either CysN/NodQ or its functional analogue, ATPS1. The presence of either ATPS1 or ATPS2 (CysN/NodQ) mostly follows the phylogenetic grouping of bacteria: Proteobacteria, Actinobacteria, Bacteroides and Spirochaetes usually contain ATPS2 and Bacilli, Cyanobacteria and the Thermus-Deinococcus group contain ATPS1. The data also indicate that no gene for ATPS was identified in 85 genomes. It is currently not clear how sulfur assimilation occurs in these organisms.
Our current understanding of the molecular mechanisms of trGTPases is based on studies using a very limited number of model organisms. The distribution of genes for trGTPase subfamilies in bacterial genomes suggests that there are considerable differences in the use of trGTPases in different bacteria. For example, RF3 has been considered a member of the "classical" set of trGTPases. It is now clear that many bacterial genomes do not code for this protein. On the other hand, LepA has been considered an obscure, auxiliary GTPase. The nearly ubiquitous presence of the gene for LepA in bacterial genomes calls for more attention to this protein. The unexpected divergence of the EF-G subfamily in many bacteria also points to a very exciting, still unanswered question.
Collection of sequences
The complete sequences of 191 bacterial genomes and annotated protein sequences were obtained from the RefSeq database  created on 10th of January, 2005. Additional unannotated genes were searched by running TBLASTN  against the intergenic regions of all 191 genomes with the following translational GTPases: IF-2, EF-Tu, SelB, EF-G, RF-3, TypA, LepA, CysN from Escherichia coli and Tet from Bacillus cereus. Matches with similarity more than 40% and match/query length ratios more than 70% were added, and thus the "updated protein database" was created.
The preliminary trGTPase dataset was obtained by running BLAST  against "updated protein database" with E-value cutoff 1 using Escherichia coli EF-Tu as a query. Multiple alignment was created by aligning all sequences against Hidden Markov Model GTP_EFTU from Pfam  using program HMMALIGN . Unaligned ends and columns that contained more than 70% gaps were removed by the multiple sequence alignment editor BELVU . Sequences with disrupted Walker A (G-1), Walker B (G-3) or guanine-specific binding domain (G-4) were rejected.
A phylogenetic tree was built using only the GTPase domain with the programs PROTDIST (JTT distances), NEIGHBOR and CONSENSE from the PHYLIP package . Alternative trees for EF-G/Tet branch were drawn using TREEPUZZLE . Trees were visualized using MEGA3 . Nine clearly separated branches on the tree with high bootstrap values (> 85%) were used to build branch-specific HMMs. Sequences from each branch were aligned using CLUSTALW  and poorly-aligned ends were trimmed with BELVU . From each branch-specific alignment a global HMM was built using HMMBUILD and calibrated using HMMCALIBRATE . These HMMs were used for more specific searches and grouping of trGTPases with HMMSEARCH  against the "updated protein database". Searches were repeated at higher sensitivity levels until no more trGTPases were detected (Table 1).
To avoid artificial grouping of other GTPases into trGTPase subfamilies, thirty outgroup HMMs were built starting with 28 known TRAFAC GTPases, excluding trGTPases , CysC and CysD. Additional members of each outgroup were collected by running a BLAST search against "updated protein database" and keeping matches with E < 1e-40 and match length > 80% of query.
The APS kinase domain PF01583.8 from the Pfam database , absent from CysN and present in NodQ, was used to identify NodQ proteins. The ATP sulfurylase phylogenetically unrelated to CysN (ATPS1) was identified with Pfam domain PF01747.7 [101, 102, 105]. The IF2N domain was identified using Pfam domain PF04760 .
Manual validation of trGTPase genes
We used a two-step decision scheme to eliminate protein sequences that cannot act as functional trGTPases. The first filter is based on minimal acceptable protein length (set at 2/3 of the average length of the members of a given subfamily) and integrity of the GTPase domain consensus elements (G1, G3, G4), which eliminates partial proteins (usually parts of pseudogenes annotated as ORFs) and proteins that are not GTPases . We progressed further by analyzing why these seemingly non-functional proteins gave high scores in homology searches with HMMSEARCH. To analyze these cases at the genome level we used Artemis  and SHOWORF, PLOTORF and PRETTYPLOT from the EMBOSS package . We found that in most cases a functional protein could be restored by an alternative gene start or frame-shift (17 cases), which we consider "restorable functionality". In only a few cases is there a partial gene in the genome (1) or genes inactivated by insertion (2 cases, 4 parts). Proteins with "restorable functionality" were added to the set of identified trGTPases.
To calculate a 16S rRNA-based phylogenetic tree, aligned rRNA sequences were obtained from the ribosomal RNA database RDP-II . Columns with more than 80% gaps were removed from the alignment. The phylogenetic tree of ribosomal genes was calculated using fastDNAml .
Nilsson J, Nissen P: Elongation factors on the ribosome. Curr Opin Struct Biol. 2005, 15 (3): 349-354. 10.1016/j.sbi.2005.05.004.
Ramakrishnan V: Ribosome structure and the mechanism of translation. Cell. 2002, 108 (4): 557-572. 10.1016/S0092-8674(02)00619-0.
Allen GS, Zavialov A, Gursky R, Ehrenberg M, Frank J: The cryo-EM structure of a translation initiation complex from Escherichia coli. Cell. 2005, 121 (5): 703-712. 10.1016/j.cell.2005.03.023.
Sergiev PV, Bogdanov AA, Dontsova OA: How can elongation factors EF-G and EF-Tu discriminate the functional state of the ribosome using the same binding site?. FEBS Lett. 2005, 579 (25): 5439-5442.
Marzi S, Knight W, Brandi L, Caserta E, Soboleva N, Hill WE, Gualerzi CO, Lodmell JS: Ribosomal localization of translation initiation factor IF2. Rna. 2003, 9 (8): 958-969. 10.1261/rna.2116303.
Cameron DM, Thompson J, March PE, Dahlberg AE: Initiation factor IF2, thiostrepton and micrococcin prevent the binding of elongation factor G to the Escherichia coli ribosome. J Mol Biol. 2002, 319 (1): 27-35. 10.1016/S0022-2836(02)00235-8.
Cousineau B, Leclerc F, Cedergren R: On the origin of protein synthesis factors: a gene duplication/fusion model. J Mol Evol. 1997, 45 (6): 661-670. 10.1007/PL00006270.
Sikora AE, Zielke R, Datta K, Maddock JR: The Vibrio harveyi GTPase CgtAV is essential and is associated with the 50S ribosomal subunit. J Bacteriol. 2006, 188 (3): 1205-1210. 10.1128/JB.188.3.1205-1210.2006.
Himeno H, Hanawa-Suetsugu K, Kimura T, Takagi K, Sugiyama W, Shirata S, Mikami T, Odagiri F, Osanai Y, Watanabe D, Goto S, Kalachnyuk L, Ushida C, Muto A: A novel GTPase activated by the small subunit of ribosome. Nucleic Acids Res. 2004, 32 (17): 5303-5309. 10.1093/nar/gkh861.
Daigle DM, Brown ED: Studies of the interaction of Escherichia coli YjeQ with the ribosome in vitro. J Bacteriol. 2004, 186 (5): 1381-1387. 10.1128/JB.186.5.1381-1387.2004.
Zhang S, Haldenwang WG: Guanine nucleotides stabilize the binding of Bacillus subtilis Obg to ribosomes. Biochem Biophys Res Commun. 2004, 322 (2): 565-569. 10.1016/j.bbrc.2004.07.154.
Wout P, Pu K, Sullivan SM, Reese V, Zhou S, Lin B, Maddock JR: The Escherichia coli GTPase CgtAE cofractionates with the 50S ribosomal subunit and interacts with SpoT, a ppGpp synthetase/hydrolase. J Bacteriol. 2004, 186 (16): 5249-5257. 10.1128/JB.186.16.5249-5257.2004.
Caldon CE, Yoong P, March PE: Evolution of a molecular switch: universal bacterial GTPases regulate ribosome function. Mol Microbiol. 2001, 41 (2): 289-297. 10.1046/j.1365-2958.2001.02536.x.
Caldon CE, March PE: Function of the universally conserved bacterial GTPases. Curr Opin Microbiol. 2003, 6 (2): 135-139. 10.1016/S1369-5274(03)00037-7.
Pandit SB, Srinivasan N: Survey for g-proteins in the prokaryotic genomes: prediction of functional roles based on classification. Proteins. 2003, 52 (4): 585-597. 10.1002/prot.10420.
Leipe DD, Wolf YI, Koonin EV, Aravind L: Classification and evolution of P-loop GTPases and related ATPases. J Mol Biol. 2002, 317 (1): 41-72. 10.1006/jmbi.2001.5378.
Pruitt KD, Tatusova T, Maglott DR: NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2005, 33 (Database issue): D501-4. 10.1093/nar/gki025.
Eddy SR: Profile hidden Markov models. Bioinformatics. 1998, 14 (9): 755-763. 10.1093/bioinformatics/14.9.755.
Bourne HR, Sanders DA, McCormick F: The GTPase superfamily: a conserved switch for diverse cell functions. Nature. 1990, 348 (6297): 125-132. 10.1038/348125a0.
Butler JS, Springer M, Grunberg-Manago M: AUU-to-AUG mutation in the initiator codon of the translation initiation factor IF3 abolishes translational autocontrol of its own gene (infC) in vivo. Proc Natl Acad Sci U S A. 1987, 84 (12): 4022-4025. 10.1073/pnas.84.12.4022.
Binns N, Masters M: Expression of the Escherichia coli pcnB gene is translationally limited using an inefficient start codon: a second chromosomal example of translation initiated at AUU. Mol Microbiol. 2002, 44 (5): 1287-1298. 10.1046/j.1365-2958.2002.02945.x.
Gurvich OL, Baranov PV, Zhou J, Hammer AW, Gesteland RF, Atkins JF: Sequences that direct significant levels of frameshifting are frequent in coding regions of Escherichia coli. Embo J. 2003, 22 (21): 5941-5950. 10.1093/emboj/cdg561.
Felsenstein J: PHYLIP (Phylogeny Inference Package) version 3.63. 2004, [http://evolution.genetics.washington.edu/phylip.html]
Schmidt HA, Strimmer K, Vingron M, von Haeseler A: TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics. 2002, 18 (3): 502-504. 10.1093/bioinformatics/18.3.502.
Garrity GM Bell, J. A., Lilburn, T. G.: Taxonomic Outline of the Procaryotes. Bergey's Manual of Systematic Bacteriology, Second Edition. 2004, Springer-Verlag, 5:
Gualerzi CO, Pon CL: Initiation of mRNA translation in prokaryotes. Biochemistry. 1990, 29 (25): 5881-5889. 10.1021/bi00477a001.
Laursen BS, Sorensen HP, Mortensen KK, Sperling-Petersen HU: Initiation of protein synthesis in bacteria. Microbiol Mol Biol Rev. 2005, 69 (1): 101-123. 10.1128/MMBR.69.1.101-123.2005.
Grunberg-Manago M, Dessen P, Pantaloni D, Godefroy-Colburn T, Wolfe AD, Dondon J: Light-scattering studies showing the effect of initiation factors on the reversible dissociation of Escherichia coli ribosomes. J Mol Biol. 1975, 94 (3): 461-478. 10.1016/0022-2836(75)90215-6.
Antoun A, Pavlov MY, Andersson K, Tenson T, Ehrenberg M: The roles of initiation factor 2 and guanosine triphosphate in initiation of protein synthesis. Embo J. 2003, 22 (20): 5593-5601. 10.1093/emboj/cdg525.
Lee JH, Choi SK, Roll-Mecak A, Burley SK, Dever TE: Universal conservation in translation initiation revealed by human and archaeal homologs of bacterial translation initiation factor IF2. Proc Natl Acad Sci U S A. 1999, 96 (8): 4342-4347. 10.1073/pnas.96.8.4342.
Kyrpides NC, Woese CR: Archaeal translation initiation revisited: the initiation factor 2 and eukaryotic initiation factor 2B alpha-beta-delta subunit families. Proc Natl Acad Sci U S A. 1998, 95 (7): 3726-3730. 10.1073/pnas.95.7.3726.
Hashimoto M, Ichimura T, Mizoguchi H, Tanaka K, Fujimitsu K, Keyamura K, Ote T, Yamakawa T, Yamazaki Y, Mori H, Katayama T, Kato J: Cell size and nucleoid organization of engineered Escherichia coli cells with a reduced genome. Mol Microbiol. 2005, 55 (1): 137-149. 10.1111/j.1365-2958.2004.04386.x.
Laursen BS, de ASSA, Hedegaard J, Moreno JM, Mortensen KK, Sperling-Petersen HU: Structural requirements of the mRNA for intracistronic translation initiation of the enterobacterial infB gene. Genes Cells. 2002, 7 (9): 901-910. 10.1046/j.1365-2443.2002.00571.x.
Nyengaard NR, Mortensen KK, Lassen SF, Hershey JW, Sperling-Petersen HU: Tandem translation of E. coli initiation factor IF2 beta: purification and characterization in vitro of two active forms. Biochem Biophys Res Commun. 1991, 181 (3): 1572-1579. 10.1016/0006-291X(91)92118-4.
Hubert M, Nyengaard NR, Shazand K, Mortensen KK, Lassen SF, Grunberg-Manago M, Sperling-Petersen HU: Tandem translation of Bacillus subtilis initiation factor IF2 in E. coli. Over-expression of infBB.su in E. coli and purification of alpha- and beta-forms of IF2B.su. FEBS Lett. 1992, 312 (2-3): 132-138. 10.1016/0014-5793(92)80920-C.
Antoun A, Pavlov MY, Tenson T, Ehrenberg MM: Ribosome formation from subunits studied by stopped-flow and Rayleigh light scattering. Biol Proced Online. 2004, 6: 35-54. 10.1251/bpo71.
Howe JG, Hershey JW: Initiation factor and ribosome levels are coordinately controlled in Escherichia coli growing at different rates. J Biol Chem. 1983, 258 (3): 1954-1959.
Sacerdot C, Vachon G, Laalami S, Morel-Deville F, Cenatiempo Y, Grunberg-Manago M: Both forms of translational initiation factor IF2 (alpha and beta) are required for maximal growth of Escherichia coli. Evidence for two translational initiation codons for IF2 beta. J Mol Biol. 1992, 225 (1): 67-80. 10.1016/0022-2836(92)91026-L.
Laursen BS, Mortensen KK, Sperling-Petersen HU, Hoffman DW: A conserved structural motif at the N terminus of bacterial translation initiation factor IF2. J Biol Chem. 2003, 278 (18): 16320-16328. 10.1074/jbc.M212960200.
Rodnina MV, Gromadski KB, Kothe U, Wieden HJ: Recognition and selection of tRNA in translation. FEBS Lett. 2005, 579 (4): 938-942. 10.1016/j.febslet.2004.11.048.
Ogle JM, Ramakrishnan V: Structural insights into translational fidelity. Annu Rev Biochem. 2005, 74: 129-177. 10.1146/annurev.biochem.74.061903.155440.
Thompson RC, Stone PJ: Proofreading of the codon-anticodon interaction on ribosomes. Proc Natl Acad Sci U S A. 1977, 74 (1): 198-202. 10.1073/pnas.74.1.198.
Ruusala T, Ehrenberg M, Kurland CG: Is there proofreading during polypeptide synthesis?. Embo J. 1982, 1 (6): 741-745.
Kawashima T, Berthet-Colominas C, Wulff M, Cusack S, Leberman R: The structure of the Escherichia coli EF-Tu.EF-Ts complex at 2.5 A resolution. Nature. 1996, 379 (6565): 511-518. 10.1038/379511a0.
Wang Y, Jiang Y, Meyering-Voss M, Sprinzl M, Sigler PB: Crystal structure of the EF-Tu.EF-Ts complex from Thermus thermophilus. Nat Struct Biol. 1997, 4 (8): 650-656. 10.1038/nsb0897-650.
Kaziro Y: The role of guanosine 5'-triphosphate in polypeptide chain elongation. Biochim Biophys Acta. 1978, 505(1): 95-127.
Lathe WC, Bork P: Evolution of tuf genes: ancient duplication, differential loss and gene conversion. FEBS Lett. 2001, 502 (3): 113-116. 10.1016/S0014-5793(01)02639-4.
Vijgenboom E, Bosch L: Translational frameshifts induced by mutant species of the polypeptide chain elongation factor Tu of Escherichia coli. J Biol Chem. 1989, 264 (22): 13012-13017.
Abdulkarim F, Hughes D: Homologous recombination between the tuf genes of Salmonella typhimurium. J Mol Biol. 1996, 260 (4): 506-522. 10.1006/jmbi.1996.0418.
Hughes D: Co-evolution of the tuf genes links gene conversion with the generation of chromosomal inversions. J Mol Biol. 2000, 297 (2): 355-364. 10.1006/jmbi.2000.3587.
Arwidsson O, Hughes D: Evidence against reciprocal recombination as the basis for tuf gene conversion in Salmonella enterica serovar Typhimurium. J Mol Biol. 2004, 338 (3): 463-467. 10.1016/j.jmb.2004.03.002.
Liao D: Gene conversion drives within genic sequences: concerted evolution of ribosomal RNA genes in bacteria and archaea. J Mol Evol. 2000, 51 (4): 305-317.
Hillis DM, Moritz C, Porter CA, Baker RJ: Evidence for biased gene conversion in concerted evolution of ribosomal DNA. Science. 1991, 251 (4991): 308-310. 10.1126/science.1987647.
Acinas SG, Marcelino LA, Klepac-Ceraj V, Polz MF: Divergence and redundancy of 16S rRNA sequences in genomes with multiple rrn operons. J Bacteriol. 2004, 186 (9): 2629-2635. 10.1128/JB.186.9.2629-2635.2004.
Dorner S, Brunelle JL, Sharma D, Green R: The hybrid state of tRNA binding is an authentic translation elongation intermediate. Nat Struct Mol Biol. 2006, 13 (3): 234-241. 10.1038/nsmb1060.
Zavialov AV, Hauryliuk VV, Ehrenberg M: Guanine-nucleotide exchange on ribosome-bound elongation factor G initiates the translocation of tRNAs. J Biol. 2005, 4 (2): 9-10.1186/jbiol24.
Katunin VI, Savelsbergh A, Rodnina MV, Wintermeyer W: Coupling of GTP hydrolysis by elongation factor G to translocation and factor recycling on the ribosome. Biochemistry. 2002, 41 (42): 12806-12812. 10.1021/bi0264871.
Wintermeyer W, Savelsbergh A, Semenkov YP, Katunin VI, Rodnina MV: Mechanism of elongation factor G function in tRNA translocation on the ribosome. Cold Spring Harb Symp Quant Biol. 2001, 66: 449-458. 10.1101/sqb.2001.66.449.
Savelsbergh A, Mohr D, Kothe U, Wintermeyer W, Rodnina MV: Control of phosphate release from elongation factor G by ribosomal protein L7/12. Embo J. 2005, 24 (24): 4316-4323. 10.1038/sj.emboj.7600884.
Diaconu M, Kothe U, Schlunzen F, Fischer N, Harms JM, Tonevitsky AG, Stark H, Rodnina MV, Wahl MC: Structural basis for the function of the ribosomal L7/12 stalk in factor binding and GTPase activation. Cell. 2005, 121 (7): 991-1004. 10.1016/j.cell.2005.04.015.
Hirokawa G, Kiel MC, Muto A, Selmer M, Raj VS, Liljas A, Igarashi K, Kaji H, Kaji A: Post-termination complex disassembly by ribosome recycling factor, a functional tRNA mimic. Embo J. 2002, 21 (9): 2272-2281. 10.1093/emboj/21.9.2272.
Zavialov AV, Hauryliuk VV, Ehrenberg M: Splitting of the posttermination ribosome into subunits by the concerted action of RRF and EF-G. Mol Cell. 2005, 18 (6): 675-686. 10.1016/j.molcel.2005.05.016.
Peske F, Rodnina MV, Wintermeyer W: Sequence of steps in ribosome recycling as defined by kinetic analysis. Mol Cell. 2005, 18 (4): 403-412. 10.1016/j.molcel.2005.04.009.
Fujiwara T, Ito K, Yamami T, Nakamura Y: Ribosome recycling factor disassembles the post-termination ribosomal complex independent of the ribosomal translocase activity of elongation factor G. Mol Microbiol. 2004, 53 (2): 517-528. 10.1111/j.1365-2958.2004.04156.x.
Chopra I, Roberts M: Tetracycline antibiotics: mode of action, applications, molecular biology, and epidemiology of bacterial resistance. Microbiol Mol Biol Rev. 2001, 65 (2): 232-60 ; second page, table of contents. 10.1128/MMBR.65.2.232-260.2001.
Roberts MC: Update on acquired tetracycline resistance genes. FEMS Microbiol Lett. 2005, 245 (2): 195-203. 10.1016/j.femsle.2005.02.034.
Connell SR, Tracz DM, Nierhaus KH, Taylor DE: Ribosomal protection proteins and their mechanism of tetracycline resistance. Antimicrob Agents Chemother. 2003, 47 (12): 3675-3681. 10.1128/AAC.47.12.3675-3681.2003.
Connell SR, Trieber CA, Dinos GP, Einfeldt E, Taylor DE, Nierhaus KH: Mechanism of Tet(O)-mediated tetracycline resistance. Embo J. 2003, 22 (4): 945-953. 10.1093/emboj/cdg093.
Spahn CM, Blaha G, Agrawal RK, Penczek P, Grassucci RA, Trieber CA, Connell SR, Taylor DE, Nierhaus KH, Frank J: Localization of the ribosomal protection protein Tet(O) on the ribosome and the mechanism of tetracycline resistance. Mol Cell. 2001, 7 (5): 1037-1045. 10.1016/S1097-2765(01)00238-6.
March PE, Inouye M: GTP-binding membrane protein of Escherichia coli with sequence homology to initiation factor 2 and elongation factors Tu and G. Proc Natl Acad Sci U S A. 1985, 82 (22): 7500-7504. 10.1073/pnas.82.22.7500.
Colca JR, McDonald WG, Waldon DJ, Thomasco LM, Gadwood RC, Lund ET, Cavey GS, Mathews WR, Adams LD, Cecil ET, Pearson JD, Bock JH, Mott JE, Shinabarger DL, Xiong L, Mankin AS: Cross-linking in the living cell locates the site of action of oxazolidinone antibiotics. J Biol Chem. 2003, 278 (24): 21972-21979. 10.1074/jbc.M302109200.
Qin Y, Polacek N, Vesper O, Staub E, Einfeldt E, Wilson DN, Nierhaus KH: The highly conserved LepA is a ribosomal elongation factor that back-translocates the ribosome. Cell. 2006, 127 (4): 721-733. 10.1016/j.cell.2006.09.037.
Dibb NJ, Wolfe PB: lep operon proximal gene is not required for growth or secretion by Escherichia coli. J Bacteriol. 1986, 166 (1): 83-87.
Ito K, Uno M, Nakamura Y: A tripeptide 'anticodon' deciphers stop codons in messenger RNA. Nature. 2000, 403 (6770): 680-684. 10.1038/35001115.
Petry S, Brodersen DE, Murphy FV, Dunham CM, Selmer M, Tarry MJ, Kelley AC, Ramakrishnan V: Crystal structures of the ribosome in complex with release factors RF1 and RF2 bound to a cognate stop codon. Cell. 2005, 123 (7): 1255-1266. 10.1016/j.cell.2005.09.039.
Zavialov AV, Buckingham RH, Ehrenberg M: A posttermination ribosomal complex is the guanine nucleotide exchange factor for peptide release factor RF3. Cell. 2001, 107 (1): 115-124. 10.1016/S0092-8674(01)00508-6.
Freistroffer DV, Pavlov MY, MacDougall J, Buckingham RH, Ehrenberg M: Release factor RF3 in E.coli accelerates the dissociation of release factors RF1 and RF2 from the ribosome in a GTP-dependent manner. Embo J. 1997, 16 (13): 4126-4133. 10.1093/emboj/16.13.4126.
Grentzmann G, Brechemier-Baey D, Heurgue V, Mora L, Buckingham RH: Localization and characterization of the gene encoding release factor RF3 in Escherichia coli. Proc Natl Acad Sci U S A. 1994, 91 (13): 5848-5852. 10.1073/pnas.91.13.5848.
Mikuni O, Ito K, Moffat J, Matsumura K, McCaughan K, Nobukuni T, Tate W, Nakamura Y: Identification of the prfC gene, which encodes peptide-chain-release factor 3 of Escherichia coli. Proc Natl Acad Sci U S A. 1994, 91 (13): 5798-5802. 10.1073/pnas.91.13.5798.
Inagaki Y, Ford Doolittle W: Evolution of the eukaryotic translation termination system: origins of release factors. Mol Biol Evol. 2000, 17 (6): 882-889.
Shin BS, Maag D, Roll-Mecak A, Arefin MS, Burley SK, Lorsch JR, Dever TE: Uncoupling of initiation factor eIF5B/IF2 GTPase and translational activities by mutations that lower ribosome affinity. Cell. 2002, 111 (7): 1015-1025. 10.1016/S0092-8674(02)01171-6.
Bock A, Forchhammer K, Heider J, Leinfelder W, Sawers G, Veprek B, Zinoni F: Selenocysteine: the 21st amino acid. Mol Microbiol. 1991, 5 (3): 515-520. 10.1111/j.1365-2958.1991.tb00722.x.
Zinoni F, Heider J, Bock A: Features of the formate dehydrogenase mRNA necessary for decoding of the UGA codon as selenocysteine. Proc Natl Acad Sci U S A. 1990, 87 (12): 4660-4664. 10.1073/pnas.87.12.4660.
Huttenhofer A, Heider J, Bock A: Interaction of the Escherichia coli fdhF mRNA hairpin promoting selenocysteine incorporation with the ribosome. Nucleic Acids Res. 1996, 24 (20): 3903-3910. 10.1093/nar/24.20.3903.
Yoshizawa S, Rasubala L, Ose T, Kohda D, Fourmy D, Maenaka K: Structural basis for mRNA recognition by elongation factor SelB. Nat Struct Mol Biol. 2005, 12 (2): 198-203. 10.1038/nsmb890.
Leibundgut M, Frick C, Thanbichler M, Bock A, Ban N: Selenocysteine tRNA-specific elongation factor SelB is a structural chimaera of elongation and initiation factors. Embo J. 2005, 24 (1): 11-22. 10.1038/sj.emboj.7600505.
Commans S, Bock A: Selenocysteine inserting tRNAs: an overview. FEMS Microbiol Rev. 1999, 23 (3): 335-351. 10.1111/j.1574-6976.1999.tb00403.x.
Matsugi J, Murao K: Genomic investigation of the system for selenocysteine incorporation in the bacterial domain. Biochim Biophys Acta. 2004, 1676 (1): 23-32.
Romero H, Zhang Y, Gladyshev VN, Salinas G: Evolution of selenium utilization traits. Genome Biol. 2005, 6 (8): R66-10.1186/gb-2005-6-8-r66.
Farris M, Grant A, Richardson TB, O'Connor CD: BipA: a tyrosine-phosphorylated GTPase that mediates interactions between enteropathogenic Escherichia coli (EPEC) and epithelial cells. Mol Microbiol. 1998, 28 (2): 265-279. 10.1046/j.1365-2958.1998.00793.x.
Grant AJ, Farris M, Alefounder P, Williams PH, Woodward MJ, O'Connor CD: Co-ordination of pathogenicity island expression by the BipA GTPase in enteropathogenic Escherichia coli (EPEC). Mol Microbiol. 2003, 48 (2): 507-521. 10.1046/j.1365-2958.2003.t01-1-03447.x.
Rowe S, Hodson N, Griffiths G, Roberts IS: Regulation of the Escherichia coli K5 capsule gene cluster: evidence for the roles of H-NS, BipA, and integration host factor in regulation of group 2 capsule gene clusters in pathogenic E. coli. J Bacteriol. 2000, 182 (10): 2741-2745. 10.1128/JB.182.10.2741-2745.2000.
Pfennig PL, Flower AM: BipA is required for growth of Escherichia coi K12 at low temperature. Mol Genet Genomics. 2001, 266 (2): 313-317. 10.1007/s004380100559.
Kiss E, Huguet T, Poinsot V, Batut J: The typA gene is required for stress adaptation as well as for symbiosis of Sinorhizobium meliloti 1021 with certain Medicago truncatula lines. Mol Plant Microbe Interact. 2004, 17 (3): 235-244.
Owens RM, Pritchard G, Skipp P, Hodey M, Connell SR, Nierhaus KH, O'Connor CD: A dedicated translation factor controls the synthesis of the global regulator Fis. Embo J. 2004, 23 (16): 3375-3385. 10.1038/sj.emboj.7600343.
Kredich KM: Biosynthesis of Cysteine. Escherichia coli and Salmonella: Cellular and Molecular Biology Neidhardt FC, Curtis III R, Ingraham JL, Lin ECC, Low KB, Magasanik B, Reznikoff WS, Riley M, Schaechter M, Umbarger HE , editors. 1996, Washington DC , ASM Press, 2: 514–527-
Schwedock JS, Long SR: Rhizobium meliloti genes involved in sulfate activation: the two copies of nodPQ and a new locus, saa. Genetics. 1992, 132 (4): 899-909.
Schwedock J, Long SR: ATP sulphurylase activity of the nodP and nodQ gene products of Rhizobium meliloti. Nature. 1990, 348 (6302): 644-647. 10.1038/348644a0.
Inagaki Y, Doolittle WF, Baldauf SL, Roger AJ: Lateral transfer of an EF-1alpha gene: origin and evolution of the large subunit of ATP sulfurylase in eubacteria. Curr Biol. 2002, 12 (9): 772-776. 10.1016/S0960-9822(02)00816-3.
Mougous JD, Lee DH, Hubbard SC, Schelle MW, Vocadlo DJ, Berger JM, Bertozzi CR: Molecular basis for G protein control of the prokaryotic ATP sulfurylase. Mol Cell. 2006, 21 (1): 109-122. 10.1016/j.molcel.2005.10.034.
Rosenthal E, Leustek T: A multifunctional Urechis caupo protein, PAPS synthetase, has both ATP sulfurylase and APS kinase activities. Gene. 1995, 165 (2): 243-248. 10.1016/0378-1119(95)00450-K.
Kurima K, Warman ML, Krishnan S, Domowicz M, Krueger RC, Deyrup A, Schwartz NB: A member of a family of sulfate-activating enzymes causes murine brachymorphism. Proc Natl Acad Sci U S A. 1998, 95 (15): 8681-8685. 10.1073/pnas.95.15.8681.
NCBI: Bacterial sequence database. [ftp://ftp.ncbi.nih.gov/genomes/Bacteria/]
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25 (17): 3389-3402. 10.1093/nar/25.17.3389.
Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths-Jones S, Khanna A, Marshall M, Moxon S, Sonnhammer EL, Studholme DJ, Yeats C, Eddy SR: The Pfam protein families database. Nucleic Acids Res. 2004, 32 (Database issue): D138-41. 10.1093/nar/gkh121.
Sonnhammer EL, Hollich V: Scoredist: a simple and robust protein sequence distance estimator. BMC Bioinformatics. 2005, 6 (1): 108-10.1186/1471-2105-6-108.
Kumar S, Tamura K, Nei M: MEGA3: Integrated software for Molecular Evolutionary Genetics Analysis and sequence alignment. Brief Bioinform. 2004, 5 (2): 150-163. 10.1093/bib/5.2.150.
Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994, 22 (22): 4673-4680. 10.1093/nar/22.22.4673.
MacRae IJ, Rose AB, Segel IH: Adenosine 5'-phosphosulfate kinase from Penicillium chrysogenum. site-directed mutagenesis at putative phosphoryl-accepting and ATP P-loop residues. J Biol Chem. 1998, 273 (44): 28583-28589. 10.1074/jbc.273.44.28583.
Bourne HR, Sanders DA, McCormick F: The GTPase superfamily: conserved structure and molecular mechanism. Nature. 1991, 349 (6305): 117-127. 10.1038/349117a0.
Rutherford K, Parkhill J, Crook J, Horsnell T, Rice P, Rajandream MA, Barrell B: Artemis: sequence visualization and annotation. Bioinformatics. 2000, 16 (10): 944-945. 10.1093/bioinformatics/16.10.944.
Rice P, Longden I, Bleasby A: EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 2000, 16 (6): 276-277. 10.1016/S0168-9525(00)02024-2.
Cole JR, Chai B, Farris RJ, Wang Q, Kulam SA, McGarrell DM, Garrity GM, Tiedje JM: The Ribosomal Database Project (RDP-II): sequences and tools for high-throughput rRNA analysis. Nucleic Acids Res. 2005, 33 (Database issue): D294-6. 10.1093/nar/gki038.
Olsen GJ, Matsuda H, Hagstrom R, Overbeek R: fastDNAmL: a tool for construction of phylogenetic trees of DNA sequences using maximum likelihood. Comput Appl Biosci. 1994, 10 (1): 41-48.
We thank Umesh Varshney for discussions initiating the current study. We thank Ülo Maiväli, Niilo Kaldalu and Måns Ehrenberg for valuable comments on the manuscript. This work was supported by The Wellcome Trust International Senior Fellowship (070210/Z/03/Z)(TT), by the Estonian Science Foundation grant no. 6768 (TT) and by the Estonian Science Foundation grant no. 6041 (MR). The English language was corrected by Biomedes, UK.
TM carried out the analysis and helped to draft the manuscript. MR and TT conceived of the study, and participated in its design and coordination and drafted the manuscript. All authors read and approved the final manuscript.
Electronic supplementary material
Additional file 2: Full list of proteins found by the program HMMSEARCH with the threshold E-1. The first part contains a list of 1314 intact proteins; the second part contains a list of exceptions that were later included in the dataset (17 proteins); the third part contains a list of excluded proteins (5). (XLS 152 KB)
Authors’ original submitted files for images
Below are the links to the authors’ original submitted files for images.