In silico identification of genes involved in selenium metabolism: evidence for a third selenium utilization trait

Background Selenium (Se) is a trace element that occurs in proteins in the form of selenocysteine (Sec) and in tRNAs in the form of selenouridine (SeU). Selenophosphate synthetase (SelD) is required for both utilization traits. However, previous research also revealed SelDs in two organisms lacking Sec and SeU, suggesting a possible additional use of Se that is dependent on SelD. Results In this study, we conducted comparative genomics and phylogenetic analyses to characterize genes involved in Se utilization. Candidate genes identified included SelA/SelB and YbbB that define Sec and SeU pathways, respectively, and NADH oxidoreductase that is predicted to generate a SelD substrate. In addition, among 227 organisms containing SelD, 10 prokaryotes were identified that lacked SelA/SelB and YbbB. Investigation of selD neighboring genes in these organisms revealed a SirA-like protein and two hypothetical proteins HP1 and HP2 that were strongly linked to a novel Se utilization. With these new signature proteins, 32 bacteria and archaea were found that utilized these proteins, likely as part of the new Se utilization trait. Metabolic labeling of one organism containing an orphan SelD, Enterococcus faecalis, with 75Se revealed a protein containing labile Se species that could be released by treatment with reducing agents, suggesting non-Sec utilization of Se in this organism. Conclusion These studies suggest the occurrence of a third Se utilization trait in bacteria and archaea.


Background
Selenium (Se) is an essential micronutrient for many organisms in the three domains of life. The best known biological functions of Se are exerted by selenocysteine (Sec) residues [1][2][3]. Sec, known as the 21st amino acid, is co-translationally inserted into proteins by recoding an opal (UGA) codon from stop to Sec function [4][5][6]. These UGA codons are recognized by a complex molecular machinery that has common core components, but also differences among the three domains of life [3,4][7][8][9][10].
The mechanism of Sec insertion into protein in response to UGA has been most thoroughly elucidated in Escherichia coli [3,4][11][12][13]. Bacterial selenoprotein mRNAs carry a Sec insertion sequence (SECIS) element immediately downstream of Sec-encoding UGA codons [4,5]. The SECIS element binds a Sec-specific elongation factor, SelB, and forms a complex with tRNASec (SelC), whose UCA anticodon matches the UGA codon. tRNASec is initially acylated with serine by seryl-tRNA synthetase and is then converted to Sec-tRNASec by Sec synthase (SelA). SelA utilizes selenophosphate as the selenium donor, which in turn is synthesized by selenophosphate synthetase (SelD).
In some prokaryotes, Se (also in the form of selenophosphate) is used for the biosynthesis of a modified tRNA nucleotide, 5-methylaminomethyl-2-selenouridine (mnm5Se2U, or SeU), which is located in the wobble position of the anticodons of tRNALys, tRNAGlu, and tRNAGln [14][15][16]. The proposed function of SeU involves codon-anticodon interactions that help base pair discrimination at the wobble position and/or translation efficiency [16,17]. A 2-selenouridine synthase (YbbB) is necessary to replace a sulfur atom in 2-thiouridine in these tRNAs with Se [18].
In addition to Sec and SeU, Se can be utilized in the form of a cofactor in certain molybdenum (Mo)-containing hydroxylases [19][20][21][22][23]. Nicotinic acid hydroxylase and xanthine dehydrogenase are the best known representatives of this protein class. In these enzymes, Se is covalently bound to Mo in the active site, but the specific structure of the Se cofactor is not known. In nicotinic acid hydroxylase, Se is lost during protein storage and during simple SDS-PAGE procedures [20]. These properties made it difficult to characterize this class of proteins and determine the mechanism of Se cofactor structure and biosynthesis.
Recently, we analyzed evolutionary dynamics of Sec and SeU utilization traits (i.e., analyzed genes involved in the corresponding biosynthetic pathways) in prokaryotes and reported the occurrence of orphan SelD proteins in two organisms lacking known components of Sec and SeU traits or genes encoding selenoproteins [24]. These organisms included a bacterium, Enterococcus faecalis, and an archaeon, Haloarcula marismortui. The SelD sequences in the two organisms are typical SelDs containing a conserved Cys residue in the predicted active site and clustering with other SelD sequences [24]. These proteins could be distinguished from thiamine-monophosphate kinase, the hydrogenase maturation factor HypE, and other proteins that, like SelDs, have an aminoimidazole ribonucleotide synthetase (AIRS) domain. The curious presence of orphan SelD in these prokaryotes suggested an additional, unknown use of Se that is dependent on SelD.
In this study, we carried out searches in completely sequenced prokaryotic genomes for machinery involved in Se utilization. Known components, i.e., SelA, SelB and YbbB, could be easily identified by comparative genomics, and the analyses also generated evidence for additional proteins involved. Since neighboring genes of selD may provide potential information regarding Se utilization, we further employed comparative genomics tools to identify candidate genes involved in the third, SelD-based trait, in prokaryotes. Finally, we identified many organisms containing this new trait and carried out experimental analyses in one such organism. Overall, these data provide evidence for an additional use of Se in nature.

Characterization of SelD-dependent pathways of Se utilization
Since SelD (COG0709) has been shown to be a key factor for Se utilization by generating a Se donor compound (and therefore discriminating Se from sulfur for further use), identification of functional linkages involving SelD may help characterize the pathways of Se utilization. We initially used STRING [25] (with the interactor set to COG mode) to examine such functional linkages based on neighborhood, gene fusion and co-occurrence analyses. The protein with the best score was YbbB, a SeU synthase (COG2603). This gene was often located in the same operon with selD and the two proteins also showed similar patterns of occurrence [24]. The next SelD link was SelB (COG3276), which was also identified by gene neighborhood and co-occurrence, but the linkage was independent of YbbB (because selD formed operons with either ybbB or selB but rarely with both). As expected, SelB was most closely linked with Sec synthase SelA (COG1921), which also showed a strong association with SelD.
The following SelD link was NADH oxidoreductase homolog (COG1252), which was fused with SelD in cyanobacteria and several other organisms [24]. This association suggests that NADH oxidoreductase may be the reductant for a Se compound and that the reduced form of this compound may be utilized by SelD for biosynthesis of selenophosphate. Like YbbB and SelA/SelB, NADH oxidoreductase function likely corresponds to the known use of Se. Excluding spurious predictions due to protein misannotation, the following hit was a SirA-like protein that belonged to COG0425 (predicted redox protein, regulator of disulfide bond formation). This protein was associated with SelD through gene neighborhood (i.e., location of genes next to each other, or in close vicinity, in the genome) predictions of STRING. Thus, STRING-based analysis identified known proteins involved in two pathways of Se utilization (e.g., SelA/SelB in the Sec pathway, and YbbB in the SeU pathway) and suggested the role of NADH dehydrogenase in generating a SelD substrate and of SirA-like protein in an unknown SelD-linked process.

Organisms with orphan SelDs
Next, we examined SelD-linked processes in more detail, and in particular identities of SelD-linked genes that showed no association with the Sec and SeU pathways. Among 589 sequenced prokaryotic genomes, we identified 227 SelD-containing organisms (219 bacteria and 8 archaea). Details are shown in Table S1 [see Additional file 1]. Of these, 140 bacteria and 7 archaea possessed the Sec pathway, whereas 148 bacteria and 6 archaea utilized SeU (the two traits partially overlapped). In addition, 10 SelD-containing organisms were found that had orphan SelD (i.e., they had SelD but lacked both SelABC and YbbB), including the previously described E. faecalis and H. marismortui. Additional such organisms included Anaerostipes caccae, Clostridium butyricum, C. phytofermentans, Faecalibacterium prausnitzii, Ruminococcus gnavus, R. obeum, R. torques and Vibrio shilonii. It should be noted that except for E. faecalis, H. marismortui and V. shilonii, all other species belonged to Firmicutes/Clostridia, where many species possess Sec and/or SeU utilization traits. However, previous studies have shown a highly dynamic evolution of both traits, which often results in the loss of these traits in bacterial phyla [24]. Moreover, of the 10 organisms with orphan SelDs, C. phytofermentans, E. faecalis and H. marismortui have been completely sequenced, and most other genomes are characterized by high sequence coverage (e.g., 9.8× for A. caccae, 8.53× for C. butyricum, 8.7× for R. gnavus, 8.9× for R. obeum and 11.6× for R. torques). In addition, selD typically clusters with Sec and/or SeU biosynthesis/insertion genes, whereas in these organisms, other genes cluster with selD. Thus, it is extremely unlikely that genes involved in Sec or SeU utilization are present in the genomes containing orphan SelD but have simply not been sequenced or annotated. The genomic context of selD in the three complete orphan SelD-containing genomes is shown in Fig. 1. Genomic context of the other 7 organisms is shown in Fig. S1 [see Additional file 2].
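The presence/absence logic used above to separate Sec, SeU and orphan SelD organisms can be sketched in a few lines of Python. This is only an illustrative sketch: the `GenomeProfile` record, the `classify_se_traits` function and the label strings are hypothetical names introduced here, and the rule that the Sec trait requires both SelA and SelB is a simplifying assumption rather than the exact criteria of [24].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenomeProfile:
    """Presence/absence of Se-related genes in one genome (hypothetical record)."""
    name: str
    has_seld: bool
    has_sela: bool
    has_selb: bool
    has_ybbb: bool


def classify_se_traits(genome: GenomeProfile) -> set:
    """Return trait labels for one genome.

    Assumption: Sec needs SelD + SelA + SelB, SeU needs SelD + YbbB,
    and an orphan SelD genome has SelD but none of SelA, SelB, YbbB.
    """
    if not genome.has_seld:
        return {"no SelD"}
    if not (genome.has_sela or genome.has_selb or genome.has_ybbb):
        return {"orphan SelD"}
    traits = set()
    if genome.has_sela and genome.has_selb:
        traits.add("Sec")
    if genome.has_ybbb:
        traits.add("SeU")
    # SelD plus an incomplete Sec machinery is left unresolved on purpose.
    return traits or {"unresolved"}


# Toy input consistent with the text: E. faecalis has an orphan SelD,
# whereas E. coli carries SelA, SelB, SelD and YbbB.
profiles = [
    GenomeProfile("Enterococcus faecalis", True, False, False, False),
    GenomeProfile("Escherichia coli", True, True, True, True),
]
for profile in profiles:
    print(profile.name, sorted(classify_se_traits(profile)))
```

In practice the boolean fields would be filled from the homology searches described in the Methods, so the classification is only as reliable as those searches.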

A third pathway of Se utilization
To investigate a possible new SelD-associated pathway, 10 genes upstream and downstream of selD in the genomes with orphan SelDs were examined in detail. First, we found that the sirA-like gene (EF2566; the SirA-like domain is located in the N-terminal part of the protein, and the C-terminal domain is a distant homolog of the DsrE family) is located next to selD in several organisms in different bacterial phyla.
Figure 1. Genomic context of selD in completely sequenced genomes containing orphan SelD. Candidate genes in the completed genomes of Clostridium phytofermentans, Enterococcus faecalis and Haloarcula marismortui ATCC 43049 plasmid pNG700 are color coded. Coding direction is also indicated.
Next, we found that two additional proteins, hypothetical proteins HP1 (EF2563) and HP2 (EF2564), co-occurred in SelD-containing organisms, including all orphan SelD-containing organisms, although homologs of each protein could also be found in organisms lacking SelD. Phylogenetic analyses of these two proteins are shown in Fig. 2. A total of 32 organisms (14% of all SelD-containing organisms, most of them Firmicutes/Clostridia) containing both HP1 and HP2 proteins as well as SelD were identified. We noticed that all HP1 sequences in organisms having SelD and HP2 were clustered in one subfamily, suggesting that these sequences might be functionally linked to SelD. Orthologs of HP1 were also found in additional SelD-containing organisms (e.g., Burkholderia vietnamiensis and Azorhizobium caulinodans) which did not have HP2. Although significant similarity was observed between sequences in this putative subfamily and other homologs (e.g., the e-value is 8e-13 and identity is over 33% between E. faecalis and B. vietnamiensis), multiple alignment of HP1 sequences suggested several specific residues which are only present in the SelD-linked subfamily (Fig. 3).
Therefore, it appears that these HP1 proteins form a separate subfamily which is involved in the third Se utilization trait, perhaps distinguished by some of these conserved residues. In addition, most HP2 sequences were found in organisms containing both SelD and HP1, except for 5 organisms which lacked HP1 (4 SelD-lacking and 1 SelD-containing). Previously, we observed that homologs of SelA, a key factor in Sec biosynthesis in bacteria, are also found in organisms that lack the Sec-decoding trait, suggesting that SelA (or its close homologs) might have acquired a new function in these organisms [24]. Similarly, HP1 or HP2 homologs may also have additional functions in organisms lacking SelD. Co-occurrence of SelD, HP1 and HP2 might provide an initial screen for identifying organisms with additional utilization of Se. Although two thirds of these organisms also possess either Sec or SeU utilization traits, the fact that 10 out of 32 organisms belonging to three different phyla (Firmicutes/Clostridia, Firmicutes/Lactobacillales and Proteobacteria/gamma/Vibrionaceae) possess orphan SelDs argues against the possibility that these organisms are simultaneously in the process of acquiring the Sec or SeU traits or of losing such traits.
Figure 2. Phylogenetic analysis of HP1 and HP2.
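The notion of residues that are strictly conserved in the SelD-linked HP1 subfamily but absent from other homologs can be illustrated with a short Python sketch. The function name `subgroup_specific_columns` and the toy sequences below are hypothetical and are not taken from the actual HP1 alignment; the criterion coded here (one identical non-gap residue across the subgroup that never occurs at that column in the other group) is one possible reading of "specific residues", stated as an assumption.

```python
def subgroup_specific_columns(subgroup, others):
    """Find alignment columns that distinguish a subgroup.

    Returns (column_index, residue) pairs, 0-based, where every sequence in
    `subgroup` carries the same non-gap residue and no sequence in `others`
    carries that residue at the same column.
    """
    sequences = list(subgroup) + list(others)
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all aligned sequences must have the same length")

    hits = []
    for i in range(length):
        column = {seq[i] for seq in subgroup}
        if len(column) != 1:
            continue  # not strictly conserved within the subgroup
        residue = next(iter(column))
        if residue == "-":
            continue  # a conserved gap is not informative
        if all(seq[i] != residue for seq in others):
            hits.append((i, residue))
    return hits


# Toy aligned sequences (hypothetical, for illustration only).
seld_linked = ["MKCAH", "MKCSH", "MRCAH"]
other_homologs = ["MKSAY", "MKTGF"]
print(subgroup_specific_columns(seld_linked, other_homologs))
```

With real data the two groups would be taken from the HP1 multiple alignment, split according to the subfamily structure of the phylogenetic tree.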
Other candidate genes in the vicinity of selD in the E. faecalis genome, e.g., EF2565 (hypothetical protein), EF2568 (COG0520, CsdB) and EF2572 (ModE), showed wider distribution than SelD itself. However, some members of these families were selD neighbors in several organisms, and some were even fused with HP1 or HP2. These included EF2569 (a MobA-related protein), EF2570 (this protein is annotated as aldehyde oxidoreductase which belongs to the xanthine oxidase family, and contains both [2Fe-2S]-binding and molybdopterin-binding domains) and EF2571 (COG1975, XdhC, xanthine and CO dehydrogenases maturation factor, XdhC/CoxF family). These observations suggest functional links among these proteins. Table 1 shows the genomic location of all candidate genes (including selD and sirA-like) in the 32 organisms defined by HP1, HP2 and SelD proteins. Although selD was found to be in the vicinity of HP1 and HP2 genes in only 5 organisms, all of them were orphan SelD-containing prokaryotes.
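The gene neighborhood criterion used in this section (10 genes upstream and downstream of selD) can be expressed as a simple window over an ordered gene list. The sketch below is a minimal illustration: `flanking_genes` and `in_vicinity` are hypothetical helper names, the gene labels are placeholders, and the replicon is treated as linear, so wrap-around on circular chromosomes is ignored.

```python
def flanking_genes(ordered_genes, target, window=10):
    """Return (upstream, downstream) genes within `window` positions of `target`.

    `ordered_genes` lists gene identifiers in genomic order on one replicon.
    Assumption: the replicon is treated as linear.
    """
    idx = ordered_genes.index(target)  # raises ValueError if target is absent
    upstream = ordered_genes[max(0, idx - window):idx]
    downstream = ordered_genes[idx + 1:idx + 1 + window]
    return upstream, downstream


def in_vicinity(ordered_genes, target, candidate, window=10):
    """True if `candidate` lies within `window` genes of `target`."""
    upstream, downstream = flanking_genes(ordered_genes, target, window)
    return candidate in upstream or candidate in downstream


# Placeholder gene order; the labels are illustrative, not real locus tags.
genes = ["geneA", "HP1", "HP2", "geneB", "sirA-like", "selD", "geneC", "geneD"]
print(flanking_genes(genes, "selD", window=2))
print(in_vicinity(genes, "selD", "HP1", window=10))
```

The window size is a parameter so that stricter or looser definitions of "vicinity" can be compared.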
The SirA-like protein was also found in all 32 organisms possessing HP1 and HP2 proteins although some of these organisms contained its slightly more distant homologs (e.g., H. marismortui and E. coli). As discussed above, selD is often clustered with sirA-like, and this situation was found in 12 out of 32 organisms. Close homologs of SirA-like protein were also detected in several organisms lacking HP1 and HP2, such as Methanococcus maripaludis (Sec-utilizing archaea), Clostridium perfringens and Thermoanaerobacter tengcongensis (Sec-utilizing bacteria) and Por-
Figure 3. Multiple alignment of hypothetical protein HP1. Representative sequences were divided into two groups: SelD-related subgroup and other homologs. Residues which are strictly conserved in the SelD-related subgroup are shown in red background. Other residues shown in white on black or grey are conserved in homologs. Organisms containing orphan SelD are highlighted in pink font.
To further examine functional linkages among SelD, SirA-like, HP1 and HP2 proteins, we again used STRING (neighborhood and co-occurrence tools) and searched for functional associations involving SirA-like, HP1 and HP2. Since no known COG is associated with HP1 or HP2, and SirA-like belongs to a large family of SirA proteins defined by a single COG, we selected a "search by protein sequence" option and set the interactor tool to a protein mode. Such configuration can provide maximum sensitivity although it has a slightly lower coverage compared with the COG mode [25]. Top candidates are shown in Table 2, and SelD was among these candidates for each examined protein. In addition, SelD was the top functional link for SirA-like. Although both SirA-like and HP1/HP2 seem to be functionally linked to SelD, there are important differences among these proteins, and which of these proteins best define the putative third Se utilization trait is not clear. It is possible that SirA-like protein is involved in Se metabolism in all three utilization traits whereas HP1 and HP2 are only specific for the third Se utilization trait.
Several other detected selD neighboring genes were often located in the same operon. One exception was that we could not detect homologs of MobA-related protein (COG2068) in Shigella dysenteriae. However, this genome is not yet complete. Previous studies showed that some of these genes are involved in the formation and utilization of molybdopterin (MPT), which coordinates Mo, thereby generating the Mo cofactor in Mo-dependent enzymes [26][27][28]. For example, XdhC is present in various Mo-utilizing organisms and is involved in Mo cofactor binding and insertion into xanthine dehydrogenase [29]. It is possible that Se (in the form of selenophosphate) is used as an additional cofactor that supports Mo utilization in certain organisms. The predicted aldehyde oxidoreductase in E. faecalis, which belongs to the Mo-dependent xanthine oxidase family [27,28,30] and is often found to be clustered with HP1 and/or HP2 (see Table 1), might be a potential user of both Se and Mo. However, phylogenetic analysis of aldehyde oxidoreductase (both large and small subunits) did not yield a subfamily formed by organisms containing the new Se utilization trait (instead, they are scattered in different branches, data not shown).

Metabolically labeled Se-binding protein in E. faecalis
To directly analyze the use of Se in E. faecalis, we metabolically labeled this organism with 75Se under aerobic and anaerobic conditions, and in parallel we labeled E. coli strain Nova Blue cells as a control. In both conditions, a 30 kDa 75Se-labeled band was observed in E. faecalis extracts (SDS-PAGE in the absence of reducing agents), whereas as expected, E. coli showed bands in the 80-110 kDa region and the labeling pattern was different in aerobic and anaerobic conditions (Fig. 6A). After treatment with DTT (with or without heating), the band disappeared (Fig. 6B). 2-Mercaptoethanol was also effective in releasing Se, whereas these treatments had no influence on the 75Se bands in E. coli extracts as these corresponded to Sec-containing proteins (data not shown). The observations suggest that E. faecalis does utilize Se, that this element occurs in a protein of ~30 kDa and that this Se species is labile and is not Sec. Therefore, it is possible that this 30 kDa protein may be involved in the third, SelD-related Se utilization trait. It should be noted, however, that an additional utilization of Se in E. faecalis unrelated to SelD function could not be excluded. Independent of its nature, both computational and experimental data exclude a possibility of Sec and SeU use and suggest a novel use of Se in this organism.
The large subunits of proteins in the xanthine oxidase family (COG1529, CoxL), whose molecular weights are over 80 kDa, bind MPT [27,31,32]. The size of the detected 30 kDa protein in E. faecalis is inconsistent with the aldehyde oxidoreductase large subunit. Similarly, the aldehyde oxidoreductase small subunit (CoxS), which binds 2Fe-2S, was also excluded because it is fused with the large subunit in E. faecalis. Thus, the 30 kDa Se-binding protein is unlikely to be a member of the Mo-dependent hydroxylase family. It is interesting, however, that both HP1 and HP2 have similar predicted molecular weights (28.5 kDa and 28.4 kDa, respectively). Thus, one of them might be the detected protein.
We carried out a number of chromatographic steps to purify the 30 kDa Se-binding protein. It bound both DEAE-Sepharose and Phenyl-Sepharose and could be enriched on these columns. However, lability and consistent loss of Se precluded its identification. We also attempted two-dimensional PAGE analysis of the Se-binding protein (under non-reducing conditions), but no 75Se radioactive spot was found following this procedure, suggesting that two-dimensional PAGE conditions also led to the release of Se from the Se-binding protein.

Conclusion
In this study, we carried out comparative genomics and phylogenetic analyses to identify new genes linked to Se utilization in prokaryotes. We identified several organisms with orphan SelD that we predict to possess the third Se utilization trait, which is not limited to these species. SelD, HP1, HP2 as well as SirA-like were identified as the best candidate signature genes for this trait. We further directly demonstrated the use of Se in E. faecalis by detecting a 30 kDa protein containing a non-Sec, non-SeU labile Se species. It cannot be excluded that Se is used as a cofactor for certain Mo hydroxylases (known to contain a labile Se cofactor), but current evidence does not provide strong support for this possibility. Further studies are required to determine whether the 30 kDa Se-binding protein or other proteins in organisms with orphan SelDs represent the final use of Se in these organisms, or whether the Se-binding protein is an intermediate for further delivery of Se to other proteins, such as Mo-dependent hydroxylases.

Databases, genomes and sequences
Sequenced prokaryotic genomes from current Entrez Microbial Genome Project were used in this study (541 bacterial and 48 archaeal genomes; Feb 1, 2008). Due to the large number of sequenced strains for some bacterial species, we utilized only one strain from each species (e.g., Escherichia coli K12 represented all Escherichia coli).
We used E. coli SelA, SelB, SelD, and YbbB sequences as queries to search for components of Sec-decoding and SeU traits based on previously used criteria [24]. In addition, we selected 10 genes upstream and downstream of selD in the genomes with orphan SelDs. A list of selD-flanking genes in E. faecalis is shown in Table 3. For each of these proteins (including SelD), TBLASTN [33] was initially used to identify genes coding for homologs with a cutoff of E-value ≤ 0.01. Orthologous proteins were defined as bidirectional best hits [34]. When necessary, orthologs were also confirmed by genomic location analysis or building phylogenetic trees for the corresponding protein families.
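The bidirectional best hit criterion with an E-value cutoff can be sketched as follows. This is a minimal illustration and not the pipeline used in the study: the function names `best_hits` and `bidirectional_best_hits` are hypothetical, the hit tuples are assumed to have been parsed beforehand from similarity search output, and the toy identifiers and E-values are invented for the example.

```python
def best_hits(hits, max_evalue=0.01):
    """Map each query to its lowest-E-value subject among hits passing the cutoff.

    `hits` is an iterable of (query, subject, evalue) tuples.
    """
    best = {}
    for query, subject, evalue in hits:
        if evalue > max_evalue:
            continue  # fails the E-value <= cutoff filter
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a, max_evalue=0.01):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b, max_evalue)
    reverse = best_hits(b_vs_a, max_evalue)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Toy hit tables (invented identifiers and E-values).
a_vs_b = [("A1", "B1", 1e-50), ("A1", "B2", 1e-10),
          ("A2", "B2", 1e-5), ("A3", "B1", 0.5)]
b_vs_a = [("B1", "A1", 1e-48), ("B2", "A1", 1e-9), ("B2", "A2", 1e-4)]
print(bidirectional_best_hits(a_vs_b, b_vs_a))
```

Here A2 finds B2 as its best hit, but B2 prefers A1, so only the A1/B1 pair is reported; A3 is discarded because its only hit exceeds the cutoff.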

Multiple sequence alignment and phylogenetic analysis
Sequences were aligned with CLUSTALW [35] using default parameters. Ambiguous alignments in highly variable (gap-rich) regions were excluded. The resulting multiple alignments were then checked for conservation of functional residues and manually edited. Phylogenetic analyses were performed using PHYLIP programs [36]. Pairwise distance matrices were calculated by PROTDIST to estimate the expected amino acid replacements per position. Neighbor-joining (NJ) trees were obtained with NEIGHBOR and the most parsimonious trees were determined with PROTPARS. Robustness of these trees was evaluated by maximum likelihood (ML) analysis with PHYML [37] and Bayesian estimation of phylogeny with MrBayes [38].
Figure 6. Metabolic labeling of E. faecalis with 75Se and analysis of Se-binding proteins.
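The exclusion of gap-rich alignment regions mentioned in this section can be approximated by a column filter. The sketch below is only one possible rule and is not the procedure used in the study, which does not state a numerical criterion: the function name `drop_gap_rich_columns` and the 50% gap threshold are assumptions introduced for illustration.

```python
def drop_gap_rich_columns(alignment, max_gap_fraction=0.5):
    """Remove alignment columns whose gap fraction exceeds `max_gap_fraction`.

    `alignment` is a list of equal-length aligned sequences using '-' for gaps.
    The 0.5 default is an arbitrary illustrative threshold.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    n = len(alignment)
    keep = [
        i for i in range(length)
        if sum(seq[i] == "-" for seq in alignment) / n <= max_gap_fraction
    ]
    return ["".join(seq[i] for i in keep) for seq in alignment]


# Toy alignment: the third column has gaps in two of three sequences.
toy = ["MK-AH", "M--AH", "MKC-H"]
print(drop_gap_rich_columns(toy))
```

The trimmed alignment could then be passed to distance or tree-building programs; manual inspection, as described above, remains advisable.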

Metabolic labeling of E. faecalis with 75Se and purification of Se-binding proteins
To examine the occurrence of Se-binding proteins in E. faecalis, 50 ml of E. faecalis strain 29212 (ATCC) in BHI media (Gibco) and E. coli strain Nova Blue (Novagen) in LB media (MP Bio) were metabolically labeled with 50 μCi of 75Se ([75Se]selenious acid; specific activity, 1,000 Ci/mmol; Research Reactor Facility, University of Missouri, Columbia, Mo.) for 24 h. Bacterial cultures were grown at 37°C in the dark with extensive agitation (250 rpm) to provide sufficient aeration. For anaerobic growth (preferred for E. faecalis), the cultures were cultivated in fully filled, parafilm-sealed tubes without shaking. Cells were collected, resuspended in PBS buffer and sonicated. 30 μg of total soluble protein from each organism were resolved by 10% native or SDS-PAGE under non-reducing conditions and transferred onto a PVDF membrane (Invitrogen). 75Se-labeled proteins were visualized with a PhosphorImager. To enrich for Se-binding proteins, 50 ml of E. faecalis cells were labeled with 75Se for 24 h at 37°C without shaking in low oxygen conditions and the labeled cells were mixed with 5 g of unlabeled cells that were cultured separately using the same procedure. Cells were washed twice in cold PBS, resuspended in PBS containing EDTA-free protease inhibitor mixture (Roche), and Se-binding proteins were fractionated under non-reducing conditions on DEAE-Sepharose and Phenyl-Sepharose (GE Healthcare) columns following 75Se radioactivity in protein fractions. Fractions containing 75Se were collected and analyzed by SDS-PAGE.

Analysis of E. faecalis Se-binding proteins
Chromatographically enriched Se-binding proteins from E. faecalis were subjected to heat and reducing agent treatments. Protein samples (~30 μg of total protein) were treated by heating (90°C for 10 min) in SDS sample buffer (Invitrogen) with or without 10 mM DTT or 2-mercaptoethanol, and then subjected to SDS-PAGE.

Notes
During the review of this manuscript, another study was published (Haft DH and Self WT. Biol Direct. 2008. 3: 4) which proposed HP1 (EF2563) and selD as markers for