Comparative genomics of bacterial and plant folate synthesis and salvage: predictions and validations

Background: Folate synthesis and salvage pathways are relatively well known from classical biochemistry and genetics but they have not been subjected to comparative genomic analysis. The availability of genome sequences from hundreds of diverse bacteria, and from Arabidopsis thaliana, enabled such an analysis using the SEED database and its tools. This study reports the results of the analysis and integrates them with new and existing experimental data.

Results: Based on sequence similarity and the clustering, fusion, and phylogenetic distribution of genes, several functional predictions emerged from this analysis. For bacteria, these included the existence of novel GTP cyclohydrolase I and folylpolyglutamate synthase gene families, and of a trifunctional p-aminobenzoate synthesis gene. For plants and bacteria, the predictions comprised the identities of a 'missing' folate synthesis gene (folQ) and of a folate transporter, and the absence from plants of a folate salvage enzyme. Genetic and biochemical tests bore out these predictions.

Conclusion: For bacteria, these results demonstrate that much can be learnt from comparative genomics, even for well-explored primary metabolic pathways. For plants, the findings particularly illustrate the potential for rapid functional assignment of unknown genes that have prokaryotic homologs, by analyzing which genes are associated with the latter. More generally, our data indicate how combined genomic analysis of both plants and prokaryotes can be more powerful than isolated examination of either group alone.

Background
Folates are tripartite molecules comprising pterin, p-aminobenzoate (pABA), and glutamate moieties to which one-carbon units at various oxidation levels can be attached at the N5 and N10 positions (Figure 1). In natural folates the pterin ring is in the dihydro or tetrahydro state, and a short, γ-linked polyglutamyl tail of up to about eight residues is usually attached to the first glutamate.
Tetrahydrofolates serve as cofactors in one-carbon transfer reactions during the synthesis of purines, formylmethionyl-tRNA, thymidylate, pantothenate, glycine, serine, and methionine [1] (Figure 2). Most folate-dependent enzymes strongly prefer polyglutamates to monoglutamates, but the opposite is usually true of folate transporters so that polyglutamylation is generally considered to favor folate retention within cells and subcellular compartments [2,3].
Plants, fungi, certain protists, and most bacteria make folates de novo, starting from GTP and chorismate, but higher animals lack key enzymes of the synthetic pathway and so require dietary folate [4][5][6][7]. Folates are crucial to human nutrition and health [3], and antifolate drugs are widely used in cancer chemotherapy and as antimicrobials [3,7,8]. For these reasons, folate synthesis and salvage pathways have been extensively characterized in model organisms, and the folate synthesis pathway in both bacteria and plants has been engineered in order to boost the folate content of foods [9][10][11].
In the pABA branch of the pathway, chorismate is aminated to aminodeoxychorismate (ADC) by ADC synthase (EC 6.3.5.8) using the amide group of glutamine as amino donor [5]. ADC is then converted to pABA by ADC lyase (EC 4.  [14]. Although the biosynthetic steps are the same in plants and bacteria, the plant pathway is split between three subcellular compartments, with pterin synthesis in the cytosol, pABA synthesis in chloroplasts, and the other steps in mitochondria (Figure 4) [6]. FPGS isoforms are present in all three of these compartments, as are folates themselves [15,16]. Folates, both poly- and monoglutamates, are also found in plant vacuoles [16]. The highly compartmented nature of folate synthesis in plants implies the existence of pterin and folate transporters that are integral components of the pathway.
Folate-related salvage pathways are of three kinds. The first ('intact folate salvage') (Figure 3, green color) enables utilization of supplied folic acid and DHF, and relies on a DHFR activity to reduce these oxidized folates to THF, and on an FPGS activity [7]. DHFR activity is also required to recycle the DHF produced in the reaction catalyzed by thymidylate synthase (TS, EC 2.1.1.45). The second kind of salvage ('pterin salvage') (Figure 3, yellow color), known in Leishmania and other trypanosomatid parasites, involves the reduction of fully oxidized pterins to the dihydro and tetrahydro levels by pteridine reductase 1 (PTR1, EC 1.5.1.33) [17]. This enables oxidized pterins to be used (after reduction to dihydro forms) for folate synthesis, and (after reduction to tetrahydro forms) as cofactors for aromatic hydroxylases and other pterin-dependent enzymes. Finally, some bacteria, plants, and protists probably carry out a more radical kind of salvage, in which the pterin and pABA-glutamate fragments produced by folate breakdown are recycled for folate synthesis [18]. This type of salvage has been little studied and will not be considered further in this article.

Figure 1. The structure of tetrahydrofolate. In natural folates, the pterin ring exists in tetrahydro form (as shown) or in 7,8-dihydro form (as in DHF). The ring is fully oxidized in folic acid, which is not a natural folate. Folates usually have a γ-linked polyglutamyl tail of up to about eight residues attached to the first glutamate. One-carbon units (formyl, methyl, etc.) can be coupled to the N5 and/or N10 positions.
Genes for all the enzymes of folate synthesis have been identified in model organisms such as Escherichia coli, Saccharomyces cerevisiae, and Arabidopsis thaliana [4][5][6]. Likewise, the intact folate salvage pathway has been well characterized in mammals, the malaria parasite Plasmodium, and Lactobacillus casei [7,19,20], and pterin salvage in Leishmania [17]. However, analysis of the distribution of known folate synthesis and salvage genes in hundreds of bacterial genomes using the SEED platform [21] reveals that much remains to be learnt about both synthesis and salvage.
The SEED is a freely available, open-source database that provides efficient ways to discover new genes or pathways, to generate predictions about gene function, and to improve annotations, based on a 'functional subsystem approach' [21]. This approach has much in common with metabolic reconstruction [22,23]. A functional subsystem may be defined as a set of functional roles (usually ten to twenty) jointly involved in a biological process. A typical subsystem is a group of enzymes, transporters, and regulatory components that participate in a metabolic pathway such as folate synthesis or salvage. Subsystem analysis examines which components are actually present in a genome and which should be present but cannot be identified, and so provides a picture of what is actually missing. This sets the stage to pursue the 'missing genes', also termed 'pathway holes' [24][25][26]. Homology-based searches alone are usually unable to locate missing genes that have not been previously identified in any genome ('globally missing genes') or those that are missing due to non-orthologous gene replacement ('locally missing genes') [27].

Figure 2. Major folate-dependent reactions of one-carbon metabolism. The gene names are for E. coli (except for sarcosine dehydrogenase). Note that the formation of 5-formyl-THF from 5,10-methenyl-THF occurs via a second catalytic activity of serine hydroxymethyltransferase (glyA), and that 5-formyl-THF is reconverted to 5,10-methenyl-THF by 5-formyl-THF cycloligase (ygfA). For simplicity, THF is not shown as a participant in most reactions in which it is consumed or released. GCV, glycine cleavage complex, comprising the products of the gcvT, gcvH, gcvP, and lpd genes; SDH, sarcosine dehydrogenase (not present in E. coli).
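The bookkeeping behind 'pathway holes' (comparing the roles a subsystem needs with the roles annotated in a genome) can be sketched in a few lines. This is a minimal illustration and not the SEED implementation: the function name, the role list, and the toy genome are assumptions made for the example, using only enzyme abbreviations that appear in this article.

```python
# Functional roles of a simplified folate subsystem, in rough pathway order.
# The list is illustrative; the real SEED subsystem contains more roles.
FOLATE_ROLES = [
    "GCHY-I", "DHNTP pyrophosphatase", "DHNA",
    "HPPK", "DHPS", "DHFS", "DHFR", "FPGS",
]


def pathway_holes(required_roles, annotated_roles):
    """Return the required roles that have no annotated gene in the genome."""
    present = set(annotated_roles)
    return [role for role in required_roles if role not in present]


# Hypothetical genome in which the first two pterin-branch roles are unassigned.
toy_genome = {"DHNA", "HPPK", "DHPS", "DHFS", "DHFR", "FPGS"}
holes = pathway_holes(FOLATE_ROLES, toy_genome)
print(holes)
```

Whether a reported hole is globally or locally missing cannot be decided from this comparison alone; that distinction depends on whether a gene for the role is known in any other genome.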

In this study, we first predicted the pathways (de novo folate synthesis, intact folate salvage, and pterin salvage) present in around four hundred sequenced bacteria and identified cases of missing genes for almost every step of the synthesis pathway. Candidates for such missing genes in bacteria and plants were then predicted using comparative genomic tools and representative candidates were tested experimentally.

Results and Discussion
Are folates essential in all bacteria?
As folate-dependent formylation of the initiator tRNA is a hallmark of bacterial translation and bacteria cannot import formylmethionyl-tRNA [28], we investigated the distribution of the fmt gene encoding methionyl-tRNA formyltransferase (EC 2.1.2.9) as a signature gene for a folate requirement. Homologs of fmt are found in all sequenced genomes except Mycoplasma hyopneumoniae and Onion yellows phytoplasma OY-M (Table 1). We confirmed the observation [29] that M. hyopneumoniae lacks all the enzymes of folate-mediated one-carbon metabolism except for glycine hydroxymethyltransferase (GlyA), which has aldolase activities that do not require   it contains a thyA homolog like most Mycoplasma species and therefore presumably requires intact folates.

Intact folate transport and salvage
As just discussed, folate is most probably essential for all sequenced bacteria except M. hyopneumoniae. However, not all bacteria synthesize folate de novo but instead rely on an external supply [see Additional File 1, variant 001; see "Methods" for an explanation of the variant code]. To predict the absence of the de novo synthesis pathway, the HPPK (FolK) and DHPS (FolP) proteins were used as signature proteins (for reasons described below). Many bacteria lack homologs of both these genes (Table 1) and so almost certainly rely on reducing and glutamylating intact folates taken up from the environment. These are mainly host-associated bacteria such as Mycoplasma or Treponema or organisms that live in folate-rich environments such as lactobacilli. Chloroplasts and vacuoles must likewise take up folates from the cytoplasm (Figure 4), and there is also evidence for folate uptake by intact plant cells [6].

Figure 4. Compartmentation of the folate synthesis pathway in plants.
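The signature-gene reasoning used so far (fmt as a marker of a folate requirement, folK plus folP as markers of de novo synthesis) can be written as a small decision rule. The sketch below is a simplification under stated assumptions: the function name and the output labels are invented for illustration, and a genome is represented only as a set of gene names.

```python
def classify_folate_strategy(genes):
    """Classify one genome from the set of gene names annotated in it.

    Assumptions for illustration: fmt marks a folate requirement, and folK
    together with folP marks de novo synthesis, as described in the text.
    """
    genes = set(genes)
    n_signature = len({"folK", "folP"} & genes)
    if n_signature == 2:
        return "de novo synthesis"
    if n_signature == 1:
        # A single missing signature gene is usually a gene-calling or
        # assembly problem, so flag the genome rather than deciding.
        return "ambiguous"
    if "fmt" in genes:
        return "salvage only"
    return "no requirement detected"


print(classify_folate_strategy({"fmt", "thyA", "folA"}))
```

A fuller version would also consider other folate-dependent enzymes, since the text treats a thyA homolog as evidence of a folate requirement even when fmt is absent.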

(i) Transport
Systems that mediate folate uptake in auxotrophs such as Lactobacillus casei and L. salivarius have been partially biochemically characterized [33,34], but the corresponding genes remain unknown. Whatever they are, they are most likely unrelated to mammalian folate carriers (i.e., the reduced folate carrier, the folate receptor, the intestinal folate carrier, and the mitochondrial folate carrier) since these lack close homologs among bacteria and plants. However, cyanobacteria, which are folate prototrophs, have a protein with significant similarity to a folate carrier from Leishmania species (FT1), and the cyanobacterial protein has a close homolog in plants (52% amino acid identity), as well as several more distant relatives in plants and in alpha-, beta-, and gamma-proteobacteria. We showed first that the cyanobacterial protein (Synechocystis slr0642) conferred the ability to transport folates and folate analogs when expressed in E. coli, and then that its plant homolog (Arabidopsis At2g32040) did the same [35]. We further showed that the Arabidopsis At2g32040 protein is located in the chloroplast envelope [35]. The weak slr0642 homolog in some alpha-proteobacteria (Silicibacter, Roseobacter) clusters with the folate-dependent enzyme sarcosine dehydrogenase, suggesting that this protein may also be a folate transporter.
Thus, despite progress in identifying folate transporters in cyanobacteria and in the chloroplast envelope, there are as yet no candidates for the folate carriers in many folate-requiring bacterial taxa, or in plant mitochondrial, vacuolar, and plasma membranes. These still-missing genes are future prospects for discovery by comparative genomics methods [36].
(ii) Reduction
As noted above, DHFR is essential in both de novo and salvage pathways. Most bacteria have a folA gene (DHFR0), but two other bacterial enzymes able to reduce DHF are now known: FolM (DHFR1) belonging to the short-chain dehydrogenase/reductase (SDR) family [37], and a flavin-dependent dihydropteroate reductase that is fused to dihydropteroate synthase (DHFR2) [38]. The trypanosomatid enzyme PTR1 can also reduce DHF and folic acid [17]. As folM occurs in E. coli and other bacteria that also have a folA gene, its normal function is most probably not folate reduction, as discussed in a later section. The annotation of DHFR0 family members is complicated by their similarity to pyrimidine dehydrogenase family members (Pfam01872), which are numerous in Actinomycetes like Streptomyces coelicolor. At this stage we named them all DHFR0 but further genetic or biochemical analysis is needed to check these assignments.
Analysis of the distribution of DHFR genes in bacterial genomes reinforced the conclusions [32] that many bacteria such as Prochlorococcus marinus lack any recognizable DHFR proteins, and that most of these organisms use ThyX and not ThyA. Even if a high capacity for DHF reduction is not needed in ThyX-dependent organisms [39], these do require some DHFR activity to complete the de novo or salvage pathways so the corresponding gene(s) have yet to be identified in these organisms [32] (see Additional File 1, variants 106, 116, 006).

(iii) Glutamylation
FolC-like proteins can have FPGS activity alone [20] or both DHFS and FPGS activities [40], which complicates annotation. Although the bifunctional type has a unique dihydropteroate binding site [41], it overlaps the rest of the substrate binding site and we could not derive a motif to distinguish mono- and bifunctional enzymes. We therefore annotated them all as bifunctional. By analogy with the Lactococcus lactis situation, we predict that organisms reliant on the salvage pathway (see Additional File 1, variants 001 and 011) will have a monofunctional FPGS.
The folC gene is missing in the Mycoplasma species that contain an fmt, a thyA and a folA gene and must therefore rely on a salvage pathway (Table 1). This absence points to three possibilities for these species: (a) they import folate polyglutamates; (b) they have a novel type of FPGS gene; or (c) they import monoglutamyl folates and polyglutamylation is not needed. We favor the last hypothesis as there is evidence for monoglutamyl folate uptake in Mycoplasma mycoides [42]. A similar situation must exist in bacteria such as Borrelia burgdorferi that lack all folate synthesis genes but contain THF-dependent enzymes such as Fmt (Table 1).

De novo folate biosynthesis
The majority of sequenced bacteria (250 out of 400) contain all genes of the pathway and are therefore predicted to be prototrophic for folate (see examples in Table 2 and Additional File 1, variant 111). However, a substantial minority lack just one or a few genes of the pterin or pABA branches, and detailed analysis of these cases reveals several biologically significant points.

(i) The pterin branch
The first enzyme of this branch, GCHY-I, is encoded in E. coli by the folE gene. A recent analysis of the distribution of folE genes among bacterial genomes showed the folE gene to be locally missing in one-third of them [43]. Another protein family, COG1469, was found to be responsible for 7,8-dihydroneopterin triphosphate formation in these organisms. This protein was named GCHY-IB and the corresponding gene folE2 [43] (Table 2). Further analysis revealed that a few bacteria such as Wolbachia, Chlamydia, and Chlamydophila species lack both folE and folE2 homologs whereas they contain the signature genes of the pathway, folKP (see Table 2 and Additional File 1, variants 701, 702), suggesting that another family of GCHY-I enzymes has yet to be identified. For instance, at least certain Chlamydia species are known to synthesize folates de novo [44], but lack folE and folE2. A candidate for the missing GCHY-I enzyme was the Chlamydia trachomatis protein CT610 and its homologs, which cluster with the folABKP folate genes in Chlamydia and Wolbachia species (Figure 5A). The protein is homologous to the pyrroloquinoline quinone (PQQ) biosynthesis protein PqqC that catalyzes an overall eight-electron oxidation, leading to a pyrrole and pyridine ring, but their active sites are not conserved, consistent with a different enzymatic activity [45]. The CT610 gene was cloned in pBAD24 but failed to complement the dT auxotrophy of the E. coli folE mutant. The strong linkage of CT610 homologs with folate genes certainly points to a function in folate metabolism, as de novo folate genes other than folE (such as folQ or pabAabc) are also missing in chlamydiae (Table 2), but further studies are needed to determine its functional role.
The second step of folate synthesis is the removal of pyrophosphate. Although an enzyme mediating this step had been demonstrated in E. coli [46], no gene was known from any organism. We identified a DHNTP pyrophosphatase (FolQ) candidate in L. lactis as part of the folKEPQC gene cluster (Table 2) [12]. FolQ belongs to the Nudix (Nucleoside diphosphate X) hydrolase family [47]. Biochemical and genetic tests confirmed DHNTP pyrophosphatase activity [12]. Furthermore, the closest Arabidopsis homolog of L. lactis FolQ was also shown to have this activity [12].
Since the Nudix family is large and functionally heterogeneous it is not very amenable to projection of annotations just by homology. FolQ homologs with a high homology score occur in rather few bacteria, so that the DHNTP pyrophosphatase gene is still missing in most genomes, including E. coli. Other putative phosphohydrolases unrelated to FolQ (FolQ2, members of the HDIG superfamily) are found in some folate-related gene clusters (Figure 5B), such as CPE1020 in Clostridium perfringens; these genes are good candidates for alternatives to FolQ but again have a limited phylogenetic distribution, leaving the problem open in most bacterial species (Table 2).
The third specific enzyme of the pathway, DHNA, is encoded in E. coli by the folB gene. This gene and its paralog folX [13] appear to be missing in many phylogenetically diverse bacteria such as Geobacter metallireducens. Genome and functional context analysis allows the prediction that the DHNA role is played by members of the transaldolase (EC 2.2.1.2) family (e.g. DVU1658 in Desulfovibrio vulgaris). Specifically, about half the bacteria that lack DHNA have a transaldolase-encoding gene that clusters with folK genes in several organisms (Figure 5C). This prediction awaits experimental validation as this transaldolase family is broad and only some of its members might encode a DHNA aldolase. Some genomes such as Rickettsia felis lack both FolB and transaldolase homologs while containing all the other de novo enzymes (see Table 2 and Additional File 1, variant 401), again suggesting that another family of FolB enzymes has yet to be identified unless the pathway is on its way to elimination in these organisms specifically.
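The clustering argument used for the transaldolase, FolQ2, and CT610 candidates rests on a simple count: in how many genomes does a gene family lie next to a known folate gene? The following sketch shows one way to tally this. It is not the SEED clustering tool; the function name, the window size, the toy gene orders, and the family label `tal` (standing in for a transaldolase family member) are all assumptions for illustration.

```python
from collections import Counter

# Known folate genes used as anchors (gene names taken from the text).
KNOWN_FOLATE = {"folK", "folP", "folE", "folC", "folA"}


def clustered_candidates(genomes, known=KNOWN_FOLATE, window=3):
    """Count, per gene family, the genomes in which the family lies within
    `window` genes of any known folate gene.

    `genomes` maps a genome name to its gene families in chromosomal order.
    """
    counts = Counter()
    for order in genomes.values():
        near = set()
        for i, family in enumerate(order):
            if family in known:
                lo, hi = max(0, i - window), i + window + 1
                near.update(f for f in order[lo:hi] if f not in known)
        counts.update(near)  # each family counted once per genome
    return counts


# Hypothetical gene orders; "tal" neighbors folK in two of three genomes.
toy = {
    "genome_A": ["x1", "tal", "folK", "x2", "x3", "x4", "x5"],
    "genome_B": ["y1", "y2", "y3", "y4", "folK", "tal", "y5"],
    "genome_C": ["tal", "z1", "z2", "z3", "z4", "folP", "z5"],
}
counts = clustered_candidates(toy, window=1)
print(counts.most_common(1))
```

Families that recur next to folate genes across phylogenetically distant genomes are the ones worth testing; a neighbor seen in a single genome, like `x2` here, carries little weight.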
HPPK (FolK) and DHPS (FolP) are distinctive proteins found in all organisms that make folate de novo and so, as noted above, these were used as pathway signature genes. A few sporadic organisms apparently lack one of the two genes, but further analysis shows that this is usually because of a gene-calling problem (a homolog can be found using the tblastn algorithm) or because the corresponding genome is still incomplete. Some organisms, however, have two folP genes or two folK genes (Table 2). Are these functionally redundant or catalyzing different reactions? In most cases one paralog is clustered with folate genes and the other clusters with genes involved in different pathways (see Table 2 and Additional File 1). For instance, in the high-GC gram-positive group the second folP (folP2) clusters with cell wall synthesis genes. In Mycobacterium leprae the folP2 gene does not complement an E. coli folP mutant whereas the copy that clusters with the folate genes (folP1) does, suggesting that folP2 is involved in another pathway [49].
FolK is duplicated in many organisms. In most cases such as Shewanella denitrificans (Table 2), one copy is in a folate operon and the other in a pantothenate operon but there are several cases where both genes are close to other folate biosynthesis genes (see also Additional File 1). Only experimental testing will show whether both copies are active. It is of note that an internal duplication of FolK and fusion with FolB is found in Bifidobacterium longum.
The sequenced chlamydiae all lack homologs of folC (DHFS/FPGS) but have folPK homologs (see Table 2 and Additional File 1), making folC a locally missing gene in this group. Inspection revealed that a member of gene family COG1478 is clustered in chlamydiae with folate biosynthesis genes (Figure 5A, folC2). This COG1478 family contains the F420:γ-glutamyl ligase CofE of Archaea and Mycobacteria [50]. CofE catalyzes the GTP-dependent successive addition of two γ-linked L-glutamates to the L-lactyl phosphodiester of 7,8-didemethyl-8-hydroxy-5-deazariboflavin (F420), a reaction analogous to that mediated by FolC. Chlamydiae almost certainly do not make F420 since they lack all the other known cof genes [50]. We accordingly predicted that the CofE homolog in chlamydiae has FolC activity. A cofE homolog (CT611) was shown to complement the methionine and glycine requirements of the E. coli folC mutant SF4 [40], indicating that CT611 can indeed functionally replace FolC (Figure 6). The E. coli folC gene from the ASKA collection [51] was used as a positive control.
(ii) The pABA branch
We adopted the nomenclature of Xie et al. [52] for the pABA branch genes. These genes are hard to annotate for several reasons. In the first place, they can be fused in various combinations. A fusion between the subunits of ADC synthase (PabAa and PabAb) is a common arrangement, as is fusion between PabAa and ADC lyase (PabAc). In one genome, Corynebacterium diphtheriae, our analysis indicated a triple fusion. The functions of this PabAa-PabAb-PabAc fusion gene (DIP1790) were tested experimentally. The gene was cloned into an expression vector and introduced into an E. coli pabAa pabAb mutant (strain BN1163), which cannot grow on minimal medium unless it expresses a recombinant enzyme with ADC synthase activity. A bifunctional PabAa-PabAb ADC synthase protein from Arabidopsis served as a positive control. Like the positive control, expression of the DIP1790 protein restored pABA prototrophy (Figure 7). This result shows that the DIP1790 protein has ADC synthase activity but does not demonstrate ADC lyase activity because the BN1163 strain has endogenous ADC lyase (PabAc). Enzyme assays were therefore used to test DIP1790 for ADC lyase activity. BN1163 cultures harboring plasmids encoding DIP1790, Arabidopsis ADC synthase, and E. coli PabAc were grown and induced, and proteins were extracted. Extracts of cells expressing DIP1790 were incubated with chorismate and glutamine, without or with E. coli PabAc; pABA was formed in the absence of PabAc whereas, as expected, Arabidopsis PabAa-PabAb formed pABA only if PabAc was added. Reaction rates (nmol pABA h⁻¹ mg⁻¹ protein) were: DIP1790 without PabAc, 7.0; DIP1790 with PabAc, 6.0; Arabidopsis ADCS without PabAc, <0.01; Arabidopsis ADCS with PabAc, 4.0. These data establish that DIP1790 has ADC lyase as well as ADC synthase activity.
Another difficulty in annotating the pabAabc genes is that most organisms contain paralogs of pabAa and pabAb (trpAa and trpAb, respectively) that participate in tryptophan biosynthesis [52], and in some cases the PabAb (amidotransferase) subunit is shared between the pABA and tryptophan pathways [53]. Finally, PabAc belongs to the large branched-chain amino acid aminotransferase family (EC 2.6.1.42) and is hard to distinguish from these enzymes. These problems mean that the current SEED annotation of the pABA branch of folate synthesis should be taken as tentative. That said, analysis of the distribution of these genes reveals that most bacteria make pABA from chorismate. As expected, many intracellular bacteria lack all pabA genes. In cases where the organisms have the pterin branch but lack all enzymes of the pABA branch, annotation problems cannot be ruled out but an alternative pathway for the biosynthesis of pABA, starting for example with dehydroquinate instead of chorismate, could also be the answer [54].

Figure 5. Clustering of predicted folate-related genes with known folate synthesis genes. Gene names are as described in the text or given below. [For full gene and genome names, see Additional File 1.] Matching colors correspond to orthologous genes. Pale grey arrows are non-folate related genes. A. Clustering of folC2 and pqqC-like genes. 5-fcl, 5-formyl-THF cycloligase. B. Clustering of folQ2 genes. fhs, formate-tetrahydrofolate ligase; dhfr, dihydrofolate reductase. C. Clustering of folB2 (fructose-6-phosphate aldolase-like) genes. D. Clustering of folM genes.

Pterin salvage
The Leishmania pterin reductase PTR1 is a member of the SDR family, but has a highly characteristic motif TGX3RXG (in place of the TGX3GXG motif that is typical of this family) [55]. This motif is shared with E. coli FolM and similar SDR family proteins in a variety of bacterial taxa. Several of the folM-like genes are clustered with genes of the pterin branch of folate synthesis (Figure 8), suggesting a function in folate or pterin synthesis. Since E. coli [56] and other bacteria [57] are known to contain tetrahydromonapterin or other tetrahydropterins that could serve as cofactors for pterin-dependent enzymes, we predict that folM-like genes are not primarily involved in folate synthesis but rather are pteridine reductases that, like PTR1, produce and/or reduce 7,8-dihydropterins.
(Note that such reductases are distinct from 6,7-dihydropterin reductases [also termed quinonoid pteridine reductases], of which E. coli has two [58,59].) Consistent with this prediction, the recombinant FolM protein catalyzes reduction of dihydrobiopterin to the tetrahydro form; unlike PTR1, however, it does not mediate reduction of fully oxidized biopterin to the dihydro form [37]. Supporting the latter observation, we found that an E. coli GCHY-I mutant (which is unable to make pterins) can use the dihydro but not the oxidized forms of neopterin, monapterin, or 6-hydroxymethylpterin to support folate synthesis [60]. Furthermore, expression of a typical folM-like gene (Xylella fastidiosa PD0677, Figure 5D, Table 2) from a plasmid did not enable this mutant to use oxidized pterins, indicating that, like FolM, the PD0677 gene product does not act on oxidized pterins (Figure 8). In control experiments in which Leishmania PTR1 was expressed from a plasmid, the mutant was able to use oxidized pterins, confirming that it is oxidized pterin reduction (and not uptake) that is lacking in E. coli (Figure 8) [60].
Searching the Arabidopsis genome revealed some 86 members of the SDR family, of which none had the TGX3RXG motif. This led to the prediction that Arabidopsis would be unable to salvage oxidized pterins, which was verified by showing that 6-hydroxymethylpterin was not reduced in vivo or in vitro, and was not incorporated into folates [60].

Figure 6. Complementation of folC function by Chlamydia trachomatis CT611. Complementation of E. coli folC mutant SF4 by a pBAD24 plasmid harboring CT611 from Chlamydia trachomatis on MS minimal medium with or without glycine plus methionine. E. coli folC was included as a positive control. 1, pCA24N::EcfolC; 2, pBAD24::CT611; 3, pBAD24 alone. SF4 shows slower growth even in the presence of the added amino acids. The appropriate antibiotics and inducers were included in the media as indicated. Amp, ampicillin; Ara, arabinose; Cm, chloramphenicol.
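The motif screen described in this section (looking for TGX3RXG in place of the usual TGX3GXG among SDR family members) maps directly onto a regular expression, where X is any residue. The sketch below is an illustration only: the function name and the three short sequences are invented, and a real screen would restrict the search to the N-terminal coenzyme-binding region rather than scanning the whole protein.

```python
import re

# X = any residue; the subscript 3 means three consecutive residues.
PTR1_LIKE = re.compile(r"TG.{3}R.G")    # TGX3RXG, reported for PTR1 and FolM
TYPICAL_SDR = re.compile(r"TG.{3}G.G")  # TGX3GXG, typical of the SDR family


def classify_sdr(sequence):
    """Label a protein sequence by which glycine-rich motif it contains."""
    if PTR1_LIKE.search(sequence):
        return "PTR1/FolM-like"
    if TYPICAL_SDR.search(sequence):
        return "typical SDR"
    return "no motif"


# Made-up peptide fragments, not real protein sequences.
examples = {
    "toy_ptr1_like": "MKVALVTGAAKRIGLA",
    "toy_typical": "MKVALVTGASSGIGLA",
    "toy_other": "MKVALVAAAA",
}
labels = {name: classify_sdr(seq) for name, seq in examples.items()}
print(labels)
```

Applied to a proteome, the absence of any "PTR1/FolM-like" hit is what underlies the prediction that an organism cannot salvage oxidized pterins.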

Conclusion
This analysis and integration study demonstrates that simple phylogenomic analysis of a biochemical pathway, even a well-known one, can unearth globally missing (e.g., folQ) or locally missing (e.g., folE2 or folC2) genes in bacteria and plants, and reveals that many open questions remain (such as the missing folQ, folB, folE cases listed in Table 2). It can also identify, or suggest functions for, additional genes related to the pathway (e.g., folM). Such analysis can thus lead to discovery of potential new drug or herbicide targets such as GCHY-IB, which occurs in many pathogenic bacteria but not in mammals, or the chloroplast folate carrier that is likewise absent from mammals.
It should be noted that content of the current SEED folate subsystem captures the present status of an ongoing annotation effort, that the content will be refined and improved as more bacterial and plant genomes are added, and that further predictions are expected to emerge. Finally, we emphasize that the predictions herein are offered with the hope that others will find them useful in their own research.

Bioinformatics
Analysis of the folate subsystem was performed in the SEED database [61]. Results are available in the 'Folate biosynthesis sub-system' on the public SEED server at [62]. A snapshot of this analysis in the SEED database is given in the additional file. Phylogenetic pattern searches were made on the NMPDR SEED server at [63] to find candidates for the missing folE and folC genes. We also used the BLAST tools and resources at NCBI [64] and the comparative genomics platform STRING [65] for additional gene clustering analysis.
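A phylogenetic pattern search of the kind mentioned above asks which gene families occur in exactly those genomes that need a role but lack the known gene for it. The sketch below shows the idea with set operations. It is not the NMPDR tool: the function name, the strict presence/absence criterion, and the toy profiles (including the placeholder families `famA` and `famB`) are assumptions for illustration, and real data would need a tolerance for exceptions such as genomes carrying both gene types.

```python
def phylogenetic_pattern_search(profiles, missing_gene, signature_genes):
    """Return families present in every genome that has the signature genes
    but lacks `missing_gene`, and absent from genomes that have it.

    `profiles` maps a genome name to the set of gene families it contains.
    """
    signature = set(signature_genes)
    lacking = [g for g, fams in profiles.items()
               if signature <= fams and missing_gene not in fams]
    having = [g for g, fams in profiles.items() if missing_gene in fams]
    if not lacking:
        return set()
    # Families shared by all genomes with the hole ...
    candidates = set.intersection(*(profiles[g] for g in lacking))
    # ... minus anything also found where the known gene is present.
    for g in having:
        candidates -= profiles[g]
    return candidates - signature


# Toy profiles: g2 and g3 have folK and folP but no folE.
profiles = {
    "g1": {"folE", "folK", "folP", "famA"},
    "g2": {"folK", "folP", "COG1469", "famA"},
    "g3": {"folK", "folP", "COG1469", "famB"},
    "g4": {"famA", "famB"},
}
hits = phylogenetic_pattern_search(profiles, "folE", ["folK", "folP"])
print(hits)
```

In this toy case the only family with the required distribution is COG1469, mirroring the folE2 example in the text.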
Annotations for paralog families were made using physical clustering on the chromosome when possible or by building phylogenetic trees using the ClustalW tool [66] integrated in SEED or deriving specific protein motifs. Pseudogenes (i.e., those encoding clearly aberrant proteins) were ignored; these are not uncommon in the folate pathways of intracellular parasites undergoing genome reduction [67].
The 'variant code' is used in SEED to schematize the type of pathways found in a given organism [21]. A three-digit code was used. Digit one describes the pterin branch of the pathway: 1 = complete, 0 = HPPK and DHPS missing, 4 = DHNA missing, 7 = GCHY-I missing. Digit two describes the pABA branch: 1 = two or three of the pabAabc genes present, 0 = all pabAabc genes missing or just one present. Digit three describes the salvage pathway: 1 = complete; 0 = FPGS and DHFR missing; 2 = FPGS missing; 6 = DHFR missing. Variant -1 represents genomes with no pathway genes but no need for them because no folate-dependent enzymes are present. Particular care was given to annotation of fused proteins, which are common in both branches of the pathway; SEED has annotation tools to deal with fusion proteins [21].
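For readers working through Additional File 1, the three-digit variant code defined above can be decoded mechanically. The following sketch is a convenience illustration, not part of SEED: the function and dictionary names are assumptions, and any digit not defined in the text is reported as such rather than guessed.

```python
# Digit meanings exactly as listed in the text.
PTERIN = {
    "1": "complete",
    "0": "HPPK and DHPS missing",
    "4": "DHNA missing",
    "7": "GCHY-I missing",
}
PABA = {
    "1": "two or three of the pabAabc genes present",
    "0": "all pabAabc genes missing or just one present",
}
SALVAGE = {
    "1": "complete",
    "0": "FPGS and DHFR missing",
    "2": "FPGS missing",
    "6": "DHFR missing",
}
UNDEFINED = "not defined in the text"


def decode_variant(code):
    """Translate a variant code such as '702' into branch descriptions."""
    if code == "-1":
        return {"note": "no pathway genes; no folate-dependent enzymes present"}
    if len(code) != 3:
        raise ValueError(f"expected a three-digit code or '-1', got {code!r}")
    pterin, paba, salvage = code
    return {
        "pterin": PTERIN.get(pterin, UNDEFINED),
        "pABA": PABA.get(paba, UNDEFINED),
        "salvage": SALVAGE.get(salvage, UNDEFINED),
    }


print(decode_variant("702"))
```

For example, variant 702 decodes to a genome lacking GCHY-I, lacking most or all pabAabc genes, and lacking FPGS.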
The Corynebacterium diphtheriae DIP1790 gene was cloned into pGEM-T Easy (Promega), after amplification from genomic DNA (obtained from the American Type Culture Collection) using the primers 5'-GCGGCCGCCACAGGAAACAGCTATGGTTATGCAACGCGCGCA-3' and 5'-GAGCTCTCACACTTGGGCGATATTCT-3'. The SstI site in the gene was ablated by PCR using the internal primers 5'-TCATCACCGAaCtTGAAGGCA-3' and 5'-TTTGCCTTCAaGtTCGGTGATG-3' (changed nucleotides in lower case). The modified gene was ligated into pGEM-T Easy and verified by sequencing. It was then excised with NotI and SstI and ligated into pLOI707HE [72]. This construct was used to transform E. coli BN1163. Complementation tests were made using minimal medium, appropriately supplemented as above.
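The two internal primers used to ablate the SstI site should anneal to opposite strands over the mutated region, which can be checked by taking a reverse complement. The sketch below does this for the primer sequences quoted above, assuming that the hyphen inside the second primer in some renderings of this text is a line-break artifact. The helper names are illustrative, and GAGCTC is used as the SstI recognition sequence.

```python
# Preserve case so that the changed (lower-case) nucleotides stay visible.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence, keeping letter case."""
    return seq.translate(COMPLEMENT)[::-1]


forward = "TCATCACCGAaCtTGAAGGCA"
reverse = "TTTGCCTTCAaGtTCGGTGATG"
SSTI_SITE = "GAGCTC"

rc = reverse_complement(reverse)
# The primers are offset by one base at the 5' end of the forward primer.
overlap = forward[1:]

print(rc)
print(overlap == rc[:len(overlap)])
print(SSTI_SITE in forward.upper())
```

The check confirms a 20-nucleotide overlap in which both changed nucleotides pair, and that the mutated primer no longer contains the SstI site.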
The Xylella fastidiosa Temecula1 PD0677 amplicon preceded by a Shine-Dalgarno sequence and a stop codon in frame with LacZα was cloned between the EcoRI and KpnI sites of pBluescript SK-. The PCR template was genomic DNA from the American Type Culture Collection; primers were 5'-AGTCAGAATTCGTGAAGGAAACAGCTATGTCAGATCCCTCTAAAGTC-3' and 5'-AGTAGGTACCTCATGTCAGCGTGCGGCC-3'; amplification was with KOD HiFi polymerase. The deduced amino acid sequence differed from that published in having serine not isoleucine at position 57. The PTR1 construct was as described [60]. The constructs were introduced into E. coli folE deletant cells [35]. Transformants were grown on LB plates supplemented appropriately as above.