Evolution of electron transfer out of the cell: comparative genomics of six Geobacter genomes

Background Geobacter species grow by transferring electrons out of the cell, either to Fe(III) oxides or to man-made substances such as energy-harvesting electrodes. Studies of Geobacter sulfurreducens have shown that TCA cycle enzymes, inner-membrane respiratory enzymes, and periplasmic and outer-membrane cytochromes are required for this process. Here we present a comparative analysis of six Geobacter genomes, including species from the clade that predominates in the subsurface. Conservation of proteins across the genomes was determined to better understand the evolution of Geobacter species and to create a metabolic model applicable to subsurface environments. Results Enzymes for acetate transport and oxidation, and for proton transport across the inner membrane, were well conserved. An NADH dehydrogenase, the ATP synthase, and several TCA cycle enzymes were among the best conserved proteins in the genomes. However, most of the cytochromes required for Fe(III) reduction were not, including many of the outer-membrane cytochromes. Although conservation of cytochromes was poor, abundant and diverse cytochromes were found in every genome, with apparent duplications in several species. Conclusions These results indicate that a common pathway for acetate oxidation and energy generation exists across the family and was present in the last common ancestor. They also suggest that while cytochromes are important for extracellular electron transport, the path of electrons across the periplasm and outer membrane is variable. The combination of abundant cytochromes with weak sequence conservation suggests that they may not be specific terminal reductases, but may instead be important for their heme-bearing capacity, as sinks for electrons between the inner-membrane electron transport chain and the extracellular acceptor.


Background
Species of the Geobacter clade specialize in the oxidation of organic compounds to carbon dioxide coupled to the reduction of insoluble, extracellular electron acceptors [1]. These species play an important role in pristine sediments and soils, where they oxidize fermentation by-products such as acetate and reduce naturally occurring insoluble Fe(III) and Mn(IV) oxides [1]. They are also important in three biotechnological applications: they degrade hydrocarbon contaminants in soils, they insolubilize uranium in contaminated aquifers, and they transfer electrons from a variety of substrates onto graphite electrodes, from which electricity can be harvested [2][3][4].
The mechanisms of electron transfer to Fe(III) and other extracellular electron acceptors are generally not well understood [1]. Whereas soluble electron acceptors such as oxygen and nitrate can diffuse into the cell, Geobacter species must transfer electrons onto an essentially insoluble, and therefore extracellular, electron acceptor. Geobacter sulfurreducens is currently the model organism for the Geobacteraceae family: its genome has been sequenced [5] and a genetic system is available [6]. G. sulfurreducens completely oxidizes the electron donor acetate to carbon dioxide via TCA cycle reactions [7]. Electrons are then transferred into the inner membrane, presumably via one or more NADH dehydrogenases [8] and a succinate dehydrogenase [9]. Electron transfer out of the inner membrane, through the periplasm and outer membrane to Fe(III), presumably requires c-type cytochromes. Several cytochromes have been shown to be required for growth by Fe(III) reduction, both in G. sulfurreducens [10][11][12][13][14][15][16] and in the other well-studied dissimilatory Fe(III) reducer, Shewanella oneidensis [17,18]. However, a specific electron transport chain to extracellular Fe(III) has not been determined for any organism.
The genomes of several closely related Fe(III)-reducing organisms in the Geobacter family have recently been sequenced. This work compares the complete or 10× coverage draft genome sequences of six species: G. sulfurreducens, Geobacter metallireducens, Geobacter uraniireducens, Geobacter bemidjiensis, Geobacter strain FRC-32 and Geobacter lovleyi. The six genomes were compared, and conservation of electron transport proteins was determined, in order to identify electron transport genes that may be critical for the reduction of Fe(III) and other terminal electron acceptors, to better understand the evolution of the family, and to provide foundational data for modeling of subsurface bioremediation.

Results and Discussion
Identification of the protein families in the six Geobacter genomes
The general features of each of the six genomes are presented in Table 1. Orthologous proteins (proteins in different species that descend from a common ancestral gene and are predicted to have similar functions) were identified by Markov clustering of sets of reciprocal best BLAST matches [19]. Using all 22,434 protein-coding genes in the six genomes (see Additional file 1), 4,062 protein families with at least two orthologs were defined (see Additional file 2). These families contained 17,620 (79%) of all proteins. A further 4,815 proteins were found in only one genome, and 2,196 of these were considered to have originated from lateral gene transfer (see Additional file 1; discussed below). A functional role was assigned to each family using the G. sulfurreducens in silico model annotation [20] and COG categorization [21].
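The reciprocal-best-match step that feeds the clustering can be illustrated with a minimal Python sketch. It assumes BLAST results have already been parsed into (query, subject, bit score) tuples for one genome pair; the gene identifiers and scores are invented, and the full OrthoMCL pipeline does more than this (for example, it also groups recent paralogs before clustering).

```python
# Sketch: reciprocal best BLAST hits (RBH) between two genomes.
# Assumes BLAST output was already parsed into (query, subject, bit_score)
# tuples; all gene identifiers and scores below are hypothetical.


def best_hits(hits):
    """Map each query to its highest-scoring subject."""
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(hits_a_vs_b, hits_b_vs_a):
    """Return sorted (a, b) pairs in which each protein is the other's best hit."""
    ab = best_hits(hits_a_vs_b)
    ba = best_hits(hits_b_vs_a)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)


a_vs_b = [("gsuA", "gmeA", 850.0), ("gsuA", "gmeB", 300.0),
          ("gsuB", "gmeB", 410.0), ("gsuC", "gmeB", 420.0)]
b_vs_a = [("gmeA", "gsuA", 845.0), ("gmeB", "gsuC", 415.0),
          ("gmeB", "gsuB", 405.0)]
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)  # [('gsuA', 'gmeA'), ('gsuC', 'gmeB')]
```

Here gsuB is dropped: its best hit, gmeB, prefers gsuC, so the pair is not reciprocal.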
For each protein family, its phyletic pattern (the set of species that encode proteins in that family) was determined (see Additional files 2 and 3). By far the most common pattern was conservation across all species: 35% of the proteins (7,774) were in families that included at least one ortholog from each genome (see Additional file 3). The second most common pattern was conservation in all species except G. lovleyi; 6% of the proteins (1,246) had this phyletic pattern (see Additional file 3).
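A phyletic pattern can be represented as the set of genomes contributing to a family, and the percentages above count proteins rather than families. The sketch below, with hypothetical genome labels and protein IDs, tallies proteins per pattern and rechecks the two quoted percentages against the 22,434-protein total.

```python
# Sketch: phyletic patterns of protein families and the number of proteins in each.
# Genome labels and protein IDs are hypothetical placeholders.
from collections import Counter

GENOMES = ("Gsu", "Gme", "Gur", "Gbe", "FRC32", "Glo")


def phyletic_pattern(members):
    """The set of genomes that contribute at least one protein to a family."""
    return frozenset(genome for genome, _protein in members)


def proteins_per_pattern(families):
    """Count proteins (not families) falling under each phyletic pattern."""
    counts = Counter()
    for members in families.values():
        counts[phyletic_pattern(members)] += len(members)
    return counts


def percent(part, total):
    return round(100 * part / total)


families = {
    1: [("Gsu", "p1"), ("Gme", "p2"), ("Gur", "p3"),
        ("Gbe", "p4"), ("FRC32", "p5"), ("Glo", "p6")],
    2: [("Gsu", "p7"), ("Gsu", "p8"), ("Gme", "p9"),
        ("Gur", "p10"), ("Gbe", "p11"), ("FRC32", "p12")],
    3: [("Gsu", "p13"), ("Gme", "p14")],
}
counts = proteins_per_pattern(families)
all_six = frozenset(GENOMES)
all_but_lovleyi = all_six - {"Glo"}
print(counts[all_six], counts[all_but_lovleyi])  # 6 6

# Recheck of the percentages quoted in the text (7,774 and 1,246 of 22,434).
print(percent(7774, 22434), percent(1246, 22434))  # 35 6
```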
Forty-three protein families had at least 10 members (see Additional file 2). Thirteen of these large families were putative transposases, which are expected to be present in many copies in a genome. Other large protein families included three cytochrome families (protein family IDs 23, 31, and 45), a nickel-dependent hydrogenase family (ID 41), and two histidine kinase sensor/regulator families (IDs 7 and 39) (Table 2).

Phylogenetics
A phylogeny of the family was constructed using the 697 protein families that had a single ortholog in each of the six genomes and in the outgroup species Pelobacter propionicus (see Additional file 4). These proteins from each genome were concatenated and then aligned, and this alignment was used to infer a Bayesian phylogeny (Figure 1). In addition to the housekeeping genes classically used to determine phylogeny, these proteins included many involved in information storage, metabolism, and cell signaling, as well as proteins of unknown function (see Additional file 4). The resulting phylogeny is consistent with the 16S rDNA phylogeny [22] and shows that the subsurface species from contaminated bioremediation sites, G. uraniireducens, G. strain FRC-32, and G. bemidjiensis, form a group distinct from the model organisms (Figure 1). Relatively few protein families (89, made up of 283 proteins) were found only in these three subsurface species; they included a hydrogenase discussed below (see Additional file 5).

Conservation of acetate and hydrogen metabolism
In all Geobacter species, acetate is the primary electron donor and is oxidized via the TCA cycle, generating NADH, NADPH, and reduced ferredoxin (Figure 2) [7,23,24]. Acetate transporters from two families [25] were conserved in all six species: one family contained a single ortholog from each species (GSU0518 in family 1057), and the other had multiple orthologs in each species (GSU1068, GSU1070, and GSU2352 in family 15) (see Additional file 6).
The genes for the eight reactions for acetate oxidation via the TCA cycle were conserved in all species (Figure 2). All of the subunits for acetyl-CoA transferase, citrate synthase, aconitase, isocitrate dehydrogenase, keto/oxoacid ferredoxin oxidoreductase, succinate dehydrogenase (complex II), fumarase, and malate dehydrogenase were conserved in all six species (see Additional file 6).
G. sulfurreducens, G. bemidjiensis, G. strain FRC-32, and G. lovleyi can also use hydrogen as an electron donor in addition to acetate. In G. sulfurreducens, the enzyme required for hydrogen oxidation has been identified as a four-subunit NiFe hydrogenase (GSU0782-GSU0785) [26]. This uptake hydrogenase was not conserved in all six Geobacter species: orthologs to its four subunits were found in three of the species that oxidize hydrogen (G. sulfurreducens, G. bemidjiensis, and G. lovleyi) and in one that does not (G. uraniireducens) (see Additional file 6). No orthologs to this hydrogenase were found in G. strain FRC-32, which can oxidize hydrogen (see Additional file 6). However, G. strain FRC-32 and the other two subsurface species, G. uraniireducens and G. bemidjiensis, all encode an additional four-subunit hydrogenase that is found only in these three species (families 177, 178, 3032, and 3033; see Additional file 5). The four genes are similar to those of the heterotetrameric hydrogenase of Pyrococcus, a cytoplasmic, NADP-dependent hydrogenase [27] (Figure 3).
After acetate or hydrogen oxidation, electrons are transferred into an inner-membrane electron transport chain, and protons are pumped out of the cytoplasm to drive ATP synthesis by an ATP synthase (Figure 2). G. sulfurreducens encodes two NADH dehydrogenases (complex I), one with 12 subunits and one with 14. NADH oxidation by complex I is predicted to be the only proton-pumping step during Fe(III) or fumarate respiration [20]. All six Geobacter species contained orthologs to the 14-subunit enzyme (GSU0338-GSU0351), and the 12-subunit enzyme was conserved in all species except G. lovleyi (see Additional file 6). The putative NADPH dehydrogenase [28] was conserved in all six Geobacter species (Figure 2, Additional file 6).
Regardless of whether acetate or hydrogen is the electron donor, ATP is synthesized with an inner-membrane bound ATP synthase. G. sulfurreducens encoded one ATP synthase enzyme in two gene clusters (GSU0108-GSU0114 and GSU0333-GSU0334). All six Geobacter species contained orthologs to all subunits of this enzyme ( Figure 2, Additional file 6).

The best conserved proteins in the Geobacter species
The level of sequence similarity among conserved proteins was estimated using bit score ratios between reciprocal orthologs [29]. A total of 1,266 G. sulfurreducens proteins had reciprocal orthologs in every other Geobacter genome (see Additional file 7). The average bit score ratio of these proteins was 70%, and only a small subset, 61 proteins, had an average bit score ratio of at least 90% (Table 3). This subset of very well conserved proteins contained housekeeping proteins, including ribosomal proteins, translation elongation factors, the Rho transcription termination factor, and amino acid biosynthetic enzymes (Table 3). It also included proteins involved in electron transfer: subunits of the inner-membrane NADH dehydrogenase and ATP synthase, and the citrate synthase and acetyl-CoA transferase of the TCA cycle (Table 3). Other high-scoring proteins (at least 85%) included subunits of many of the remaining TCA cycle enzymes: succinate dehydrogenase, aconitase, malate dehydrogenase, and 2-oxoglutarate oxidoreductase (see Additional file 7).
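The sketch below shows one way to compute and threshold average bit score ratios. It assumes the ratio is the bit score of a protein against its ortholog divided by the bit score of the protein against itself, a common normalization; the exact definition used in [29] may differ, and all protein names and scores are invented.

```python
# Sketch: average bit score ratio (BSR) per protein and a 90% threshold.
# Assumed definition: score vs ortholog / score of the query vs itself.
from statistics import mean


def bit_score_ratio(ortholog_score, self_score):
    return ortholog_score / self_score


def mean_bsr(self_score, ortholog_scores):
    """Average BSR of one protein across its orthologs in the other genomes."""
    return mean(bit_score_ratio(s, self_score) for s in ortholog_scores)


def highly_conserved(proteins, threshold=0.90):
    """proteins: {id: (self_score, [ortholog scores in the five other genomes])}."""
    return sorted(pid for pid, (self_score, others) in proteins.items()
                  if mean_bsr(self_score, others) >= threshold)


proteins = {
    "ribosomal_like": (500.0, [490.0, 480.0, 470.0, 495.0, 460.0]),    # mean 0.958
    "cytochrome_like": (1000.0, [600.0, 550.0, 700.0, 500.0, 650.0]),  # mean 0.60
}
print(highly_conserved(proteins))  # ['ribosomal_like']
```

Dividing by the self-score puts long and short proteins on the same 0 to 1 scale, which is what makes a single cutoff such as 90% meaningful across protein families.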
A search of all six Geobacter genomes showed that at least 100 ORFs in each genome contained at least one occurrence of the CXXCH motif for covalent heme binding, indicating that these may be cytochromes (see Additional file 8). This was more than was found in 16 other genomes, including those of Shewanella, Desulfovibrio, Rhodoferax, and Anaeromyxobacter species known to be cytochrome-rich (see Additional file 8). Because this definition of a cytochrome is minimal, a more stringent definition was created using 26 sequence profiles described in the InterPro protein database as c-type cytochromes. These profiles were compared against all of the proteins in the six Geobacter genomes, and proteins were considered cytochromes if their sequence contained at least one profile match and at least one CXXCH motif (see Additional file 9).
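The two cytochrome definitions can be sketched as follows. Counting overlapping motifs with a regular-expression lookahead is a choice made here, not a documented detail of the study, and the profile result is passed in as a plain boolean standing in for an InterPro scan; the sequences are invented.

```python
# Sketch: counting CXXCH heme-binding motifs and applying the stricter rule
# (profile hit AND at least one motif). The profile hit is a stand-in boolean.
import re

# Zero-width lookahead so that overlapping motifs (e.g. CXXCHXCH) each count.
CXXCH = re.compile(r"(?=C..CH)")


def count_heme_motifs(sequence):
    """Number of positions where a CXXCH motif starts."""
    return len(CXXCH.findall(sequence))


def is_cytochrome(sequence, has_ctype_profile_hit):
    """Stringent definition: a c-type profile match plus at least one motif."""
    return has_ctype_profile_hit and count_heme_motifs(sequence) > 0


print(count_heme_motifs("MKCAACHGGGCTTCHKK"))  # 2
print(count_heme_motifs("ACAACHACH"))          # 2 (the two motifs share a C)
print(is_cytochrome("MKCAACHGGG", False))      # False: motif but no profile hit
```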
By this definition, each Geobacter genome contained an average of 79 cytochromes (Table 4). G. uraniireducens contained the most cytochromes (104) and G. lovleyi the fewest (61). On average, 2.1% of the proteins encoded in each Geobacter genome are cytochromes (Table 4). Not only are the cytochromes numerous, they are also mostly multiheme: 85% contain more than one heme-binding motif, with an average of 7.7 predicted hemes per cytochrome (Table 4).
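The Table 4 style summaries reduce to simple counts over per-cytochrome heme numbers. In this sketch the heme counts and proteome size are made up; the last line is only a rough cross-check, since it divides by the mean genome size rather than averaging the per-genome percentages.

```python
# Sketch: per-genome cytochrome summary statistics (invented example data).
from statistics import mean


def cytochrome_summary(heme_counts, proteome_size):
    """heme_counts: number of CXXCH motifs in each cytochrome of one genome."""
    n = len(heme_counts)
    return {
        "cytochromes": n,
        "percent_of_proteome": 100 * n / proteome_size,
        "percent_multiheme": 100 * sum(1 for h in heme_counts if h > 1) / n,
        "mean_hemes": mean(heme_counts),
    }


print(cytochrome_summary([1, 2, 4, 12, 26], proteome_size=500))

# Rough cross-check: 79 cytochromes in an average genome of 22,434 / 6 proteins.
print(round(100 * 79 / (22434 / 6), 1))  # 2.1
```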

Conservation of cytochromes
Although cytochromes are abundant in all of the Geobacter species, very few were conserved in all six species, in contrast to the excellent conservation of the other energy metabolism proteins discussed above. In total, 471 cytochromes were identified in the six Geobacter genomes (see Additional file 9). Only 64 of these (14%) belonged to a protein family that included at least one cytochrome from each of the six genomes, and these 64 conserved cytochromes formed nine protein families (Table 5).
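The conservation criterion used here (a family counts only if its cytochrome members cover every genome) can be written as a short filter. Three hypothetical genomes and invented IDs are used for brevity, and the final line reproduces the 14% figure from the reported counts.

```python
# Sketch: cytochromes that fall in families with a cytochrome from every genome.
# Genome labels, family IDs and protein IDs are hypothetical.


def conserved_cytochrome_families(families, cytochromes, genomes):
    """Return {family_id: [cytochrome members]} for families whose
    cytochrome members cover every genome."""
    conserved = {}
    for fid, members in families.items():
        cyts = [(g, p) for g, p in members if p in cytochromes]
        if {g for g, _ in cyts} >= set(genomes):
            conserved[fid] = cyts
    return conserved


genomes = ("A", "B", "C")
families = {
    9: [("A", "a1"), ("B", "b1"), ("C", "c1")],
    10: [("A", "a2"), ("B", "b2")],
    11: [("A", "a3"), ("B", "b3"), ("C", "c3")],  # c3 is not a cytochrome
}
cytochromes = {"a1", "b1", "c1", "a2", "b2", "a3", "b3"}
conserved = conserved_cytochrome_families(families, cytochromes, genomes)
n_conserved = sum(len(v) for v in conserved.values())
print(sorted(conserved), n_conserved, len(cytochromes))  # [9] 3 7

# The study's counts: 64 of 471 cytochromes.
print(round(100 * 64 / 471))  # 14
```

Family 11 is excluded even though every genome contributes a protein, because the member from genome C is not a cytochrome.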
Many of the cytochromes that have been shown to be required for Fe(III) reduction in G. sulfurreducens were poorly conserved across the species. Only one of the nine well-conserved cytochrome families contained a cytochrome known to be required for wild-type levels of Fe(III) reduction, PpcA [12]. At least one PpcA homolog was found in every genome, and most genomes had several: five in G. sulfurreducens, five in G. metallireducens, four in G. uraniireducens, three in G. bemidjiensis, two in G. strain FRC-32, and one in G. lovleyi (see Additional file 2). In related sulfate- and sulfur-reducing δ-Proteobacteria, the most abundant and best-studied cytochromes are of the tetraheme cytochrome c3 type [36,37], whereas those of the PpcA family are of the triheme cytochrome c7 type [38].
Analysis of the well-conserved cytochromes showed that four of the nine families conserved in all species were encoded together in a single cluster in each genome (Figure 4, Table 5). These conserved cytochromes were predicted to have 2 hemes (GSU2930), 10 hemes (GSU2934), 12 hemes (GSU2935), and 5 hemes (GSU2937) (see Additional file 9). The cluster also encoded an inner-membrane b-type cytochrome (GSU2932) and a Rieske Fe-S protein (GSU2933) (Figure 4), which are clearly homologous to the core subunits of cytochrome bc complexes [39,40]. The b-type cytochrome and the Rieske protein were also conserved in all of the genomes (see Additional file 2). In other species, the cytochrome bc complex (complex III) catalyzes a key step in electron transport: it links the quinone pool of the inner membrane to electron carriers in the periplasm. However, the protein that provides this link in the Geobacteraceae has not been characterized, making this well-conserved cluster a good candidate for further analysis. The complex is of particular interest because it may be a second site of proton pumping in the cell, which would affect the ATP yield of respiration, and this yield may differ depending on the electron acceptor or the redox potential of the cell. Typically a single c-type cytochrome is associated with this complex, a tetraheme cytochrome in other δ-Proteobacteria [41], so the presence of multiple c-type cytochromes in this highly conserved cluster is novel and warrants further investigation. Three other cytochrome families were well conserved in all six genomes: families of 3-heme, 9-heme, and 12-heme cytochromes (Table 5). None of these cytochromes has been studied.

Duplications of cytochrome genes
Twenty-eight of the 115 families that included cytochromes had more than one member from a single genome (see Additional file 9); in other words, they included paralogs, which may represent duplicated cytochrome genes. The largest cytochrome family had 12 members from 5 genomes (family 23, Additional file 9). Several families (3111, 3250, 3413, and 3597) consisted of cytochromes from only a single genome, indicating recent duplication or triplication of the cytochrome since that species diverged. Several cytochromes known to be required for wild-type Fe(III) metabolism appeared to have been duplicated within single genomes. The OmcS family (64) had nine members, all 6-heme cytochromes, found in four of the Geobacter genomes (see Additional file 9): four in G. bemidjiensis, three in G. sulfurreducens, and one each in G. strain FRC-32 and G. uraniireducens. The OmcZ family (2307) contained four members from three genomes, two of them from G. sulfurreducens (see Additional file 9). All six Geobacter genomes contained at least one PpcA-like protein, and all except G. lovleyi contained more than one (Table 5).
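Families with paralogs and families confined to one genome can be flagged from per-genome member counts. In this hypothetical sketch the member counts of the first family mirror the OmcS distribution given above, but all identifiers are invented; "paralogs" is taken to mean that at least one genome contributes two or more members.

```python
# Sketch: flagging families with paralogs and families confined to one genome.
# Family names, genome label "G1" and protein IDs are hypothetical.
from collections import Counter


def duplication_summary(families):
    with_paralogs, single_genome = [], []
    for fid, members in families.items():
        per_genome = Counter(genome for genome, _ in members)
        if max(per_genome.values()) > 1:  # some genome has two or more members
            with_paralogs.append(fid)
        if len(per_genome) == 1 and len(members) > 1:  # all copies in one genome
            single_genome.append(fid)
    return sorted(with_paralogs), sorted(single_genome)


families = {
    "omcS_like": [("Gbe", f"gbe_omcS_{i}") for i in range(4)]
    + [("Gsu", f"gsu_omcS_{i}") for i in range(3)]
    + [("FRC32", "frc_omcS_0"), ("Gur", "gur_omcS_0")],
    "single_genome": [("G1", "x1"), ("G1", "x2")],
    "one_to_one": [("Gsu", "y1"), ("Gme", "y2")],
}
print(duplication_summary(families))
# (['omcS_like', 'single_genome'], ['single_genome'])
```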
The Orf2 cytochromes (GSU2732 and GSU2738) were conserved across all of the Geobacter species (Table 5) and also showed duplication. This family (51) has nine members, all 8-heme or 9-heme cytochromes (see Additional file 9); G. sulfurreducens, G. metallireducens, and G. uraniireducens each contain two Orf2 cytochromes. In G. sulfurreducens, the Orf2 genes are encoded in a tandem repeat with another duplicated cytochrome (OmcB/OmcC) known to be important for Fe(III) reduction [11] (Figure 5). Initial examination of the OmcB/OmcC family (1653) indicated that, unlike the Orf2 family, it was not completely conserved, but analysis of these genes in genome context indicated that an operon of similar structure was conserved in all six species (Figure 5). Alignment of the orf2-omcB genome regions from all six species showed at least one operon with similarity to the orf2-omcB operon in each genome, and tandem repeats of this operon in several of the genomes (Figure 5). Interestingly, while the Orf2 cytochrome gene and the Orf1 gene immediately upstream were well-conserved orthologs across all species, the gene immediately downstream varied. In all cases a multiheme cytochrome was encoded at the OmcB/C position in the operon, but its sequence similarity to OmcB/C varied (Figure 5). This suggests that the operon may be important in all six species, and that while conservation of the Orf2 cytochrome sequence appears to be important, there may be less pressure on the larger outer-membrane cytochromes to maintain a specific sequence.

Lateral gene transfer
The data presented above indicate that cytochromes are abundant in each genome but not well conserved across the genomes, and duplication and divergence of cytochrome genes appear to have contributed to this pattern. To investigate whether cytochromes were also poorly conserved because they were acquired laterally rather than inherited vertically, genes originating from lateral gene transfer were identified using a combination of phylogenetic and BLAST-based analysis. A neighbor-joining phylogenetic tree was inferred for every protein from the six genomes, with homologous sequences for each protein selected from the non-redundant protein database. These trees were used to identify proteins whose nearest relative was not from the Geobacteraceae. If the grouping with a non-Geobacteraceae relative was strongly supported (bootstrap ≥ 50), or if it was weakly supported and the most similar sequence in the non-redundant protein database was not from a Geobacteraceae species, the protein was considered a lateral gene transfer candidate. In total, 2,196 of the 22,434 proteins in the six genomes (9.8%) were predicted to have originated from recent transfer from a distantly related organism (see Additional file 1). Only 19 of the 472 predicted cytochromes (4.0%) were identified as lateral gene transfer candidates: 1 in G. bemidjiensis, 5 in G. lovleyi, 6 in G. metallireducens, 2 in G. sulfurreducens, and 3 in G. uraniireducens (see Additional file 9). None of the cytochromes shown to be required for wild-type electron transport in G. sulfurreducens were predicted to have originated from lateral gene transfer (see Additional file 9). These data indicate that the abundance of cytochromes in these six species cannot be explained by frequent lateral gene transfer.

Figure 4 The gene cluster (GSU2937 through GSU2930) encoding the putative inner-membrane cytochrome bc complex that is conserved in all six Geobacter species. Genes encoding c-type cytochromes are shown in yellow, the Fe-S cluster protein gene in purple, and the cytochrome b gene in green. All of these proteins are orthologous across all of the Geobacter genomes (Table 5). The c-type cytochromes (GSU2930, GSU2934, GSU2935, and GSU2937) contain 2, 10, 12, and 5 heme-binding motifs, respectively (see Additional file 9).

Conclusions
The results show that the genes for oxidizing acetate, for transferring electrons to cytoplasmic carriers, and for inner-membrane electron transport are well conserved among the Geobacter genomes. These results indicate that the Geobacter species and their last common ancestor all oxidized acetate using the same TCA cycle pathway, which produces NADH, NADPH, and reduced ferredoxin. These reduced carriers are then oxidized at the inner membrane, and ATP is generated via oxidative phosphorylation. The previously unidentified site of quinol oxidation in the inner membrane is suggested to be a cytochrome bc complex encoded in a unique gene cluster that is conserved in all six species. The pathways used by the better-studied species were also conserved in the newly discovered species that predominate in subsurface environments undergoing bioremediation, suggesting that the current metabolic model for G. sulfurreducens [20] provides a good foundation for broader modeling of microbial metabolism in contaminated subsurface environments during bioremediation. However, the role of the newly identified hydrogenase unique to these subsurface species merits further investigation.
In stark contrast to the conservation of the pathway for ATP generation from acetate is the lack of conservation of the proteins that dispose of the electrons after ATP production. The six Geobacter genomes contain an average of 79 cytochrome genes each, and each cytochrome is predicted to bind an average of more than 7 hemes. An abundance of extracytoplasmic heme is therefore clearly important in these species. However, only 14% of the cytochromes belong to families conserved in all six genomes. More surprisingly, even the cytochromes that have been shown to be required in G. sulfurreducens for electron transport to Fe(III) or electrodes are not well conserved.
Cells of G. sulfurreducens have been shown to be capable of storing ca. 1.6 × 10⁻¹⁷ mol of electrons in the iron of their cytochromes [42]. This has led to the proposal that cytochromes may act as electrical capacitors, accepting and storing electrons from energy metabolism for short periods in the absence of an extracellular electron-accepting surface [43]. The data presented here indicate that these species combine strong pressure to maintain many cytochrome genes with weak pressure to maintain the sequence of most of them. This lack of conservation suggests that in Geobacter species there may not be a single common pathway for electron transport out of the cell, and that cytochromes may be required for their general Fe-bearing capacity, as sinks for electrons between the inner-membrane electron transport chain and the extracellular acceptor.
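For scale, the storage figure can be converted from moles to a number of electrons with the Avogadro constant. This sketch assumes the quoted value refers to a single cell, which the sentence implies but does not state explicitly.

```python
# Sketch: reported cytochrome electron storage expressed as a number of electrons.
# Assumption: the 1.6e-17 mol figure is per cell.
AVOGADRO = 6.02214076e23  # mol^-1, exact SI value

stored_mol = 1.6e-17
electrons = stored_mol * AVOGADRO
print(f"{electrons:.2e} electrons")  # 9.64e+06 electrons
```

On that assumption, each cell would hold roughly ten million electrons in its cytochrome hemes.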

Genome sequencing and annotation
With the exception of G. sulfurreducens [5], sequence data for the genomes were produced by the US Department of Energy Joint Genome Institute http://www.jgi.doe.gov, using a whole-genome shotgun strategy for the Sanger sequencing of 3-Kb, 8
Figure 5 The region of the omcB operon (dark blue) in all six Geobacter genomes. In G. sulfurreducens, the multiheme cytochrome OmcB, which is required for electron transport to extracellular acceptors, is encoded in an operon with two other genes, orf1 (red) and orf2 (gray), and this operon is duplicated in the genome [54]. Shown here are the regions of the genomes that encode orthologs of these genes in all six Geobacter genomes, with orthologs colored identically. In some cases, multiheme cytochromes were encoded in the position of OmcB but their sequence similarity was too low to confidently predict orthology; these genes are colored light blue.

Clustering orthologs into protein families
All proteins in the genomes were clustered into families of orthologs and recent paralogs using OrthoMCL [19]. OrthoMCL uses reciprocal best similarity pairs from an all-vs-all BLAST search [44] to identify orthologs and recent paralogs, which are then clustered across all the genomes using the Markov clustering algorithm [45]. A functional role was predicted for each cluster using the G. sulfurreducens in silico model annotation [20] and COG categorization [21]. The level of sequence similarity among conserved proteins was estimated using bit score ratios between reciprocal orthologs [29].
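The Markov clustering step alternates expansion (matrix squaring, which spreads flow along paths) and inflation (element-wise powering and renormalization, which strengthens strong flows and weakens weak ones). The pure-Python sketch below illustrates the idea on a toy similarity matrix; it is not OrthoMCL's implementation, and the self-loop convention, the inflation value of 2.0, and the convergence thresholds are assumptions.

```python
# Sketch: core Markov clustering (MCL) loop on a small dense weight matrix.
# Real tools use sparse matrices and pruning; this is for illustration only.


def normalize_columns(m):
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(weights, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster a symmetric weight matrix; returns sorted lists of node indices."""
    n = len(weights)
    # Add self-loops (a common MCL convention), then make columns stochastic.
    m = normalize_columns([[weights[i][j] + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: flow along two-step paths
        inflated = normalize_columns(
            [[x ** inflation for x in row] for row in expanded])  # inflation
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each row that still carries flow (an attractor) defines one cluster.
    clusters = {frozenset(j for j in range(n) if m[i][j] > 1e-6)
                for i in range(n)}
    clusters.discard(frozenset())
    return sorted(sorted(c) for c in clusters)


# Two groups of three mutually similar proteins joined by one weak similarity.
w = [[0.0] * 6 for _ in range(6)]
for group in ((0, 1, 2), (3, 4, 5)):
    for i in group:
        for j in group:
            if i != j:
                w[i][j] = 1.0
w[2][3] = w[3][2] = 0.1
print(mcl(w))  # [[0, 1, 2], [3, 4, 5]]
```

Raising the inflation value generally yields smaller, tighter clusters, which is why the phylogenetic analysis below scans a range of inflation parameters.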

Phylogenetics
All the ORFs from the six genomes and the outgroup species Pelobacter propionicus (NC_008609) were grouped into orthologous clusters using Hal [46], with inflation parameters from 1.1 to 5.0 for the clustering algorithm. The proteins used for the phylogeny were those in clusters, generated with any inflation value, that had exactly one member from each genome; they are listed in Additional file 4. The proteins from each genome were concatenated and the resulting sequences were aligned with ClustalW [47]. ProtTest [48] was used to select a model of molecular evolution, and MrBayes [49] was used to create a Bayesian estimate of the phylogeny. The single-gene phylogeny was inferred from a ClustalW [47] alignment of homologs to the large subunit of the hydrogenase from the NCBI non-redundant database. Distances and branching order were determined by the neighbor-joining method [50], with bootstrap values from 1000 replicates, in Mega [51].
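Selecting single-copy clusters and building one concatenated sequence per taxon can be sketched as follows, assuming each cluster is a list of (taxon, sequence) pairs. Taxon labels and sequences are invented, only one clustering is shown (not the pooling across inflation values), and the subsequent ClustalW, ProtTest, and MrBayes steps are not reproduced.

```python
# Sketch: keep clusters with exactly one member per taxon, then concatenate.
# Taxon labels and sequences are hypothetical.


def single_copy_clusters(clusters, taxa):
    """Keep clusters with exactly one member from each taxon, as {taxon: seq}."""
    kept = []
    for members in clusters:  # members: list of (taxon, sequence) pairs
        if sorted(t for t, _ in members) == sorted(taxa):
            kept.append(dict(members))
    return kept


def concatenate(kept, taxa):
    """One concatenated sequence per taxon, in the same cluster order for all."""
    return {t: "".join(cluster[t] for cluster in kept) for t in taxa}


taxa = ("Gsu", "Gme", "Ppr")
clusters = [
    [("Gsu", "MKV"), ("Gme", "MKI"), ("Ppr", "MRV")],
    [("Gsu", "AAG"), ("Gsu", "AAS"), ("Gme", "AAG"), ("Ppr", "AGG")],  # paralog
    [("Gsu", "LT"), ("Gme", "LS"), ("Ppr", "IT")],
]
kept = single_copy_clusters(clusters, taxa)
concat = concatenate(kept, taxa)
print(len(kept), concat)  # 2 {'Gsu': 'MKVLT', 'Gme': 'MKILS', 'Ppr': 'MRVIT'}
```

Comparing sorted taxon lists (rather than sets) is what rejects the second cluster: a set comparison would miss the duplicated Gsu member.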

Lateral gene transfer
A phylogenetic tree was inferred using PhyloGenie [52] for every protein from the six genomes. Homologous sequences for each protein were selected by BLAST from the NCBI non-redundant protein database http://www.ncbi.nlm.nih.gov/, alignments were created with ClustalW [47], and the phylogeny was inferred using neighbor-joining [50] with 100 bootstrap replicates. If, for a given protein, a phylogenetic relationship with non-Geobacteraceae was strongly supported (bootstrap ≥ 50), or if the relationship was weakly supported and the most similar sequence in the NCBI non-redundant protein database was not from a Geobacteraceae species, the protein was considered a candidate. If the next branch out contained a single sequence not from a Geobacteraceae species, the query gene was defined as originating from lateral transfer; if it contained a single sequence from a Geobacteraceae species, it was not. If the sister group was a clade or was not strongly supported, the ancestral state was inferred [53] and used to determine lateral transfer.
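The decision rules above can be encoded as a small function. This is a sketch: the TreeEvidence fields are hypothetical summaries of a tree that PhyloGenie and ancestral-state reconstruction would have to supply, and the same bootstrap value is assumed to describe both the non-Geobacteraceae grouping and the support for the sister group.

```python
# Sketch: lateral gene transfer (LGT) calls from pre-summarized tree evidence.
# All field names are hypothetical; threshold follows the bootstrap >= 50 rule.
from dataclasses import dataclass
from typing import Optional


@dataclass
class TreeEvidence:
    groups_with_non_geobacteraceae: bool  # tree places query with non-Geobacteraceae
    bootstrap: float                      # support for that grouping (0-100)
    best_hit_is_geobacteraceae: bool      # top BLAST hit in the nr database
    sister_is_single_sequence: bool       # next branch out is one sequence
    sister_is_geobacteraceae: bool
    ancestral_state_is_geobacteraceae: Optional[bool] = None  # reconstruction


def is_candidate(ev: TreeEvidence, threshold: float = 50.0) -> bool:
    if not ev.groups_with_non_geobacteraceae:
        return False
    strong = ev.bootstrap >= threshold
    # Weak support still counts if the best BLAST hit is non-Geobacteraceae.
    return strong or not ev.best_hit_is_geobacteraceae


def is_lateral_transfer(ev: TreeEvidence, threshold: float = 50.0) -> bool:
    if not is_candidate(ev, threshold):
        return False
    if ev.sister_is_single_sequence and ev.bootstrap >= threshold:
        # A single, well-supported sister sequence decides the case directly.
        return not ev.sister_is_geobacteraceae
    # Sister is a clade or weakly supported: use the inferred ancestral state.
    if ev.ancestral_state_is_geobacteraceae is None:
        raise ValueError("ancestral state required for this case")
    return not ev.ancestral_state_is_geobacteraceae


cases = {
    "strong_single_sister": TreeEvidence(True, 85.0, False, True, False),
    "weak_geo_best_hit": TreeEvidence(True, 30.0, True, True, False),
    "weak_clade_geo_ancestor": TreeEvidence(True, 30.0, False, False, False, True),
    "weak_clade_foreign_ancestor": TreeEvidence(True, 30.0, False, False, False, False),
}
for name, ev in cases.items():
    print(name, is_lateral_transfer(ev))
```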
Additional file 1: All proteins referenced in this study. Spreadsheet with NCBI identification numbers and descriptions including name, predicted function, COG membership, protein family ID, family conservation pattern, and lateral transfer prediction.