Transfer of energy pathway genes in microbial enhanced biological phosphorus removal communities
BMC Genomics volume 16, Article number: 526 (2015)
Lateral gene transfer (LGT) is an important evolutionary process in microbial evolution. In sewage treatment plants, LGT of antibiotic resistance and xenobiotic degradation-related proteins has been suggested, but the role of LGT outside these processes is unknown. Microbial communities involved in Enhanced Biological Phosphorus Removal (EBPR) have been used to treat wastewater in the last 50 years and may provide insights into adaptation to an engineered environment. We introduce two different types of analysis to identify LGT in EBPR sewage communities, based on identifying assembled sequences with more than one strong taxonomic match, and on unusual phylogenetic patterns. We applied these methods to investigate the role of LGT in six energy-related metabolic pathways.
The analyses identified overlapping but non-identical sets of transferred enzymes. All of these were homologous with sequences from known mobile genetic elements, and many were also in close proximity to transposases and integrases in the EBPR data set. The taxonomic method had higher sensitivity than the phylogenetic method, identifying more potential LGTs. Both analyses identified the putative transfer of five enzymes within an Australian community, two in a Danish community, and none in a US-derived culture.
Our methods were able to identify sequences with unusual phylogenetic or compositional properties as candidate LGT events. The association of these candidates with known mobile elements supports the hypothesis of transfer. The results of our analysis strongly suggest that LGT has influenced the development of functionally important energy-related pathways in EBPR systems, but transfers may be unique to each community due to different operating conditions or taxonomic composition.
Enhanced biological phosphorus removal (EBPR) communities are a common form of microbial treatment developed by Banard  that removes phosphorus and occasionally nitrogen from sewage. EBPR is environmentally sustainable and affordable , with microbial communities typically seeded from the local environment or from a seed stock. Considerable effort has been put into understanding EBPR, from community diversity (e.g. [3–5]), to metabolic function (e.g. [2, 6]) and engineering (e.g. [7, 8]), with the objective of improving efficiency and stability. A substantial amount of work has gone into understanding what organisms are present in EBPR plants [4, 5, 9–11], which organisms tend to be associated with each other (e.g. [4, 5]), their ecology (e.g. [10, 12, 13]), and how to engineer the EBPR process (e.g. [11, 14]). Recently, a conceptual ecosystem model  and a core microbiome  have been proposed, based mainly on 25 plants in Denmark, revealing a taxonomically broad group of characterized and uncharacterized organisms. However, the majority of EBPR-associated organisms are not found in all EBPR samples.
To develop the EBPR process, a carbon source, typically acetate or propionate, is input to the system, and anaerobic and aerobic conditions are cycled in a bioreactor to select for phosphate accumulating organisms (PAOs). Other organisms perform functions such as fermentation and hydrolysis, and are often referred to as the “flanking community” (e.g. [5, 16]). Phosphorus uptake occurs during the anaerobic cycle, and carbon- and energy-providing polymers are stored as polyhydroxyalkanoates (PHAs). During the aerobic phase, energy stored in the PHAs is used for growth and reproduction. The type of input carbon source is taken up at different rates for different organisms in EBPR, which could affect treatment plant operation . Because anaerobic and aerobic cycling is so important for EBPR community function, emphasis has been placed on metabolic pathways related to PHA metabolism (e.g. [17, 18]), glycolysis and gluconeogenesis (e.g. [16, 17, 19]), the pentose phosphate pathway (e.g. [2, 20]), and the citric acid cycle (e.g. [2, 16, 19, 20]) as a means to understand how EBPR functions, and through usage of a particular metabolic pathway, a way to make EBPR more efficient at removing phosphate (e.g. ).
Metagenomic sequencing of two lab-scale EBPR enrichment reactors allowed the elucidation of EBPR-relevant metabolism of a major PAO, the Betaproteobacterium Candidatus Accumulibacter phosphatis (CAP) Clade IIA strain UW-1 , including phosphate uptake and PHA degradation during the aerobic phase, and PHA storage and polyphosphate degradation during the anaerobic stage. The amount of sequence generated, and the technology used (Sanger sequencing, which generates relatively long reads) allowed the eventual assembly of the first complete genome of CAP Clade IIA strain UW-1. Recently, draft genome sequences of other CAP have been obtained [21, 22]. Sequencing of a full-scale reactor metagenome from Denmark highlighted an enrichment of genes associated with biofilm and phosphate metabolism, and the taxonomic diversity of full-scale reactor communities . Despite the existence of a group of organisms considered to be common in EBPR communities, the community structure and specific strains present can vary considerably between treatment plants .
Lateral gene transfer (LGT) is a well-established mode of evolution in bacteria that can be studied through a variety of approaches using genome sequences (e.g. [24–28]). LGT plays an important role in adaptation, for example, in heavy-metal metabolism (e.g. [29, 30]), and in antibiotic resistance (e.g. [29, 31]). Transfers tend to take place between close relatives, but many examples of transfer between more distant relatives have been reported as well (e.g. [24, 32]). LGT is known to have occurred in sewage treatment plants, impacting antibiotic resistance genes (e.g. [29, 33–36]), and xenobiotic degradation (e.g. [37, 38]). Many of these transfers are mediated by mobile genetic elements (MGEs) such as plasmids and transposons (e.g. [37, 38]). Engineering of treatment plants have used plasmids to bioaugment communities to allow metabolism of xenobiotics . Other mechanisms of LGT exist, such as gene transfer agents (e.g. ) and transformation (e.g. ), but their role in sewage treatment communities is not known.
The metagenomes of two non-EBPR sludge community plasmids were sequenced , revealing substantial differences in genes from a plant with primarily industrial waste and a plant with primarily household waste. The differences suggested that the prominence of carbohydrate metabolism genes from the industrial waste plant, and the genes related to defense factors in the household waste plant, were the result of selection in each of those communities. Others have noted that transferred plasmids in non-EBPR sludge can have a mosaic of functional genes . Some evidence of LGT has been identified in PAO genomes  but no such events have been proposed from metagenome data thus far.
There are many different bioinformatic approaches for the identification of LGT events (reviewed in [27, 42]), but most rely on whole-genome sequences. Different methods can identify very different sets of genes as putatively acquired via LGT (e,g, [28, 43, 44]). Metagenomic data introduce several challenges that make identification of LGT difficult, in particular, metagenome sequence fragments are short (typically < 1000 nucleotides in length) and of uncertain provenance in the community. Incorrectly assembled chimeric contigs often combine sequences from multiple members of the same genus, species or strain [45–47]. Chimeric contigs are more common in more diverse communities (e.g. ), and when using short-read sequencing technology with closely related strains , and can often lead to incorrect classification of contigs.
Despite these challenges, it would be an important step to develop sequence-based approaches to identify LGT within an environment to further our understanding of microbial adaptation. Approaches such as genetic exchange networks  could identify transfers between multiple taxonomic groups. Here we develop and apply two different analyses to identify candidate LGT events in EBPR metagenomic data for six relevant metabolic pathways. We focus on class-level gene transfers to avoid any errors in assembly at lower taxonomic levels that can affect the accuracy of classification. Our first method, classification discordance, exploits disagreement between taxonomic classifications of genes and longer assemblies. Our second method relies on phylogenetic incongruence. Both are then filtered by homology with known MGEs to identify putative cases of LGT that have been putatively transferred through MGEs.
The EBPR enrichment culture metagenomes for lab-scale bioreactors in Madison, Wisconsin, United States of America (USA) and Brisbane, Australia (AU) that comprised the first EBPR metagenome study , both sequenced using Sanger sequencing, were downloaded on April 21st 2009 from the Joint Genome Institute (http://genome.jgi.doe.gov/OZEBPRsludge/OZEBPRsludge.download.htm, projectID=201007& metagenome=4463936.3). The USA community is composed of 15,866 contigs and assemblies, 25,312,906 nucleotides, with reads an average of 986 nucleotides in length, and the AU community 11,188 contigs and 24,385,629 nucleotides, with reads an average of 1038 nucleotides in length. The EBPR metagenome for a full-scale bioreactor in Aalborg, Denmark (DK) that performs nitrogen removal in addition to phosphate removal , sequenced using Illumina GAII (2 x 72 paired end), was downloaded from the SEED (http://metagenomics.anl.gov/metagenomics.cgi?page=MetagenomeOverview& metagenome=4463936.3), and is composed of 269,385 contigs and 145,725,513 nucleotides of sequence data. We used the assemblies and predicted genes and putative proteins as generated by the original sequencing projects.
Mobile genetic element sequence data consisted of MGEs from the Phast  and the ACLAME databases . The Phast database is composed primarily of viral sequences and the ACLAME database is composed of plasmids, phage genomes and transposons. We also included the complete NCBI plasmid database, and added other plasmids from NCBI that were not in the plasmid database, but matched the search terms “sewage treatment”, “waste-water” and “wastewater”. In total, this amalgamated database contained 7,584,934 sequences.
Taxonomic and functional annotation of metagenomic contigs
Class-level taxonomic classification of contigs was done using RITA . RITA uses a reference database to assign a taxonomic classification to sequence data using both homology and nucleotide composition. We used RITA v1.0.1 with a reference data set of over 2986 genomes representing 65 different taxonomic classes (Additional file 1), using USEARCH v4.1.93  for homology searches and FCP v1.0.3  for nucleotide composition matching. RITA performs taxonomic classification and assigns sequences to one of four confidence groups based on the strength of evidence in favor of that classification. Sequences with identical taxonomic predictions from both homology and composition were assigned to Group I. Group II comprised sequences where the expectation value for the best-matching genome was at least 10 orders of magnitude smaller than the best-matching genome from a different class. Group III assignments are made when the NB likelihood score for the best-matching genome is at least 1.5 times greater than the NB likelihood for the best-matching genome from another class. Group IV assignments are based only on the best NB likelihood value. Accuracy of classifications increases with longer contigs , so only contigs at least 1000 nucleotides in length were used.
Sequences were functionally annotated through a BLASTP (version 2.2.23)  protein similarity search. Annotations were based on the top hit to a reference data set of microbial proteins from the NCBI Protein Clusters database  with a 60 % alignment length of the predicted protein with the reference sequence, an expectation value of 10−5 or smaller, and neither the predicted protein or reference sequence greater than 1.2 times the length of the other. Additional annotations for enzymes were assigned using a publicly available version (58.1) of the KEGG database . A subset of KEGG pathways and their enzymes (see Table 1 and Additional file 2) related to EBPR metabolism during anaerobic and aerobic cycling, carbon feed source, and nitrogen metabolism were subjected to detailed analysis and were annotated with a more recent version (67.1) of KEGG: butanoate metabolism (BM) for EBPR PHA metabolism, citric acid cycle (CAC), glycolysis/gluconeogenesis (GG), pentose phosphate pathway (PPP), propanoate metabolism (PM) for EBPR propionate metabolism (propionate is the propanoate ion), and nitrogen metabolism (NM).
Identification of putative LGT events
Sequenced reference genomes are typically used for the identification of LGT, but metagenomes rarely produce reliable complete genome sequences. The complete genome of CAP Clade IIA strain UW-1 was however reconstructed from the USA EBPR metagenome. We used this genome to look for initial evidence of LGT in this EBPR community. We performed homology searches, using BLAST, of its genome against itself and 2773 reference genomes, and MGEs used in the EBPR-MGE homology searches. The top hits with a minimum of 60 % shared alignment were used as evidence of potential LGT.
We used two complementary approaches to identify putative LGT events in the EBPR metagenomes. The first approach identified strong disagreement between taxonomic classifications (“classification discordance”) of entire contigs and individual genes within those contigs. The second approach considered incongruence in phylogenetic trees as evidence of LGT. LGT identified by the two approaches were then filtered by homology with known MGEs.
Poor assembly could lead to chimeric contigs and spurious LGT inference. To assess the possible effects of misassembly, we examined the quality of the assemblies from which LGTs were inferred. Reads are available for the AU and USA communities (http://genome.jgi.doe.gov/OZEBPRsludge/OZEBPRsludge.download.htm, projectID=201007), but not the DK community, and none of the three communities had explicit mappings of reads to contigs. We mapped reads back to contigs using a BLASTN search (version 2.2.23), where 70 % of a read was required to align to a contig with a maximum expectation value of 10−30, and searched for reads that spanned some or all of the putatively transferred ORF and its neighbors.
The taxonomic classification of a whole contig suggests the lineage of the organism from which it was sequenced, but individual protein-coding open reading frames (ORFs) from the contig may differ in their taxonomic assignments. Such disagreements can suggest LGT events with an implied direction of transfer; the donor is the classification of the ORF, and the recipient is the classification of the entire contig. Each predicted ORF was classified at the class level using RITA with the same command-line parameters used above for the contig classifications, with ORFs from group I and group II RITA classifications considered as accurate.
Spuriously classified ORFs originating from classified contigs meeting our length requirements would lead to a questionable inference of LGT. To prevent this, we filtered out candidate transferred ORFs whose best composition-based prediction (i.e., the Naïve Bayes likelihood score) was not at least 15 % better than the contig prediction. If this criterion was satisfied, then the contig was considered a transfer recipient of the implicated ORF.
Phylogenetic methods incorporate models of the evolutionary process, providing a more accurate representation of evolutionary relationships amongst homologous sequences. We first performed all-versus-all BLAST (version 2.2.23) searches within each community to identify clusters of putative homologous proteins. These sets were then compared with 1642 reference prokaryotic genomes to expand and join clusters. Clusters were represented as an undirected graph using the “networkx” Python package (1.8.1). In the network, a node represents each sequence, and an undirected edge represents a homologous relationship between two sequences. For an edge to be drawn between two EBPR proteins, they must have 70 % sequence identity, and share 60 % alignment length with an e-value of 10−5 or smaller. This network was expanded by drawing edges between the nodes, the reference genome sequences and EBPR homologs meeting the BLAST similarity requirements. The network was then split into connected components, or a set of nodes that are connected to each other by a path of edges, where each connected component is considered a cluster.
The resulting clusters were often very large (≥1000 sequences), and included distantly related proteins of little use to LGT inference. To obtain sub-clusters, we constructed phylogenies and extracted subtrees. Sequence alignments were constructed from large clusters using MUSCLE (version 3.8.31)  with default settings, and trees were constructed using FastTree (version 2.1.4)  with the WAG model of amino acid evolution . We then manually extracted subtrees where FastTree Shimodaira-Hasegawa (SH)-test-based [60, 61] branch support values of at least 70 % denoted clusters of closely related sequences. Subtree extraction, alignment and phylogeny construction was repeated until subtrees comprised a maximum of approximately 200 sequences.
For detecting LGT, phylogenies are typically compared against a reference species tree (e.g. ). However, because EBPR community structure can vary over time (e.g. ) and metagenomes can represent incomplete samples of the total genetic material , crucial taxa including donors of genetic material may not be present in the sample. We used the DendroPy library  to calculate the patristic (branch-length) distances between sequences in the same phylogenetic tree, finding for each EBPR sequence the closest EBPR sequence from the same community and the closest reference sequence with an absolute branch length of 0.3 substitutions per site. Sequences with shorter branch lengths should be closest relatives.
Identifying candidate mobile genetic elements
Potential LGTs from each of the phylogenetic incongruence and classification discordance methods were then filtered by sequences that have homologs, as identified using BLAST with a maximum e-value of 10−30 against our custom database of MGEs. We included EBPR sequences with hits to MGE sequences of a different taxonomic class from the EBPR sequence, and had an alignment length of at least 60 % of the query EBPR sequence and 60 % of the subject MGE sequence.
The published CAP genome was used to find evidence of recent LGT, possibly in the context of the EBPR community. Of the 4562 sequences in the CAP genome, 1438 sequences had hits to genomes outside the Betaproteobacteria with the same e-value as the top CAP hit, suggesting the acquisition of many genes by CAP. The high degree of similarity indicates the possibility that many of these transfers occurred very recently. The observation of these recent transfers led us to search for LGT events in all sampled EBPR community members.
Filtering out contigs that were less than 1000 nucleotides in length reduced the size of the Sanger-sequenced datasets to ~48 % (USA) and ~65 % (AU) of their original sizes, while the Illumina-sequenced DK reactor metagenome was reduced to only ~6 % (Table 2). This result should be expected given the differences in read length and the expected differences in diversity between lab-scale reactors and full-scale reactors [11, 19]. The DK community had the largest number of taxonomic classes represented in the filtered contigs (63), followed by AU (53) and USA (39; see Additional file 3). For all communities, RITA classification Groups I-III accounted for the vast majority of classifications, although the relative proportion of contigs assigned to these groups varied (see Additional file 4). The number of potentially transferred ORFs from the retained contigs also varied by community, analysis type, and the six energy-related pathways.
Our first approach to identify putative LGT compared the taxonomic classification of an entire contig with the classification of its predicted ORFs. Of the ORFs that had hits to the metabolic pathways of interest, at least 50 % from each community (US: 20 ORFs, 68.9 %, AU: 88 ORFs, 55.7 %, and DK: 58 ORFs, 54.2 %) satisfied the criteria for discordance. All LGTs suggested by this method had hits to annotated MGEs from our database. The number of inferred transfers, the implicated enzymes and the participating taxonomic groups vary among metabolic pathways and communities (Additional file 5). However, some members appear to be more common recipients or donors of gene transfer in all communities and metabolic pathways, with Betaproteobacteria to Gammaproteobacteria (21 transfers) in AU the most common direction of transfer (Additional file 6). LGT events with Alphaproteobacteria as donor and Betaproteobacteria as recipient were the only pattern identified in all three communities.
Of the six pathways, the pentose phosphate pathway is the only pathway to not have any detected transfers in the DK community (Additional file 5), most likely due to lack of annotated enzymes. Certain pathways have enzymes that appear to have been transferred in all three communities: butanoate metabolism (enoyl-CoA hydratase: EC 220.127.116.11), glycolysis and gluconeogenesis (glucokinase: EC 18.104.22.168), nitrogen metabolism (asparagine synthase: EC 22.214.171.124, cytochrome-c oxidase: EC 126.96.36.199) and propanoate metabolism (EC 188.8.131.52). For example, for butanoate metabolism and propanoate metabolism, enzyme 184.108.40.206 is commonly transferred across all three communities, with directed networks suggesting transfers from Alphaproteobacteria and Betaproteobacteria to Gammaproteobacteria in the AU community, from Betaproteobacteria to Alphaproteobacteria in the USA community, and from Acidobacteria to Deltaproteobacteria (Fig. 1). These genetic exchange networks suggest that PAOs (e.g. Betaproteobacteria) and competing glycogen accumulating organisms (GAOs) (e.g., from Gammaproteobacteria and Alphaproteobacteria) may be involved in transfers of core metabolic enzymes. There also appears to be parallel transfer of genes between taxonomic groups across communities. For example, in glycolysis and gluconeogenesis, 6-phosphofructokinase (EC 220.127.116.11) shows evidence of transfer from Chloroflexi to the Betaproteobacteria in the USA and AU, but not in DK.
Transfers may be localized at key in some pathways, for example, where alternative paths between certain metabolites are not present, suggesting an important role for the transfer in the metabolism of the recipients. For example, in glycolysis and gluconeogenesis, glyceraldehyde-3-phosphate dehydrogenase (EC 18.104.22.168) and phosphoglycerate mutase (EC 22.214.171.124) are transferred in the DK community, and phosphoglycerate kinase (EC 126.96.36.199), phosphoglycerate mutase (EC 188.8.131.52) and phosphopyruvate hydratase (EC 184.108.40.206) are transferred in the AU community (Fig. 2). These enzymes are involved in a single path for reactions leading from glyceraldehyde-3-phosphate to phosphoenolpyruvate. Missing enzymes in pathways would increase the need for other enzymes to catalyze key reactions. LGT is one way that genes can be acquired by organisms that need specific enzymes for reactions in pathways. In gluconeogenesis and glycolysis, for example, polyphosphate glucokinase (EC 220.127.116.11) is missing in the AU and USA communities, but glucokinase (18.104.22.168) also catalyzes the reaction ß-D-Glucose to ß-D-Fructose-6-phosphate and shows evidence of LGT in all three communities (Fig. 2). Figures for the other five pathways, indicating gene transfers and the direction of transfer can be found in Additional files 7, 8, 9, 10, 11, 12, 13, 14 and 15.
Closer scrutiny of the transfers in the directed networks suggests multiple class-level transfers of the same enzyme between specific taxonomic groups. For example, on long contigs, for transfers to the Gammaproteobacteria in the AU community, enoyl-CoA hydratase (EC 22.214.171.124) has been identified as transferred once from the Alphaprotebacteria to the Gammaproteobacteria, and three times from the Betaproteobacteria to three different Gammaproteobacterial contigs. Inspecting the genes on the contig reveals two transposases on one contig and a single transposase on the other (Fig. 3). Classification of sequences in each contig indicates a mixed taxonomic history, suggesting that the present distribution of genes has arisen from a series of independent LGT events.
A set of 987 trees covering 46,031 proteins from 1622 reference organisms were extracted from an initial set of 981 trees covering 243,031 proteins from 1642 reference organisms. The direction of transfer is difficult to infer as metagenomic sequencing and quality-filtering approaches remove possible within-community donor and recipient lineages, and tree topologies often cannot distinguish which of two implicated lineages is the most likely donor. A total of 14, 27, and 1 (DK, AU, and USA communities, respectively) predicted EBPR proteins differed from a reference sequence or an EBPR protein of a different taxonomic class by less than 0.3 substitutions per site (Additional file 16). This represented 4.71 %, 8.84 %, and 0.65 % (DK, AU, USA communities, respectively) of all sequences whose closest relative was a member of a different taxonomic class, but did not meet the 0.3 substitutions per site branch length cutoff.
The recipient of the single proposed transfer within the USA community is classified as Gammaproteobacteria, with predicted function associated with glycolysis and gluconeogenesis (EC 126.96.36.199). The AU community accounted for the majority of transfers, with some transfers identified on the same contig, but not evenly distributed across each metabolic pathway. The DK community had the largest number of inferred transfers in the citric acid cycle and nitrogen metabolism pathway. In the AU community, transfers consistently involved sequences belonging to contigs classified as Gammaproteobacteria and Betaproteobacteria, with Alphaproteobacteria, Bacilli and Chlorobia also implicated in transfer of some of the metabolic pathways. For the DK community, no common taxonomic groups were shared across metabolic pathways, and no sequences identified as transferred were classified as Betaproteobacteria. The Cytophagia were implicated in three pathways (butanoate metabolism, citric acid cycle and nitrogen metabolism), while a mixture of the Alphaproteobacteria, Bacteroidia, Flavobacteriia, Gammaproteobacteria, Methanomicrobia, Sphingobacteria are other classes present in the other three pathways (gluconeogenesis and glycolysis, pentose phosphate pathway, propanoate metabolism).
MGE homology and a common prediction set
Each method of LGT detection differs in its ability to identify different types of LGT events. All high-confidence LGT events have homology with sequences in known MGEs. A substantial number of sequences from each community had hits to known MGEs: 11,718 of 30,516 sequences from the DK community, 16,156 of 24,956 sequences from the AU community, and 15,530 of 22,662 sequences from the USA community. Of those MGE homologs, 2097 DK, 824 AU, and 875 USA community sequences are enzymes in KEGG pathways (Additional file 17).
Given the very high proportion of metagenomic sequences matching to MGEs, we used additional criteria to support inferences of LGT. To obtain a high-confidence set of transfers, we examined the intersection of the two analyses for each of the six pathways (Fig. 4). Pathways differed by the percent of shared transfers, with each detection method sharing a different percentage of transfers. Up to 55 % of LGT events predicted by the classification discordance approach were shared with the phylogenetic approach. This wide variation in shared LGT events is not correlated to the number of detected LGT events, and illustrates the tendency of each approach to find different types of transfers.
A total of ten sequences, representing five enzymes, were identified as putatively transferred by the two approaches: enoyl-CoA hydratase (EC 188.8.131.52), acetyl-CoA C-acetyltransferase (EC 184.108.40.206), cytochrome-c oxidase (EC 220.127.116.11), phosphoglycerate kinase (EC 18.104.22.168), and 6-phosphofructokinase (EC 22.214.171.124). Of those ten sequences, eight were identified in the AU community, two in the DK community, and none in the USA community. All of the identified enzymes were present on plasmids in the ACLAME database, suggesting a possible mode of transfer. Both analyses almost always identified the same taxonomic classes as donors or as the top hit. The only transfer in DK, enzyme 126.96.36.199, was associated with nitrogen metabolism. Enzyme 188.8.131.52 was unique to GM. Two enzymes, 184.108.40.206 and 220.127.116.11, are common to BM and PM while 18.104.22.168 is common to glycolysis and gluconeogenesis and the pentose phosphate pathway. No common transfers were found that belonged to the CAC. For AU, six of the eight recipient contigs were classified as Gammaproteobacteria, with the remainder Betaproteobacteria and Chlorobia. For DK, the recipient contigs were classified as Bacilli and Cytophagia.
Closer inspection of the contigs that contained the transfers from each analysis provides further support for these putative LGT events. In total, ten of 88 contigs from the AU community, none of the 55 from the DK community, and three of the 19 from the USA community had integrases or transposases on contigs that contained transferred genes from both the classification discordance and phylogenetic incongruence methods. This subset of contigs with integrases and transposases are about two times (AU) or three times (USA) longer, and contain more genes: two to three (AU) or three (USA) more than all contigs with LGTs (Table 3). The relationship between LGT detection and contig length does indicate that longer contigs are more suitable for identification of LGT, and aid the identification of transposases and integrases. This could explain why the DK community did not have any identified transposases and integrases on contigs with an LGT: the majority of contigs were likely too short.
Since DK reads were not available and USA LGTs were not part of the shared set of transfers from both analyses, we were only able to assess coverage of AU LGTs. Only seven of the eight AU LGTs had matching reads, but all homologous reads had an expectation value of 0.0. Of the 57 reads with at least partial homology to the putatively transferred ORFs, 50 had alignments that extended into adjacent ORFs, suggesting that the inferred events were not due to misassembly. Two putatively transferred ORFs each had an aligned read that spanned the full length of the ORF (Additional file 18). One LGT had two reads that did not extend into neighbouring ORFs, and started or ended in intergenic regions. Alignments for the remaining five reads partially covered the putatively transferred ORFs.
Using a series of approaches that are applicable to metagenomic data, we found strong evidence that LGT has impacted six energy metabolism pathways in EBPR communities. Some genes appear to have been independently transferred in more than one community. Although some groups are associated with multiple LGT events, no clear patterns of donor/recipient partners emerged for all three communities. The common set of transfers between the two analyses, and MGE homology filtering, provide the strongest evidence for LGT. The majority of transfers shared by both analyses were identified in the AU community, none in the USA community, and only two genes transferred in the DK community, which were the only shared transfers identified in nitrogen metabolism. Differences in predicted events across the three communities may represent independent evolutionary trajectories, differences in local community composition, or biases in observation due to incomplete sampling of the metagenome.
Our contig length and ORF taxonomic quality-filtering approaches favored the detection of a relatively small set of high-confidence LGT predictions. Although choosing the class level decreases the number of potential LGTs found and precludes detection of LGT between members of the same class, the long-range transfers we have identified show the strongest evidence for discordance. Our use of contigs in excess of 1000 nucleotides long considerably reduced the proportion of sequences being retained, especially for the DK community, where the average contig length was 504 nucleotides. However, longer contigs are better for detecting LGT (Table 3). This could be due to a higher probability that genes from a different source are found on longer contigs, or inaccurate classification due to short contigs. Additionally, longer contigs were needed to identify transposases and integrases in tandem with our genes of interest.
Mapping of metagenomic reads to contigs validated most of our LGT inferences; however, one putatively transferred ORF in our high-confidence set did not have any matching reads. Accuracy of assemblies, including metagenomic assemblies, depends on sequencing technology and the complexity of communities [65, 66]. Less-complex communities (~10 genomes) have the most accurate assemblies with Sanger sequencing, and complex communities (100+ genomes) have the most accurate assemblies with Illumina sequencing . Regardless of assembly accuracy, it is unclear why this ORF should be present in the assembled contigs, while having no corresponding match in the reads used to generate those contigs.
Different methods of detecting LGT are often biased towards finding certain types of transfer events [43, 44]. Our approaches do not identify transfers at lower taxonomic levels and are biased towards detection of complete genes. Naïve Bayes likelihood ORF filtering should eliminate many dubious classifications, but does not provide any information about the age of the transfer event. The phylogenetic approach provides information about age of transfers, but identified the fewest candidate LGT events. This is because it requires that the donor lineage in the community or a close relative be sampled, and LGT events that do not appreciably distort the tree will not be detected by this approach.
Different EBPR plants have distinct population characteristics [4, 9], with different operational parameters between the sampled EBPR communities, and full-scale plants being more complex and dynamic than lab-scale reactors [10, 11, 19]. All three communities use different carbon feeds: molasses in DK , propionate in AU and acetate in USA . Propionate has been shown to be a more desirable carbon source relative to acetate, providing PAOs a selective advantage over competitors, and resulting in a more stable community over time [12, 17, 67, 68]. The propanoate metabolic pathway, which shows different amounts of evidence for LGT between the three communities, with very few transfers in the acetate-fed USA community, and a large number of transfers in the propionate-fed AU community, especially between the Betaproteobacteria and Gammaproteobacteria. The DK community has an intermediate number of transfers, but with more taxonomic groups implicated than the AU community. The taxonomic composition of EBPR communities is known to change over time , and with changing carbon sources ; this variability may also manifest through gene exchange between constituents of the community.
Focusing on LGT in energy-related metabolic pathways considered relevant to EBPR function provides context to the role of LGT in EBPR communities. LGT events not in the six energy pathways are also likely to be important in EBPR communities, such as phosphate metabolism, bacteriophage resistance, and flocculation/biofilm formation. Future analyses should also focus on other metabolic pathways for insights into alternative metabolism, and in broad functional categories for overall community functional aspects of LGT. Additional sequencing of EBPR communities would provide further insight into whether there are common LGT events.
Banard JL. A review of biological phosphorus removal in activated sludge. Water Sa. 1976;2:136–44.
Oehmen A, Lemos PC, Carvalho G, Yuan Z, Keller J, Blackall LL, et al. Advances in enhanced biological phosphorus removal: from micro to macro scale. Water Research. 2007;41:2271–300.
He S, Gu AZ, McMahon KD. Fine-scale differences between Accumulibacter-like bacteria in enhanced biological phosphorus removal activated sludge. Water Sci Technol. 2006;54:111–7.
Mielczarek AT, Nguyen HTT, Nielsen JL, Nielsen PH. Population dynamics of bacteria involved in enhanced biological phosphorus removal in Danish wastewater treatment plants. Water Research. 2013;47:1529–44.
Nielsen PH, Saunders AM, Hansen AA, Larsen P, Nielsen JL. Microbial communities involved in enhanced biological phosphorus removal from wastewater—a model system in environmental biotechnology. Current Opinion in Biotechnology. 2012;23:452–9.
Yuan Z, Pratt S, Batstone DJ. Phosphorus recovery from wastewater through microbial processes. Current Opinion in Biotechnology. 2012;23:878–83.
Tu Y, Schuler AJ. Low acetate concentrations favor polyphosphate-accumulating organisms over glycogen-accumulating organisms in enhanced biological phosphorus removal from wastewater. Environmental Science & Technology. 2013;47:3816–24.
Zhang T, Liu Y, Fang HHP. Effect of pH change on the performance and microbial community of enhanced biological phosphate removal process. Biotechnology and Bioengineering. 2005;92:173–82.
He S, McMahon KD. Microbiology of “Candidatus Accumulibacter” in activated sludge. Microbial Biotechnology. 2011;4:603–19.
Kong Y, Ong SL, Ng WJ, Liu W-T. Diversity and distribution of a deeply branched novel proteobacterial group found in anaerobic-aerobic activated sludge processes. Environmental Microbiology. 2002;4:753–7.
Wong M-T, Mino T, Seviour RJ, Onuki M, Liu W-T. In situ identification and characterization of the microbial community structure of full-scale enhanced biological phosphorous removal plants in Japan. Water Research. 2005;39:2901–14.
Gonzalez-Gil G, Holliger C. Dynamics of microbial community structure and enhanced biological phosphorus removal of propionate- and acetate-cultivated aerobic granules. Applied and Environmental Microbiology. 2011;77:8041–51.
He S, Gu AZ, McMahon KD. Progress toward understanding the distribution of Accumulibacter among full-scale enhanced biological phosphorus removal systems. Microb Ecol. 2008;55:229–36.
Zhang T, Zhang X-X, Ye L. Plasmid metagenome reveals high levels of antibiotic resistance genes and mobile genetic elements in activated sludge. PLoS ONE. 2011;6:e26041.
Nielsen PH, Mielczarek AT, Kragelund C, Nielsen JL, Saunders AM, Kong Y, et al. A conceptual ecosystem model of microbial communities in enhanced biological phosphorus removal plants. Water Research. 2010;44:5070–88.
García Martín H, Ivanova N, Kunin V, Warnecke F, Barry KW, McHardy AC, et al. Metagenomic analysis of two enhanced biological phosphorus removal (EBPR) sludge communities. Nat Biotechnol. 2006;24:1263–9.
Oehmen A, Yuan Z, Blackall LL, Keller J. Comparison of acetate and propionate uptake by polyphosphate accumulating organisms and glycogen accumulating organisms. Biotechnology and Bioengineering. 2005;91:162–8.
Seviour RJ, Mino T, Onuki M. The microbiology of biological phosphorus removal in activated sludge systems. FEMS Microbiology Reviews. 2003;27:99–127.
Lanham AB, Oehmen A, Saunders AM, Carvalho G, Nielsen PH, Reis MAM. Metabolic versatility in full-scale wastewater treatment plants performing enhanced biological phosphorus removal. Water Research. 2013;47:7032–41.
McIlroy SJ, Albertsen M, Andresen EK, Saunders AM, Kristiansen R, Stokholm-Bjerregaard M, et al. “Candidatus Competibacter-”lineage genomes retrieved from metagenomes reveal functional metabolic diversity. The ISME Journal. 2013;8:613–24.
Flowers JJ, He S, Malfatti S, del Rio TG, Tringe SG, Hugenholtz P, et al. Comparative genomics of two “Candidatus Accumulibacter” clades performing biological phosphorus removal. The ISME Journal. 2013;7:2301–14.
Mao Y, Yu K, Xia Y, Chao Y, Zhang T. Genome reconstruction and gene expression of “Candidatus Accumulibacter phosphatis” Clade IB performing biological phosphorus removal. Environmental Science & Technology. 2014;48:10363–71.
Albertsen M, Hansen LBS, Saunders AM, Nielsen PH, Nielsen KL. A metagenome of a full-scale microbial community carrying out enhanced biological phosphorus removal. The ISME Journal. 2011;6:1094–106.
Beiko RG, Harlow TJ, Ragan MA. Highways of gene sharing in prokaryotes. Proceedings of the National Academy of Sciences of the United States of America. 2005;102:14332–7.
Koonin EV, Galperin MY. Prokaryotic genomes: the emerging paradigm of genome-based microbiology. Current Opinion in Genetics & Development. 1997;7:757–63.
Lawrence JG, Ochman H. Amelioration of bacterial genomes: rates of change and exchange. Journal of Molecular Evolution. 1997;44:383–97.
Ragan MA. Detection of lateral gene transfer among microbialgenomes. Current Opinion in Genetics & Development. 2001;11:620–6.
Ragan MA. On surrogate methods for detecting lateral gene transfer. FEMS Microbiology Letters. 2001;201:187–91.
Sentchilo V, Mayer AP, Guy L, Miyazaki R, Green Tringe S, Barry K, et al. Community-wide plasmid gene mobilization and selection. The ISME Journal. 2013;6:1173–86.
Sobecky PA, Coombs JM. Horizontal gene transfer in metal and radionuclide contaminated soils. In: Gogarten MBB, Gogarten JP, Lorraine O, editors. Horizontal Gene Transfer, Genomes in Flux. Clifton, NJ: Humana Press; 2009. p. 455–72.
Barlow M. What antimicrobial resistance has taught us about horizontal gene transfer. In: Gogarten MBB, Gogarten JP, Lorraine O, editors. Horizontal Gene Transfer, Genomes in Flux. Clifton, NJ: Humana Press; 2009. p. 397–411.
Popa O, Hazkani-Covo E, Landan G, Martin W, Dagan T. Directed networks reveal genomic barriers and DNA repair bypasses to lateral gene transfer among prokaryotes. Genome Research. 2011;21:599–609.
Hong H, Ko H-J, Choi I-G, Park W. Previously undescribed plasmids recovered from activated sludge confer tetracycline resistance and phenotypic changes to Acinetobacter oleivorans DR1. Microb Ecol. 2014;67:369–79.
Ma L, Zhang X-X, Zhao F, Wu B, Cheng S, Yang L. Sewage treatment plant serves as a hot-spot reservoir of integrons and gene cassettes. Journal of Environmental Biology. 2013;34(2 Spec No):391–9.
Szczepanowski R, Bekel T, Goesmann A, Krause L, Krömeke H, Kaiser O, et al. Insight into the plasmid metagenome of wastewater treatment plant bacteria showing reduced susceptibility to antimicrobial drugs analysed by the 454-pyrosequencing technology. Journal of Biotechnology. 2008;136:54–64.
Zhang, T., Zhang, X.-X., & Ye, L. Plasmid metagenome reveals high levels of antibiotic resistance genes and mobile genetic elements in activated sludge. PLoS ONE. 2011; 6(e26041).
Schlüter A, Krahn I, Kollin F, Bönemann G, Stiens M, Szczepanowski R, et al. IncP-1-beta plasmid pGNB1 isolated from a bacterial community from a wastewater treatment plant mediates decolorization of triphenylmethane dyes. Applied and Environmental Microbiology. 2007;73:6345–50.
Top EM, Springael D, Boon N. Catabolic mobile genetic elements and their potential use in bioaugmentation of polluted soils and waters. FEMS Microbiology Ecology. 2002;42:199–208.
Bathe S, Schwarzenbeck N, Hausner M. Plasmid-mediated bioaugmentation of activated sludge bacteria in a sequencing batch moving bed reactor using pNB2. Letters in Applied Microbiology. 2005;41:242–7.
Lang AS, Zhaxybayeva O, Beatty JT. Gene transfer agents: phage-like elements of genetic exchange. Nature Reviews Microbiology. 2012;10:472–82.
Thomas CM, Nielsen KM. Mechanisms of, and barriers to, horizontal gene transfer between bacteria. Nat Rev Microbiol. 2005;3:711–21.
Zhaxybayeva O. Detection and quantitative assessment of horizontal gene transfer. In: Gogarten MBB, Gogarten JP, Lorraine O, editors. Horizontal Gene Transfer, Genomes in Flux. Clifton, NJ: Humana Press; 2009. p. 195–213.
Lawrence JG, Ochman H. Reconciling the many faces of lateral gene transfer. Trends Microbiol. 2002;10:1–4.
Ragan MA, Harlow TJ, Beiko RG. Do different surrogate methods detect lateral genetic transfer events of different relative ages? Trends in Microbiology. 2006;14:4–8.
Charuvaka A, Rangwala H. Evaluation of short read metagenomic assembly. BMC Genomics. 2011;12 Suppl 2:S8.
Mavromatis K, Ivanova N, Barry K, Shapiro H, Goltsman E, McHardy AC, et al. Use of simulated data sets to evaluate the fidelity of metagenomic processing methods. Nat Methods. 2007;4:495–500.
Pignatelli M, Moya A. Evaluating the fidelity of de novo short read metagenomic assembly using simulated data. PLoS ONE. 2011;6:e19984.
Skippington E, Ragan MA. Lateral genetic transfer and the construction of genetic exchange communities. FEMS Microbiology Reviews. 2011;35:707–35.
Zhou Y, Liang Y, Lynch KH, Dennis JJ, Wishart DS. PHAST: a fast phage search tool. Nucleic Acids Res. 2011;39(Web Server issue):W347–52.
Leplae R, Lima-Mendez G, Toussaint A. ACLAME: a CLAssification of Mobile genetic Elements, update 2010. Nucleic Acids Res. 2010;38(Database issue):D57–61.
Macdonald NJ, Parks DH, Beiko RG. Rapid identification of high-confidence taxonomic assignments for metagenomic data. Nucleic Acids Res. 2012;40:e111.
Edgar RC. Search and clustering orders of magnitude faster than BLAST. Bioinformatics. 2010;26:2460–1.
Parks DH, Macdonald NJ, Beiko RG. Classifying short genomic fragments from novel lineages using composition and homology. BMC Bioinformatics. 2011;12:328.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. Journal of Molecular Biology. 1990;215:403–10.
Klimke W, Agarwala R, Badretdin A, Chetvernin S, Ciufo S, Fedorov B, et al. The National Center for Biotechnology Information's Protein Clusters Database. Nucleic Acids Res. 2009;37(Database issue):D216–23.
Kanehisa M, Goto S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Research. 2000;28:27.
Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32:1792–7.
Price MN, Dehal PS, Arkin AP. FastTree: computing large minimum evolution trees with profiles instead of a distance matrix. Molecular Biology and Evolution. 2009;26:1641–50.
Goldman N, Whelan S. Statistical tests of gamma-distributed rate heterogeneity in models of sequence evolution in phylogenetics. Molecular Biology and Evolution. 2000;17:975–8.
Shimodaira H, Hasegawa M. Multiple Comparisons of Log-Likelihoods with Applications to Phylogenetic Inference. Molecular Biology and Evolution. 1999;16:1114–6.
Goldman N, Anderson JP, Rodrigo AG. Likelihood-based tests of topologies in phylogenetics. Systematic Biology. 2000;49:652–70.
Slater FR, Johnson CR, Blackall LL, Beiko RG, Bond PL. Monitoring associations between clade-level variation, overall community structure and ecosystem function in enhanced biological phosphorus removal (EBPR) systems using terminal-restriction fragment length polymorphism (T-RFLP). Water Research. 2010;44:4908–23.
Ni J, Yan Q, Yu Y. How much metagenomic sequencing is enough to achieve a given goal? Scientific Reports. 2013;3:1968.
Sukumaran J, Holder MT. DendroPy: a Python library for phylogenetic computing. Bioinformatics. 2010;26:1569–71.
Sims D, Sudbery I, Ilott NE, Heger A, Ponting CP. Sequencing depth and coverage: key considerations in genomic analyses. Nature Reviews Genetics. 2014;15:121–32.
Mende DR, Waller AS, Sunagawa S, Järvelin AI, Chan MM, Arumugam M, et al. Assessment of metagenomic assembly using simulated next generation sequencing data. PLoS ONE. 2012;7:e31386.
Chen Y, Randall AA, McCue T. The efficiency of enhanced biological phosphorus removal from real wastewater affected by different ratios of acetic to propionic acid. Water Research. 2004;38:27–36.
Thomas M, Wright P, Blackall L, Urbain V, Keller J. Optimisation of Noosa BNR plant to improve performance and reduce operating costs. Water Sci Technol. 2003;47:141–8.
DHJW was supported by the TULA Foundation and Genome Atlantic. RGB acknowledges the support of the Canada Research Chairs program. Computational analyses were supported by a grant from the Canada Foundation for Innovation to RGB. We are also grateful to Christian Blouin, Norbert Zeh, Elizabeth Edwards, Cheryl Devine, and Frances Slater for their feedback and enlightening discussions.
The authors declare that they have no competing interests.
Both authors conceived of the approach, designed the analyses, interpreted results, and wrote the manuscript. DHJW carried out the analyses. Both authors read and approved the final manuscript.
Count of RITA reference genomes by taxonomic class.
List of enzymes for each KEGG pathway and their names.
Classes and counts of contigs classified at the class level greater than 1000 nucleotides long for each community. Classes shared between each community are also listed. Not all classes could be classified to the class level, and are identified by the taxonomic group they are classified to, followed by ‘(class)’.
Count of contigs classified under each RITA classification method. Group 1 is when homology and nucleotide composition both agree. Group 2 is where the USEARCH expectation value for the best-matching genome was at least 10 orders of magnitude smaller than the best-matching genome from a different class. Group 3 assignments are made when the NB likelihood score for the best-matching genome is at least 1.5 times greater than the NB likelihood for the best-matching genome from another class. Group 4 assignments are based only on the best NB likelihood value.
Potential LGTs from taxonomic discordance analysis. Taxonomic discordance analysis (Naïve Bayes filtered and unfiltered) predicted proteins identified as transferred for each pathway, their EC annotation, class-level taxonomic classification of source (ORF) and recipient (contig).
Direction of transfers in classification discordance analysis.
KEGG citrate cycle pathway and directed LGT for the Denmark (DK), Australian (AU) and United States (USA) EBPR communities. Dashed boxes indicate LGT, with solid symbols indicating LGT predicted within a community, and hollow symbols indicating missing enzymes in a community. Greyed out enzymes are not found in any community. See Table 1 for enzyme names and Additional file 18 for taxonomic abbreviation guide.
KEGG propanoate metabolism pathway and directed LGT for the Denmark (DK), Australian (AU) and United States (USA) EBPR communities. Dashed boxes indicate LGT, with solid symbols indicating LGT predicted within a community, and hollow symbols indicating missing enzymes in a community. Greyed out enzymes are not found in any community. See Table 1 for enzyme names and Additional file 18 for taxonomic abbreviation guide.
KEGG pentose phosphate pathway and directed LGT for the Denmark (DK), Australian (AU) and United States (USA) EBPR communities. Dashed boxes indicate LGT, with solid symbols indicating LGT predicted within a community, and hollow symbols indicating missing enzymes in a community. Greyed out enzymes are not found in any community. See Table 1 for enzyme names and Additional file 18 for taxonomic abbreviation guide.
KEGG nitrogen metabolism pathway and directed LGT for the Denmark (DK), Australian (AU) and United States (USA) EBPR communities. Dashed boxes indicate LGT, with solid symbols indicating LGT predicted within a community, and hollow symbols indicating missing enzymes in a community. Greyed out enzymes are not found in any community. See Table 1 for enzyme names and Additional file 18 for taxonomic abbreviation guide.
KEGG partial nitrogen metabolism pathway and directed LGT for the Denmark (DK), Australian (AU) and United States (USA) EBPR communities in the inner membranes of Nitromonas europaea, archaea and bacteria. Dashed boxes indicate LGT, with solid symbols indicating LGT predicted within a community, and hollow symbols indicating missing enzymes in a community. Greyed out enzymes are not found in any community. See Table 1 for enzyme names and Additional file 18 for taxonomic abbreviation guide.
KEGG partial nitrogen metabolism pathway and directed LGT for the Denmark (DK), Australian (AU) and United States (USA) EBPR communities in Rhodospirillum rubum. Dashed boxes indicate LGT, with solid symbols indicating LGT predicted within a community, and hollow symbols indicating missing enzymes in a community. Greyed out enzymes are not found in any community. See Table 1 for enzyme names and Additional file 18 for taxonomic abbreviation guide.
KEGG partial nitrogen metabolism pathway and directed LGT for the Denmark (DK), Australian (AU) and United States (USA) EBPR communities in Kuenenia stuttgartiensis, Candidatus Methylomairabilis oxyfera, Klebsiella oxytoca and Synechococcus PCC7942 . Dashed boxes indicate LGT, with solid symbols indicating LGT predicted within a community, and hollow symbols indicating missing enzymes in a community. Greyed out enzymes are not found in any community. See Table 1 for enzyme names and Additional file 18 for taxonomic abbreviation guide.
KEGG partial nitrogen metabolism pathway and directed LGT for the Denmark (DK), Australian (AU) and United States (USA) EBPR communities. Dashed boxes indicate LGT, with solid symbols indicating LGT predicted within a community, and hollow symbols indicating missing enzymes in a community. Greyed out enzymes are not found in any community. See Table 1 for enzyme names and Additional file 18 for taxonomic abbreviation guide.
KEGG butanoate metabolism pathway and directed LGT for the Denmark (DK), Australian (AU) and United States (USA) EBPR communities. Dashed boxes indicate LGT, with solid symbols indicating LGT predicted within a community, and hollow symbols indicating missing enzymes in a community. Greyed out enzymes are not found in any community. See Table 1 for enzyme names and Additional file 18 for taxonomic abbreviation guide.
Potential LGT events from phylogenetic analysis. Phylogenetic analysis predicted proteins identified as transferred for each pathway, their EC annotation, and class-level taxonomic classification of the contig they are located.
Sequences from each community with homologs to known mobile genetic elements. EBPR sequences with homologs to mobile genetic elements for each pathway, their EC annotation, and class-level taxonomic classification of the contig they are located.
List of taxonomic classes and their abbreviations.
About this article
Cite this article
Wong, D.HJ., Beiko, R.G. Transfer of energy pathway genes in microbial enhanced biological phosphorus removal communities. BMC Genomics 16, 526 (2015). https://doi.org/10.1186/s12864-015-1752-5
- Enhanced biological phosphorus removal
- Lateral gene transfer
- Mobile genetic elements