- Research article
- Open Access
The alignment of enzymatic steps reveals similar metabolic pathways and probable recruitment events in Gammaproteobacteria
© Poot-Hernandez et al. 2015
- Received: 3 February 2015
- Accepted: 19 October 2015
- Published: 17 November 2015
It is generally accepted that gene duplication followed by functional divergence is one of the main sources of metabolic diversity. In this regard, there is an increasing interest in the development of methods that allow the systematic identification of these evolutionary events in metabolism. Here, we used a method not based on biomolecular sequence analysis to compare and identify common and variable routes in the metabolism of 40 Gammaproteobacteria species.
The metabolic maps deposited in the KEGG database were transformed into linear Enzymatic Step Sequences (ESS) by using the breadth-first search algorithm. These ESS represent subsequent enzymes linked to each other, where their catalytic activities are encoded in the Enzyme Commission numbers. The ESS were compared in an all-against-all (pairwise comparisons) approach by using a dynamic programming algorithm, leaving only a set of significant pairs.
Results and conclusion
From these comparisons, we identified a set of functionally conserved enzymatic steps in different metabolic maps, in which cell wall components and fatty acid and lysine biosynthesis were included. In addition, we found that pathways associated with biosynthesis share a higher proportion of similar ESS than degradation pathways and secondary metabolism pathways. Also, maps associated with the metabolism of similar compounds contain a high proportion of similar ESS, such as those maps from nucleotide metabolism pathways, in particular the inosine monophosphate pathway. Furthermore, diverse ESS associated with the lower part of the glycolysis pathway were identified as functionally similar to ESS in multiple metabolic pathways. In summary, our comparisons may help to identify similar reactions in different metabolic pathways and could reinforce the patchwork model in the evolution of metabolism in Gammaproteobacteria.
- Pathway alignment
- Enzyme commission number
The study of the evolution of metabolism is central to understanding the adaptive processes of cellular life, the emergence of high levels of organization (multicellularity), and the diversity and complexity of the living world [1, 2]. At present, the large-scale information derived from genomic and proteomic studies has allowed the development of databases devoted to organizing the metabolic processes, such as KEGG [3] and MetaCyc [4]. The information contained in these databases can be used to generate an integrative perspective of cellular functioning.
Metabolism can be considered one of the most ancient biological networks, where the nodes represent substrates and/or enzymes and the edges represent the relationships among them. From this perspective, the study of metabolic networks has focused on describing topological properties and has shown the existence of a structured network architecture [5–7]. Another relevant feature of metabolic networks is their modularity [8, 9], where each module is a discrete entity of elementary components (enzymes and substrates) that performs a certain task, separable from the functions of other modules. The elements of each module are related to each other and may be subjected to the same evolutionary process, such as amino acid biosynthesis, where a high rate of duplication events has been identified [10]. In this regard, metabolic pathways exhibit high retention of duplicates within functional modules and a preferential biochemical coupling of reactions. This retention of duplicates may result from the biochemical rules governing substrate-enzyme-product relationships [11–13].
In this work, we ask whether there are groups of similar reactions in different or in the same metabolic pathways, which might suggest a transfer of enzymatic activities, and whether these groups can be used to define common and variable metabolic pathways in 40 organisms belonging to the Gammaproteobacteria division. Gammaproteobacteria are excellent models to consider because they contain a large diversity of species [14], such as the bacterium Escherichia coli K-12, for which a large number of molecular and functional mechanisms have been elucidated. In addition, Gammaproteobacteria include organisms widely distributed throughout diverse environments, such as the endocommensal bacterium Ruthia magnifica [15], obligate endosymbionts Baumannia sp. and Buchnera sp., photoautotrophs such as Halorhodospira halophila [16], and mammal pathogens, such as Yersinia spp. and Vibrio spp., among others [17, 18].
To this end, we implemented a general strategy that considers the transformation of the metabolic maps deposited in the KEGG database into linear Enzymatic Step Sequences (ESS) and their subsequent comparison with a dynamic programming sequence alignment algorithm. From these comparisons, we show that maps associated with the metabolism of similar compounds also contain a high proportion of similar ESS. In addition, we evaluate the possible contribution of two ancient pathways, glycolysis and IMP, to the growth of metabolic pathways. Finally, we consider that our comparisons may provide clues reinforcing the patchwork model in the evolution of metabolism in Gammaproteobacteria.
Construction and comparison of ESS
A natural observation that emerged from these sequences concerns their redundancy, i.e., identical ESS derived from different organisms. To reduce this redundancy and to facilitate the subsequent analysis, identical sequences were identified and excluded, leaving one representative of each and defining the non redundant ESS (nrESS) dataset. From this filtering, 7970 different nrESS were considered for subsequent analyses. The nrESS length histogram was similar to that for the complete set of ESS, with a mean length of 5.4 and a mode equal to 4 (Fig. 1a). In this report, we refer only to the nrESS.
In a second step, the nrESS were compared by using the dynamic programming Needleman and Wunsch (NW) algorithm in an all-against-all strategy. The alignment generated by this algorithm was evaluated by using an entropy-based normalized function that yields values in the interval from 0 to 1. Hence, values close to 0 mean less entropy and more homogeneous columns in the alignment, reflecting more similar nrESS. Conversely, values close to 1 reflect dissimilar nrESS.
From these comparisons, we found that the distribution of the scores resembled an extreme value Gumbel distribution (Additional file 2: Figure S2), with the highest proportion of the scores close to 1, i.e., most alignments occur between dissimilar nrESS. To evaluate the statistical significance of all comparisons, 10 random databases were generated by shuffling the real nrESS, maintaining the EC composition and sequence lengths. The random databases were analyzed in the same all-against-all fashion, and the resulting scores were compared against real alignment scores. In Fig. 1c we show the cumulative histogram of the alignment scores of the real and random datasets. Based on this analysis, scores close to 0 are overrepresented in real data compared to random nrESS. To evaluate this overrepresentation, the deviation of the real dataset relative to the mean ± standard deviation of the 10 random datasets was calculated (Fig. 1d). According to these data, the real and random scores intersect at 0.65, suggesting that this value is the limit to identify distant similarities; therefore, an alignment with a score of ≈ 0.65 may be considered clearly random. Based on this information, a significant alignment threshold was established to analyze most of the nrESS without compromising the statistical relevance. Therefore, a score of ≤0.3 was established as the threshold. This value represents the highest dispersion (>195 SD) of the random data (Fig. 1d, asterisk) with the lowest loss of nrESS, i.e., more than 99 % of the real nrESS were included (Additional file 2: Figure S3). This threshold also corresponds to 0.26 % of all nrESS alignments (81,520 of 31,756,465) and includes 7907 out of 7970 nrESS. In contrast, from the alignments associated with the 10 random databases (31,756,465 for each dataset), only 0.04 ± 0.001 % (13,827 ± 308) of the total alignments had a score of ≤0.3.
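The counts quoted above can be checked with a few lines of arithmetic. The sketch below only re-derives the reported figures from the numbers given in the text; the variable names are illustrative and not part of the original analysis.

```python
from math import comb

# Figures taken from the text above.
n_nress = 7970            # non redundant ESS
real_hits = 81_520        # real alignments with score <= 0.3
random_hits_mean = 13_827 # mean over the 10 shuffled databases
covered_nress = 7907      # nrESS with at least one significant alignment

# An all-against-all comparison of n sequences yields n*(n-1)/2 unordered pairs.
n_pairs = comb(n_nress, 2)

pct_real = 100 * real_hits / n_pairs
pct_random = 100 * random_hits_mean / n_pairs
pct_covered = 100 * covered_nress / n_nress

print(n_pairs)                 # 31756465
print(round(pct_real, 2))      # 0.26
print(round(pct_random, 2))    # 0.04
print(round(pct_covered, 1))   # 99.2
```

Under these figures, significant alignments are roughly six times more frequent in the real dataset than in the shuffled ones (81,520 versus 13,827 on average).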
These results show that our method can be used to identify similar nrESS with significant scores, excluding the possibility of finding such similar nrESS by random chance. Here, we report information concerning our comparisons of these nrESS related to metabolism in diverse bacterial organisms.
Pairwise alignments of nrESS identify a core of common metabolic pathways in Gammaproteobacteria
To assess the nrESS similarity of each metabolic map as an indicator of functional conservation, we used the alignments that occurred within them (green edges in Fig. 2a), and we named this dataset the Metabolic Map Functional Conserved Dataset (MMFCD). The proportion of each metabolic type represented in this dataset is shown in Fig. 2b, and corresponds primarily to nrESS of the metabolism of nucleotides, followed by the metabolism of carbohydrates, cofactors and vitamins, amino acids, and lipids. In contrast, the pathways for xenobiotic biodegradation and metabolism, biosynthesis of other secondary metabolites, metabolism of other amino acids, and metabolism of terpenoids and polyketides, among others, represent less than 5 % of the total nrESS included in the dataset. From these alignments, we mapped the position of the highly similar nrESS in the corresponding metabolic map to determine the proportion of the functionally conserved EC numbers in relation to the total EC numbers present in Gammaproteobacteria (Fig. 2c). Using this information, we classified the metabolic maps into four groups: 1) highly functionally conserved maps, i.e., those with more than 70 % of the EC numbers identical; 2) moderately functionally conserved maps, with percentages between 30 % and 69 %; 3) barely functionally conserved maps, i.e., those with percentages between 1 % and 29 %; and 4) variable maps, i.e., those with 0 % of the EC numbers classified as functionally conserved. From these data, less than one-third of the analyzed maps (24 of 86) were classified as highly or moderately functionally conserved, while more than two-thirds were considered barely functionally conserved or variable. All these data showed that more than half of the metabolic maps analyzed did not exhibit common nrESS in Gammaproteobacteria and, consequently, may be considered variable, suggesting a high variability in the metabolism of this bacterial division.
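The four-group classification can be written as a small function. This is a minimal sketch, not the authors' code: the function name is hypothetical, and because the text leaves the boundaries between groups (for example, exactly 70 %) unspecified, half-open intervals are assumed here.

```python
def classify_map(conserved_ec: int, total_ec: int) -> str:
    """Assign a metabolic map to one of the four conservation groups.

    conserved_ec: EC numbers in the map found in conserved nrESS alignments.
    total_ec: EC numbers of the map present in Gammaproteobacteria.
    Boundary handling (0, <30, <70, otherwise) is an assumption.
    """
    if total_ec <= 0:
        raise ValueError("total_ec must be positive")
    pct = 100.0 * conserved_ec / total_ec
    if pct == 0:
        return "variable"
    if pct < 30:
        return "barely functionally conserved"
    if pct < 70:
        return "moderately functionally conserved"
    return "highly functionally conserved"


print(classify_map(0, 12))
print(classify_map(5, 10))
```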
In detail, maps classified as highly functionally conserved are related to important processes, like the pathways for fatty acid biosynthesis (map00061), metabolism of some amino acids (00290 for valine, leucine, and isoleucine biosynthesis; 00300 for lysine biosynthesis), components of the cell wall (00540 for lipopolysaccharide biosynthesis; 00550 for peptidoglycan biosynthesis), metabolism of some cofactors (00770 for pantothenate and CoA biosynthesis; 00780 for biotin metabolism; 00785 for lipoic acid metabolism), and novobiocin biosynthesis (00401). This functional similarity also correlates with the fact that amino acid metabolism pathways for valine, leucine, isoleucine, and lysine have been identified as pathways with diverse duplicated genes in the three cellular domains of life [10, 23].
The second group includes those maps defined as moderately functionally conserved. In this category were included the pathways for metabolism of purines (00230) and pyrimidines (00240), glycolysis/gluconeogenesis (00010), the citrate cycle (00020), metabolism of glycerophospholipids (00564), terpenoid backbone biosynthesis (00900), and some cofactors, like riboflavin (00740), nicotinamide (00760), folate (00790), and thiamine (00730). It is interesting that the central part of glycolysis (00010), the Embden-Meyerhof pathway, is partially conserved among Gammaproteobacteria (Fig. 2d), whereas the core pathway that comprises the tricarbon compounds is widely functionally conserved among the analyzed organisms, including the oxidation of pyruvate to acetyl CoA. In the hexose section, the enzymatic steps catalyzed by 6-phosphofructokinase (EC 2.7.1.11) and fructose bisphosphate aldolase (EC 4.1.2.13) are considered variable. A similar result was observed with the glycolysis input, where the mechanisms by which the hexoses enter the pathway are variable. In addition, the enzymatic steps to transform pyruvate to lactate and the ethanolic fermentation from acetate are also variable. In a similar way, gluconeogenesis from oxaloacetate is partially functionally conserved in Gammaproteobacteria, where the enzymes allowing the input from oxaloacetate (phosphoenolpyruvate carboxykinase, EC 4.1.1.32 and 4.1.1.49) and the enzyme that dephosphorylates fructose 1,6-bisphosphate to fructose 6-phosphate (fructose bisphosphatase, EC 3.1.3.11) are considered variable. These results are congruent with those from a previous study, where it was concluded that glycolysis is a plastic pathway and that the lower part of the glycolysis pathway is the more conserved section among the three cellular domains.
A similar conservation pattern is observed in other metabolic maps classified as moderately functionally conserved, such as the pathway for glycerophospholipid metabolism (00564). We found that the biosynthetic pathways to CDP-diacylglycerol and then to phosphatidyl glycerol, phosphatidyl serine, and phosphatidyl ethanolamine are conserved, while the biosynthetic pathway to phosphatidyl choline and the degradation pathways are variable. A similar result arises for the biosynthesis of cofactors like thiamine-diphosphate (map 00730), riboflavin (map 00740), NAD+ and NADP+ (map 00760), and tetrahydrofolate (map 00790). Taken together, these results make it possible to deduce a functional conservation pattern for Gammaproteobacteria, where some metabolic maps contain a biosynthesis-related core of similar enzymatic steps and some variable steps that include the degradation of various compounds. These variable or dispensable steps may represent possible alternative pathways in different organisms and/or in different ecological niches, as has been previously suggested [10, 28].
The group of metabolic maps classified as barely functionally conserved includes important processes, such as amino acid metabolism, fatty acid degradation (beta-oxidation), and glycerolipid metabolism. In this context, we identified many variable reactions in the map that describes alanine, aspartate, and glutamate metabolism (map 00250), suggesting the existence of alternative pathways to produce these compounds. In this regard, there are three possible enzymes that catalyze the conversion of L-glutamate to L-glutamine: one of them by a ligation reaction (glutamine synthetase, EC 6.3.1.2) and two by reversible hydrolysis (glutaminase, EC 3.5.1.2, and L-glutamine (L-asparagine) amidohydrolase, EC 3.5.1.38). In particular, the L-glutamine (L-asparagine) amidohydrolase also catalyzes the deamination of asparagine to aspartate. This finding suggests more flexible networks for the production of amino acids and reinforces the notion of various alternative enzymes for the production of amino acids. A similar observation arises for cysteine and methionine metabolism (map 00270), for which alternative pathways were also identified. For example, the pathway to produce methionine from aspartate (module M00017) is not completely conserved in Gammaproteobacteria; nevertheless, there are some enzymes that may provide alternative paths for the synthesis of methionine. Interestingly, some of these alternative enzymes were identified as functionally conserved in this work, suggesting not only the absence of a conserved canonical route but also important alternative enzymatic steps.
Finally, the variable maps include a high diversity of metabolism types. Some of them contain few or fragmented enzymatic steps present in at least one Gammaproteobacteria species, suggesting the absence of those metabolic maps in this clade. However, other maps contain many enzymes present in Gammaproteobacteria, such as those for seleno compound metabolism, galactose metabolism, pentose phosphate and pentose metabolism, and glucuronate metabolism, among others. In general, the maps classified in this category represent pathways for degradation of uncommon compounds (xenobiotics) or for alternative carbon sources (carbohydrate metabolism). Altogether, these observations, in addition to supporting the previously proposed idea concerning the reduced conservation of degradation-related pathways, reinforce the notion of differential enzyme recruitment across the clade. Also, our results support the proposed preponderance of central carbon and anabolic pathways in the evolution of metabolism [2, 8, 27, 29].
In summary, all these data allow the identification of a core of similar enzymatic steps in Gammaproteobacteria. This core includes primarily reactions of the central carbon metabolism (lower part of glycolysis and tricarboxylic acid cycle), and the biosynthetic pathways for nucleotides, cofactors, and some amino acids. In addition, this core is complemented with a set of variable pathways that primarily includes degradation pathways for carbohydrates, amino acids, and xenobiotics that may be essential to the particular lifestyle of each organism.
The complete set of functional conservation of metabolic maps in Gammaproteobacteria is available as KEGG weblinks in Additional file 3: File S2.
Metabolic maps that convert similar compounds also share similar nrESS
Similar nrESS suggest that enzyme recruitment is a frequent event in metabolism of Gammaproteobacteria
In this work we used a simple workflow for the comparative study of metabolism through the alignment of linear sequences of ESS. The metabolic maps stored in KEGG were transformed into linear ESS by using an exhaustive and well-defined graph search algorithm. Then, the ESS were compared to identify the commonalities and differences between them. This approach allows the identification of similarities at the Enzymatic Step Sequences (ESS) level in a set of metabolic pathways. In this regard, the use of the functional information of the enzyme activity rather than the (protein and DNA) sequence information suggests that metabolism comprises a complex and dynamic network that may have different proteins to achieve the same or similar function.
Diverse methods for the alignment of biological networks have been suggested, such as those for protein-protein interaction networks (for some examples see references [31–33]) and metabolic networks (for some examples see references [34–38]), mainly based on protein homology and/or network topology. However, they consider a small number of organisms or the general metabolic maps of the KEGG database. Also, many of these methods are in general difficult to compare with each other, as has been recently shown by Clack et al.
In this work we used the alignment of linear enzymatic step sequences, similar to the previously described approaches [20, 40], where a general strategy for the systematic analysis of the metabolism on a multigenome scale was additionally implemented. The linear enzymatic alignment approach described here allows gaps using the NW algorithm, uses a random data comparison, and allows the identification of distant similarities like those observed between metabolic maps. To our knowledge, this is the first time that these methods have been used to compare systematically the metabolism of a well-studied and metabolically diverse clade. Therefore, our approach is able to capture directly the information contained in the individual metabolic networks of each organism.
Based on these comparisons, we detected a core of metabolic pathways associated with central carbohydrate metabolism, lipids, the cell wall, and cofactors, as well as biosynthetic pathways. In contrast, variable maps are those associated with degradation pathways, except the glucose-related pathways and the TCA cycle. In addition, amino acid metabolism is an example of metabolism characterized by alternative pathways, with multiple routes that yield similar compounds.
In addition, two scenarios can be suggested to exemplify the growth of the metabolism. The first one is associated with glycolysis, where a significant proportion of functional similarities from this pathway were observed in other metabolic pathways, suggesting the utilization of similar substrates/products processed by similar reactions in different metabolic maps. The second scenario is associated with the high number of hits for the IMP pathway in alignments within the same map, suggesting that this pathway has possibly increased its size by duplication and recruitment of its own enzymes, and raising the possibility of major biochemical coupling restrictions for the recruitment of the enzymes in the IMP pathway. Therefore, the different patterns of ESS similarities of two ancient pathways suggest that the recruitment of catalytic activities in the metabolism is restricted by the metabolic context and is not a random phenomenon. Although our data suggest functional and, probably, evolutionary conservation of diverse catalytic steps, additional information must be considered to have a better approximation of metabolism evolution, such as gene transfer and gene loss, among other processes. For this reason, we do not exclude the possibility of diverse genetic phenomena, such as the continuous interchange of genetic material that blurs the border between bacterial species, as has been recently described in E. coli bacterial strains, where a small proportion of universal protein families and a large proportion of specific families have been found. In this regard, the functional conservation of metabolic steps was evaluated in a representative group of species selected with a genome similarity score of 0.7, capturing the general diversity of the Gammaproteobacteria metabolism.
Therefore, the method described here is able to identify alternative enzymes involved in similar metabolic processes, and although the conclusions are restricted to the metabolism covered by Gammaproteobacteria, the method can be extended to any organism or clade for which there is metabolic information. Finally, we understand that the approach described here does not consider the effect of promiscuous enzymes, defined as those enzymes with more than two different E.C. numbers. However, previous analyses have described that around 10 % of the total enzymatic repertoire in bacterial and archaeal organisms corresponds to promiscuous enzymes [28, 44], suggesting that our results and conclusions are robust enough and are likely to be little influenced by multifunctional enzymes.
Selection of proteobacterial species
In this study we included the small-molecule metabolism of 40 different Gammaproteobacteria species. These organisms were selected from the 275 Gammaproteobacteria genomes included in the KEGG database as of June 2011. We chose non redundant genomes using previously described criteria, with a genome similarity score of 0.7, resulting in a set of 40 non redundant Gammaproteobacteria species. These organisms are representative of the division, as shown in Additional file 2: Figure S4. Additional file 4: Table S2 contains the list of the organisms included in the analysis.
Construction of ESS
In a subsequent stage, ESS were organized in a relational database. In this database, each EC number contained in a sequence was related to its corresponding protein(s), species, and metabolic map. This database has a high degree of redundancy, because an ESS may be the same in different species. Thus, a nrESS dataset was constructed by filtering identical ESS and leaving only one representative. Each ESS in the nrESS dataset is linked to the original ESS. All analyses were conducted using the nrESS dataset and referring to the original data when necessary. The ESS and nrESS data are provided as supplementary material (Additional file 5: File S1).
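The redundancy filter can be illustrated with a short sketch. It assumes, for illustration only, that each ESS is stored as a record of (organism, map identifier, tuple of three-level EC numbers); the function name, the record layout, and the organism codes in the example are hypothetical and do not reproduce the relational schema used in the study.

```python
from collections import defaultdict


def build_nress(ess_records):
    """Collapse identical ESS into one representative each.

    ess_records: iterable of (organism, map_id, ec_sequence) records.
    Returns a dict that maps each distinct EC sequence (the nrESS) to the
    list of (organism, map_id) pairs it was observed in, so every nrESS
    stays linked to its original ESS.
    """
    sources = defaultdict(list)
    for organism, map_id, ec_sequence in ess_records:
        sources[tuple(ec_sequence)].append((organism, map_id))
    return dict(sources)


# Hypothetical records: the same ESS found in two organisms.
records = [
    ("eco", "00010", ("2.7.1", "5.3.1", "2.7.1")),
    ("vch", "00010", ("2.7.1", "5.3.1", "2.7.1")),
    ("eco", "00020", ("4.2.1", "1.1.1")),
]
nress = build_nress(records)
print(len(nress))  # 2
```

Keeping the list of source records per nrESS is what later allows a conservation criterion based on the number of organisms in which a sequence occurs.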
Comparison of nrESS by pairwise alignments
In order to identify the similarity of the nrESS, we implemented a pairwise alignment algorithm based on the dynamic programming Needleman and Wunsch (NW) algorithm, as previously described. This algorithm works in a similar way to the classic tools used to align nucleotide or amino acid sequences (Additional file 6: Text S1). We used an EC number weight matrix derived from an entropy-based evaluation function that evaluated the similarities between EC numbers. The weight matrix describes the similarity between the 136 different three-level EC numbers. The number 9.9.9 was used to describe an enzyme with no EC assigned and that was similar only to itself. The similarity between two EC numbers ranged from 0 to 1. Values close to 0 indicate similar EC numbers, and values close to 1 indicate different EC numbers. This matrix takes into account the hierarchy of the EC numbers, giving a value of 1 to all the EC pairs that are different in the first level of classification regardless of whether the second or third numbers are identical. The NW algorithm then uses the matrix to construct an alignment that minimizes the global score. Finally, the alignment is evaluated by using the normalized entropy-based function. The score obtained with such an evaluation function also ranges from 0 to 1, where 0 indicates similar nrESS and 1 dissimilar nrESS. To analyze in more detail the similarities of the lower part of the glycolysis and the IMP pathways, their ESS were compared against the nrESS. Examples of nrESS alignments are shown in Additional file 2: Figure S5.
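The alignment step can be sketched as a cost-minimizing NW recursion over EC numbers. This is an illustration under stated assumptions, not the published implementation: the entropy-based weight matrix is replaced by a simple hierarchical stand-in (`ec_distance`), the gap cost of 1.0 is assumed, and the final score is the mean cost per alignment column rather than the normalized entropy function used in the study.

```python
UNASSIGNED = "9.9.9"  # enzyme without an EC number; similar only to itself


def ec_distance(a: str, b: str) -> float:
    """Stand-in for the weight matrix: 0 = identical, 1 = different class."""
    if a == b:
        return 0.0
    if UNASSIGNED in (a, b):
        return 1.0
    la, lb = a.split("."), b.split(".")
    if la[0] != lb[0]:
        return 1.0          # first level differs: maximal distance
    if la[1] != lb[1]:
        return 2.0 / 3.0    # same class, different subclass (assumed value)
    return 1.0 / 3.0        # only the third level differs (assumed value)


def align(seq_a, seq_b, gap: float = 1.0):
    """Global alignment that minimizes total cost; returns (score, columns)."""
    n, m = len(seq_a), len(seq_b)
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = cost[i - 1][0] + gap
    for j in range(1, m + 1):
        cost[0][j] = cost[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + ec_distance(seq_a[i - 1], seq_b[j - 1]),
                cost[i - 1][j] + gap,
                cost[i][j - 1] + gap,
            )
    # Trace back from the bottom-right cell to recover the aligned columns.
    columns, i, j = [], n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and cost[i][j]
                == cost[i - 1][j - 1] + ec_distance(seq_a[i - 1], seq_b[j - 1])):
            columns.append((seq_a[i - 1], seq_b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or cost[i][j] == cost[i - 1][j] + gap):
            columns.append((seq_a[i - 1], None))
            i -= 1
        else:
            columns.append((None, seq_b[j - 1]))
            j -= 1
    columns.reverse()
    score = cost[n][m] / len(columns) if columns else 0.0
    return score, columns


score, columns = align(("2.7.1", "4.1.2", "5.3.1"), ("2.7.1", "5.3.1"))
print(columns)
```

With this stand-in, identical sequences score 0 and sequences from entirely different EC classes score 1, which mirrors the direction of the scale described above.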
Statistically significant ESS alignments
To determine the statistical significance of the nrESS alignments, we compared the alignment scores of the real database against the scores from 10 different random databases. These random sequences were constructed by shuffling the EC number content of the entire database, maintaining the nrESS length and EC composition of the original sequences. Each random database was submitted to the same all-versus-all alignment approach used for the real data, and the distribution of alignment scores was summarized as the mean ± SD. The threshold considered statistically significant corresponded to a score of ≤0.3, i.e., the point with the highest dispersion of the real data relative to the mean scores of the random databases and where the loss of nrESS due to extreme dissimilarity was less than 1 %, i.e., this threshold includes 99 % of the nrESS.
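One way to build such a random database is to pool every EC number, shuffle the pool, and cut it back into sequences of the original lengths. The sketch below follows that reading of the description; the function name and the seeding scheme are illustrative assumptions.

```python
import random


def shuffle_database(nress, rng: random.Random):
    """Return a shuffled copy of the nrESS database.

    The pooled EC numbers are permuted across the whole database, so the
    global EC composition and the length of every sequence are preserved
    while the order of enzymatic steps is randomized.
    """
    pool = [ec for seq in nress for ec in seq]
    rng.shuffle(pool)
    shuffled, start = [], 0
    for seq in nress:
        shuffled.append(tuple(pool[start:start + len(seq)]))
        start += len(seq)
    return shuffled


real = [("2.7.1", "5.3.1", "2.7.1"), ("4.2.1", "1.1.1"), ("6.3.1",)]
# Ten independent random databases, one seed each (seeds are arbitrary).
random_dbs = [shuffle_database(real, random.Random(seed)) for seed in range(10)]
print(len(random_dbs))  # 10
```

Each random database would then go through the same all-against-all alignment as the real data, and the per-score counts would be summarized as mean ± SD across the ten replicates.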
Functional conservation of enzymatic steps in metabolic maps
We used the information provided by the nrESS pairwise alignments to identify the functionally conserved enzymatic steps in Gammaproteobacteria for each metabolic map. Two nrESS were considered conserved if their alignment scores were below or equal to 0.3 and if, in conjunction, they were present in more than 75 % of the organisms. This criterion was employed because we assumed that a pair of conserved ESS would be shared by at least all of the species with genomes greater than 2000 ORFs, i.e., 30 of the 40 Gammaproteobacteria organisms. From the ESS that fulfilled this criterion, we selected those that corresponded to the same metabolic map. This subset of sequences was named the Metabolic Map Functional Conserved Dataset (MMFCD). To identify the conserved ESS, the aligned identical EC numbers from each alignment were mapped in the corresponding position in KEGG metabolic maps.
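The conservation criterion can be expressed as a simple predicate. This sketch makes two assumptions that the text does not settle: "in conjunction" is read as the union of the organisms in which either nrESS occurs, and "more than 75 %" is applied as a strict inequality (which, for 40 organisms, requires at least 31 rather than 30). The function and parameter names are hypothetical.

```python
def is_conserved_pair(score, organisms_a, organisms_b,
                      n_organisms=40, max_score=0.3, min_fraction=0.75):
    """Check the two conditions for a functionally conserved nrESS pair.

    score: alignment score of the pair (0 = similar, 1 = dissimilar).
    organisms_a, organisms_b: organisms in which each nrESS occurs.
    The union of both sets is compared against min_fraction (assumption).
    """
    if score > max_score:
        return False
    present_in = set(organisms_a) | set(organisms_b)
    return len(present_in) / n_organisms > min_fraction


# Hypothetical organism identifiers org0 .. org39.
group_a = {f"org{i}" for i in range(0, 20)}
group_b = {f"org{i}" for i in range(15, 35)}
print(is_conserved_pair(0.12, group_a, group_b))  # True: 35 of 40 organisms
```

Changing the comparison to `>=` would implement the "at least 30 of the 40" reading instead.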
Clustering of similar metabolic maps
In order to identify the functional similarities among metabolic maps, we selected a subset of nrESS pairwise alignments with score values of ≤0.3. These alignments were used to construct a similarity matrix where each cell corresponded to the count of the alignments shared by each pair of metabolic maps. The rows representing the metabolic maps were normalized by the total alignments in each row. The matrix was used as input to a hierarchical clustering analysis with the program MeV4 (http://www.tm4.org/mev.html). The similarity between maps was calculated with Spearman's rank correlation, and elements were clustered with the average linkage method. A cutoff of 0.46 of the total length of the dendrogram was used to classify the metabolic maps into groups, and the result was displayed with the E.T.E. 2 Python toolkit.
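The construction of the row-normalized matrix can be sketched as follows. Only the counting and normalization steps are shown; the clustering itself was done in MeV4 and is not reproduced here. The input layout (map identifier of each aligned nrESS plus the score) and the function name are assumptions made for illustration.

```python
from collections import Counter


def map_similarity_rows(alignments, max_score=0.3):
    """Count significant alignments per pair of maps and normalize each row.

    alignments: iterable of (map_a, map_b, score) tuples.
    Returns (ordered_maps, rows), where rows[map_id] is a list of fractions
    that sums to 1 for every map with at least one significant alignment.
    """
    counts = Counter()
    maps = set()
    for map_a, map_b, score in alignments:
        maps.update((map_a, map_b))
        if score <= max_score:
            counts[(map_a, map_b)] += 1
            if map_a != map_b:
                counts[(map_b, map_a)] += 1  # keep the count matrix symmetric
    ordered = sorted(maps)
    rows = {}
    for a in ordered:
        row = [counts[(a, b)] for b in ordered]
        total = sum(row)
        rows[a] = [c / total if total else 0.0 for c in row]
    return ordered, rows


example = [
    ("00010", "00020", 0.10),
    ("00010", "00010", 0.20),
    ("00010", "00020", 0.25),
    ("00020", "00030", 0.90),  # above the threshold, ignored
]
ordered, rows = map_similarity_rows(example)
print(ordered)
print(rows["00020"])  # [1.0, 0.0, 0.0]
```

The normalized rows are the profiles that would then be compared with a rank correlation and clustered with average linkage in the tool of choice.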
We thank Georgina Hernandez-Montes and Anny Rodriguez for their critical reading of the manuscript and Rosa María Gutierrez-Rios for her constructive opinions and comments. CAP-H acknowledges the support by a PhD fellowship (209805) from CONACyT-México. We also thank the anonymous reviewers for their comments, which helped us to improve the manuscript. Support from DGAPA-UNAM (IN-209511), PAPIIT-UNAM (IN-107214) and CONACYT (155116) is gratefully acknowledged.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Caetano-Anollés G, Kim HS, Mittenthal JE. The origin of modern metabolic networks inferred from phylogenomic analysis of protein architecture. Proc Natl Acad Sci U S A. 2007;104:9358–63.
- Braakman R, Smith E. The emergence and early evolution of biological carbon-fixation. PLoS Comput Biol. 2012;8:e1002455.
- Kanehisa M, Goto S, Furumichi M, Tanabe M, Hirakawa M. KEGG for representation and analysis of molecular networks involving diseases and drugs. Nucleic Acids Res. 2010;38(Database issue):D355–60.
- Caspi R, Foerster H, Fulcher CA, Hopkinson R, Ingraham J, Kaipa P, et al. MetaCyc: a multiorganism database of metabolic pathways and enzymes. Nucleic Acids Res. 2006;34(Database issue):D511–6.
- Jeong H, Tombor B, Albert R, Oltvai ZN, Barabási AL. The large-scale organization of metabolic networks. Nature. 2000;407:651–4.
- Ravasz E, Somera AL, Mongru DA, Oltvai ZN, Barabási AL. Hierarchical organization of modularity in metabolic networks. Science. 2002;297:1551–5.
- Arita M. The metabolic world of Escherichia coli is not small. Proc Natl Acad Sci U S A. 2004;101:1543–7.
- Von Mering C, Zdobnov EM, Tsoka S, Ciccarelli FD, Pereira-Leal JB, Ouzounis CA, et al. Genome evolution reveals biochemical networks and functional modules. Proc Natl Acad Sci U S A. 2003;100:15428–33.
- Spirin V, Gelfand MS, Mironov AA, Mirny LA. A metabolic network in the evolutionary context: multiscale structure and modularity. Proc Natl Acad Sci U S A. 2006;103:8774–9.
- Hernández-Montes G, Díaz-Mejía JJ, Pérez-Rueda E, Segovia L. The hidden universal distribution of amino acid biosynthetic networks: a genomic perspective on their origins and evolution. Genome Biol. 2008;9:R95.
- Díaz-Mejía JJ, Pérez-Rueda E, Segovia L. A network perspective on the evolution of metabolism by gene duplication. Genome Biol. 2007;8:R26.
- Light S, Kraulis P. Network analysis of metabolic enzyme evolution in Escherichia coli. BMC Bioinformatics. 2004;5:15.
- Rison SCG, Thornton JM. Pathway evolution, structurally speaking. Curr Opin Struct Biol. 2002;12:374–82.
- Williams KP, Gillespie JJ, Sobral BWS, Nordberg EK, Snyder EE, Shallom JM, et al. Phylogeny of gammaproteobacteria. J Bacteriol. 2010;192:2305–14.
- Newton ILG, Woyke T, Auchtung TA, Dilly GF, Dutton RJ, Fisher MC, et al. The Calyptogena magnifica chemoautotrophic symbiont genome. Science. 2007;315:998–1000.
- Hirschler-Rea A. Isolation and characterization of spirilloid purple phototrophic bacteria forming red layers in microbial mats of Mediterranean salterns: description of Halorhodospira neutriphila sp. nov. and emendation of the genus Halorhodospira. Int J Syst Evol Microbiol. 2003;53:153–63.
- Hoeft SE, Blum JS, Stolz JF, Tabita FR, Witte B, King GM, et al. Alkalilimnicola ehrlichii sp. nov., a novel, arsenite-oxidizing haloalkaliphilic gammaproteobacterium capable of chemoautotrophic or heterotrophic growth with nitrate or oxygen as the electron acceptor. Int J Syst Evol Microbiol. 2007;57(Pt 3):504–12.
- Hara A, Syutsubo K, Harayama S. Alcanivorax which prevails in oil-contaminated seawater exhibits broad substrate specificity for alkane degradation. Environ Microbiol. 2003;5:746–53.
- Chen M, Hofestaedt R. An algorithm for linear metabolic pathway alignment. In Silico Biol. 2005;5:111–28.
- Chen M, Hofestädt R. PathAligner: metabolic pathway retrieval and alignment. Appl Bioinformatics. 2004;3:241–52.
- Chou C-H, Chang W-C, Chiu C-M, Huang C-C, Huang H-D. FMM: a web server for metabolic pathway reconstruction and comparative analysis. Nucleic Acids Res. 2009;37(Web Server issue):W129–34.
- Klein CC, Cottret L, Kielbassa J, Charles H, Gautier C, Ribeiro de Vasconcelos AT, et al. Exploration of the core metabolism of symbiotic bacteria. BMC Genomics. 2012;13:438.PubMed CentralView ArticlePubMedGoogle Scholar
- Cunchillos C, Lecointre G. Integrating the universal metabolism into a phylogenetic analysis. Mol Biol Evol. 2005;22:1–11.View ArticlePubMedGoogle Scholar
- Dandekar T, Schuster S, Snel B, Huynen M, Bork P. Pathway alignment: application to the comparative analysis of glycolytic enzymes. Biochem J. 1999;343(Pt 1):115.PubMed CentralView ArticlePubMedGoogle Scholar
- Zhang Y, Morar M, Ealick SE. Structural biology of the purine biosynthetic pathway. Cell Mol Life Sci. 2008;65:3699–724.PubMed CentralView ArticlePubMedGoogle Scholar
- Armenta-Medina D, Segovia L, Perez-Rueda E. Comparative genomics of nucleotide metabolism: a tour to the past of the three cellular domains of life. BMC Genomics. 2014;15:800.PubMed CentralView ArticlePubMedGoogle Scholar
- Caetano-Anollés G, Yafremava LS, Gee H, Caetano-Anollés D, Kim HS, Mittenthal JE. The origin and evolution of modern metabolism. Int J Biochem Cell Biol. 2009;41:285–97.View ArticlePubMedGoogle Scholar
- Martínez-Núñez MA, Poot-Hernandez AC, Rodríguez-Vázquez K, Perez-Rueda E. Increments and duplication events of enzymes and transcription factors influence metabolic and regulatory diversity in prokaryotes. PLoS One. 2013;8:e69707.PubMed CentralView ArticlePubMedGoogle Scholar
- Braakman R, Smith E. The compositional and evolutionary logic of metabolism. Phys Biol. 2013;10:011001.View ArticlePubMedGoogle Scholar
- Becerra A, Lazcano A. The role of gene duplication in the evolution of purine nucleotide salvage pathways. Orig Life Evol Biosph. 1998;28:539–53.View ArticlePubMedGoogle Scholar
- Kelley BP, Sharan R, Karp RM, Sittler T, Root DE, Stockwell BR, et al. Conserved pathways within bacteria and yeast as revealed by global protein network alignment. Proc Natl Acad Sci U S A. 2003;100:11394–9.PubMed CentralView ArticlePubMedGoogle Scholar
- Kelley BP, Yuan B, Lewitter F, Sharan R, Stockwell BR, Ideker T. PathBLAST: a tool for alignment of protein interaction networks. Nucleic Acids Res. 2004;32(Web Server issue):W83–8.PubMed CentralView ArticlePubMedGoogle Scholar
- Sharan R, Suthram S, Kelley RM, Kuhn T, McCuine S, Uetz P, et al. Conserved patterns of protein interaction in multiple species. Proc Natl Acad Sci U S A. 2005;102:1974–9.PubMed CentralView ArticlePubMedGoogle Scholar
- Ogata H, Fujibuchi W, Goto S, Kanehisa M. A heuristic graph comparison algorithm and its application to detect functionally related enzyme clusters. Nucleic Acids Res. 2000;28:4021–8.PubMed CentralView ArticlePubMedGoogle Scholar
- Pinter RY, Rokhlenko O, Yeger-Lotem E, Ziv-Ukelson M. Alignment of metabolic pathways. Bioinformatics. 2005;21:3401–8.View ArticlePubMedGoogle Scholar
- Alberich R, Llabrés M, Sánchez D, Simeoni M, Tuduri M. MP-Align: alignment of metabolic pathways. BMC Syst Biol. 2014;8:58.PubMed CentralView ArticlePubMedGoogle Scholar
- Ay F, Kahveci T, DE Crécy-Lagard V. A fast and accurate algorithm for comparative analysis of metabolic pathways. J Bioinform Comput Biol. 2009;7:389–428.View ArticlePubMedGoogle Scholar
- Wernicke S, Rasche F. Simple and fast alignment of metabolic pathways by exploiting local diversity. Bioinformatics. 2007;23:1978–85.View ArticlePubMedGoogle Scholar
- Clark C, Kalita J. A comparison of algorithms for the pairwise alignment of biological networks. Bioinformatics. 2014;30:2351–9.View ArticlePubMedGoogle Scholar
- Tohsato Y, Matsuda H, Hashimoto A. A multiple alignment algorithm for metabolic pathway analysis using enzyme hierarchy. Proc Int Conf Intell Syst Mol Biol. 2000;8:376–83.PubMedGoogle Scholar
- Ku C, Nelson-Sathi S, Roettger M, Garg S, Hazkani-Covo E, Martin WF. Endosymbiotic gene transfer from prokaryotic pangenomes: Inherited chimerism in eukaryotes. Proc Natl Acad Sci. 2015;112:10139-46.Google Scholar
- Lukjancenko O, Wassenaar TM, Ussery DW. Comparison of 61 sequenced escherichia coli genomes. Microb Ecol. 2010;60:708–20.PubMed CentralView ArticlePubMedGoogle Scholar
- Moreno-Hagelsieb G, Janga SC. Operons and the effect of genome redundancy in deciphering functional relationships using phylogenetic profiles. Proteins Struct Funct Bioinforma. 2008;70:344–52.View ArticleGoogle Scholar
- Martínez-Núñez MA, Rodríguez-Vázquez K, Pérez-Rueda E. The lifestyle of prokaryotic organisms influences the repertoire of promiscuous enzymes. Proteins Struct Funct Bioinforma. 2015:n/a–n/a.Google Scholar
- Hagberg A, Swart P, Chult D. Exploring network structure, dynamics, and function using NetworkX. In: Varoquaux G, Vaught T, Millman J, editors. Proceedings of the 7th Python in Science Conference. Pasadena, CA USA; 2008. p. 11-15.Google Scholar
- Kinser J. Python for Bioinformatics. USA: Jones & Bartlett Publishers; 2008.Google Scholar
- Huerta-Cepas J, Dopazo J, Gabaldón T. ETE: a python Environment for Tree Exploration. BMC Bioinformatics. 2010;11:24.PubMed CentralView ArticlePubMedGoogle Scholar