Preferential attachment in the evolution of metabolic networks
© Light et al; licensee BioMed Central Ltd. 2005
Received: 07 July 2005
Accepted: 10 November 2005
Published: 10 November 2005
Many biological networks show some characteristics of scale-free networks. Scale-free networks can evolve through preferential attachment where new nodes are preferentially attached to well connected nodes. In networks which have evolved through preferential attachment older nodes should have a higher average connectivity than younger nodes. Here we have investigated preferential attachment in the context of metabolic networks.
The connectivities of the enzymes in the metabolic network of Escherichia coli were determined and representatives for these enzymes were located in 11 eukaryotes, 17 archaea and 46 bacteria. E. coli enzymes which have representatives in eukaryotes have a higher average connectivity, while enzymes which are represented only in the prokaryotes, and especially the enzymes only present in βγ-proteobacteria, have lower connectivities than expected by chance. Interestingly, the enzymes which have been proposed as candidates for horizontal gene transfer have a higher average connectivity than the other enzymes. Furthermore, it was found that new edges are added to the highly connected enzymes at a faster rate than to enzymes with low connectivities, which is consistent with preferential attachment.
Here, we have found indications of preferential attachment in the metabolic network of E. coli. A possible biological explanation for preferential attachment growth of metabolic networks is that novel enzymes created through gene duplication maintain some of the compounds involved in the original reaction throughout their future evolution. In addition, we found that enzymes which are candidates for horizontal gene transfer have a higher average connectivity than other enzymes. This indicates that while new enzymes are attached preferentially to highly connected enzymes, these highly connected enzymes have sometimes been introduced into the E. coli genome by horizontal gene transfer. We speculate that E. coli has adjusted its metabolic network to a changing environment by replacing the relatively central enzymes with better adapted orthologs from other prokaryotic species.
Recent studies indicate that metabolic networks evolve at the local level through patchwork evolution and retrograde evolution [1–3]. Patchwork evolution, which is likely to be more important, occurs when an enzyme evolves from a broad spectrum enzyme to an enzyme with a highly specialized activity. Retrograde evolution is a process where the depletion of a substrate from the environment leads to the evolution of an enzyme which can accept a new substrate and catalyze the production of the depleted substance.
Networks with scale-free properties have been shown to evolve when two simple rules are applied: 1) The network grows by the addition of new nodes. 2) Preferential attachment: New nodes are more likely to become connected to well connected nodes in the network. While preferential attachment is often at the root of scale-freeness, a network with a power-law degree distribution might be produced through other mechanisms. Preferential attachment in the context of genetic networks may take place partly through gene duplication [13, 14]. In agreement with preferential attachment, Eisenberg and Levanon showed that the proteins which have homologs in all 3 domains of life, which are likely to be of ancient origin, have higher connectivities in the protein-protein interaction network of S. cerevisiae. In contrast, Kunin et al recently showed that the most highly connected proteins date to after the evolution of primordial eukaryotes but before the radiation of eukaryotes to Plants, Metazoa and Protista. Here, we investigate the evidence for preferential attachment and the role of horizontal gene transfer in the metabolic network evolution of E. coli.
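The two growth rules can be illustrated with a toy simulation. The following is a minimal sketch of generic preferential attachment growth, not the analysis performed in this study; the function name `grow_network` and the parameters `n_nodes`, `m` and `seed` are hypothetical, and the network is treated as undirected for simplicity.

```python
import random
from collections import Counter

def grow_network(n_nodes, m=2, seed=0):
    """Grow an undirected network by preferential attachment.

    Returns a Counter mapping node -> degree. Nodes are numbered in the
    order they were added, so low numbers are the oldest nodes.
    """
    rng = random.Random(seed)
    # Rule 0 (assumption): start from a small fully connected core of m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Every node appears in `stubs` once per edge endpoint, so a uniform
    # draw from `stubs` selects a node with probability proportional to its degree.
    stubs = [node for edge in edges for node in edge]
    for new in range(m + 1, n_nodes):          # rule 1: growth
        targets = set()
        while len(targets) < m:                # rule 2: preferential attachment
            targets.add(rng.choice(stubs))
        for t in targets:
            stubs.extend((new, t))
    return Counter(stubs)

degree = grow_network(200, m=2, seed=1)
oldest = sum(degree[i] for i in range(20)) / 20
youngest = sum(degree[i] for i in range(180, 200)) / 20
print(f"mean degree, 20 oldest nodes: {oldest:.1f}; 20 youngest: {youngest:.1f}")
```

In such a toy network the oldest nodes end up with a higher mean degree than the youngest ones, which is the age-connectivity relationship that the analysis below tests for in the metabolic network.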
Connectivity and phylogenetic group
If preferential attachment is an important mechanism in the evolution of metabolic networks, older enzymes should have a higher average connectivity (k) than younger enzymes. In order to investigate this prediction we extracted the enzymes and the reactions in E. coli from the EcoCyc and KEGG databases. The network representation of the metabolic network of E. coli was constructed using EcoCyc (see Methods). The nodes in our graph represent the enzymes (complete EC numbers) catalyzing the reactions and the edges represent one or more compounds involved in the reactions. There is an edge from enzyme E1 to enzyme E2 if E1 catalyzes a reaction where compound A is produced and then E2 uses A as substrate. There can be at most one edge in each direction between the nodes in the graph. The connectivity of a node is defined as the number of edges connecting the node to other nodes in the network.
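The graph construction described above can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the code used in the study: the input format (tuples of enzyme label, substrate set, product set), the function names, and the toy labels `E1` to `E3` are hypothetical. It also assumes that self-loops are ignored and that connectivity counts incoming and outgoing edges together.

```python
from collections import defaultdict

def build_enzyme_graph(reactions):
    """Build a directed enzyme graph.

    reactions: iterable of (enzyme, substrates, products) tuples.
    Returns a set of directed edges (producer, consumer); using a set
    guarantees at most one edge in each direction between two enzymes.
    """
    producers = defaultdict(set)   # compound -> enzymes that produce it
    consumers = defaultdict(set)   # compound -> enzymes that use it as substrate
    for enzyme, substrates, products in reactions:
        for compound in substrates:
            consumers[compound].add(enzyme)
        for compound in products:
            producers[compound].add(enzyme)
    edges = set()
    for compound, makers in producers.items():
        for e1 in makers:
            for e2 in consumers.get(compound, ()):
                if e1 != e2:       # assumption: self-loops are not counted
                    edges.add((e1, e2))
    return edges

def connectivity(edges):
    """Number of edges (incoming plus outgoing) touching each enzyme."""
    k = defaultdict(int)
    for e1, e2 in edges:
        k[e1] += 1
        k[e2] += 1
    return dict(k)

# Toy example: E1 produces A; E2 and E3 consume A; E2 produces B; E3 consumes B.
toy = [
    ("E1", {"S"}, {"A"}),
    ("E2", {"A"}, {"B"}),
    ("E3", {"A", "B"}, {"C"}),
]
toy_edges = build_enzyme_graph(toy)
print(sorted(toy_edges))        # [('E1', 'E2'), ('E1', 'E3'), ('E2', 'E3')]
print(connectivity(toy_edges))  # every enzyme has connectivity 2
```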
Description of the phylogenetic groups 1–5 and the number of E. coli enzymes in each group. For instance, an E. coli enzyme which has at least one representative in one or more eukaryotes but not in archaea is a group 2 enzyme. The fourth column contains the number of enzymes which are proposed examples of horizontal gene transfer. The phylogenetic classification is based on the phylogenetic tree in Figure 2.
Group 1: E. coli, eukaryotes and archaea
Group 2: E. coli, eukaryotes but not archaea
Group 3: E. coli and archaea
Group 4: E. coli and bacteria other than βγ-proteobacteria
In conclusion, we found that E. coli enzymes which have representatives in all domains of life, and in eukaryotes but not archaea, have a higher average connectivity in the metabolic network of E. coli than the presumably younger enzymes which only have representatives in βγ-proteobacteria. This finding lends support to one of the predictions of the mechanism of preferential attachment.
Connectivity and horizontal gene transfer
It has been suggested that the scale-free properties of biological networks may arise, at least partially, as a result of preferential attachment of new nodes to highly connected nodes through gene duplication [13, 14]. Preferential attachment by gene duplication may take place according to the following scenario. Initially, the duplicated gene has exactly the same function and position in the network as the template gene. Since many genes are connected to the hub of the network, the duplicated gene is by chance likely to be connected to the hub of the network. Subsequently, the duplicate gene may evolve towards another functionality but it could retain some of its original function. For instance, a multi-domain protein could lose one of its domains through deletion but retain the other domains and possibly part of its original functionality. In such a scenario the older proteins are more likely to be highly connected than the younger proteins.
An alternative scenario is preferential attachment by horizontal gene transfer (HGT): a new, or alternative, enzyme is introduced through HGT. The new enzyme is more likely to be retained in the metabolic repertory if it confers a new or improved function at a central, rather than peripheral, position of the metabolism, for example if it is connected to a highly connected enzyme or if it is itself highly connected. This consideration may be particularly important in the metabolism of bacteria, since some bacteria are prone to delete dispensable genes from their genomes. Arguably, connectivity is a measure which indicates the centrality and importance of an enzyme, in which case horizontally transferred genes should frequently be highly connected or be connected to highly connected enzymes. According to this scenario, horizontally transferred enzymes would be preferentially attached to highly connected enzymes and/or would preferentially replace highly connected enzymes.
Most horizontally transferred genes go through the process of amelioration, the adjustment of the transferred sequence to the base composition and codon usage of the resident genome. Therefore, most detectable HGTs have taken place relatively recently in the history of E. coli. Consequently, we can conclude that while it is true that the highly connected enzymes in the metabolic network of E. coli are often old in the sense that they are enzymes with representatives in eukaryotes, and which therefore probably originated in the last common ancestor of eukaryotes and bacteria, they are also overrepresented among the enzymes which have been introduced recently into the E. coli genome through HGT. These findings suggest that horizontally transferred genes are introduced and retained preferentially at central positions of the metabolism of E. coli.
The connectivity of essential enzymes and isozymes
Jeong et al showed that the highly connected proteins in the protein-protein interaction network of S. cerevisiae are more likely to be indispensable to the organism than less well connected proteins. We wished to study if there was a similar correlation in the metabolic network of E. coli. We used the essentiality classification from the study of Gerdes et al of E. coli under aerobic growth in nutrient-rich medium. We calculated the mean connectivity for the essential and the dispensable enzymes respectively and found that the essential enzymes do not show a higher connectivity than expected (for networks where 15 compounds have been removed). It is possible that the relatively small size of the metabolic network compared to the protein-protein interaction network is the reason a similar correlation could not be found in the metabolic network of E. coli.
The hubs are the most important nodes for the integrity of the network. If a fraction of the hubs are removed the network is likely to become fragmented into smaller components. Since these enzymes are very important for the robustness of the network, it might be suspected that the EC numbers with the highest connectivities could have more than one representative in the genome, i.e. that there are two or more isozymes representing these highly connected nodes. Isozymes in multicellular organisms are often active in different tissues, while isozymes in unicellular organisms frequently have different substrate specificities or are activated in different environments (such as aerobic or anaerobic environments). Here we designated a pair of enzymes as isozymes if they catalyze the same reaction but are encoded by different genes which are not part of the same enzyme complex.
We used ExPASy, SGD and EcoCyc to determine which enzymes in the metabolic networks of E. coli and S. cerevisiae occur as isozymes. We found 77 EC numbers that were associated with isozymes in E. coli and 97 EC numbers that were associated with isozymes in S. cerevisiae (see additional files). The mean connectivities for the isozymes and the non-isozymes were determined and the result was compared to randomized networks. We found that the isozymes do not have a noticeably higher mean connectivity than non-isozymes (for networks where 15 compounds have been removed). The result may indicate that isozymes are not necessarily crucial for the integrity of the metabolic network. In accordance with our result, it has recently been shown that the isozymes of S. cerevisiae are not overrepresented among essential enzymes.
Connectivity and function
Kunin et al showed that the functional classes in the protein-protein network of S. cerevisiae display distinctly different connectivity levels. In a similar manner we investigated whether enzymes belonging to different functional groups are characterized by distinct connectivities.
The number of E. coli enzymes belonging to 7 functional EcoCyc classes.
Amino acid metabolism
In contrast, enzymes involved in lipid and sugar metabolism are on average half as well connected as the enzymes involved in nucleotide, amino acid and energy metabolism. The group of enzymes involved in lipid metabolism is less than half the size of the second smallest functional group, and due to its small size the Z-score for this functional group is less reliable than for the other functional groups. The sugar metabolism enzymes are clearly overrepresented among the enzymes that occur in bacteria only (see Figure 5a), which was anticipated since there are many bacteria-specific enzymes involved in sugar transport.
Network growth through preferential attachment
We have investigated two predictions generated from the mechanism of preferential attachment in the evolution of the metabolic network of E. coli. First, if preferential attachment is of any significance in the evolution of the metabolic network of E. coli, the older enzymes in the network should have a higher average connectivity. We have found that E. coli enzymes which are represented in three domains of life, and in eukaryotes but not archaea, have a higher average connectivity than expected by chance. Second, another prediction generated from the hypothesis of network evolution through preferential attachment is that highly connected nodes should gain new edges at a faster rate than nodes with low connectivities. To investigate this prediction we extracted the enzymes with representatives in 3 domains of life and determined the network representing LUCA's metabolic network. In accordance with the mechanism of preferential attachment we found a positive linear correlation between connectivity in the ancient network and number of connections gained through evolution.
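The second prediction can be sketched as a simple computation on the edge list. The sketch below rests on an assumption that is not spelled out in the text: the ancient network is taken to be the subgraph induced by the enzymes with representatives in all three domains of life, and the edges gained by an ancient enzyme are its edges in the full network minus its edges in that subgraph. The function names and the toy labels are hypothetical.

```python
import math

def edges_gained(full_edges, ancient_nodes):
    """For each ancient enzyme return (ancient connectivity, edges gained).

    full_edges: iterable of (e1, e2) edges of the present-day network.
    ancient_nodes: enzymes assumed to have been present in the ancient network.
    """
    ancient_nodes = set(ancient_nodes)
    k_ancient = {n: 0 for n in ancient_nodes}
    k_full = {n: 0 for n in ancient_nodes}
    for e1, e2 in full_edges:
        both_ancient = e1 in ancient_nodes and e2 in ancient_nodes
        for n in (e1, e2):
            if n in ancient_nodes:
                k_full[n] += 1
                if both_ancient:   # edge already present in the ancient subgraph
                    k_ancient[n] += 1
    return {n: (k_ancient[n], k_full[n] - k_ancient[n]) for n in ancient_nodes}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy example: A to D are ancient enzymes; x, y, z were added later.
toy_edges = {("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"),
             ("A", "x"), ("A", "y"), ("A", "z"), ("B", "x"), ("C", "y")}
result = edges_gained(toy_edges, {"A", "B", "C", "D"})
names = sorted(result)
r = pearson([result[n][0] for n in names], [result[n][1] for n in names])
print(result["A"], result["D"], round(r, 2))
```

A positive coefficient in such a computation is what preferential attachment predicts: enzymes that were well connected in the ancient network gain new connections faster.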
Further, we found that the E. coli enzymes which are believed to have undergone horizontal gene transfer (HGT enzymes) have a higher average connectivity than other enzymes (non-HGT enzymes). This is especially true for the HGT enzymes with representatives in eukaryotes, which is the most highly connected group of E. coli enzymes. This result suggests that the highly connected enzymes are often old in the sense that they are likely to have originated in LUCA and been part of the bacterial metabolic repertory for a long time. However, these ancient enzymes are sometimes relatively recent additions to the metabolic network of E. coli. It is possible that bacteria such as E. coli are adjusting their metabolic networks to a changing environment by replacing the relatively central enzymes, with high connectivities, with better adapted orthologs from other prokaryotic species.
It is well known that many novel functions in organisms are obtained through gene duplication, followed by subfunctionalization and neofunctionalization. Therefore, a possible biological explanation for the preferential attachment growth of metabolic networks, which we have now found some support for, could be that novel enzymes, which are created through gene duplication, maintain some compounds involved in the reaction catalyzed by the original enzyme throughout their future evolution. As a supplementary explanation we propose that horizontally transferred enzymes are introduced preferentially at central positions of the metabolic network of E. coli.
Databases and representation framework
We built a representation of the metabolic network of E. coli by using EcoCyc (downloaded in March 2004) to gather the EC assigned enzymes and to determine the connectivities of the enzymes. An alternative network based on KEGG was also produced; repeating the study on this network generated similar results (results not shown). The connectivity of an enzyme is defined as the number of edges connecting the enzyme to other enzymes. Only one edge in each direction between any two enzymes was allowed. Furthermore, we used KEGG orthology (KO) assignments (downloaded in May 2004) to determine in which organisms the different EC numbers are represented.
The nodes in our graph represent the enzymes (complete EC numbers) catalyzing the reactions and the edges represent one or more compounds involved in the reactions. There is an edge from enzyme E1 to enzyme E2 if E1 catalyzes a reaction where compound A is produced and then E2 uses A as substrate. The network representation used in our study has been used before for metabolic network analysis, where it has been referred to as 'protein-centric' graphs or 'reaction graphs' (see Figure 1b). Our representation of the full metabolic network of E. coli consists of 486 nodes and 99 917 edges.
One problematic aspect of metabolic network analysis is how promiscuous compounds, such as H2O, should be handled. One may argue that the network would become more biochemically meaningful if these compounds are removed, because the promiscuous compounds are usually not limiting factors of reactions. In this study, we have chosen to apply a simple network-based criterion. We count the number of times a compound occurs as part of an edge in the network. The most common compounds were then considered as promiscuous compounds [2, 3]. We performed our studies on different networks where up to 40 compounds have been removed.
For the statistical analysis 100 000 randomized networks were generated by shuffling the group numbers while preserving the network topology. Subsequently, Z-scores were calculated. The Z-score expresses how far the average connectivity of the enzymes belonging to a certain phylogenetic group differs from the average connectivity of randomly sampled enzymes, measured in units of the random sampling distribution's standard deviation. The larger the Z-score, the less likely it is that the difference between the phylogenetic group's average and the random group's average is due to chance.
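The network-based criterion for promiscuous compounds can be sketched as below. This is a hedged illustration and not the code used in the study: it assumes each edge is stored together with the set of compounds it represents, and the names `rank_compounds`, `remove_top_compounds` and `n_remove` are hypothetical.

```python
from collections import Counter

def rank_compounds(labelled_edges):
    """Count in how many edges each compound occurs.

    labelled_edges: dict mapping (e1, e2) -> set of compounds linking e1 to e2.
    """
    counts = Counter()
    for compounds in labelled_edges.values():
        counts.update(compounds)
    return counts

def remove_top_compounds(labelled_edges, n_remove):
    """Remove the n_remove most common compounds from the network.

    An edge disappears when no compound is left to support it.
    Returns (pruned edge dict, set of removed compounds).
    """
    counts = rank_compounds(labelled_edges)
    promiscuous = {compound for compound, _ in counts.most_common(n_remove)}
    pruned = {}
    for edge, compounds in labelled_edges.items():
        kept = compounds - promiscuous
        if kept:
            pruned[edge] = kept
    return pruned, promiscuous

# Toy example: H2O occurs in three edges and is removed first.
toy = {
    ("E1", "E2"): {"H2O", "A"},
    ("E2", "E3"): {"H2O"},
    ("E1", "E3"): {"H2O", "B"},
    ("E3", "E4"): {"B"},
}
pruned, removed = remove_top_compounds(toy, n_remove=1)
print(removed)          # {'H2O'}
print(sorted(pruned))   # the edge E2 -> E3 is gone, the other three remain
```

Repeating the downstream analysis for several values of `n_remove` corresponds to performing the study on networks with different numbers of compounds removed.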
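The randomization test can be sketched as follows, assuming the connectivities and group numbers are available as dictionaries keyed by enzyme. The function name `group_z_score`, its parameters and the toy data are hypothetical, and the default number of shuffles is kept far below the 100 000 used in the study so that the example runs quickly.

```python
import random
import statistics

def group_z_score(connectivity, groups, target_group, n_shuffles=1000, seed=0):
    """Z-score of a group's mean connectivity against shuffled group labels.

    connectivity: dict enzyme -> connectivity k (the topology stays fixed).
    groups: dict enzyme -> group number.
    """
    rng = random.Random(seed)
    enzymes = sorted(connectivity)
    labels = [groups[e] for e in enzymes]

    def group_mean(current_labels):
        values = [connectivity[e]
                  for e, g in zip(enzymes, current_labels) if g == target_group]
        return statistics.mean(values)

    observed = group_mean(labels)
    random_means = []
    for _ in range(n_shuffles):
        shuffled = labels[:]
        rng.shuffle(shuffled)      # reassign group numbers, keep group sizes
        random_means.append(group_mean(shuffled))
    mu = statistics.mean(random_means)
    sigma = statistics.stdev(random_means)
    return (observed - mu) / sigma

# Toy data: group 1 enzymes are well connected, group 5 enzymes are not.
k_values = [20, 18, 16, 15, 3, 2, 2, 1, 1, 1]
toy_k = {f"e{i}": k for i, k in enumerate(k_values)}
toy_groups = {f"e{i}": (1 if i < 4 else 5) for i in range(10)}
print(group_z_score(toy_k, toy_groups, target_group=1) > 0)   # True
print(group_z_score(toy_k, toy_groups, target_group=5) < 0)   # True
```

Shuffling the labels rather than rewiring the edges keeps both the topology and the group sizes fixed, so the Z-score reflects only how the groups are distributed over the nodes.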
This work was supported by the Foundation for Strategic Research (SSF).
- Rison SC, Teichmann SA, Thornton JM: Homology, pathway distance and chromosomal localisation of the small molecule metabolism enzymes in Escherichia coli. J Mol Biol. 2002, 318: 911-932. 10.1016/S0022-2836(02)00140-7.
- Alves R, Chaleil RA, Sternberg MJ: Evolution of enzymes in metabolism: a network perspective. J Mol Biol. 2002, 320: 751-770. 10.1016/S0022-2836(02)00546-6.
- Light S, Kraulis P: Network analysis of metabolic enzyme evolution in Escherichia coli. BMC Bioinformatics. 2004, 5:
- Jensen RA: Enzyme recruitment in evolution of new function. Annu Rev Microbiol. 1976, 30: 409-425. 10.1146/annurev.mi.30.100176.002205.
- Horowitz NH: On the evolution of biochemical syntheses. Proc Natl Acad Sci U S A. 1945, 31: 153-157.
- Jeong H, Tombor B, Albert R, Oltvai ZN, Barabasi AL: The large-scale organization of metabolic networks. Nature. 2000, 407: 651-654. 10.1038/35036627.
- Wagner A, Fell DA: The small world inside large metabolic networks. Proc R Soc Lond B Biol Sci. 2001, 268: 1803-1810. 10.1098/rspb.2001.1711.
- Wuchty S: Scale-free behavior in protein domain networks. Mol Biol Evol. 2001, 18: 1694-1702.
- Arita M: The metabolic world of Escherichia coli is not small. Proc Natl Acad Sci U S A. 2004, 101: 1543-1547. 10.1073/pnas.0306458101.
- Albert R, Jeong H, Barabasi A-L: Error and attack tolerance of complex networks. Nature. 2000, 406: 378-382. 10.1038/35019019.
- Gleiss PM, Stadler PF, Wagner A, Fell DA: Relevant cycles in chemical reaction networks. Adv Complex Syst. 2001, 4: 207-226. 10.1142/S0219525901000140.
- Barabasi A-L, Albert R: Emergence of scaling in random networks. Science. 1999, 286: 509-512. 10.1126/science.286.5439.509.
- Qian J, Luscombe NM, Gerstein M: Protein family and fold occurrence in genomes: power-law behaviour and evolutionary model. J Mol Biol. 2001, 313: 673-681. 10.1006/jmbi.2001.5079.
- Barabasi AL, Oltvai ZN: Network biology: Understanding the cell's functional organization. Nat Rev Genet. 2004, 5: 101-113. 10.1038/nrg1272.
- Eisenberg E, Levanon EY: Preferential attachment in Protein Network Evolution. Phys Rev Lett. 2003, 91: 138701. 10.1103/PhysRevLett.91.138701.
- Kunin V, Pereira-Leal JB, Ouzounis CA: Functional Evolution of the Yeast Protein Interaction Network. Mol Biol Evol. 2004, 21: 1171-1176. 10.1093/molbev/msh085.
- Karp PD, Riley M, Saier M, Paulsen IT, Collado-Vides J, Paley SM, Pellegrini-Toole A, Bonavides C, Gama-Castro S: The EcoCyc Database. Nucleic Acids Res. 2002, 30: 56-58. 10.1093/nar/30.1.56.
- Kanehisa M, Goto S, Kawashima S, Nakaya A: The KEGG databases at GenomeNet. Nucleic Acids Res. 2002, 30: 42-46. 10.1093/nar/30.1.42.
- Boucher Y, Douady CJ, Papke RT, Walsh DA, Boudreau ME, Nesbo CL, Case RJ, Doolittle WF: Lateral gene transfer and the origins of prokaryotic groups. Annu Rev Genet. 2003, 37: 283-328. 10.1146/annurev.genet.37.050503.084247.
- Kunin V, Ouzounis CA: The balance of driving forces during genome evolution in prokaryotes. Genome Res. 2003, 13: 1589-1594. 10.1101/gr.1092603.
- Andersson JO, Andersson SG: Insights into the evolutionary process of genome degradation. Curr Opin Genet Dev. 1999, 9: 664-671. 10.1016/S0959-437X(99)00024-6.
- Kurland CG, Canback B, Berg OG: Horizontal gene transfer: a critical view. Proc Natl Acad Sci U S A. 2003, 100: 9658-9662. 10.1073/pnas.1632870100.
- Snel B, Bork P, Huynen MA: Genomes in flux: the evolution of archaeal and proteobacterial gene content. Genome Res. 2002, 12: 17-25. 10.1101/gr.176501.
- Lawrence JG, Ochman H: Amelioration of bacterial genomes: rates of change and exchange. J Mol Evol. 1997, 44: 383-397.
- Lawrence JG, Ochman H: Molecular archaeology of the Escherichia coli genome. Proc Natl Acad Sci U S A. 1998, 95: 9413-9417. 10.1073/pnas.95.16.9413.
- Jeong H, Mason SP, Barabasi AL, Oltvai ZN: Lethality and centrality in protein networks. Nature. 2001, 411: 41-42. 10.1038/35075138.
- Gerdes SY, Scholle MD, Campbell JW, Balazsi G, Ravasz E, Daugherty MD, Somera AL, Kyrpides NC, Anderson T, Gelfand MS, Bhattacharya A, Kapatral V, D'Souza M, Baev MV, Grechkin Y, Mseeh F, Fonstein MY, Overbeek R, Barabasi A-L, Oltvai ZN, Osterman AL: Experimental determination and system level analysis of essential genes in Escherichia coli MG1655. J Bacteriol. 2003, 185: 5673-5684. 10.1128/JB.185.19.5673-5684.2003.
- Gasteiger E, Gattiker A, Hoogland C, Ivanyi I, Appel RD, Bairoch A: ExPASy: the proteomics server for in-depth protein knowledge and analysis. Nucleic Acids Res. 2003, 31: 3784-3788. 10.1093/nar/gkg563.
- Cherry JM, Adler C, Ball C, Chervitz SA, Dwight SS, Hester ET, Jia Y, Juvik G, Roe T, Schroeder M, Weng S, Botstein D: SGD: Saccharomyces Genome Database. Nucleic Acids Res. 1998, 26: 73-79. 10.1093/nar/26.1.73.
- Papp B, Pal C, Hurst LD: Metabolic network analysis of the causes and evolution of enzyme dispensability in yeast. Nature. 2004, 429: 661-664. 10.1038/nature02636.
- Huynen MA, Dandekar T, Bork P: Variation and evolution of the citric-acid cycle: a genomic perspective. Trends Microbiol. 1999, 7: 281-291. 10.1016/S0966-842X(99)01539-5.
- Paulsen IT, Sliwinski MK, Saier MHJ: Microbial genome analyses: global comparisons of transport capabilities based on phylogenies, bioenergetics and substrate specificities. J Mol Biol. 1998, 277: 573-592. 10.1006/jmbi.1998.1609.
- Gerrard JA, Sparrow AD, Wells JA: Metabolic databases – what next?. Trends Biochem Sci. 2001, 26: 137-140. 10.1016/S0968-0004(00)01759-X.
- Ma H, Zeng AP: Reconstruction of metabolic networks from genome data and analysis of their global structure for various organisms. Bioinformatics. 2003, 19: 270-277. 10.1093/bioinformatics/19.2.270.
- Gough J: Convergent evolution of domain architectures (is rare). Bioinformatics. 2005, 21: 1464-1471. 10.1093/bioinformatics/bti204.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.