- Research article
- Open Access
Functional organization and its implication in evolution of the human protein-protein interaction network
© Zhao and Mooney; licensee BioMed Central Ltd. 2012
- Received: 14 October 2011
- Accepted: 24 April 2012
- Published: 24 April 2012
Based on the distinguishing properties of protein-protein interaction networks such as power-law degree distribution and modularity structure, several stochastic models for the evolution of these networks have been proposed, motivated by the idea that a validated model should reproduce the topological properties of the empirical network. However, being able to capture topological properties does not necessarily mean that a model correctly reproduces how networks emerge and evolve. More importantly, there is already evidence suggesting functional organization and significance of these networks. The current stochastic models of evolution, however, grow the network without consideration for biological function and natural selection.
To test whether protein interaction networks are functionally organized and how this organization affects their evolution, we analyzed their evolution at both the topological and functional levels. We find that the human network is functionally organized and that its function evolves with the topological properties of the network. Our analysis suggests that function most likely affects the local modularity of the network. Consistent with this, we further found that the topological unit is also the functional unit of the network.
We have demonstrated functional organization of a protein interaction network. Given our observations, we suggest that its significance should not be overlooked when studying network evolution.
- Degree Distribution
- Cluster Coefficient
- Temporal Group
- Network Distance
- Functional Evolution
Proteins physically interact with each other in physiological conditions. Individual protein interactions can be direct physical binding or membership within a multiprotein complex, and can be either permanent or transient. It is believed that the diversity of protein-protein interactions (PPI) contributes to the genetic complexity of organisms [2, 3]. Thanks to the development of high-throughput technology, a large amount of human PPI data has accumulated, which provides an opportunity to study that network systematically.
One important question to ask is, "How did the human PPI network emerge and evolve?" Given that the most significant property of the network is that the degree distribution follows a power law, several evolutionary models have been proposed to account for this attribute. These include the preferential attachment model, which asserts that a new protein is more likely to interact with well-connected nodes [5, 6], and the duplication-divergence model, which emulates gene duplication and the subsequent loss of inherited interactions [7–9]. Both models successfully reproduce the power law degree distribution. Researchers, however, found that the exponent of the degree distribution generated by the preferential attachment model is higher than that of the empirical network and, more importantly, that the preferential attachment model fails to reproduce the modularity structure that is observed in most biological networks. The duplication-divergence model, by contrast, is more biologically motivated. With proper parameters, it can reproduce both the power law degree distribution and the modularity structure (through interaction rewiring and/or homomeric duplication, that is, duplication of self-interacting nodes); hence it has received extensive attention as a better candidate mechanism [12, 13]. Going beyond pure topology, Kim recently found that proteins of similar age tend to interact with each other in yeast and proposed a new stochastic model that grows the network in a manner analogous to the growth of protein crystals in solution. The authors claimed that the new model better explains many features of PPI networks. Although current models capture an increasing number of features of empirical PPI networks, none of these stochastic models requires the intervention of natural selection to reproduce the intended topology, nor does any of them use biological function as a parameter.
On the other hand, it has been recognized that network structure is relevant for biological function [15, 16]. Many efforts have been made to find a relationship between network topology and functional and/or evolutionary properties. It has been reported that interacting proteins tend to be co-evolving, co-functional and co-expressed [19, 20]. Highly interacting nodes in the network are generally more evolutionarily conserved and tend to be essential and disease causing [4, 22]. Based on this information, the PPI network has been successfully used for predicting or prioritizing candidate genes of interest [23–26]. Despite this, systematic functional analysis of PPI networks is still lacking. Using different datasets and techniques, Yook and Pandey both found a correlation between functional roles and topological structure, indicating that PPI networks are functionally organized [27, 28]. In a separate study, by comparing changes of interaction degree in functional classes and the time of origin of proteins, as well as functional heterogeneity at the time of origin, Kunin suggested that functional evolution might be the underlying reason for the observed topological evolution of the PPI network. That study, however, did not show in detail how function evolves, nor its relationship with the evolution of network topologies. In contrast to these findings, Wang et al., by breaking down a PPI network into structural modules, found that the network is not functionally organized at the modular level and suggested that it evolves neutrally. Whether a PPI network is functionally organized, and what that organization implies for the evolution of PPI networks, therefore remains unresolved.
Because functionality is an important aspect of molecular evolution, it is important to clearly address this question in order to have a better understanding of how the PPI network evolves. In this paper, we examine the evolution of a PPI network by dividing human proteins into temporal groups using known phylogenetic information. This allowed us to track the evolutionary changes of the human PPI network at both the topological and functional levels. We show that the human PPI network is functionally organized. In addition, we find that the topological and functional evolution of the human PPI network are not independent of each other. Function affects network topology during evolution, especially local modularity. This is further supported by the finding that the topological unit is also the functional unit of the human PPI network. Based on our observations, we suggest that an extended model be developed that considers functional significance.
Topological evolution of the PPI network
Table: Properties of the PPI network for each temporal group. Columns: approximate group age (MYA); gene number in the genome; gene number in the interaction network; average interaction degree; average clustering coefficient.
Average network distances across temporal groups
In summary, tracking the topological evolution of a PPI network by dividing proteins in the network into six temporal groups showed that more ancient proteins were more highly connected to other proteins in the PPI network. We also showed the topological changes during evolution were not uniform; they accelerated during the stage of evolution from cold-blooded animals to warm-blooded animals.
Functional organization and evolution of the PPI network
Ancient genes that have not been lost are, by definition, conserved. When we calculated omega (Ka/Ks) between human and mouse orthologs (see Methods), genes from the older temporal groups (TGs) did show lower omega values, indicating stronger selective conservation (Kruskal-Wallis test, H = 2682.06; df = 5; P < 0.0001, Table 1). Evolutionary conservation is often interpreted as a sign of functional importance. Our observations, as described above, together with previous reports on the association between network properties and functional or evolutionary properties, led us to ask whether the network is functionally organized and, furthermore, whether topological evolution of the network is associated with functional evolution.
Summary of function enrichment tests
Average functional distances across temporal groups
Changes in the correlation coefficients before and after functions are controlled for
Control | Spearman correlation coefficient | Spearman correlation coefficient | Spearman correlation coefficient
Controlling Biological Process | t = 1.98, df = 29, p = 0.058 | t = 2.95, df = 29, p = 0.006 | t = -1.21, df = 29, p = 0.234
Controlling Molecular Function | t = 1.06, df = 26, p = 0.299 | t = 2.29, df = 26, p = 0.031 | t = -0.71, df = 26, p = 0.484
Altogether, we show that PPI networks are functionally organized and under progressive functional evolution. Function might also substantially contribute to local clustering and topological modularity of these networks.
The topological unit is also the functional unit
Based on the above findings, to further detect the association between functional and local topological organization of the network, we designed a new functional analysis, motivated by the concept of the clustering coefficient that measures a node's neighborhood density. We first defined a topological unit as a hub protein and all of its interacting partners in the network. We expect that members in the topological unit share a higher degree of functional similarity than do random nodes. Random networks were constructed with the same degree distribution but with randomly shuffled interaction partners. Taking into account the overall function changes for nodes from different TGs (as shown above), we tested our hypothesis by calculating a group distance, which is the average of all functional distances for partner pairs from different TGs for one single node. This approach will maximize the group distance to control the temporal effect (i.e., genes in the same temporal group are functionally close, as shown above) in the empirical network, thus making the analysis more convincing. We found that the group distances for partners of hubs (we defined a hub with minimum degree of 50, n = 678) in the empirical network were significantly smaller than the values obtained from random networks (Kruskal-Wallis test, H = 62.69; df = 1; P < 0.0001), suggesting that partners are actually more functionally similar. Varying the definition of a hub to a minimum degree of 100 (n = 309) did not change the significance (Kruskal-Wallis test, H = 218.13; df = 1; P < 0.0001).
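The two ingredients of this test can be sketched in a few lines. The following is a minimal illustration only, not the code used in the study: the function names, the edge-swap randomization scheme and the input structures (`edges`, `temporal_group`, `functional_distance`) are all assumptions made for the example. It shows one common way to build a random network with the same degree distribution, and a group distance computed as the mean functional distance over partner pairs drawn from different TGs.

```python
import random
from itertools import combinations


def shuffle_partners(edges, n_swaps, seed=0):
    """Randomize an edge list while keeping every node's degree unchanged.

    Two edges (a, b) and (c, d) are rewired to (a, d) and (c, b).
    Swaps that would create a self-loop or a duplicate edge are skipped.
    This edge-swap scheme is an assumption, not the paper's stated method.
    """
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edge_list}
    done, attempts = 0, 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if len({a, b, c, d}) < 4:
            continue  # shared endpoint or self-loop: skip
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue  # would duplicate an existing interaction
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_list


def degrees(edges):
    """Count the number of interactions per node."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


def group_distance(partners, temporal_group, functional_distance):
    """Mean functional distance over partner pairs from different TGs."""
    values = [functional_distance(p, q)
              for p, q in combinations(partners, 2)
              if temporal_group[p] != temporal_group[q]]
    return sum(values) / len(values) if values else None
```

Under these assumptions, the group distance of each hub in the empirical network would be compared with the group distance of the same hub after `shuffle_partners` has been applied.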
There are several considerations that might bias our result. The first consideration is that directly interacting proteins are functionally similar. Our PPI data show that interacting partners are more functionally similar (Kruskal-Wallis test, H = 53326.61; df = 7; P < 0.0001), which is consistent with previous reports [18, 28]. We thus repeated the analysis after excluding neighbor pairs that themselves interact. The result was still statistically significant (Kruskal-Wallis test, H = 211.65; df = 1; P < 0.0001). The second consideration is duplication of interaction partners, because gene duplication plays a major role in evolution by providing raw material for it. It has been reported that (1) most duplicated genes experience a brief period of relaxed selection early in their history, after which many of them diverge significantly or are eliminated by natural selection owing to the accumulation of deleterious mutations, and (2) only the most conserved pairs retain their interaction. Nevertheless, some duplicated genes and their inherited interactions do survive. This is the key point of the duplication-divergence model. If a hub protein X interacts with partner proteins Y and Y', and Y' is a duplicate of Y, then Y and Y' are likely to be functionally similar because of the duplication. To control for this situation, all human genes were clustered based on nucleotide sequence similarity. In brief, if gene A shows enough sequence similarity to gene B, and gene B shows enough sequence similarity to gene C, then genes A, B and C are put into one cluster even if gene A is not similar to gene C. Clusters containing two or more genes thus show evidence of historic and detectable duplications. We first used a sequence similarity threshold of e-25. We excluded all genes from clusters containing two or more genes from our interaction data and repeated the analysis.
After the correction, our results remained statistically significant (Kruskal-Wallis test, H = 201.08; df = 1; P < 0.0001). With looser thresholds such as e-10 or e-20, the results were quite similar (data not shown). Finally, since the interaction partners of a hub protein (that are not interacting with each other) are at a network distance of two, we also tested whether the functional distance between interaction partners of a hub protein is smaller than the overall functional distance between proteins in the network that are at a network distance of two. Once again the result is consistent with what we expect: members of the topological unit are more functionally condensed (Z = -68.79; P < 0.0001).
In summary, we show that the topological modularity of the network coincides with its functional modularity. The topological unit is also the functional unit of the network.
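The transitive clustering rule described above (A similar to B and B similar to C places A, B and C in one cluster) amounts to single-linkage clustering, which can be sketched with a union-find structure. This is an illustrative sketch only: the function names are hypothetical, and `similar_pairs` is assumed to already contain the gene pairs whose similarity passed the chosen threshold.

```python
def cluster_by_similarity(genes, similar_pairs):
    """Group genes into clusters linked by chains of pairwise similarity."""
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent links to the cluster representative (path halving).
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two clusters

    clusters = {}
    for g in genes:
        clusters.setdefault(find(g), set()).add(g)
    return list(clusters.values())


def genes_without_detectable_duplicates(genes, similar_pairs):
    """Keep only genes that sit alone in their cluster."""
    return {next(iter(c))
            for c in cluster_by_similarity(genes, similar_pairs)
            if len(c) == 1}
```

Genes returned by the second function are the ones that would remain in the interaction data after clusters of two or more genes are excluded.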
Current studies on the evolution of the PPI network have focused mostly on topological properties, especially the cause of the power law degree distribution. A number of network models have been proposed. Preferential attachment is widely acknowledged as a candidate mechanism for generating a power law degree distribution in many networks, including the Web, publication citations and others. When the preferential attachment model is applied to PPI network evolution, it predicts that the interaction gain of a protein in the network is related to its present connectivity. By adding a "fitness" parameter, Bianconi and Barabasi proposed an improved preferential attachment model called the "Fitness model," which gives latecomers the opportunity to compete with existing nodes. Some of our observations agree with what preferential attachment models predict. However, because this model is not very biologically relevant and cannot capture the modularity of the empirical PPI network, it cannot be used to model the evolution of the PPI network.
Compared to the preferential attachment model, the duplication-divergence model may be more promising. It is more biologically plausible, and the network it produces satisfies both the power law degree distribution and the modularity structure. However, duplication-divergence models are still derived mostly from a topological perspective. The evolution of the network necessarily rests on the evolution of the proteins in the network. Unfortunately, principles of molecular evolution are still largely ignored in the current duplication-divergence models. These models claim that both the observed degree distribution and the topological modularity of the network could be produced by gene duplication regardless of biological function and natural selection [7, 13, 30, 38]. In the real world, every surviving gene and its interactions contribute to the organism's fitness according to its functional significance. The fitness varies across specific biological functions and through stages of evolution. A gene or interaction with high fitness will survive the next round of selection, whereas genes and interactions with low fitness will be selected against. The fitness contribution of modularity is less studied because modularity does not interact with the environment directly, and it was therefore thought that it might not contribute to the fitness of an organism. However, simulation studies suggest that modularity would directly benefit fitness by providing evolvability. Modularity can also contribute to the fitness of an organism by increasing "error tolerance" through limiting the contributions of the fitness of genes in the module. We found a connection between topological modularity and functional modularity by showing that the topological unit is also the functional unit. The topological modularity appears to carry functional information and is less likely to be a pure byproduct of stochastic processes. The same appears to hold for the evolution of the overall network.
In this study, we show that the human PPI network is functionally organized and evolving. The evolution of function is consistent with the evolution of network topologies. Function might substantially contribute to the local topological modularity of a PPI network. Although functional evolution is hard to incorporate into current stochastic models, we suggest that it cannot simply be ignored when studying PPI network evolution.
Protein interaction and annotation database
Nucleotide sequences used in this study were collected from two sources: the NCBI Reference Sequence (RefSeq) database for human and mouse ftp://ftp.ncbi.nih.gov/refseq/ and the Unigene database for all other species as listed below ftp://ftp.ncbi.nih.gov/repository/UniGene/. Human protein-protein interaction data was integrated from three sources: BioGrid http://thebiogrid.org, HPRD http://www.hprd.org and REACTOME http://www.reactome.org. Functional annotations for human genes were retrieved from the PANTHER database ftp://ftp.pantherdb.org and the GO database http://geneontology.org.
Human gene temporal group construction
All human genes were classified into six temporal groups based on a nucleotide sequence similarity search using BLAST against several clades in the known evolutionary tree, with an E-value threshold set to e-20. In detail, if a human gene has a homolog in either the fruitfly, mosquito, nematode or schistosoma species with nucleotide similarity passing the threshold, it was classified into the oldest temporal group. Similarly, if a second gene has a homolog in either the pufferfish, medaka, trout or zebrafish species but not in the first clade, the assumption was made that it was introduced at this stage in the phylogenetic chain and it was therefore placed in the second temporal group. The species used for each temporal group are: TG1 (African malaria mosquito, Fruitfly, Nematode, Schistosoma and Yellow fever mosquito), TG2 (Medaka, Pufferfish, Trout and Zebrafish), TG3 (Clawed frog and Tropical frog), TG4 (Chicken), TG5 (Cattle, Dog, Pig and Sheep), and TG6 (all human genes not found in the other species). See Additional file 1, Figure S1 for more details. Since the distribution of genes among the six temporal groups would be sensitive to the threshold E-value used to allocate genes, we also tried a looser threshold of e-10 and a stricter threshold of e-30 for the classification. Our conclusions were essentially unaffected by which threshold was chosen. We thus report the results using the threshold of e-20, which is more commonly used by other studies. Results using other thresholds are provided in Additional file 1, Table S5 and S6.
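The classification rule above can be illustrated with a short sketch. It is not the pipeline used in the study: the function names and the `hits` structure (for each human gene, a list of `(clade, e_value)` pairs summarizing its best BLAST hits per clade) are assumptions made for the example. A gene is assigned to the oldest clade in which it has a hit passing the threshold, and to TG6 if it has none.

```python
# Clades ordered from oldest to youngest, following the TG definitions above.
CLADES_OLDEST_FIRST = ["TG1", "TG2", "TG3", "TG4", "TG5"]


def assign_temporal_group(clades_with_homolog):
    """Return the oldest clade containing a homolog, or TG6 if none does."""
    for tg in CLADES_OLDEST_FIRST:
        if tg in clades_with_homolog:
            return tg
    return "TG6"


def classify_genes(hits, threshold=1e-20):
    """Map each gene to a temporal group.

    `hits` is a hypothetical input: gene -> list of (clade, e_value).
    Only hits with an E-value at or below `threshold` count as homologs.
    """
    groups = {}
    for gene, gene_hits in hits.items():
        passing = {clade for clade, e_value in gene_hits if e_value <= threshold}
        groups[gene] = assign_temporal_group(passing)
    return groups
```

Rerunning `classify_genes` with `threshold=1e-10` or `threshold=1e-30` corresponds to the sensitivity check described in the text.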
Where $I_{m,n}$ and $E_{m,n}$ are the observed and all possible interactions between temporal groups $m$ and $n$ in the PPI network, respectively. $N$ is the number of proteins that are in the PPI network of a particular temporal group.
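As a rough illustration of the two quantities defined above, the sketch below counts observed and possible interactions for every pair of temporal groups. Several things here are assumptions rather than details given in the text: the function name, the input structures, the exclusion of self-interactions, and the way possible interactions are counted (N_m × N_n between two different groups, N_m(N_m − 1)/2 within one group). How the two counts are combined into a network distance is not reconstructed here.

```python
from collections import Counter


def interaction_counts(edges, temporal_group):
    """Count observed and possible interactions per pair of temporal groups.

    `temporal_group` maps each protein in the PPI network to its group.
    Returns two dicts keyed by (m, n) with m <= n.
    """
    size = Counter(temporal_group.values())  # N for each temporal group
    observed = Counter()
    for u, v in edges:
        if u == v:
            continue  # assumption: self-interactions are ignored
        key = tuple(sorted((temporal_group[u], temporal_group[v])))
        observed[key] += 1

    possible = {}
    groups = sorted(size)
    for i, m in enumerate(groups):
        for n in groups[i:]:
            if m == n:
                possible[(m, n)] = size[m] * (size[m] - 1) // 2
            else:
                possible[(m, n)] = size[m] * size[n]
    return observed, possible
```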
Coding sequences (CDS) of human and mouse were extracted from RefSeq transcripts. Orthologs between human and mouse were identified using reciprocal BLAST with a threshold of e-50. Orthologous protein pairs were aligned using ClustalW and then back-translated into a nucleotide sequence alignment. For a nucleotide sequence, Ka is defined as the number of nonsynonymous substitutions per nonsynonymous site and Ks as the number of synonymous substitutions per synonymous site. Ka/Ks (omega) is an index of the strength of selective constraint. Ka and Ks were estimated using the maximum likelihood method implemented in the codeml program under the F3×4 model of codon substitution.
Gene functional distance
We used two different methods to calculate the gene functional distance. In the direct method, genes were first represented in vector space, where each vector position denoted the presence or absence of a functional term. If a gene is annotated with a functional term, that term's position in the vector is set to 1; otherwise it is set to 0. Considering the hierarchical structure of function annotations, we used only the sub-root level annotations (the direct children of biological process/molecular function) for each transcript to avoid redundancy. Functional distance was calculated as the Mahalanobis distance between the vectors. Mahalanobis distance was used because it accounts for the dependence between annotation terms, which is reflected in the covariance matrix. We also used a two-step semantic similarity based method implemented in the R package csbl.go. In this method, the semantic similarity between each pair of GO annotation terms was first computed according to Resnik, and gene functional similarity was then measured by the maximum of the pairwise term similarities for the gene pair. The gene similarities were finally 1/2x transformed into distance-like measures.
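For reference, the standard definition of the Mahalanobis distance between two annotation vectors can be written as follows. The symbols are chosen here for illustration and do not appear in the original text: $\mathbf{x}$ and $\mathbf{y}$ are the binary annotation vectors of two genes, and $\mathbf{S}$ is the covariance matrix of the annotation terms, assumed to be estimated across all annotated genes and to be invertible.

```latex
d(\mathbf{x}, \mathbf{y}) = \sqrt{(\mathbf{x} - \mathbf{y})^{\top} \, \mathbf{S}^{-1} \, (\mathbf{x} - \mathbf{y})}
```

When $\mathbf{S}$ is the identity matrix this reduces to the Euclidean distance, so the off-diagonal entries of $\mathbf{S}$ are what encode the dependence between annotation terms.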
Using either method, or using either the "Biological Process" or the "Molecular Function" annotation, does not affect our conclusions. We thus report in this manuscript only the functional distances calculated from "Biological Process" annotations with the Mahalanobis distance method. Gene functional distance data from "Biological Process" and "Molecular Function" for both methods can be downloaded at http://www.mooneygroup.org/yiqiang/PPI_data/.
We used the Kruskal-Wallis test for comparing populations in this study. The Kruskal-Wallis test is a non-parametric one-way analysis of variance that does not assume that the data are normally distributed. It is an extension of the Mann-Whitney U test to three or more groups and is equivalent to the Mann-Whitney U test when applied to two groups. Function enrichment/overrepresentation of specific functional annotations was determined by the hypergeometric test. The z-score was used to measure whether proteins in some functional categories had significantly higher or lower network properties. The statistical significance was then assessed according to the Gaussian distribution. Considering the hierarchical structure of function annotations, we used only the sub-root level annotation (the annotation just under biological process and molecular function) for each gene in the function enrichment test. P-values were corrected by the Benjamini-Hochberg (BH) method.
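The enrichment test and the multiple-testing correction named above can be sketched with the Python standard library. This is a generic illustration of the hypergeometric upper-tail test and the Benjamini-Hochberg adjustment, not the code used in the study; the function and parameter names are assumptions.

```python
from math import comb


def hypergeom_pvalue(k, K, n, N):
    """Upper-tail hypergeometric p-value P(X >= k).

    N: genes in the background, K: background genes carrying the term,
    n: genes in the tested group, k: tested genes carrying the term.
    """
    upper = min(K, n)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In use, one p-value would be computed per functional term and temporal group, and the whole list passed through `benjamini_hochberg` before applying a significance cutoff.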
This work was supported by the National Library of Medicine (R01LM009722 (PI: Mooney), U54-CA0126540 (PI: Lithgow)), National Center for Biomedical Ontology (U54-HG004028 (PI: Musen)), Buck Trust and funds from INGEN. The Indiana Genomics Initiative (INGEN) is funded in part from a grant by endowment of Eli Lilly and Co. We would like to thank Corey Powell and Dietlind Gerloff for discussions, and Corey Powell and Laura Scearce for editing the manuscript.
- Jones S, Thornton JM: Principles of protein-protein interactions. Proc Natl Acad Sci USA. 1996, 93 (1): 13-20. 10.1073/pnas.93.1.13.
- Uetz P, Finley RL: From protein networks to biological systems. FEBS Lett. 2005, 579 (8): 1821-1827. 10.1016/j.febslet.2005.02.001.
- Xia K, Fu Z, Hou L, Han JD: Impacts of protein-protein interaction domains on organism and network complexity. Genome Res. 2008, 18 (9): 1500-1508. 10.1101/gr.068130.107.
- Jeong H, Mason SP, Barabasi AL, Oltvai ZN: Lethality and centrality in protein networks. Nature. 2001, 411 (6833): 41-42. 10.1038/35075138.
- Eisenberg E, Levanon EY: Preferential attachment in the protein network evolution. Phys Rev Lett. 2003, 91 (13): 138701.
- Barabasi AL, Albert R: Emergence of scaling in random networks. Science. 1999, 286 (5439): 509-512. 10.1126/science.286.5439.509.
- Evlampiev K, Isambert H: Conservation and topology of protein interaction networks under duplication-divergence evolution. Proc Natl Acad Sci USA. 2008, 105 (29): 9863-9868. 10.1073/pnas.0804119105.
- Gibson TA, Goldberg DS: Improving evolutionary models of protein interaction networks. Bioinformatics. 2011, 27 (3): 376-382. 10.1093/bioinformatics/btq623.
- Chung F, Lu L, Dewey TG, Galas DJ: Duplication models for biological networks. J Comput Biol. 2003, 10 (5): 677-687. 10.1089/106652703322539024.
- Aiello W, Chung F, Lu L: A random graph model for massive graphs. 2000: Acm. 2000, 171-180.
- Almaas E, Vazquez A, Barabasi AL: Scale-free networks in biology. Biological networks. 2007, 3: 1.
- Bebek G, Berenbrink P, Cooper C, Friedetzky T, Nadeau J, Sahinalp S: Improved duplication models for proteome network evolution. Systems Biology and Regulatory Genomics. 2006, 119-137.
- Hallinan J: Gene duplication and hierarchical modularity in intracellular interaction networks. Biosystems. 2004, 74 (1-3): 51-62. 10.1016/j.biosystems.2004.02.004.
- Kim WK, Marcotte EM: Age-dependent evolution of the yeast protein interaction network suggests a limited role of gene duplication and divergence. PLoS Comput Biol. 2008, 4 (11): e1000232. 10.1371/journal.pcbi.1000232.
- Qi Y, Ge H: Modularity and dynamics of cellular networks. PLoS Comput Biol. 2006, 2 (12): e174. 10.1371/journal.pcbi.0020174.
- Wuchty S, Oltvai ZN, Barabasi AL: Evolutionary conservation of motif constituents in the yeast protein interaction network. Nat Genet. 2003, 35 (2): 176-179. 10.1038/ng1242.
- Yeang CH, Haussler D: Detecting coevolution in and among protein domains. PLoS Comput Biol. 2007, 3 (11): e211. 10.1371/journal.pcbi.0030211.
- Agarwal S, Deane CM, Porter MA, Jones NS: Revisiting date and party hubs: novel approaches to role assignment in protein interaction networks. PLoS Comput Biol. 2010, 6 (6): e1000817. 10.1371/journal.pcbi.1000817.
- Han JD, Bertin N, Hao T, Goldberg DS, Berriz GF, Zhang LV, Dupuy D, Walhout AJ, Cusick ME, Roth FP: Evidence for dynamically organized modularity in the yeast protein-protein interaction network. Nature. 2004, 430 (6995): 88-93. 10.1038/nature02555.
- Lemos B, Meiklejohn CD, Hartl DL: Regulatory evolution across the protein interaction network. Nat Genet. 2004, 36 (10): 1059-1060. 10.1038/ng1427.
- Fraser HB, Hirsh AE, Steinmetz LM, Scharfe C, Feldman MW: Evolutionary rate in the protein interaction network. Science. 2002, 296 (5568): 750-752. 10.1126/science.1068696.
- Goh KI, Cusick ME, Valle D, Childs B, Vidal M, Barabasi AL: The human disease network. Proc Natl Acad Sci USA. 2007, 104 (21): 8685-8690. 10.1073/pnas.0701361104.
- Sharan R, Ulitsky I, Shamir R: Network-based prediction of protein function. Mol Syst Biol. 2007, 3: 88.
- Franke L, van Bakel H, Fokkens L, de Jong ED, Egmont-Petersen M, Wijmenga C: Reconstruction of a functional human gene network, with an application for prioritizing positional candidate genes. Am J Hum Genet. 2006, 78 (6): 1011-1025. 10.1086/504300.
- Wu X, Jiang R, Zhang MQ, Li S: Network-based global inference of human disease genes. Mol Syst Biol. 2008, 4: 189.
- Radivojac P, Peng K, Clark WT, Peters BJ, Mohan A, Boyle SM, Mooney SD: An integrated approach to inferring gene-disease associations in humans. Proteins. 2008, 72 (3): 1030-1037. 10.1002/prot.21989.
- Pandey J, Koyuturk M, Grama A: Functional characterization and topological modularity of molecular interaction networks. BMC Bioinforma. 2010, 11 (Suppl 1): S35. 10.1186/1471-2105-11-S1-S35.
- Yook SH, Oltvai ZN, Barabasi AL: Functional and topological characterization of protein interaction networks. Proteomics. 2004, 4 (4): 928-942. 10.1002/pmic.200300636.
- Kunin V, Pereira-Leal JB, Ouzounis CA: Functional evolution of the yeast protein interaction network. Mol Biol Evol. 2004, 21 (7): 1171-1176. 10.1093/molbev/msh085.
- Wang Z, Zhang J: In search of the biological significance of modular structures in protein networks. PLoS Comput Biol. 2007, 3 (6): e107. 10.1371/journal.pcbi.0030107.
- Ravasz E, Somera AL, Mongru DA, Oltvai ZN, Barabasi AL: Hierarchical organization of modularity in metabolic networks. Science. 2002, 297 (5586): 1551-1555. 10.1126/science.1073374.
- Erdos P, Renyi A: On the evolution of random graphs. Publ Math Inst Hungar Acad Sci. 1960, 5: 17-61.
- Hedges SB: The origin and evolution of model organisms. Nat Rev Genet. 2002, 3 (11): 838-849.
- Anderberg M: Cluster analysis for applications. 1973, New York: Academic.
- Lynch M, Conery JS: The evolutionary fate and consequences of duplicate genes. Science. 2000, 290 (5494): 1151-1155. 10.1126/science.290.5494.1151.
- Yu H, Luscombe NM, Lu HX, Zhu X, Xia Y, Han JD, Bertin N, Chung S, Vidal M, Gerstein M: Annotation transfer between genomes: protein-protein interologs and protein-DNA regulogs. Genome Res. 2004, 14 (6): 1107-1118. 10.1101/gr.1774904.
- Bianconi G, Barabasi AL: Competition and multiscaling in evolving networks. EPL (Europhysics Letters). 2001, 54: 436. 10.1209/epl/i2001-00260-6.
- Gibson TA, Goldberg DS: Questioning the ubiquity of neofunctionalization. PLoS Comput Biol. 2009, 5 (1): e1000252. 10.1371/journal.pcbi.1000252.
- Orr HA: Fitness and its role in evolutionary genetics. Nat Rev Genet. 2009, 10 (8): 531-539. 10.1038/nrg2603.
- Schlosser G: Modularity and the units of evolution. Theory Biosci. 2002, 121 (1): 1-80. 10.1078/1431-7613-00049.
- Schlosser G, Wagner GP: Modularity in development and evolution. 2004, University of Chicago Press.
- Pruitt KD, Tatusova T, Klimke W, Maglott DR: NCBI Reference Sequences: current status, policy and new initiatives. Nucleic Acids Res. 2009, 37 (Database): D32-36.
- Wheeler DL, Church DM, Federhen S, Lash AE, Madden TL, Pontius JU, Schuler GD, Schriml LM, Sequeira E, Tatusova TA: Database resources of the National Center for Biotechnology. Nucleic Acids Res. 2003, 31 (1): 28-33. 10.1093/nar/gkg033.
- Stark C, Breitkreutz BJ, Reguly T, Boucher L, Breitkreutz A, Tyers M: BioGRID: a general repository for interaction datasets. Nucleic Acids Res. 2006, 34 (Database): D535-539.
- Peri S, Navarro JD, Amanchy R, Kristiansen TZ, Jonnalagadda CK, Surendranath V, Niranjan V, Muthusamy B, Gandhi TK, Gronborg M: Development of human protein reference database as an initial platform for approaching systems biology in humans. Genome Res. 2003, 13 (10): 2363-2371. 10.1101/gr.1680803.
- Joshi-Tope G, Gillespie M, Vastrik I, D'Eustachio P, Schmidt E, de Bono B, Jassal B, Gopinath GR, Wu GR, Matthews L: Reactome: a knowledgebase of biological pathways. Nucleic Acids Res. 2005, 33 (Database): D428-432.
- Thomas PD, Campbell MJ, Kejariwal A, Mi H, Karlak B, Daverman R, Diemer K, Muruganujan A, Narechania A: PANTHER: a library of protein families and subfamilies indexed by function. Genome Res. 2003, 13 (9): 2129-2141. 10.1101/gr.772403.
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000, 25 (1): 25-29. 10.1038/75556.
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25 (17): 3389-3402. 10.1093/nar/25.17.3389.
- Yang Z: PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci. 1997, 13 (5): 555-556.
- Ovaska K, Laakso M, Hautaniemi S: Fast gene ontology based clustering for microarray experiments. BioData Min. 2008, 1 (1): 11. 10.1186/1756-0381-1-11.
- Resnik P: Using information content to evaluate semantic similarity in a taxonomy. arXiv preprint cmp-lg/9511007. 1995.
- Xu T, Du L, Zhou Y: Evaluation of GO-based functional similarity measures using S. cerevisiae protein interaction and expression profile data. BMC Bioinforma. 2008, 9: 472. 10.1186/1471-2105-9-472.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.