Relationship between gene duplicability and diversifiability in the topology of biochemical networks
© Guo et al.; licensee BioMed Central Ltd. 2014
Received: 29 January 2014
Accepted: 26 June 2014
Published: 8 July 2014
Selective gene duplicability, the extensive expansion of a small number of gene families, is universal. Quantitatively, the number of genes (P(K)) with K duplicates in a genome decreases precipitously as K increases, and often follows a power law (P(K) ∝ K^(-α)). Functional diversification, either neo- or sub-functionalization, is a major evolutionary route for duplicate genes.
Using three lines of genomic datasets, we studied the relationship between gene duplicability and diversifiability in the topology of biochemical networks. First, we explored the scenario where two pathways in the biochemical networks antagonize each other. Synthetic knockout of the respective genes for the two pathways rescues the phenotypic defects of each individual knockout. We identified duplicate gene pairs with sufficient divergence to represent this antagonistic relationship in the yeast S. cerevisiae. Such pairs overwhelmingly belong to large gene families, and thus tend to have high duplicability. Second, we used distances between proteins of duplicate genes in the protein interaction network as a metric of their diversification. The higher a gene’s duplicate count, the further the proteins of this gene and its duplicates drift away from one another in the networks, which is especially true for genetically antagonizing duplicate genes. Third, we computed a sequence-homology-based clustering coefficient to quantify sequence diversifiability among duplicate genes – the lower the coefficient, the more the sequences have diverged. The duplicate count (K) of a gene is negatively correlated with the clustering coefficient of its duplicates, suggesting that gene duplicability is related to the extent of sequence divergence within the duplicate gene family.
Thus, a positive correlation exists between gene diversifiability and duplicability in the context of biochemical networks – an improvement of our understanding of gene duplicability.
Biochemical networks underlie essentially all cellular functions [1, 2]. Proteins do not act alone. Instead, they connect with each other to form pathways, such as the MAP kinase cascades and the glycolysis pathway. The connections are often direct physical protein-protein interactions or enzyme-substrate relationships. They can also be indirect ones. For instance, metabolic enzymes are usually connected through a chain of biochemical reactions they catalyze, even though the enzymes may not be physically associated with each other. And pathways in turn join together to form networks, such as the signaling and the metabolic networks. It is via such networks that genomic information gives rise to cellular functions and genotypes are translated into phenotypes. Biochemical network models have thus long served effectively as platforms for analysis of high-throughput experimental data, e.g., microarray or next generation sequencing based gene expression data [3–5].
A prominent category of constituents in biochemical networks is proteins encoded by duplicate genes, also termed paralogs. Duplicate genes arose from genomic duplication events, which can be whole-genome duplication (WGD) or small-scale duplication (SSD). Genomic duplication is a major driving force of biological evolution [6–8]. Proteins of duplicate genes are thus abundant in biochemical networks. Moreover, their abundance increases along with genomic complexity, which is quantified by genome size, gene number, and abundance of spliceosomal introns and mobile genetic elements, from bacteria to unicellular eukaryotes to multicellular species. Proteins of duplicate genes function and evolve in biochemical networks [10, 11]. Duplicate gene evolution is frequently analyzed in the context of biochemical networks, such as the protein-protein interaction networks [12–14] and the metabolic networks [15, 16], as well as other biological networks [17, 18].
A critical issue is gene duplicability. This term captures the selective gene duplication pattern universally observed in sequenced genomes [19–22]. A small portion of the genes in a genome have extraordinarily high duplicate counts, while the vast majority either are singletons or have only a few duplicates. In other words, a small number of gene families are selectively expanded during the genomic evolution process. Quantitatively, this phenomenon is often described by a power-law relationship between the number of genes (P(K)) with K duplicates and the duplicate count K, P(K) ∝ K^(-α), with α a positive constant. This relationship holds regardless of which duplicate gene detection method is used; FASTA, BLAST, and protein domain based methods have all been used [19, 22–24]. Moreover, this relationship holds in bacterial, unicellular eukaryotic and multicellular genomes, and changes in the value of α can be used to quantify enrichment of duplicate genes as genomic complexity increases. We operationally define gene duplicability, as is commonly done, as the number of duplicates a gene has or the size of the gene family in a genome [25–28], although slightly different definitions also exist.
How and why did the selective gene duplicability pattern described above emerge? Two seemingly contradictory factors should contribute significantly: the opportunity to derive novel genetic materials from existing ones and the need to minimize deleterious effects of gene duplication. The first is the evolutionary advantage that genomic duplication confers to a species. A gene in the duplicated regions would have two copies. Subsequently, the pair of duplicate genes would accumulate mutations. Very often, one of the two duplicates formed a pseudo-gene, and became silenced [6, 30]. More importantly, the mutations sometimes led to functional diversification, either neo- or sub-functionalization, between the pair [7, 22, 31, 32]. This divergence can be in spatial-temporal expression patterns, interaction partners, enzymatic specificities of their proteins or subcellular locations of their proteins, etc. On the other hand, gene duplicability is limited, as postulated by the gene balance hypothesis, by the second factor – the potential detrimental effects of gene duplication due to disruption of the stoichiometric balance between protein products of duplicated and non-duplicated genes [28, 33, 34]. For instance, specific ratios among subunits are required for formation of protein complexes, which are major components of biochemical networks. Unless the genes for every subunit are all duplicated, a genomic duplication event would disrupt the balance. Rapid neo- or sub-functionalization between the two duplicates would restore the stoichiometric balance and alleviate this gene dosage constraint, thus enhancing gene duplicability. For instance, in multi-cellular genomes, enhanced functional diversification through accumulation of introns has been associated with higher duplicate gene survival rates [9, 35].
Thus, functional diversification of duplicate genes not only promotes genomic functional innovation, but also alleviates the potential deleterious effects of gene duplication. It is very likely that selective gene family expansion and enhanced diversification within the expanding families proceeded inextricably hand-in-hand. In other words, duplicate genes in larger gene families should have diverged from each other to a greater extent than those in smaller families. For the sake of consistency with the usage of “duplicability” to refer to the propensity of a gene to be duplicated (duplication rate and duplicate survivability), we use the term “diversifiability” as its sister term to refer to the propensity of duplicate genes to undergo diversification (neo- or sub-functionalization). Just as duplicability is operationally computed as the number of duplicates a gene has or the size of the duplicate gene family, diversifiability can be computed as the degree of diversification among duplicate genes. We hypothesized positive correlations between gene duplicability and diversifiability.
Testing the hypothesis requires quantifying diversifiability of duplicate genes. Three metrics were used in this study. Two of them were developed in the context of biochemical networks; one measures the extent to which duplicate genes diverge sufficiently for their proteins to participate in mutually antagonizing pathways in a network, the other the pair-wise shortest network distance among the proteins of duplicate genes. As the third metric, a protein sequence homology based clustering coefficient was used to quantify sequence divergence among duplicate genes. We report, for each of the three metrics, a positive correlation between gene duplicability and diversifiability.
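As a rough illustration of the power-law relationship P(K) ∝ K^(-α), the following Python sketch estimates α by ordinary least squares on log-transformed counts. The function name and the choice of a log-log regression are assumptions made here for illustration; the text does not state how α was fitted, and maximum-likelihood estimators are often preferred for real data.

```python
import math
from collections import Counter


def fit_power_law_exponent(duplicate_counts):
    """Estimate alpha in P(K) ∝ K^(-alpha) by least squares on log-log axes.

    duplicate_counts: iterable with one duplicate count K (K >= 1) per gene.
    Returns (alpha, histogram), where histogram maps K to P(K),
    the number of genes with exactly K duplicates.
    """
    histogram = Counter(k for k in duplicate_counts if k >= 1)
    if len(histogram) < 2:
        raise ValueError("need at least two distinct K values to fit a line")

    # log P(K) = c - alpha * log K, so alpha is minus the fitted slope.
    xs = [math.log(k) for k in histogram]
    ys = [math.log(count) for count in histogram.values()]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return -slope, dict(histogram)
```

A lower fitted α means P(K) falls off more slowly as K grows, which corresponds to a higher abundance of genes with many duplicates.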
Quantification of gene duplicability
To put it another way, we used all-against-all BLAST results of a proteome to cast the proteins into a weighted sequence homology network for that species. Nodes and edges of the network were proteins and pair-wise protein homology relationships, respectively. Edges were weighted by the strength of the homology relationship, as quantified by BLAST output parameters such as the E-value. The connectivity of a protein, a key parameter in network analysis, equals its BLAST hit count K, which, as described above, follows a power-law distribution. The network is thus scale-free. This line of analysis, to be discussed later, led us to adopt another standard network analysis parameter in this study.
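The network construction described above can be sketched in Python as follows. This is a minimal illustration, not the pipeline used in the study: the function names and the input format (tuples of two protein identifiers and an E-value) are hypothetical, and the default cutoff simply reuses the 1E-30 E-value threshold reported in the methods.

```python
from collections import defaultdict


def build_homology_network(hits, max_evalue=1e-30):
    """Build an undirected homology network from BLAST hits.

    hits: iterable of (protein_a, protein_b, evalue) tuples (assumed format).
    Returns a dict mapping each protein to {neighbor: best E-value}.
    Self-hits and hits weaker than max_evalue are ignored.
    """
    network = defaultdict(dict)
    for a, b, evalue in hits:
        if a == b or evalue > max_evalue:
            continue
        # Keep the strongest (smallest) E-value seen for the pair,
        # since A-vs-B and B-vs-A searches can report different values.
        best = min(evalue, network[a].get(b, float("inf")))
        network[a][b] = best
        network[b][a] = best
    return dict(network)


def connectivity(network):
    """Return the duplicate count K (number of neighbors) of each protein."""
    return {protein: len(neighbors) for protein, neighbors in network.items()}
```

Proteins without any qualifying hit (singletons) do not appear in the returned network, so their K of zero has to be supplied separately if it is needed.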
Our question then became whether the value of K is correlated with gene diversifiability, the extent to which these duplicate genes have diverged. Thus, the next step was to evaluate duplicate gene diversifiability, which we performed in the context of biochemical networks.
Pairs of duplicate genes that have diverged to mutual genetic antagonism tend to belong to high duplicability gene families
We looked for an approach to identify cases of high diversifiability among duplicate genes, so that we could then determine whether high diversifiability is associated with high gene duplicability, i.e., high K values. We took advantage of the observation that two proteins may participate in pathways that antagonize each other in a biochemical network. Genetically, synthetic knockout of both of their genes rescues or alleviates the phenotypic defects caused by the individual knockout of either one. Pairs of duplicate genes that exhibit this genetic antagonism relationship must have gone through a switch from their initial identical functions upon gene duplication to functional antagonism – a complete functional diversification process. Such pairs are thus perfect examples of high functional diversifiability. For instance, the S. cerevisiae Pif1 and Rrm3 DNA helicases share high sequence homology (BLAST E-value 2E-103), but they have opposite effects on ribosomal DNA replication. Pif1 enhances necessary pausing, whereas Rrm3 promotes continuous progression of the replication forks. Moreover, synthetic knockout has been systematically carried out in the yeast S. cerevisiae, making it possible to identify pairs of mutually antagonizing duplicate genes. We thus identified, as described in Materials and Methods, all such S. cerevisiae duplicate gene pairs from the SGD database. As a control for our analysis, we also identified pairs of duplicate genes that exhibit the opposite relationship – mutual genetic complementation. In such relationships, synthetic knockout of both genes causes more severe phenotypic defects than each of the two individual knockouts. The two duplicate genes in such pairs retain functional similarity, and are often functionally interchangeable. The two groups of duplicate gene pairs gave us an opportunity to determine whether high diversifiability is accompanied by high K values, and thus enhanced duplicability.
Thus, the two genes in genetically antagonizing (GA) duplicate gene pairs tend to have their proteins co-occur in BLAST hits of genes with high K values, and they themselves also tend to have higher K values. The results strongly suggest that the higher the duplicability, the higher the functional diversifiability. Genes in smaller gene families tend to be less diversified, so their functions are more likely to compensate for each other, giving rise to genetic redundancy and robustness. Genes in larger families, on the other hand, are more likely to have neo- and/or sub-functionalized further to assume different, or even antagonizing, functions. Thus, their functions are less likely to compensate for each other. Instead, they contribute to evolutionary functional innovation.
Proteins of genes from high duplicability families tend to be farther away from each other in the protein-protein interaction networks
For a more direct quantifier of functional diversifiability in the context of biochemical networks, we evaluated pair-wise network distances among proteins. Since they diverge from the same ancestor, duplicate gene pairs are expected to be more functionally related than non-duplicate pairs in the topology of biochemical networks. Prior to functional diversification, their proteins shared the same set of interaction partners. During subsequent evolutionary network re-wiring, the proteins went through various levels of functional diversification and switched interaction partners. Very often, these diverging pairs eventually lost all common interaction partners, although they are still more likely than expected by random chance to participate in the same network domains, i.e., functional modules. We tested whether protein-to-protein network distances reflect overall functional similarity among duplicate genes; that is, whether network distance could be used, besides genetic antagonism, as another quantifier of functional diversifiability to study the relationship between duplicability and functional diversifiability of genes.
Proteins of duplicate gene pairs tend to be closer to each other in protein-protein interaction networks
We tested whether this observation in S. cerevisiae remained true in the human network. As described in Materials and Methods, we downloaded human protein interaction data from the IntAct database. We collected all pairs of proteins of duplicate genes and calculated the distance in the network for each of them. Once again, we randomly picked the same number of pairs of non-paralogous proteins from the network and calculated the distance for each of them. A comparison of the two sets of network distances is shown in Figure 4b. The distribution of network distances of randomly picked non-paralogous pairs resembled a normal distribution, with a single peak at network distance 5. The network distances between paralogous proteins, on the other hand, exhibited a very different distribution. The distribution has a similar peak at network distance 5, but a significant portion of the distances shifted leftward, leading to a shoulder in the short network distance region of the distribution. Consequently, for 17.7% of duplicate gene pairs, their proteins were observed to be close to each other in the network, with a network distance of 2 or 3, whereas the percentage was only 3.9% for randomly picked non-duplicate pairs. When the data are subdivided into two groups using a boundary of 3.5, the difference between the two distributions has a p-value of 0.003 (Pearson’s χ² test).
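The comparison described above, splitting each set of distances at the 3.5 boundary and applying Pearson’s χ² test, can be sketched as follows. This is an illustrative Python version under stated assumptions: the function names are hypothetical, no continuity correction is applied (the text does not say whether one was used), and the p-value relies on the fact that a χ² variable with one degree of freedom is the square of a standard normal variable.

```python
import math


def split_at(distances, boundary=3.5):
    """Count distances below and above the boundary: (short, long)."""
    short = sum(1 for d in distances if d < boundary)
    return short, len(distances) - short


def chi_square_2x2(short_a, long_a, short_b, long_b):
    """Pearson's chi-square test on a 2x2 table, without continuity correction.

    Rows are the two groups (e.g. paralogous and random pairs); columns are
    the short-distance and long-distance counts.
    Returns (statistic, p_value) for one degree of freedom.
    Every row and column total must be non-zero.
    """
    observed = [[short_a, long_a], [short_b, long_b]]
    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]
    total = sum(row_totals)

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed[i][j] - expected) ** 2 / expected

    # Survival function of chi-square with 1 d.f.: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value
```

In use, `split_at` would be applied once to the paralogous distances and once to the random-pair distances, and the four resulting counts passed to `chi_square_2x2`.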
Proteins of genetically antagonizing (GA) duplicate gene pairs tend to have longer network distances than those of genetically complementing (GC) pairs
The result suggests that, as a reflection of their functional similarity, paralogous proteins have an overall tendency to be closer to one another in both the human and the yeast biochemical networks. However, different duplicate gene pairs might have different levels of functional similarity. Also, not all pairs of duplicate genes retain high functional similarity during genomic evolution, as evolutionary pressure often favors neo- and/or sub-functionalization. We tested whether the network distance tends to be longer between the two proteins of a highly diverged pair of duplicate genes.
Network distances between proteins of genes from high duplicability families tend to be longer
These results suggest that network distance can be used as a quantifier of duplicate gene diversifiability in the topology of biochemical networks; longer distances imply higher diversifiability. Functional diversifiability measured with this parameter correlates positively with gene duplicability. Thus, network distance and genetic relationship provide two lines of evidence that enhanced functional diversifiability went hand-in-hand with enhanced gene duplicability.
The positive correlation between duplicability and diversifiability applies to both whole-genome duplication (WGD) and small-scale duplication (SSD) duplicate genes
Respective distribution of GC and GA pairs, and their ratio, in WGD and SSD duplicate gene pairs
Thus, the positive correlation between gene duplicability and diversifiability applies to both WGD and SSD duplicate genes. We next examined whether the enhanced functional diversifiability observed among high duplicability genes is accompanied by enhanced diversification at the sequence level, as sequence is the primary determinant of protein functions.
Genes from high duplicability families tend to be more divergent at the sequence level
Where K_i is the duplicability (connectivity in the homology network) of protein i; proteins j and k represent any pair of immediate neighbors of protein i. W_ij is the weight of the edge between proteins i and j (0 < W ≤ 1), which is calculated as the negative logarithm of the pair-wise BLAST E-value normalized by the maximum E-value of each individual cluster. In short, C_i indicates the probability for two immediate neighbors of node i to form a non-zero weighted triangle together with i.
Taken together, these results from S. cerevisiae and human suggest that higher gene duplicability is accompanied by enhanced diversifiability at the sequence level, the third line of evidence that high gene duplicability and diversifiability acted hand-in-hand during selective gene family expansion in genomic evolution. Thus, this study has used three parameters, each measuring one aspect of duplicate gene diversifiability. Results from all of them support the notion that diversifiability is an important determinant of gene duplicability in evolution.
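To make the idea of a weighted clustering coefficient concrete, the Python sketch below computes one widely used variant, in which each triangle around node i contributes the geometric mean of its three edge weights and the sum is divided by the number of neighbor pairs. This particular formula is an assumption chosen for illustration and may differ from the exact definition used in this study; the function name and the nested-dictionary input are likewise hypothetical, and edge weights are assumed to be already scaled into (0, 1].

```python
from itertools import combinations


def weighted_clustering(weights, node):
    """Weighted clustering coefficient of `node` (geometric-mean variant).

    weights: symmetric nested dict, weights[u][v] in (0, 1] for each edge.
    Each pair of neighbors (j, k) that is itself connected contributes
    (w_ij * w_ik * w_jk) ** (1/3); the sum is divided by the number of
    neighbor pairs, K_i * (K_i - 1) / 2.
    Returns 0.0 when the node has fewer than two neighbors.
    """
    neighbors = list(weights.get(node, {}))
    k_i = len(neighbors)
    if k_i < 2:
        return 0.0

    total = 0.0
    for j, k in combinations(neighbors, 2):
        w_jk = weights.get(j, {}).get(k, 0.0)
        if w_jk > 0.0:  # neighbors j and k close a triangle with the node
            total += (weights[node][j] * weights[node][k] * w_jk) ** (1.0 / 3.0)
    return 2.0 * total / (k_i * (k_i - 1))
```

Under this variant a family whose members are all strongly homologous to each other scores close to 1, while a family in which many members no longer detect each other scores close to 0, matching the interpretation that a lower coefficient signals greater sequence divergence.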
Selective gene duplicability is universally observed in all sequenced genomes. Gene duplication is a major source of genetic material for functional innovation in evolution, leading to a high genomic abundance of duplicate genes [6–8]. On the other hand, in a genome, most protein-coding duplicate genes belong to a small number of gene families while genes outside these large families have few or no duplicates, i.e., a small number of gene/protein families are selectively and extensively expanded. Quantitatively, the number of genes with K duplicates (P(K)) often follows a power law, i.e., P(K) ∝ K^(-α), with α a positive constant [19, 22, 23]. As K increases, P(K) decreases precipitously. High duplicability is confined to select groups of genes. This study demonstrates, at the genomic level, that this elevated gene duplicability is associated with a higher degree of functional and sequence diversification. We use the term gene diversifiability as a sister term of gene duplicability to describe the degree of diversification among duplicate genes. In short, high gene duplicability and high gene diversifiability acted side-by-side to promote functional innovation during evolution. This conclusion is supported, as discussed below, by results from each of the three diversifiability measurement methods.
First, we took advantage of systematic genetic interaction data available for the yeast S. cerevisiae, and identified all pairs of genetically antagonizing duplicate genes as representatives of high diversifiability between duplicate genes. For instance, as mentioned earlier, the homologous DNA helicases Pif1 and Rrm3 exert opposite effects on ribosomal and mitochondrial DNA replication [36, 44], and the protein kinases FUS3 and CDC28 (E-value 9E-46) have counteractive control over cell polarization during mating. Genes in such fully diversified pairs, we found, overwhelmingly belong to large duplicate gene groups – they have higher duplicate counts (K) and thus higher duplicability. As a result, the relationship between the number of proteins (P(K)) with K duplicates and the duplicate count K among these genes deviates significantly from a power-law relationship. These results show that high diversifiability genes tend to have high duplicability as well.
Second, we examined network distances (shortest path) between proteins of duplicate genes in the protein-protein interaction network as a metric of their diversifiability. In both human and yeast, the higher a gene’s duplicate count, the further its duplicates’ protein products tend to drift away from its own protein (longer shortest path) in the networks. In yeast, proteins of genetically antagonizing duplicate gene pairs tend to have longer network distances than those of genetically complementing pairs. This further confirms that gene diversifiability is positively correlated to gene duplicability.
Third, we measured sequence divergence within duplicate gene groups, using a homology-based clustering coefficient, which decreases as sequence divergence increases. A negative correlation was observed between the duplicability K of a gene and the clustering coefficient among its duplicates. Thus, once again, gene duplicability is positively correlated with gene diversifiability.
Taken together, these results demonstrate enhanced diversifiability among genes in large duplicate gene families. Current knowledge suggests that this enhanced diversifiability played two roles in duplicate gene evolution – functional innovation and, at the same time, alleviation of the gene-dosage evolutionary constraint. As discussed in the introduction, cellular processes usually consist of the actions of multiple proteins and require specific stoichiometric ratios among the proteins. Gene duplication breaks this balance between proteins of duplicated and non-duplicated genes. Thus, without functional diversification, duplicate genes not only confer no evolutionary advantage (functional innovation), but are also potentially deleterious. Very often, one of the two copies of the gene disappears in order for the gene balance to be restored during subsequent evolution. When both survive, they must quickly neo- or sub-functionalize, both to alleviate the gene-dosage constraint and to meet the evolutionary demand for functional innovation [46, 47]. Thus, enhanced diversifiability must have accompanied selective expansion of gene families during genomic evolution. And, in addition to duplication rates, gene duplicability is also determined by survival rates of duplicate genes.
Our results are consistent with the observations by Lynch and Conery that duplicate gene survival rate increases from prokaryotes to multicellular eukaryotes. Additional avenues for functional diversification of duplicate genes became available as cellular and organismal complexity increased from prokaryotes to eukaryotes, and on to multicellular species. For instance, eukaryotic cells are compartmentalized, making subcellular localization a potential avenue for gene diversification. Multi-cellular species provide an additional layer of functional diversification, diversifying cell/tissue distribution patterns. The evolutionary pressure is to create complementary expression patterns among duplicate genes [22, 32]. Protein products of duplicate genes often do not co-exist in the same cell or subcellular location. They can preserve their biochemical specificity, e.g. interacting with the same set of proteins, without breaking the gene dosage balance. The gene dosage constraint is thus lessened, explaining the higher retention rate of duplicate genes observed in multi-cellular genomes.
The higher duplicate gene retention rates due to enhanced diversifiability in turn lead to increases in duplicate gene abundance from prokaryotes to multicellular eukaryotes. This is consistent with changes in the exponent (α) values of the power-law relationship between P(K) and K, P(K) ∝ K^(-α). The value of α is a quantifier of the abundance of duplicate protein-coding genes. Its value decreases from prokaryotes to unicellular eukaryotes, and to multicellular eukaryotes. A lower value of α indicates that P(K) decreases at a slower pace as K increases, and therefore indicates higher paralog abundance. As discussed above, the eukaryotic cellular environment is more permissive for gene duplication, allowing duplicate genes to be partitioned to different cellular compartments to bypass the dosage evolutionary constraint. Moreover, in multicellular eukaryotes, duplicate genes can potentially overcome the dosage evolutionary constraint through expression in different cell types.
The permissive environment in multicellular eukaryotes enabled extensive expansion of many gene families. On the other hand, the expansion is often species-specific, such as the explosive expansion of the receptor serine/threonine kinase family and the receptor tyrosine kinase family in plants and animals, respectively [48, 49]. Species-specific factors enhancing diversifiability and duplicability within these gene families, and how their expansion contributed to evolutionary fitness of the specific species, will be an interesting research topic.
In summary, we report three lines of evidence supporting a positive relationship between gene diversifiability and duplicability. The significance of this work can be illustrated through an analogy. Both genetic sequences and English literature are linear strings of letters [50, 51]. If the genome is the “book” of life, as it is often referred to, evolution is the “writer” of the book. The process of gene duplication and subsequent diversification is in turn intuitively analogous to the frequently used copy-paste-revise writing technique – copying and pasting text from other sources, and then revising and merging it into the current context. Gene duplication and the fate of duplicate genes have thus been fundamental topics in genomics and evolutionary biology. This study improves our understanding of this critical process in the context of biochemical networks.
Sequence data and duplicate gene identification
Yeast (S. cerevisiae) proteome sequences were downloaded from the Saccharomyces Genome Database (SGD). Human (Homo sapiens) sequences were downloaded from NCBI. In yeast, protein sequences encoded by dubious ORFs and transposable-element enclosed genes were removed. The yeast and human sequences were then used in respective all-against-all BLAST analyses to identify pairs of duplicate genes. A stringent threshold BLAST E-value of 10^-30 was used. In yeast, a total of 7,556 pairs, involving 1,945 genes, were identified. In human, 99,611 pairs, involving 13,309 genes, were identified.
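The pair identification step can be sketched in Python as below. This is a minimal illustration that assumes the all-against-all BLAST results are available in the standard 12-column tabular layout (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score); the function name is hypothetical, and treating the threshold as inclusive is an assumption.

```python
def duplicate_pairs_from_blast(lines, max_evalue=1e-30):
    """Collect unordered duplicate gene pairs from tabular BLAST output.

    lines: iterable of tab-separated BLAST result lines (12-column layout
    assumed, with the E-value in the 11th column).
    Returns (pairs, genes): a set of frozenset({a, b}) pairs and the set
    of all genes that take part in at least one pair.
    """
    pairs = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if query != subject and evalue <= max_evalue:
            # frozenset makes A-vs-B and B-vs-A count as one pair
            pairs.add(frozenset((query, subject)))
    genes = set().union(*pairs) if pairs else set()
    return pairs, genes
```

Counting `len(pairs)` and `len(genes)` on real output would give figures comparable in kind to the pair and gene totals reported above.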
Genetic antagonism (GA) and complementation (GC) data of the yeast S. cerevisiae
Synthetic knockout data were downloaded from the SGD database. We identified all genetic interactions between duplicate genes, and assigned them to the GA or the GC category based on SGD annotation. Genes were designated as antagonistic when their synthetic knockout rescues or alleviates the phenotypic defects caused by individual knockout of either one. Such genetic interactions were annotated as synthetic rescue or phenotypic suppression in the SGD dataset, and were categorized accordingly as GA in our analysis. The opposite, GC, means that synthetic knockout of two genes causes a more severe phenotypic defect than individual knockout of either one. Such interactions were annotated as synthetic lethality or phenotypic enhancement in SGD, and were categorized accordingly as GC.
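The categorization rule above amounts to a lookup on the annotation label, restricted to duplicate gene pairs. The Python sketch below illustrates it; the function names, the input tuple format and the lower-cased label spellings are assumptions made for illustration, since the exact field names in the downloaded file are not given here.

```python
# Annotation labels named in the text, lower-cased for matching.
GA_TYPES = {"synthetic rescue", "phenotypic suppression"}
GC_TYPES = {"synthetic lethality", "phenotypic enhancement"}


def classify_interaction(annotation):
    """Map an interaction annotation to 'GA', 'GC', or None if it is neither."""
    label = annotation.strip().lower()
    if label in GA_TYPES:
        return "GA"
    if label in GC_TYPES:
        return "GC"
    return None


def classify_duplicate_pairs(interactions, duplicate_pairs):
    """Sort genetic interactions between duplicate genes into GA and GC sets.

    interactions: iterable of (gene_a, gene_b, annotation) tuples (assumed).
    duplicate_pairs: set of frozenset({gene_a, gene_b}) duplicate pairs.
    Returns {'GA': set_of_pairs, 'GC': set_of_pairs}.
    """
    result = {"GA": set(), "GC": set()}
    for gene_a, gene_b, annotation in interactions:
        pair = frozenset((gene_a, gene_b))
        category = classify_interaction(annotation)
        if category is not None and pair in duplicate_pairs:
            result[category].add(pair)
    return result
```

In this sketch a pair carrying both kinds of annotation would appear in both sets; how such cases should be resolved is left open.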
Protein-protein interaction network data
Yeast protein interaction data were downloaded from the Saccharomyces Genome Database (SGD). The dataset contains 38,573 interactions. Individual proteins have up to 332 interactions. Human protein interaction data were from the IntAct database. The dataset contains 140,268 interactions. Individual proteins have up to 1,225 interactions.
Shortest path analysis
Pair-wise shortest path in the protein interaction network was calculated between proteins using the depth-first search (DFS) algorithm. The length of a shortest path is calculated as the number of proteins in the path. For instance, a length of 2 indicates the two proteins directly connect with each other, and a length of 3 indicates that there is one protein between them. In analyzing the distribution of shortest paths between paralogous pairs, an equal number of non-paralogous pairs randomly picked from the network were used as a control.
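The path-length convention above, in which the length is the number of proteins on the path, can be illustrated with the Python sketch below. Note that it uses breadth-first search rather than the depth-first search named in the text, because breadth-first search is the standard way to obtain shortest paths in an unweighted network; the function name and the adjacency-dictionary input are hypothetical.

```python
from collections import deque


def path_length_in_proteins(adjacency, source, target):
    """Shortest-path length between two proteins, counted in proteins.

    adjacency: dict mapping each protein to an iterable of its interactors.
    A length of 2 means the two proteins interact directly; 3 means one
    protein lies between them. Returns None if no path exists.
    """
    if source == target:
        return 1
    visited = {source}
    queue = deque([(source, 1)])  # (protein, proteins on the path so far)
    while queue:
        protein, n_proteins = queue.popleft()
        for neighbor in adjacency.get(protein, ()):
            if neighbor in visited:
                continue
            if neighbor == target:
                return n_proteins + 1
            visited.add(neighbor)
            queue.append((neighbor, n_proteins + 1))
    return None
```

Applying the same function to an equal number of randomly drawn non-paralogous pairs would give the control distribution described above.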
WGD and SSD data set
The whole-genome duplicate (WGD) data set was taken from the Yeast Gene Order Browser [54, 55]. Of the 548 pairs in total, 465 pairs (930 genes) still have BLAST E-values smaller than 1E-30 between their proteins and are included in our duplicate gene pair list. We used this dataset of 465 pairs for our WGD analysis. The remaining 3,762 pairs in our list were considered small-scale duplication (SSD) pairs. These 3,762 pairs involve 1,441 genes. WGD and SSD pairs share 321 genes. As expected from the way they were identified, each WGD duplicate gene shows up only once in the whole list of WGD pairs.
This work is supported by National Institutes of Health (NIH) grant 1R01LM010212 and funds from the Greehey Children’s Cancer Research Institute (GCCRI) to DW. We would like to acknowledge Thomas G. Andrew of GCCRI for proofreading the manuscript, and Yufeng Wang at University of Texas at San Antonio (UTSA) for providing constructive suggestions.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.