
Altered patterns of gene duplication and differential gene gain and loss in fungal pathogens



Duplication, followed by fixation or random loss of novel genes, contributes to genome evolution. Particular outcomes of duplication events are possibly associated with pathogenic life histories in fungi. To date, differential gene gain and loss have not been studied at genomic scales in fungal pathogens, despite this phenomenon's known importance in virulence in bacteria and viruses.


To determine if patterns of gene duplication differed between pathogens and non-pathogens, we identified gene families across nine euascomycete and two basidiomycete species. Gene family size distributions were fit to power laws to compare gene duplication trends in pathogens versus non-pathogens. Fungal phytopathogens showed globally altered patterns of gene duplication, as indicated by differences in gene family size distribution. We also identified sixteen examples of gene family expansion and five instances of gene family contraction in pathogenic lineages. Expanded gene families included those predicted to be important in melanin biosynthesis, host cell wall degradation and transport functions. Contracted families included those encoding genes involved in toxin production, genes with oxidoreductase activity, as well as subunits of the vacuolar ATPase complex. Surveys of the functional distribution of gene duplicates indicated that pathogens show enrichment for gene duplicates associated with receptor and hydrolase activities, while euascomycete pathogens appeared to have not only these differences, but also significantly more duplicates associated with regulatory and carbohydrate binding functions.


Differences in the overall levels of gene duplication in phytopathogenic species versus non-pathogenic relatives implicate gene inventory flux as an important virulence-associated process in fungi. We hypothesize that the observed patterns of gene duplicate enrichment, gene family expansion and contraction reflect adaptation within pathogenic life histories. These adaptations were likely shaped by ancient, as well as contemporary, intimate associations with monocot hosts.


Change in gene inventory in pathogenic genomes is an important evolutionary signal. Previous studies have documented the relationship between virulence and differential gene gain and/or loss in bacteria and viruses [1–8]. However, this phenomenon remains unexamined at a genomic scale in fungal pathogens.

Our exploration of patterns of differential gene gain and loss in pathogenic fungal genomes was prompted by two possibly related observations. First, gene counts in phytopathogenic euascomycete species and fungus-like plant parasites, such as species of Phytophthora, are often higher than those for the most closely related non-pathogenic genomes [9–17]. Second, some of the additional genes identified in these pathogens are predicted to have roles in secondary metabolism and managing encounters with hosts [10, 13, 18–21]. For instance, polyketide synthetases and non-ribosomal peptide synthetases are essential for toxin production, while G protein-coupled receptors and cytochrome P450s are critical for host perception and quenching infection-related oxidative stress [10, 22–26].

The differential expansion of a gene family by duplication in a particular species is termed lineage-specific gene family expansion (LSE) [22, 24, 27, 28]. Selection for virulence could induce LSE among particular gene families [10, 11, 13, 18, 22, 24, 29, 30], as well as contraction among other families [31, 32]. Size differences between the genomes of pathogenic and non-pathogenic species will depend on the relative rates of gene duplication, gene loss and horizontal transfer events. Two evolutionary trends that would result in larger genomes among pathogens are the consistent expansion of certain gene families [10, 11, 26], as well as pathogens' apparent affinity for gene acquisition through horizontal transfer [33–35]. However, it is also known that the number of genes in the genomes of opportunistic human fungal pathogens [36–40] appears to be reduced, as compared to non-pathogenic relatives, suggesting that gene loss may also be increased among some pathogens [41].

In the present study, we evaluated patterns of gene duplication in pathogens versus non-pathogens and in phylogenetically-informed paired species comparisons. We subsequently explored potential functional differences among duplicate genes in pathogens as compared to non-pathogens. In addition, we investigated trends of gene gain and loss in pathogenic fungal genomes.


Altered patterns of gene duplication among diverse fungi

Gene duplicates identified by GenomeHistory in the eleven sequenced fungal genomes were grouped into gene families via single linkage clustering procedures (Methods). Figure 1 and Table 1 give the number of gene families of size two or greater that met all minimum threshold criteria, as well as the total number of genes in gene families for each species. U. maydis, a basally branching hemibiotrophic pathogenic basidiomycete lineage, had the fewest gene families and the fewest genes in gene families, while the opportunistic euascomycete pathogen A. flavus possessed the greatest number of gene families, and also the largest number of genes in gene families.

Figure 1

Number and size of gene families larger than two for eleven fungal genomes. Shown are the numbers of gene families with two or more members (red and blue bars) and the total numbers of genes in those gene families (black bars) across the sample of genomes studied here. Duplicate genes were identified by sequence similarity using GenomeHistory [82]. Duplicate genes were used to form homology-based single-linkage cluster gene families, using the graph-theoretic application GT Miner [89].

Table 1 Summary of life history attributes for the genomes studied

We initially predicted greater proportions of duplicated genes would become fixed in pathogenic lineages as a result of increased preservation of duplicated genes by natural selection and/or higher rates of duplication. In this coevolutionary arms race scenario, continually evolving host resistance would give rise to constant selective pressure for the preservation of duplications of genes relevant to virulence [20, 42, 43]. We thus compared the distributions of gene family sizes in pathogens versus non-pathogens. The distribution of gene family sizes in a genome is thought to follow a power law distribution [44–47], and we therefore modeled the pooled set of pathogenic gene families as following this distribution. Similarly, we allowed the pooled set of non-pathogenic gene families to follow a power law distribution, with a potentially different value of the power law coefficient, a (which describes the frequency of gene families of each size; see Methods). We then applied a likelihood ratio test to examine the null hypothesis that the value of a was the same in pathogens as in non-pathogens.

Although we can reject the hypothesis of equal values of a (P < 10⁻⁸), surprisingly, in view of our initial prediction, we find that non-pathogenic fungi in fact have slightly larger gene families than do the pathogens (a = 4.14 versus a = 4.29 for pathogens). Interestingly, when the two basidiomycete species (P. chrysosporium and U. maydis) are excluded from the analysis, no overall significant differences in gene family size distribution are evident (P = 0.14). We also compared lineages that were primary pathogens (U. maydis, M. grisea, F. graminearum and S. nodorum) to their non-pathogenic relatives (P. chrysosporium, A. nidulans, N. crassa and T. reesei), finding again that the non-pathogens have larger gene families (a = 4.57 and 4.26, respectively, P < 10⁻¹⁷). Again, excluding the basidiomycete species results in no significant difference being found (P = 0.28). Finally, we examined only the Aspergillus species, comparing the opportunistic pathogens A. flavus and A. fumigatus to A. nidulans and A. oryzae. No significant differences in gene family sizes are evident in the comparison between opportunistic pathogens and their non-pathogenic relatives (P = 0.15).

The above approach is potentially flawed, because the species in question do not represent independent realizations of the same general stochastic process. Rather, the genomes are related by the phylogeny shown in Figure 2 [48, 49]. To control for this common ancestry, we performed phylogenetically independent contrasts for gene family size distributions for the six species pairs indicated in Figure 2, applying the same likelihood ratio test described above. We find evidence for larger gene families in phytopathogenic species for two paired-species comparisons (N. crassa versus M. grisea and T. reesei versus F. graminearum; P ≤ 10⁻⁵), while the other comparisons showed either the opposite trend (A. nidulans versus S. nodorum and P. chrysosporium versus U. maydis; P < 10⁻⁹) or no significant differences (A. flavus versus A. oryzae and A. fumigatus versus A. nidulans; P > 0.01). In all cases, we used a significance threshold of α = 0.008, which reflects application of a Bonferroni correction for 6 hypothesis tests. Note that a maximum likelihood fit of another potential distribution describing the probability of observing a gene family of size x in a genome, the exponential distribution, visually provides a rather poorer explanation of these data (see Additional File 1).
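As a worked check of the Bonferroni threshold, and assuming a conventional family-wise error rate of 0.05 (this starting level is an assumption; the text states only the corrected value), the per-test threshold for six tests is:

```latex
\[
\alpha_{\text{per test}} = \frac{\alpha_{\text{family}}}{m} = \frac{0.05}{6} \approx 0.0083
\]
```

This agrees with the reported α = 0.008 to the precision given.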

Figure 2

Independent phylogenetic contrasts for pathogens and their closest non-pathogenic relatives. Recent phylogenomics studies support relationships presented here [48, 49]. The distribution of gene family sizes in each genome is assumed to follow a power law, and the data were fit to this distribution by maximum likelihood. Family size versus frequency data shown here are plotted on log-log scales. Likelihood ratio tests were used to determine if pathogens (blue text) had larger gene families (blue shading), smaller gene families (red shading) or no significant difference in the distribution of gene family sizes (grey shading), as compared to their closest non-pathogenic (red text) relative. P-values indicate the significance of these tests (with the null hypothesis that the power law coefficient, a, is the same for the pathogenic and the non-pathogenic species in each paired comparison). Values for the differences in the log likelihoods (i.e., 2ΔlnL) used to infer P-values are also given.

Functional distribution of gene duplicates in pathogens versus non-pathogens

To elucidate potential functional differences in duplicated genes in pathogenic versus non-pathogenic genomes, we compared the distribution of GO terms between the two groups. We initially selected twenty-two different GO terms (Figure 3) representing functional categories that are relevant to fungal pathogenesis, as well as others related to basal metabolic processes. We compared the proportions of gene duplicates associated with a GO term for pathogens and non-pathogens.

Figure 3

Functional distribution of gene duplicates in pathogenic and non-pathogenic fungal lineages. The distribution of gene duplicates across a sample of 22 Gene Ontology (GO) terms is compared for pathogenic (blue bars) and non-pathogenic (red bars) fungal lineages. When all eleven taxa were considered, we observed significantly higher proportions of gene duplicates associated with the terms "hydrolase activity" and "receptor activity" in pathogens (*); survey of the euascomycetes indicated that gene duplicates associated with the "hydrolase activity," "carbohydrate binding," "nucleic acid binding," "regulation of transcription," and "receptor activity" terms, respectively, were enriched in pathogenic species (#).

When all eleven taxa were considered, two GO terms differed significantly in the number of gene duplicates observed for pathogens versus non-pathogens, after allowing for a 20% false discovery rate (FDR, see Methods). The "receptor activity" and "hydrolase activity" terms showed significantly greater numbers of gene duplicates in pathogenic species than in non-pathogenic lineages. When the same analysis was repeated for the nine euascomycete genomes, we identified three additional functional categories where pathogenic species had a greater than expected number of gene duplicates: "nucleic acid binding," "carbohydrate binding" and "regulation of transcription."
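The FDR procedure itself is described in Methods and not reproduced here. As a minimal sketch, assuming the widely used Benjamini-Hochberg step-up procedure (an assumption, not necessarily the authors' exact method), selecting GO terms at a 20% FDR could look like the following. The function name and the example P-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.20):
    """Return sorted indices of tests declared significant at the given FDR.

    Assumed procedure: Benjamini-Hochberg step-up. Find the largest rank k
    such that p_(k) <= k * fdr / m, then reject the k smallest P-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * fdr / m:
            k_max = rank
    return sorted(order[:k_max])


# Hypothetical P-values for five GO terms (illustrative only).
example_p = [0.001, 0.5, 0.015, 0.9, 0.04]
significant = benjamini_hochberg(example_p, fdr=0.20)
```

With a step-up rule, a term can be retained even if its own P-value exceeds its rank threshold, provided a larger P-value further down the ranking passes its threshold.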

We also compared the number of gene duplicates associated with terms in the Generic GO Slim Ontology for pathogens and non-pathogens. This survey revealed no significant distinctions between pathogens and non-pathogens for all taxa. When only euascomycete species were considered, we found that gene duplicates associated with the following six terms were overrepresented among pathogens, again controlling for an FDR of 20%: "hydrolase activity," "extracellular region," "carbohydrate metabolism," "nucleobase, nucleoside, nucleotide and nucleic acid metabolism," "carbohydrate binding" and "catalytic activity" (Table 2).

Table 2 GO terms that are over represented in euascomycete pathogens

Functional distribution of gene duplicates in species pairs

For the four species pairs that showed global differences in the magnitude of gene duplication (Figure 2), we surveyed the functional distribution of gene duplicates using the Generic GO Slim Ontology.

The rice blast fungus, M. grisea, showed higher average gene family size than its non-pathogenic relative N. crassa (Figure 2). Correspondingly, we found four GO terms that are overrepresented in pathogenic M. grisea, as compared to exclusively saprophytic N. crassa (Table 3). Further, while pathogenic F. graminearum had larger gene families than did non-pathogenic T. reesei (Figure 2), we found five overrepresented GO terms in F. graminearum and an equal number of overrepresented terms in T. reesei (Tables 3 and 4). When we compared the species pairs where the non-pathogenic taxon possessed more gene duplicates globally (A. nidulans versus S. nodorum and P. chrysosporium versus U. maydis), we found significantly more gene duplicates associated with a total of eighteen particular GO terms in the A. nidulans-S. nodorum test pair, and fourteen terms in P. chrysosporium (versus U. maydis), respectively (Figure 2; Table 3). Significant differences in paired species comparisons were determined after applying corrections for multiple tests, as above.

Table 3 GO terms that are over represented in one member of a species pair
Table 4 GO terms that are over represented or under represented in pathogenic F. graminearum versus non-pathogenic T. reesei

Expansion or contraction of gene families in pathogens

To determine whether a given gene family showed significant expansion or contraction, we employed a binomial test (see Methods). Sixteen gene families show significant expansion in pathogens, while five gene families appear to be contracted. Functional identities, as well as gene family sizes, are presented for all significantly expanded or contracted gene families in Additional files 2 and 3.
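The details of the binomial test are given in Methods and are not reproduced here, so the following is only one plausible way to set up such a test, not the authors' implementation. It assumes a null model in which each member of a family is equally likely to fall in any of the eleven genomes, six of which are pathogens, and computes one-sided tail probabilities. The function names, the null probability and the example family are all hypothetical.

```python
from math import comb


def binomial_tail_upper(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p); a small value suggests expansion."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def binomial_tail_lower(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); a small value suggests contraction."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(0, k + 1))


# Hypothetical family: 18 of its 20 members sit in pathogen genomes.
# Hypothetical null: six of the eleven genomes are pathogens, so p0 = 6/11.
p0 = 6 / 11
p_expansion = binomial_tail_upper(18, 20, p0)
p_contraction = binomial_tail_lower(18, 20, p0)
```

A more careful null would weight each genome by its total gene count rather than treating genomes as equally likely; the equal-weight choice here is purely for brevity.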

Some significantly expanded gene families of particular interest are those with predicted hydrolytic, transporter or oxidoreductase activities, as well as those involved with carbohydrate metabolism (see Additional file 2). Examples of expanded gene families in the hydrolytic functional class included those with predicted chitin deacetylase, cutinase, aminopeptidase and feruloyl esterase activities, among others. Specific examples of expanded gene families that belong to the oxidoreductase functional class included those with predicted galactose oxidase and tyrosinase activities. The transporter functional class of expanded gene families included instances of sugar porters, malic acid transporters, neutral amino acid permeases and L-fucose permeases, among others.

For gene families that were deemed significantly contracted, the main functional categories, as indicated by GO and GenBank annotations, also include hydrolases, oxidases and transporters. However, the specific biological roles of particular gene families that showed contraction in pathogens differ from those that showed significant expansion (see Additional files 2 and 3). For instance, in the oxidoreductase functional class, homologues of ordA (O-methylsterigmatocystin reductase) appear to be depleted in pathogens. In aflatoxin-producing species of Aspergillus, this gene catalyzes the last reaction in the biosynthesis of this secondary metabolite [31, 50, 51]. Other interesting examples include gene families that were predicted to be components of the vacuolar ATPase complex, in addition to those encoding glucose oxidase precursors (see Additional file 3).

Expanded gene families in species that have more duplicated genes

For the four cases where phylogenetic contrast analyses indicated significant differences in gene family sizes in paired species comparisons (Figure 2), gene family expansion was also examined using an approach similar to that above (see Methods).

Five gene families are significantly expanded in M. grisea, relative to non-pathogenic N. crassa. These gene families include those predicted to have oxidoreductase, transporter, peroxidase and melanin biosynthesis functions (see Additional file 3). Four gene families are significantly expanded in F. graminearum versus T. reesei, including those predicted to have transporter, endoglucanase and methyl transferase activities (see Additional file 4). Interestingly, the significantly expanded gene family with methyl transferase functional attributes has substantial homology to LaeA, a gene that may play a global regulatory role in secondary metabolism in Aspergillus species [51, 52]. There are two significantly expanded gene families in A. nidulans, relative to S. nodorum (see Additional file 4). One family is predicted to have phosphorylase activity, while the other encodes genes with ATP-binding cassette (ABC) transporter function (see Additional file 4). P. chrysosporium has eight significantly expanded gene families, relative to pathogenic U. maydis. These eight families include genes with predicted oxidoreductase and peptidase functions, as well as genes with roles in carbohydrate metabolism (see Additional file 4).


Change in gene inventory and fungal genome evolution

Differential gene gain and loss clearly play definitive roles both in degree of virulence and in determining host range [2, 5–8, 34, 53, 54]. Although there are numerous examples of gene family expansion in pathogenic lineages of fungi [10, 22, 24, 26, 29, 31, 38, 53], to date, gene duplication trends across whole genomes have not been analyzed.

Our analysis sought to understand whether the sum of gene family expansions has given rise to an overall increase in numbers of gene duplicates in the genomes of pathogenic fungi. We observed no such trend when all pathogenic species were compared to the non-pathogens, nor when we performed phylogenetically-informed paired species comparisons. Instead, the paired-species comparisons showed no clear association between gene family size distributions and pathogenicity. An example of this lack of association is apparent in the Aspergillus species comparisons, where we found no significant differences in gene family size distributions, a result that likely derives from both the opportunistic nature of the pathogens and the close phylogenetic relationships of the species in question [16, 17, 37, 40]. Another potential explanation for similarity among the complement of gene duplicates in genomes of the Aspergilli is the recent suggestion that host animals, and humans in particular, do not generate the sort of co-evolutionary arms-race characteristic of the plant hosts of the other pathogenic species considered [55].

A confounding factor for our analyses is that gene duplication's ability to drive biological innovation is diminished in some fungi by the process of repeat-induced point mutation (RIP) [9, 56–59]. RIP is a pre-meiotic homology-based mechanism that introduces characteristic point mutations into sequences present in multiple copies in a genome, and may have evolved to limit the deleterious effects of mobile genetic elements [56]. All euascomycete genomes surveyed here likely possess some degree of RIP [9–12, 15, 36, 56–58]. However, the severity and efficiency of the process varies, with N. crassa possibly possessing the most stringent form [56]. We note that N. crassa shows the smallest average gene family size among the species studied, as would be expected, given RIP's severe pruning of duplicates in this organism. There is no evidence for RIP in either of the basidiomycete genomes examined in this study, although it has recently been documented in anther smut [18, 60, 61]. Some of the differences illustrated in Figure 2 (particularly the differences between M. grisea and N. crassa) are possibly due to variation in RIP stringency rather than pathogenicity.

Although quite variable, the approximate divergence times among the six pathogenic and five non-pathogenic species examined are reasonable for large-scale comparative studies of gene duplication in euascomycete and basidiomycete lineages [36, 62, 63].

Functional patterns of duplicate gene enrichment in fungal genomes

Enrichment of gene duplicates over particular functional categories in the set of twenty-two GO terms, as well as those in the Generic GO Slim Ontology, appears consistent with requirements of pathogenic lifestyles [26, 64–67]. These results parallel those of previous studies, which have demonstrated that fungal pathogenesis is associated with increased catalytic potential among enzymes such as hydrolases, relatively larger repertoires of receptors, and the expansion of secreted proteins, as well as carbohydrate recognition and binding gene families [10, 11, 18, 25, 26, 29, 67].

For example, we found an increased number of duplicates associated with transport functions in F. graminearum, the causal agent of head blight in cereals, as compared to non-pathogenic T. reesei (Tables 3 and 4), a result consistent with the known relative deficit of carbohydrate catalysis genes in T. reesei [12].

Some of the differences observed between non-pathogenic P. chrysosporium and the hemibiotrophic corn smut fungus U. maydis appear relevant to the markedly divergent ecology and development of these basidiomycete species [18, 60]. The over-represented terms "carbohydrate binding," "extracellular region," "response to abiotic stimulus," "multicellular organismal development," "peptidase activity," as well as "amino acid and derivative metabolic process" are associated with the suite of traits that make the white rot fungus, P. chrysosporium, amenable to industrial applications, such as lignocellulose and organopollutant degradation (Table 3). Moreover, the observed enrichment of gene duplicates dedicated to particular developmental programs, such as multicellularity, is also consistent with the differences in morphological complexity between these two organisms in their fruiting body structures; U. maydis produces no fruiting body, per se, only teliospore-filled tumors on a host, whereas P. chrysosporium possesses a resupinate fleshy fruiting body. That U. maydis showed no enrichment for gene duplicates for any GO term was consistent with recent analyses of this genome, which indicated relatively few duplicates [18].

When phylogenetic relationships are not considered, a cohort of GO terms that is common to pathogens and non-pathogens in all four paired-species comparisons becomes evident (see Table 3). These three terms are: electron transport, generation of precursor metabolites and energy, as well as catalytic activity. Biological explanations for this observation include two possibilities that are not mutually exclusive: first, that genes associated with these three terms are not peculiar to pathogenesis, and/or secondly, that genes associated with these terms possess inherently greater plasticity in copy number in fungal genomes. Given numerous accounts of duplication in pathogenic and non-pathogenic fungi, copy number plasticity in certain functional categories would seem the more plausible explanation.

Our results, both from comparisons across all taxa and from paired species comparisons, suggest an association between organisms' lytic potential ("hydrolase activity") and receptor ("receptor activity") repertoires and pathogenesis (Figure 3; Tables 2 and 3). Moreover, even when a pathogen had fewer gene duplicates overall (A. nidulans versus S. nodorum), enrichment of terms associated with lytic activities ("peptidase activity," "catabolic process"), management of oxygen toxicity ("antioxidant activity"), as well as terms possibly relevant to protein secretion ("extracellular region") was still evident in the pathogenic lineage (Table 3).

Gene family expansion or contraction in pathogens

Gene families that were expanded in pathogens largely fall into two functional categories: those encoding lytic enzymes and those encoding putative transporters (see Additional file 2). Lytic enzymes, such as feruloyl esterases, cutinases, aminopeptidases and endoglucanases are known to be significant in successful plant parasitism interactions, as such genes have demonstrated roles in plant cell wall decomposition [26, 30, 64]. Serine proteinases and the regulatory P domain of the subtilisin-like proprotein convertases have been implicated in mutualistic, as well as pathogenic, interactions with grasses [26, 68, 69]. Moreover, the importance of fungal chitin deacetylases in entomopathology has been empirically demonstrated [70]. Expanded families that were predicted to have diverse transporter functions included genes that encode L-fucose and neutral amino acid permeases, as well as MFS, malic acid and sugar transporters, respectively (see Additional file 2). L-fucose permeases have not only been implicated in galactose transport and sphingolipid metabolism, but also in development of resistant sclerotia in A. flavus, while neutral amino acid permeases and MFS transporters are possibly involved in efflux of non-ribosomal peptides and other secondary metabolites required for virulence [71, 72]. Permeases may also be critical for the initial assault upon a host, as well as assimilation of host-derived carbohydrates [73, 74]. The remaining expanded families included those encoding tyrosinases, galactose oxidase precursors and transmembrane receptors, all of which have been implicated in pathogenic life styles [26, 66, 75]. Tyrosinases catalyze initial steps in melanin biosynthesis [76]. Melanization in fungi is frequently, if not always, essential for virulence [26, 65, 66]. It is notable that another gene in the melanin synthesis pathway was also found to be over duplicated in M. grisea, relative to non-pathogenic N. crassa [77]. Galactose oxidation yields peroxide [78], a reactive oxygen species known to be produced by fungal pathogens during infection [19, 79]. Recent work in M. grisea described a phalanx of receptors that are linked to pathogenesis and host perception [29].

The five families that showed gene losses in pathogens included those with predicted roles in secondary metabolism (see Additional file 3). One intriguing case is that of O-methylsterigmatocystin oxidoreductase, which catalyzes the last step in aflatoxin biosynthesis. While only a few species of Aspergillus are known to produce aflatoxin, numerous euascomycete and basidiomycete lineages possess homologues of this gene, as well as others in the aflatoxin biosynthesis pathway [32, 51]. The wide phylogenetic distribution of these genes suggests not only that these genes are ancient, but also that they are possibly important in secondary metabolism. We speculate that contraction of this family in pathogenic lineages may be the result of random gene deletions after relaxation of selective pressure that maintains production of this toxin [31, 32, 80, 81]. Recent genomic and population studies that focused on the aflatoxin biosynthesis cluster in species of Aspergillus would seem to support this conjecture, as ordA, an O-methylsterigmatocystin oxidoreductase, appears to have undergone more extensive loss following duplication than any other gene in the cluster [22, 31].

Gene family expansion in species pairs

Gene families that were expanded in two paired-species comparisons also had either expected or previously demonstrated roles in virulence, host-pathogen interaction or chemical defense [22, 26, 66, 67]. For example, homologues of laeA, a methyltransferase, are over duplicated in F. graminearum. This gene appears to be a global regulator of secondary metabolism in species of Aspergillus [51, 52]. It is interesting to speculate whether an expanded repertoire of upstream regulators of secondary metabolism in F. graminearum could reflect either greater numbers of target genes or refined regulatory control of secondary metabolic networks in this primary pathogen (see Additional file 4). In the second instance where a phytopathogen had overall greater levels of gene duplication, five gene families were expanded. Gene families encoding diverse cytochrome P450s, MFS transporters, as well as those predicted to have trihydroxynaphthalene reductase activity were expanded in M. grisea, as compared to exclusively saprophytic N. crassa (Figure 2; see Additional file 4).

The white rot fungus P. chrysosporium had eight gene families that were significantly expanded, relative to basal basidiomycete U. maydis (see Additional file 4). These eight expanded families encode genes that have diverse oxidoreductase activities, protease and endo-1,3(4)-beta-glucanase functions, consistent with genes previously reported to be present in high copy numbers in this genome [60]. A large complement of such genes in this organism's genome is likely a reflection of its saprotrophic wood-decay ecology.


Here, we present results on patterns of gene duplication and differential gene gain and loss in pathogenic fungal genomes. The scope of this study captures at least one billion years of fungal evolution [62]. No general relationship between pathogenicity and the magnitude of gene duplication was evident, but differences in duplicate gene retention among certain functional classes were consistent with known and predicted requirements of virulent lifestyles [26, 66, 67, 75]. Gene family expansion in species with overall higher levels of duplication also showed functional trends that could be reconciled with organismal life histories [9–12, 15, 18, 36, 37, 55, 60, 66, 75]. The observed differences in overall levels of duplication between phytopathogenic lineages and their non-pathogenic relatives implicate gene inventory flux as an important virulence-associated process in fungi.


Identifying gene duplicates within and among fully sequenced genomes

Gene duplicates in nine euascomycete genomes and two basidiomycete genomes (Table 1) were identified using a customized version of GenomeHistory (GH), which was parallelized for implementation in a high performance computing environment [82]. Each of these genomes is publicly available [83–86]. Our GH analyses required that candidate duplicate genes have at least 40% amino acid identity and a BLAST-based significance threshold of E ≤ 10⁻⁸.

Forming sequence homology-based single linkage clustering gene families

Candidate gene duplicates obtained in the above GH analyses that had BLAST hits with query and subject coverage of at least 40%, a minimum percent identity of 25% and significance threshold of E ≤ -3 to Repbase V11.05 sequences were excluded from further analysis [87, 88]. Homology-based single linkage clustering (SLC) gene families were formed using standard graph theoretic approaches and network graphing algorithms within the GT Miner software package [89–91]. Our approach required that a gene family have at least two members. Candidate gene duplicate pairs obtained in the above GH analyses were also filtered using minimum coverage thresholds, where query and subject sequences were required to show at least 70% coverage, prior to forming single-linkage clustering gene families. Single linkage gene families were computed across all eleven genomes (Figure 1).
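Single linkage clustering of duplicate pairs amounts to finding the connected components of a graph whose nodes are genes and whose edges are accepted duplicate pairs. The sketch below illustrates the idea with a union-find structure; it is not GT Miner's implementation, and the function name and gene identifiers are hypothetical.

```python
from collections import defaultdict


def single_linkage_families(pairs, min_size=2):
    """Group genes into families: two genes share a family if they are
    connected by a chain of duplicate pairs (connected components)."""
    parent = {}

    def find(gene):
        # Locate the representative of a gene's component, with path halving.
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two components

    families = defaultdict(set)
    for gene in list(parent):
        families[find(gene)].add(gene)

    kept = [sorted(f) for f in families.values() if len(f) >= min_size]
    # Largest families first; ties broken alphabetically for reproducibility.
    return sorted(kept, key=lambda f: (-len(f), f))


# Hypothetical duplicate pairs that already passed the identity,
# E-value and coverage filters described above.
example_pairs = [("g1", "g2"), ("g2", "g3"), ("g4", "g5")]
example_families = single_linkage_families(example_pairs)
```

Note that g1 and g3 end up in the same family even though they were never paired directly; this chaining is the defining property of single linkage and the reason the upstream filters matter.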

Distribution of gene family sizes in pathogenic species versus non-pathogenic species

To assess the relative likelihood of observing gene families of a given size in pathogenic and non-pathogenic species, we compared the distribution of gene family sizes across eleven genomes. This comparison assumes that the probability of observing a gene family with n members is given by:

$$P = \frac{n^{-a}}{\sum_{i=1}^{\infty} i^{-a}}$$

For this distribution, the value of a can be thought of as the "rate of decay" of gene family sizes: large values of a indicate proportionally more small gene families. We can thus compare values of a between genomes to test for genome-scale differences in duplication propensities. Given the observed distribution of family sizes in each genome, we used numerical optimization to find the value of a that gives the maximum likelihood of observing these gene family sizes [92]. We first compared the overall distribution of gene families in pathogens to that in the non-pathogens. We separately fit these two sets of gene families to the above distribution by maximum likelihood and retained the ln-likelihood of each dataset (lnL_P and lnL_NP). We then created a pooled dataset containing all gene families from both the pathogens and non-pathogens and calculated the likelihood of that pooled dataset (lnL_A). This pooled analysis is a special case of the previous analyses, where the pathogens and non-pathogens are required to have the same value of a (i.e., required to have the same gene family size distributions). To test the hypothesis of equal values of a between the datasets, we computed twice the difference in log-likelihood between the sum of the two separate likelihoods and the pooled model (i.e., 2ΔlnL = 2((lnL_P + lnL_NP) - lnL_A)). Under the null hypothesis of no difference between the pathogens and non-pathogens, 2ΔlnL will follow a chi-square distribution with one degree of freedom [93]. Cases where the P-values from these tests were small (<0.05) indicated significant differences in duplication propensities in the genomes being compared. We employed an identical procedure to test for significant differences among individual pairs of genomes.
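The fitting and testing procedure described above can be sketched as follows. This is an illustrative sketch rather than the authors' code: it assumes the power law P(n) = n^(-a) / sum over i >= 1 of i^(-a), approximates the infinite sum with a truncated sum plus an Euler-Maclaurin tail correction, and uses a golden-section search in place of whichever optimizer from [92] was actually used. The function names and the search bounds (1.01 to 10) are assumptions.

```python
import math


def zeta(a, terms=1000):
    """Approximate sum_{i>=1} i**(-a) for a > 1 (direct sum plus tail correction)."""
    head = sum(i ** -a for i in range(1, terms))
    n = float(terms)
    tail = n ** (1 - a) / (a - 1) + 0.5 * n ** -a + a * n ** (-a - 1) / 12
    return head + tail


def log_likelihood(a, sizes):
    """ln-likelihood of observed family sizes under P(n) = n**(-a) / zeta(a)."""
    return -a * sum(math.log(n) for n in sizes) - len(sizes) * math.log(zeta(a))


def fit_power_law(sizes, lo=1.01, hi=10.0, tol=1e-6):
    """Golden-section search for the a that maximizes the ln-likelihood."""
    inv_phi = (math.sqrt(5) - 1) / 2
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    while hi - lo > tol:
        if log_likelihood(c, sizes) > log_likelihood(d, sizes):
            hi = d  # the maximum lies in [lo, d]
        else:
            lo = c  # the maximum lies in [c, hi]
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
    a_hat = (lo + hi) / 2
    return a_hat, log_likelihood(a_hat, sizes)


def likelihood_ratio_test(sizes_p, sizes_np):
    """Return (2*delta_lnL, P-value) for H0: both datasets share one value of a."""
    _, lnl_p = fit_power_law(sizes_p)
    _, lnl_np = fit_power_law(sizes_np)
    _, lnl_a = fit_power_law(list(sizes_p) + list(sizes_np))
    stat = max(0.0, 2 * ((lnl_p + lnl_np) - lnl_a))
    # Survival function of a chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

The ln-likelihood is concave in a, so a one-dimensional bracketing search is sufficient; the same `likelihood_ratio_test` call applies unchanged to a pair of individual genomes.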

GeneOntology-based functional annotation

Gene Ontology (GO) terms were associated with genes in gene families using the 2006 version of the GO Consortium's database [94, 95]. All genes in gene families were queried against proteins extracted from the GO database using BLAST, with cutoff values for significance at E ≤ 10⁻⁸, query and subject coverage of at least 40% and percent amino acid identity of at least 25%. For each such hit, all GO terms associated with the database sequence were transferred to the corresponding fungal gene. In sum, of 125,902 genes that were queried against the GO database, a total of 75,511 were associated with GO terms.
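The annotation transfer step can be illustrated with a short sketch. The record type `GoHit`, its field names and the `protein_terms` mapping are hypothetical stand-ins for parsed BLAST output and the GO database; only the thresholds come from the text above.

```python
from collections import defaultdict
from typing import NamedTuple


class GoHit(NamedTuple):
    """One BLAST hit of a fungal gene against a GO database protein (hypothetical layout)."""
    gene: str
    go_protein: str
    evalue: float
    percent_identity: float
    query_coverage: float    # percent of the query covered by the alignment
    subject_coverage: float  # percent of the subject covered by the alignment


def transfer_go_terms(hits, protein_terms, max_evalue=1e-8,
                      min_identity=25.0, min_coverage=40.0):
    """Transfer every GO term of each qualifying database protein to the query gene."""
    annotations = defaultdict(set)
    for hit in hits:
        passes = (hit.evalue <= max_evalue
                  and hit.percent_identity >= min_identity
                  and hit.query_coverage >= min_coverage
                  and hit.subject_coverage >= min_coverage)
        terms = protein_terms.get(hit.go_protein, ())
        if passes and terms:
            annotations[hit.gene].update(terms)
    return dict(annotations)
```

Genes with no qualifying hit are simply absent from the returned mapping, which corresponds to the genes in the text that could not be associated with GO terms.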

Functional distribution of gene duplicates

We compared the number of gene duplicates across twenty-two higher-level GO terms. For each GO term, we performed a chi-squared test of the null hypothesis that the proportion of gene duplicates associated with that term was the same for both pathogens and non-pathogens [82, 96]. We also compared the number of gene duplicates associated with individual GO terms in the Generic GO Slim ontology [97] in pathogenic versus non-pathogenic genomes. Here, as above, we tested whether the proportions of gene duplicates associated with a particular GO Slim term in pathogens versus non-pathogens were similar. To correct for the multiple hypothesis testing problem inherent to this approach, we judged results to be significant allowing a maximum false discovery rate (FDR) of 20% [96, 98]. These two analyses were performed on all eleven genomes, on the nine euascomycete genomes and on the four pairs of genomes that showed differences in the overall distributions of gene duplicates (see above and Figure 2).
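A per-term test followed by FDR control might look like the sketch below. It assumes a 2x2 contingency table per GO term (duplicates with and without the term, in pathogens and non-pathogens), a Pearson chi-squared statistic without continuity correction, and the Benjamini-Hochberg step-up procedure [98]; whether the original analysis applied a continuity correction is not stated, and the function names are illustrative.

```python
import math


def chi2_2x2(term_path, total_path, term_non, total_non):
    """Pearson chi-squared test (1 d.f., no continuity correction) on a 2x2 table.

    term_*  : duplicates annotated with the GO term in each group
    total_* : all annotated duplicates in each group
    """
    a, b = term_path, total_path - term_path
    c, d = term_non, total_non - term_non
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # degenerate table: no evidence either way
    stat = n * (a * d - b * c) ** 2 / denom
    return stat, math.erfc(math.sqrt(stat / 2))


def benjamini_hochberg(p_values, fdr=0.20):
    """Return a list of booleans marking tests significant at the given FDR."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank  # largest rank that satisfies the step-up rule
    significant = [False] * m
    for idx in order[:cutoff_rank]:
        significant[idx] = True
    return significant
```

In use, `chi2_2x2` would be called once per GO term and the resulting P-values passed together to `benjamini_hochberg` with `fdr=0.20`.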

Characterizing expansion and contraction of gene families

In the 500 largest gene families in our analyses, we tested for significant expansion or contraction in pathogenic genomes using a binomial test. Thus, for the global families, we calculated the proportion p (= 0.528) of all genes across all genomes in gene families of size two or greater that were observed in pathogenic species. This value was then used as the binomial parameter in the above test.

We also searched for significantly expanded families in the four instances where differences in gene family size distributions for pairs of genomes were evident (described above). To test for significance in these cases, a binomial test was again employed. For a given gene family, we compared the observed gene count for the lineage showing an overall greater magnitude of gene duplication (Figure 2) to the number of genes contributed by the related species. In this context, the binomial parameter was p = 0.5. No attempt to account for differing genome sizes was made in this analysis because the phylogenetic control makes it clear that any differences in genome size necessarily arose after the species pair in question shared a common ancestor.
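Both binomial tests can be sketched with exact tail probabilities. The sidedness of the original tests is not stated, so this sketch simply returns both one-sided tails: a small upper tail suggests expansion and a small lower tail suggests contraction in the focal group. The function names are illustrative.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def binomial_tails(k, n, p):
    """Return (P[X <= k], P[X >= k]) for X ~ Binomial(n, p).

    k : genes in the family contributed by the focal group (e.g. pathogens)
    n : total genes in the family
    p : expected proportion (0.528 for the global test, 0.5 for species pairs)
    """
    lower = sum(binomial_pmf(i, n, p) for i in range(0, k + 1))
    upper = sum(binomial_pmf(i, n, p) for i in range(k, n + 1))
    # Guard against floating-point sums that marginally exceed 1.
    return min(1.0, lower), min(1.0, upper)
```

For the global analysis one would call `binomial_tails(k, n, 0.528)` for each of the 500 largest families, and `binomial_tails(k, n, 0.5)` for families compared within a species pair.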

GenBank annotation for expanded and contracted gene families

We also performed BLAST searches for all expanded and contracted gene families against the GenBank non-redundant (nr) database [99]. Only sequences with BLAST E-values less than 10⁻⁸, with at least 25% identity between subject and query, whose local alignment spanned at least 40% of the query sequence, and that were not annotated as "hypothetical protein" or "predicted protein" were retained.


Abbreviations

Lineage-specific gene family expansion (LSE), Gene Ontology (GO), Repeat-induced point mutation (RIP), ATP-binding cassette (ABC), Major Facilitator Superfamily (MFS), GenomeHistory (GH), Single linkage clustering (SLC), Non-redundant (nr)


  1. Welch RA, Burland V, Plunkett G, Redford P, Roesch P, Rasko D, Buckles EL, Liou SR, Boutin A, Hackett J, et al: Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli. Proceedings of the National Academy of Sciences of the United States of America. 2002, 99 (26): 17020-17024. 10.1073/pnas.252529799.

  2. Holden MTG, Feil EJ, Lindsay JA, Peacock SJ, Day NPJ, Enright MC, Foster TJ, Moore CE, Hurst L, Atkin R, et al: Complete genomes of two clinical Staphylococcus aureus strains: Evidence for the rapid evolution of virulence and drug resistance. Proceedings of the National Academy of Sciences of the United States of America. 2004, 101 (26): 9786-9791. 10.1073/pnas.0402521101.

  3. Hacker J, Carniel E: Ecological fitness, genomic islands and bacterial pathogenicity – A Darwinian view of the evolution of microbes. EMBO reports. 2001, 2 (5): 376-381.

  4. Heidelberg JF, Eisen JA, Nelson WC, Clayton RA, Gwinn ML, Dodson RJ, Haft DH, Hickey EK, Peterson JD, Umayam L, et al: DNA sequence of both chromosomes of the cholera pathogen Vibrio cholerae. Nature. 2000, 406 (6795): 477-483. 10.1038/35020000.

  5. McLysaght A, Baldi PF, Gaut BS: Extensive gene gain associated with adaptive evolution of poxviruses. Proceedings of the National Academy of Sciences of the United States of America. 2003, 100 (26): 15655-15660. 10.1073/pnas.2136653100.

  6. Hughes AL, Friedman R: Poxvirus genome evolution by gene gain and loss. Molecular Phylogenetics and Evolution. 2005, 35 (1): 186-195. 10.1016/j.ympev.2004.12.008.

  7. Parkhill J, Wren BW, Thomson NR, Titball RW, Holden MTG, Prentice MB, Sebaihia M, James KD, Churcher C, Mungall KL, et al: Genome sequence of Yersinia pestis, the causative agent of plague. Nature. 2001, 413 (6855): 523-527. 10.1038/35097083.

  8. Lawrence JG: Common themes in the genome strategies of pathogens. Current Opinion in Genetics and Development. 2005, 15: 584-588. 10.1016/j.gde.2005.09.007.

  9. Hane JK, et al: Dothideomycete-plant interactions illuminated by genome sequencing and EST analysis of the wheat pathogen Stagonospora nodorum. Personal communication. 2007

  10. Dean RA, Talbot NJ, Ebbole DJ, Farman ML, Mitchell TK, Orbach MJ, Thon M, Kulkarni R, Xu JR, Pan HQ, et al: The genome sequence of the rice blast fungus Magnaporthe grisea. Nature. 2005, 434 (7036): 980-986. 10.1038/nature03449.

  11. Cuomo CA, Gueldener U, Xu JR, Trail F, Turgeon BG, Di Pietro A, Walton JD, Ma LJ, Baker SE, Rep M, et al: The Fusarium graminearum genome reveals a link between localized polymorphism and pathogen specialization. Science. 2007, 317 (5843): 1400-1402. 10.1126/science.1143708.

  12. Martinez D, et al: Genome sequence analysis of the cellulolytic fungus Trichoderma reesei (syn. Hypocrea jecorina) reveals a surprisingly limited inventory of carbohydrate active enzymes. Personal communication. 2007

  13. Tyler BM, Tripathy S, Zhang XM, Dehal P, Jiang RHY, Aerts A, Arredondo FD, Baxter L, Bensasson D, Beynon JL, et al: Phytophthora genome sequences uncover evolutionary origins and mechanisms of pathogenesis. Science. 2006, 313 (5791): 1261-1266. 10.1126/science.1128796.

  14. Armbrust EV, Berges JA, Bowler C, Green BR, Martinez D, Putnam NH, Zhou SG, Allen AE, Apt KE, Bechner M, et al: The genome of the diatom Thalassiosira pseudonana: Ecology, evolution, and metabolism. Science. 2004, 306 (5693): 79-86. 10.1126/science.1101156.

  15. Galagan JE, Calvo SE, Borkovich KA, Selker EU, Read ND, Jaffe D, FitzHugh W, Ma LJ, Smirnov S, Purcell S, et al: The genome sequence of the filamentous fungus Neurospora crassa. Nature. 2003, 422 (6934): 859-868. 10.1038/nature01554.

  16. Machida M, Asai K, Sano M, Tanaka T, Kumagai T, Terai G, Kusumoto KI, Arima T, Akita O, Kashiwagi Y, et al: Genome sequencing and analysis of Aspergillus oryzae. Nature. 2005, 438 (7071): 1157-1161. 10.1038/nature04300.

  17. Payne GA, Nierman WC, Wortman JR, Pritchard BL, Brown D, Dean RA, Bhatnagar D, Cleveland TE, Machida M, Yu J: Whole genome comparison of Aspergillus flavus and A. oryzae. Medical Mycology. 2006, 44: S9-S11. 10.1080/13693780600835716.

  18. Kamper J, Kahmann R, Bolker M, Ma LJ, Brefort T, Saville BJ, Banuett F, Kronstad JW, Gold SE, Muller O, et al: Insights from the genome of the biotrophic fungal plant pathogen Ustilago maydis. Nature. 2006, 444 (7115): 97-101. 10.1038/nature05248.

  19. Egan MJ, Wang ZY, Jones MA, Smirnoff N, Talbot NJ: Generation of reactive oxygen species by fungal NADPH oxidases is required for the rice blast disease. PNAS. 2007, 104 (28): 11772-11777. 10.1073/pnas.0700574104.

  20. Tosa Y, Osue J, Eto Y, Oh HS, Nakayashiki H, Mayama S, Leong SA: Evolution of an avirulence gene AVR1-CO39, concomitant with the evolution and differentiation of Magnaporthe oryzae. Molecular Plant-Microbe Interactions. 2005, 18 (11): 1148-1160. 10.1094/MPMI-18-1148.

  21. Win J, Morgan W, Bos J, Krasileva KV, Cano LM, Chaparro-Garcia A, Ammar R, Staskawicz BJ, Kamoun S: Adaptive evolution has targeted the C-terminal domain of the RXLR effectors of plant pathogenic Oomycetes. The Plant Cell. 2007

  22. Deng JX, Carbone I, Dean RA: The evolutionary history of Cytochrome P450 genes in four filamentous Ascomycetes. BMC Evolutionary Biology. 2007, 7: 30. 10.1186/1471-2148-7-30.

  23. Keon J, Antoniw J, Carzaniga R, Deller S, Ward JL, Baker JM, Beale MH, Hammond-Kosack K, Rudd JJ: Transcriptional adaptation of Mycosphaerella graminicola to programmed cell death (PCD) of its susceptible host. Molecular Plant-Microbe Interactions. 2007, 20 (2): 178-193. 10.1094/MPMI-20-2-0178.

  24. Kroken S, Glass NL, Taylor JW, Yoder OC, Turgeon GB: Phylogenomic analysis of type I polyketide synthase genes in pathogenic and saprobic ascomycetes. PNAS. 2003, 100 (26): 15670-15675. 10.1073/pnas.2532165100.

  25. Braun BR, Hoog MV, d'Enfert C, Martchenko M, Dungan J, Kuo A, Inglis DO, Uhl MA, Hogues H, Berriman M, et al: A human-curated annotation of the Candida albicans genome. PLOS Genetics. 2005, 1 (1): 36-57. 10.1371/journal.pgen.0010001.

  26. Xu JR, Peng YL, Dickman MB, Sharon A: The dawn of fungal pathogen genomics. Annual Review of Phytopathology. 2006, 44: 337-366. 10.1146/annurev.phyto.44.070505.143412.

  27. Jordan IK, Makarova KS, Spouge JL, Wolf YI, Koonin EV: Lineage-specific gene expansions in Bacterial and Archaeal genomes. Genome Research. 2001, 11: 555-565. 10.1101/gr.GR-1660R.

  28. Lespinet O, Wolf YI, Koonin EV, Aravind L: The role of lineage-specific gene family expansion in the evolution of Eukaryotes. Genome Research. 2002, 12: 1048-1059. 10.1101/gr.174302.

  29. Kulkarni RD, Thon MR, Pan HQ, Dean RA: Novel G-protein-coupled receptor-like proteins in the plant pathogenic fungus Magnaporthe grisea. Genome Biology. 2005, 6 (3):

  30. Skamnioti P, Gurr SJ: Magnaporthe grisea Cutinase2 Mediates Appressorium Differentiation and Host Penetration and Is Required for Full Virulence. The Plant Cell. 2007

  31. Carbone I, Jakobek JL, Ramirez-Prado JH, Horn BW: Recombination, balancing selection and adaptive evolution in the aflatoxin gene cluster of Aspergillus parasiticus. Molecular Ecology. 2007, 16: 4401-4417. 10.1111/j.1365-294X.2007.03464.x.

  32. Carbone I, Ramirez-Prado JH, Jakobek JL, Horn BW: Gene duplication, modularity and adaptation in the evolution of the aflatoxin gene cluster. BMC Evolutionary Biology. 2007, 7: 111. 10.1186/1471-2148-7-111.

  33. Hotopp JCD, et al: Widespread lateral gene transfer from intracellular Bacteria to multicellular Eukaryotes. Science. 2007, 317: 1753-1756. 10.1126/science.1142490.

  34. Friesen TL, Stuckenbrock EH, Liu Z, Meinhardt S, Ling H, Faris JD, Rasmussen JB, Solomon PS, McDonald BA, Oliver RP: Emergence of a new disease as a result of interspecific virulence gene transfer. Nature Genetics. 2006, 38 (8): 953-956. 10.1038/ng1839.

  35. Paoletti M, Buck KW, Braiser CM: Selective acquisition of novel mating type and vegetative incompatibility genes via interspecies gene transfer in the globally invading eukaryote Ophiostoma novo-ulmi. Molecular Ecology. 2006, 15 (1): 249-262. 10.1111/j.1365-294X.2005.02728.x.

  36. Galagan JE, Henn MR, Ma LJ, Cuomo CA, Birren B: Genomics of the fungal kingdom: Insights into eukaryotic biology. Genome Research. 2005, 15 (12): 1620-1631. 10.1101/gr.3767105.

  37. Galagan JE, Calvo SE, Cuomo C, Ma LJ, Wortman JR, Batzoglou S, Lee SI, Basturkmen M, Spevak CC, Clutterbuck J, et al: Sequencing of Aspergillus nidulans and comparative analysis with A. fumigatus and A. oryzae. Nature. 2005, 438 (7071): 1105-1115. 10.1038/nature04341.

  38. Loftus BJ, Fung E, Roncaglia P, Rowley D, Amedeo P, Bruno D, Vamathevan J, Miranda M, Anderson IJ, Fraser JA, et al: The genome of the basidiomycetous yeast and human pathogen Cryptococcus neoformans. Science. 2005, 307 (5713): 1321-1324. 10.1126/science.1103773.

  39. Katinka MD, Duprat S, Cornillot E, Metenier G, Thomarat F, Prensier G, Barbe V, Peyretaillade E, Brottier P, Wincker P, et al: Genome sequence and gene compaction of the eukaryote parasite Encephalitozoon cuniculi. Nature. 2001, 414 (6862): 450-453. 10.1038/35106579.

  40. Nierman WC, Pain A, Anderson MJ, Wortman JR, Kim HS, Arroyo J, Berriman M, Abe K, Archer DB, Bermejo C, et al: Genomic sequence of the pathogenic and allergenic filamentous fungus Aspergillus fumigatus. Nature. 2006, 439 (7075): 502-502. 10.1038/nature04572.

  41. Sachs JL, Bull JJ: Experimental evolution of conflict mediation between genomes. PNAS. 2005, 102 (2): 390-395. 10.1073/pnas.0405738102.

  42. McDowell JM, Simon SA: Recent insights into R gene evolution. Molecular Plant Pathology. 2006, 7 (5): 437-448. 10.1111/j.1364-3703.2006.00342.x.

  43. Clay K, Kover PX: The Red Queen hypothesis and plant/pathogen interactions. Annu Rev Phytopathol. 1996, 34: 29-50. 10.1146/annurev.phyto.34.1.29.

  44. Huynen MA, van Nimwegen E: The frequency distribution of gene family sizes in complete genomes. Mol Bio Evol. 1998, 15 (5): 583-589.

  45. Reed WJ, Hughes BD: A model explaining the size distribution of gene and protein families. Mathematical Biosciences. 2004, 189 (1): 97-102. 10.1016/j.mbs.2003.11.002.

  46. Koonin EV, Wolf YI, Karev GP: The structure of the protein universe and genome evolution. Nature. 2002, 420 (6912): 218-223. 10.1038/nature01256.

  47. Koonin EV, Wolf YI, Karev GP: Power laws, scale-free networks and genome biology. 2006, Georgetown, Tex. & New York, N.Y.: Landes Bioscience; Springer Science+Business Media

  48. Fitzpatrick DA, Logue ME, Stajich JE, Butler G: A fungal phylogeny based on 42 complete genomes derived from supertree and combined gene analysis. BMC Evolutionary Biology. 2006, 6:

  49. James TY, Kauff F, Schoch CL, Matheny PB, Hofstetter V, Cox CJ, Celio G, Gueidan C, Fraker E, Miadlikowska J, et al: Reconstructing the early evolution of Fungi using a six-gene phylogeny. Nature. 2006, 443 (7113): 818-822. 10.1038/nature05110.

  50. Sidhu GS: Mycotoxin genetics and gene clusters. European Journal of Plant Pathology. 2002, 108: 705-711. 10.1023/A:1020613413483.

  51. Keller NP, Turner G, Bennett JW: Fungal secondary metabolism-from biochemistry to genomics. Nature Reviews: Microbiology. 2005, 3: 937-947. 10.1038/nrmicro1286.

  52. Bok JW, Keller NP: LaeA, a regulator of secondary metabolism in Aspergillus species. Eukaryotic Cell. 2004, 3 (2): 527-535. 10.1128/EC.3.2.527-535.2004.

  53. Costanzo S, Ospina-Giraldo MD, Deahl KL, Baker CJ, Jones RW: Gene duplication event in family 12 glycosyl hydrolase from Phytophthora spp. Fungal Genetics and Biology. 2006, 43: 707-714. 10.1016/j.fgb.2006.04.006.

  54. Wolfe KH: Comparative genomics and genome evolution in yeasts. Philosophical Transactions of The Royal Society. 2006, 361 (1467): 403-412. 10.1098/rstb.2005.1799.

  55. Berbee ML: The phylogeny of plant and animal pathogens in the Ascomycota. Physiological and Molecular Plant Pathology. 2001, 59: 165-187. 10.1006/pmpp.2001.0355.

  56. Galagan JE, Selker EU: RIP: the evolutionary cost of genome defense. Trends in Genetics. 2004, 20 (9): 417-423. 10.1016/j.tig.2004.07.007.

  57. Ikeda K, Nakayashiki H, Kataoka T, Tamba H, Hashimoto Y, Tosa Y, Mayama S: Repeat-induced point mutation (RIP) in Magnaporthe grisea: Implications for its sexual cycle in the natural field context. Molecular Microbiology. 2002, 45 (5): 1355-1364. 10.1046/j.1365-2958.2002.03101.x.

  58. Montiel MD, Lee HA, Archer DB: Evidence of RIP (repeat-induced point mutation) in transposase sequences of Aspergillus oryzae. Fungal Genetics and Biology. 2006, 43: 439-445. 10.1016/j.fgb.2006.01.011.

  59. Faugeron G: Diversity of homology-dependent gene silencing strategies in fungi. Current Opinion in Microbiology. 2000, 3 (2): 144-148. 10.1016/S1369-5274(00)00066-7.

  60. Martinez D, Larrondo LF, Putnam N, Gelpke MDS, Huang K, Chapman J, Helfenbein KG, Ramaiya P, Detter JC, Larimer F, et al: Genome sequence of the lignocellulose degrading fungus Phanerochaete chrysosporium strain RP78. Nature Biotechnology. 2004, 22 (6): 695-700. 10.1038/nbt967.

  61. Hood ME, Katawczik M, Giraud T: Repeat-induced point mutation and the population structure of transposable elements in Microbotryum violaceum. Genetics. 2005, 170: 1081-1089. 10.1534/genetics.105.042564.

  62. Berbee ML, Taylor JW: Dating divergences in the fungal tree of life: Review and new analyses. Mycologia. 2006, 98 (6): 838-849.

  63. Berbee ML, Taylor JW: Fungal molecular evolution: Gene trees and geologic time. The Mycota VII Part B: Systematics and Evolution. 2001, 229-245.

  64. St Leger RJ, Joshi L, Roberts DW: Adaptation of proteases and carbohydrases of saprophytic, phytopathogenic and entomopathogenic fungi to the requirements of their ecological niches. Microbiology. 1997, 143 (6): 1983-1992.

  65. Rappleye CA, Goldman WE: Defining virulence genes in the dimorphic fungi. Annu Rev Microbiol. 2006, 60: 281-303. 10.1146/annurev.micro.59.030804.121055.

  66. Sexton AC, Howlett BJ: Parallels in fungal pathogenesis on plant and animal hosts. Eukaryotic Cell. 2006, 5 (12): 1941-1949. 10.1128/EC.00277-06.

  67. Xu J, Zhao X, Dean RA, Jay CD: From Genes to Genomes: A New Paradigm for Studying Fungal Pathogenesis in Magnaporthe oryzae. Advances in genetics. 2007, Academic Press, 57: 175-218.

  68. Lindstrom JT, Belanger FC: Purification and characterization of an endophytic fungal proteinase that is abundantly expressed in the infected host grass. Plant Physiology. 1994, 106: 7-16.

  69. Reddy PV, Lam CK, Belanger FC: Mutualistic Fungal Endophytes Express a Proteinase that Is Homologous to Proteases Suspected to Be Important in Fungal Pathogenicity. Plant Physiology. 1996, 111 (4): 1209-1218. 10.1104/pp.111.4.1209.

  70. Nahar P, Ghormade V, Deshpande MV: The extracellular constitutive production of chitin deacetylase in Metarhizium anisopliae: possible edge to entomopathogenic fungi in the biological control of insect pests. Journal of Invertebrate Pathology. 2004, 85 (2): 80-88. 10.1016/j.jip.2003.11.006.

  71. Lee BN, Kroken S, Chou DYT, Robbertse B, Yoder OC, Turgeon BG: Functional Analysis of All Nonribosomal Peptide Synthetases in Cochliobolus heterostrophus Reveals a Factor, NPS6, Involved in Virulence and Resistance to Oxidative Stress. Eukaryotic Cell. 2005, 4 (3): 545-555. 10.1128/EC.4.3.545-555.2005.

  72. Cary JW, OBrian GR, Nielsen DM, Nierman W, Harris-Coward P, Yu J, Bhatnagar D, Cleveland TE, Payne GA, Calvo AM: Elucidation of veA-dependent genes associated with aflatoxin and sclerotial production in Aspergillus flavus by functional genomics. Applied Microbiology and Biotechnology. 2007

  73. Wolfe KH, Shields DC: Molecular evidence for an ancient duplication of the entire yeast genome. Nature. 1997, 387 (6634): 708-713. 10.1038/42711.

  74. Conant GC, Wolfe KH: Increased glycolytic flux as an outcome of whole-genome duplication in yeast. Molecular Systems Biology. 2007, 3: 129. 10.1038/msb4100170.

  75. Talbot NJ: On the trail of a cereal killer: Exploring the Biology of Magnaporthe grisea. Annu Rev Microbiol. 2003, 57: 177-202. 10.1146/annurev.micro.57.030502.090957.

  76. Halaouli S, Asther M, Sigoillot JC, Hamdi M, Lomascolo A: Fungal tyrosinases: new prospects in molecular characteristics, bioengineering and biotechnological applications. Journal of Applied Microbiology. 2006, 100: 219-232. 10.1111/j.1365-2672.2006.02866.x.

  77. Thompson JE, Basarab GS, Andersson A, Lindqvist Y, Jordan DB: Trihydroxynaphthalene Reductase from Magnaporthe grisea: Realization of an Active Center Inhibitor and Elucidation of the Kinetic Mechanism. Biochemistry. 1996, 36 (7): 1852-1860. 10.1021/bi962355u.

  78. McPherson MJ, Ogel ZB, Stevens C, Yadav KDS, Keen JN, Knowles PF: Galactose oxidase of Dactylium dendroides. The Journal of Biological Chemistry. 1992, 267 (12): 8146-8152.

  79. Rolke Y, Liu S, Quidde T, Williamson B, Schouten A, Weltring KM, Siewers V, Tenberge KB, Tudzynski B, Tudzynski P: Functional analysis of H2O2-generating systems in Botrytis cinerea: the major Cu-Zn-superoxide dismutase (BCSOD1) contributes to virulence on french bean, whereas a glucose oxidase (BCGOD1) is dispensable. Molecular Plant Pathology. 2004, 5 (1): 17-27. 10.1111/j.1364-3703.2004.00201.x.

  80. Wolfe KH, Li WH: Molecular evolution meets the genomics revolution. Nature Genetics Supplement. 2003, 33: 255-265. 10.1038/ng1088.

  81. Lynch M, O'Hely M, Walsh B, Force A: The probability of preservation of a newly arisen gene duplicate. Genetics. 2001, 159 (4): 1789-1804.

  82. Conant GC, Wagner A: GenomeHistory: a software tool and its application to fully sequenced genomes. Nucleic Acids Research. 2002, 30 (15): 3378-3386. 10.1093/nar/gkf449.

  83. Fungal Genome Initiative.

  84. Eukaryotic Genomics.

  85. Entrez Genome.

  86. Aspergillus oryzae genome.

  87. Repbase.

  88. Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J: Repbase Update, a database of Eukaryotic repetitive elements. Cytogenetic and Genome Research. 2005, 110: 462-467. 10.1159/000084979.

  89. Brown DE, Powell AJ, Carbone I, Dean RA: GT-Miner: a graph-theoretic data miner, viewer and model processor. Personal communication. 2008

  90. Chartrand G, Oellermann OR: Applied and Algorithmic Graph Theory. 1993, New York: McGraw-Hill

  91. Bollobas B: Graph Theory: An introductory Course. 1979, Heidelberg: Springer-Verlag

  92. Press WH, Flannery BP, Teukolsky SA, Vetterling WT: Numerical Recipes in C: the art of scientific computing. 1992, Cambridge; New York: Cambridge University Press, Second edition

  93. Sokal RR, Rohlf FJ: Biometry. 1995, WH Freeman and Company, New York, ISBN-10: 0716724111, Third edition

  94. The Gene Ontology database.

  95. The Gene Ontology Consortium: The Gene Ontology (GO) database and informatics resource. Nucleic Acids Research. 2004, 32 (Database issue): D258-D261.

  96. Cai Z, Mao X, Li S, Wei L: Genome comparison using Gene Ontology (GO) with statistical testing. BMC Bioinformatics. 2006

  97. The Gene Ontology Consortium: GO Slim.

  98. Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. 1995, 57 (1): 289-300.

  99. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25 (17): 3389-3402. 10.1093/nar/25.17.3389.



AJP was supported by US Public Health Service Grant (NIH T32) Molecular Mycology and Pathogenesis Training Program AI52080 awarded to Professor Thomas G. Mitchell of Duke University. GCC is supported by a Science Foundation Ireland grant to K.H. Wolfe. IC was supported by the National Research Initiative of the USDA Cooperative State Research, Education and Extension Service, grant number 2005-35319-16126. The authors wish to thank A.M. Evangelisti for valuable discussions and statistical and computational guidance.

Author information



Corresponding author

Correspondence to Ralph A Dean.

Additional information

Authors' contributions

AJP, IC and RAD conceived the study. AJP and DEB performed the analyses. GCC assisted with computational and statistical analyses. AJP and GCC wrote the manuscript. All authors read and approved the final manuscript.

Electronic supplementary material


Additional file 1: Comparison of power law and exponential distributions for gene family sizes. These plots show that a power law provides a better fit for the distribution of gene family sizes in a genome than an exponential distribution. (PDF 176 KB)


Additional file 2: Gene family sizes and functional annotation for gene families showing significant expansion in pathogens. This table summarizes gene family sizes and functional annotation for gene families that are expanded in fungal pathogens. (DOC 47 KB)


Additional file 3: Gene family sizes and functional annotation for gene families showing significant contraction in pathogens. This table summarizes gene family sizes and functional annotation for gene families that are contracted in fungal pathogens. (DOC 52 KB)


Additional file 4: Gene family sizes and functional annotation for gene families showing significant expansion in one member of a species pair. This table gives gene family sizes and functional annotation for gene families that are expanded in one member of a species pair. (DOC 52 KB)


Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


About this article

Cite this article

Powell, A.J., Conant, G.C., Brown, D.E. et al. Altered patterns of gene duplication and differential gene gain and loss in fungal pathogens. BMC Genomics 9, 147 (2008).



  • Gene Ontology
  • Gene Family
  • Gene Duplication
  • Graminearum
  • Large Gene Family