Comparative genome analysis and genome evolution of members of the Magnaporthaceae family of fungi
© Okagaki et al. 2016
Received: 4 November 2015
Accepted: 17 February 2016
Published: 25 February 2016
Magnaporthaceae, a family of ascomycetes, includes three fungi of great economic importance that cause disease in cereal and turf grasses: Magnaporthe oryzae (rice blast), Gaeumannomyces graminis var. tritici (take-all disease), and Magnaporthe poae (summer patch disease). Recently, the sequenced and assembled genomes for these three fungi were reported. Here, the genomes were compared for orthologous genes in order to identify genes that are unique to the Magnaporthaceae family of fungi. In addition, ortholog clustering was used to identify a core proteome for the Magnaporthaceae, which was examined for diversifying and purifying selection and evidence of two-speed genome evolution.
A genome-scale comparative study was conducted across 74 fungal genomes to identify clusters of orthologous genes unique to the three Magnaporthaceae species as well as species specific genes. We found 1149 clusters that were unique to the Magnaporthaceae family of fungi with 295 of those containing genes from all three species. Gene clusters involved in metabolic and enzymatic activities were highly represented in the Magnaporthaceae specific clusters. Also highly represented in the Magnaporthaceae specific clusters as well as in the species specific genes were transcriptional regulators. In addition, we examined the relationship between gene evolution and distance to repetitive elements found in the genome. No correlations between diversifying or purifying selection and distance to repetitive elements or an increased rate of evolution in secreted and small secreted proteins were observed.
Taken together, these data show that at the genome level, there is no evidence to suggest multi-speed genome evolution or that proximity to repetitive elements plays a role in diversification of genes.
Genome comparison studies have become critical to understanding the evolutionary relationships between similar species. Genome sequencing and expression data have become more cost-effective and easier to generate, resulting in an increase in the number of available genomes for analysis. In mycology, many of the genomes are poorly annotated, resulting in a need for large scale genome analysis to identify genes that have similar function. For pathogens, comparisons can help to find novel drug targets, mechanisms of infection, or common genes that might shed light on pathogenic and non-pathogenic lifestyles.
Homologs are genes that are shared among related organisms and can be used for genome comparisons. Homologs can fall into two different subclasses: orthologs and paralogs. Orthologs are derived from a common ancestor but usually diverge by speciation, resulting in retention of similar functions during evolution. In contrast, paralogs typically diverge after speciation and are the result of gene duplication events and may or may not retain similar functions. Orthologs and paralogs can be useful tools in genome comparison studies because they can highlight genes shared among species that are important to conserved biological processes or can reveal those genes that are unique to a particular subset of fungi, such as families of fungi or fungi with a specific lifestyle. Several algorithms have been developed to study orthologs across species, but most are limited to comparisons between only two species. OrthoMCL is an algorithm used for the identification of orthologs between multiple species. Developed by Li et al., OrthoMCL uses multiple steps including BLASTp and Markov clustering in order to group genes into likely orthologous clusters. Using such algorithms, genes with similar functions as well as those genes unique to each species can be identified.
The Magnaporthaceae family of fungi contains several economically important plant pathogens. Among the pathogenic members of this family are Magnaporthe oryzae, Gaeumannomyces graminis var. tritici, and Magnaporthe poae. M. oryzae is known as the rice blast fungus and causes disease in rice, wheat, and barley following landing of conidia on the host plant leaf [3, 4]. Upon germination on the hydrophobic leaf surface, the formation of a specialized infection structure, the appressorium, is stimulated. The appressorium penetrates the leaf surface allowing the fungus to invade and spread in the plant tissue. M. oryzae outbreaks have been known to devastate vast acreages of rice on a regular basis and are a major concern for global food security [4, 5]. More recently, M. oryzae has also been shown to cause disease on other cultivated grasses including barley and wheat, increasing its threat to the food supply [3, 6]. G. graminis var. tritici is the causative agent of take-all disease in wheat [3, 7]. Unlike M. oryzae, which targets the leaf of the plant, G. graminis var. tritici attacks the roots of wheat plants resulting in root rot. Hyphae of the soil-borne fungus wrap around the root and invade the root structure causing tissue necrosis and subsequent killing of the plant [3, 7]. M. poae, the causative agent of summer patch disease in turf grasses, acts in a similar manner to G. graminis var. tritici and attacks the roots of grasses causing root rot and subsequent host plant death.
Identification of proteins that are involved with host-pathogen interactions has, until recently, relied on molecular biology techniques at the bench. For plant pathogens, several classes of proteins are frequent targets of further study including carbohydrate active enzymes (CAZymes), transcriptional regulators, and secreted proteins. CAZymes can be classified into six subsets: auxiliary activity (AA), carbohydrate binding molecules (CBM), carbohydrate esterases (CE), glycoside hydrolases (GH), glycosyltransferases (GT), and polysaccharide lyases (PL). Comparative studies of CAZymes in 103 fungal proteomes were performed by Zhao et al. and showed for M. oryzae, G. graminis var. tritici, and M. poae that GHs were the most abundant class. Targets of GHs include cellulose, glycans, glucans, and chitin, suggesting both plant and fungal targets for this enzyme class [8, 9].
Fungal effector proteins are secreted proteins, often less than 250 amino acids in length, which interact with host plant proteins in order to modulate the host immune system and promote infection [10, 11]. Effector proteins have been shown to be highly diversifying [3, 6, 11–24] and may be undergoing accelerated evolution. Studies in M. oryzae have shown that some effector proteins are undergoing high rates of diversification in order to evade the host immune response, suggesting that there is selection pressure by the host environment to rapidly accumulate non-synonymous mutations [3, 12, 15, 21, 22, 24, 25]. These data suggest that diversification of genes through mutation is one mechanism for fungi to evolve to escape plant recognition. This concept of two-speed genome evolution, where virulence genes evolve more rapidly than other genes, has implicated repetitive DNA elements, including retrotransposons, in the increased rate of evolution in effector proteins [12–22]. Together, CAZymes and small secreted proteins are critical to initial host-pathogen interactions that allow a fungal pathogen to degrade and enter host cells while modulating their response to invasion. With more recent advances in bioinformatics, both CAZymes and small secreted proteins of special interest can be identified and characterized prior to studying them at the bench.
The goal of this study was two-fold: identify genes and gene clusters that are unique to the Magnaporthaceae family of fungi in order to identify genes that may be involved in pathogenesis, and identify a core proteome of conserved genes and identify functional clusters that are undergoing rapid diversification. First, the protein sequences from 74 fungal genomes, including the genomes of M. oryzae, G. graminis var. tritici, and M. poae, were chosen from the Broad Institute’s Fungal Genome Initiative for OrthoMCL analysis. The genomes included consisted of plant and animal pathogens as well as the genomes of model fungi, such as Saccharomyces cerevisiae. OrthoMCL clusters that contained only genes from the Magnaporthaceae family of fungi and unclustered genes that are species specific were further analyzed. Gene Ontology annotation (GO annotation) and InterProScan protein domain identification were used to determine the putative functions for each cluster of orthologs. We hypothesized that genes and gene clusters involved with metabolic process would be highly represented in the Magnaporthaceae specific and species specific genes. The data suggest, however, that proteins with enzymatic function and transcriptional regulators were highly represented in orthologous clusters that are unique to the Magnaporthaceae. In addition, we used Hmmscan [29, 30] to identify Magnaporthaceae specific clusters and species specific “unique” genes that had putative CAZyme function. We found that few CAZymes were clustered by OrthoMCL, while a higher number were identified in the species specific genes.
Second, OrthoMCL clusters containing at least one gene from each of the three Magnaporthaceae species were identified as the “core proteome”. We hypothesized that secreted proteins and specifically secreted proteins with enzymatic and protease functions would be undergoing diversifying selection. In addition, we hypothesized that genes under diversifying selection would be closer to repetitive elements than genes that are under neutral or purifying selection. Phylogenetic Analysis by Maximum Likelihood (PAML) was used to identify genes that exhibited purifying selection or diversifying selection, and the locations of these genes were compared to repetitive element locations in the genome. Additionally, secreted proteins were identified using TargetP and SignalP and were examined for their proximity to repetitive elements. Surprisingly, the data suggest that there is no correlation between genes undergoing diversifying selection or genes with higher mutation rates and distance to repetitive elements. In addition, we found no evidence that secreted proteins are subjected to more diversifying selection than purifying selection. Taken together, we found no evidence of two-speed genome evolution between the three Magnaporthaceae species examined.
OrthoMCL and unique gene summary
Magnaporthaceae specific OrthoMCL cluster summary
Cluster function identification
To determine putative functions for genes unique to each of the three Magnaporthaceae species, InterProScan was used to identify functional protein domains and GO annotations for each gene. One hundred ninety-four unique GO annotations were identified, with 244 genes returning no known protein domains and no GO annotation (Fig. 2b). Protein binding and other binding functions were highly represented in the unique proteins, with 4298 and 1290 genes represented by these two categories, respectively. Similar to the clustered genes, proteins with predicted enzymatic activity, including metabolic process (599 genes) and oxidoreductase activity (739 genes), were abundant in genes unique to each fungus. Additionally, six of the most abundant categories were functions involved in transcription and transcriptional regulation, including DNA binding (437 genes) and transcription factor activity (281 genes). Again, similar to the Magnaporthaceae specific clusters, ion binding activities, namely zinc (467 genes), heme (298 genes), and iron binding (266 genes), appeared among the twenty most abundant functional categories. Together with the Magnaporthaceae shared cluster data, these data suggest that proteins with enzymatic functions and transcriptional regulation proteins may be undergoing higher rates of mutation than genes with other functions.
CAZyme identification and analysis
Fungal plant pathogens utilize a wide variety of carbohydrate active enzymes (CAZymes) in order to infect the host plant. Previous analysis of a variety of fungal species showed that even mammalian commensal fungi retain an array of CAZymes. There are six major classifications of CAZymes: polysaccharide lyases (PL), glycosyltransferases (GT), glycoside hydrolases (GH), carbohydrate esterases (CE), carbohydrate binding molecules (CBM), and auxiliary activities (AA). Additional analysis showed that monocot pathogens, including M. oryzae, G. graminis var. tritici, and M. poae, exhibited an abundance of glycoside hydrolases and low numbers of polysaccharide lyases.
M. poae and G. graminis var. tritici shared twelve clusters with putative CAZyme genes (Fig. 3b, left). Within the clusters shared by G. graminis var. tritici and M. poae, twelve clusters containing GTs, five clusters containing GHs, and four clusters containing CBMs were identified. M. oryzae and G. graminis var. tritici shared only a single cluster that contained a putative CAZyme, which was identified as containing GTs. Interestingly, M. poae and M. oryzae had no shared clusters that contained CAZymes.
Analysis of the unique genes for each species revealed that M. oryzae had the most unique CAZymes, with 107, while G. graminis var. tritici and M. poae were similar with 50 and 54 unique CAZyme genes, respectively (Fig. 3b, right). The majority of M. oryzae CAZymes fell into the GH and CBM categories. For both M. poae and G. graminis var. tritici, GHs were the primary CAZymes identified in the unique genes. Taken together, these data support the previous data by Zhao et al. that glycoside hydrolases are the most abundant CAZymes in the monocot pathogens. These data also show that GTs were abundant in the M. poae and G. graminis var. tritici shared clusters compared with clusters shared by all three Magnaporthaceae species, suggesting that the glycosyltransferases may be involved in a biological process common to M. poae and G. graminis var. tritici.
Putative transcription factor identification and analysis
Selection analysis of orthologous clusters
Recent studies have suggested that rapid diversification of certain genes can occur in fungal phytopathogens in response to host plant selection pressures. Mechanisms of increased diversification include proximity to repetitive elements and repeat-induced point mutation (RIP), especially in genes close to long-terminal repeat (LTR) retrotransposons. However, most of the studies to date have only been performed on single genes or small families of genes with similar functions, and comparisons were performed in strains of a single species [12, 38–40]. We hypothesized that at the family level, similar patterns would be observed: that genes closer to repetitive elements would exhibit more diversifying selection than genes further from repetitive elements. To test this, orthologous clusters identified by OrthoMCL that contain at least one gene from each Magnaporthaceae species were examined for diversifying and purifying selection and their proximity to repetitive elements and putative functions.
The vast majority of core proteome clusters (87 %) were found to contain a single gene from each of the three Magnaporthaceae species while only 13 % contained paralogs. We hypothesized that the clusters that contained paralogs were undergoing more diversifying selection than those with a single gene from each species. To test this, the clusters were split into two categories, those with a single gene from each species (Fig. 5a, middle, No Paralogs), and those that contain putative paralogs (Fig. 5a, right, With Paralogs). We observed that fewer clusters containing paralogs were under neutral selection. In addition, the proportions of clusters under purifying selection (6 %) and diversifying selection (42 %) were both higher than in the clusters with no paralogs. Thus, clusters that contain paralogs are under more selection than those without paralogs, but the selection is not limited to purifying or diversifying.
Repetitive element sequence analyses in several fungal species have been used to identify evolutionary relationships between species based on repetitive element copy number and location. Hypotheses have been suggested that genomes can evolve at two different speeds due to proximity to and influence by repetitive elements, where diversifying genes are in regions of high repetitive content, while conserved genes are in areas with low repetitive content. Previous studies have shown a high mutation rate due to repeat-induced point mutation (RIP) in areas of the M. oryzae genome which contain specific long-terminal repeat (LTR) retrotransposons, such as Maggy. Therefore, PAML scores for the core proteome were compared to repetitive element content of the DNA near each gene.
Briefly, repetitive element libraries were built for each Magnaporthaceae species de novo using RepeatModeler [36, 42]. Only repetitive elements >200 bp were considered for further analysis. For each species, genes were identified as undergoing diversifying or purifying selection and their distances to the closest repetitive element were graphed (Fig. 5b). P-values were then calculated using the Mann-Whitney Rank Sum test comparing the diversifying gene group and the purifying gene group to determine if there is a significant difference in the distance to the closest repetitive element between the groups. P-values of <0.05 were considered statistically significant. Surprisingly, there was no significant difference in the distance to the closest repetitive element between diversifying and purifying genes for M. oryzae (p = 0.128), G. graminis var. tritici (p = 0.756), or M. poae (p = 0.580). Taken together, these data do not support our hypothesis but rather suggest that there is no correlation between proximity to repetitive elements and diversifying or purifying selection.
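The rank sum comparison described above can be illustrated with a minimal sketch. It is not the authors' code: the function names are hypothetical, the p-value uses the large-sample normal approximation with a tie correction and no continuity correction, and the distance values in any real run would come from the gene and repeat annotations.

```python
import math

def rank_with_ties(values):
    """Return 1-based ranks; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney test (normal approximation, assumed here).

    x, y: distances (bp) to the closest repeat for two gene groups,
    e.g. diversifying vs. purifying. Returns (U statistic for x, p-value).
    """
    n1, n2 = len(x), len(y)
    combined = list(x) + list(y)
    ranks = rank_with_ties(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    n = n1 + n2
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(c ** 3 - c for c in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0  # all values identical: no evidence of a difference
    z = (u1 - mean_u) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))
```

A group pair would be called significantly different at the threshold used in the text when the returned p-value is below 0.05. For small groups an exact test is preferable to the normal approximation used in this sketch.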
The dN/dS ratio does not take into account the total number of mutations found in a gene sequence; therefore, additional mutational analysis was performed. Briefly, a majority consensus sequence was predicted for each alignment using the majority character at each site, and identity distances between the consensus and each sequence in the alignment were then calculated. For each gene sequence in the ortholog cluster, the pairwise distance between the consensus and the transcript sequence was calculated. Values ranged from 0 to 0.69 and are hereafter referred to as the degree of mutation, with values closer to one representing genes with the highest proportion of total mutations. Degree of mutation for each gene in the core proteome was graphed against the distance to the closest repetitive element, and coefficients of determination (R² values) were calculated in order to determine if there was a correlation between degree of mutation and repetitive element proximity (Fig. 6a). The R² values were near zero for M. oryzae (R² = 0.0025), G. graminis var. tritici (R² = 0.0079) and M. poae (R² = 0.0076). These data suggest that for the orthologous clusters within the Magnaporthaceae family of fungi, the degree of mutation is not correlated with the distance to the closest repetitive element.
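The degree-of-mutation calculation can be sketched as follows. This is an illustration under stated assumptions rather than the authors' script: the function names are hypothetical, gap characters are treated like any other character, ties in a column are broken by first occurrence, and R² is taken as the squared Pearson correlation of a simple linear fit.

```python
from collections import Counter

def majority_consensus(aligned):
    """Most common character in each column of equal-length aligned sequences."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned))

def degree_of_mutation(seq, consensus):
    """Fraction of aligned sites at which seq differs from the consensus."""
    mismatches = sum(a != b for a, b in zip(seq, consensus))
    return mismatches / len(consensus)

def r_squared(xs, ys):
    """Coefficient of determination for a simple linear regression of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation in one variable: no linear relationship
    return sxy ** 2 / (sxx * syy)

# Toy alignment for one ortholog cluster (hypothetical sequences).
cluster = ["ATGC", "ATGA", "TTGC"]
consensus = majority_consensus(cluster)
degrees = [degree_of_mutation(seq, consensus) for seq in cluster]
```

In the analysis described above, each gene's degree of mutation would then be paired with its distance to the closest repetitive element and passed to a function such as `r_squared`.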
Identification of function for diversifying and purifying gene clusters
Secreted protein identification and analysis
Several secreted proteins in M. oryzae have been identified as effector proteins, which play a role in modulating the host immune response to infection (reviewed in ). It has been proposed that such effector proteins must be more prone to mutation than the rest of the fungal genome in order to evade host plant recognition and defenses [10, 11]. These studies suggest that small secreted proteins, defined here as under 250 amino acids in length, may be undergoing diversification due to close proximity to repetitive elements. Because our data show no correlation between diversifying selection and proximity to repetitive elements at the genome level, the relationship between small secreted proteins and repetitive element location was examined.
Because it has been proposed that small secreted proteins undergo faster evolution due to proximity to repetitive elements, the distances from unique proteins (UP), unique secreted proteins (USP), and unique secreted proteins smaller than 250 amino acids (USP250) to the closest repetitive elements were compared (Fig. 8b). In M. oryzae, UP, USP, and USP250 were significantly closer to repetitive elements when compared with the genome average (p < 0.001 for all comparisons). In addition, USPs were significantly closer to repetitive elements than UP (p < 0.001). However, there was no significant difference between the USPs and USP250s (p = 0.054). Interestingly, this trend was only observed in M. oryzae. In G. graminis var. tritici and M. poae, only UP showed a significant difference when compared with the total genome (p < 0.001 and p = 0.008, respectively). There was no significant difference observed in the USP or USP250 in G. graminis var. tritici or M. poae when compared with the whole genome average.
High mutation rates and C–G → A–T point mutations are found to be associated with certain retrotransposons in M. oryzae. Therefore, we examined the closest repetitive elements to the USP250 to identify the subtype. We observed that in all three Magnaporthaceae, repetitive elements that were classified as “unknown” by RepeatModeler were most commonly found with small secreted proteins (Fig. 8c). A small proportion of G. graminis var. tritici and M. poae USP250 had no repetitive elements mapped to the same contig and could not be fully analyzed (Fig. 8c, None Found). Of the identified repetitive elements, retrotransposons were most commonly identified as the closest repetitive element to the USPs, with LTR/Gypsy and LINE/Tad1 elements being highly represented in M. oryzae (Fig. 8c). Thus, these data suggest that retrotransposons are the closest repetitive elements to the small secreted proteins in M. oryzae. However, these observations cannot be extrapolated to M. poae or G. graminis var. tritici.
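Finding the closest repetitive element to a gene, together with its class, can be sketched as below. The function name and tuple layouts are hypothetical, and the distance definition (edge-to-edge on the same contig, zero when the intervals overlap) is an assumption, since the text does not state how distance was measured. The length filter mirrors the >200 bp criterion given for the repeat libraries.

```python
def nearest_repeat(gene, repeats, min_length=200):
    """Return (distance_bp, repeat_class) for the closest repeat, or None.

    gene:    (contig, start, end), 1-based inclusive coordinates
    repeats: iterable of (contig, start, end, repeat_class)
    Repeats on other contigs, or not longer than min_length, are ignored.
    None corresponds to the "None Found" case: no repeat on the gene's contig.
    """
    contig, g_start, g_end = gene
    best = None
    for r_contig, r_start, r_end, r_class in repeats:
        if r_contig != contig:
            continue
        if r_end - r_start + 1 <= min_length:
            continue
        # Edge-to-edge gap between the two intervals; 0 if they overlap.
        distance = max(r_start - g_end, g_start - r_end, 0)
        if best is None or distance < best[0]:
            best = (distance, r_class)
    return best

# Hypothetical annotations for illustration only.
repeats = [
    ("contig1", 2500, 2900, "LTR/Gypsy"),
    ("contig1", 100, 400, "Unknown"),
    ("contig2", 1500, 1800, "LINE/Tad1"),
    ("contig1", 2100, 2150, "DNA"),  # too short, filtered out
]
hit = nearest_repeat(("contig1", 1000, 2000), repeats)
```

Tallying the returned classes over all USP250 genes of a species would give the kind of per-class summary described for Fig. 8c.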
The Magnaporthaceae family of fungi is both economically and socially important; understanding the infection process and identifying novel antifungal targets are becoming critical to halt widespread crop and turf grass loss. Here we utilized several analytical approaches to interrogate conserved and unique genes among three species of Magnaporthaceae. Using OrthoMCL, we identified clusters that are highly conserved among 74 fungal species and 1149 clusters that are specific to the Magnaporthaceae (Fig. 1). In addition, we identified which genes are unique to each species and determined putative gene functions (Table 1, Fig. 2, Fig. 3, and Fig. 4). OrthoMCL revealed a core proteome for the Magnaporthaceae of 6518 clusters that contain at least one gene from M. oryzae, G. graminis var. tritici, and M. poae. To our surprise, further analysis of the core proteome using PAML revealed that there is no correlation between PAML score and distance to repetitive elements (Fig. 6a) or degree of mutation to repetitive elements (Fig. 6b), while analysis of clusters that are undergoing diversifying or purifying selection showed no enrichment of secreted proteins nor small secreted proteins (Fig. 9a, b).
GO annotation and InterProScan analysis of the clusters unique to the Magnaporthaceae species showed that proteins with enzymatic function and proteins involved in transcriptional regulation were the most common. However, these categories are also common in the genes that are unique to each species. These data suggest that both categories may contribute to speciation but not enough evolutionary time has passed to separate the genes in the shared clusters into unique genes. Alternatively, there may be some evolutionary pressure to maintain the genes within the 295 shared clusters, such as environmental conditions or host plant conditions. Putative function analysis was performed at the cluster level, thus leaving the potential that analysis at the sequence level will reveal specific conserved regions of the genes within each cluster.
Interestingly, more CAZymes were found to be specific to each species compared with the Magnaporthaceae specific clusters (Fig. 3). These data suggest that CAZyme gene sequences are plastic and may contribute to speciation. Fungi produce a large number of CAZymes and an abundance of proteins with redundant functions may result in the diversity observed in our data. Zhao et al. showed that there were similar ratios of each CAZyme class found in fungi that infect similar hosts, such as monocots or dicots. However, CAZymes may vary based on route of infection rather than the type of host plant. M. oryzae infects the leaf of a plant while G. graminis var. tritici and M. poae infect the root of the plant. G. graminis and M. poae share 12 clusters that contain CAZymes that are not shared with M. oryzae (Fig. 3a). One hypothesis for the abundance of shared CAZymes is that these clusters contain genes needed to infect the root of the host plant. Our CAZyme analysis suggests that glycosyltransferases may be important in the environmental or host-pathogen interactions in M. poae and G. graminis var. tritici, while glycoside hydrolases and carbohydrate binding molecules are most abundant among the M. oryzae unique genes (Fig. 3b). Further analysis of the CAZyme families may reveal specific enzyme targets for each cluster that are important to infection at the root or leaf.
In addition to enzymes, transcription factors were identified as abundant in both the Magnaporthaceae specific clusters and the unique gene groups for each species (Fig. 4). More specifically, the zinc finger and fungal-type zinc finger transcription factors were common in both analyses. These data suggest that adaptation to environmental and host-plant stresses may be dependent on transcriptional regulation in addition to altering protein function through mutation. Preliminary RNAseq data of the three species under several stress conditions, such as heat, cold, and osmotic stress, suggest that relatively few clusters exhibit similar transcriptional regulation (data not shown); however, additional experiments must be performed to confirm these data.
The ratio of purifying and diversifying clusters compared with clusters under neutral selection varied depending on the presence or absence of paralogs (Fig. 5a). In clusters that contained one or more paralogs, there was an increase in the proportions of both diversifying and purifying genes. There are several proposed functions for gene duplication in fungi. First, gene duplication of genes with highly conserved function (purifying genes) may be needed to maintain genes with redundant function. Second, duplication of conserved genes may result in increased protein production. Third, gene duplication of diversifying proteins may be needed to develop a novel function for the gene group. Our data suggest that the gene duplication observed in the core proteome results in both conserved and novel functions. Closer analysis of clusters and their function would be needed to further understand the nature of each gene duplication.
It has been suggested that in M. oryzae, genes encoding effectors are undergoing more rapid evolution than other genes [12–15, 22]. As hypothesized for antagonistic co-evolution between organisms, the zig-zag model of host and pathogen evolution suggests that as the host immune system evolves to recognize certain pathogen effector proteins, then the pathogen must, in turn, evolve to evade the host immune response [11, 45]. The avirulence genes (AVR) in M. oryzae, Leptosphaeria maculans, Leptosphaeria biglobosa, and other phytopathogenic fungi have been shown to have undergone gene duplication, translocation, and RIP mutation [3, 6, 11–24], supporting the idea that these effector proteins are undergoing rapid mutation. Interestingly, the zig-zag model of evolution between host and pathogen is not limited to fungal pathogens or plant hosts, but is also seen in a variety of host-pathogen interactions such as mammalian parasitic pathogens including the malaria-causing Plasmodium falciparum. The merozoite surface protein 1 (MSP1) gene in P. falciparum is highly polymorphic, allowing for evasion of the host antibody response.
Additionally, it has been suggested that proximity to repetitive elements, such as retrotransposons, contributes to rapid diversification [12–22]. More specifically, the M. oryzae LTR retrotransposon, Maggy, has been found to be associated with T:A enriched regions due to RIP. Our data do show that in M. oryzae unique proteins and, more specifically, small unique proteins are closer to repetitive elements, including those classified as LTR (Fig. 8b, c). However, these observations were not seen in either G. graminis var. tritici or in M. poae, suggesting that increased diversification due to repetitive element proximity, and more specifically proximity to Maggy and similar retrotransposons, is not universal to the Magnaporthaceae family of fungi.
It is important to note that the purpose of this study was to compare three genomes of related phytopathogenic fungi at the family level. While our data show no evidence of two-speed genome evolution in the Magnaporthaceae, evidence of small scale evolution, such as diversification observed between strains, may still be found. While we hypothesized that evidence of two-speed genome evolution would be observed among the Magnaporthaceae family, our analyses, which were performed in several different ways (Figs. 6a, b, 8a–c, 9a, b), failed to support the hypothesis.
Our data showed that at the genome level, there is no evidence to suggest multi-speed genome evolution or that proximity to repetitive elements plays a role in diversification of genes. Our core proteome analysis consisted of 6518 clusters containing a total of 22,085 genes from M. oryzae, G. graminis var. tritici, and M. poae. We examined the proximity of genes undergoing diversifying or purifying selection to repetitive elements and determined there was no significant difference between the two groups in any species (Fig. 5b). To confirm these data, PAML scores were graphed against distance to repetitive elements; R² values were near zero (Fig. 6a), and mutation analysis (Fig. 6b) also confirmed no correlation between degree of mutation and proximity to repetitive elements. Because sequence homology is used to cluster orthologs in OrthoMCL, it is possible that more conserved genes were used in our analysis. Thus by comparing orthologs, the data may be skewed towards neutral or purifying clusters. However, by using a low cutoff of 50 % sequence homology implemented in OrthoMCL to cluster orthologs, clustering should include a wider range of diversified genes.
Taken together, our data suggest that there is no evidence for two-speed evolution at the genome level. Additionally, repetitive element proximity has no influence on diversification or purification of orthologous clusters. While it is possible that more rapid evolution occurs on a small scale, such as in a small group or functional class of proteins, these trends cannot be observed at the genome level.
Genome sequences and OrthoMCL
Genome, transcript, and protein sequences for 74 fungal genomes were downloaded from the Fungal Genome Initiative at Broad Institute of Harvard and the Massachusetts Institute of Technology. A comprehensive list of the source files used can be found in Additional file 1. A phylogenetic tree representing the 74 fungal genomes was made using phyloT and can be found in Additional file 2. For OrthoMCL analysis [1, 2] the protein sequences from 74 completed fungal genomes (including M. oryzae, M. poae, and G. graminis var. tritici) were compared using BLASTp (all-vs-all) with a maximum E-value of 1e-5. From the resulting BLASTp hits OrthoMCL identified homologous and paralogous relationships at 50 % similarity. Markov clustering was used to further refine orthologous clusters as described previously. Orthologous clusters can be found in Additional file 3. Three criteria were used to identify genes considered unique to each species: genes that were excluded from OrthoMCL clustering after all-vs-all BLASTp analysis, genes that were not clustered during Markov clustering, and all genes within clusters containing a single species.
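The bookkeeping implied by these criteria can be sketched as follows. This is an illustrative reading, not the authors' script: the species labels, function names, and data layout are hypothetical, genes dropped after BLASTp and genes left unclustered by Markov clustering are both treated simply as genes absent from the final clusters, and "family-specific" is taken to mean clusters with at least two species, all from the family.

```python
FAMILY = {"M_oryzae", "G_graminis", "M_poae"}  # hypothetical species labels

def unique_genes(all_genes, clusters):
    """Genes unique to each species.

    all_genes: dict species -> set of gene ids in the proteome
    clusters:  dict cluster id -> list of (species, gene id)
    A gene is unique if it is absent from every cluster or sits in a
    cluster whose members all come from one species.
    """
    clustered = set()
    unique = {sp: set() for sp in all_genes}
    for members in clusters.values():
        species = {sp for sp, _ in members}
        for sp, gene in members:
            clustered.add((sp, gene))
            if len(species) == 1 and sp in unique:
                unique[sp].add(gene)
    for sp, genes in all_genes.items():
        unique[sp].update(g for g in genes if (sp, g) not in clustered)
    return unique

def family_specific(clusters, family=FAMILY):
    """Clusters with two or more species, all belonging to the family."""
    result = []
    for cid, members in clusters.items():
        species = {sp for sp, _ in members}
        if len(species) >= 2 and species <= family:
            result.append(cid)
    return sorted(result)

def core_proteome(clusters, family=FAMILY):
    """Clusters with at least one gene from every family member."""
    return sorted(cid for cid, members in clusters.items()
                  if family <= {sp for sp, _ in members})
```

Under this reading, single-species clusters count toward the unique genes rather than toward the family-specific clusters, and a core proteome cluster may also contain genes from fungi outside the family.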
Gene and cluster functions
Putative cluster functions were identified using the Blast2GO suite of software, including BLASTn, InterPro protein domain identification, and Gene Ontology annotation with Aspergillus slim. InterProScan v5.14 software was used to determine the functions of unique genes. Functional domains from protein sequence files were identified using PROSITE, HAMAP, Pfam, PRINTS, ProDom, SMART, TIGRFAMs, PIRSF, SUPERFAMILY, CATH-Gene3D, and PANTHER protein databases through Blast2GO and InterProScan. Gene Ontology (GO) terms were identified using InterProScan [27, 28].
CAZyme identification and classification
OrthoMCL clusters that were specific to the Magnaporthaceae were searched for carbohydrate-active enzymes (CAZymes). Fungal-specific CAZymes were identified in the Magnaporthaceae protein sequences using Hmmscan v3.1b2 and the dbCAN v4.0 database. Output files were parsed using the Perl parser script included in the dbCAN database.
Transcription factor identification and classification
Conserved transcription factors were identified using InterProScan v5.0 domain identification. Functional domains predicted by InterProScan analysis were used to identify putative transcription factors. Custom Python v3.4 scripts were used to parse and count putative transcription factors. InterProScan output for genes annotated as putative transcription factors was manually inspected to ensure that all transcription factors were identified and no extraneous genes were included.
Phylogenetic analysis by maximum likelihood
OrthoMCL clusters that contained at least one gene from each Magnaporthaceae species were parsed, and transcripts for genes within each cluster were retrieved from the Broad Institute transcript files using custom Python scripts. The paired sequence files were aligned using command line MUSCLE v3.8.31, iterating the alignments until reaching convergence. Phylogenetic trees were simultaneously generated from the second iteration. Alignment columns with more than 65 % gap characters were removed using a custom Python script. Three clusters (moggtmp1004, moggtmp1005, and moggtmp1315) could not be aligned and were not analyzed further. In order to estimate the nonsynonymous to synonymous (dN/dS) substitution rates, the CODEML program, part of PAML v4.8, was run using BioPython v1.65. Likelihood ratio tests (LRTs) of site-specific selection were used, comparing M1 (neutral) to M2 (selection) and M7 (beta) to M8 (beta & ω) using the test statistic 2*(lnL1–lnL2) = 2∆L. A cluster was considered to be undergoing positive selection if both the M1/M2 and M7/M8 LRTs were significant under a chi-square test with p < 0.05.
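The likelihood ratio test described above can be sketched in a few lines of Python. This is an illustration rather than the scripts used here: the function names and log-likelihood values are hypothetical, and the sketch assumes 2 degrees of freedom for both comparisons (the value conventionally used for M1 vs M2 and M7 vs M8, but not stated in the text). For 2 degrees of freedom the chi-square upper-tail probability has the closed form exp(-x/2), so no statistics library is needed.

```python
import math


def lrt_pvalue(lnl_null, lnl_alt):
    """Return (2*deltaL, p-value) for a nested-model likelihood ratio test.

    Assumes the chi-square reference distribution has 2 degrees of freedom,
    for which the survival function is exactly exp(-x / 2).
    """
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)  # guard against tiny negative values
    return stat, math.exp(-stat / 2.0)


def under_positive_selection(lnl_m1, lnl_m2, lnl_m7, lnl_m8, alpha=0.05):
    """True only if both the M1/M2 and the M7/M8 tests are significant."""
    _, p_m1_m2 = lrt_pvalue(lnl_m1, lnl_m2)
    _, p_m7_m8 = lrt_pvalue(lnl_m7, lnl_m8)
    return p_m1_m2 < alpha and p_m7_m8 < alpha


# Hypothetical log-likelihoods for one cluster.
stat, p = lrt_pvalue(-1000.0, -995.0)
selected = under_positive_selection(-1000.0, -995.0, -1002.0, -996.0)
not_selected = under_positive_selection(-1000.0, -999.5, -1002.0, -996.0)
```

Requiring both tests to pass, as in the text, is more conservative than accepting either comparison alone.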
Repetitive elements identification and classification
Repetitive elements were identified as previously described. Briefly, repetitive element analysis was performed using the RepeatModeler and RepeatMasker programs. De novo repetitive element libraries were created with the RMBlast NCBI search engine within RepeatModeler. Similar repetitive element sequences were collapsed into their parent family and classified within RepeatModeler. Final classified consensus files for M. poae and G. graminis var. tritici were used as libraries for repetitive element searches with RepeatMasker. Repetitive sequences larger than 200 bp were considered for further analysis. Custom Perl scripts were used to determine the distance to the right-flanking and left-flanking repetitive element for each gene in the genomes of each of the three Magnaporthaceae. Box plots were graphed and Mann-Whitney rank sum tests were performed using SigmaPlot v12.5.
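The flanking-distance calculation can be illustrated as follows. The original analysis used custom Perl scripts that are not reproduced here; this Python sketch is a stand-in under stated assumptions: the function name and coordinates are hypothetical, intervals are half-open (start, end) pairs on a single contig, and repeats that overlap the gene are ignored.

```python
MIN_REPEAT_LEN = 200  # repeats must be larger than 200 bp, as in the text


def flanking_distances(gene, repeats):
    """Return (left_distance, right_distance) from a gene to its nearest repeats.

    gene: (start, end) on one contig.
    repeats: list of (start, end) intervals on the same contig.
    A distance is None when no qualifying repeat lies on that side.
    """
    g_start, g_end = gene
    kept = [(s, e) for s, e in repeats if e - s > MIN_REPEAT_LEN]

    # Nearest repeat ending at or before the gene start (left flank).
    left_end = max((e for _, e in kept if e <= g_start), default=None)
    # Nearest repeat starting at or after the gene end (right flank).
    right_start = min((s for s, _ in kept if s >= g_end), default=None)

    left = g_start - left_end if left_end is not None else None
    right = right_start - g_end if right_start is not None else None
    return left, right


# Hypothetical coordinates: two of the four repeats are too short to count.
toy_repeats = [(100, 400), (600, 700), (2100, 2150), (2500, 3000)]
both_sides = flanking_distances((1000, 2000), toy_repeats)
left_only = flanking_distances((1000, 2000), [(100, 400)])
```

Genes at contig ends may have no repeat on one side; returning None for that side keeps such cases distinguishable from a true distance of zero.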
Mutational analysis was performed by predicting a majority consensus sequence for each alignment, using the seqinr v3.1-1 package in R. The consensus was built from the majority character at each site, and the pairwise identity distances between the consensus and each transcript sequence in the alignment were calculated. The degree of mutation was calculated as the square root of the identity distance between the consensus and the sequence.
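A rough Python analogue of this calculation is sketched below. The study used the seqinr package in R, so this is not the original code. It assumes the identity-based convention documented for seqinr's dist.alignment, in which the reported value is the square root of (1 - identity), and it treats gap characters like any other character, which may differ from seqinr's handling. The function names and toy alignment are hypothetical.

```python
import math
from collections import Counter


def majority_consensus(aligned_seqs):
    """Majority character at each alignment column.

    Ties are broken by first occurrence within the column.
    """
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned_seqs)
    )


def degree_of_mutation(seq, consensus):
    """sqrt(1 - identity) between one aligned sequence and the consensus."""
    matches = sum(a == b for a, b in zip(seq, consensus))
    identity = matches / len(consensus)
    return math.sqrt(1.0 - identity)


# Hypothetical toy alignment of four equal-length sequences.
toy_alignment = ["ATGC", "ATGA", "ATGC", "TTGC"]
toy_consensus = majority_consensus(toy_alignment)
toy_degrees = [degree_of_mutation(s, toy_consensus) for s in toy_alignment]
```

A sequence identical to the consensus scores 0, and the score grows with the fraction of sites that differ from the majority character.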
Secreted protein identification and analysis
In order to identify secreted proteins, a two-step process was used: first, protein sequences that contained a signal sequence were identified; then, the subcellular localization of each was determined. Sequences that contained a signal sequence and were also identified as being targeted to the secretory pathway were considered secreted proteins. To identify proteins containing signal sequences, whole-genome protein sequence files were analyzed using SignalP v4.1. Those protein sequences that were identified as having a signal sequence by SignalP were then analyzed by TargetP v1.1. VassarStats was used to determine Z-scores and p-values for proportions. Genes from clusters identified as undergoing purifying or diversifying selection by PAML analysis were analyzed for secreted proteins using SignalP and TargetP. Protein lengths for identified secreted proteins were graphed as box plots and Mann-Whitney rank sum tests were performed with SigmaPlot v12.5.
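The two-step filter and the comparison of proportions can be sketched in Python. This does not reproduce SignalP, TargetP, or VassarStats: the inputs stand in for already-parsed tool output, the label "S" for the secretory pathway and all identifiers and counts are assumptions made for the example, and the test shown is a standard pooled two-proportion z-test, which may differ in detail from the VassarStats calculation.

```python
import math


def secreted_proteins(signal_positive, localization):
    """Proteins with a predicted signal sequence AND a secretory-pathway call.

    signal_positive: iterable of protein IDs flagged by the signal-peptide step.
    localization: dict mapping protein ID to a predicted location label,
    where "S" is assumed here to mean the secretory pathway.
    """
    return sorted(p for p in signal_positive if localization.get(p) == "S")


def two_proportion_z(k1, n1, k2, n2):
    """Pooled two-proportion z-test; returns (z, two-tailed p-value)."""
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (k1 / n1 - k2 / n2) / se
    return z, math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical inputs.
toy_secreted = secreted_proteins(
    ["p1", "p2", "p3"], {"p1": "S", "p2": "M", "p4": "S"}
)
z_diff, p_diff = two_proportion_z(50, 100, 30, 100)
z_same, p_same = two_proportion_z(30, 100, 30, 100)
```

Applying the localization step only to signal-positive proteins mirrors the order described in the text and avoids counting proteins whose signal sequence targets another compartment.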
Availability of supporting data and materials
The authors would like to thank Yeonyee Oh, William Sharpee, and Mengying Wang for their helpful feedback and discussions. LHO was supported by the Tri-institutional Molecular Mycology and Pathogenesis training grant (NIH 5 T32-AI052080-09). The College of Agriculture and Life Sciences at North Carolina State University, Raleigh, NC provided support for this project and for JKS, AWE, and RAD.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- OrthoMCL. [http://www.orthomcl.org/].
- Li L, Stoeckert CJ, Roos DS. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 2003;13:2178–89.
- Besi MI, Tucker SL, Sesma A. Magnaporthe and its relatives. eLS. 2001;58:83–93.
- Zeigler RS, Leong SA, Teng PS. Rice blast disease. Wallingford: CAB International; 1994.
- McBeath JH, McBeath J. Plant diseases, pests and food security. In Environmental Change and Food Security in China. 2010;35:117–56.
- Couch BC, Kohn LM. A multilocus gene genealogy concordant with host preference indicates segregation of a new species, Magnaporthe oryzae, from M. grisea. Mycologia. 2002;94:683–93.
- Freeman J, Ward E. Gaeumannomyces graminis, the take-all fungus and its relatives. Mol Plant Pathol. 2004;5:235–52.
- Cantarel BL, Coutinho PM, Rancurel C, Bernard T, Lombard V, Henrissat B. The carbohydrate-active EnZymes database (CAZy): an expert resource for glycogenomics. Nucleic Acids Res. 2009;37(Database issue):D233–8.
- Zhao Z, Liu H, Wang C, Xu J. Correction: comparative analysis of fungal genomes reveals different plant cell wall degrading capacity in fungi. BMC Genomics. 2014;15:6.
- Petre B, Kamoun S. How do filamentous pathogens deliver effector proteins into plant cells? PLoS Biol. 2014;12:e1001801.
- Jones JDG, Dangl JL. The plant immune system. Nature. 2006;444:323–9.
- Khang CH, Park S-Y, Lee Y-H, Valent B, Kang S. Genome organization and evolution of the AVR-pita avirulence gene family in the Magnaporthe grisea species complex. Mol Plant Microbe Interact. 2008;21:658–70.
- Howlett BJ, Lowe RGT, Marcroft SJ, Van de Wouw AP. Evolution of virulence in fungal plant pathogens: exploiting fungal genomics to control plant disease. Mycologia. 2015;107:441–51.
- Grandaubert J, Lowe RGT, Soyer JL, Schoch CL, Van de Wouw AP, Fudal I, Robbertse B, Lapalu N, Links MG, Ollivier B, Linglin J, Barbe V, Mangenot S, Cruaud C, Borhan H, Howlett BJ, Balesdent M-H, Rouxel T. Transposable element-assisted evolution and adaptation to host plant within the Leptosphaeria maculans-Leptosphaeria biglobosa species complex of fungal pathogens. BMC Genomics. 2014;15:891.
- Li G, Zhou X, Xu JR. Genetic control of infection-related development in Magnaporthe oryzae. Curr Opin Microbiol. 2012;15:678–84.
- Wöstemeyer J, Kreibich A. Repetitive DNA elements in fungi (mycota): impact on genomic architecture and evolution. Curr Genet. 2002;41:189–98.
- Stukenbrock EH. Evolution, selection and isolation: a genomic view of speciation in fungal plant pathogens. New Phytol. 2013;199:895–907.
- Taniguti LM, Schaker PDC, Benevenuto J, Peters LP, Carvalho G, Palhares A, Quecine MC, Nunes FRS, Kmit MCP, Wai A, Hausner G, Aitken KS, Berkman PJ, Fraser JA, Moolhuijzen PM, Coutinho LL, Creste S, Vieira MLC, Kitajima JP, Monteiro-Vitorello CB. Complete genome sequence of Sporisorium scitamineum and biotrophic interaction transcriptome with sugarcane. PLoS One. 2015;10:e0129318.
- Dhillon B, Gill N, Hamelin RC, Goodwin SB. The landscape of transposable elements in the finished genome of the fungal wheat pathogen Mycosphaerella graminicola. BMC Genomics. 2014;15(1):1132.
- Santana MF, Silva JC, Mizubuti ES, Araújo EF, Condon BJ, Turgeon BG, Queiroz MV. Characterization and potential evolutionary impact of transposable elements in the genome of Cochliobolus heterostrophus. BMC Genomics. 2014;15:536.
- Ikeda KI, Nakayashiki H, Kataoka T, Tamba H, Hashimoto Y, Tosa Y, Mayama S. Repeat-induced point mutation (RIP) in Magnaporthe grisea: implications for its sexual cycle in the natural field context. Mol Microbiol. 2002;45:1355–64.
- Chuma I, Isobe C, Hotta Y, Ibaragi K, Futamata N, Kusaba M, Yoshida K, Terauchi R, Fujita Y, Nakayashiki H, Valent B, Tosa Y. Multiple translocation of the AVR-pita effector gene among chromosomes of the rice blast fungus Magnaporthe oryzae and related species. PLoS Pathog. 2011;7:e1002147.
- Pendleton AL, Smith KE, Feau N, Martin FM, Grigoriev IV, Hamelin R, Nelson CD, Burleigh JG, Davis JM. Duplications and losses in gene families of rust pathogens highlight putative effectors. Front Plant Sci. 2014;5(June):299.
- Sesma A, Osbourn AE. The rice leaf blast pathogen undergoes developmental processes typical of root-infecting fungi. Nature. 2004;431:582–6.
- Couch BC, Fudal I, Lebrun MH, Tharreau D, Valent B, Van Kim P, Nottéghem JL, Kohn LM. Origins of host-specific populations of the blast pathogen Magnaporthe oryzae in crop domestication with subsequent expansion of pandemic clones on rice and weeds of rice. Genetics. 2005;170:613–30.
- Broad Institute of Harvard and MIT. [http://www.broadinstitute.org/].
- Gene Ontology Consortium. [http://geneontology.org/].
- InterProScan. [http://www.ebi.ac.uk/Tools/pfa/iprscan5/].
- HMMscan. [http://www.ebi.ac.uk/Tools/hmmer/search/hmmscan].
- dbCAN CAzyme Database. [http://csbl.bmb.uga.edu/dbCAN/].
- PAML v4.8. [http://abacus.gene.ucl.ac.uk/software/paml.html].
- TargetP v1.1. [http://www.cbs.dtu.dk/services/TargetP/].
- SignalP v4.1. [http://www.cbs.dtu.dk/services/SignalP/].
- Broad Institute of Harvard and MIT. [https://www.broadinstitute.org/scientific-community/science/projects/fungal-genome-initiative/fungal-genomics].
- Luo J, Zhang N. Magnaporthiopsis, a new genus in Magnaporthaceae (Ascomycota). Mycologia. 2013;105:1019–29.
- Okagaki LH, Nunes CC, Sailsbery J, Clay B, Brown D, John T, Oh Y, Young N, Fitzgerald M, Haas BJ, Zeng Q, Young S, Adiconis X, Fan L, Levin JZ, Mitchell TK, Okubara PA, Farman ML, Kohn LM, Birren B, Ma L-J, Dean RA. Genome sequences of three phytopathogenic species of the Magnaporthaceae family of fungi. G3 Genes|Genomes|Genetics. 2015;g3(115):020057.
- Blast2GO. [https://www.blast2go.com/].
- Fabro G, Steinbrenner J, Coates M, Ishaque N, Baxter L, Studholme DJ, Körner E, Allen RL, Piquerez SJ, Rougon-Cardoso A, Greenshields D, Lei R, Badel JL, Caillaud MC, Sohn KH, Van den Ackerveken G, Parker JE, Beynon J, Jones JD. PLoS Pathog. 2011;7(11):e1002348. doi:10.1371/journal.ppat.1002348.
- Gout L, Kuhn ML, Vincenot L, Bernard-Samain S, Cattolico L, Barbetti M, Moreno-Rico O, Balesdent MH, Rouxel T. Genome structure impacts molecular evolution at the AvrLm1 avirulence locus of the plant pathogen Leptosphaeria maculans. Environ Microbiol. 2007;9:2978–92.
- Aguileta G, Lengelle J, Chiapello H, Giraud T, Viaud M, Fournier E, Rodolphe F, Marthey S, Ducasse A, Gendrault A, Poulain J, Wincker P, Gout L. Genes under positive selection in a model plant pathogenic fungus, Botrytis. Infect Genet Evol. 2012;12:987–96.
- Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007;24:1586–91.
- RepeatMasker. [http://www.repeatmasker.org/].
- Liu W, Liu J, Ning Y, Ding B, Wang X, Wang Z, Wang G-L. Recent progress in understanding PAMP- and effector-triggered immunity against the rice blast fungus Magnaporthe oryzae. Mol Plant. 2013;6:605–20.
- Emanuelsson O, Brunak S, von Heijne G, Nielsen H. Locating proteins in the cell using TargetP, SignalP and related tools. Nat Protoc. 2007;2:953–71.
- Paterson S, Vogwill T, Buckling A, Benmayor R, Spiers AJ, Thomson NR, Quail M, Smith F, Walker D, Libberton B, Fenton A, Hall N, Brockhurst MA. Antagonistic coevolution accelerates molecular evolution. Nature. 2010;464:275–8.
- Pearce JA, Triglia T, Hodder AN, Jackson DC, Cowman AF, Anders RF. Plasmodium falciparum merozoite surface protein 6 is a dimorphic antigen. Infect Immun. 2004;72:2321–8.
- phylo T. [http://phylot.biobyte.de/].
- Python v3.4. [https://www.python.org/].
- MUSCLE v3.8.31. [http://www.drive5.com/muscle/].
- BioPython v1.65. [http://biopython.org/].
- SigmaPlot v12.5. [http://www.sigmaplot.com/].
- VassarStats. [http://vassarstats.net/index.html].