Strain, growth condition, and nucleic acid isolation
M. phaseolina strain MS6 was isolated from an infected jute plant at Bangladesh Jute Research Institute (BJRI), Dhaka. Strain MS6 is the most virulent of the 19 isolates obtained at BJRI so far. Its mycelium is grey-white at the initial stage and turns dark green at maturity. The mycelium is coarse with feathery strands, and the sclerotia are embedded in the strands. The fungus was cultured at 30°C in liquid potato dextrose medium. DNA and RNA were extracted as previously described [57, 58], respectively.
Genome sequencing and assembly
Whole-genome shotgun sequencing of the M. phaseolina MS6 strain was performed using the 454 and Illumina sequencing platforms. We generated a total of 6.92 Gb of raw data, comprising 40.56 million raw reads. In the 454 sequencing strategy, both single-end (SE) and paired-end (PE) genomic libraries were constructed, of which 2.38 Gb of shotgun sequences provided 48.29x coverage and 1.57 Gb of 8 kb, 15 kb, and 20 kb PE sequences provided 31.95x coverage of the M. phaseolina genome. In the Illumina sequencing strategy, 2.97 Gb of PE libraries were generated, of which 1.73 Gb of 500 bp PE sequences provided 35.11x coverage and 1.24 Gb of 3 kb mate-paired sequences provided 25.14x coverage of the M. phaseolina genome.
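The reported yields and coverage values can be cross-checked with the relation coverage = yield / genome size. The following is a minimal Python sketch, not part of the original analysis; the dictionary labels and function name are illustrative, and only the numbers come from the text above.

```python
# Library yields (Gb) and fold coverage as reported in the text above.
# The labels are illustrative names, not identifiers from the study.
libraries = {
    "454 shotgun": (2.38, 48.29),
    "454 paired-end (8/15/20 kb)": (1.57, 31.95),
    "Illumina 500 bp paired-end": (1.73, 35.11),
    "Illumina 3 kb mate-pair": (1.24, 25.14),
}


def implied_genome_size_mb(yield_gb, coverage):
    """Genome size in Mb implied by coverage = yield / genome size."""
    return yield_gb * 1000.0 / coverage


total_gb = sum(yield_gb for yield_gb, _ in libraries.values())
implied_sizes = {
    name: implied_genome_size_mb(yield_gb, coverage)
    for name, (yield_gb, coverage) in libraries.items()
}

for name, size in implied_sizes.items():
    print(f"{name}: implied genome size {size:.1f} Mb")
print(f"total yield: {total_gb:.2f} Gb")
```

All four libraries imply a genome size between 49 and 50 Mb, and the yields sum to the stated 6.92 Gb.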
We produced a high-quality assembly of the M. phaseolina genome using the Newbler assembly program version 2.5.3 (http://my454.com/products/analysis-software/index.asp). For both de novo assembly and reference mapping of the M. phaseolina genome, we used only raw data generated from 454 pyrosequencing. For de novo assembly, we supplied the Newbler GS de novo assembler first with the shotgun sequences in a single step and then with the paired-end sequences incrementally to improve contigging and scaffolding. About 96.50% of the raw reads were assembled into 3,036 contigs and 94 scaffolds, with 98.92% of bases at quality Q40 or higher. For reference mapping, we used the Newbler GS reference mapper to map the raw sequence files onto the file of all contigs generated by the assembly; 98.89% of reads and 99.11% of bases mapped to the reference.
We also used Illumina PE sequences with GapCloser version 1.10, a tool from SOAPdenovo (http://soap.genomics.org.cn/soapdenovo.html), to close the gaps inside the scaffolds generated by the Newbler scaffolding process. GapCloser detected a total of 785 gaps covering 1.5 Mb, of which 197 gaps were completely filled, leaving gaps amounting to only 1.47% (0.73 Mb) of the 94 scaffolds.
We checked the relative completeness of the M. phaseolina MS6 draft assembly version 1.0 by performing core gene annotation using the CEGMA pipeline. The resulting contigs and scaffolds from the M. phaseolina MS6 assembly were analyzed independently through this pipeline. In both cases, complete gene models for 245 (98.79%) of the 248 ultra-conserved core eukaryotic genes (CEGs) were found in the M. phaseolina genome.
This Whole Genome Shotgun project has been deposited at GenBank under the accession AHHD00000000. The version described in this paper is the first version, AHHD01000000.
We used the Program to Assemble Spliced Alignments (PASA), a eukaryotic genome annotation pipeline, to generate potential training gene sets, which were used to train the ab initio gene prediction programs Augustus v. 2.5.5 and GlimmerHMM v. 3.0.1 for predicting M. phaseolina genes.
A total of 13,481 gene assemblies in 11,414 gene clusters predicted by PASA, along with cDNA of M. phaseolina and four other closely related species (Aspergillus nidulans, M. grisea, P. marneffei, and S. cerevisiae), were used for training. Augustus and Glimmer independently predicted 12,231 and 11,432 ORFs, respectively, which were then subjected to gene structure correction by EVidenceModeler (EVM). EVM, when combined with PASA, yields a comprehensive, configurable annotation system that predicts protein-coding genes and alternatively spliced isoforms.
We also used the Analysis and Annotation Tool (AAT) as a pipeline for transcript and protein alignments. To align the M. phaseolina genome independently, we used its own cDNA together with cDNA of the four other fungal species described above and fungal protein databases downloaded from Fungal Genome Research (http://fungalgenome.org/data/PEP/). These two alignment sets, along with the PASA transcript alignments, were used as transcript and protein evidence for EVM. Finally, a total of 14,249 genes were predicted, of which 11,975 were corrected gene structures from EVM and 2,274 were Augustus and Glimmer predictions located in the intergenic regions of the EVM predictions.
The predicted genes were then subjected to InterProScan and BLAST searches against the nr database (E-value threshold < 10^-5) to identify functional domains and homologies. The results revealed a total of 10,250 genes with either potential domains or homologies with other fungal proteins and 3,999 novel genes among the 14,249 predicted protein-coding genes of M. phaseolina.
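As an illustration of the E-value filter, the sketch below collects query genes with at least one hit below 10^-5 from BLAST tabular output. It assumes the standard 12-column tabular format (E-value in column 11); the gene and subject identifiers are hypothetical, and this is not the script used in the study.

```python
import csv
import io

EVALUE_CUTOFF = 1e-5  # threshold stated in the text


def genes_with_hits(tabular_text, cutoff=EVALUE_CUTOFF):
    """Return query IDs that have at least one hit with E-value < cutoff.

    Assumes standard 12-column BLAST tabular output:
    qseqid sseqid pident length mismatch gapopen qstart qend
    sstart send evalue bitscore
    """
    hits = set()
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if len(row) < 12:
            continue  # skip blank or malformed lines
        if float(row[10]) < cutoff:
            hits.add(row[0])
    return hits


# Hypothetical example rows; the IDs are invented for illustration.
example = "\n".join([
    "gene_0001\tsubj_A\t85.0\t200\t30\t0\t1\t200\t5\t204\t1e-50\t300",
    "gene_0002\tsubj_B\t30.0\t50\t35\t0\t1\t50\t10\t59\t0.5\t25",
    "gene_0001\tsubj_C\t40.0\t80\t48\t0\t1\t80\t3\t82\t2e-3\t40",
])

annotated = genes_with_hits(example)
print(sorted(annotated))
```

Genes absent from the returned set and lacking any InterProScan domain would fall into the "novel" category described above.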
Transfer RNA-coding regions were searched using tRNAscan-SE, and rRNA genes were searched using RNAmmer. Repetitive elements were predicted using RepeatMasker (http://www.repeatmasker.org/), and putative transposable elements were identified with Transposon-PSI (http://transposonpsi.sourceforge.net), a program that performs tBLASTn searches using a set of position-specific scoring matrices (PSSMs) specific for different TE families.
The genomes of M. phaseolina and F. oxysporum were compared using MUMmer tools to identify regions of synteny, with Aspergillus fumigatus used as a reference genome. SyntenyMiner (http://syntenyminer.sourceforge.net/) was also used to visualize orthologous gene clusters among the organisms.
To identify proteins involved in carbohydrate metabolism, we used the Carbohydrate Active Enzymes (CAZy) database (http://www.cazy.org/). All CAZy-related GenBank accession numbers were first downloaded from the CAZy website, and the corresponding sequences were then downloaded from NCBI using a custom Python script. These sequences were searched by RPS-BLAST against the Pfam database to reveal protein domain architectures and compared against the Pfam domains identified in M. phaseolina proteins. The sequences were also compared by BLASTp against all M. phaseolina proteins to confirm the Pfam database matches.
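The custom script itself is not described. One plausible approach, sketched below under stated assumptions, is to request FASTA records in batches from the NCBI E-utilities efetch endpoint. The accessions, the batch size, and the choice of the protein database are assumptions for illustration; the sketch only builds the request URLs and does not contact NCBI.

```python
from urllib.parse import urlencode

# NCBI E-utilities efetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def efetch_fasta_url(accessions):
    """Build an efetch URL returning FASTA text for the given accessions.

    db="protein" is an assumption; the source does not state the database.
    """
    params = {
        "db": "protein",
        "id": ",".join(accessions),
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH_URL + "?" + urlencode(params, safe=",")


# Hypothetical accession numbers, for illustration only.
accessions = ["AAA00001.1", "AAA00002.1", "AAA00003.1"]
urls = [efetch_fasta_url(batch) for batch in batches(accessions, 2)]
for url in urls:
    print(url)
```

Each URL could then be retrieved with `urllib.request.urlopen`, pausing between requests to respect NCBI usage limits.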
Putative secondary metabolite genes (PKS and NRPS) were identified using antiSMASH. Pathogenicity- and virulence-associated genes were identified using the PHI-base database (http://www.phibase.org/), which catalogs experimentally verified pathogenicity, virulence, and effector genes from fungal, oomycete, and bacterial pathogens that infect animal, plant, fungal, and insect hosts. Briefly, all sequences were first downloaded from PHI-base and then compared by BLASTp against all M. phaseolina proteins to confirm the presence of homologous genes in the M. phaseolina genome.
The in silico predictions were manually curated, and predictions for several larger gene families, such as CAZymes and lignin-degrading protein-coding genes, were tested experimentally by PCR.
Construction of phylogenetic tree
Orthologous relationships were determined for a 14-way clustering from the complete genomes of 14 fungal taxa: Aspergillus nidulans, Fusarium oxysporum, Penicillium chrysogenum, Grosmannia clavigera, Magnaporthe grisea, Podospora anserina, M. phaseolina, Botryosphaeria dothidea, Laccaria bicolor, Phanerochaete chrysosporium, Postia placenta, Yarrowia lipolytica, Saccharomyces cerevisiae, and Trichoderma reesei. All predicted protein sequences from these genomes were searched against each other using BLASTp and clustered into orthologous groups using MCL-10-201. Single-copy orthologs were identified as the clusters with exactly one member per species. Phylogenetic relationships were determined from these single-copy orthologs, which were aligned with MAFFT. Alignments were pruned with Gblocks.
The evolutionary history was inferred using the Maximum Likelihood method based on the JTT matrix-based model with 1,000 bootstrap replicates. Evolutionary analyses were conducted in MEGA5.
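The single-copy criterion (exactly one member per species) can be illustrated with a short Python sketch. The cluster representation as lists of (species, protein) pairs and all identifiers are assumptions for illustration, not the format produced by MCL.

```python
from collections import Counter


def single_copy_clusters(clusters, species):
    """Keep clusters containing exactly one protein from every species.

    clusters: list of clusters, each a list of (species, protein_id) pairs.
    species:  iterable of all species names that must be represented.
    """
    wanted = set(species)
    kept = []
    for members in clusters:
        counts = Counter(sp for sp, _ in members)
        if set(counts) == wanted and all(n == 1 for n in counts.values()):
            kept.append(members)
    return kept


# Toy example with three hypothetical species instead of the 14 taxa.
species = ["sp_A", "sp_B", "sp_C"]
clusters = [
    [("sp_A", "a1"), ("sp_B", "b1"), ("sp_C", "c1")],                # single copy
    [("sp_A", "a2"), ("sp_A", "a3"), ("sp_B", "b2"), ("sp_C", "c2")],  # duplicated in sp_A
    [("sp_A", "a4"), ("sp_B", "b3")],                                # sp_C missing
]

selected = single_copy_clusters(clusters, species)
print(len(selected), "single-copy cluster(s)")
```

Only the first toy cluster passes, since the second contains a duplication and the third lacks one species.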
Metabolic pathway reconstruction
The Yeast-5 Pathway Studio database (Ariadne Genomics) contains molecular interactions extracted by MedScan natural language processing technology for all fungal species from over 1,000,000 PubMed abstracts annotated with the Medical Subject Headings (MeSH) term “Yeast OR Fungi” and from more than 100,000 full-length open access articles. Proteins in the Yeast-5 database are annotated with Entrez Gene and GenBank identifiers from six fungal genomes: S. cerevisiae, Schizosaccharomyces pombe, Cryptococcus neoformans var. neoformans JEC21, A. fumigatus Af293, A. nidulans FGSC A4, and Aspergillus niger CBS 513.88. To facilitate analysis of the M. phaseolina genome, we added to the Yeast-5 database annotated proteins from the recently sequenced genomes of Ustilago maydis 52, Metarhizium anisopliae ARSEF 23, and Metarhizium acridum CQMa 102.
The Yeast-5 database contains a collection of 303 metabolic pathways copied from MetaCyc. Pathways are represented as a collection of complex database entities called functional classes (enzymes) and a set of corresponding chemical reactions. Every functional class in the database can contain an unlimited number of protein members performing the corresponding enzymatic activity. Usually a set of members includes paralogs of the catalytic and regulatory subunits necessary to perform the enzymatic activity. The original Yeast-5 database has 444 functional classes with members out of a total of 769 pathways. We augmented the Yeast-5 database with an additional 168 metabolic pathways from RiceCyc and PoplarCyc to increase the pool of candidate enzymatic reactions for metabolic reconstruction of M. phaseolina.
The database of M. phaseolina interologs (predicted interactions) and reconstructed pathways was created by annotating proteins in the Yeast-5 database with M. phaseolina ortholog identifiers. Orthologs for M. phaseolina proteins in other fungal organisms were calculated using the best reciprocal hit method from full-length protein sequence similarities calculated from BLAST alignments, as described previously. First, orthologs were calculated between M. phaseolina and each of the nine fungal genomes supported in the Yeast-5 database. The best ortholog was then chosen for each M. phaseolina protein among the nine possible ortholog pairs. All interactions extracted for M. phaseolina orthologs were exported from the Yeast-5 database along with pathways containing M. phaseolina orthologs. M. phaseolina interologs and predicted pathways were imported into a new Pathway Studio database for manual pathway reconstruction and genome analysis. Pathways that contained at least one functional class with no M. phaseolina orthologs were manually curated to achieve one of the following three outcomes: a) close the gap by finding members in the M. phaseolina genome and adding them to the empty functional classes, b) dismiss the entire pathway if the gap cannot be closed, or c) remove the enzymatic step if the empty functional class represents a redundant path in the pathway.
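The best reciprocal hit criterion can be sketched as follows. This is a generic illustration, not the implementation used with Pathway Studio: hits are assumed to be (query, subject, score) tuples with higher scores being better, and all protein identifiers are hypothetical.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query, subject, score) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return {(a, b) for a, b in forward.items() if reverse.get(b) == a}


# Hypothetical hits between genome A (mp_*) and genome B (sc_*).
a_vs_b = [("mp_1", "sc_1", 500.0), ("mp_1", "sc_2", 120.0),
          ("mp_2", "sc_2", 300.0), ("mp_3", "sc_1", 200.0)]
b_vs_a = [("sc_1", "mp_1", 495.0), ("sc_1", "mp_3", 190.0),
          ("sc_2", "mp_2", 310.0)]

orthologs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(sorted(orthologs))
```

Here mp_3 is excluded because its best hit, sc_1, has mp_1 as its own best hit. Repeating this for each of the nine genomes and keeping the top-scoring pair per M. phaseolina protein would mirror the selection step described above.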
To identify paralog families in the M. phaseolina genome, we used BLASTP to calculate all possible protein homologs in the M. phaseolina genome and then selected only homologs that have 30% shared amino acid similarity calculated as the average sequence similarity between two homologs. Paralog pairs were imported into the M. phaseolina database as a new type of interaction called “Paralog”. Protein functional families were identified as clusters in the global Paralog network using the direct force layout algorithm. To assign biological function to each Paralog cluster we found Gene Ontology groups enriched by the proteins in the cluster or simply inspected available functional annotation for proteins in the cluster.
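A simplified sketch of the paralog grouping is given below. It makes two assumptions not confirmed by the text: the 30% figure is treated as a minimum on the average of the two directional similarities, and families are formed as connected components of the paralog network (a plain substitute for the layout-based clustering performed in Pathway Studio). All protein identifiers are hypothetical.

```python
def paralog_families(similarities, min_similarity=30.0):
    """Group proteins into families from pairwise similarities.

    similarities: dict mapping (a, b) to the percent similarity of a vs b;
                  both directions must be present for a pair to count.
    Pairs whose average similarity is >= min_similarity are linked, and
    connected components of the resulting network are returned.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    seen = set()
    for (a, b), a_to_b in similarities.items():
        if a == b or (b, a) in seen:
            continue  # skip self-hits and already-processed pairs
        seen.add((a, b))
        b_to_a = similarities.get((b, a))
        if b_to_a is None:
            continue
        if (a_to_b + b_to_a) / 2.0 >= min_similarity:
            parent[find(a)] = find(b)

    families = {}
    for protein in list(parent):
        families.setdefault(find(protein), set()).add(protein)
    return sorted(sorted(members) for members in families.values())


# Hypothetical pairwise similarities (percent).
similarities = {
    ("p1", "p2"): 40.0, ("p2", "p1"): 30.0,
    ("p2", "p3"): 35.0, ("p3", "p2"): 33.0,
    ("p4", "p5"): 20.0, ("p5", "p4"): 25.0,
}

families = paralog_families(similarities)
print(families)
```

In this toy input, p1, p2, and p3 form one family, while p4 and p5 remain unlinked because their average similarity is below the threshold.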
Phenotype microarray analysis
Phenotype microarray (PM) is a high-throughput technique for screening the response of an organism against various substrates. M. phaseolina was evaluated using panels PM1 to PM10 (Biolog Inc.). The PM plates are denoted as PM1 and PM2A MicroPlates for Carbon sources; PM3B MicroPlate for Nitrogen sources; PM4A MicroPlate for Phosphorus and Sulfur sources; PM5 MicroPlate for Nutrient supplements; PM6, PM7, and PM8 MicroPlates for Peptide nitrogen sources; PM9 MicroPlate for Osmolytes; and PM10 MicroPlate for pH. There are 96 wells in each plate, so the substrate utilization patterns by the fungus were evaluated against a total of 960 substrates including 9 negative and 4 positive controls (see Additional file 3 for list of substrates).
M. phaseolina was grown on potato dextrose agar (PDA) at 30°C for 72 hr. Active hyphae were inoculated into 30 ml of liquid potato dextrose medium and incubated at 30°C for 60 hr. The mycelia from the liquid culture (~2 g) were washed with physiological buffer solution (10 mM sodium phosphate, pH 7.0, filter sterilized) at least 7 times to remove nutrient contamination. After washing, the mycelia were aliquoted into two 1.5 ml microcentrifuge tubes and macerated with a pellet pestle motor for 15 minutes. One ml of filamentous fungi inoculating fluid (FF-IF, Biolog) was added to each tube. The suspension was transferred to a 15 ml Falcon tube, and 6 ml of FF-IF was added. The macerated mycelia were centrifuged at 3500 rpm for 5 minutes, and the supernatant was discarded. Six to 8 ml of FF-IF was added to the tube, which was left to stand for at least 40 minutes so that the larger mycelial clumps settled. Approximately 2 ml from the clear upper portion was harvested for measuring the transmittance at 590 nm. The transmittance was adjusted to 62% to obtain a usable inoculum concentration.
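For readers working with a spectrophotometer that reports absorbance rather than transmittance, the two are related by A = -log10(T). The short sketch below converts the 62% target; the function name is illustrative.

```python
import math


def absorbance_from_transmittance(percent_t):
    """Convert percent transmittance to absorbance: A = -log10(T / 100)."""
    if not 0 < percent_t <= 100:
        raise ValueError("percent transmittance must be in (0, 100]")
    return -math.log10(percent_t / 100.0)


target_absorbance = absorbance_from_transmittance(62.0)
print(f"62% T corresponds to A = {target_absorbance:.3f}")
```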
The inoculum suspensions and the different stock solutions were prepared according to the protocol standardized by Biolog (“PM Procedures for Filamentous Fungi”, 25-Aug-07). All wells of PM 1–10 were inoculated with 100 μl of the inoculum suspension and incubated in the OmniLog machine at 30°C for 96 hr. The instrument was programmed to record data from each well at 15-minute intervals. After incubation, the recorded data were extracted and analyzed using TIBCO Spotfire v220.127.116.110. The experiment was replicated three times.
Almost no growth was observed in any of the plates up to 40 hr of incubation. We therefore used the 40 hr readings as the baseline in our analysis. The data at 8 hr intervals from the three replications were averaged for analysis. Based on the 96 hr reading, all the figures (Figure 3; Additional file 2: Figures S6-S11) were constructed using OmniLog values ≥ 200.
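The data reduction described above can be sketched in Python as follows. This is a hedged illustration rather than the Spotfire workflow: readings are assumed to be available as dictionaries keyed by hour, and applying the ≥ 200 cutoff to the baseline-subtracted 96 hr value (rather than to the raw value) is an assumption.

```python
def summarize_well(replicates, baseline_hr=40, step_hr=8, end_hr=96,
                   threshold=200):
    """Average replicates every `step_hr` hours and subtract the baseline.

    replicates: list of dicts mapping hour -> OmniLog value for one well.
    Returns (corrected, positive), where `corrected` maps hour to the
    baseline-subtracted mean and `positive` tells whether the corrected
    value at `end_hr` reaches `threshold`.
    """
    hours = range(baseline_hr, end_hr + 1, step_hr)
    mean = {h: sum(rep[h] for rep in replicates) / len(replicates)
            for h in hours}
    corrected = {h: mean[h] - mean[baseline_hr] for h in hours}
    return corrected, corrected[end_hr] >= threshold


# Synthetic readings for one well in three replicates (invented values).
def synthetic(offset):
    return {h: offset + (h - 40) * 5 for h in range(40, 97, 8)}


replicates = [synthetic(10), synthetic(20), synthetic(30)]
corrected, positive = summarize_well(replicates)
print(corrected[96], positive)
```

With these synthetic readings the corrected 96 hr value is 280, so the well would be retained under the assumed cutoff.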
Verification of lignin degradation
We measured the ability of M. phaseolina to degrade lignin on modified Boyd and Kohlmeyer (B&K) agar medium containing 4 mM guaiacol along with 0.001% azure B dye. The medium was inoculated with the fungus and incubated at 30°C in the dark. After 4 days of incubation, a halo of intense brownish white color formed under and around the fungal colony, and the azure B dye turned from blue to white (Additional file 2: Figure S5). The intense brownish white color of the fungal colony indicates a positive reaction resulting from guaiacol oxidation. The disappearance of the blue color from the medium is also evidence of peroxidase production (Additional file 2: Figure S5). The decolorization of azure B dye has been positively correlated with the production of lignin peroxidase and Mn-dependent peroxidase, but it does not indicate the presence of laccase.