Genome sequence and virulence variation-related transcriptome profiles of Curvularia lunata, an important maize pathogenic fungus

Background Curvularia lunata is an important maize foliar fungal pathogen that distributes widely in maize growing area in China. Genome sequencing of the pathogen will provide important information for globally understanding its virulence mechanism. Results We report the genome sequences of a highly virulent C. lunata strain. Phylogenomic analysis indicates that C. lunata was evolved from Bipolaris maydis (Cochliobolus heterostrophus). The highly virulent strain has a high potential to evolve into other pathogenic stains based on analyses on transposases and repeat-induced point mutations. C. lunata has a smaller proportion of secreted proteins as well as B. maydis than entomopathogenic fungi. C. lunata and B. maydis have a similar proportion of protein-encoding genes highly homologous to experimentally proven pathogenic genes from pathogen-host interaction database. However, relative to B. maydis, C. lunata possesses not only many expanded protein families including MFS transporters, G-protein coupled receptors, protein kinases and proteases for transport, signal transduction or degradation, but also many contracted families including cytochrome P450, lipases, glycoside hydrolases and polyketide synthases for detoxification, hydrolysis or secondary metabolites biosynthesis, which are expected to be crucial for the fungal survival in varied stress environments. Comparative transcriptome analysis between a lowly virulent C. lunata strain and its virulence-increased variant induced by resistant host selection reveals that the virulence increase of the pathogen is related to pathways of toxin and melanin biosynthesis in stress environments, and that the two pathways probably have some overlaps. Conclusions The data will facilitate a full revelation of pathogenic mechanism and a better understanding of virulence differentiation of C. lunata. Electronic supplementary material The online version of this article (doi:10.1186/1471-2164-15-627) contains supplementary material, which is available to authorized users.


Background
The maize leaf spot caused by C. lunata have ever made tremendous yield loss of maize in 11 provinces of maize growing areas in China since the 1990s. Especially in northern China, for instance, it occurred over 192000 hm 2 and led to 8 million kg yield loss in Liaoning province in 1996 [1][2][3]. Many research projects have been designed to discover the disease occurrence pattern, resistance breeding and integrated control of the disease. As the application of resistance varieties containing tropic and sub-tropic germplasms in large growing areas, the incidence of disease infection and its severity were declined massively and less damage was observed in field. However, in recent years, the disease has bounced back again and caused serious damages in some maize growing areas such as Liaoning, Anhui and Henan province etc. The main cause of the disease recurrent was expected to link to large area of monoculture with same or similar resistant germplasm which then become a matrix to induce pathogen virulence variation.
C. lunata is a major organism causing the leaf spot disease in maize [4], but Cochliobolus lunatus as teleomorph of C. lunata is expected to form when the pathogen suffer stress condition in maize field in most cases. The teleomorph is not a major form to cause the foliar disease. C. lunata has 5 pathogenic types from high to low virulence in China. The distribution of pathogenic types varies in different maize growing areas. C. lunata has broad host range including maize, wheat, barley and sorghum and other grasses. Meanwhile a multiple virulence factors have been demonstrated to be involved in pathogen infection to maize, such as cellulose [5], non-host specific toxin (methyl 5-(hydroxymethyl)-furan-2-carboxylate) [6], melanin [7]. It is worth mentioning that some of virulence related genes have been successfully cloned in previous work such as clt-1 regulating non-host specific toxin production, brn1 being required for DHN melanin synthesis [8], two mitogen-activated protein kinases (MAPK) encoding genes (clk1 and clm1) [9,10]. Previous study preliminarily showed that brn1 was not only involved in melanin synthesis, but also associated with toxin biosynthesis. Whereas, the role of these genes in collaborated regulation of toxins and melanin production remains unknown. In addition, several novel virulence-associated genes and large suites of enzymes such as polyketide synthases (PKS) being involved in secondary metabolism were also associated with pathogenicity in most fungal pathogens [11,12]. However, there is no detailed knowledge to reveal the genetically synergistic regulation mechanism of those virulence-associated genes in the C. lunata so far.
C. heterostrophus (agamotype: B. maydis) causing a widely distributed maize fungal disease, southern leaf blight, belongs to Cochliobolus as well as C. lunatus (agamotype: C. lunata). As the genome of B. maydis has been accomplished in recent years, it provides a great deal of bioinformation for globally understanding the detailed infection mechanism of C. lunata and its interaction mechanism with maize. Based on previous work, it was found that C. lunata showed high homology with B. maydis in genetic evolution, thus C. lunata could be profiled in genome wide through comparative genome analysis. However, until now, very little information is known about developmental and pathogenic process of C. lunata from a global view, even though a normalized full-length cDNA library of C. lunata was constructed in the previous study [13]. As we mentioned, the production of melanin and non-host specific toxin may share the common gene regulation network or overlap some node genes responsible for cross-talking between both regulation network systems. Three velvet-like protein-encoding genes being interactive with brn1 gene were successfully screened in C. lunata by yeast two hybrid [14]. Further work showed that velvet-like protein was involved in the regulation of pathogen pathogenicity [15].
PAMP (pathogen associated molecular pattern) model is widely applied in a group of crops to illustrate basic immunity or mechanism of C. lunata-maize interaction. Previous study showed that planting resistant maize varieties for long term in a certain areas would induce virulence variation of C. lunata and some genes served as hallmarks displaying the virulence variation, most of which are secreted protein-encoding genes. Similarly a large amount of secreted proteins have been identified in B. maydis (C. heterostrophus), which are responsible for specific interaction of B. maydis with maize germplasm [16]. Taken together, it is expected that multiple virulence-associated genes control pathogen infection in host plants. Although recently the functions of virulenceassociated genes were characterized, the details on the relations among the expression modes of these genes still remain unclear, owing to lack of global understanding of C. lunata genome sequence. However, the global analysis on genome sequence of the pathogen C. lunata will be imperative and facilitate more rapid identification of genes responsible for pathogenicity and pathogenplant interactions. Hence, in this work, firstly the draft genome sequence of C. lunata CX-3 with high virulence was presented, and comparative analyses of genome repertoire among C. lunata CX-3 isolated from maize, C. lunata m118 isolated from sorghum and B. maydis C5 were conducted in pathogen proliferation and development, virulence and genetic variation, secondary metabolism, plant-pathogen interactions, signaling pathway and detoxification. Secondly, RNA-seq was applied to comparatively analyze transcriptional profiles of C. lunata WS18 with weak virulence and its virulence-increased variant WS18-Pob21-11. This work will provide abundant information for globally uncovering the infection mechanism of C. lunata to maize, thereby the new insight to the pathogen development and infection mechanism would be generated in genome wide. Last but not least comparative genomic and transcriptome analysis for virulence variation of C. lunata will contribute to better understanding what is required to develop flexible strategies to control the pathogen infection. proteins in C. lunata CX-3 was 7.5% (840 proteins), similar to 7.6% in C. lunata m118 (834), but higher than 6.7% in B. maydis C5 (886) ( Table 1). Although the proportions of secreted proteins in the two plant pathogens were close to 7-10% in other sequenced plant pathogens such as Magnaporthe oryzae [17,18], they were far lower than that in three insect pathogenic fungi (16.2% in Cordyceps militaris, 17.6% in Metarhizium anisopliae and 15.1% in Metarhizium acridum) [19,20].
Approximately 55% of C. lunata CX-3 genes have >90% of amino acid sequence identity with C. lunata m118 ( Figure 1A). Although the sexual stages of both C. lunata and B. maydis are Cochliobolus species, only about 30% of C. lunata CX-3 genes have >90% of amino acid sequence identity with B. maydis C5, and the proportion is far lower that between M. anisopliae and M. acridum (more than 50%) [19]. Comparative genomic analysis showed that C. lunata CX-3, C. lunata m118 and B. maydis C5 have a large number of species-specific and strains-specific genes ( Figure 1B). 10082, 9835 and 10476 orthologous core genes and 820, 864 and 1924 species-specific genes were identified in C. lunata CX-3, C. lunata m118 and B. maydis C5, respectively. The proportion of species-specific genes in C. lunata CX-3 (about 7.3%) was similar to C. lunata m118 (7.9%), but far lower than B. maydis C5 (14.4%) ( Figure 1B), showing that C. lunata CX-3 has more differences with B. maydis C5 followed by C. lunata m118. In addition, 859 and 837 strains-specific genes were also identified in C. lunata CX-3 and C. lunata m118, respectively ( Figure 1B), showing that two different C. lunata strains have distinct difference in genome. Genomic islands (GIs) are part of a genome and contain at least three contiguous gene-encoding proteins that do not exist in the reference genome [21]. GIs have many functions, especially they are involved in symbiosis or pathogenesis and may help organism's adaptation. GIs are classed into many subclasses based on their function. For example, GIs associated with pathogenesis are often called as pathogenicity islands (PAIs), and GIs containing many antibiotic resistant genes are referred to as antibiotic resistance islands (ARIs). It was found by reciprocal analysis that C. lunata CX-3 has 40 and 16 GIs separately in comparison to C. lunata m118 and B. maydis C5, which probably play key roles in virulence.
Pfam matches were performed to identify 2830 conserved protein families containing 8471 proteins in C. lunata CX-3 genome close to C. lunata m118 with 2827 families containing 8181 proteins and slightly lower than B. maydis C5 with 2860 families containing 8886 proteins. In phytopathogenic fungi, family expansions were identified in glycoside hydrolases (GHs), cutinases and pectin lyases compare to insect pathogenic fungi [19], showing the important roles of GHs, cutinases and pectin lyases in pathogenicity of phytopathogenic fungi. As a plant pathogen, C. lunata CX-3 has a large number of GHs, cutinases and pectin lyases (Additional file 1: Table S1). In comparison with C. lunata m118, C. lunata CX-3 has family expansions in transposase, fungal specific transcription factors, major facilitator superfamily, ABC superfamily, cytochrome P450, protein kinase, serine protease, subtilisin and glucosidase, and family constrictions in G-protein coupled receptor and heterokaryon incompatibility (Additional file 1: Table S1 and S2), which are expected to be crucial for the fungal survival in varied stress environments.
To excavate potential pathogenic genes, a Blast search of the C. lunata CX-3 genome was performed against the pathogen-host interaction (PHI) database, which collected the experimentally proven genes affecting the outcome of pathogen-host interactions of fungi, bacteria and oomycetes [23]. 17.0% of predicted genes in C. lunata CX-3 (1904), identified by a Blast search of C. lunata CX-3 genome against the PHI database, are putatively related to pathogen-host interaction. The percentage of PHI genes in C. lunata CX-3 (17.0%) is close to other plant pathogenic fungi (16.6% in C. lunata m118, 15.0% in B. maydis C5, 17.0% in F. graminearum and 17.7% in Aspergillus flavus) (Additional file 1: Table S2).

Transposases and repeat-induced point mutation
Like rice pathogen Magnaporthe grisea, C. lunata has a high degree of genetic variability and prone to form novel pathogenic variants to infect the resistant host. Transposases and repetitive elements make a greater contribution to genetic instability and pathogenic variation [24]. The transposable elements are closely associated with the virulence of most ascomycetes including C. lunata. Thus, an understanding of the change of transposons and repeat elements in C. lunata not only provides an insight into their (B) Reciprocal Blast analysis of the protein sequences among the three pathogenic fungi C. lunata CX-3, C. lunata m118 and B. maydis C5 with a cut-off E value of 1e-5. CX-3, C. lunata CX-3; m118, C. lunata m118; C5, B. maydis C5. In relative to B. maydis C5, C. lunata CX-3 and C. lunata m118 have 820 (7.3%) and 864 (7.9%) species-specific genes; In relative to C. lunata, B. maydis C5 has 1924 (14.4%) species-specific genes. C. lunata CX-3 and C. lunata m118 have 859 and 837 strains-specific genes, respectively. (C) A phylogenetic tree constructed with the Dayhoff amino acid substitution model representing the evolutionary relationships of C. lunata and other fungi. MYA, million years ago. B. maydis = C. heterostrophus. impact on genome evolution and also contributes to shedding light on mechanisms of pathogenic variation.
Repeat-induced point mutation (RIP) in ascomycete fungi is a genome defense, which hyper-mutates repetitive DNA and limit the accumulation of transposable elements [25]. RIP preferentially introduces CA/G dinucleoutides to TA during the sexual cycle in many fungi, such as Metarhizium spp. and Trichoderma spp. etc. 68 paired C. lunata CX-3 paralogous genes showing >80% nucleotide sequence identities were used to estimate nucleotide mutation frequencies. The results showed that there was strong mutation bias: C:G to T:A (Figure 2), which is introduced by repeat induced point mutations (RIP) [26]. The C. lunata CX-3 genome contains many transposable elements, suggesting that they can escape damage by RIP.
Protein families involved in degrading plant cell wall and cuticle C. lunata as a plant pathogen need to break through the passive defense (cell wall, cuticle) of plant for successful infection [27]. Therefore, it would be expected to produce CX-3: C. lunata CX-3. a CCC core: C. lunata CX-3, C. lunata m118 and B. maydis C5 genes grouped with a cutoff E value of 1e-5 during reciprocal Blast analysis. b CX-3-m118 specific: C. lunata CX-3 and C. lunata m118 specific genes in relative to B. maydis C5 grouped with a cutoff E value of 1e-5. c CX-3-C5 specific: C. lunata CX-3 and B. maydis C5 specific genes in relative to C. lunata m118 grouped with a cutoff E value of 1e-5. d PHI genes, pathogen-host interaction genes identified by Blast analysis against the PHI database with a cutoff E value of 1e-5. Identity was estimated using amino acid sequences. GPCR, G-protein-coupled receptor; MFS, major facilitator superfamily; NA, not available. B. maydis = C. heterostrophus.
and secrete large numbers of extracellular degrading enzymes to degrade the plant cell wall and cuticle such as glycoside hydrolases, pectinase and cutinase, in which glycoside hydrolase (GH) is one of the representative. The number of GH genes in C. lunata CX-3 (219) is close to C. lunata m118 (214) and the average in plant pathogenic fungi (211) and less than B. maydis C5 (241) (Additional file 1: Table S4), but far larger than that in three insect pathogenic fungi (159 in M. anisopliae, 130 in M. acridum, 135 in C. militaris) [19,20]. There is no distinct difference in the number of GH genes for each family except for GH18 and GH43 family among C. lunata CX-3, C. lunata m118 and B. maydis C5. 31.9% of GH genes in C. lunata CX-3 (75) are identified to be putative PHI genes, 34.1% in C. lunata m118 (83) and 34.0% in B. maydis C5 (82), which are close to the average in plant pathogenic fungi (31.5%) but higher than that in insect fungi Metarhizium spp.
In the ABC transporters, the multidrug resistance (MDR) and the pleiotropic drug resistance (PDR) subfamilies function in resisting antifungal agents [31]. C. lunata CX-3 has 9 MDR subfamily of transporters, more than C. lunata m118 (6), B. maydis C5 (7) and the average of plant pathogens (8) (Additional file 1: Table S7). However, the number of PDR transporters in C. lunata CX-3 (10) is in line with C. lunata m118 (10), B. maydis C5 (10) and the average of plant pathogens (10). The two drug:H + antiporter (DHA) subfamilies (DHA1 with 12 spanner and DHA2 with 14 spanner) of MFS transporter can make toxic compounds be secreted into outer environment [31]. DHA1 subfamily of transporters well exist in the C. lunata CX-3 genome (49 in CX-3 versus 52 in m118, 51 in C5 and an average of 39 in plant pathogenic fungi) (Additional file 1: Table S7). In comparison to DHA1, little DHA2 subfamily of transporters are present in C. lunata CX-3 (26) as well as C. lunata m118 (27) and B. maydis C5 (15), however which is higher than the average of plant pathogenic fungi. These results show that there are differences in gene family expansions occurred in DHA1, DHA2 and MDR among C. lunata CX-3, C. lunata m118 and B. maydis C5. Transcriptome sequencing of both the low virulent C. lunata strain and its virulence-increased variant showed that 16 transporters were up-regulated in the virulenceincreased variant response to the selective pressure of host in the pathogen-plant interactions. Half of these transporters (8) were MFS transporters including CL01123, CL02605, CL03193, CL04682, CL04921, CL05805, CL08969 and CL09075, of which CL01123 and CL05805 were DHA1 subfamily of MFS transporters and CL04682 and CL04921 were DHA2 subfamily of MFS transporters. In the 16 transporters, there was only one MDR subfamily of ABC transporter (CL06556).

Protein families for detoxification
Cytochrome P450 enzymes (CYPs) have a function with the conversion of hydrophobic intermediates of metabolisms and the detoxification of natural and environmental pollutants [32]. The genome sequencing of C. lunata facilitates the excavation of novel CYPs from C. lunata. Through the genome search against cytochrome P450 database [33], 137 P450 genes were identified in C. lunata CX-3, more than 106 in C. lunata m118, but less than 162 in B. maydis C5 (Additional file 1: Tables S1 and S8). 23 subfamilies of CYPs are presented in C. lunata CX-3 and/ or C. lunata m118 but absent in B. maydis C5, and 13 subfamilies of CYPs are absent in both C. lunata CX-3 and C. lunata m118 but present in B. maydis C5, thus showing the significant differences in CYP families expansions in these genomes. Interestingly, there are a high percentage of PHI genes (about 70% or higher) in the CYPs of plant pathogenic fungi, for example, 112/ 137 (81%) in C. lunata CX-3, 95/107 (88%) in C. lunata m118 and 114/162 (70%) in B. maydis C5 (Additional file 1: Table S2), showing that most of CYPs are involved in the pathogen-plant interactions. CYP65 is the biggest subfamily of P450s both in C. lunata CX-3 (17), C. lunata m118 (12) and B. maydis C5 (21) (Additional file 1: Table S8). Additionally, CYP505 subfamily of P450s are present in these genomes (3 in CX-3, 2 in m118 and 4 in C5). It was reported that CYP65 catalyzed the epoxidation reaction in the mycotoxin trichothecene biosynthesis of F. graminearum [34], and CYP505 as well as CYP65 were involved in the fumonisin biosynthesis of a maize pathogen F. verticillioides [35,36], suggesting that CYP505s and CYP65s in C. lunata and B. maydis probably be related to the toxin biosynthesis and thus play roles in fungal virulence.
Likewise, three main signaling pathways (MAPK, cAMP and Ca), being mediated by protein kinases, control virulence-associated development of the fungal pathogen. Fungal protein kinases are classed into 9 groups, of which STE and CMGC (MAPK family) kinases are involved in MAPK pathway, AGC (PKA family) kinases in cAMP pathway, and CMGC (CLK and RCK family), CAMK and AGC (PKC family) kinases in Ca signaling pathway. The C. lunata CX-3 genome contains 153 protein kinases and it is slightly more than 147 in C. lunata m118 and 140 in B. maydis C5 and close to the average (154) of plant pathogenic fungi (Additional file 1: Tables S1 and S10). Because signal transduction plays a critical role in fungal development and infection [20], most of these protein kinases are highly homologous to gene-encoded proteins from PHI database (127/153 in CX-3, 124/147 in m118 and 114/140 in C5) (Additional file 1: Tables S1 and S2). Although there are no distinct differences in the number of protein kinases in the three genomes, the function of signal pathways regulated by protein kinases in different organisms are distinct. For example, Pmk1 MAPK pathway in M. grisea is involved in pathogenesis and regulates appressorium formation, however its homologous MAPK pathway in S. cerevisiae is related to both the pheromone signaling and filamentation.
S. cerevisiae has five MAPK pathways such as Fus3, Kss1, Hog1, Mpk1 and Smk1, which are involved in pheromone response, filamentous growth, osmoregulation, cell wall integrity and spore wall assembly, respectively [42]. M. grisea has three MAPK pathways such as Pmk1, Mps1 and Osm1, which are homologous to Fus3 or Kss1, Mpk1 and Hog1 pathways of S. cerevisiae, respectively [18]. Pmk1 pathway regulates appressorium formation, penetration and colonization; MPS1 pathway invasive growth, conidiation and penetration; OSM1 osmoregulation and stress response [18]. Blastp search analysis reveals that the homologues of MAPK pathways of S. cerevisiae and M. grisea were identified in the C. lunata CX-3 genome (Additional file 1: Table  S11). The homologues of pmk1 and mps1, clk1 (CL06419) and clm1 (CL11087) of C. lunata, were cloned and characterized in our previous study. Clk1 gene influenced conidiospore formation, growth, cell degrading enzymes activity and virulence of C. lunata [9], which supported a result that homologues of pmk1 were essential for fungal pathogenicity in all plant pathogens [43]. Clm1 gene regulated cell-wall integrity, conidiospore formation, infection, cell degrading enzymes activity [10].
Cyclic AMP (cAMP) signaling pathway was involved in the induction of appressorium formation and turgordriven process, therefore leading to plant infection [18]. cAMP can activate downstream effectors such as cAMPdependent protein kinase (PKA) [44]. Catalytic subunit (CPKA) of PKA was required for appressorium maturation of M. grisea [45]. The C. lunata CX-3 genome contains two gene-encoded PKA catalytic subunits (CL03564 and CL01641), therefore proper studies of these genes will contribute to elaborating the cAMP signaling mediating appressorium formation.
Histidine kinase (HK) phosphorelay signaling as a major mechanism is used by some organisms (bacteria, slime molds, plants and fungi) to sense and adapt to their environment [20,46]. In fungi, HK signaling mediates multi biological processes such as secondary metabolite biosynthesis, stress response, virulence and differentiation [46][47][48]. The pathway is two-component signaling pathway containing a sensor HK and a response regulator (RR). HKs well exist in C. lunata CX-3 (19), C. lunata m118 (19) and B. maydis C5 (20) and other plant pathogenic fungi (9 to 21) (Additional file 1: Table S10). It's worth noting that all HKs are PHI genes.

Virulence-associated secondary metabolite genes
One primary goal of studying the genome of a fungal pathogen is to identify secondary metabolites that are served as virulence factors such as host specific toxins (HST), non host specific toxin (NHST) and melanin [16]. The impact of HST/NHST on plant hosts was understood early, since they make the producing fungi be highly virulent to crops. C. lunata CX-3 is capable of producing diverse secondary metabolites such as NHST and melanin that aid in niche exploitation and pathogenicity. Although toxin and melanin are two key pathogenic factors in C. lunata, the core genes for their biosynthesis are not identified in C. lunata yet. The C. lunata CX-3 genome provides the feasibility to identify the core genes for melanin and toxin such as non-ribosomal peptide synthetase (NRPS) and polyketide synthase (PKS). It was reported previously that PKS1 and PKS2, two PKS of B. maydis C5, functioned as the biosynthesis of T-toxin, and HST1, one NPRS of Bipolaris zeicola played a key role in the HC-toxin biosynthesis. The number of core genes for secondary metabolites in C. lunata CX-3 (36) is close to the average of other plant pathogenic fungi (41) ( Table 3). The C. lunata CX-3 genome encodes similar amounts of NRPS (6), PKS (16) and NRPS-PKS hybrid genes (HYBRID) (2) with C. lunata m118 (5 NRPS, 14 PKS and 2 HYBRID). While, the numbers of NRPS and PKS in the two C. lunata genomes are slightly less than B. maydis C5 (9 NRPS and 19 PKS), and the latter has no HYBRID. These results show that the differences in the number of NPRS and PKS between strains or species are consistent with their adapting to the diverse environments and hosts.
C. lunata CX-3 PKSs were divided into different clusters based on phylogenetic analysis of the ketoacyl CoA synthase (KS) domain of PKSs with other known PKS for melanin or toxin biosynthesis as references ( Figure 3A). There were differences in the domains of different PKSs in C. lunata CX-3 based on domain analysis, but the domains of C. lunata CX-3 PKSs were similar with known PKSs. Interestingly, PKSs were grouped into two kinds based on PKS domain components. One kind was reducing PKSs with KS, acyltransferase (AT) and dehydratase (DH) domains at least, including 12 C. lunata CX-3 PKSs and 6 other known PKSs being involved in different kinds of toxin biosynthesis such as Alternaria alternata ACT-toxin, Gibberella zeae zearalenone, Gibberella moniliformis fumonisin, Aspergillus ochraceus ochratoxin and C. heterostrophus T-toxin ( Figure 3C). The other kind was non-reducing PKSs without DH domain, including 2 C. lunata CX-3 PKSs and 9 other known PKSs for melanin biosynthesis. It was suggested that C. lunata CX-3 has conserved PKSs related to the biosynthesis of toxin and melanin. Notably CL04686 and CL09981 both encoding PKSs were up-regulated in virulence-enhanced strain. However, phylogenetic and modular analyses suggested that the protein structures of C. lunata CX-3 NRPSs were obviously different from other known NRPSs being involved in the biosynthesis of mycotoxins such as HC-toxin of Cochliobolus carbonum, AM-toxin of A. alternate, gliotoxin of Aspergillus fumigatus and enniatin of Fusarium equiseti (Additional file 2: Figure S1).

Small, cysteine-rich peptides and effector proteins
The small cysteine-rich proteins (SCRPs) were secreted directly into host plant cells and perform multiple biological functions such as host recognition or colonization [49,50], the induction of host HR [51][52][53], pathogenicity [54], and antimicrobiosis [55,56]. Some SCRPs as virulence effectors showed carbohydrate binding activity that not only facilitated fungal virulence by perturbing host cell signaling or interfering with host recognition of the pathogen or suppressing pathogen-associated molecular pattern (PAMP)-triggered immunity (PTI) [18,57], but also induced effector-triggered immunity (ETI) governed by a gene-for-gene system in plants containing homologous resistance (R) proteins in the pathogen-host interaction [58]. A recently identified class of conserved effectors in fungi are LysM effectors that contain lysine motifs (LysMs) other than recognizable protein domains [59]. It is feasible to under-predict putative SCRPs in the C. lunata CX-3 genome due to the short length (<150 amino acid residues) of SCRPs, which contribute to extending the annotation of the C. lunata CX-3 genome with a specific search (see Materials and Methods). 76 potential SCRPs were found in the genome, ranging in size from 72 to 150 residues (Additional file 1: Table S12). Among these 76 predicted SCRPs in C. lunata CX-3, CL08250 (147 residues) with a LysM showed 34% of amino acid identity with Ecp6 protein of Cladosporium fulvum, which was a LysM-containing effector and virulence factor [60]. CL08356 (92 residues) containing an antifungal protein domain showed 48% of amino acid identity with an antifungal protein (GenBank: CAR79018) of Fusarium avenaceum. CL11021 (125 residues) containing a hydrophobin domain showed 45% of amino acid identity with a hydrophobin (GenBkan: ABY48863) from B. maydis. Several secreted hydrophobins as fungal effectors were examined for their roles in virulence such as Hum3 and Rsp1 of U. maydis [58,61], fungal development and plant colonization such as Mhp1 of M. grisea [62]. Additionally, although >150 residues in size of a secreted protein (CL01513, 234 residues), it contains a LysM (Additional file 1: Table S12) and shows 42% of amino acid identity with Ecp6 (GenBank: ACF19427) of Passalora fulva. As described above, cysteine-rich polypeptides such as CL01513, CL08250, CL08356 and CL11021 were served as potential candidates for pathogen effectors in C. lunata CX-3.

Comparative transcriptome analysis for pathogenicity variation
In the previous works, under the successive selection pressure of resistant maize germplasm, C. lunata virulence was enhanced [4]. Although some hallmarks related to virulence variation were screened, the variation mechanism at the transcriptional level was not deeply understood yet. In order to further shed light on the molecular regulation mechanism for virulence differentiation of C. lunata, high-throughput RNA-Seq was performed to compare the transcriptional differences between C. lunata WS18 with low virulence and its virulence-enhanced variant WS18-Pob21-11 and to further screen the crux genes that involved in the virulence variation of C. lunata under the host selection pressures. >9.7 million tags were sequenced for each strain and 76.2% and 74.6% of predicted C. lunata CX-3 genes were expressed in WS18 and WS18-Pob21-11 respectively. However, the transcriptional profile of WS18-Pob21-11 was different from WS18 after successive host direction selection (Figure 4). A total of 200 genes including 32 putative PHI genes were significantly (PDR ≤ 0.001,) up-regulated and 164 genes including 35 putative PHI genes down-regulated in the virulence-enhanced strain (Additional file 1: Table S13), showing that C. lunata presented the obvious change at transcriptional regulation under the continued selective pressure from resistant host. In general, differential genes were involved in transport, oxidation-reduction process, metabolic process, mycelium development, response to stress, pigment biosynthetic process and protein metabolic process, etc., most of which were significantly up-regulated under the successive selection pressure from host. In contrast, almost all differentially expressed genes being involved in the carbohydrate metabolism, protein modification and cellular component organization were significantly downregulated ( Figure 5).
Like other plant pathogenic fungi, C. lunata could produce 1,8-dihydroxynaphthalene (DHN) melanin and toxin as two important pathogenic factors to make plant infected. In this study, it was found that the virulenceenhanced strain produced more melanin than the wild-type strain (data not shown), showing that the differential genes related to the biosynthesis of DHNmelanin were at lest partly responsible for the virulence variation under the selective pressures from host [4]. Responses to the selective pressure of host, CL04685 encoding scytalone dehydratase (SCD) were up-regulated PHI genes and it was involved in melanin synthesis [63,64]. Therefore, it was proved that the pathway of melanin synthesis was related to virulence differentiation of C. lunata. Interestingly, it was found by Blastp of differential genes against the proteins sequences of the C. lunata CX-3 genome that 12 flanking genes of CL04685 (scd) in genome were up-regulated in WS18-Pob21-11, including CL04673 (alpha-1,6-mannosyltransferase subunit), CL04676 (epimerase), CL04678 (Pc16g10800), CL04680 (epimerase), CL04681 (NADP(+)-dependent dehydrogenase), CL04682 (MFS transporter), CL04686 (polyketide synthase), CL04687 (metallo-beta-lactamase), CL04688 (17-beta-hydroxysteroid dehydrogenase), CL04689 (epimerase), CL04690 (monoxygenase) and CL04692 (methyltransferase). Among the 12 flanking genes, the CL04682, CL04688 and CL04690 were PHI genes (Additional file 1: Table S13). Genes being involved in the biosynthesis of secondary metabolites were usually clustered [65], and the combination of CL04685 and its 12 flanking genes was similar to a sirodesmin biosynthesis-associated gene cluster of Leptosphaeria maculans [66] and a cercosporinassociated gene cluster of Cercospora nicotianae [67]. Thus, the 13 genes belonged to a gene cluster for the synthesis of secondary metabolites in C. lunata ( Figure 6). In this cluster, CL04683 with no expression change has 27% of amino acid sequence identity with aflatoxin regulatory protein AflR (GenBank: AAM03003) which was involved in the regulation of aflatoxin clusters in A. flavus and Aspergillus parasiticus and sterigmatocystin cluster in Aspergillus nidulans, and 24% of amino acid sequence identity with cercosporin regulatory protein CTB8 (GenBank: DQ991510) of C. nicotianae. CL04682, a MFS transporter, has 61% of amino acid sequence identity with aflatoxin efflux pump (GenBank: XP_002844735) of Arthroderma otae. It was suggested that this cluster in C. lunata is a putative gene cluster for toxin biosynthesis of C. lunata. Therefore, the up-regulation of the cluster genes showed that they were involved in pathogenicity variation.
A complex network of regulatory and signaling components involved in regulation of morphogenesis and virulence [68,69]. Therefore, although a gene cluster related to toxin and melanin biosynthesis played roles in the pathogenicity variation, other differently expressed genes were probably relative to the change, particularly up-regulated genes (Figure 7).

Discussion
In this study, we reported the first genome sequence of C. lunata, an important pathogen of maize. Genome sequencing of the pathogen will make more contribution to the understanding of its evolutionary relationship with other pathogenic fungi, and efficiently screening of pathogenicity-associated genes as well as detection of its virulence variation under its interaction with host. It was showed that C. lunata CX-3 has significant differences from C. lunata m118 and B. maydis C5 in the number and sequence of genes-encoded proteins, although their evolutionary relationships are very close. As a plant pathogen, the C. lunata CX-3 genome like B. maydis contains about thousands of putative PHI genes (17.0% of all the genes) which are well known to be closely involved in the interaction with plants. Additionally, it was also found that the proportion of secretary proteins in C. lunata CX-3 was close to that in C. lunata m118 and B. maydis C5. Secreted proteins of pathogens have some crucial effectors responsible for mediating plant-pathogen interaction, some of which may play a role in the induction/suppression of plant resistance against pathogen infection [70,71]. Transcriptional profile deriving from RNA-seq showed that the virulence variation of C. lunata was not only related to the up-regulation of genes probably being involved in the biosynthesis pathway of toxin and melanin, it also related to the up-regulation of other pathogenicity associated genes. In other words, multiple regulatory networks involved in the virulence differentiation of C. lunata response to the selective pressures from resistant hosts.
It was found by the phylogenomic analysis that the Cochliobolus lineage was diverged from other well known plant pathogenic fungi including S. turcica, P. tritici-repentis and M. oryzae, which was consistent with Leonard's result [72]. Comparatively, Cochliobolus lineage was more similar to S. turcica and P. tritici-repentis than to M. oryzae in protein sequences. A great deal of expressed sequence tags (ESTs) sourced from the suppression subtractive hybridization (SSH) library of C. lunata revealed a similar trend that C. lunata has high sequence similarities with S. turcica and P. tritici-repentis [4], which would allow us more efficiently and more easily to find novel functional factors or mechanisms in C. lunata for attacking host plants. Furthermore the similar resistant genes could also be searched between mentioned pathogenic fungi and applied in a way to share homologous gene sources.
A wide range of plant pathogenic fungi exhibit a high degree of genetic variability through varied ways [4,18], based on the PAMP models (pathogen-associated molecular pattern), in which a numerous effectors are involved in compatible or incompatible interaction between pathogen and host plants. In our case, small cysteine-rich proteins (SCRPs), for example, were speculated to be contributor to host immunity response if incompatible interaction of pathogen and maize resistant varieties happen, which could serve as effectors responsible for inducing host resistance response such as PTI and ETI [70].
Transposases encoded by transposons played a key role in the evolution of eukaryotic gene regulatory network, and coordinated and regulated eukaryotic gene expression [73]. Therefore serious pathogenic variation of C. lunata was attributed to the presence of a large amount of transposases in C. lunata [4]. The amount of transposases in C. lunata CX-3 is much larger than not only in C. lunata m118 and B. maydis C5, also in other Ascomycetes [20]. The phenomenon could be explained by high C to T and G to A transitions mediated by RIP in the C. lunata CX-3 genome during its sexual cycle [26], as also proved in many Ascomycetes [17,19,20,74,75]. Thus, it was speculated that shift of C:G to T:A might make the pathogen more frequently mutate in virulence differentiation of C. lunata, at least partly, suggesting that C. lunata CX-3 has a strong potential to evolve into other pathogenic types of strains.
C. lunata successively undergoes asexual and sexual stages in his life cycle like other fungi. So far, the asexual stage of C. lunata is well known, but its sexual stage is still poorly understood. In sexual stage, C. lunata probably survived in soil in the shape of ascospores [76], but this stage was infrequent in nature. The asexual stage of C. lunata is key in causing disease to host plant. During the interaction of pathogen-plant, C. lunata and other pathogenic fungi undergo several complex and crucial steps, including attachment to the plant surface, germination on the plant surface and formation of infection structures, penetration of the host and colonization of the host tissue, which are crucial to cause disease [77]. Pathogenic factors, including toxin, melanin, cell wall degrading enzymes (pectinase, cellulase, hemicellulase, protease, amylase and phospholipase), cutinase and hormone are essential for completing these processes. Furthermore, many gene-encoded proteins such as GPCRs, kinases, core genes for the biosynthesis of secondary metabolite, CYPs, ABC transporters and MFS transporters are directly or indirectly involved in these processes. Excitingly, a great many of these proteins had been identified in the C. lunata CX-3 genome by bioinformatics and most of these protein families were involved in the pathogen-host interaction (Additional file 1: Table S2), which facilitated the understanding of pathogenic mechanism of C. lunata. In the previous study, the chemical structure of non-host-specific toxin in C. lunata was identified successfully, and it was speculated based on the toxin structure that genes related to toxin biosynthesis may exist in genome with clustering status. It was found through the pathogen genome and transcriptome sequencing that alpha-1,6-mannosyltransferase, epimerase, P450, MFS transcripter, methyl transferase, polyketide synthase, monooxygenase and 17-beta-hydroxysteroid dehydrogenase etc. might be members of a gene cluster and responsible for the toxin synthesis, which was a very great progress on the identification of genes related to toxin biosynthesis, even though there was no idea that those genes how to work in synergistic way so far.
Although C. lunata has greatly close relation to B. maydis and both they are maize pathogens, there are distinct differences in pathogenicity-associated families. MFS transporters, G-protein coupled receptors, protein kinases and proteases families in C. lunata being involved in transport, signal transduction or degradation are expanded in relative to B. maydis. Cytochrome P450, lipases, glycoside hydrolases and polyketide synthases families for detoxification, hydrolysis or secondary metabolites biosynthesis are contracted compared to B. maydis, which are expected to be crucial for the fungal survival in varied stress environments.
Comparative transcriptional studies for the wild type strain with low virulence and its variant with high virulence provided a global realization of gene expression response to the directly selective pressures of resistant host plants, which would contribute to better understand the virulence differentiation of C. lunata and clarify the biosynthesis pathways of secondary metabolites such as toxin and melanin. Results suggested that toxin and melanin may share some biosynthesis related genes or regulation genes, and gene clusters for the toxin and melanin biosynthesis may overlap. Nevertheless toxin and melanin were synergistic pathogenic factors to maize, which might take place in virulence differentiation induced by host selective pressures. In other words, an unknown crossovers regulation mode of toxin and melanin may govern global virulence performance. For example, brn1 (CL07173)-encoded 1,3,8-trihydroxynaphthalene reductase is not just involved in melanin biosynthesis but also mediates toxin biosynthesis in C. lunata [78].
Although the identification of PHI genes in the genome wide and pathogenicity variation analysis in transcriptome level were performed, there are still some limitations in this work. The identification of PHI genes and the family classification were based on Blastp search against corresponding databases. Nevertheless the function of homologous genes in different organisms may be different, thus the specific function of the interested genes especially PHI genes in C. lunata should be further confirmed experimentally.

Conclusions
In conclusion, we report the genome sequence and comparative genome analysis, and conduct transcriptional regulation of pathogenicity variation in the plant pathogenic fungus C. lunata. The genome and transcriptome data should facilitate the identification of candidate genes and accelerate the molecular studies on biology, pathogenic mechanism and virulence differentiation of C. lunata. The candidate genes for plant-pathogen interaction are worthy of interest in C. lunata.

Fungal strains
C. lunata CX-3 strain was selected for genome sequence since it is highly virulent to maize and caused typical lesion on the maize leaves. Its colony morphology on PDA plate at 28°C is less aerial hyphae [9]. The strain is stored in our laboratory and used for the study of pathogenicity mechanism of the pathogen for ten years. The strain is maintained on potato dextrose agar (PDA) medium at 4°C or silicone beads at −20°C.

Genome sequencing and assembly
The whole genome of C. lunata CX-3 strain was de novo sequenced using Illumina, the next generation sequencing technology. Two DNA libraries with 700 bp and 5 kb insert fragments were constructed and Paired-End sequenced with the Illumina Genome Analyzer at the Beijing Genomics Institute (BGI, Shenzhen, China). The genome sequence was assembled by SOAPdenovo software (BGI) [79].

Gene prediction and annotation
The genes structures of the C. lunata CX-3 genome were predicted using Augustus software [80], with the annotated gene information of M. grisea as a reference. The predicted genes were inspected manually to maximize the accuracy of gene prediction. All questionable open reading frames were individually searched against the NCBI (reference proteins) refseq_protein database by Blastp [20]. Repetitive sequences were predicted using RepeatMasker 3.3.0 [81], and the database version and the search engine were 20110920 and rmblastn 2,2,27+, respectively. The potential secreted proteins were screened by combining Target 1.1 [82], SignalP 4.1 Server [83], TMHMM Server v. 2.0 [84] and Big-GPI softwares [85]. Small cystein-rich proteins (SCRPs) were identified from predicted scecreted proteins based on their expected sequence characteristics [86]. Secreted proteins with 20 to 150 amino acids and at least four cysteins were served as putative SCRPs.

Orthology and phylogenomic analysis
Orthologous protein domains in fungi were used for phylogenomic analysis by the Inparanoid 7.0 database [87]. In total, 1328 orthologous proteins were screened out between used fungi with a cutoff E value of 1e-20 and aligned using Clustal W 2.1 [88]. The concatenated amino acid sequences was used to created a neighborjoining tree using with TREE-PUZZLE program with the Dayhoff model [89]. The divergence time between species was estimated using r8s software with the Langley-Fitch medel [90] by calibrating with the origin of the Ascomycota at 500-650 million years ago [91].

Protein family classifications
The protein families of C. lunata CX-3 genome were classified by Pfam analysis (http://pfam.sanger.ac.uk/). To identify virulence-associated genes, Blastp searches of the C. lunata CX-3 genome were performed against protein sequences in the pathogen-host interaction database (version 3.2, http://www.phi-base.org/) with a cutoff E value of 1e-5. G-protein-coupled receptors were screened through local Blastp against GPCRDB database (http://www.gpcr.org/7tm/) with best hits and further confirmed by searching seven transmembrane helices. Pth11-like GPCRs in C. lunata were identified by local Blastp searching against Pth11 with a cutoff E value of 1e-10 [74]. Carbohydrate-active enzymes were classified using local Blastp searching against the CAZy database (http://www.cazy.org/). Protease families and kinases were identified by local Blastp analysis respectively against the MEROPS peptidase database with a cutoff E value of 1e-20 and KinBase database with a cutoff E value of 1e-10 [92]. Transporters were analyzed through local Blastp against Transporter Classification Database with a cutoff E value of 1e-40 [93]. Cytochrome P450s were classified based on Blastp alignment against P450 database with a cutoff E value of 1e-10 (http://drnelson. uthsc.edu/CytochromeP450.html).

Transcriptome analysis
C. lunata stains WS18 and its variant WS18-Pob21-11 were selected for transcriptome analysis using RNA-seq technology. Transcriptomes of WS18-Pob21-11 and WS18 were served as test group and control. C. lunata WS18 and its variant WS18-Pob21-11 obtained by the successive induction of resistant maize population Pob21 were lowly and highly virulent to Pob21, respectively [4]. To extract total RNA of WS18 and WS18-Pob21-11, their mycelia were cultured in potato-dextrose (PD) liquid medium with 140 rpm of shaking at 28°C for 96 h. The incubated medium was harvested by filtration with sterile cheese cloth, washed using sterilized Milli-Q and it was grinded in liquid nitrogen. Then, total RNA was extracted using Trizol™ Reagent (Invitrogen, USA) and treated with DNase I (Takara, Japan) for removing genome DNA with the recommended method by the manufacturer. Messenger RNA was extracted, cut off randomly and then reverse transcribed into cDNA.
A sequencing library was constructed by PCR amplification of cDNA for tag preparation. The tags were Paired-End sequenced using de novo sequencing technique with Illumina HiSeq™ 2000. Tags only containing adapter sequences were omitted, then the remaining tags were mapped to the C. lunata CX-3 genome or annotated genes [19]. For comparing expression of each mapped gene between samples, the abundance of each tag was transformed to the value of transcripts per million tags (TPM) [19]. The mapped genes with a cutoff of P ≤ 0.05 and FDR ≤0.001 were identified as significantly differential genes [101].

Availability of supporting data
This Whole Genome Shotgun project has been deposited at DDBJ/EMBL/GenBank under the accession JFHG00000000 for C. lunata CX-3. The version described in this paper is version JFHG01000000. Phylogenetic data (alignments and phylogenetic trees) can be available from the Dryad Digital Repository: http://doi.org/10.5061/dryad.22v18.

Additional files
Additional file 1: Comparative genomics analysis of C. lunata. The file is composed by 13 tables in separate excel sheets, in which the additional information refer to comparative genomic properties and gene family analysis of C. lunata with other pathogenic fungi. Table S1. The number of genes for selected gene families in C. lunata and other Ascomycetes. Table S2. Major protein families involved in pathogen-host interaction in C. lunata and other Ascomycetes. Table S3. Putative transposase-encoding genes. Table S4. Glycoside hydrolase in C. lunata and other fungi. Table S5. Protease genes in different fungal genomes, classed by MEROPS families. Table S6. Transporters in C. lunata CX-3, C. lunata m118 and B. maydis C5. Table S7. Drug transporters of C. lunata and other fungi. Table S8. -Cytochrome P450 genes in different fungal genomes, classed by CYP families. Table S9. G-protein-coupled receptors in different fungal genomes. Table S10. The number of protein kinases in different fungal genomes. Table S11. Proteins for MAP Kinase pathway in the C. lunata CX-3 genome. Table S12. The pupative small, cysteine-rich peptides encoding genes in C. lunata CX-3 genome. Table S13. Differentially expressed genes in WS18-Pob21-11 compared to WS18.
Additional file 2: Figure S1. Phylogenetic and domain analyses of C. lunata CX-3 NRPSs compared with known NRPS from other fungi for mycotoxin biosynthesis.

Competing interests
The authors declare no competing financial interests.
Authors' contributions SG carried out DNA and RNA extraction, phylogenetic analysis, gene annotation, protein family classification and transcriptome analysis, and drafted the manuscript. YQL conducted RIP analysis, participated in phylogenetic analysis, transcriptome analysis and the design of the study, and drafted the manuscript. JG carried out the identification of secreted proteins and drafted the manuscript. YS participated in nucleic acid extraction, gene annotation and transcriptome analysis. KF participated in gene annotation and transcriptome analysis. YYL revised the manuscript. JC conceived of the study, participated in its design and helped to draft the manuscript. All authors read and approved the final manuscript.