Comparative genomics of plant pathogenic Diaporthe species and transcriptomics of Diaporthe caulivora during host infection reveal insights into pathogenic strategies of the genus

Background Diaporthe caulivora is a fungal pathogen causing stem canker in soybean worldwide. Generating genomic and transcriptomic information for this ascomycete, together with a comparative genomic approach involving other pathogens of the genus, will provide insights into the molecular basis of the pathogenicity strategies used by D. caulivora and other Diaporthe species. Results In the present work, the nuclear genome of D. caulivora isolate D57 was resolved, and a comprehensive annotation based on gene expression and genomic analysis is provided. The Diaporthe caulivora D57 genome has an estimated size of 57.86 Mb and contains 18,385 predicted protein-coding genes, of which 1501 encode predicted secreted proteins. A large array of D. caulivora genes encoding secreted pathogenicity-related proteins was identified, including carbohydrate-active enzymes (CAZymes), necrosis-inducing proteins, oxidoreductases, proteases and effector candidates. Comparative genomics with other plant pathogenic Diaporthe species revealed a core secretome present in all Diaporthe species as well as Diaporthe-specific and D. caulivora-specific secreted proteins. Transcriptional profiling during early soybean infection stages showed differential expression of 2659 D. caulivora genes. Expression patterns of upregulated genes and gene ontology enrichment analysis revealed that host infection strategies depend on plant cell wall degradation and modification, detoxification of compounds, transporter activities and toxin production. Increased expression of effector candidates suggests that D. caulivora pathogenicity also relies on plant defense evasion. A high proportion of the upregulated genes correspond to the core secretome and are represented in the pathogen-host interaction (PHI) database, which is consistent with their potential roles in pathogenic strategies of the genus Diaporthe.
Conclusions Our findings give novel and relevant insights into the molecular traits involved in pathogenicity of D. caulivora towards soybean plants. Some of these traits are shared with other Diaporthe pathogens with different host specificity, while others are species-specific. Our analyses also highlight the importance of a deeper understanding of pathogenicity functions among Diaporthe pathogens and their interference with plant defense activation. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-022-08413-y.


Background
Species of the genus Diaporthe and their asexual morph Phomopsis are endophytic, pathogenic, or saprophytic on a wide range of hosts worldwide [1][2][3]. Pathogenic Diaporthe species cause diseases on many plant hosts, including forest trees [4], citrus [5], pepper [6], sunflower [7], and soybean [8], among others. In soybean, Diaporthe caulivora (syn. Diaporthe phaseolorum var. caulivora) and Diaporthe aspalathi (syn. Diaporthe phaseolorum var. meridionalis) are the causal agents of soybean stem canker (SSC) [8][9][10][11]. Diaporthe longicolla also causes soybean stem canker, and it is usually found in association with D. caulivora and D. aspalathi [12,13]. SSC causes important losses in soybean growing regions around the world [14][15][16][17][18], and disease symptoms are associated mainly with necrosis of the stem, discoloration and leaf wilting [10,13]. Currently, SSC is managed through agronomic practices (crop rotation, avoidance of crop residue), fungicide applications, and the use of resistant cultivars [12,19]. Host-plant resistance is currently the most effective strategy, and therefore the identification of sources of resistance in soybean germplasm and breeding lines is of great importance. At present, five major genes that confer resistance to SSC caused by D. aspalathi and one for D. caulivora have been identified [10,20,21]. However, no commercial varieties with resistance to D. caulivora are available. Since D. caulivora is one of the principal causal agents of SSC in soybean producing countries [13,19,22], more information is needed to understand the interaction of this pathogen with the host plant in order to develop breeding strategies to control the disease.
Genomic information has allowed the identification of virulence factors and effector proteins in different pathogenic fungal species [23,24], including plant cell wall degrading enzymes (PCWDEs) and enzymes involved in toxin production that are important for host colonization [25][26][27]. At present, 14 nuclear genome sequences of different Diaporthe species are available at NCBI [6,26,28][29][30][31][32][33][34][35][36][37]. However, genomic resources for the important pathogen D. caulivora are not available, and transcriptomic profiling of Diaporthe species during host infection has not been performed. Genomic and transcriptomic approaches will give insights into the molecular basis of the pathogenicity strategies used by different Diaporthe species. In the present work, we report for the first time the genome sequence of D. caulivora and a comparative analysis with five available pathogenic Diaporthe genomes from different hosts [6,13,35][36][37][38][39][40][41]. Moreover, we present RNAseq data of D. caulivora during host plant infection. Our findings reveal the presence of pathogenicity factors that are shared among all Diaporthe species analyzed, as well as D. caulivora-specific virulence components involved in host colonization and plant defense evasion.

Results and discussion
Diaporthe caulivora de novo genome sequencing, assembly and comparison with available Diaporthe genomes
The nuclear genome of D. caulivora was assembled from PacBio sequence reads using Flye v2.7. The assembly consisted of 10 contigs with a total length of 57,864,239 bp and a coverage of 270X (Table 1). The polishing step was performed using Minimap2 v2.18 by mapping the PacBio raw reads back to the genome assembly [42], with a mapping rate of 97.8%. The genome assembly and annotation showed high completeness as determined by BUSCO analysis (98% completeness against the fungi_odb10.2019-11-20 database). We further looked for nuclear genomes of other Diaporthe species available at NCBI and selected five genomes, D. capsici, D. citri, D. destruens, D. longicolla and D. phragmitis, according to the quality of the assembly based on the total genome size, N50, number of contigs and number of N's (Additional file 1, Table 1). Although D. longicolla has a lower quality assembly, it was included since this species has a high occurrence in SSC lesions [13]. D. capsici, D. citri, D. destruens and D. phragmitis are pathogens with different host ranges (Additional file 2). Phylogenetic analysis with the internal transcribed spacer (ITS) and translation elongation factor 1-alpha gene (TEF1α) sequences of the six Diaporthe species, ex-type strains as well as other pathogenic Diaporthe species, confirmed the identity of the isolates used (Additional file 3). The genome assembly sizes of D. capsici, D. destruens and D. phragmitis ranged from 56.1 Mb to 58.3 Mb [6,36,37], and were comparable in size to the D. caulivora genome (57.86 Mb), while those of D. longicolla and D. citri were ~ 7 Mb larger than those of the other species analyzed [26,35] (Table 1).
Keywords: Diaporthe pathogens, Soybean, Genomes, RNAseq, Pathogenicity factors, Secretome, Effectors
Syntenic block analysis provides insights into genomic evolution between related species and defines regions of chromosomes between genomes that share a common order of homologous genes derived from a common ancestor [43]. A whole genome nucleotide alignment using MUMmer [44], implemented in SyMap v5.0.6, allowed the identification of large regions in syntenic blocks between D. caulivora and the other five Diaporthe species with an average of 96% (Additional file 4). This analysis showed that the D. caulivora genome shares 82 to 92 syntenic blocks, covering 95% of the total genome length, with the D. capsici, D. citri, D. destruens and D. phragmitis genomes. It is worth noting that D. caulivora contig 10 is completely syntenic with contig 19 of D. capsici. The other contigs of D. caulivora were highly conserved, with a few rearrangements, compared with the four Diaporthe species mentioned above. The high syntenic similarity among these five Diaporthe species indicates their close genetic relationship, evidenced by a strong overall conservation of genomic organization. The D. longicolla assembly was the only one obtained exclusively with Illumina technology and showed the lowest values of synteny, which is consistent with the degree of fragmentation of the assembly.

Gene prediction, functional annotation and orthologs
The annotation of the D. caulivora genome assembly was performed using the FunGAP pipeline, which combines ab initio strategies, homology-based searches and RNA-seq data, resulting in the annotation of 18,385 protein-coding regions. The functional annotation of predicted protein-coding genes was completed with the Blast2GO (B2Go) program [45]. The predicted genes were compared by BlastP against the NCBI nr database (TaxId: Fungi) and classified by InterProScan v.5.19. Pfam protein families, InterPro domains, Gene Ontology (GO) classification, and metabolic pathways were recovered from proteins identified by BLAST using the B2Go annotation system (Additional file 5). A functional description could be assigned to 16,068 coding genes (87.4%). The remaining 2317 genes were searched at NCBI; 1787 of them did not show any hit and were considered putative D. caulivora-specific genes validated by transcriptomic data (Additional file 5). Genome sequencing of more Diaporthe species will help to confirm whether these genes are present only in D. caulivora and not in other genomes of the genus. In order to make comparisons at gene levels  (Table 1). OrthoFinder analysis combined with an all-versus-all protein BLAST strategy was used to cluster protein orthologous groups and infer a phylogenetic tree with the six Diaporthe species and Fusarium graminearum as outgroup (Fig. 1). The results show that overall 94% (86-97%) of genes were assigned to orthologous groups shared by the six Diaporthe species. Phylogenetic analysis showed that the six Diaporthe species were divided into two main clusters (Fig. 1A). Diaporthe caulivora is phylogenetically distant from D. capsici, D. citri and D. phragmitis, which can be explained by the lower number of orthologous genes (87%) shared with these three species.
Diaporthe capsici, D. citri and D. phragmitis are phylogenetically closely related, largely due to sharing nearly 98.4% of the orthologous genes, which is in agreement with what has been shown by Gai et al. [35]. The second cluster included D. destruens and D. longicolla. Although D. caulivora and D. longicolla are both soybean pathogens and could be expected to be closely related, D. longicolla shares a similar number of genes assigned to orthologous groups with the other Diaporthe species regardless of host specificity (Fig. 1B). Taking into account the quality of the analyzed assemblies, it is possible that specific genes of each species have not been fully recovered, unlike in the D. caulivora genome presented in this work. Besides identifying the orthologous genes, we analyzed the average nucleotide identity (ANIm) of the species, which measures the nucleotide-level genomic similarity between two genomes. The ANIm values between the different Diaporthe species varied from 78.6 to 96.4%, and clusters similar to those obtained by orthologous analyses were obtained, except for D. caulivora, which grouped together with D. longicolla and D. destruens (Fig. 1C).

Diaporthe caulivora-specific and shared Diaporthe virulence components
Fungal secreted proteins mediate communication with the environment, including host plants, and have important functions in pathogenesis [46]. To define the set of proteins considered within the secretome of the six Diaporthe species analyzed in our work, we used SignalP v5.0, WoLF PSORT for Fungi and Phobius. Diaporthe caulivora has a total of 1501 genes encoding predicted secreted proteins. The predicted secretomes of D. capsici, D. longicolla and D. phragmitis were similar in number (ranging from 1535 to 1588 proteins), while D. citri and D. destruens have smaller secretomes, represented by 1383 and 1298 proteins, respectively (Fig. 2A; Additional file 6). Diaporthe caulivora shared between 1007 and 1208 secreted proteins with the other five Diaporthe species (Fig. 2B).
In total, 439 secreted proteins were common to all Diaporthe species and were defined as the core secretome (Fig. 3A; Additional file 6). This core secretome includes virulence components related to plant cell wall degradation and modification, such as pectin and pectate lyases, glycoside hydrolases, carbohydrate esterases, endoglucanases and exoglucanases, a xylanase, as well as proteases, peptidases, lipases, peroxidases, among others. Combining the results of the six secretomes with an NCBI search that included recently sequenced Diaporthe and other fungal species, we identified 1375 D. caulivora predicted proteins that have conserved homologs in other Diaporthe and fungal pathogens (Fig. 3B, Additional file 7). The rest comprised 53 proteins conserved between D. caulivora and other fungal species, 46 Diaporthe-specific proteins and 27 putative D. caulivora-specific proteins that exhibited no hit with any other organism. Interestingly, five of the Diaporthe-specific proteins belong to the core secretome and represent relevant secreted proteins in the Diaporthe genus. Diaporthe-specific and D. caulivora-specific secreted proteins were all uncharacterized proteins whose roles in pathogenesis need further investigation.
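The core and species-specific definitions above reduce to set operations over per-species secretomes. The following is a minimal sketch, assuming each secreted protein has already been mapped to an orthogroup identifier; the function names and the toy identifiers (`OG1`, `OG2`, ...) are hypothetical and are not taken from the authors' pipeline.

```python
from functools import reduce


def core_secretome(secretomes):
    """Return the orthogroup IDs found in the secretome of every species."""
    sets = [set(ogs) for ogs in secretomes.values()]
    return reduce(set.intersection, sets) if sets else set()


def species_specific(secretomes, species):
    """Return the orthogroup IDs found only in the given species."""
    others = set().union(
        *(set(ogs) for name, ogs in secretomes.items() if name != species)
    )
    return set(secretomes[species]) - others


# Toy data (hypothetical): three species, orthogroup IDs per secretome.
toy = {
    "D. caulivora": {"OG1", "OG2", "OG3", "OG9"},
    "D. capsici": {"OG1", "OG2", "OG4"},
    "D. citri": {"OG1", "OG2", "OG3"},
}

core = core_secretome(toy)
unique = species_specific(toy, "D. caulivora")
print(sorted(core), sorted(unique))
```

With real data, the species-specific set would still need the additional database search described in the text before a protein could be called specific to a species or to the genus.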
We looked in more detail at the CAZymes present in the D. caulivora secretome, and identified a total of 460 genes encoding CAZymes. We further searched the Pathogen-Host Interaction database (PHI-base) (Fig. 3A, Additional file 9), which catalogs experimentally verified pathogenicity, virulence and effector genes from different plant pathogens [48]. In total, 556 secreted proteins of D. caulivora (37%) were identified in the PHI-base, and among them, 287 were related to reduced virulence or loss of pathogenicity mutant phenotypes, 18 to increased virulence and 35 to effectors (Additional file 9). Of these, 191 belong to the core secretome and half of them correspond to CAZymes, reinforcing their important role in Diaporthe pathogenicity (Fig. 3A). Interestingly, several D. caulivora secreted proteins that are shared with other pathogenic fungi but absent from the available genomes of Diaporthe species include CAZymes, a kievitone hydratase involved in phytoalexin detoxification [49], several FAD-binding domain-containing proteins, and a putative versicolorin B synthase involved in mycotoxin production [50], among others (Additional file 7). Five of these D. caulivora secreted proteins represent homologs of fungal genes that were assigned as effectors or result in reduced virulence in knockout or mutant experiments, including two LysM domain-containing proteins, an endo-beta-1,6-glucanase, a pectin lyase-like protein, and a minor extracellular protease vpr (Additional file 9). Moreover, several genes encoding important virulence proteins in other fungi, such as necrosis- and ethylene-inducing proteins (NEP), transporters, oxidoreductases, proteases, hypersensitive response-inducing proteins and CFEM domain-containing proteins, were also present in the secretomes of several of the analyzed Diaporthe species, and most of them have PHI-base accession hits.
Effectors are proteins or small molecules secreted by pathogens, which manipulate host cells, facilitating infection and interfering with host immunity [23,51]. We identified 133 secreted D. caulivora effector candidates (Fig. 2A), including pectate and other polysaccharide lyases, glycoside hydrolases, a pathogenesis-related protein, a hypersensitive-inducing protein, peptidases, a carbohydrate esterase and several hypothetical proteins (Additional file 10). The number of predicted effectors in the other five Diaporthe species ranged from 85 to 123 (Fig. 2A; Additional file 10). Diaporthe caulivora shared between 63 and 98 effector candidates with the other five Diaporthe species (Fig. 2C). Nine effector candidates were considered core effectors since they were present in all Diaporthe species, including four CAZymes (pectate lyase, polysaccharide lyase, 1,4-beta-D-glucan cellobiohydrolase and xylanase), a protein CAP22 and four hypothetical proteins. Most of the D. caulivora effector candidates were also found in other Diaporthe species and other fungi, 11 were Diaporthe-specific and four were D. caulivora-specific (Fig. 3B, Additional file 7). All Diaporthe-specific and D. caulivora-specific effector candidates are hypothetical proteins that lack a conserved domain. Moreover, of the total secreted D. caulivora effector candidates, only 17 were identified in the PHI-base; eight were related to reduced virulence mutant phenotypes and four were identified as effectors (Additional file 9). These virulence factors include several CAZymes, comprising the four CAZymes of the core effectors, and a sterigmatocystin biosynthesis peroxidase involved in toxin production [52]. Taken together, our results revealed that the genome of D. caulivora has a large array of pathogenicity-related genes, most of which are shared with other Diaporthe and fungal pathogens, while others are Diaporthe-specific or D. caulivora-specific. Further studies are needed to reveal the function of these genus- and species-specific effector candidates.

Diaporthe caulivora genes encoding virulence factors and effector candidates are induced during soybean infection
In order to identify Diaporthe genes involved in pathogenicity, we performed transcriptional profiling of two early stages of D. caulivora infection of soybean plants (8 and 48 hpi) and included D. caulivora mycelium grown on PDA medium (Additional file 11). A total of 69,940,418 reads mapped to the D. caulivora genome and were considered for further analyses. Biological variability within replicates was analyzed by principal component analysis (PCA). As shown in Fig. 5A, the first principal component (PC1) accounted for 79.1% of the total variation and separated the two time points (8 and 48 hpi) and the control D. caulivora samples. In total, 306 D. caulivora genes were differentially expressed in plant tissues (48 vs 8 hpi); 295 genes were upregulated and 11 downregulated (Fig. 5B; Additional file 12). In order to obtain more information on the infection process, we also compared samples at 8 hpi and 48 hpi with mycelium samples grown on PDA. We identified 2635 additional D. caulivora differentially expressed genes (DEGs); 77 and 593 were upregulated, and 595 and 1561 were downregulated at 8 and 48 hpi, respectively (Fig. 5B-C; Additional file 12). We further focused on the total upregulated DEGs of the three comparisons (806 genes), since they could encode pathogenicity-related proteins involved in D. caulivora infection strategies. These DEGs were significantly enriched in several Molecular Function GO terms, including oxidoreductase activity, hydrolase activity, lyase activity, ion binding and transporter activity (Fig. 6A; Additional file 12).
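Pooling the upregulated genes of the three comparisons amounts to taking the union of three per-comparison gene sets, so a gene upregulated in more than one comparison is counted once. The sketch below illustrates this under stated assumptions: the log2 fold change and adjusted p-value cutoffs (`lfc_min`, `padj_max`) are illustrative defaults, not the thresholds used in this study, and the gene names and values are invented toy data.

```python
def upregulated(results, lfc_min=1.0, padj_max=0.05):
    """Select genes called upregulated in one comparison.

    results maps gene ID -> (log2 fold change, adjusted p-value).
    The cutoffs are assumptions for illustration only.
    """
    return {
        gene
        for gene, (lfc, padj) in results.items()
        if lfc >= lfc_min and padj <= padj_max
    }


# Toy results (hypothetical) for the three comparisons named in the text.
comparisons = {
    "48hpi_vs_8hpi": {"gene_a": (2.1, 0.001), "gene_b": (0.3, 0.5)},
    "8hpi_vs_PDA": {"gene_a": (1.5, 0.01), "gene_c": (3.0, 0.04)},
    "48hpi_vs_PDA": {"gene_c": (2.2, 0.001), "gene_d": (-2.0, 0.001)},
}

up_sets = {name: upregulated(res) for name, res in comparisons.items()}
total_up = set().union(*up_sets.values())
print(sorted(total_up))
```

Counting each gene once in this way would be consistent with the pooled total (806) being smaller than the sum of the three per-comparison upregulated counts (295, 77 and 593).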
We identified a high proportion of DEGs showing homology to genes previously reported to be involved in fungal infection processes. During D. caulivora infection, genes encoding CAZymes, proteins involved in detoxification and transport of toxic compounds, proteases and effectors were upregulated (Additional file 12). A search against the PHI-base predicted 119 upregulated D. caulivora genes that may be involved in pathogenicity (Fig. 6B, Additional file 13). Among them, 46 were related to reduced virulence or loss of pathogenicity mutant phenotypes and 15 to effectors, including CAZymes such as lectins, xyloglucan-specific endo-beta-1,4-glucanase, pectin and pectate lyases, as well as proteases and peptidases, among others (Additional file 13). Of the total upregulated DEGs, 246 encode proteins that were present in our predicted secretome (Fig. 6B). Interestingly, 106 (43%) of these upregulated secreted proteins belong to the core secretome, indicating their contribution to Diaporthe pathogenesis (Additional file 13). Moreover, of the 104 upregulated D. caulivora genes encoding secreted CAZymes (Fig. 6B), 47 were present in the core secretome (Additional file 12), including polygalacturonases, endoglucanases, exoglucanases, pectate lyases, pectin lyases, glycoside hydrolases, xyloglucan-specific endo-beta-1,4-glucanase, mannan endo-1,4-beta-mannosidase and rhamnogalacturonan acetylesterases, among others. The number of upregulated DEGs and the expression levels of CAZyme-encoding genes, including PCWDEs, increased at 48 hpi compared to 8 hpi (Additional file 12), highlighting the important role played by these enzymes in soybean tissue breakdown and host penetration. In addition, seven genes encoding secreted proteins present only in D. caulivora and not in other Diaporthe species were upregulated, including a kievitone hydratase and six hypothetical proteins, three of which are D. caulivora-specific with no hit in public databases.
Interestingly, kievitone hydratases are involved in detoxification of the phenylpropanoid kievitone, which plays an important role in legume defense against pathogen attack [53]. Of the total number of D. caulivora effectors, 23.3% (31 genes) were upregulated, including those encoding several CAZymes, a cell wall glycoprotein, several small secreted proteins, and hypothetical proteins (Additional files 12 and 13). Interestingly, five of the nine core effectors were upregulated during infection, including a xylanase, a polysaccharide lyase, CAP22, a putative 1,4-beta-D-glucan cellobiohydrolase and a hypothetical protein, indicating that they represent common effector candidates in different Diaporthe plant pathogens. Most of these common genes encode proteins involved in cell wall degradation and modification, and CAP22 is expressed in other pathogenic fungi in infection structures such as appressoria [54]. Moreover, one Diaporthe-specific (gene_10233.t1) and one D. caulivora-specific (gene_06736.t1) effector candidate were upregulated during infection.
Further inspection of the transcriptomic data revealed possible strategies used by D. caulivora, and probably the other Diaporthe species, leading to host infection and tissue colonization. In addition to CAZyme-encoding genes, other upregulated genes belonging to the core secretome encode proteins involved in pathogenesis such as peroxidases, SnodProt1, necrosis- and ethylene-inducing peptide 1 (Nep1)-like protein, saponin hydrolase, pathogenesis related protein, CAP22, peptidases, small secreted proteins and lipases. Upon entry into the host, fungal pathogens must resist toxic compounds produced by the plant [55]. Upregulation of a high number of genes encoding proteins involved in oxidoreductase processes during D. caulivora infection, such as dehydrogenases, oxidases and reductases, is consistent with their important role during fungal colonization and maintenance of redox status in both organisms [56] (Additional files 12 and 13). Several upregulated D. caulivora peroxidase-encoding genes belonging to the core secretome showed homology with genes included in the PHI-base. As in other plant-pathogen interactions, D. caulivora and other Diaporthe peroxidases could be involved in lignin breakdown and detoxification of ROS produced by the host. In addition, expression of aldehyde dehydrogenase-encoding genes increased upon D. caulivora infection; these enzymes could be involved in pathogenicity through scavenging reactive aldehydes, fatty acid radicals, and other alcohol derivatives, as occurs in other plant fungal pathogens [57] (Additional file 13). Moreover, 24 cytochrome P450 encoding genes were upregulated during D. caulivora infection, several of which exhibited functionally characterized homologs in PHI-base (Additional file 13). Since some cytochrome P450s are capable of detoxifying phytoalexins [58], and P450 monooxygenases are involved in the synthesis of some fungal toxins [59], they are probably also needed for D. caulivora pathogenesis.
Interestingly, P450 families expanded significantly in several plant  [30]. Enzymes with oxidative-reductive properties also play relevant functions in degrading antimicrobial compounds such as phytoalexins and polyamines of plant origin [60]. Consistently, D. caulivora genes encoding enzymes involved in detoxification of plant defense molecules show high expression levels during early infection stages. Among them, we found several genes encoding putative pisatin demethylases, most of which were related to reduced virulence phenotypes in other pathosystems according to PHI-base (Additional file 13). In Fusarium oxysporum f. sp. pisi, a pisatin demethylase is responsible for detoxifying the pea phytoalexin pisatin [61]. The putative pisatin demethylases could act as glyceollin demethylases and detoxify the soybean phytoalexin glyceollin [59]. Furthermore, two genes encoding dienelactone hydrolase (gene_04049.t1, gene_11950), involved in the β-ketoadipate pathway [62], were highly expressed, suggesting that they could be involved in detoxification of plant defensive aromatic compounds. Interestingly, a saponin hydrolase present in the core secretome was upregulated during D. caulivora infection. These enzymes hydrolyze plant saponins that have antifungal activity and serve as potential chemical barriers against pathogens [63]. Thus, saponin hydrolases could be part of Diaporthe strategies to achieve successful host infection. On the other hand, a kievitone hydratase encoding gene (gene_03751.t1), which was present only in D. caulivora and other fungi such as Fusarium oxysporum and Colletotrichum fructicola, and not in other Diaporthe species, was upregulated during D. caulivora colonization. Kievitone hydratase is an important virulence factor in Fusarium solani that catalyzes the conversion of bean kievitone to a less toxic metabolite [49].
Furthermore, a gene encoding the phytotoxin SnodProt1, present in the core secretome, displays increased expression throughout the D. caulivora infection process. SnodProt1 proteins are involved in plant tissue colonization by several pathogenic fungi and produce ROS and necrosis of the plant tissues [64]. We have observed that D. caulivora infection induces ROS production in soybean stems (Mena, unpublished observations), and D. aspalathi elicitors activate the formation of nitric oxide (NO) in plant tissues that triggers the biosynthesis of antimicrobial flavonoids in soybean [65]. These findings suggest that ROS and NO production play a role in plant-Diaporthe interactions.
Enzymes such as subtilases and alkaline proteinases have been shown to degrade plant defense proteins [67]. Increased expression of genes encoding three subtilisin-like proteinases (gene_11104.t1, gene_05325, gene_14820), an aspartic proteinase (gene_08152), and several other proteases in D. caulivora-infected tissues suggests their possible involvement in degradation of plant defense proteins. Consistently, disruption of genes with significant homology to gene_11104.t1 and gene_08152.t1, as well as other genes encoding proteases (gene_15501.t1, gene_15174.t1 and gene_06260.t1), resulted in reduced virulence in other plant pathogens (Additional file 13). The role of proteases and peptidases in Diaporthe pathogenesis is further supported by their presence in the core secretome.
Other upregulated genes encoding virulence factors include several transporters, such as major facilitator superfamily (MFS) and ABC transporters, which in plant pathogenic fungi are responsible not only for export of compounds involved in pathogenesis, but also for excretion of plant-derived antimicrobial compounds [68]. In total, six MFS and four ABC transporters were upregulated during D. caulivora infection and showed significant homology to transporters involved in virulence mechanisms in other plant pathogens (Additional files 12 and 13).
Taken together, our results suggest that early infection processes of soybean stems by D. caulivora depend on plant cell wall degradation and modification, detoxification of toxic compounds, transporter activities and toxin production. In addition, increased expression of genes encoding putative effector proteins during host colonization, some of which are species-specific, indicates that the D. caulivora infection strategy also relies on plant defense evasion. Interestingly, several of the proteins encoded by these upregulated genes are part of the core secretome and represent common virulence components among Diaporthe species.

Conclusions
Our findings give novel and relevant insights into the molecular players involved in pathogenicity of D. caulivora towards soybean plants, several of which are shared with other Diaporthe pathogens with different host specificity, while others are species-specific. Future studies towards understanding the role of effector candidates during the infection process of important pathogens within the genus Diaporthe could reveal novel effector functions and plant targets that underpin the Diaporthe pathogenic lifestyle. Our findings improve our understanding of the D. caulivora pathogenicity mechanisms involved in SSC development and allow genomic and transcriptomic comparisons among the most important pathogens belonging to the Diaporthe genus. Finally, the knowledge generated in this study provides a foundation for developing effective disease management strategies for SSC.

Methods
All methods were performed in accordance with the relevant guidelines and regulations.

Diaporthe caulivora inoculation
The virulent Diaporthe strain D57 was previously isolated from a stem canker lesion of a soybean plant grown in Uruguay and confirmed as D. caulivora by a phylogenetic tree generated from the analysis of the internal transcribed spacer (ITS) and translation elongation factor 1-alpha gene (TEF1α) [13]. Diaporthe caulivora D57 was grown on potato dextrose agar (PDA; Difco, Detroit, USA) at 24 °C under a 12 h light/12 h darkness photoperiod. Plugs of a 5-day-old culture were used for plant inoculation. SSC-susceptible soybean (Glycine max) plants of cultivar Williams (PI548631) were used for all assays. Seeds were obtained from the USDA ARS Soybean germplasm collection (seed source 13 U-9280), and planted in 12-cm × 12-cm pots filled with a mix of soil and vermiculite at a ratio of 3:1. Plants were grown at 24 °C under a 16 h light/8 h dark lighting regime. Soybean inoculation was performed on 3-week-old (V2 stage) plants, using the stem wounding method described by Mena et al. [13]. Briefly, wounding was performed by making a thin slice along the stem with a sterile scalpel (approx. 7 mm), 1 cm above the cotyledon, and single agar plugs bearing mycelium were carefully placed on the wound, which was subsequently sealed with vaseline. PDA plugs without pathogen were placed on wounds as a control.

Genomic DNA extraction, sequencing and de novo D. caulivora genome assembly
Agar plugs (5 mm in diameter) from the growing edge of 8-day-old cultures grown on PDA were transferred to new culture media and incubated at 24 °C under a 12 h light/12 h dark photoperiod. After 7 days, when the fungal growth covered the Petri dish, the mycelia were collected from three Petri dishes and frozen in liquid nitrogen. Approximately 150 mg of mycelium was ground in liquid nitrogen. Total genomic DNA was extracted using the DNeasy Plant Mini Kit (Qiagen, Hilden, Germany) according to the manufacturer's instructions and quantified with a Nanodrop 2000c (Thermo Scientific, Wilmington, USA). PacBio library preparation and sequencing were done in the Integrative Genomics Core Service, Beckman Research Institute, City of Hope (Monrovia, CA, USA). Briefly, a library for single-molecule real-time (SMRT) sequencing was constructed with an insert size of 15 kb using the SMRTbell™ Template Prep kit (Pacific Biosciences of CA, USA). The size of inserts was determined using the BluePippin device (Sage Science, MA, USA). Finally, the whole genome of D. caulivora was sequenced using the PacBio Sequel platform. Subreads were obtained using the SMRT Analysis RS. Flye v.2.7 [69] was used to assemble the data applying standard parameters and an estimated genome size of 50 Mb. The polishing and correction steps are included in the assembly pipeline. The genome graph structure was visualized with Bandage [70] to survey contiguity and ambiguities. Assembly statistics were obtained with QUAST v5.0.2 [71].
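N50, one of the assembly statistics used here to compare assemblies, is the contig length at which contigs of that length or longer contain at least half of the assembled bases. The sketch below is a minimal standalone illustration of that definition, not a reimplementation of QUAST, and the contig lengths are toy values rather than the D. caulivora contigs.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths.

    Contigs are visited from longest to shortest; N50 is the length of the
    contig at which the running total first reaches half of the assembly.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("N50 is undefined for an empty assembly")


# Toy contig lengths (hypothetical), total = 30.
toy_contigs = [10, 8, 6, 4, 2]
toy_n50 = n50(toy_contigs)
print(toy_n50)
```

In the toy case the two longest contigs (10 + 8 = 18) are the first to cover at least half of the 30 assembled bases, so the N50 is 8.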

Gene prediction and functional annotation
Gene prediction was performed using the FunGAP pipeline [72]. Briefly, gene prediction was performed using three publicly available programs, Augustus [73], Braker 1 [74] and Maker [75], which are integrated into the pipeline. Transcriptomic data of D. caulivora strain D57 obtained with the RNA-Seq protocol were processed with Trimmomatic v0.39 [76] and used for gene model prediction. The parameters used to run FunGAP were: -sister_proteome: Fusarium, -augustus_species fusarium_graminearum and transcript reads as -trans_read: paired-end. Fusarium graminearum was used as the reference species because of its relatively close phylogenetic relationship to Diaporthe among the genome sequences available in GenBank [26]. Single-copy fungal orthologs (fungi_odb10.2019-11-20 gene set) from Benchmarking Universal Single-Copy Orthologs (BUSCO v4) [77] were used to assess the completeness of the genome annotation.
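BUSCO reports completeness as the share of expected single-copy orthologs found complete (single-copy or duplicated), fragmented, or missing. The sketch below shows only how those percentages relate to the category counts; the function name and the counts are hypothetical, and it does not run or parse BUSCO itself.

```python
def busco_summary(single, duplicated, fragmented, missing):
    """Convert BUSCO category counts into percentages of the gene set searched."""
    total = single + duplicated + fragmented + missing
    if total == 0:
        raise ValueError("no BUSCO genes counted")

    def pct(count):
        return round(100 * count / total, 1)

    return {
        "complete": pct(single + duplicated),  # C = S + D
        "single": pct(single),
        "duplicated": pct(duplicated),
        "fragmented": pct(fragmented),
        "missing": pct(missing),
    }


# Hypothetical counts for illustration only (not the D57 annotation).
example_summary = busco_summary(single=180, duplicated=10, fragmented=6, missing=4)
```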
Functional annotation was completed with Blast2GO [45], through OmicsBox software v. 2.0.36 (https://www.biobam.com/omicsbox). Gene models were compared with several databases (NCBI nonredundant protein database, Gene Ontology (GO) Consortium (http://geneontology.org), and InterProScan) using BlastP, retaining a single hit at an e-value threshold of 1E-20 and restricting the search to fungal taxIds [78]. InterProScan analysis was used to identify domains in the D. caulivora genome [79]. Classification into GO categories was performed with Blast2GO software using Fusarium graminearum as a reference since protein domain information for D. caulivora was not available [80].
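The single-hit, e-value-thresholded comparison described above can be illustrated with a short filter over BLAST tabular output. This is a sketch under stated assumptions, not the Blast2GO/OmicsBox procedure: it assumes the 12 default columns of BLAST `-outfmt 6` (e-value in column 11), treats the threshold as inclusive, and interprets "single hit" as the lowest-e-value hit per query. The function name and the example records are hypothetical.

```python
import csv
import io


def best_hits(blast_tsv, max_evalue=1e-20):
    """Keep, for each query, the lowest-e-value hit at or below max_evalue.

    blast_tsv is any iterable of tab-separated lines in BLAST -outfmt 6
    layout: query ID in column 1, subject ID in column 2, e-value in column 11.
    """
    best = {}
    for row in csv.reader(blast_tsv, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue > max_evalue:
            continue  # hit does not pass the threshold
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


# Hypothetical hits: g1 has two passing hits, g2 has none below 1e-20.
example = io.StringIO(
    "g1\tsubA\t90.0\t300\t30\t0\t1\t300\t1\t300\t1e-50\t400\n"
    "g1\tsubB\t95.0\t300\t15\t0\t1\t300\t1\t300\t1e-80\t500\n"
    "g2\tsubC\t40.0\t100\t60\t2\t1\t100\t1\t100\t1e-10\t60\n"
)
example_hits = best_hits(example)
```

Passing a different `max_evalue` (for example 1e-5) adapts the same filter to a more permissive search.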
ITS and TEF1α sequences were retrieved from the six Diaporthe genomes and aligned with MUSCLE (v3.8.31) [81] using the default settings, which are configured for highest accuracy. IQ-TREE software [82] was used to build a phylogenetic tree, with the substitution model TIM2 + F + G4 estimated with ModelFinder [83] and 1000 ultrafast bootstrap replicates.
Gene prediction for each downloaded Diaporthe assembly was performed using the AUGUSTUS web server (http://bioinf.uni-greifswald.de/webaugustus/), with D. caulivora gene models from this work as a training set and default parameters (UTR prediction: false; report genes on: both strands; alternative transcripts: medium; allowed gene structure: predict any number of (possibly partial) genes). The average nucleotide identity (ANI) of all Diaporthe species was calculated using pyani v0.3.0-alpha with MUMmer to align the input sequences (ANIm) (https://github.com/widdowquinn/pyani). Orthology analysis was conducted using OrthoFinder v2.5.4 [84] with an all-versus-all BLAST strategy to define the orthogroups among the six Diaporthe species. A rooted species tree was inferred by multiple sequence alignment and maximum likelihood (options: "-S blast -M msa") using all orthogroups from the six Diaporthe species, based on the Species Tree Inference from All Genes (STAG) method.
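Once orthogroups are defined, they can be sorted by which species contribute genes to them, which is the basis for distinguishing core from species-specific gene sets. The sketch below illustrates that classification on an in-memory table of per-species gene counts; it is not OrthoFinder code, and the function name, labels, species names and counts are all hypothetical.

```python
def classify_orthogroups(gene_counts, species):
    """Label each orthogroup by the species that contribute genes to it.

    gene_counts maps orthogroup ID -> {species name: number of genes}.
    Returns orthogroup ID -> 'core' (present in every species),
    'specific:<species>' (present in exactly one), or 'shared' (in between).
    Orthogroups with no genes in any listed species are left out.
    """
    labels = {}
    for orthogroup, counts in gene_counts.items():
        present = [sp for sp in species if counts.get(sp, 0) > 0]
        if not present:
            continue
        if len(present) == len(species):
            labels[orthogroup] = "core"
        elif len(present) == 1:
            labels[orthogroup] = "specific:" + present[0]
        else:
            labels[orthogroup] = "shared"
    return labels


# Hypothetical three-species example (the study compares six species).
example_species = ["sp1", "sp2", "sp3"]
example_counts = {
    "OG0001": {"sp1": 2, "sp2": 1, "sp3": 1},
    "OG0002": {"sp1": 3, "sp2": 0, "sp3": 0},
    "OG0003": {"sp1": 1, "sp2": 1},
}
example_labels = classify_orthogroups(example_counts, example_species)
```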
The carbohydrate-active enzymes (CAZymes) were identified with dbCAN 5.0, which searches CAZy family-specific HMMs with HMMER3, and NCBI's conserved domain database CDD [94]. Putative polyketide synthase (PKS) genes were identified using InterProScan and identification of conserved domains as indicated in [27]. To identify proteins involved in pathogenicity, the predicted secretome was used as a query for a BlastP (e-value 1E-05) search against the pathogen-host interaction database (PHI-base v4.10), which catalogues experimentally verified pathogenicity, virulence and effector genes from fungal, oomycete and bacterial pathogens [48].

RNA extraction, RNA sequencing and data processing
For transcriptomic analysis, samples were taken from soybean plants at 8 and 48 h post inoculation (hpi), and D. caulivora mycelium grown on PDA plates for 7 days was used as a control. Each treatment consisted of three pots with three plants each at each infection time point, and three plates of mycelium grown on PDA. Soybean tissues (stem section of 1.5 cm including the wounded area) and D. caulivora mycelium were harvested for RNA extraction, immediately frozen in liquid nitrogen, and stored at −80 °C. Soybean stem tissue samples (n = 3) were ground in liquid nitrogen. Total RNA from 100 mg of tissue was extracted and purified with TRIzol reagent (Invitrogen, USA) and the Invitrogen PureLink RNA Extraction Mini kit (Invitrogen, USA), followed by a DNase I treatment (RNase-Free DNase I). The extraction was performed according to the manufacturer's instructions. Quality of the isolated RNA was checked by running samples on a 1.2% formaldehyde agarose gel. RNA concentration was measured using a