Prediction of pathogenicity genes involved in adaptation to a lupin host in the fungal pathogens Botrytis cinerea and Sclerotinia sclerotiorum via comparative genomics

Background Narrow-leafed lupin is an emerging crop of significance in agriculture, livestock feed and human health food. However, its susceptibility to various diseases is a major obstacle towards increased adoption. Sclerotinia sclerotiorum and Botrytis cinerea – both necrotrophs with broad host-ranges - are reported among the top 10 lupin pathogens. Whole-genome sequencing and comparative genomics are useful tools to discover genes responsible for interactions between pathogens and their hosts. Results Genomes were assembled for one isolate of B. cinerea and two isolates of S. sclerotiorum, which were isolated from either narrow-leafed or pearl lupin species. Comparative genomics analysis between lupin-derived isolates and others isolated from alternate hosts was used to predict between 94 to 98 effector gene candidates from among their respective non-conserved gene contents. Conclusions Detection of minor differences between relatively recently-diverged isolates, originating from distinct regions and with hosts, may highlight novel or recent gene mutations and losses resulting from host adaptation in broad host-range fungal pathogens. Electronic supplementary material The online version of this article (10.1186/s12864-019-5774-2) contains supplementary material, which is available to authorized users.

infected by S. sclerotiorum only marginally reduces disease [8]. Similarly, B. cinerea has been ranked second in a recently compiled list of the most important fungal plant pathogens [9], infecting more than 200 plant species and causing severe pre-and post-harvest damage worldwide.
Identifying fungal gene products that promote host-infection is important for improving disease management. These include small-secreted protein (SSP) effectors and secondary metabolites (SM) [10,11]. Increased availability of whole-genome sequence resources of plant pathogenic fungi have allowed bioinformatic prediction of effector-like proteins [12] which can then be targeted in the development of durable disease resistance strategies [13]. The genomes of S. sclerotiorum and B. cinerea are of high quality representations of whole chromosomes, supported by long-read sequencing and optical or genetic maps [14,15]. These genomes share significant sequence similarity and conserved synteny, but differ significantly in their repetitive DNA content and repertoire of SSPs and SM synthesis genes [6]. Genome-based studies in Sclerotinia and Botrytis have been instrumental in the identification of several putative effectors that may be associated with virulence [14,16,17]. Comparative genomics of other broad host-range species, such as the Colletotrichum spp. (C. sublineola and C. graminicola) have also predicted several non-conserved effector-like SSPs and SM proteins with potential roles in virulence [18], despite overall high levels of genome sequence conservation. Collectively, these studies have utilized genomic variations between different species to highlight genes that may be relevant to host-pathogen interactions; however it appears that useful variations can also be detected across isolates of a single species. In the broad host-range pathogen Coleosporium ipomoea, it has been postulated that isolates may narrow in host range towards host-specificity as they co-evolve with their respective hosts [19].
In this study, we outline and compare genome sequences for two isolates of S. sclerotiorum isolated from L. angustifolius and L. mutabilis, and one isolate of B. cinerea from L. angustifolius. The development of genomic resources specific to fungal pathogens of lupin will lay the foundations for further improvements on genome-driven integrated disease management of this crop. Furthermore, identification of variable genes between very recently diverged isolates of the same species, may provide insight into recent adaptations that are a result of challenge from defences of differing hosts, region-specific environmental conditions, farming practices or disease controls.

Genome features
Genome assembly of paired-end Illumina reads -with raw coverage of approximately 81x in S. sclerotiorum isolated from L. angustifolius (subsequently referred to as Sscl-Lang), 164x in S. sclerotiorum isolated from L. mutabilis (Sscl-Lmut) and 90x in B. cinerea isolated from L. angustifolius (Bcin-Lang) -resulted in 796 sequences with a total length of~38.40 Mb for Sscl-Lang and 1091 contigs with a total length of length of~38. 44 Mb for Sscl-Lmut. These genomes are predicted to encode 12,196 proteins in Sscl-Lang and 12,146 proteins in Sscl-Lmut (Table 1). The Bcin-Lang assembly produced a total length of~41.97 Mb, present in 216 sequences and encoding 13,353 proteins (Table 1). CEGMA [20] analysis showed a high percentage of highly conserved core eukaryotic genes were present in all three draft assemblies with 95.97% in Sscl-Lang, 97.18% in Sscl-Lmut and 96.37% in Bcin-Lang. Proteins from these genomes were functionally annotated with gene ontology (GO) terms assigned to 4925 (40.38%) and 4922 (40.46%) of predicted proteins of Sscl-Lang and Sscl-Lmut respectively and Pfam domains assigned to 7025 (57.60%) and 6995 (57.59%) genes in Sscl-Lang and Sscl-Lmut respectively. In Bcin-Lang, 5422 (40.60%) and 7772 (58.20%) of genes were assigned GO terms and Pfam domains respectively. The count of genes with assigned Pfam domains was compared between isolates/species using Fisher's exact test (P ≤ 0.05) (Additional file 1). This analysis showed that there was variation in gene content at the functional level between isolates collected from different hosts. Gene-based information for all the isolates are provided in Additional file 2. Repeat content of these genomes were highly similar (Additional file 3). De novo prediction of repeat sequences predicted 6.32, 6.46, and 2.53% of the Sscl-Lang, Sscl-Lmut and Bcin-Lang assemblies as repetitive, while prediction based on comparison to the known fungal repeats in Repbase predicted 2.32, 2.38, and 1.7% (Table 1).
A survey of AT-rich regions in these genomes, which is a common signature of repeat-induced point mutations (RIP) [21], revealed little evidence of RIP in Sscl-Lmut and only one gene in Sscl-Lang that corresponded to an AT-rich region. In Bcin-Lang, 22 genes were associated with AT-rich regions (Additional file 4). However, none of the above genes were predicted to be effector candidates (see below) and only four out of 22 had Pfam domain associated with them. The lengths of these genes ranged from 154 to 19,641 bp (Additional file 4). Summary results of sub-cellular localization of proteins are presented in Additional file 2. Carbohydrate active enzyme (CAzyme) complements of Bcin-Lang, Sscl-Lang and Sscl-Lmut were investigated and the most abundant CAzymes in all three pathogens were Glycoside Hydrolases (GHs) classes.

Prediction of effector genes in Sscl-Lang, Sscl-Lmut and Bcin-Lang
Putative effector genes were predicted using the intersect of EffectorP and SignalP predictions, and were then compared with databases of known pathogenicity factors DFVF [22] and PHI-base [23] (Additional files 2 and 5). This resulted in identification of 98 candidate effector proteins in Bcin-Lang, 94 in Sscl-Lang and 96 in Sscl-Lmut (Additional file 5). Pfam domain assignments were not common among these candidates, however the most commonly assigned was "fungal hydrophobin" (PF06766). Using the same approach, 80 and 74 candidate effector genes were also predicted in reference isolates of B. cinerea B05.10 and S. sclerotiorum 1980-UF, respectively, which were used in subsequent presence-absence variation (PAV) analysis (Additonal file 4, see below).

Presence/absence variation
Sequence conservation analysis showed distinct regions of PAV across Sscl-Lang, Sscl-Lmut and Sclerotinia spp. Similar patterns were also identified between Bcin-Lang and Botrytis spp. (Fig. 1). Isolate-specific genes were identified by reference alignment (Additional file 2, Additional file 6) and by orthology (Additional file 4, Table 2). We found one gene (bcinT_12260) in Bcin-Lang that was missing from reference isolate which was associated with the DnaJ domain (PF00226).

Discussion
We report the genome assemblies of two isolates of S. sclerotiorum (Sscl-Lang and Sscl-Lmut) and one isolate of B. cinerea (Bcin-Lang), isolated from Western Australian lupin hosts. In comparison of these regionally-diverged isolates, that may have also potentially undergone some level of host-specific adaptation, several genome features are expected to vary. First among these we expect background variations in repetitive DNA contents due to their separate evolutionary histories. If host-specific adaptation has also occurred, then we also expect variation in metabolic enzymes involved in degradation of host tissues and in gene content with roles in pathogenicity. In testing for variation in gene content we have employed three methods in parallel: 1) enrichment analysis for functional annotations; 2) PAV analysis of gene orthologs; and 3) PAV of large regions of DNA.

Comparison of general genomic features
The new genome assemblies of the lupin-infecting isolates appear to contain similar gene contents but differ in repetitive content compared to their respective reference isolates. The genome length of Bcin-Lang was 41.9 , with fewer repetitive regions predicted (6.32 and 6.46% respectively, vs 9.5% in the 1980 reference). The number of predicted genes in Bcin-Lang was 13,353, which is higher than in B05.10 (11,701) [15]. Sscl-Lang and Sscl-Lmut were also predicted to have more genes (12,196 and 12,146, respectively) compared to the 1980 reference isolate (11,130 [14]). The observed variation in the genome size, repeat content, and number of predicted genes is most likely due to differences between a near-complete reference and highly-fragmented short-read assemblies. The lack of isolate-specific RNA sequencing data to guide gene prediction in this comparative study may have also been a factor, however this was offset by the use of reference isolate RNA-seq and gene annotations to assist gene prediction in these novel isolates.
Regions of sectional gene absence in reference genomes (see methods), yielded a handful of effector candidate genes, with one specific to the B. cinerea reference (Bcin01p03900), and six (Bcin12p06110, Bcin12p06760, Bcin13p00040, Bcin04p06980, Bcin08p00020, Bcin09p0 7100) conserved across all isolates (including Bcin-Lang) except T4 (Additional file 2, Additional file 6). No effector candidates were found in regions of sectional gene absence for comparisons of S. sclerotiorum however a putatively-secreted cerato-platanin gene (sscle_11g081570) was identified as conserved in both the S. sclerotiorum reference and Sscl-Lmut, but absent from Sscl-Lang. This family of proteins have been recently demonstrated to be important for the virulence of Sclerotinia sclerotiorum on Arabidopsis and Nicotina benthamiana [24,25]. Within sectional PAVs, other functional annotations were sparse and those identifiable were most commonly cytochrome P450s, CAZymes, transcription factors or protein binding functions (Additional file 2, Additional file 6), with no clear indication of their potential roles in host-range adaptation. We also observed all genes of accessory chromosomes 17 and 18 of the B. cinerea B05.10 reference isolate [15] were absent in both the Bcin-Lang and T4 isolates, although their functional roles and consequences of their loss have yet to be determined [26]. For gene orthologs present/absent in lupin isolates we found a gene in Bcin-Lang (bcinT_12260) that was missing from the Botrytis reference, which was associated with the chaperone DnaJ/Hsp40 family of proteins (Table 1). Studies on Ustilago maydis (maize pathogen) and Fusarium oxysporum (tomato pathogen) suggest that some members of this family may have roles in virulence [27]. Further investigation is needed to identify the possible role of this chaperone protein in virulence of Bcin_Lang.
The repeat contents of regions of the S. sclerotiorum and B. cinerea reference isolates exhibiting PAV (by MUMmer) relative to the lupin-infecting isolates were analysed. We observed S. sclerotiorum to have repetitive sequences in~12.5-14.1% of the reference genome exhibiting PAV with lupin-infecting isolates, compared to~4% in its conserved regions (Additional file 7). Conversely, we observed B. cinerea to have similar proportions of repetitive sequences in both PAV and conserved regions at~4%. This may potentially indicate a relationship between repetitive DNA and variable genome regions which are related to host-adaptation in S. sclerotiorum but not B. cinerea.

Genome features involved in plant pathogenicity
Carbohydrate-degrading enzymes are involved in the metabolic breakdown of host cell components during infection [28]. CAZyme profiles were highly similar between isolates versus the references (Additional file 8) [29]. However one gene (bcinT_03819) was present only in Bcin-Lang and predicted to have GH43 (glycosyl hydrolase), CBM1 and CBM6 (carbohydrate-binding module) activities, of which CBM6 was not predicted for any of the reference isolate proteins. CBM1 and CBM6 proteins are usually observed to have cellulose−/xylan-binding activities, with the former almost never found in non-Fungi.
RIP is a mutagenesis process specific to some fungi, which in some cases may play an important role in the evolution of pathogenicity-related genes or genome regions [30,31]. Regions rich in AT bases are typically signatures of RIP, which can be identified within fungal genomes along with genes that are associated with them [21]. Previously the S. sclerotiorum 1980 reference isolate (isolated from Phaseolus vulgaris) was observed to contain negligible AT-rich content [14,21]; however, we observed Sscl-Lang (isolated from L. angustifolius) had (See figure on previous page.) Fig. 1 Genome features, mutation profiles and presence-absence variations across isolates of Botrytis cinerea and Sclerotinia sclerotiorum. Chromosomes of reference isolates Botrytis cinerea B05.10 (a) and Sclerotinia sclerotiorum 1980-UF (b) are visualised alongside (from the innermost ring outwards): genome features including G:C content, gene and repeat densities; ratios of non-synonymous to synonymous mutations (Dn/Ds) relative to alignments of lupin-infecting isolates over 100 kb intervals, and the percent of 100 kb regions that aligned to lupin-infecting isolate assemblies. Yellow boxes indicate large regions of absence in the reference isolate relative to lupin-infecting isolates (Additional file 2) Table 2 Count of variable genes belonging to predicted functional categories across lupin-infecting isolates. Genes unique to lupininfecting isolates were grouped into functional categories predicted by SignalP, EffectorP or Pfam (Additional file 2). Isolatespecificity was determined within sequences with predicted presence-absence variation (PAV) relative to their respective reference isolate, or by non-orthology (N-O) between the gene sets of isolates of the same species (Additional file 4)

Species
Botrytis cinerea Sclerotinia sclerotiorium 0.88% AT-rich content, which although very low was predicted as distinct from the rest of the genome by OcculterCut. This analysis identified a single AT-associated gene SLangusT_02752 that was also unique to that isolate but of unknown function and was not in our candidate effector list. The AT-rich proportion of Bcin-Lang was higher at 4.78% proportion than in genomes of alternate isolates B05.10 (0.932%), T4 (3.6%) and BcDW1 (4.62%) [21]. Bcin-Lang AT-rich regions were associated with 22 predicted genes (Additional file 4), of which five genes (bcinT_12252, bcinT_12257, bcinT_13074, bcinT_13259, bcinT_13332) were isolate-specific (identified as PAV) with no orthologs in the reference genome. None of these genes were predicted to encode candidate effector proteins. However, one gene (bcinT_13074) had a blast match to DFVF (e-value = 5.00E-04) to a putative pathogenicity-related ABC transporter protein of Magnaporthe grisea (Q3Y5V5_MAGGR). Out of these 22 genes, four (bcinT_05697, bcinT_05698, bcinT_11190, bcinT_12252) were assigned Pfam domains of unknown function, reverse transcriptase, and tannase/feruloyl esterase activity (PF07727, PF07519 and PF12013). The various isolates of Botrytis and Sclerotinia appear to have small differences in pathogenicity-related gene content. It may be possible that differences in the relative number of genes grouped by functional annotation may reveal adaptations specific to each isolate that may relate to pathogenicity, environment or response to disease controls. Comparison of functions showed several were over-represented in Bcin-Lang, Sscl-Lang and Sscl-Lmut compared to other isolates, and also to relevant groups of pathogens including legume-and dicot-infecting (Additional file 1). Heterokaryon incompatibility gene (HET), involved in determining the compatability of anastomosis and genetic transfers between cells [32], was more abundant in Bcin-Lang compared to B. cinerea B05.10 isolate [6] (P = 0.031). A recent study in genetic diversity of 13 different isolates of B. cinerea suggested that regions of increased genetic diversity were associated with HET loci [33]. Bcin-Lang and Sscl-Lang genomes also encoded a large number of cytochrome P450 proteins [34] and Bcin-Lang, Sscl-Lang and Sscl-Lmut were enriched in fungal specific transcription factor domain (PF04082) versus their respective reference isolates. NB-ARC domain proteins, which are usually associated with non-self recognition [35], were enriched in Sscl-Lang. However the biological roles of the above functional domains are broad and obscure [33,34,36] and the selection pressures they are geared towards would be experimentally challenging to characterise, but we assume that they are likely to have non-pathogenic roles.
Fungal effector proteins are employed by some plant pathogens to promote host colonisation, prevention of host defence responses or otherwise altering host physiology, and also influence pathogen lifestyle and host specificity [37]. Hence, identifying effector repertoires within pathogen genomes and determining their functions is an important step towards developing durable resistance in plants [38]. Several studies have previously computationally predicted effector proteins in Botrytis and Sclerotinia [6,14,16]. We predict 98 effector proteins in Bcin-Lang, out of which 22 were unique (orthology-based) to that isolate. For Sscl-Lang and Sscl-Lmut, 5 and 3 isolate-specific effector candidates were predicted from 94 and 96 predicted effector candidates respectively. None of the isolate-specific effector candidates were assigned functional annotations or matched known pathogenicity factors, with the exception of SLmutabT_10471 which matched a putative pathogenicity gene from Blumeria graminis [DFVF: Q00639_BLUGR] (Additional file 5). Overall, these subtle differences may indicate small variations in gene content between the isolates, which might be relevant to the process of adaptation of broad host range pathogens to a specific host over time.

Conclusion
In this study, isolates of the broad host-range pathogen species Sclerotinia sclerotiorium and Botrytis cinerea that were isolated from Lupinus spp. were compared to isolates from alternate hosts. With the novel isolates originating in Western Australia from a lupin host, the B. cinerea B05.10 reference isolate from an unknown host in Germany, and the S. sclerotinia 1980 UF-70 isolate from a Phaseolus host in Nebraska, USA [6], these isolates diverged sufficiently to enable bioinformatic detection of numerous sequence variations. Comparisons of isolate genomes revealed minor differences in gene content and/or sequence, some of which may be related to pathogenesis. Overall we observed high levels of similarity, with minor variations such as in repetitive DNA or AT-rich region content, and gene functions unrelated to pathogenicity. Some of these observations are confounded by variable qualities and completeness of genome assemblies, particularly in repeat-rich regions, and lack of supporting data from RNAseq or additional isolates. However among the pool of variable genes that we observed, a small pool of effector-like candidates were predicted, which present interesting opportunities for future analyses. We conclude that comparative genomics can be usefully applied to the predictive analysis of host-specific pathogenicity mechanisms at an intra-species or inter-isolate level in broad host-range fungal pathogens.

Isolate sampling and genomic DNA extraction
Genomic DNA of two isolates of S. sclerotiorum were obtained from L. angustifolius (Western Australia, Department of Primary Industries and Regional Development (DPIRD)) and L. mutabilis (Mt. Barker, 2007, DPIRD), and one isolate of B. cinerea was isolated from L. angustifolius (South Perth, 1994, DPIRD, WAC9891). Samples were collected by and used in this study with the permission of DPIRD. Seven day-old fungal mycelium was inoculated into 100 mL of half strength potato dextrose broth (PDB), which was incubated at 20°C in the dark and agitated at 100 rpm for 4 days. Fungal cultures were centrifuged at 10,000 x g for 20 mins and pellets were washed with sterile-distilled water and freeze-dried overnight. Fungal genomic DNA was extracted by the CTAB method [39]. DNA concentrations were quantitated using a Qubit 2.0 fluorometer (Invitrogen, Waltham, MA). For simplicity, we call these isolates Sscl-Lang, Sscl-Lmut and Bcin-Lang, respectively.

Comparative genomics
Different types of genomic comparisons were made with a view to better understand pathogenic differences between isolates from different hosts. We performed statistical assessment of protein functional attributes, as well as orthology-based functional comparison between lupin infecting and other host-infecting pathogens. The number of genes possessing certain Pfam domains was collected from Integrated Microbial Genomes (IMG/MER) for all published fungal pathogens (as of June 2017), and compared to the functional annotations for the newly sequenced lupin-infecting isolates. Fisher's exact tests were used to assess statistical significance for over-or -under-representation of functional annotations between Sscl-Lang, Sscl-Lmut, Bcin-Lang, B. cinerea B05.10, S. sclerotiorum 1980 UF-70 [6] and S. homoeocarpa 04-21 [58]. Lupin-infecting isolates were also compared to average counts of relevant groupings of multiple isolates/species, including: lupin-infecting, legume-infecting and dicot-infecting (Additional file 1). We used Proteinortho v5.16 to identify orthologs of Sscl-Lang and Sscl-Lmut compared to predicted proteomes of the S. sclerotiorum isolate 1980, [14], then identified non-orthologous genes that were specific to a single isolate. Similarly we compared Bcin-Lang with B. cinerea isolate B05.10 [15].