Phylogenetic relationship between Australian Fusarium oxysporum isolates and resolving the species complex using the multispecies coalescent model

Achari, Saidi R.; Kaur, Jatinder; Dinh, Quang; Mann, Ross; Sawbridge, Tim; Summerell, Brett A.; Edwards, Jacqueline

doi:10.1186/s12864-020-6640-y

Research article
Open access
Published: 20 March 2020


BMC Genomics volume 21, Article number: 248 (2020)


Abstract

Background

The Fusarium oxysporum species complex (FOSC) is a ubiquitous group of fungal species readily isolated from agroecosystem and natural ecosystem soils which includes important plant and human pathogens. Genetic relatedness within the complex has been studied by sequencing either the genes or the barcoding gene regions within those genes. Phylogenetic analyses have demonstrated a great deal of diversity which is reflected in the differing number of clades identified: three, five and eight. Species limits within the complex have been studied through Genealogical Concordance Phylogenetic Species Recognition (GCPSR) analyses, with the number of phylogenetic ‘species’ identified ranging from two to 21. Such differing views have continued to confuse users of these taxonomies.

Results

The phylogenetic relationships between Australian F. oxysporum isolates from both natural and agricultural ecosystems were determined using three datasets: whole genome, nuclear genes, and mitochondrial genome sequences. The phylogenies were concordant except for three isolates. There were three concordant clades from all the phylogenies suggesting similar evolutionary history for mitochondrial genome and nuclear genes for the isolates in these three clades. Applying a multispecies coalescent (MSC) model to the eight single copy nuclear protein coding genes from the nuclear gene dataset indicated that the three concordant clades correspond to three phylogenetic species within the FOSC. There was 100% posterior probability support for the formation of three species within the FOSC. This is the first report of using the MSC model to estimate species within the F. oxysporum species complex. The findings from this study were compared with previously published phylogenetics and species delimitation studies.

Conclusion

Phylogenetic analyses using three different gene datasets from Australian F. oxysporum isolates have all supported the formation of three major clades which delineated into three species. Species 2 (Clade 3) may be called F. oxysporum as it contains the neotype for F. oxysporum.

Introduction

The Fusarium oxysporum species complex (FOSC) is a group of economically important pathogenic [1] and putatively non-pathogenic strains which are morphologically similar but phylogenetically distinct [2, 3]. Members of this species complex display considerable ecological plasticity. Putatively non-pathogenic isolates are readily isolated from soil and roots of asymptomatic plants from both agricultural and natural ecosystems as endophytes [4] or as isolates which suppress soil-borne pathogens including pathogenic isolates of F. oxysporum [5, 6]. Furthermore, members of the FOSC are also associated with decayed plant material as saprophytes [7]. Plant pathogenic isolates are responsible for causing rots, damping-off and vascular wilts on a broad range of agronomically and horticulturally important crops [1]. There are also clinically important isolates which act as opportunistic pathogens causing infections in animals and immuno-suppressed humans [8, 9].

Despite having both mating-type genes, F. oxysporum has not been found to display a sexual life cycle. Historically, F. oxysporum taxonomy was based on the morphology of the asexual propagative structures. This led to a very broad species definition [10] which did not reflect the variability and genetic divergence within the species [11]. The intra-specific divergence was acknowledged by the concept of forma specialis (f.sp.) by Snyder et al. [12], which is a non-taxonomic entity. It is based on the pathogen-host specificity, although most isolates are putatively non-pathogenic soil inhabitants [13]. There are 106 well-characterised formae speciales (ff. spp.) [14] infecting more than 100 plant species [1, 15]. The current understanding of F. oxysporum as a species complex, comprising many species and clades [16, 17], is far removed from the original broad species definition provided by Snyder et al. [10].

The advent of molecular sequencing technologies has enabled the study of phylogenetic relationships between the members of FOSC using multi-gene genealogies. Multi-gene genealogies use combinations of different mitochondrial and/or nuclear barcoding gene regions and have been increasingly used for molecular systematics. An early phylogenetic study by O’Donnell et al. [16] of 33 F. oxysporum isolates using two barcoding gene regions, translation elongation factor (tef-1α) and mitochondrial small subunit (mtSSU rDNA), divided the FOSC into three monophyletic clades. Laurence et al. [18] used the same barcoding loci and reported that 45 Australian F. oxysporum isolates from the natural ecosystem separated into five clades. Clade 4 comprised only Australian isolates. More recently, Lombard et al. [19] identified eight clades within the FOSC using four barcoding gene regions, β-tubulin II (tub2), calmodulin (cal), the second largest subunit of DNA-dependent RNA polymerase II (RPB2) and tef-1α.

The uptake of whole genome sequencing resulting from low cost and high throughput of next-generation sequencing platforms has allowed the use of complete protein coding genes and complete mitochondrial (mt) genomes for phylogenetic analysis. The mt genome is present in high copy numbers which allows for mutations to occur without lethal impact [20]. This brings about an accelerated rate of evolution, making the mt genome a suitable region to study eukaryotic evolution [21]. Furthermore, gene loss appears to be irreversible [21] and the transfer of genetic material between or into the mt genome is thought to be limited [20]. Since the mt genome is relatively small, it can be studied in its entirety. The mitochondrial genome consists of two regions, a conserved region with relatively low levels of sequence variation and the large variable region (LV) [22] containing numerous sequence variations [23]. Sequence variations in this region are due to recombination events caused by parasexualism and this has resulted in three variant type sequences within the mitochondrial genome in FOSC [23]. Mt genome sequences have been used in molecular systematics and biodiversity studies of fungi at various taxonomic levels [24]. Three clades were identified in a phylogenetic analysis of the FOSC using the conserved region of the mt genome in combination with nine nuclear protein coding genes [23].

Molecular studies have demonstrated that genetic variations within the FOSC are not necessarily reflected in the ff. spp. concept. Polyphyly has further compounded the ff. spp. concept, obscuring the genetic diversity of the isolates [16]. Initially, when the ff. spp. concept was attributed to phytopathogenic isolates of F. oxysporum, it was assumed that isolates which shared a host range would be more genetically similar to each other than to isolates that did not share the same host range. Nucleic acid sequence analyses have shown that many of the ff. spp., previously assumed to be monophyletic, are polyphyletic or paraphyletic [16, 17, 25, 26]. Recent studies demonstrating horizontal transfer of pathogenicity genes between isolates [27] counter the previous assumption that convergent evolution [28] has driven the polyphyletic phylogeny observed within the FOSC.

Identification and recognition of species within the FOSC is pivotal in areas of biology such as epidemiology (identification of novel pathogens) and evolutionary biology (describing diversification patterns) [29]. Although it is now accepted that the FOSC comprises a number of morphologically-similar cryptic species, the species boundaries and limits of genetic exchange are poorly defined, and different studies have predicted different numbers of species within the complex. Two of these studies used Genealogical Concordance Phylogenetic Species Recognition (GCPSR) on different datasets for predicting the species boundaries. Laurence et al. [30] used barcoding regions of eight genes (tef1-α, mtSSU, largest subunit of DNA-dependent RNA polymerase II (RPB1), RPB2, nitrate reductase (NIR), phosphate permease (PHO), calmodulin (cal) and ATP citrate lyase (acl1)) and predicted two ‘species’, while Brankovics et al. [23], using the sequences of nine genes (γ-actin (act), cal, RPB2, tef1-α, tef3, 60S ribosomal protein L10 (rpl10a), topoisomerase I (top1), rDNA repeat and tub2) and the conserved part of the mitogenome, predicted three ‘species’ which were concordant with the three clades in their phylogenetic analysis. Lombard et al. [19] identified 21 ‘species’ with no explanation of their model.

GCPSR in the above studies was implemented in two steps as defined by Dettman et al. [31]: (i) identification of the independent evolutionary lineages (IEL) and (ii) exhaustive subdivision of isolates into phylogenetic species. IEL were identified based on concordance and non-discordance. Clades were considered concordant if they were supported by at least two single loci, and were then compared to remove those that were discordant [23, 30]. IEL supported by at least half of the loci were kept as putative phylogenetic species. Each isolate had to be classified within a putative phylogenetic species. Exhaustive subdivision referred to collapsing of all the subclades of a clade when an isolate was grouped within that clade (putative phylogenetic species). This ensured that all phylogenetic species were monophyletic. The clades that remained were recognised as phylogenetic species [23, 30].
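
The concordance criteria described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure implemented in the cited studies: the function names, the input layout (for each locus, a list of supported clades, each a frozenset of isolate names) and the toy isolate names are all hypothetical, and the exhaustive subdivision step is omitted.

```python
from collections import Counter

def compatible(a, b):
    """Two clades are compatible if they are nested or share no isolates."""
    return a <= b or b <= a or not (a & b)

def putative_species(clades_by_locus, min_loci=2):
    """Keep clades supported by >= min_loci loci and by at least half of all
    loci, dropping any clade that conflicts with a better-supported one."""
    n_loci = len(clades_by_locus)
    support = Counter(c for clades in clades_by_locus.values() for c in set(clades))
    kept = []
    # Visit clades from most to least supported (ties: larger clade, then names).
    order = sorted(support.items(),
                   key=lambda kv: (-kv[1], -len(kv[0]), sorted(kv[0])))
    for clade, n in order:
        if n >= min_loci and 2 * n >= n_loci and all(compatible(clade, k) for k in kept):
            kept.append(clade)
    return kept

# Toy input (hypothetical isolates): clade C overlaps A and B without nesting.
A = frozenset({"iso1", "iso2"})
B = frozenset({"iso3", "iso4"})
C = frozenset({"iso2", "iso3"})
toy = {"locus1": [A, B], "locus2": [A, B], "locus3": [A, C], "locus4": [C]}
result = putative_species(toy)
print(result)
```

In this toy input, clade C is supported by two of four loci but conflicts with the better-supported clade A, so only A and B are retained.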

Species concepts in F. oxysporum have progressed from morphological to the use of multi-gene genealogies under GCPSR. The theoretical criteria for GCPSR developed by Taylor et al. [32] are based on Avise and Ball’s [33] genealogical concordance species concept. This states that recombination within a lineage will create conflict between gene trees and the transition from conflict to congruence represents the species limit [32]. However, there are other processes such as incomplete lineage sorting, horizontal gene transfer and population structure which could cause discordance between gene trees and species trees, masking true evolutionary relationships between closely related taxa [34]. Furthermore, the common practice of concatenating sequence data from multiple loci under GCPSR can lead to inaccuracies in species identification [35]. Alternatively, multispecies coalescent (MSC) models that incorporate gene tree uncertainty into species recognition may more accurately and objectively delimit species. Estimating the speciation process with an MSC model gives a more comprehensive picture of speciation because it accounts for more sources of gene tree discordance than GCPSR. *Beast uses a multispecies coalescent model for species delimitation using multi-locus sequence data [36]. Under this model, the gene trees are “embedded” in the species tree following stochastic coalescent processes while allowing for independent evolutionary processes in each genomic region [37]. A maximum clade credibility tree with posterior probability support for the nodes is computed from the gene trees. This is the species tree: each node denotes a candidate species, and the posterior probability of the node measures the support for recognising it as a species. This model allows testing of different scenarios for species assignments to find the best species fit for the lineage. One advantage of this model over other models is that it allows for the integration of knowledge from multi-gene trees into a single higher-level species tree during the delimitation process, removing the constraint of specifying a guide tree for depicting species relationships [38].
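
The posterior probability support of a node is, in general, the proportion of trees sampled from the posterior that contain the corresponding clade. The sketch below illustrates only that counting step. It assumes each sampled tree has already been reduced to a set of clades (frozensets of tip labels); the function name and the tip labels `A`, `B`, `C` are hypothetical, and a real analysis would read the sampled trees from the output of the inference software with dedicated tools.

```python
from collections import Counter

def clade_support(sampled_trees):
    """Fraction of sampled trees in which each clade appears."""
    counts = Counter(clade for tree in sampled_trees for clade in tree)
    n = len(sampled_trees)
    return {clade: k / n for clade, k in counts.items()}

# Toy posterior sample of four trees over the tips A, B and C.
ab = frozenset({"A", "B"})
bc = frozenset({"B", "C"})
sample = [{ab}, {ab}, {ab}, {bc}]

support = clade_support(sample)
for clade, p in support.items():
    print(sorted(clade), p)
```

Here the clade grouping A with B appears in three of the four sampled trees, so its support is 0.75.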

MSC model-based species discrimination has previously been used for finding species boundaries in animal [39, 40] and plant taxa [41] and now it is being gradually adopted for resolving species complexes in fungal taxa. This model has been used by Stewart et al. [38] for species delimitation in a global population of the asexual fungus, Alternaria alternata, and by Liu et al. [42] for establishing species boundaries in the pathogenic fungal genus, Colletotrichum, which has a sexual state. Additionally, although a sexual state is unknown for Fusarium oxysporum, both mating-type genes are present, so the sexual cycle may have occurred at some point in its evolution.

Objective

Previous studies identified considerable diversity within the FOSC using different datasets and methods. This has resulted in varying numbers of clades and species described. Therefore the objectives of this study were (i) to determine the phylogenetic relationships between Australian F. oxysporum isolates from natural and agroecosystems using three different datasets: the whole genome, the conserved region of the mitochondrial genome and eight informative nuclear genes (concatenated multi-loci), comparing them with previously published phylogenetic analyses, and (ii) to group the isolates into well supported lineages, i.e. ‘species’, using the multispecies coalescent model and to compare the species boundaries in the previous studies using the MSC model.

Results

Mitochondrial genome dataset

Mitochondrial genome sequences

The mitochondrial genome is divided into two parts based on sequence variations [22]. There is the conserved region of the mitochondrial genome and a region that shows higher levels of variation than any other parts of the mitogenome. This region is referred to as the large variable region (LV) which is located between rnl (mitochondrial LSU rRNA gene) and mitochondrially encoded NADH dehydrogenase 2 (nad2).

Sequences of the LV region of the mitochondrial genome were used to determine the mitochondrial genome variant type of the isolates. Variant 1 type mitochondrial genomes were the most dominant with 87 isolates, seven isolates belonged to Variant 2 and five isolates belonged to Variant 3. The average length of the LV region was significantly different (p < 0.05) between the mitochondrial genome variant types, with 11,515 bp for Variant 1, 17,738 bp for Variant 2 and 6065 bp for Variant 3 (Supplementary Figs. 1 and 2).

The average mitochondrial genome size also varied significantly (p < 0.05) between the variant types (Supplementary Table 1, Supplementary Figs. 2 and 3). Variant 1 was 44,455 bp, Variant 2 was 50,327 bp and Variant 3 was 37,148 bp.

The average size of the conserved region of the mitochondrial genome varied significantly between Variant 1 and Variant 3 only (Supplementary Figs. 2 and 4). The average size of the conserved region of Variant 1 was 32,940 bp, Variant 2 was 32,589 bp and Variant 3 was 31,082 bp. The sequences were very conserved between the variant types. They formed two clusters when compared against each other for percentage similarity using cd-hit-est (http://weizhong-lab.ucsd.edu/cdhit-web-server/cgi-bin/index.cgi) [43, 44]. Cluster one had sequences with more than 95% sequence identity, while cluster two had only 2 isolates (VPRI10358 and VPRI10405) with 92% sequence identity.
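
As a quick consistency check on the averages reported above, the mean LV and conserved region lengths can be summed and compared with the mean genome size for each variant type. The snippet below uses only the figures quoted in the text; the variable names are illustrative.

```python
# Average sizes in bp, copied from the text above.
lv = {"Variant 1": 11515, "Variant 2": 17738, "Variant 3": 6065}
conserved = {"Variant 1": 32940, "Variant 2": 32589, "Variant 3": 31082}
total = {"Variant 1": 44455, "Variant 2": 50327, "Variant 3": 37148}

# Reported mean genome size minus the sum of the two regional means.
residual = {v: total[v] - (lv[v] + conserved[v]) for v in total}

for v in total:
    print(v, lv[v] + conserved[v], total[v], residual[v])
```

The sums agree exactly for Variants 1 and 2 and differ by 1 bp for Variant 3, which is presumably a rounding effect in the reported averages.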

There were introns present in the following protein coding genes of the mitochondrial genome: mitochondrially encoded NADH dehydrogenase 5 (nad5) and mitochondrially encoded ATP synthase membrane subunit 6 gene (atp6) had one intron, and mitochondrially encoded cytochrome b (cob) had two introns (Supplementary Table 1). All the isolates had an intron of 1009 bp in nad5. Introns in cob ranged from 200 to 500 bp. There were 17 Variant 1, four Variant 2 and one Variant 3 isolates with an intron in position 1 of cob while only three isolates, two Variant 1 and one Variant 2, had an intron in position 2 of cob (Supplementary Table 1). Only three Variant 1 isolates had an intron in atp6 which varied in length from 328 bp to 1238 bp (Supplementary Table 1).

Phylogenetic analysis

The phylogenetic relationship between the isolates was studied using the conserved and the LV regions of the mitochondrial genome. Four phylogenetic trees were constructed from the mitochondrial genome dataset: the conserved region (Fig. 1) and the LV region of the three mitochondrial genome variant types (Supplementary Figs. 5 and 6). The conserved region had 11,899 sites, of which 9785 were conserved sites and 2111 were variable sites. Of these variable sites, 1044 were parsimony informative sites.

The maximum likelihood (ML) tree generated from the conserved mitochondrial region formed four well-supported clades (Fig. 1). Clades 1, 2, 3 and 4 had 15, 52, 30 and 2 isolates respectively. Clade 1 consisted solely of F. oxysporum f.sp. canariensis (Foc) isolates and isolates from the natural ecosystems, plus one isolate (VPRI42181), isolated from a symptomatic tomato seedling (Supplementary Table 1). None of the F. oxysporum f.sp. pisi (Fop) isolates were in Clade 1. Foc isolates were also present in other clades. Clade 4 contained only two isolates RBG6505 and RBG5714. There was no correlation between the variant type and the clades in which they were grouped. Variant 1, 2 and 3 isolates were spread throughout the clades.

The LV region of Variant 1 type mitochondrial genome isolates had 5504 sites with 4806 conserved and 698 variable sites. Two hundred and forty-four of these variable sites were parsimony informative. The LV region of Variant 2 type mitochondrial genome isolates had 9713 sites of which 7540 sites were conserved and 2173 were variable, of which 271 were parsimony informative. The LV region of Variant 3 type mitochondrial genome isolates had 4209 sites with 3950 conserved and 259 variable sites. Out of these 259 sites, 214 were parsimony informative.
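
The site categories reported above can be illustrated with a short sketch that classifies the columns of an alignment. It uses the usual definitions: a conserved site shows one state, a variable site shows at least two, and a parsimony informative site has at least two states that each occur in at least two sequences. The function name and toy alignment are hypothetical, the input is assumed to be upper case, and ignoring gaps and ambiguity codes is an assumption, since software packages differ on this point.

```python
from collections import Counter

def classify_sites(alignment, ignore="-N?"):
    """Count conserved, variable and parsimony-informative columns."""
    conserved = variable = informative = 0
    for column in zip(*alignment):
        # Tally the states in this column, skipping gaps and ambiguity codes.
        counts = Counter(base for base in column if base not in ignore)
        if len(counts) <= 1:
            conserved += 1
            continue
        variable += 1
        # Informative: at least two states, each seen in at least two sequences.
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative += 1
    return conserved, variable, informative

# Toy alignment of four sequences and five columns.
toy = ["ACGTA", "ACGAA", "ATGAC", "ATGTA"]
print(classify_sites(toy))
```

In the toy alignment, columns 1 and 3 are conserved, columns 2 and 4 are parsimony informative, and column 5 is variable but carries only a singleton change.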

Phylogenetic analysis of the LV region of the Variant 1 type mitochondrial genome isolates resulted in three well-supported clades (Supplementary Fig. 5). Clades 1 and 3 had many sub-clades while Clade 2 had only one isolate, VPRI42176. Phylogenetic analysis of the LV region of the Variant 2 type mitochondrial genome resulted in four clades (Supplementary Fig. 6), while there were three clades in the LV region phylogeny of the Variant 3 type mitochondrial genome (Supplementary Fig. 5), with Clade 2 having a single isolate, RBG5844.

Comparison to earlier studies

Brankovics’s

Comparison of the conserved mitochondrial region phylogeny from the current study with that of Brankovics et al. [23] showed that the two phylogenies were congruent. The isolates from the three clades identified in the Brankovics et al. [23] phylogeny, used here as reference sequences, grouped with the isolates of the respective clades in the conserved region of the mitochondrial genome phylogeny in the current study (Fig. 1).

Whole genome dataset

Whole genome sequence

There were 6800 genes conserved across the genomes of all isolates including the outgroup (99 from this study and 10 from National Center for Biotechnology Information (NCBI GenBank)). These were determined by concatenating the protein sequences of orthologous protein groups created using Basic Local Alignment Search Tool-Protein (BLASTP NCBI) and TRIBE Markov Cluster (MCL) [45].
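
Selecting genes conserved across all genomes from a set of orthologous groups can be sketched as follows. This is an illustration only: the function name, the group and genome identifiers, and the input layout (each group as a list of genome and protein ID pairs) are hypothetical, and requiring exactly one member per genome is an added assumption that the text does not state.

```python
def core_single_copy(groups, genomes):
    """Return names of ortholog groups with exactly one member in every genome."""
    genomes = set(genomes)
    core = []
    for name, members in groups.items():
        # members is a list of (genome, protein_id) pairs.
        seen = [genome for genome, _ in members]
        if set(seen) == genomes and len(seen) == len(genomes):
            core.append(name)
    return sorted(core)

# Toy input with three genomes and three ortholog groups.
genomes = ["g1", "g2", "g3"]
groups = {
    "OG1": [("g1", "p1"), ("g2", "p2"), ("g3", "p3")],                # kept
    "OG2": [("g1", "p4"), ("g2", "p5")],                              # missing g3
    "OG3": [("g1", "p6"), ("g1", "p7"), ("g2", "p8"), ("g3", "p9")],  # duplicated in g1
}
core = core_single_copy(groups, genomes)
print(core)
```

The protein sequences of the retained groups would then be aligned and concatenated per genome before tree building.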

Phylogenetic analysis

The phylogenetic tree built with 6800 genes was used to study the relationship between the isolates from the natural and agroecosystems. The whole genome phylogeny gave better resolution of the isolates and their population structure than the phylogenies from the other two datasets. The whole genome phylogeny formed five well-supported clades, with nodes having a local support value of 1 (on a scale from 0 to 1) and separated by short branch lengths (Fig. 2). Clades 4 and 5 contained a single isolate each (RBG6505 and RBG5714 respectively). There was strong bootstrap support for the clades, with most of the nodes having 100% support. There were many highly supported sub-clades within the three major clades. Clade 1 had three highly supported sub-clades (a, b, c) consisting of 15 isolates and two reference isolates (F. oxysporum f.sp. cucumerinum, Foc011 and Foc013). Clade 2 had four very highly supported sub-clades (a, b, c, d) consisting of 52 isolates and three reference isolates [F. oxysporum f.sp. conglutinans (NRRL54008), F. oxysporum f.sp. raphani (NRRL54005) and F. oxysporum f.sp. vasinfectum (NRRL25433)]. Clade 3 had two single lineages (a) and four highly supported sub-clades (b, c, d, e). There were 30 isolates and three reference isolates [F. oxysporum f.sp. melonis (NRRL26406), F. oxysporum f.sp. lycopersici (Fol4287) and F. oxysporum f.sp. radicis cucumerinum (Forc016)]. Clade 1 isolates consisted mostly of those from the natural ecosystems (NE) and F. oxysporum f. sp. canariensis (CAN), while other clades contained isolates from the agroecosystem. There was no F. oxysporum f.sp. pisi isolate present in Clade 1.

Nuclear gene dataset

Nuclear gene sequences

Eight single copy nuclear genes were concatenated, each with a different number of informative sites (Table 1). Translation elongation factor 3 was the most informative gene, while calmodulin, the shortest gene, had the fewest informative sites.

Table 1 The variability of the individual loci used for nuclear gene dataset phylogenetic analyses and species estimation


Phylogenetic analysis

The phylogenetic analysis using the nuclear gene dataset (concatenated eight nuclear single copy genes) resulted in five well-supported clades. Clades 1, 2 and 3 have highly supported sub-clades. Clades 4 and 5 had a single isolate each (VPRI11409 and RBG5714 respectively). The ML tree topology was identical to the Bayesian inference (BI) tree topology, therefore, only the ML tree is presented (Fig. 3).

Individual analyses of the full sequences of the eight gene regions (tub2, cal, mtSSU, RPB1, RPB2, tef1-α, tef3 and Topoisomerase I (Top1)) showed varying degrees of resolution for the formation of five clades. Apart from cal and tef1-α, all other genes had very high support (bootstrap value > 70%) and resolution for grouping of the isolates. Top1 had high statistical support for the formation of Clade 1 and sub-clades in Clade 3 (a, b, c and d) (Fig. 2). Additionally, tub2 supported the formation of Clade 2b (Fig. 2). mtSSU supported the formation of sub-clade 3e (Fig. 2). Tef3 supported the grouping of sub-clades 2b, 3b, 3c and 3e (Fig. 2). RPB2 provided the best resolution with high statistical support for the formation of five clades, like the nuclear gene dataset phylogeny. The clade support from individual loci is representative of the number of informative sites per locus. RPB2 and tef3 had the highest number of informative sites, hence supported the formation of more clades, and conversely cal and tef1-α had the lowest number of informative sites, hence produced polytomies. Individual locus phylogeny trees are not presented.

Clades 1, 2 and 3 obtained from the phylogenies of the three datasets were congruent except for 3 isolates: RBG5714, VPRI11409 and RBG6505. Isolate RBG5714 present in Clade 5 of the nuclear gene dataset phylogeny is also in Clade 5 of the whole genome phylogeny but is present in Clade 4 of the conserved mitochondrial genome phylogeny. Isolate VPRI11409 present in Clade 4 of the nuclear gene dataset phylogeny is in Clade 3 of the whole genome phylogeny and conserved mitochondrial genome phylogeny. Isolate RBG6505 is present in Clade 2 of the nuclear gene dataset phylogeny, but in Clade 4 as a single isolate in the whole genome phylogeny and Clade 4 with isolate RBG5714 in the conserved mitochondrial genome phylogeny. Clade 1 in all phylogenies almost exclusively (exception of a single isolate, VPRI42181) consisted of isolates from natural ecosystems and F. oxysporum f.sp. canariensis. Isolates from agroecosystems were spread across the other clades.
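
The comparison above amounts to flagging isolates whose clade assignment is not the same in all three phylogenies. The sketch below encodes the assignments reported in the text for the three discordant isolates. The function name, the dataset keys and the placeholder isolate `VPRI00000` (standing in for a concordant isolate) are hypothetical, and treating clade numbers as directly comparable between trees is a simplification.

```python
# Clade assignments as reported in the text for the three discordant isolates.
# "VPRI00000" is a made-up placeholder for an isolate that is concordant.
assignments = {
    "RBG5714":   {"nuclear": 5, "whole_genome": 5, "mito_conserved": 4},
    "VPRI11409": {"nuclear": 4, "whole_genome": 3, "mito_conserved": 3},
    "RBG6505":   {"nuclear": 2, "whole_genome": 4, "mito_conserved": 4},
    "VPRI00000": {"nuclear": 1, "whole_genome": 1, "mito_conserved": 1},
}

def discordant(assignments):
    """Return isolates whose clade number differs between any two datasets."""
    return sorted(
        isolate
        for isolate, clades in assignments.items()
        if len(set(clades.values())) > 1
    )

flagged = discordant(assignments)
print(flagged)
```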

Comparison to earlier studies

Lombard’s, Brankovics’s, Laurence’s and O’Donnell’s datasets

The diversity of the complex was studied by comparing the clades in the current study with previously published phylogenies. The combined dataset from the current study and Lombard et al. [19] produced a phylogenetic tree with two clades (Supplementary Fig. 7). One clade consisted of a single isolate, and the rest were in the other clade. There were many sub-clades within this clade but with poor node support. The isolates from the current study grouped with isolates belonging to seven of the 21 ‘species’ from their study (Table 2). These ‘species’ were F. odoratissimum, F. nirenbergiae, F. contaminatum, F. languescens, F. triseptatum, F. oxysporum and F. hoodiae.

Table 2 Comparison of the clades from Lombard’s dataset phylogeny to the clades in the current study phylogenies


The isolates representing the three clades in the Brankovics et al. [48] study grouped with the isolates from the respective clades in the current study (Table 3), suggesting concordance between the phylogenetic trees obtained in their study and those in the current study.

Table 3 Summary of the relationship of the clades from the nuclear gene dataset from the current study to the previous studies


The combined dataset from the current study and Laurence et al. [30] produced a phylogenetic tree with two clades having many sub-clades (Supplementary Fig. 8). All Clade 1 isolates from the current study grouped with their isolates from Clade 1 (Table 3), which had been determined by GCPSR analysis to be phylogenetic ‘species’ 1. The remaining isolates from the current study grouped with their isolates from Clades 2–5 (Table 3), which were determined by GCPSR analysis to be phylogenetic ‘species’ 2.

The isolates representing the three clades from O’Donnell et al. [16] grouped with the isolates from the three clades in the current study (Table 3). None of the reference isolates from their study grouped with the isolates from Clades 4 and 5 from the current study.

Species tree estimation

Applying the MSC model to the eight single copy nuclear genes (tub2, cal, RPB1, RPB2, tef1-α, tef3, top1 and mtSSU) resulted in the recognition of three species within the FOSC. The three concordant Clades (1, 2, and 3) in all the phylogenetic analyses (Figs. 1, 2, 3) were delimited as separate species with 100% posterior probability support at the nodes (Fig. 4). Species 1 in the tree contained isolates from Clade 1, species 2 contained isolates from Clade 3 and species 3 contained isolates from Clade 2. Species 2 (Clade 3) includes the neotype for F. oxysporum, hence is the ‘true’ F. oxysporum species. The isolates RBG6505, RBG5714 and VPRI11409, which formed their own clades or grouped together in a clade in the three phylogenetic analyses, were determined to belong to species 2. Four clades from the conserved mitochondrial genome phylogeny, five clades from the nuclear gene dataset phylogeny and whole genome phylogeny, and seven sub-clades within Clades 1, 2 and 3 from the whole genome phylogeny were all tested as potential species, but the model gave very poor posterior probability support for the presence of four, five and seven species within the FOSC (data not shown).

Comparison to earlier studies

Analyses of the datasets from previous studies, using the MSC model, showed that the results from the MSC model were concordant with the GCPSR, which was used for species delimitation in these studies. There was 100% posterior probability support for 2 ‘species’ within the FOSC from the Laurence et al. [30] dataset (not presented). There was also 100% posterior probability support for the presence of 3 ‘species’ within the FOSC from the Brankovics et al. [23] dataset (not presented).

Testing the Lombard et al. [19] dataset using the MSC model resulted in some well-supported nodes, depicted by species numbered 21, 17 and 9 (Supplementary Fig. 9). These were F. veterinarium, F. oxysporum and F. foetens respectively. F. foetens was used as an outgroup in their phylogenetic analyses. Species 17 contained the neotype for F. oxysporum isolated from potato tuber (Solanum tuberosum), so represents the ‘true’ F. oxysporum. There was support for the presence of 2 ‘species’ but not enough information within their dataset for supporting the other 19 ‘species’ that were identified in their study.

Discussion

The aim of this project was to use whole genome, mitochondrial genome and nuclear (multi-locus) gene sequences to understand the phylogenetic relationship between Australian isolates of the F. oxysporum species complex and to group these isolates into well supported lineages, i.e. ‘species’. With over 100 well characterised ff. spp. [14] of F. oxysporum, Snyder and Hansen’s [49] definition of F. oxysporum has proven to be too broad to handle variability within the F. oxysporum population. The variability is not even reflected in the f. sp. concept as this naming concept is not a taxonomic entity.

Many studies have been carried out to understand the FOSC phylogeny using combinations of nuclear and mitochondrial barcoding gene regions [16, 18, 19, 25, 30] or genes [23] and effector genes [48, 50]. However, this is the first report of the FOSC phylogeny using the whole genome. Gene alignment and tree construction were based on 6800 genes. Whole genome phylogeny has provided a very robust phylogenetic framework, dividing the FOSC into five very well supported clades. The genome phylogeny has provided more resolution within the sub-clades providing more evolutionary information and a comprehensive population structure within these clades and the sub-clades.

Mitochondrial genome analyses revealed that there is recombination in the LV region of the mitochondrial genome which has resulted in three variant mitochondrial genome types present in Australia. Although F. oxysporum has an asexual lifecycle, recombination in the mitochondrial genome suggests that there is some mechanism which is allowing genetic exchange between the isolates. According to Brankovics et al. [23] this could be due to parasexualism. Variant 1 type mitochondrial genome was the most common (Supplementary Table S3), a finding similar to that of Brankovics et al. [23]. This finding has been supported by Xu et al. [51] who have reported that in ascomycetes there is no genetic factor that ensures uniparental mtDNA inheritance. There was no distinct pattern in the grouping of the variant types within the conserved region of the mitochondrial genome phylogeny. The three variant mitochondrial genome types were present in all three clades. The phylogenetic tree of Variant 2 mitochondrial genome type (Supplementary Fig. 6) is congruent with the conserved mitochondrial region phylogeny suggesting co-evolution of the conserved mitochondrial genome region and the LV region in these isolates. The conserved region of the mitochondrial genome phylogeny (Fig. 1) has four well-supported clades while in Brankovics et al. [23], there were only 3 clades. The nuclear gene dataset supported the formation of five well-supported clades (Fig. 4). Most isolates were in Clades 1, 2 and 3, with Clades 4 and 5 each made up of a single isolate. This shows that there is considerable genetic diversity in Australian F. oxysporum and this is in agreement with Laurence et al. [18] who had five clades within the FOSC from natural ecosystems when using the same gene barcoding region as O’Donnell et al. [16], whose isolates only grouped into three clades. Apart from Lombard et al. [19] and Laurence et al. [18] dataset, all other studies had three clades in the phylogenetic analyses. 
Greater genetic diversity within the Australian FOSC and Lombard et al. [19] datasets could be due to convergent evolution of isolates with their hosts [52]. Unique agricultural and ecological settings provide conditions conducive to the evolution of the pathogen. Another explanation for the higher diversity in an asexual pathogen is its ability to transfer genes horizontally [53].

Different genes produce different phylogenetic tree topologies because of differences in their evolutionary history. Three basic sources of topological variation are mutation, lineage sorting and phylogenetic reconstruction artifacts [54]. Mutation and lineage sorting are natural sources of variation between genes. Mutation occurs stochastically and is more prevalent in short genes, preventing these genes from truly reflecting the phylogeny of the species or isolates that carry them [54]. Lineage sorting occurs when diverging lineages randomly retain ancestral polymorphisms of a gene. Base-compositional bias, saturation of substitutions and artificial grouping of the most rapidly evolving lineages are some of the phylogenetic reconstruction artifacts that can also cause topological variation in phylogenetic trees. The fact that previous studies, except for Brankovics et al. [23], used different gene regions may explain why the phylogenies are incongruent. Furthermore, these studies used only the barcoding regions of the genes, reducing the number of parsimony-informative sites available to produce a comprehensive evolutionary story. The phylogenies of Brankovics et al. [23] and the current study shared six whole gene sequences (tub2, cal, tef1-α, tef3, RPB2 and top1) and a conserved region of the mitochondrial genome, suggesting that the evolutionary history of these combined genes was similar for the isolates in both studies, resulting in concordant clades.
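Topological disagreement between gene trees can be made concrete by comparing the sets of clades that each tree implies. The following is a minimal sketch and not part of the analyses in this study: it represents rooted trees as nested tuples and counts the clades found in one tree but not the other (a rooted Robinson-Foulds-style distance). The isolate labels A to D and both toy gene trees are hypothetical.

```python
def clades(tree):
    """Return the non-trivial clades (frozensets of leaf names) of a rooted tree.

    A tree is either a leaf name (str) or a tuple of subtrees.
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    all_leaves = leaves(tree)
    found.discard(all_leaves)  # the root clade is shared by every tree on these leaves
    return found


def clade_distance(tree_a, tree_b):
    """Number of clades present in exactly one of the two trees."""
    return len(clades(tree_a) ^ clades(tree_b))


# Hypothetical gene trees for four isolates (A-D)
gene_tree_1 = ((("A", "B"), "C"), "D")
gene_tree_2 = ((("A", "C"), "B"), "D")

print(clade_distance(gene_tree_1, gene_tree_1))  # 0: identical topologies
print(clade_distance(gene_tree_1, gene_tree_2))  # 2: {A, B} versus {A, C}
```

A distance of zero means the two genes support the same groupings; larger values indicate more conflicting clades.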

The diversity in the dataset could be another reason why the findings of the current study differ from those of Lombard et al. [19]. Their dataset included strains that cause disease in humans and animals. The genes in these isolates may be subject to different selective pressures from those acting on isolates from agricultural and natural ecosystems. These selective pressures may induce different mutation rates in these strains, creating a different evolutionary history, and this may be one of the reasons why their study recovered many clades or groups that were absent from the current study and earlier studies.

In summary, Clades 1, 2 and 3 obtained from the three datasets were congruent, suggesting a similar evolutionary pattern for the nuclear and mitochondrial genes of the isolates in these clades. Three isolates (RBG6505, RBG5714 and VPRI11409) made the other clades incongruent. Isolates RBG6505 and RBG5714 were grouped in Clade 4 of the conserved region of the mitochondrial genome phylogeny but were separate in the whole genome phylogeny. In the nuclear gene phylogeny, RBG5714 was in Clade 5 and VPRI11409 was in Clade 4.

In the whole genome and nuclear gene phylogenies, Clade 1 diverged from the other clades early, while the other clades separated from each other more recently. Clade 1 has been hypothesised to be the ancestral clade of the FOSC, originating in South East Asia, because of the association of Clade 1 ff. sp. with hosts that evolved in that region [16]. This hypothesis was based on the initial study of the species complex. With more sampling of the complex, Clade 1 still appears to be an ancestral clade, but it does not seem to have originated in South East Asia, as it also contains isolates associated with hosts that originated in other regions. Clade 1 in the current study contained isolates from natural ecosystems, F. oxysporum f.sp. canariensis isolates and a single isolate from a symptomatic tomato seedling.

Accurate establishment of species boundaries and delimitation of species is critical to taxonomy and informs biosecurity and disease control. Concatenation analyses of multi-locus DNA sequence data represent a powerful and commonly used approach to understanding independent evolutionary lineages and phylogenetic relationships between isolates. Such data produce well-supported phylogenies which in many instances are inconsistent with the true species tree [35, 55]. Discordance between individual gene trees can be masked if well-supported clades are recognised as distinct species without a careful examination of species boundaries [42]. It is not necessary for every population or lineage in a phylogenetic analysis to be recognised as a species [56].

GCPSR has previously been used to resolve the species complex in F. oxysporum. This concept uses discordance between the nodes to find the species boundaries. Discordance arises due to recombination between the genes. Previous studies on resolving the complex identified two ‘species’ [30], three ‘species’ [23] and 21 ‘species’ [19]. Lombard et al. [19] and Laurence et al. [30] concatenated four and eight barcoding gene regions, respectively, while Brankovics et al. [23] concatenated nine genes and the conserved region of the mitochondrial genome. In the GCPSR study of Laurence et al. [30], Clade 1 from an earlier study [18] was resolved as phylogenetic ‘species’ 1, while Clades 2–5 were placed in phylogenetic ‘species’ 2. In the study of Brankovics et al. [23], the three clades were recognised as three ‘species’, while in the study of Lombard et al. [19], every lineage from the concatenated multi-locus phylogeny was recognised as a ‘species’.

The current study used a different approach, the MSC model for species delimitation, to recognise the species boundaries within the complex. This is a relatively new and arguably successful approach to phylogenomics whereby the evolutionary history of multilocus sequences is explained through gene trees and species trees [57], and gene trees are estimated simultaneously with the species tree for estimating phylogenetic relationships. This model has been used successfully to identify the number of species in the Alternaria alternata [38] and Colletotrichum [42] species complexes. As far as we are aware, this is the first time this model has been used to unravel the species complex in F. oxysporum. The model provided very strong support for the three concordant clades from the three datasets in the current study representing three ‘species’ within the FOSC (Fig. 4). These species are concordant with the three ‘species’ identified by Brankovics et al. [23] using GCPSR. This concordance between the two datasets, which have six genes in common, implies that both models produce similar results. The model, however, rejected the presence of four, five and seven ‘species’, which were based on the number of clades in the conserved mitochondrial genome phylogeny, the nuclear gene dataset and whole genome phylogeny, and the sub-clades of the whole genome phylogeny, respectively. Testing the two-‘species’ and three-‘species’ hypotheses on the datasets of Laurence et al. [30] and Brankovics et al. [23] using the MSC model produced high support for two ‘species’ and three ‘species’, respectively, giving the same prediction as GCPSR. Analysis of the Lombard et al. [19] dataset resulted in very well-supported nodes for only two ‘species’: F. veterinarium and F. oxysporum (Supplementary Fig. 9). There was also high node support for Fusarium foetens, which was the outgroup.
Their dataset did not contain enough information to delimit the complex into a further nineteen species with high node support. A similar finding was reported by Liu et al. [42], whereby the concatenated multi-locus analysis was not supported by the coalescent-based analysis. The potential source of discordance between the two may be incomplete lineage sorting [35] of some genes in some isolates, which were then delineated as separate lineages. This suggests that species might be overestimated if all well-supported clades from a phylogenetic analysis of single or multi-locus DNA sequences on a small sample dataset are accepted as distinct species. The species named as F. oxysporum contains the neotype for F. oxysporum (F. oxysporum isolated from a rotten potato tuber from Germany) and corresponds to a sub-clade in Clade 3 in the current study (Fig. 2 and Supplementary Fig. 7). Clade 3 is represented as ‘species’ 2 in the current study and this may be called F. oxysporum. Clades 2 and 3 (‘species’ 3 and 2, respectively) contained isolates obtained from both agricultural and natural ecosystems. Clade 1 (‘species’ 1) contained only isolates from natural ecosystems (soil substrate) and F. oxysporum f.sp. canariensis, except for VPRI42181, which was isolated from Lycopersicon esculentum (tomato).
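To illustrate why incomplete lineage sorting can make individual gene trees disagree with the species tree, the sketch below uses the standard three-taxon result from coalescent theory: if the internal branch of the species tree has length T in coalescent units, a gene tree differs in topology from the species tree with probability (2/3)·exp(−T). This is a textbook illustration rather than the MSC analysis performed in this study; the function name and the example branch lengths are assumptions chosen for demonstration.

```python
import math


def discordance_probability(internal_branch_length):
    """P(gene tree topology differs from the species tree) for three taxa.

    internal_branch_length is measured in coalescent units and must be >= 0.
    """
    if internal_branch_length < 0:
        raise ValueError("branch length must be non-negative")
    return (2.0 / 3.0) * math.exp(-internal_branch_length)


# Illustrative branch lengths only (not estimated from the study's data)
for t in (0.1, 1.0, 5.0):
    print(f"T = {t}: P(discordant gene tree) = {discordance_probability(t):.3f}")
```

Short internal branches, as expected for recently separated lineages, give discordance probabilities approaching 2/3, which is why well-supported clades from a few loci need not correspond to distinct species.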

Conclusion

The phylogenetic analyses of the FOSC have demonstrated considerable genetic diversity. Despite this, the current study has shown that three clades were congruent in nearly all of the FOSC phylogeny studies. The three datasets used in the current study showed four and five clades within the FOSC, but three clades were concordant in all the phylogenies. Three ‘species’ were identified within the complex using the MSC model, and these three ‘species’ represent the three concordant clades. This result is concordant with the GCPSR study of Brankovics et al. [23]. Clade 3, which is ‘species’ 2, contains the neotype for F. oxysporum; hence Clade 3 may be called F. oxysporum.

Methods and materials

Isolates used

Whole genomes were sequenced for a total of 99 isolates obtained from the Victorian Plant Pathogen Herbarium (VPRI) and the Royal Botanic Gardens (RBG) Sydney collections. Fifty-two of these isolates were obtained from VPRI, of which 36 were isolated from symptomatic plants sent for diagnosis between 1976 and 2018 (Supplementary Table 2). These isolates were confirmed as F. oxysporum using a polymerase chain reaction (PCR)-based assay with the RPB1-Fa and RPB1-G2R primers; the amplicons were sequenced and compared against GenBank using BLAST.

The remaining 16 VPRI isolates were obtained from diseased Canary Island Palm samples sent for diagnosis and were confirmed as F. oxysporum f. sp. canariensis (Foc) using a PCR-based assay with the Foc-specific primers HK66 + HK67 [58]. Forty-seven isolates were obtained from RBG. Thirteen of these were isolated from natural ecosystem soil (NE) by Laurence et al. [18], two were characterised as F. oxysporum f. sp. niveum, and 32 were Fusarium oxysporum f.sp. pisi (Fop) isolates collected between 2005 and 2009 during field surveys in Victoria, New South Wales and Queensland from symptomatic Pisum sativum (snow pea) plants [59]. Details of these isolates are presented in Supplementary Table 3.
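The isolate numbers given above can be cross-checked with a short tally. The sketch below uses only the counts stated in the text; the dictionary keys are descriptive labels introduced here for illustration.

```python
# Isolate counts as stated in the text, grouped by source collection.
isolates = {
    "VPRI": {
        "symptomatic plants sent for diagnosis": 36,
        "Foc from Canary Island Palm": 16,
    },
    "RBG": {
        "natural ecosystem soil (NE)": 13,
        "f. sp. niveum": 2,
        "Fop from snow pea": 32,
    },
}

collection_totals = {name: sum(groups.values()) for name, groups in isolates.items()}
total = sum(collection_totals.values())

print(collection_totals)  # {'VPRI': 52, 'RBG': 47}
print(total)              # 99
```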

Different sets of reference sequences were used for different phylogenetic analyses. For whole genome phylogenetic analysis, whole genome reference sequences of F. oxysporum f. sp. vasinfectum (NRRL25433), F. oxysporum f. sp. raphani (NRRL54005), F. oxysporum f. sp. conglutinans (NRRL54008), F. oxysporum f. sp. melonis (NRRL26406), F. oxysporum f. sp. radicis cucumerinum (Forc016), F. oxysporum f. sp. lycopersici (Fol4287), two F. oxysporum f. sp. cucumerinum strains (Foc011, Foc013) and two F. proliferatum strains (ITEM2341, NRRL62905) were included from NCBI GenBank.

For nuclear gene phylogenetic analysis, sequences of two F. oxysporum f. sp. cucumerinum strains (Foc011, Foc016), F. oxysporum f. sp. vasinfectum (NRRL25433), F. oxysporum f. sp. melonis (NRRL26406), F. oxysporum f. sp. raphani (NRRL54005) and F. proliferatum (ITEM2400) were retrieved from the European Nucleotide Archive (ENA) (https://www.ebi.ac.uk/ena).

For mitochondrial genome analysis, the reference sequences of F. oxysporum f. sp. cucumerinum (Foc001), F. oxysporum f. sp. cumini (F11), F. oxysporum f. sp. dianthi (Fod001), F. oxysporum f. sp. lycopersici (DF041), F. oxysporum f. sp. niveum (Fon020), F. oxysporum f. sp. pisi (NRRL37622), F. oxysporum f. sp. radicis cucumerinum (Forc016), F. oxysporum f. sp. raphani (NRRL54005), F. oxysporum f. sp. vasinfectum (NRRL25433), F. oxysporum f. sp. radicis lycopersici (NRRL26381) and F. proliferatum (ITEM2287) were included from ENA (https://www.ebi.ac.uk/ena).

Culture growth and DNA extraction

All cultures were single spored using the method described by Burgess et al. [60]. Working cultures were maintained on silica gel beads using the procedure described by Leslie et al. [61]. For DNA extraction, cultures were grown on Potato Dextrose Agar (PDA; Difco Laboratories, Detroit) in the dark for 5 days at 25 °C. Two plugs from vigorously growing regions of the culture were cut out, placed in an Eppendorf tube and ground using a sterile micro-pestle before being transferred into a Falcon tube containing 45 ml of Potato Dextrose Broth (PDB; Difco Laboratories, Detroit). These tubes were placed on a Ratek orbital shaker/mixer and gently shaken at 7 rpm in the dark for 5 days. The resultant mycelia were harvested by filtering one isolate at a time through Miracloth™ (CalBiochem, San Diego, CA) with a pore size of 22–25 μm. The mycelia were then washed with sterile water and lyophilised for 48 h. Genomic DNA was extracted using the cetyltrimethylammonium bromide (CTAB) protocol described by O’Donnell et al. [62].

Sequencing

Both long-read and short-read sequencing technologies were used to sequence the 99 isolates in the current study. MinION was used for long-read sequencing and the Illumina platform for short-read sequencing.

a) MinION

MinION is a long-read DNA sequencer developed by Oxford Nanopore. Assemblies generated from MinION sequence data were used to guide assembly of the data from the Illumina platform. Eight isolates (two from NE, one Foc, five Fop) were sequenced on MinION flow cells using the library kit SQK-LSK109 and the 1D Genomic DNA by Ligation protocol. Six of these isolates (RBG6477, RBG6423, RBG6462, RBG5783, RBG6313 and VPRI42117) were sequenced on the old flow cell type, FLO-MIN106, while the other two isolates (RBG6418, RBG6425) were sequenced on the newly released flow cell version ‘Rev D’ ASIC type, FLO-MIN106D. Some modifications were made to the library preparation protocol. The fragmentation step was omitted because longer reads were needed, and the starting material was increased from 1 μg to 2 μg of genomic DNA in order to have at least 1 μg of the finished library for loading onto the FLO-MIN106 flow cell. A single library was loaded per FLO-MIN106 flow cell, while two libraries were loaded onto a single FLO-MIN106D flow cell for increased sequencing efficiency.

The raw signal data (fast5 files) from sequencing were basecalled into DNA sequence data in fastq format using Albacore v 2.3.1 (https://nanoporetech.com). Albacore was also used to categorise the basecalled reads based on the average quality score and only reads with average quality score of >Q7 were saved for genome assembly. Adapter sequences were then trimmed from the reads using Porechop v 0.2.1 (https://github.com/rrwick/Porechop) and these reads were saved in fastq format.
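The read-level quality filter described above can be sketched as follows. This is an illustration only, not the Albacore implementation: it assumes standard Phred+33 FASTQ encoding and computes the average quality of a read from its mean per-base error probability, which is an assumption about how the average is taken. The function names and the two example reads are hypothetical.

```python
import math


def mean_quality(quality_string, offset=33):
    """Average Phred score of a read, derived from the mean error probability."""
    error_probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality_string]
    return -10 * math.log10(sum(error_probs) / len(error_probs))


def filter_fastq(lines, min_quality=7.0):
    """Yield (header, sequence, quality) for reads with average quality > min_quality.

    Assumes well-formed four-line FASTQ records.
    """
    records = [line.rstrip("\n") for line in lines]
    for i in range(0, len(records) - 3, 4):
        header, seq, _, qual = records[i:i + 4]
        if qual and mean_quality(qual) > min_quality:
            yield header, seq, qual


# Two made-up reads: '5' encodes Q20 and '$' encodes Q3 in Phred+33
example = [
    "@read_good", "ACGT", "+", "5555",
    "@read_poor", "ACGT", "+", "$$$$",
]
kept = list(filter_fastq(example))
print([header for header, _, _ in kept])  # ['@read_good']
```

Averaging error probabilities rather than raw Phred scores penalises reads with a few very poor bases more strongly, which is one reason the two ways of averaging can give different results near the threshold.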

b) Illumina

Paired-end libraries were prepared for 99 isolates using the Illumina Nextera XT DNA library prep kit according to the manufacturer’s protocols (Illumina). These libraries were sequenced using Illumina HiSeq Rapid. Fastq sequence files generated from the sequencing run were filtered using nuclear software v3.6.16 (GYDLE Inc., Montreal, Canada), filtering sequences based on a minimum length of 50 bp and removal of adaptors. Low-quality reads (<Q20) from the fastq sequence files were filtered using fastp [63].

Data assembly and analysis

Phylogenetic analyses were carried out using three datasets: mitochondrial genome (conserved region of the mitochondrial genome), whole genome and nuclear gene dataset (concatenated eight nuclear genes). Different procedures were used for preparing and analysing these datasets.
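As an illustration of what a concatenated nuclear gene dataset looks like, the sketch below joins per-gene alignments into a single supermatrix and pads isolates that lack a gene with gap characters. It is a minimal example under stated assumptions, not the procedure used in this study; the function name, isolate labels and sequences are hypothetical.

```python
def concatenate_alignments(gene_alignments):
    """Concatenate per-gene alignments into one supermatrix.

    gene_alignments: list of dicts mapping isolate name -> aligned sequence.
    Isolates missing from a gene are padded with '-' for that gene's length.
    """
    isolates = sorted(set().union(*gene_alignments))
    supermatrix = {name: [] for name in isolates}
    for alignment in gene_alignments:
        lengths = {len(seq) for seq in alignment.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one gene alignment must have equal length")
        gene_length = lengths.pop()
        for name in isolates:
            supermatrix[name].append(alignment.get(name, "-" * gene_length))
    return {name: "".join(parts) for name, parts in supermatrix.items()}


# Hypothetical alignments for two genes and three isolates
gene_1 = {"iso1": "ACGT", "iso2": "ACGA", "iso3": "ACTT"}
gene_2 = {"iso1": "GG-C", "iso3": "GGAC"}  # iso2 lacks this gene

concatenated = concatenate_alignments([gene_1, gene_2])
for name, seq in concatenated.items():
    print(name, seq)
```

Every row of the resulting matrix has the same length, which is what tree-building programs expect from a concatenated alignment.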