Genome mining and UHPLC–QTOF–MS/MS illuminate the potential antimicrobial active compounds and specificity of biosynthetic gene clusters in Bacillus subtilis NCD-2

Background Bacillus subtilis strain NCD-2 is an excellent biocontrol agent against plant soil-borne diseases and shows broad-spectrum antifungal activities. This study aimed to explore some of the secondary metabolite biosynthetic gene clusters and related bioactive compounds in strain NCD-2. An integrative approach, which combined genome mining with structural identification technologies using ultra-high-performance liquid chromatography coupled to quadrupole time-of-flight tandem mass spectrometry (UHPLC-QTOF-MS/MS), was conducted to interpret the chemical origins of the significant biological activities in strain NCD-2. Results Genome mining revealed that strain NCD-2 contained nine gene clusters having predicted functions involving secondary metabolites with bioactive abilities. They encoded six known products (fengycin, surfactin, bacillaene, subtilosin, bacillibactin, and bacilysin) and three unknown products. Fengycin, surfactin, bacillaene and bacillibactin were successfully detected from the fermentation broth of strain NCD-2 by UHPLC-QTOF-MS/MS. The biosynthetic gene clusters for bacillaene, subtilosin, bacillibactin, and bacilysin showed 100% amino acid sequence similarity with those of B. velezensis strain FZB42; however, the biosynthetic gene clusters for surfactin and fengycin showed 83% and 92% similarity, respectively. Further comparison of the gene clusters encoding fengycin and surfactin revealed that strain NCD-2 had lost the fenC and fenD genes in the fengycin biosynthetic operon. Moreover, the surfactin biosynthetic gene srfAB was split into two parts. Bioinformatics analysis predicted that FenE in strain NCD-2 had the same function as FenE and FenC in strain FZB42, and that FenA in strain NCD-2 had the same function as FenA and FenD in strain FZB42. Five kinds of fengycin, with 26 homologs, and surfactin, with 4 homologs, were detected from strain NCD-2.
To the best of our knowledge, this is the first report of a non-typical and unique gene cluster related to fengycin synthesis. Conclusions Genome mining and UHPLC–QTOF–MS/MS showed that the genome of strain NCD-2 contains many gene clusters encoding antimicrobial compounds and that the fengycin synthetic gene cluster might be unique. The production of fengycin, surfactin, bacillaene and bacillibactin might explain the biological activities of strain NCD-2.

domain; T domain: thiolation domain; Te: thioesterase domain; E domain: epimerization domain; N90: the minimum contig length to cover 90 percent of the genome; PDA: potato dextrose agar; BGC: biosynthetic gene cluster; FPLC: fast protein liquid chromatography; m/z: mass-to-charge ratio; TFA: trifluoroacetic acid; β-OH FA: β-hydroxy fatty acid.

The traditional method of screening for new active products is based on testing for biological activity. However, this method is time-consuming, and the same products have been repeatedly discovered [22]. Thus, the discovery of natural products has encountered a bottleneck [23], and a more rapid and effective screening strategy to detect new secondary metabolites is needed [24,25]. Genome mining is a technology that uses modern bioinformatics to recognize specific functional genes or gene clusters from genome sequences [26]. With the rapid development of gene sequencing technology and the decreasing cost of genome sequencing, increasing numbers of microbial genome sequences have been determined [27]. Therefore, genome mining has become a more accurate and efficient screening strategy for discovering new metabolites [26].
B. subtilis strain NCD-2 is a promising biological control agent against plant soil-borne diseases that produces the lipopeptides fengycin and surfactin [28]. Fengycin has antifungal activity, and surfactin facilitates the root colonization ability of strain NCD-2. Both fengycin and surfactin play important roles in strain NCD-2's ability to suppress plant soil-borne diseases [29]. The purpose of this study was to identify potential secondary metabolites in strain NCD-2 using genome mining. Then, bioinformatics analysis was conducted to reveal the differences between the gene clusters for these secondary metabolites in strain NCD-2 and in the reference strain B. velezensis FZB42. Finally, ultra-high-performance liquid chromatography coupled to quadrupole time-of-flight tandem mass spectrometry (UHPLC-QTOF-MS/MS) was used to identify the potential secondary metabolites produced by strain NCD-2.

Results
Genomic features of strain NCD-2 A total of 501,671,500 paired-end reads and 5,016,715 clean single reads (412-bp library; paired-ends of 75 bp) were assembled using the software Velvet [30]. The genome of B. subtilis NCD-2 contained 189 contigs (>133 bp; N90, 16,187) of 4,644,322 bp, with an average G+C content of 43.5%. The final assembled genome comprised 4,444 genes, including 4,329 protein-coding genes (418 signal peptide-coding genes), 83 tRNA genes for all 20 amino acids, 30 rRNA genes, and 2 CRISPR repeat genes. A total of nine putative gene clusters responsible for antimicrobial metabolite biosynthesis were identified. These gene clusters included PKS and NRPS genes (Fig. 1).
The taxonomic status of strain NCD-2 At present, 272 B. subtilis genome sequences have been deposited in the GenBank database, including 113 whole and 159 incomplete genome sequences. The genome sizes of the 272 B. subtilis strains ranged from 2.68 Mb to 5.35 Mb, and the GC contents ranged from 42.9% to 46.6%. These genome sequences were downloaded from the GenBank database, and their accession numbers are listed (Additional file 1, Table S1). To analyze the evolution of different B. subtilis strains, a phylogenetic tree was constructed based on the complete genome sequences. The 272 strains of B. subtilis were divided into four subspecies, subtilis, inaquosorum, spizizenii, and stercoris, because they produce different bioactive secondary metabolites [31]. As shown in Fig. 2, strain NCD-2 (represented by the black bar) clustered together with B. subtilis strain UD1022 and was closely related to B. subtilis strains XF-1, BAB-1, HJ5, SX01705, and BSD-2.
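The assembly statistics quoted above (N90 and G+C content) can be reproduced from a list of contigs. The following is a minimal sketch in Python, assuming the abbreviation list's definition of N90 (the minimum contig length needed to cover 90 percent of the assembly); the function names and the toy input are illustrative assumptions, not part of the original analysis.

```python
def nx(contig_lengths, percent=90):
    """Length of the contig at which the cumulative length of contigs,
    sorted from longest to shortest, first reaches `percent` of the total."""
    if not contig_lengths:
        raise ValueError("no contigs given")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= percent * total:
            return length


def gc_content(sequence):
    """Percentage of G and C among the unambiguous bases A, C, G and T."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return 100.0 * gc / (gc + at)


# Toy contig lengths (hypothetical values, not the NCD-2 assembly).
toy_lengths = [50, 30, 15, 5]
print(nx(toy_lengths, 90))   # 15: 50 + 30 + 15 = 95 of 100 bases
print(round(gc_content("ATGCGC"), 1))
```

Applied to the real contig set, `nx(lengths, 90)` would be expected to return the reported N90 only if the same contig length cutoff (>133 bp) is used.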
Secondary metabolite biosynthetic gene clusters in strain NCD-2 The secondary metabolite biosynthetic gene clusters in the genome of strain NCD-2 were predicted using the online tool antiSMASH [32]. In total, nine secondary metabolic gene clusters were identified in the NCD-2 genome sequences (Table 1), including three NRPS, two terpenes, one hybrid NRPS-TransAT PKS-Other KS, one type III polyketide, one sactipeptide-head to tail gene cluster, and a gene cluster with an unknown function. The structural compositions of the gene clusters are shown in Fig. 3. These clusters were composed of core biosynthetic, additional biosynthetic, transport-related, regulatory, and other genes. Among these nine gene clusters, clusters 3, 7, 8, and 9 had 100% amino acid sequence homology with known gene clusters that synthesize bacillaene, bacillibactin, subtilosin, and bacilysin, respectively (Table 1). Gene cluster 1 showed 82% amino acid similarity with a surfactin synthetase gene cluster, and gene cluster 4 showed 93% amino acid similarity with a fengycin biosynthetic gene cluster in B. velezensis strain FZB42. However, gene clusters 2, 5, and 6 did not match any known gene clusters. Clusters 1 and 4 of strain NCD-2 were further compared with those of the model strain 168 and of B. subtilis strains closely related phylogenetically to strain NCD-2. The putative fengycin biosynthetic gene cluster of strain NCD-2 contained three genes, fenEAB, whereas those of the other strains contained five genes, fenCDEAB (Additional file 1, Fig. S1). In the 11 strains, SrfAB was produced by the typical transcription and translation of a single srfAB gene. In strain NCD-2, however, the equivalent of SrfAB was potentially assembled from Gms0366 and Gms0367, which were transcribed and translated separately from gms0366 and gms0367 (Additional file 1, Fig. S2).
Therefore, we hypothesized that the structures and functions of fengycin and surfactin from strain NCD-2 may be different from those of the other B. subtilis strains.
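The cluster-level findings in this section can be held in a small table and queried. The sketch below, in Python, records only the cluster numbers, products and similarity values stated in the paragraph above (82% and 93% for clusters 1 and 4); the `Cluster` class and helper names are illustrative assumptions and do not reproduce Table 1 or antiSMASH output.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cluster:
    number: int
    product: Optional[str]     # None when no known cluster matched
    similarity: Optional[int]  # percent amino acid similarity from the text


CLUSTERS = [
    Cluster(1, "surfactin", 82),
    Cluster(2, None, None),
    Cluster(3, "bacillaene", 100),
    Cluster(4, "fengycin", 93),
    Cluster(5, None, None),
    Cluster(6, None, None),
    Cluster(7, "bacillibactin", 100),
    Cluster(8, "subtilosin", 100),
    Cluster(9, "bacilysin", 100),
]


def divergent(clusters, threshold=100):
    """Cluster numbers that matched a known product below `threshold` percent."""
    return [c.number for c in clusters
            if c.similarity is not None and c.similarity < threshold]


def unmatched(clusters):
    """Cluster numbers with no match to a known gene cluster."""
    return [c.number for c in clusters if c.product is None]


print(divergent(CLUSTERS))  # [1, 4]
print(unmatched(CLUSTERS))  # [2, 5, 6]
```

The two divergent clusters, 1 (surfactin) and 4 (fengycin), are the ones examined further in the following section.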
Specificity of surfactin and fengycin synthetase gene clusters in B. subtilis NCD-2 The surfactin biosynthetic gene cluster in strain NCD-2 was analyzed using PRISM, and the core genes were selected for a PKS/NRPS analysis. This gene cluster contained four genes: gms0365, gms0366, gms0367, and gms0368. Gms0365 had the same conserved domain architecture, CATCATCATe, as SrfAA in strain FZB42, in which C, A, T, and Te represent the condensation, adenylation, thiolation, and thioesterase domains, respectively (Fig. 4a). Compared with SrfAB in strain FZB42, Gms0366 in strain NCD-2 had lost the T and E domains, but the amino acid residues forming the binding pockets of Gms0366 were exactly the same as those of SrfAB. The residues of the different adenylation domains A6 and A2 from the enzymes Gms0365 and Gms0366, respectively, were exactly the same, and both bound the amino acid leucine. Gms0367 had only T and E domains, with no specific substrate-binding domain. The superposition of the Gms0367 and Gms0366 domains formed a complete SrfAB. The T domain was reversed between Gms0367 and Gms0368. The domains of Gms0368 were CATe, in which the thioesterase domain released linear peptide chains. The domains of Gms0368 were exactly the same as those of SrfAC, but the amino acid residues forming the binding pockets were not completely conserved. The residue sequence was DAF-LGCV, compared with DAFXLGCV in strain FZB42, revealing a difference of one residue.
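The domain strings and binding-pocket codes discussed above lend themselves to simple string handling. The following Python sketch splits an architecture string such as CATCATCATe into its domains and counts the positions at which two binding-pocket codes differ. The string "TE" for Gms0367 is our shorthand for "T and E domains only", and the function names are illustrative assumptions; this is not the PRISM analysis itself.

```python
import re

# "Te" (thioesterase) must be tried before the single letters C, A, T, E.
DOMAIN_PATTERN = re.compile(r"Te|[CATE]")


def split_domains(architecture):
    """Split a domain architecture string into a list of domain symbols."""
    tokens = DOMAIN_PATTERN.findall(architecture)
    if "".join(tokens) != architecture:
        raise ValueError(f"unrecognised domain string: {architecture!r}")
    return tokens


def count_mismatches(code_a, code_b):
    """Number of positions at which two equal-length codes differ."""
    if len(code_a) != len(code_b):
        raise ValueError("codes must have equal length")
    return sum(1 for a, b in zip(code_a, code_b) if a != b)


gms0365 = split_domains("CATCATCATe")
print(gms0365)                    # ['C', 'A', 'T', 'C', 'A', 'T', 'C', 'A', 'Te']
print(gms0365.count("A"))         # 3 adenylation domains
print(split_domains("TE"))        # ['T', 'E']
print(count_mismatches("DAF-LGCV", "DAFXLGCV"))  # 1
```

The last call reproduces the single-residue difference reported between the Gms0368 binding pocket and that of SrfAC in strain FZB42.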
The fengycin biosynthetic gene cluster was analyzed by PRISM, and the core genes were selected for a PKS/NRPS analysis.
This cluster contained five genes in strain FZB42's genome, ordered as fenCDEAB (Fig. 4b). However, according to Fig. 4b, the fengycin biosynthetic gene cluster in strain NCD-2 contained only three genes: gms1961, gms1960, and gms1959. Gms1961 of strain NCD-2 corresponded to FenE in strain FZB42 and had conserved residues in A8 and A9, which bound the amino acids Glu and Val, respectively (Fig. 4b). Gms1960 and Gms1959 in strain NCD-2 had conserved amino acid sequences related to FenA and FenB in strain FZB42, respectively. Interestingly, no homologs of FenC and FenD were identified in the genome of strain NCD-2. Consequently, the amino acid sequences of FenC and FenD from strain FZB42 were compared with the strain NCD-2 proteome using BioEdit. Gms1961 was most similar to FenC, and Gms1960 was most similar to FenD. To further confirm the unique structure of the fengycin synthetase gene cluster in strain NCD-2, a pair of primers binding fenE and dacC was designed; the binding sites were identical between strains NCD-2 and FZB42 (Fig. 5a). With this primer set, a 4791 bp fragment was successfully amplified from strain NCD-2, but no fragment was amplified from strain FZB42 because its target fragment was much larger (20290 bp) (Fig. 5b). The amplicon from strain NCD-2 was purified, ligated into the pEASY-Blunt Zero vector (Fig. 5c), and sequenced. The sequence alignment confirmed that fenC and fenD were absent in strain NCD-2 (Fig. 5d). The role of gms1961 in fengycin production was also tested. Strain NCD-2 produced abundant fengycin, whereas the in-frame deletion mutant of gms1961 completely lost fengycin production (Fig. 6a-c).
To further investigate whether the structure of the fengycin synthetase gene cluster in NCD-2 was strain specific, the fengycin biosynthetic gene clusters from 11 different B. subtilis strains that were closely related to strain NCD-2 or are model strains were compared (Additional file 1, Fig. S1). The gene clusters of all 11 strains were fenCDEAB (also named ppsABCDE), and only that of strain NCD-2 was fenEAB. Therefore, the fengycin biosynthetic gene cluster of strain NCD-2 is unique.

MS/MS of fengycin and surfactin in NCD-2
Fengycin was separated from the lipopeptide extract of strain NCD-2 using Fast protein liquid chromatography (FPLC)

Detection of other antimicrobial active compounds in NCD-2
In addition to fengycin and surfactin, bacillaene, bacilysin, bacillibactin and subtilosin were also predicted from the genome of strain NCD-2. The four predicted antimicrobial compounds were extracted from the fermentation broth of strain NCD-2 using a different extraction method for each. However, only bacillaene and bacillibactin were detected in the extracts by UHPLC-QTOF-MS (Fig. 8a, 8b).

Discussion
Species of B. subtilis have the potential to produce two dozen antimicrobial substances, and 5%-8% of the B. subtilis genome contributes to the production of antimicrobial substances [33]. Some inhibit the growth of pathogens and the germination of spores. The lipopeptide mixture of B. subtilis C232 inhibits the formation of Verticillium dahliae microsclerotia [34], and the volatile compounds secreted by B. subtilis JA inhibit the conidial formation and mycelial growth of Glomus etunicatum [35].
However, certain bioactive compounds are synthesized only under special conditions or as the result of external stimulation; therefore, it is difficult to obtain all the antimicrobial compounds produced by Bacillus using traditional cultivation and extraction methods, which has limited the comprehensive understanding of the mechanisms of biological control and biocontrol bacteria [22]. Genome mining allows the prediction of metabolites based on genome sequences and is widely used in obtaining new antibiotics [26]. It was used to identify a new NRPS pathway product, coelichelin, in Streptomyces coelicolor [36]. Pseudomycoicidin in Bacillus pseudomycoides DSM 12442 was discovered through the heterologous expression of its BGC in Escherichia coli [37]. Traditional cultivation and extraction methods were used to identify the lipopeptides fengycin and surfactin from B. subtilis NCD-2, and fengycin showed strong antifungal abilities against V. dahliae and B. cinerea. In this study, genome mining was conducted to analyze the potential antimicrobial compounds of strain NCD-2, and some of them were identified using MS/MS. In total, nine kinds of secondary metabolite gene clusters related to surfactin, bacillaene, fengycin, bacillibactin, subtilosin, bacilysin, two terpenes, and one unknown product were identified from the genome of strain NCD-2. Surfactin exhibits antibacterial, antiviral, antitumor and hemolytic action [38]. Bacillaene is an active compound that inhibits the growth of bacteria by inhibiting prokaryotic protein synthesis [39]. Fengycin shows specific antifungal activity against filamentous fungi [40]. Bacillibactin is a siderophore that takes up iron; especially when iron is scarce, B. subtilis expresses the genes involved in bacillibactin synthesis and pirates other microbial siderophores [41]. Subtilosin possesses antibacterial activity against a diverse range of bacteria [42].
Bacilysin is an active compound with antibacterial activity against a wide range of bacteria and Candida albicans [43]. These compounds show antimicrobial abilities and play different roles in suppressing plant diseases. However, only fengycin, surfactin, bacillaene and bacillibactin were successfully detected in the extract of strain NCD-2 by UHPLC-MS/MS (Fig. 7, 8). Bacilysin and subtilosin could not be detected, which may have been caused by low expression levels of their biosynthetic gene clusters under the experimental conditions. B. velezensis FZB42 is a model strain of plant-beneficial rhizobacteria. Thirteen gene clusters involved in the non-ribosomal and ribosomal synthesis of secondary metabolites with putative antimicrobial action have been identified within the genome of strain FZB42, including that for fengycin. The mechanism of fengycin synthesis has been well studied in B. velezensis strain FZB42 [48]. B. subtilis 168 has the entire gene cluster for synthesizing fengycin, but it cannot produce fengycin because it lacks a functional native sfp gene [49]. The BGC repository MIBiG (Minimum Information about a Biosynthetic Gene cluster) has only one fengycin biosynthetic gene cluster, from B. velezensis FZB42 [50,51]. Therefore, the fengycin biosynthetic gene cluster of strain NCD-2 was compared with that of B. velezensis FZB42. Fengycin comprises a cyclic peptide of 10 amino acids with a fatty acid chain tail. The fengycin biosynthetic gene cluster in strain FZB42 consists of five genes (38 kb) that encode the synthetases FenCDEAB, of which FenC recognizes and carries glutamate and ornithine, FenD recognizes and carries tyrosine and threonine, FenE recognizes and carries glutamate and valine, FenA recognizes and carries proline, glutamine, and tyrosine, and FenB recognizes and carries isoleucine. FenCDEAB recognizes 10 amino acids and carries them to the β-OH FA chain to form fengycin [52][53][54].
However, NCD-2 only had fenEAB, lacking fenC and fenD, compared with the typical cluster structure of fenCDEAB in the FZB42 strain and 10 other Bacillus strains (Fig. 4b; Additional File 1, Fig. S1). To exclude errors introduced by genome sequencing or assembly, the fragment between fenE and dacC was cloned and sequenced, which confirmed that fenC and fenD were lost in strain NCD-2 (Fig. 5a-d). To identify the enzymes FenC and FenD in the NCD-2 genome, their amino acid sequences from FZB42 were used to screen for homologs by scanning the local NCD-2 proteome using BioEdit. The Gms1961 protein in the NCD-2 strain had the greatest similarity to FenC at the amino acid sequence level (Additional File 1, Table S2). The Gms1961 protein contained 2,550 amino acids, and its molecular weight was 287.50 kDa. The substrate bound by the adenylation domain of the Gms1961 protein was predicted (Additional File 1, Table S4). The adenylation A9 domain bound valine and N5-hydroxyornithine, with the latter being a transitional form of ornithine combined with the adenylation domain [55]. The UHPLC-QTOF-MS/MS of the fengycins revealed that all the structures possessed the amino acid ornithine at position 2 (Fig. 7a-e), indicating that there was a protein that transports ornithine in the NCD-2 strain. Thus, it was hypothesized that Gms1961 functions as both FenC and FenE. The same analysis was performed using the Gms1960 protein, and it had the greatest similarity with FenD (Additional File 1, Table S3); however, the FenD domains in Gms1960 and FZB42 varied greatly. Therefore, it was hypothesized that Gms1960 or other enzymes may have functions similar to those of FenD.
Although the fengycin biosynthetic gene cluster in the NCD-2 strain lacked two important genes, fenC and fenD, compared with the reported fengycin biosynthetic gene cluster, the NCD-2 strain was capable of producing 26 homologs of 5 kinds of fengycins. The amino acids at positions 6 and 10 of the fengycin cyclic peptide ring determine the type of fengycin. There are currently five types of reported fengycins, A, B, A2, B2, and C (Additional File 1, Fig. S4). When the amino acid at position 6 was valine and at position 10 was isoleucine or valine, then fengycin B or fengycin B2, respectively, was produced (Fig. 7a, b; Additional File 1, Fig. S4); however, if the amino acid at position 6 was alanine, then fengycin A or fengycin A2, respectively, was produced (Fig. 7c, d; Additional File 1, Fig. S4). When the amino acid at position 6 was isoleucine or leucine and at position 10 was valine, then fengycin C was produced (Fig. 7e; Additional File 1, Fig. S4). The MS analysis of the fengycins in the NCD-2 strain revealed that the strain was capable of producing these five kinds of fengycins. Based on differences in the number of carbon atoms in the β-OH FA, fengycin had different homologs, and the molecular weights of successive homologs differed by 14 (-CH2) [56]. The molecular structure of a lipopeptide determines its biological activity, and long-chain fatty acids increase the hydrophobicity of lipopeptides, making them more likely to have membrane-bound antimicrobial effects [57]. A B. circulans strain produces four fengycin homologs, but only fengycins with C16 and C17 β-OH FA chains had antibacterial activities [58]. The NCD-2 strain produced 14 fengycin homologs having more than 16 carbon atoms, and they accounted for a large proportion of all the homologs. It was speculated that these long-chain fengycins played important roles in the antimicrobial functions of NCD-2. The B. siamensis SCSIO 05746 strain produces a great number of fengycin homologs, including 19 homologs of fengycin B [59]. Using MS/MS analysis, the five fengycins produced by the NCD-2 strain were divided into 26 homologs (Fig. 7a-e; Additional File 1, Fig. S5-S9). Therefore, NCD-2 is currently the strain with the largest number of known fengycin homologs [60].
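The typing rule described above (residues at positions 6 and 10) and the CH2 spacing between homologs can be written down compactly. The Python sketch below encodes the position 6/position 10 combinations exactly as stated in the text and checks whether two masses differ by a whole number of CH2 units. The monoisotopic CH2 mass of 14.01565 Da is a standard value; the tolerance, the function names and the example masses are illustrative assumptions and are not measured values from strain NCD-2.

```python
# (residue at position 6, residue at position 10) -> fengycin type, per the text.
FENGYCIN_TYPES = {
    ("Ala", "Ile"): "A",
    ("Ala", "Val"): "A2",
    ("Val", "Ile"): "B",
    ("Val", "Val"): "B2",
    ("Ile", "Val"): "C",
    ("Leu", "Val"): "C",
}

CH2_MASS = 14.01565  # monoisotopic mass of CH2 in Da (12.00000 + 2 * 1.007825)


def fengycin_type(position6, position10):
    """Return the fengycin type for the two residues, or None if not listed."""
    return FENGYCIN_TYPES.get((position6, position10))


def ch2_units_between(mass_a, mass_b, tolerance=0.01):
    """Number of CH2 units separating two masses, or None if the difference
    is not a whole multiple of CH2 within `tolerance` Da (assumed value)."""
    diff = mass_b - mass_a
    units = round(diff / CH2_MASS)
    if abs(diff - units * CH2_MASS) <= tolerance:
        return units
    return None


print(fengycin_type("Val", "Ile"))  # B
print(fengycin_type("Ala", "Val"))  # A2

# Hypothetical masses chosen only to illustrate the spacing check.
base = 1000.0
print(ch2_units_between(base, base + 2 * CH2_MASS))  # 2
print(ch2_units_between(base, base + 7.0))           # None
```

For singly charged ions the same spacing applies to m/z values, so a homolog series appears as peaks separated by about 14.016.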
During the microbial synthesis of secondary metabolites, such as lipopeptides, the relatively high energy-consuming process of protein synthesis takes priority [61]. Excessive energy consumption is not conducive to the normal growth of microbes, and, generally, microbes produce antibiotics in large amounts only when encountering pathogens or other stresses [62]. We hypothesized that the key biosynthetic genes fenEAB involved in synthesizing fengycin were conserved, while the two important biosynthetic genes fenCD were lost during the long-term evolution of NCD-2. However, five fengycins were still produced. Gms1961 might have played the dual roles of FenC and FenE, indicating that NCD-2's fengycin biosynthetic process was unique to the strain and was more energy-efficient than the process used in the other strains.

Conclusions
In this study, genome mining and UHPLC-QTOF-MS/MS were performed. It was found that there were many gene clusters encoding antimicrobial compounds in the genome of the NCD-2 strain and that the fengycin biosynthetic gene cluster might be unique. The results indicated that the NCD-2 strain might have a unique mechanism for synthesizing fengycin. Using bioinformatics and biochemistry to analyze the new mechanism of fengycin synthesis may provide a new theory for the synthesis of antimicrobial compounds through the NRPS pathway.

30 ℃ and 180 rpm. The phytopathogen Botrytis cinerea BC-10 was used for the antifungal activity test following the method described by Guo et al [29] with some modifications. Briefly, a 6-mm diameter disc of B. cinerea was placed in the center of a 9-cm potato dextrose agar (PDA) plate, and the plates were inoculated with B. subtilis NCD-2 using a sterilized toothpick 2 cm from the center. Finally, the diameter of the inhibition zone was measured after a 3-d incubation at 25℃.

Genome sequencing of strain NCD-2
The Illumina Solexa platform was used for whole-genome sequencing following the method described by Karim [67] with some modifications. The quality of reads was checked using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) [68], paired-end reads were trimmed using Sickle (https://github.com/najoshi/sickle), and the trimmed reads were assembled using the software Velvet [30]. QUAST 5.02 was used to assess the quality of contigs and scaffolds [69]. The assembled scaffolds were annotated using Prokka (version v.1.13) [70]. The annotation of the strain NCD-2 genome was performed using the NCBI Prokaryotic Genomes Automatic Annotation Pipeline (http://www.ncbi.nim.nih.gov/genome/annotation_prok/) utilizing the GeneMark, Glimmer, and tRNAscan-SE tools [71], and the functional annotation was carried out using the Rapid Annotation using Subsystem Technology (RAST) server with the SEED database [72]. Finally, the genome of strain NCD-2 was deposited in the National Center for Biotechnology Information (NCBI; https://www.ncbi.nlm.nih.gov/), and the GenBank accession number is CP023755.

Evolutionary analysis, signal peptide and CRISPR repeat detection
The whole-genome sequences of B. subtilis and closely related species were downloaded from the NCBI database, and the REALPHY website (http://realphy.unibas.ch) [73] was used for genome-wide comparisons with default parameters. A phylogenetic analysis was conducted using MEGA5 [74] with the Maximum Composite Likelihood parameter model [75]. A phylogenetic tree was constructed using the Neighbor-joining method with bootstrap values based on 1,000 replications. The signal peptide was predicted using the SignalP-5.0 website (www.cbs.dtu.dk/services/SignalP-5.0/) [76].

Separation of lipopeptides by FPLC
Lipopeptides were extracted using the method described by Guo et al [29]. Briefly, strain NCD-2 or derived strains were cultured to obtain crude lipopeptides. The crude lipopeptides were separated and purified using an AKTA Purifier (GE Healthcare, Uppsala, Sweden) with the SOURCE 5RPC ST 4.6/150 column as described previously [82]. The lipopeptides were eluted by solvent A [2% acetonitrile containing 0.065% trifluoroacetic acid (TFA) (V/V)] and solvent B [80% acetonitrile containing 0.05% TFA (V/V)] using a linear gradient of 0%-100% acetonitrile over 57 min at a flow rate of 1 mL/min. The detection wavelength was 215 nm. All the main peaks were collected by FPLC automatically. Finally, each peak was concentrated using a rotary evaporator and was analyzed using UHPLC-QTOF-MS/MS.

UHPLC-QTOF-MS/MS
The UHPLC-QTOF-MS/MS analysis was conducted on a hybrid quadrupole time-of-flight tandem mass spectrometer (AB SCIEX TripleTOF 5600 Q-TOF/MS, Foster City, CA, USA) with an HPLC (Shimadzu, Kyoto, Japan) that was equipped with LC-30AD binary pumps, a SIL-30AC autosampler, and a CTO-30AC column oven. A C18 reversed-phase LC column (Shim-pack GIST, 2-μm particles, 2.1 mm × 100 mm) was used for separation. Mobile phases A and B were water and acetonitrile, respectively, each containing 0.1% formic acid, with an optimized linear gradient elution procedure as follows: 0.0-0.5 min, 30% B; 0.5-50 min, 60% B; 50-52 min, 95% B; 52-55 min, 95% B; 55-55.1 min, 30% B; 55.1-60 min, 30% B. The injection volume was 20 μL with a flow rate of 0.30 mL/min. The column oven was set at 40℃. The MS analysis was performed using a 5600 TripleTOF system equipped with a DuoSpray™ ion source, and the data were processed using Analyst TF 1.7 software (Applied Biosystems Sciex, Toronto, ON, Canada). PeakView™ software 2.0 (Applied Biosystems Sciex, Toronto, ON, Canada) was used for investigating and interpreting mass spectral data with special tools for processing accurate mass data and structural elucidation. The DuoSpray™ ion source was used in positive ion mode. The instrumental parameters were set as follows: ion spray voltage floating, 5,000 V; nebulizing gas, 50 psi; heater gas, 50 psi; curtain gas, 35 psi; temperature, 350℃; declustering potential (in TOF-MS experiments), 100 V; and collision energy, 10.0 V. During the TOF-MS/MS declustering potential, the collision energy spread was between 100 V and 5 V, with rolling collision energy. The MS was operated in full-scan TOF-MS (m/z 200-2,000) and MS/MS (m/z 50-1,600) modes using Information Dependent Acquisition for a single run analysis.
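The gradient program listed above can be represented as a table of time points and evaluated at any time in the run. The Python sketch below is one reading of that listing, assuming that each segment ramps linearly from the previous %B to the %B stated for the segment and that segments with the same start and end value are holds; this interpretation, and the names `GRADIENT` and `percent_b`, are assumptions rather than a description of the instrument method file.

```python
# (time in min, %B reached at that time), read from the gradient listing.
GRADIENT = [
    (0.0, 30.0),
    (0.5, 30.0),
    (50.0, 60.0),
    (52.0, 95.0),
    (55.0, 95.0),
    (55.1, 30.0),
    (60.0, 30.0),
]


def percent_b(t):
    """Percent of mobile phase B at time t (min), by linear interpolation."""
    if t < GRADIENT[0][0] or t > GRADIENT[-1][0]:
        raise ValueError("time is outside the 0-60 min run")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t <= t1:
            return b0 + (b1 - b0) * (t - t0) / (t1 - t0)


print(percent_b(0.25))   # 30.0 (initial hold)
print(percent_b(51.0))   # 77.5 (midway through the 60% -> 95% ramp)
print(percent_b(53.0))   # 95.0 (wash hold)
```

Under this reading, most of the separation happens on the shallow 30% to 60% B ramp between 0.5 and 50 min, followed by a short wash and re-equilibration.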

Detection of bacillaene, bacilysin, bacillibactin and subtilosin
For bacillaene, strain NCD-2 was cultured in 100 mL Landy broth at 30℃ for 72 h with shaking at 180 rpm, and the bacillaene was extracted with methanol using the method described by Reddick et al [83]. For bacilysin, strain NCD-2 was cultured in 100 mL PA medium at 30℃ for 72 h with shaking at 180 rpm, and the bacilysin was extracted with ice-cold ethanol as described by Wu et al [64]. For bacillibactin, strain NCD-2 was cultured in 100 mL MSA medium at 30℃ for 72 h, and the bacillibactin was extracted with ethanol as described by Li et al [65]. For subtilosin, strain NCD-2 was cultured in 100 mL TSB medium at 30℃ for 72 h, and the subtilosin was extracted by precipitation with 65% ammonium sulphate as described by Charles et al [66]. The extracts were analyzed by UHPLC-QTOF-MS/MS as described above.
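Deciding whether a predicted compound is present in an extract typically involves comparing an observed accurate mass with a theoretical m/z. The Python sketch below shows the usual parts-per-million (ppm) mass error calculation and a simple candidate filter. The 10 ppm tolerance, the function names and the placeholder candidates are assumptions for illustration; the source does not state the matching criterion used, and no real compound masses are given here.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def match_candidates(observed_mz, candidates, tolerance_ppm=10.0):
    """Names of candidates whose theoretical m/z lies within the tolerance.

    `candidates` maps a name to a theoretical m/z value.
    """
    return [name for name, mz in candidates.items()
            if abs(ppm_error(observed_mz, mz)) <= tolerance_ppm]


# Placeholder candidates with made-up m/z values (not real compounds).
candidates = {"compound_X": 500.0000, "compound_Y": 500.0100}

print(round(ppm_error(500.0010, 500.0000), 3))   # about 2 ppm
print(match_candidates(500.0010, candidates))    # ['compound_X']
```

In practice, an accurate-mass match would be combined with retention time and MS/MS fragment evidence before a compound is reported as detected.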

Declarations
Availability of data and materials The datasets used and analysed during the current study are available from the corresponding author on reasonable request.

Surfactins with fatty acid chain lengths varying from C11 to C15 are identified based on key product ions.

Table S1. All the B. subtilis strains with the assembly level of chromosome and their RefSeq assembly accessions.

Figure 1 Circular genome of strain NCD-2 with specific features. The circular genome map was created using Circos v0.66 with COG (Cluster of Orthologous Groups of proteins) function annotation. From outside to inside: circle 1, the size of the complete genome; circles 2 and 3, the predicted protein-coding genes on the + and - strands, where different colors represent different COG function classifications; circle 4, tRNA (green) and rRNA (red); circle 5, G + C content, where peaks outside or inside the circle indicate above or below average GC content, respectively; the inner circle, G + C skew, with G% < C% in purple and G% > C% in blue. The potato dextrose agar plate inside the representation of the circular genome shows the antifungal activity of strain NCD-2 and its derived strain against Botrytis cinerea. The black bars outside the circular genome indicate the secondary metabolite biosynthetic gene clusters.

Figures
Figure 2 Phylogenetic tree of 113 B. subtilis strains based on whole genome alignments. The position of strain NCD-2 in the phylogenetic tree is indicated by a black bar. Single Nucleotide Polymorphisms (SNPs) and short insertions or deletions (indels) within the multiple sequence alignments constructed by the REALPHY pipeline were extracted for subsequent phylogeny reconstruction. The phylogenetic tree was constructed using MEGA 5.0 by the Neighbor-joining method, with a bootstrap of 1,000 replications. Bootstrap confidence levels > 50% are indicated at the internodes.
Figure 3 Schematic diagram of nine secondary metabolite biosynthetic gene clusters in B. subtilis strain NCD-2. antiSMASH was used to predict potential secondary metabolite biosynthetic gene clusters. Different color blocks represent genes with different functions; the genes marked with dark red, light red, blue, green, and gray are core biosynthetic, additional biosynthetic, transport-related, regulatory, and other genes, respectively.
