Complete genome sequences of Streptomyces spp. isolated from disease-suppressive soils
BMC Genomics volume 20, Article number: 994 (2019)
Bacteria within the genus Streptomyces remain a major source of new natural products and are used as soil inoculants in agriculture, where they promote plant growth and protect against disease. Recently, Streptomyces spp. have been implicated as important members of naturally disease-suppressive soils. To shed more light on the ecology and evolution of disease-suppressive microbial communities, we have sequenced the genomes of three Streptomyces strains isolated from disease-suppressive soils and compared them to previously sequenced isolates. Strains selected for sequencing had previously shown strong phenotypes in competition or signaling assays.
Here we present the de novo sequencing of three strains of the genus Streptomyces isolated from disease-suppressive soils to produce high-quality complete genomes. Streptomyces sp. GS93–23, Streptomyces sp. 3211–3, and Streptomyces sp. S3–4 were found to have linear chromosomes of 8.24 Mb, 8.23 Mb, and greater than 7.5 Mb, respectively. In addition, two of the strains were found to have large, linear plasmids. Each strain harbors between 26 and 38 natural product biosynthetic gene clusters, on par with previously sequenced Streptomyces spp. We compared these newly sequenced genomes with those of previously sequenced organisms. We see substantial natural product biosynthetic diversity between closely related strains, with the gain/loss of episomal DNA elements being a primary driver of genome evolution.
Long read sequencing data facilitates large contig assembly for high-GC Streptomyces genomes. While the sample number is too small for a definitive conclusion, we do not see evidence that disease suppressive soil isolates are particularly privileged in terms of numbers of biosynthetic gene clusters. The strong sequence similarity between GS93–23 and previously isolated Streptomyces lydicus suggests that species recruitment may contribute to the evolution of disease-suppressive microbial communities.
Roughly one third of pre-harvest crops are lost each year worldwide due to agricultural pests and disease. Ninety percent of the 2000 major diseases of the 31 principal crops in the US are caused by soil-borne pathogens [2, 3], and soil microbial communities can have a protective effect. Crops are particularly susceptible to disease during their establishment period and when introduced into a new geographic location [5, 6]. With the predicted changes in agricultural land use that will accompany climate change or a shift towards crops that support biofuel production, it is important to develop innovative approaches to combat crop losses to disease.
Natural and agricultural disease-suppressive soils (DSSs) have been identified that provide long-lasting and stable protection against numerous bacterial and fungal pathogens . In addition to preventing crop loss, DSSs can lower the cost of production by removing the need for pesticide application. They have been reported against many major crop pathogens, including wheat take-all disease, potato scab, and wilt on melon [8,9,10,11,12]. Disease-suppression is correlated with increased antagonistic or competitive capacities in one or more isolates from the soil microbial community, and this behavior can emerge in a soil following long-term monoculture [7, 13,14,15,16]. However, long-term monoculture is not an attractive management strategy to create DSSs, as it generally takes a decade or more for DSSs to emerge and there would be increased plant losses in the short-term. A better understanding of the composition and ecology of DSSs will facilitate engineering soil communities for crop protection.
Recent investigations into the mechanisms of disease suppression, including metagenomic analyses of DSSs [7, 17] and phenotypic characterization of microbial isolates [18, 19], point to the importance of natural product biosynthesis within a few privileged microbial taxa. Not only are known natural product producers, Actinomycetes and Pseudomonads, enriched in DSS samples, but interruption of natural product biosynthesis genes interferes with disease-suppression . Further, ecological models that describe the emergence and maintenance of DSSs propose a link between plant biodiversity and the evolution of DSSs. In soils supporting diverse plant species, root exudates and decomposing biomass supply diverse nutrients to soil microbes, which can evolve to co-exist via niche-differentiation. However, in long-term mono-species plant plots, the abundant but non-diverse plant nutrients create a competitive soil environment that favors the evolution of antagonism through antibiosis .
Because the metagenomics, phenotypic, and theoretical work all point to the importance of natural products in the formation and maintenance of DSSs, we have sought to better understand natural product biosynthesis in these communities. The observation that isolates from DSSs are more likely to produce antibiotics that target sympatric isolates  supports several alternative hypotheses surrounding natural product biosynthesis. Highly antagonistic microbial strains should either (i) encode more natural product biosynthetic gene clusters (BGCs) in their genomes than isolates from non-suppressive soils, (ii) encode the same number but actively express a greater percentage of their BGCs, or (iii) produce the same number of natural products, but these compounds are enriched in the biological activities that are important for the formation of DSSs. The first hypothesis is directly testable through whole genome sequencing and comparison.
Here we present the first genome sequences for Streptomyces spp. isolated from DSSs. Genomes were sequenced with both long-read PacBio and short-read Illumina technology to produce high-quality and nearly complete sequences for each strain. Bioinformatic analyses highlight the importance of natural product biosynthesis in these isolates, and comparative genomics provides insight to the evolution and ecology of DSSs.
Isolation and phenotypic characterization of strains
The strains sequenced for this study were selected because (i) they were isolated from soils with measurable disease-suppressive characteristics, and (ii) they displayed strong phenotypes in competition or signaling assays.
Streptomyces sp. GS93–23 was isolated from a potato scab-suppressive plot in Grand Rapids, MN using the Anderson Air Sampler isolation method [21, 22]. This strain performed the best of ~ 800 isolated strains at combating potato scab . GS93–23 also shows antifungal activity against Phytophthora medicaginis and Phytophthora sojae, two fungal pathogens of alfalfa. This activity extended to soil studies, where GS93–23 protected alfalfa, reducing the percentage of dead plants from 50 to 0% when pathogens were seeded at low density . Further, compared to no-treatment controls, GS93–23 increased plant growth and yield (forage weight per pot), suggesting direct or indirect plant growth promotion activity. Lastly, GS93–23 was found to be strongly antagonistic against other Streptomyces spp., but did not reduce nodule production by rhizobial bacteria .
Streptomyces spp. S3–4 and 3211–3 were isolated from pathogen-suppressive soils located in the Cedar Creek Ecosystem Science Reserve (CCESR), an NSF long-term ecological research site. S3–4 was isolated from soil in a long-term big bluestem (Andropogon gerardii) monoculture plot and is antagonistic against sympatrically evolved soil isolates. Strain 3211–3 was isolated from a native prairie control plot at CCESR. It has a strong signaling phenotype, defined as the ability to elicit antibiotic/antifungal production in strains with which it is cultured in close spatial proximity.
PacBio sequencing and assembly of genomes
Initial genome sequencing and scaffold assembly were performed on a Pacific Biosciences (PacBio) RS single molecule sequencer (October 2014). Genomic DNA was size-selected using BluePippin 20 kb and sequenced in three SMRT cells per genome. The first two SMRT cells for each genome were run using P4 chemistry, and a third SMRT cell was run for each genome with P6 chemistry. Initial read assembly using the PacBio HGAP2 algorithm and sequence polishing using the PacBio Resequencing algorithm produced the genome sizes and contig numbers shown in Table 1. Final coverage was >100x for each genome.
The high GC-content of Streptomyces genomes produces many homopolymer G and C stretches, which can cause errors during base-calling and genome assembly. Low-coverage Illumina sequence data was collected for error correction. Illumina sequencing was performed on a MiSeq instrument to collect 2 × 250 base paired-end reads equating to 110-fold (3211–3), 118-fold (GS93–23), or 155-fold (S3–4) coverage for each genome. Final, error-corrected genome sequences were generated by mapping Illumina short reads to PacBio-generated reference genomes using the BreSeq algorithm, and incorporating single nucleotide polymorphisms (SNPs) and short indels using the Pilon algorithm.
Comparison of Illumina-corrected and PacBio-alone genome sequences
The short-read corrected genome sequences were compared to the PacBio-only assemblies, and 70, 295, and 335 SNP/Indels were present between the two assemblies for GS93–23, S3–4, and 3211–3, respectively. In each case, the vast majority were single base insertions in homopolymer stretches. We next sought to verify that the short-read corrected sequences were indeed a better representation of the actual genome sequence, as the two sequencing platforms are known to generate different types of errors. To determine which sequence variant was correct for each SNP/indel, translated protein sequences at each of the 295 SNP/indel loci in the S3–4 genome were compared against the NCBI GenBank non-redundant database, with the assumption that a frameshift resulting from an indel will result in a worse top blast hit for a stretch of DNA. Additional file 1: Figure S1 shows the comparison of significance score for BLASTx results of searching a fragment of DNA +/− 150 bases from the variant loci. This analysis is only expected to reveal the correct sequence variant when (i) the indel is present within a coding DNA sequence (CDS), (ii) correct protein sequences for close homologs are present in GenBank, and (iii) the 300 basepair window that is searched is sufficiently focused such that top BLAST hits align to the translated query in the region of the variant locus (i.e. at the center of the query, not the edges). We find that the Illumina-corrected sequence returns a top BLASTx hit with lower (better) E-value twice as often as the uncorrected sequence. The average E-values for the top BLASTx hit alignment are six orders of magnitude lower (better) for the short-read corrected sequences compared to the PacBio-only sequences. Because of this, we use the short-read corrected genome sequences for the remaining analyses.
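The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: the function names, the 0-based locus convention, and the choice to summarize the improvement as a mean difference in log10 E-value are all assumptions, and the BLASTx searches themselves are assumed to have been run separately.

```python
import math

def variant_window(genome, locus, flank=150):
    """Return the bases within `flank` positions either side of a 0-based locus.

    With flank=150 this gives the ~300 base query window described in the text,
    truncated at the ends of the sequence.
    """
    start = max(0, locus - flank)
    end = min(len(genome), locus + flank)
    return genome[start:end]

def compare_top_hits(evalue_pairs, floor=1e-300):
    """Summarize top-hit E-values for (corrected, uncorrected) query pairs.

    `evalue_pairs` holds one (corrected_evalue, uncorrected_evalue) tuple per
    variant locus. E-values are floored to avoid log10(0).
    """
    corrected_better = uncorrected_better = ties = 0
    log10_gain = []
    for corrected, uncorrected in evalue_pairs:
        if corrected < uncorrected:
            corrected_better += 1
        elif uncorrected < corrected:
            uncorrected_better += 1
        else:
            ties += 1
        # Positive values mean the corrected sequence gave the lower E-value.
        log10_gain.append(
            math.log10(max(uncorrected, floor)) - math.log10(max(corrected, floor))
        )
    mean_gain = sum(log10_gain) / len(log10_gain) if log10_gain else 0.0
    return {
        "corrected_better": corrected_better,
        "uncorrected_better": uncorrected_better,
        "ties": ties,
        "mean_log10_gain": mean_gain,
    }
```

Each locus contributes one pair of E-values, so loci outside a CDS or without close homologs in the database will tend to appear as ties and dilute the signal, which is consistent with the three caveats listed above.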
General characteristics of the genome sequences
We were able to assemble the chromosome as a single large contig for strains GS93–23 (8.24 Mb) and 3211–3 (8.23 Mb), and as two large contigs for S3–4 (4.19 Mb and 3.31 Mb) (Fig. 1 and Table 1). For S3–4, the two chromosome arms can be oriented relative to one another with high confidence based on GC-skew, orientation of rRNA operons, and enrichment of specialized metabolite gene clusters at chromosome arms (Fig. 1, rings 8, 6, and 4, respectively). Manual attempts to close the gap by retrieving PacBio reads that mapped to each contig were unsuccessful. The gap is present in a locus that is especially repetitive, with 3 rRNA operons in close proximity. The overall G + C content (71–73%) and differences in G/C skew for the chromosome arms in each genome are similar to what has been reported for other genomes from this genus [29,30,31,32,33,34]. In addition to the large linear chromosomes, strains 3211–3 and S3–4 each contain two large linear plasmids (519 Kb and 240 Kb for 3211–3, 349 Kb and 203 Kb for S3–4).
Annotation of the genomes with the Prokka software tool identified 7188 CDSs, 7 ribosomal RNA operons, and 66 tRNAs for GS93–23. Similar numbers of annotated genes were present in the S3–4 genome (7071 CDSs, 8 rRNA operons, 73 tRNAs), and slightly more in the 3211–3 genome (8087 CDSs, 7 rRNA operons, 77 tRNAs), accounting for its larger total size. Gene products were assigned to Clusters of Orthologous Groups (COGs) using the BASys platform. Functional categorization of the proteins reported in Table 2, in comparison to the model organism S. coelicolor A3(2), was performed with EggNOG-mapper.
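For readers unfamiliar with the metrics, G + C content and GC skew can be computed as in the following sketch. The window size, the non-overlapping window layout, and the function names are assumptions made for illustration; the text does not state which tool or parameters produced the skew ring in Fig. 1.

```python
def gc_content(seq):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

def gc_skew_windows(seq, window=10000, step=None):
    """Return [(start, skew), ...] with skew = (G - C) / (G + C) per window.

    Windows are non-overlapping unless a smaller `step` is given. A trailing
    partial window is ignored.
    """
    step = step or window
    seq = seq.upper()
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        skew = (g - c) / (g + c) if (g + c) else 0.0
        results.append((start, skew))
    return results
```

A change in the sign of the skew along a contig is the kind of signal that can help orient two chromosome arms relative to each other when an assembly gap prevents joining them directly.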
Annotation of natural product biosynthetic gene clusters
Because natural product biosynthesis is thought to play a mechanistic role that underpins the ecology of disease suppressive soils [17, 38], we have analyzed the genomes for their biosynthetic potential using the antiSMASH 3.0 toolkit . We conservatively assigned specific molecules to these BGCs only when the annotated gene clusters share 100% of the biosynthetic genes from previously characterized BGCs by manual comparison (Additional file Information). For ribosomally produced and post-translationally modified peptides (RiPPs), we predict the production of minor structural variants when the sequence of precursor peptides is slightly different than in characterized BGCs. The 26 high-confidence BGCs identified in the GS93–23 genome include known pathways for RiPP cyclothiazomycin , the dienoyltetramic acid streptolydigin , and the lipoglycopeptide mannopeptimycin . The 38 high-confidence BGCs in the 3211–3 genome include known pathways for the chlorinated non-ribosomal peptide tambromycin , the siderophore coelichelin , and terpenoid 2-methylisoborneol . The 28 high-confidence BGCs in the S3–4 genome include known pathways for 2-methylisoborneol, and the aminoglycoside streptothricin . In addition, all three genomes contain the highly conserved BGCs for the siderophore desferrioxamine b , terpenes geosmin  and hopene , minor structural variants of lantibiotic SapB , and osmoprotectant ectoine .
The majority of BGCs identified in these genomes remain uncharacterized. Intriguing pathways include a 178 Kb polyketide cluster on a plasmid in S3–4 that putatively encodes a 60-membered macrolide, and a pyrrolopyrrole-containing metabolite in 3211–3.
Comparison to closest sequenced relatives
We compared the draft genome sequences to a collection of 500 publicly available actinomycete genomes using multi-locus sequence comparison to identify the closest sequenced relative of each (Fig. 2). S3–4 groups with the small Streptomyces katrae clade near type strain NRRL-ISP 5550 . Strain 3211–3 is in the neighboring Streptomyces virginiae clade defined by the type strain NRRL ISP-5094 . GS93–23 clusters with the Streptomyces lydicus type strain NRRL-ISP 5461 .
We identified closely related genomes in the available whole-genome sequence databases for each of our DSS isolates (Fig. 3). For each of our newly sequenced strains, a previously published genome was available with high sequence similarity in several common phylogenetic markers (16S rRNA, rpoB, and multi-locus sequencing (MLS) using ribosomal proteins) (Fig. 3a). Our closest pair of new and previously reported genomes is GS93–23 and S. lydicus NRRL ISP-5461, which share 100% identity of 16S rRNA and 99.92% identity using MLS comparison. Even our most divergent pair, S3–4 to Streptomyces sp. WM6372, shared > 98% identity at the 16S rRNA level and > 96% identity at the rpoB level, and 93.72% by four-gene MLS comparison (atpD, gyrB, rpoB, trpB).
Genome pairs were compared to determine the amount of shared sequence across the entire genome (Fig. 3b). Alignments were constructed in Mauve and alignment gaps were mapped back to the new high-quality reference genomes. Alignment gaps between GS93–23 and ISP-5461 are uniformly distributed across the chromosome. Insertion or deletion events greater than 100 bp account for only 4.5% of the genome sequence as a whole (Fig. 3b), with a similar proportion being lost/gained in BGCs as in the rest of the genome (Fig. 3b).
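The proportion of a genome accounted for by large insertion or deletion events can be tallied as sketched below. The input format (a flat list of gap lengths in bp) and the function name are hypothetical; the gap coordinates in this study came from Mauve alignments, and how they were exported is not described.

```python
def large_indel_fraction(gap_lengths, genome_length, min_size=100):
    """Fraction of the genome covered by alignment gaps longer than `min_size` bp."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    large_gaps = [length for length in gap_lengths if length > min_size]
    return sum(large_gaps) / genome_length
```

For example, with hypothetical gaps of 50, 100, 150, and 300 bp in a 10,000 bp sequence, only the last two exceed the threshold, giving a fraction of 0.045.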
The high-level of sequence conservation between GS93–23 and ISP-5461 allowed us to examine the micro-scale evolution of these genomes. There are approximately 40,000 SNPs between the two, making the sequence identity in the aligning sequences greater than 99.5%. Interestingly, the position of SNPs relative to CDSs shows a marked de-enrichment in (i) the approximate position of the Shine-Dalgarno sequence in the 5′-UTR, and (ii) the 5′ end of the CDS (Fig. 3c). This suggests a selection for maintaining relative translation rates of encoded genes, as both loci are important in determining translation initiation rates in bacteria . Most of the ~ 33,000 SNPs in CDSs encode silent mutations. Of the missense mutations, the majority are conservative in terms of amino acid chemistry (Fig. 3d). The ratio of synonymous to non-synonymous mutations (dS/dN) is 1.8, which is substantially lower than seen in housekeeping genes in E. coli and invasion genes from S. enterica [56, 57], suggesting that there has been little selective pressure against non-synonymous mutations and that these two strains belong to the same clonal complex [58, 59].
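Classifying a coding SNP as silent or amino acid-changing amounts to translating the codon before and after the substitution. The sketch below is one simple way to do this and to form a raw count ratio. It is an assumption that a raw count ratio is comparable to the dS/dN value reported here, since the normalization used is not stated, and the function names are illustrative. The codon table is the standard one, whose amino acid assignments are shared by the bacterial genetic code.

```python
from itertools import product

# Standard genetic code in NCBI order (first, second, third base over T, C, A, G).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def classify_snp(ref_codon, alt_codon):
    """Return 'synonymous' if both codons encode the same residue, else 'nonsynonymous'."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    return "synonymous" if ref_aa == alt_aa else "nonsynonymous"

def count_substitutions(codon_pairs):
    """Count synonymous and nonsynonymous changes over (ref_codon, alt_codon) pairs."""
    counts = {"synonymous": 0, "nonsynonymous": 0}
    for ref_codon, alt_codon in codon_pairs:
        counts[classify_snp(ref_codon, alt_codon)] += 1
    return counts

def raw_ratio(counts):
    """Raw synonymous : nonsynonymous count ratio (not normalized per site)."""
    if counts["nonsynonymous"] == 0:
        raise ValueError("no nonsynonymous substitutions to divide by")
    return counts["synonymous"] / counts["nonsynonymous"]
```

A per-site normalized estimate would additionally require counting the synonymous and nonsynonymous sites available in each codon, which this sketch deliberately leaves out.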
Despite the strong similarity between GS93–23 and ISP-5461, there are still substantial differences between the two strains. GS93–23 contains 98 genes that are missing in ISP-5461, and ISP-5461 contains 11 unique genes. 66/98 genes unique to GS93–23 are of unknown function. Of genes with functional annotations the largest categories specific to GS93–23 are transcriptional regulators (11/98) and metabolic enzymes (10/98). Of the genes unique to ISP-5461, only a single gene was of unknown function. The largest functional categories for genes unique to ISP-5461 also were transcriptional regulators (3/11) and metabolic enzymes (3/11).
The other two DSS genomes presented here are more divergent from the nearest sequenced relative. Both 3211–3 and S3–4 have two large plasmids that are absent in their closest relatives, S. virginiae NRRL B-1447 and S. katrae NRRL ISP-5550, respectively. These changes alone account for 9 and 7% of the total genome content, respectively. The plasmids in S3–4 are rich in secondary metabolism genes, with four large gene clusters totaling roughly 500 kb of sequence. Besides the plasmid differences, the chromosome of 3211–3 contains 285 large (> 100 bp) insertions compared to B-1447, totaling 609 kb of new sequence, and 309 large deletions totaling 758 kb of sequence lost. In the regions that do align, there are 102,000 SNPs, corresponding to an average sequence identity of 98.7% across the genome. The S3–4 genome lacks a close homolog in the sequence databases. Despite sharing 96.3% sequence identity of the rpoB gene, 26% of the S3–4 genome does not align with the WM6372 sequence.
We next compared the natural product biosynthetic potential for these three strains by analyzing their BGC content. Our closest pair, GS93–23 and ISP-5461, share 26/26 of the high-confidence BGCs and 61/64 ‘putative’ clusters (co-localized clusters of genes that belong to COGs typically found in BGCs, but which lack canonical secondary metabolism signature sequences). The next closest pair, 3211–3 and B-1447, which share 99.7% similarity of the rpoB gene, have in common only 31/38 of the high-confidence BGC annotations, which is driven mostly by the presence of two plasmids in 3211–3 missing from B-1447. Between S3–4 and WM6372 (96.3% identity of rpoB), 12/28 of high-confidence BGCs are shared, and 27/54 ‘putative’ clusters. These relationships between genetic distance and BGC overlap follow the general trend for rpoB conservation and non-ribosomal peptide synthetase (NRPS) BGC overlap described by Doroghazi et al. .
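The overlap figures quoted above are easier to compare as fractions. The short sketch below simply restates the high-confidence BGC counts from this paragraph; the variable and function names are illustrative.

```python
# (shared, total) high-confidence BGCs for each new genome and its closest relative,
# as reported in the text.
bgc_overlap = {
    ("GS93–23", "ISP-5461"): (26, 26),
    ("3211–3", "B-1447"): (31, 38),
    ("S3–4", "WM6372"): (12, 28),
}

def shared_fraction(shared, total):
    """Fraction of the new genome's high-confidence BGCs found in its relative."""
    if total <= 0:
        raise ValueError("total must be positive")
    return shared / total

fractions = {pair: shared_fraction(*counts) for pair, counts in bgc_overlap.items()}

for (new_strain, relative), value in fractions.items():
    print(f"{new_strain} vs {relative}: {value:.1%} of high-confidence BGCs shared")
```

The three pairs give 100%, roughly 82%, and roughly 43% shared clusters, which tracks the decreasing rpoB identity reported for the second and third pairs (99.7% and 96.3%).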
Signaling potential analysis
One possible organization for a highly antagonistic microbial community would have a keystone species that produces a signal to induce antibiotic production in many other community members. The University of Minnesota DSS strain library was assayed for signaling potential using a plate-based phenotypic assay  (Kinkel, unpublished data). Strain 3211–3 was selected for whole genome sequencing because it is among the best signalers of antibiosis in our library of DSS isolates. The signaling assay requires dilution of a signaling molecule through solid agar medium, so signaling through cell-cell contact can be ruled out as a mechanism. We looked for genomic features that could explain the signaling promiscuity in 3211–3.
Signaling between Streptomyces can be mediated by several well-known classes of hormone-like signaling molecules  including γ-butyrolactones , furans , γ-butenolides , SapB  -like RiPPs, diamino-bis(hydroxymethyl)-butanediol , and diketopiperazines . Signaling can also be mediated by sub-inhibitory concentrations of antibiotics [68,69,70]. We first looked for the presence of BGCs encoding hormone-like signaling molecules in 3211–3. There are two γ-butyrolactone BGCs in this genome and a SapB BGC, but this number is comparable to other sequenced Streptomyces. There is no evidence that 3211–3 produces an unusually diverse set of hormone-like signaling molecules.
A second possibility is that 3211–3 does not produce many diverse hormone-like signaling molecules, but the molecules it does produce can be sensed by many species of Streptomyces. There are at least fifteen unique γ-butyrolactone signals produced by the genus, and unfortunately it is not possible to predict the specific γ-butyrolactone chemical structure from sequence information alone. However, we reasoned that γ-butyrolactone biosynthesis genes and receptors that produce/sense the same compound will have a higher degree of sequence similarity than those producing/sensing different compounds (i.e. functionally similar gene clusters would share greater sequence similarity), as this gene cluster does not closely correlate with other phylogeny (Fig. 4). We performed a CLUSTER-BLAST analysis with the γ-butyrolactone biosynthesis protein ScbA and the receptor AfsR against the set of sequenced Streptomyces genomes. Again, we did not see any evidence that 3211–3 produces a more widely-sensed hormone-like signaling molecule.
A third possibility is that 3211–3 is a prolific signaler due to production of sub-inhibitory concentrations of antibiotics (SICA). This genome encodes more ‘high-confidence’ BGCs than the two genomes from strongly antagonistic DSS isolates (Table 1). Among 125 complete Streptomyces genomes analyzed with antiSMASH 4.1 (Additional file 1: Table S4), the number of high-confidence BGCs in 3211–3 places it in the top 16% in terms of BGC content. Since there is no clear genomic signature that allows us to explain the signaling potential in 3211–3, teasing apart its ability to elicit antibiosis in so many diverse isolates will require future molecular genetic experiments.
Bacteria within the genus Streptomyces are ubiquitous in terrestrial soils and marine sediments and have garnered much attention for their ability to produce medicinal natural products. The past decade and a half of genome sequencing efforts [30, 60, 71] revealed that the majority of natural products encoded in the genomes of Streptomyces spp. remain undiscovered and have reinvigorated natural product discovery via genome mining [72, 73]. Most genomes deposited in public sequence databases have been sequenced using Illumina short-read technology. The large size, repetitive nature, and high G + C content of Streptomyces genomes make them difficult to fully assemble from short reads, and so roughly 90% of the available genomes are only available in draft status, typically as hundreds of contigs with an average N50 of thousands of bases. With a combination of PacBio and Illumina sequence data, we were able to assemble high-quality genome sequences where the > 8 Mb chromosome assembles as a single contig in two strains and as two contigs in the third.
We initially predicted that the increase of genome quality would correspond to an improved ability to identify BGCs that would have been broken up between many small contigs in a short-read only assembly. However, the difference in quality does not appear to affect estimations of natural product biosynthetic potential. For example, in S. lydicus ISP-5461, 26 of the 26 high-confidence BGCs found in GS93–23 were also predicted using the short-read only assembly contigs.
One advantage to generating single-contig genomes using long-read data is the ability to map the chromosomal location of BGCs. In order to help prioritize isolated Streptomyces strains for whole-genome sequencing, there have been previous attempts to correlate sequence conservation of phylogenetic markers with BGC conservation between two or more genomes . After sequencing 1000 actinomycete genomes, Metcalf et al. found that a 99% sequence identity between concatenated ribosomal protein sequences correlates with a 73 and 80% conservation of Type I polyketide synthase (PKS) and NRPS clusters, respectively . Our data supports the rapid diversification of secondary metabolite gene clusters, and suggests that this is primarily driven by changes in episomal elements, not by changes to the core genome. This information could make future sequencing campaigns more efficient by limiting sequencing efforts in closely related strains to isolated plasmids.
Bacterial genome organization has been described as mosaic [74,75,76], referring to the composition of a vertically-inherited (clonally-expanded) backbone interspersed with laterally-transferred mobile elements. Mutations accumulate in clonal complexes between bouts of periodic selection . The genomic comparison of GS93–23 and ISP-5461 suggests that these strains are part of the same clonal complex, despite being isolated 850 km apart and several decades removed. Our analysis of the SNP accumulation in relationship to relative location within genes shows a de-enrichment of sequence variation in regions known to control translation initiation rates. This points to a microevolution of genomes where there is a selection to maintain relative expression levels of genes during clonal expansion. We have previously shown that transfer of multi-gene systems between hosts from the same genus can result in wildly different relative expression levels . These likely result from the accumulation of subtle differences between the transcription/translation machinery and corresponding cis-acting regulatory elements that co-evolve during clonal expansion. Taken together, the importance of maintaining relative expression levels during microevolution and the changes between seemingly closely related species likely contributes to low success rates and low titers during heterologous introduction of BGCs to model host strains .
We sequenced the three strains presented here in the hope of gaining insight into the mechanisms and ecology that underlie DSSs. While the sample size is small, there is no indication that the increased antibiosis observed in DSS isolates compared to isolates from non-suppressive soils is due to an increased number of BGCs. Transcriptomic and chemical characterization of these and other DSS isolates is pending. With over 500 species of Streptomyces currently recognized and roughly 800 draft Streptomyces genomes available in public databases at the time of this study, we were initially surprised by the level of sequence conservation between these strains and previously sequenced genomes. The level of divergence between GS93–23 and ISP-5461 is only ten times greater than clonally-related lab-cultivated strains of E. coli separated by only 20 years of evolution. There are a few possible explanations for this. First, species groups are not expected to be equally abundant. It is likely that the genomes already present in the public databases are those of highly abundant clonal complexes. The similarity between these genomes and extant sequences reflects the fact that no attempts were made to bias our strain selection towards rare Streptomyces. A second possibility is that the ecology of DSSs has selected for strains that are also abundant in sequenced collections. This makes sense in light of the experimental data and ecological models that suggest DSS community members are selected for their antagonistic phenotypes. Likewise, most Streptomyces strains whose genomes are in public databases were originally isolated and maintained in collections of drug discovery groups. If this is true, it would suggest that evolution of DSS isolates occurs on the level of the genome/strain, not the individual genes, contrary to what has been observed in other environments.
Strain recruitment is a proposed mechanism for the establishment of disease-suppressive soils, in which plants support the maintenance of those microbial strains which inhibit phytopathogens. 16S sequencing and denaturing gel electrophoresis of the rhizosphere microbiome of strawberry plants showed that the Actinobacteria community profile was more similar between species of strawberry plant, regardless of site, when compared to oilseed rape rhizosphere communities. It is not unreasonable, then, to assume that, under the dispersal-recruitment model, ancestral bacterial strains that were beneficial to plant growth would be under similar selective pressures if co-evolving with the same plant species in distant locations.
In summary, we have added three high-quality whole genome sequences to the growing number of sequenced Streptomyces isolates. Each genome is rich with yet-uncharacterized natural product biosynthetic potential. While genome sequence alone was not sufficient to explain the observed phenotypes of DSS isolates, it is an important first step to future investigations of gene expression and function.
Preparation of high molecular-weight DNA
The three strains of Streptomyces sequenced for this study were obtained from a culture collection maintained by Linda Kinkel at the University of Minnesota. Single colonies are isolated on IWL-4 solid medium and used to inoculate 4 mL liquid cultures in R2YE medium. Following three days of growth, cells are harvested by centrifugation and washed with a 10% sucrose solution. Mycelia are resuspended in 450 μL TSE buffer (15% sucrose, 25 mM Tris, 25 mM EDTA, pH 8) with 5 mg/mL lysozyme and incubated at 37 °C for one hour. Cells are lysed by addition of 225 μL of 2% SDS over a 5 min room temperature incubation. Following a phenol:chloroform extraction (100 μL neutral phenol, 50 μL chloroform), supernatant is transferred to a tube containing 60 μL 3 M sodium acetate and 700 μL isopropanol to precipitate gDNA. DNA is pelleted by centrifugation and resuspended in 500 μL TE buffer (10 mM Tris, 1 mM EDTA, pH 8). To remove RNA, 10 μL RNase (10 mg/ml) is added to the sample and incubated at room temperature for at least 15 min. Next, a second phenol:chloroform extraction (300 μL neutral phenol, 150 μL chloroform) is performed followed by a final extraction with 300 μL chloroform to remove trace phenol. DNA in the supernatant is precipitated with 50 μL 3 M sodium acetate and 350 μL isopropanol and incubated on ice for 30 min. Final gDNA is resuspended in 150 μL TE buffer and quality is assessed by agarose gel electrophoresis, spectrophotometry, and PicoGreen analysis.
DNA sequencing and assembly
We performed PacBio long-read sequencing using protocols for 20 Kb insert size with BluePippin Size Selection (Sage Science). For each of the three genomic DNA samples, sequencing was performed using P4 chemistry on two SMRT cells and using P6 chemistry on an additional SMRT cell from November 2014 to January 2015. In total, subread filtering from the three SMRT cells yielded 1.26 Gb (S3–4), 1.40 Gb (GS93–23), and 1.18 Gb (3211–3) of sequence data with average read lengths of 6703 bp, 6782 bp, and 6478 bp, respectively, and N50 values of 9095 bp, 8819 bp, and 8680 bp, respectively.
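Two quantities in this paragraph, fold coverage and read N50, follow directly from simple arithmetic. The sketch below shows one common way to compute them; the function names are illustrative, and the N50 definition used (the length at which reads of that length or longer contain at least half of all bases) is the conventional one rather than one stated in the text.

```python
def fold_coverage(total_bases, genome_size):
    """Average depth of coverage: sequenced bases divided by genome length."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size

def n50(lengths):
    """Length L such that reads of length >= L contain at least half of all bases."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length

# GS93–23: 1.40 Gb of subreads over an 8.24 Mb chromosome (values from the text).
gs93_23_coverage = fold_coverage(1.40e9, 8.24e6)
print(f"GS93–23 raw PacBio coverage: about {gs93_23_coverage:.0f}x")
```

The raw subread yield for GS93–23 works out to roughly 170-fold, which is consistent with the final coverage of more than 100x reported for each assembly.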
Short-read sequencing and error correction
Illumina MiSeq sequencing was performed at the UMN Genomics Center in March 2015. The three genomic DNA samples were uniquely barcoded and sequenced alongside genomes from unrelated bacteria to account for 30% of a MiSeq lane. Nextera library preparation was performed using standard protocols at the University of Minnesota Genomics Center. The 250 nt paired-end reads were mapped to the PacBio reference genome sequence using breseq to generate BAM files. Single-base differences and small indels were corrected using Pilon to generate the final error-corrected genome assembly.
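The mapping and polishing steps can be sketched as two command lines assembled in Python. This is an illustration under stated assumptions, not the authors' actual invocation: all file names are hypothetical, the exact options used in the study are not reported, the location of the breseq BAM file inside its output directory is assumed, and restricting Pilon to base-level fixes is our reading of "single-base differences and small indels".

```python
from pathlib import Path


def breseq_command(reference, reads_r1, reads_r2, out_dir):
    """Build a breseq call that maps paired-end reads to a reference."""
    return ["breseq", "-r", str(reference), "-o", str(out_dir),
            str(reads_r1), str(reads_r2)]


def pilon_command(genome_fasta, bam, prefix, pilon_jar="pilon.jar"):
    """Build a Pilon call limited to SNP and small-indel corrections."""
    return ["java", "-jar", str(pilon_jar),
            "--genome", str(genome_fasta),
            "--frags", str(bam),       # paired-end (fragment) alignments
            "--fix", "bases",          # correct SNPs and small indels only
            "--changes",               # also write a list of applied edits
            "--output", prefix]


# Hypothetical file names, for illustration only.
out_dir = Path("breseq_GS93-23")
map_cmd = breseq_command("GS93-23_pacbio.fasta",
                         "GS93-23_R1.fastq", "GS93-23_R2.fastq", out_dir)
# Assumed location of the sorted BAM inside the breseq output directory.
polish_cmd = pilon_command("GS93-23_pacbio.fasta",
                           out_dir / "data" / "reference.bam",
                           "GS93-23_polished")
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the tools are installed; building the commands separately keeps the sketch testable without them.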
Annotation of genomic features
Prokka is a command-line software tool that uses Prodigal for coding DNA sequence (CDS) annotation, RNAmmer for ribosomal RNA annotation, Aragorn for transfer RNA annotation, SignalP for signal leader peptide annotation, and Infernal for non-coding RNA annotation. Each genome was annotated with the Prokka software package using default options and the ‘--compliant’ option to force compliance with GenBank.
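An annotation run of this kind might look like the following sketch. The input file name, output directory, and prefix are hypothetical; only the use of default options together with ‘--compliant’ is taken from the text.

```python
def prokka_command(assembly_fasta, out_dir, prefix):
    """Build a Prokka call with default options plus --compliant."""
    return ["prokka",
            "--compliant",        # force GenBank-compliant output
            "--outdir", out_dir,  # directory for all result files
            "--prefix", prefix,   # base name of the result files
            assembly_fasta]


# Hypothetical names, for illustration only.
annotate_cmd = prokka_command("GS93-23_polished.fasta",
                              "prokka_GS93-23", "GS93-23")
```

As with any external tool, the list can be handed to `subprocess.run(annotate_cmd, check=True)`.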
Assignment of putative functional categories to CDSs was performed using the BASys web server (https://www.basys.ca/). For each CDS, the start position, end position, strand information, and a unique identifier were provided in tabular format to ensure that Prokka-generated annotations would be used for clusters of orthologous groups (COG) assignment in place of the default Glimmer algorithm. The following options were selected for functional assignment by BASys: Gram positive, Linear contig, Bacterial genetic code. Functional assignments of proteins in Table 2 were performed with EggNOG-mapper. The following EggNOG-mapper settings were selected: mapping mode was set to DIAMOND, taxonomic scope was set to all bacteria, all orthologs were used, and non-electronic gene ontology evidence terms were selected.
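A table of CDS coordinates like the one described above can be derived from a Prokka GFF3 file. The sketch below is one possible way to do this; the column order, header names, and example records are assumptions, since the exact layout expected by BASys is not given here.

```python
import csv
import io


def cds_table_from_gff(gff_text):
    """Extract (identifier, start, end, strand) for every CDS in GFF3 text."""
    rows = []
    for line in gff_text.splitlines():
        if line.startswith("##FASTA"):
            break  # sequence data follows this marker; stop parsing features
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9 or fields[2] != "CDS":
            continue
        attributes = dict(
            item.split("=", 1) for item in fields[8].split(";") if "=" in item
        )
        rows.append((attributes.get("ID", ""),
                     int(fields[3]), int(fields[4]), fields[6]))
    return rows


def to_tsv(rows):
    """Serialize the CDS rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["id", "start", "end", "strand"])
    writer.writerows(rows)
    return buffer.getvalue()


# Toy GFF3 records (illustrative only, not annotations from this study).
example_gff = (
    "##gff-version 3\n"
    "contig1\tProdigal\tCDS\t101\t400\t.\t+\t0\tID=DEMO_00001;product=hypothetical protein\n"
    "contig1\tAragorn\ttRNA\t500\t575\t.\t-\t.\tID=DEMO_00002\n"
    "contig1\tProdigal\tCDS\t900\t1502\t.\t-\t0\tID=DEMO_00003\n"
    "##FASTA\n>contig1\nACGT\n"
)
cds_rows = cds_table_from_gff(example_gff)
cds_tsv = to_tsv(cds_rows)
```

Only CDS features are kept, so tRNA and rRNA records from the same file are dropped from the table.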
Streptomyces genomes were obtained from PATRIC (https://www.patricbrc.org/). Nucleotide sequences for the molecular phylogeny markers atpD, gyrB, recA, rpoB, and trpB were extracted. Regions for comparison were identified and concatenated head-to-tail in-frame [90, 91]. Multiple sequence alignment of the concatenations and maximum-likelihood tree construction were performed in MEGA7. For the S3–4 subtree phylogeny, the recA sequence was not available for WM6372, so a four-gene concatenation was used.
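The head-to-tail, in-frame concatenation of marker genes can be sketched as follows. The function, the toy sequences, and the codon-length check used to enforce "in-frame" are assumptions for illustration; the marker order follows the text, and the comparison regions are assumed to have been trimmed beforehand.

```python
MARKERS = ("atpD", "gyrB", "recA", "rpoB", "trpB")


def concatenate_markers(sequences, markers=MARKERS):
    """Join trimmed marker regions head-to-tail in a fixed gene order.

    sequences maps strain name -> {gene name -> trimmed nucleotide region}.
    """
    concatenated = {}
    for strain, genes in sequences.items():
        missing = [m for m in markers if m not in genes]
        if missing:
            raise KeyError(f"{strain} lacks {', '.join(missing)}")
        for marker in markers:
            # Whole codons only, so the reading frame is kept across joins.
            if len(genes[marker]) % 3 != 0:
                raise ValueError(f"{strain} {marker} is not a whole number of codons")
        concatenated[strain] = "".join(genes[m] for m in markers)
    return concatenated


# Toy sequences (illustrative only); strain_b has no recA region.
toy = {
    "strain_a": {"atpD": "ATGGCC", "gyrB": "GGTACC", "recA": "TTTAAA",
                 "rpoB": "CCCGGG", "trpB": "AAATTT"},
    "strain_b": {"atpD": "ATGGCA", "gyrB": "GGTACA",
                 "rpoB": "CCCGGA", "trpB": "AAATTC"},
}
four_gene = concatenate_markers(toy, markers=("atpD", "gyrB", "rpoB", "trpB"))
five_gene = concatenate_markers({"strain_a": toy["strain_a"]})
```

Dropping recA from the marker tuple reproduces the situation described for the S3–4 subtree, where one strain lacked that sequence and a four-gene concatenation was used for every strain in the comparison.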
Availability of data and materials
The genome sequences reported here are available in GenBank under the accession numbers NZ_CP020042 for Streptomyces sp. S3–4, NZ_CP020039 for Streptomyces sp. 3211–3, and NZ_CP019457 for Streptomyces sp. GS93–23 [93]. For the S3–4 genome submitted to NCBI, the two large chromosomal contigs were joined together by 100 ambiguous bases. The second half of the chromosome starts at 41915460 bp.
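Joining two contigs with a run of ambiguous bases, and locating where the second contig begins in the joined record, can be sketched as below. The function and the toy sequences are hypothetical; only the gap length of 100 comes from the text.

```python
GAP_LENGTH = 100  # number of ambiguous bases placed between the contigs


def join_contigs(first, second, gap_length=GAP_LENGTH):
    """Join two contigs with a run of N and report where the second begins.

    Returns the joined sequence and the 1-based coordinate of the first
    base of the second contig within it.
    """
    joined = first + "N" * gap_length + second
    second_start = len(first) + gap_length + 1
    return joined, second_start


# Toy contigs (illustrative only, not the S3-4 sequences).
joined, second_start = join_contigs("ACGTACGT", "TTGGCC")
```

With this convention, a feature at position p of the second contig maps to position `second_start + p - 1` in the joined sequence.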
BGC: biosynthetic gene cluster
BLAST: basic local alignment search tool
CCESR: Cedar Creek Ecosystem Science Reserve
COG: clusters of orthologous groups
gDNA: genomic deoxyribonucleic acid
HGAP: Hierarchical Genome Assembly Process
NCBI: National Center for Biotechnology Information
NRPS: nonribosomal peptide synthetase
NRRL: Northern Regional Research Laboratory
RiPPs: ribosomally synthesized and post-translationally modified peptides
rRNA: ribosomal ribonucleic acid
SIC: subinhibitory concentrations of antibiotics
SMRT: single molecule, real-time
SNP: single nucleotide polymorphism
tRNA: transfer ribonucleic acid
UMN: University of Minnesota
Oerke E-C. Crop losses to pests. J Agric Sci. 2006;144:31.
Lewis JA, Papavizas GC. Biocontrol of plant diseases: the approach for tomorrow. Crop Prot. 1991;10:95–105.
Wilson C. Roots: miracles below. Doubleday & Co.; 1968.
Schroth MN, Hancock JG. Disease-suppressive soil and root-colonizing bacteria. Science. 1982;216:1376–81.
Finch-Savage WE, Bassel GW. Seed vigour and crop establishment: extending performance beyond adaptation. J Exp Bot. 2016;67:567–91.
Papaïx J, Burdon JJ, Zhan J, Thrall PH. Crop pathogen emergence and evolution in agro-ecological landscapes. Evol Appl. 2015;8:385–402.
Kinkel LL, Bakker MG, Schlatter DC. A coevolutionary framework for managing disease-suppressive soils. Annu Rev Phytopathol. 2011;49:47–67.
Landa BB, Mavrodi DM, Thomashow LS, Weller DM. Interactions between strains of 2,4-Diacetylphloroglucinol-producing Pseudomonas fluorescens in the Rhizosphere of wheat. Phytopathology. 2003;93:982–94.
Alabouvette C, Lemanceau P, Steinberg C. Recent advances in the biological control of fusarium wilts. Pestic Sci. 1993;37:365–73.
Menzies JD. Occurrence and transfer of a biological factor in soil that suppresses potato scab. Phytopathology. 1959.
Mazzola M. Mechanisms of natural soil suppressiveness to soilborne diseases. Antonie van Leeuwenhoek. Int J Gen Mol Microbiol. 2002;81:557–64.
Murakami H, Tsushima S, Shishido Y. Soil suppressiveness to clubroot disease of Chinese cabbage caused by Plasmodiophora brassicae. Soil Biol Biochem. 2000;32:1637–42.
Weller DM, Raaijmakers JM, Gardener BBM, Thomashow LS. Microbial populations responsible for specific soil suppressiveness to plant pathogens. Annu Rev Phytopathol. 2002;40:309–48.
Mazzola M. Assessment and management of soil microbial community structure for disease suppression. Annu Rev Phytopathol. 2004;42:35–59.
De La Fuente L, Landa BB, Weller DM. Host crop affects Rhizosphere colonization and competitiveness of 2,4-Diacetylphloroglucinol-producing Pseudomonas fluorescens. Phytopathology. 2006;96:751–62.
Janvier C, et al. Soil health through soil disease suppression: which strategy from descriptors to indicators? Soil Biol Biochem. 2007;39:1–23.
Mendes R, et al. Deciphering the rhizosphere microbiome for disease-suppressive bacteria. Science. 2011;332:1097–100.
Schottel JL, Shimizu K, Kinkel LL. Relationships of in vitro pathogen inhibition and soil colonization to potato scab biocontrol by antagonistic Streptomyces spp. Biol Control. 2001;20:102–12.
Bakker MG, Otto-Hanson L, Lange AJ, Bradeen JM, Kinkel LL. Plant monocultures produce more antagonistic soil Streptomyces communities than high-diversity plant communities. Soil Biol Biochem. 2013;65:304–12.
Kinkel LL, Schlatter DC, Xiao K, Baines AD. Sympatric inhibition and niche differentiation suggest alternative coevolutionary trajectories among Streptomycetes. ISME J. 2014;8:249–56.
Paulsrud B. Characterization of antagonistic Streptomyces spp. from potato scab-suppressive soils and evaluation of their biological potential against potato and non-potato pathogens. Minnesota: University of Minnesota; 1996.
Buxton E, Kendrick J Jr. A method of isolating Pythium spp. and Fusarium oxysporum from soil. Ann Appl Biol. 1963:215–21.
Xiao K, Kinkel LL, Samac DA. Biological control of Phytophthora root rots on alfalfa and soybean with Streptomyces. Biol Control. 2002;23:285–95.
Franklin JF, et al. Contributions of the long-term ecological research program. BioScience. 2008;40:509–23.
Essarioui A, LeBlanc N, Kistler HC, Kinkel LL. Plant community richness mediates inhibitory interactions and resource competition between Streptomyces and Fusarium populations in the Rhizosphere. Microb Ecol. 2017;74:157–67.
Vaz Jauri P, Kinkel LL. Nutrient overlap, genetic relatedness and spatial origin influence interaction-mediated shifts in inhibitory phenotype among Streptomyces spp. FEMS Microbiol Ecol. 2014;90:264–75.
Deatherage DE, Barrick JE. Identification of mutations in laboratory-evolved microbes from next-generation sequencing data using breseq. Methods Mol Biol. 2014;1151.
Walker BJ, et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One. 2014;9.
Harrison J, Studholme DJ. Recently published Streptomyces genome sequences. Microb Biotechnol. 2014;7:373–80.
Bentley SD, et al. Complete genome sequence of the model actinomycete Streptomyces coelicolor A3(2). Nature. 2002;417:141–7.
Gomez-Escribano JP, et al. The Streptomyces leeuwenhoekii genome: de novo sequencing and assembly in single contigs of the chromosome, circular plasmid pSLE1 and linear plasmid pSLE2. BMC Genomics. 2015;16:485.
Ikeda H, et al. Complete genome sequence and comparative analysis of the industrial microorganism Streptomyces avermitilis. Nat Biotechnol. 2003;21:526–31.
Zaburannyi N, Rabyk M, Ostash B, Fedorenko V, Luzhetskyy A. Insights into naturally minimised Streptomyces albus J1074 genome. BMC Genomics. 2014;15:97.
Rückert C, et al. Complete genome sequence of Streptomyces lividans TK24. J Biotechnol. 2015;199:21–2.
Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics. 2014;30:2068–9.
Van Domselaar GH, et al. BASys: a web server for automated bacterial genome annotation. Nucleic Acids Res. 2005;33.
Huerta-Cepas J, et al. eggNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences. Nucleic Acids Res. 2018;44:286–93.
Smanski MJ, Schlatter DC, Kinkel LL. Leveraging ecological theory to guide natural product discovery. J Ind Microbiol Biotechnol. 2016;43:115–28.
Weber T, et al. antiSMASH 3.0—a comprehensive resource for the genome mining of biosynthetic gene clusters. Nucleic Acids Res. 2015;43:W237–43.
Wang J, et al. Identification and analysis of the biosynthetic gene cluster encoding the thiopeptide antibiotic cyclothiazomycin in Streptomyces hygroscopicus 10-22. Appl Environ Microbiol. 2010;76:2335–44.
Olano C, et al. Deciphering biosynthesis of the RNA polymerase inhibitor Streptolydigin and generation of glycosylated derivatives. Chem Biol. 2009;16:1031–44.
Magarvey NA, Haltli B, He M, Greenstein M, Hucul JA. Biosynthetic pathway for mannopeptimycins, lipoglycopeptide antibiotics active against drug-resistant gram-positive pathogens. Antimicrob Agents Chemother. 2006;50:2167–77.
Goering AW, et al. Metabologenomics: correlation of microbial gene clusters with metabolites drives discovery of a nonribosomal peptide with an unusual amino acid monomer. ACS Cent Sci. 2016;2:99–108.
Lautru S, Deeth RJ, Bailey LM, Challis GL. Discovery of a new peptide natural product by Streptomyces coelicolor genome mining. Nat Chem Biol. 2005;1:265–9.
Wang CM, Cane DE. Biochemistry and molecular genetics of the biosynthesis of the earthy odorant methylisoborneol in Streptomyces coelicolor. J Am Chem Soc. 2008;130:8908–9.
Maruyama C, et al. A stand-alone adenylation domain forms amide bonds in streptothricin biosynthesis. Nat Chem Biol. 2012;8:791–7.
Barona-Gómez F, Wong U, Giannakopulos AE, Derrick PJ, Challis GL. Identification of a cluster of genes that directs desferrioxamine biosynthesis in Streptomyces coelicolor M145. J Am Chem Soc. 2004;126:16282–3.
Jiang J, He X, Cane DE. Biosynthesis of the earthy odorant geosmin by a bifunctional Streptomyces coelicolor enzyme. Nat Chem Biol. 2007;3:711–5.
Siedenburg G, Jendrossek D. Squalene-hopene cyclases. Appl Environ Microbiol. 2011;77:3905–15.
Kodani S, et al. From the cover: the SapB morphogen is a lantibiotic-like peptide derived from the product of the developmental gene ramS in Streptomyces coelicolor. Proc Natl Acad Sci. 2004;101:11448–53.
Ofer N, et al. Ectoine biosynthesis in Mycobacterium smegmatis. Appl Environ Microbiol. 2012;78:7483–6.
Gupta K, Chopra I. Streptomyces katrae - a new species of Streptomyces isolated from soil. Indian J Microbiol. 1963;3:1–4.
Grundy WE, et al. Actithiazic acid. I Microbiological studies. Antibiot Chemother. 1952;2:399–408.
Deboer C, Dietz A, Savage GM, Silver WS. Streptolydigin, a new antimicrobial antibiotic. I. Biologic studies of streptolydigin. Antibiot Annu. 3:886–92.
Salis HM, Mirsky EA, Voigt CA. Automated design of synthetic ribosome binding sites to control protein expression. Nat Biotechnol. 2009;27:946–50.
Wang FS, Whittam TS, Selander RK. Evolutionary genetics of the isocitrate dehydrogenase gene (icd) in Escherichia coli and Salmonella enterica. J Bacteriol. 1997;179:6551–9.
Boyd EF, Jia LI, Ochman H, Selander RK. Comparative genetics of the inv-spa invasion gene complex of Salmonella enterica. J Bacteriol. 1997;179:1985–91.
Feil EJ, et al. How clonal is Staphylococcus aureus? J Bacteriol. 2003;185:3307–16.
Feil EJ. Small change: keeping pace with microevolution. Nat Rev Microbiol. 2004;2:483–95.
Doroghazi JR, et al. A roadmap for natural product discovery based on large-scale genomics and metabolomics. Nat Chem Biol. 2014;10:963–8.
Willey JM, Gaskell AA. Morphogenetic signaling molecules of the streptomycetes. Chem Rev. 2011;111:174–87.
Takano E. γ-Butyrolactones: Streptomyces signalling molecules regulating antibiotic production and differentiation. Curr Opin Microbiol. 2006;9:287–94.
Haneishi T, Terahara A, Hamano K, Arain M. New antibiotics, Methylenomycins A and B. J Antibiot (Tokyo). 2012;27:400–7.
Arakawa K, Tsuda N, Taniguchi A, Kinashi H. The butenolide signaling molecules SRB1 and SRB2 induce lankacidin and lankamycin production in Streptomyces rochei. ChemBioChem. 2012;13:1447–57.
Guijarro J, Santamaria R, Schauer A, Losick R. Promoter determining the timing and spatial localization of transcription of a cloned Streptomyces coelicolor gene encoding a spore-associated polypeptide. J Bacteriol. 1988;170:1895–901.
Recio E, Colinas A, Rumbero A, Aparicio JF, Martín JF. PI factor, a novel type quorum-sensing inducer elicits pimaricin production in Streptomyces natalensis. J Biol Chem. 2004;279:41586–93.
Holden MTG, et al. Quorum-sensing cross talk: isolation and chemical characterization of cyclic dipeptides from Pseudomonas aeruginosa and other gram-negative bacteria. Mol Microbiol. 2002;33:1254–66.
Romero D, Traxler MF, López D, Kolter R. Antibiotics as signal molecules. Chem Rev. 2011;111:5492–505.
Davies J. Are antibiotics naturally antibiotics? J Ind Microbiol Biotechnol. 2006;33:496–9.
Yim G, Wang HH, Davies J. Antibiotics as signalling molecules. Philos Trans R Soc B Biol Sci. 2007;362:1195–200.
Cimermancic P, et al. Insights into secondary metabolism from a global analysis of prokaryotic biosynthetic gene clusters. Cell. 2014;158:412–21.
Rutledge PJ, Challis GL. Discovery of microbial natural products by activation of silent biosynthetic gene clusters. Nat Rev Microbiol. 2015;13:509–23.
Smanski MJ, et al. Synthetic biology to access and expand nature’s chemical diversity. Nat Rev Microbiol. 2016;14:135–49.
Chiapello H, et al. Systematic determination of the mosaic structure of bacterial genomes: species backbone versus strain-specific loops. BMC Bioinformatics. 2005;6:171.
Sebaihia M, et al. The multidrug-resistant human pathogen Clostridium difficile has a highly mobile, mosaic genome. Nat Genet. 2006;38:779–86.
Welch RA, et al. Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli. Proc Natl Acad Sci U S A. 2002;99:17020–4.
Smanski MJ, et al. Expression of the platencin biosynthetic gene cluster in heterologous hosts yielding new platencin congeners. J Nat Prod. 2012;75.
Galm U, Shen B. Expression of biosynthetic gene clusters in heterologous hosts for natural product production and combinatorial biosynthesis. Expert Opin Drug Discovery. 2006;1:409–37.
Encyclopedia of Life. Streptomyces. Available at: http://www.eol.org. Accessed 15 January 2016.
Tenaillon O, et al. Tempo and mode of genome evolution in a 50,000-generation experiment. Nature. 2016;536:165–70.
Shapiro BJ, Timberlake SC, Szabó G, Polz MF, Alm EJ. Population genomics of early differentiation of bacteria. Science. 2012;336:48–51.
Cook RJ, et al. Molecular mechanisms of defense by rhizobacteria against root disease. Proc Natl Acad Sci U S A. 1995;92:4197–201.
Costa R, et al. Effects of site and plant species on rhizosphere community structure as revealed by molecular analysis of microbial guilds. FEMS Microbiol Ecol. 2006;56:236–49.
Hyatt D, et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11:119.
Lagesen K, et al. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res. 2007;35:3100–8.
Laslett D, Canback B. ARAGORN, a program to detect tRNA genes and tmRNA genes in nucleotide sequences. Nucleic Acids Res. 2004;32:11–6.
Dyrløv Bendtsen J, Nielsen H, von Heijne G, Brunak S. Improved prediction of signal peptides: SignalP 3.0. J Mol Biol. 2004;340:783–95.
Nawrocki EP, Eddy SR. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics. 2013;29:2933–5.
Buchfink B, Xie C, Huson DH. Fast and sensitive protein alignment using DIAMOND. Nat Methods. 2015;12:59–60.
Labeda DP, Doroghazi JR, Ju KS, Metcalf WW. Taxonomic evaluation of Streptomyces albus and related species using multilocus sequence analysis and proposals to emend the description of Streptomyces albus and describe Streptomyces pathocidini sp nov. Int J Syst Evol Microbiol. 2014;64:894–900.
Guo Y, Zheng W, Rong X, Huang Y. A multilocus phylogeny of the Streptomyces griseus 16S rRNA gene clade: use of multilocus sequence analysis for streptomycete systematics. Int J Syst Evol Microbiol. 2008;58:149–59.
Kumar S, Stecher G, Tamura K, Dudley J. MEGA7: molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol Biol Evol. 2016;33:1870–4.
Heinsch SC, Otto-Hanson L, Hsu S-Y, Kinkel L, Smanski MJ. Genome sequences for Streptomyces spp. isolated from disease-suppressive soils and long-term ecological research sites. Genome Announc. 2017;5.
We thank Bill Metcalf and Hyoung Sook Ann from the University of Illinois for providing raw sequence data for S. lydicus NRRL ISP-5461 and S. virginiae B-1447.
SCH is supported by a grant from the Biocatalysis Initiative at the University of Minnesota. SCH and MJS are supported by an award from the Damon Runyon Cancer Research Foundation. The funding bodies played no role in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.
Ethics approval and consent to participate
Consent for publication
LK has an equity interest in, and serves as Chief Scientific Officer and on the Board of Directors of Jord BioScience, a company which may commercially benefit from the results of this research project. These interests have been reviewed and managed by the University of Minnesota in accordance with its conflict of interest policy.
Table S1. Streptomyces sp. GS93–23, Table S2. Streptomyces sp. 3211–3, Table S3. Streptomyces sp. S3–4 gene clusters, Table S4. Cluster abundance for 125 complete Streptomyces genomes, Figure S1. Indel comparison of Illumina-polished vs. PacBio-only assemblies.
Cite this article
Heinsch, S.C., Hsu, SY., Otto-Hanson, L. et al. Complete genome sequences of Streptomyces spp. isolated from disease-suppressive soils. BMC Genomics 20, 994 (2019). https://doi.org/10.1186/s12864-019-6279-8