Evolutionary paths of streptococcal and staphylococcal superantigens
© Okumura et al.; licensee BioMed Central Ltd. 2012
Received: 3 April 2012
Accepted: 30 June 2012
Published: 17 August 2012
Streptococcus pyogenes (GAS) harbors several superantigens (SAgs) in the prophage region of its genome, although speG and smez are not located in this region. The diversity of SAgs is thought to arise during horizontal transfer, but their evolutionary pathways have not yet been determined. We recently completed sequencing the entire genome of S. dysgalactiae subsp. equisimilis (SDSE), the closest relative of GAS. Although speG is the only SAg gene of SDSE, speG was present in only 50% of clinical SDSE strains and smez in none. In this study, we analyzed the evolutionary paths of streptococcal and staphylococcal SAgs.
We compared the sequences of the 12–60 kb speG regions of nine SDSE strains, five speG+ and four speG–. We found that the synteny of this region was highly conserved, whether or not the speG gene was present. Synteny analyses based on genome-wide comparisons of GAS and SDSE indicated that speG is the direct descendant of a common ancestor of streptococcal SAgs, whereas smez was deleted from SDSE after SDSE and GAS split from a common ancestor. Cumulative nucleotide skew analysis of SDSE genomes suggested that speG was located outside segments of steeper slopes than the stable region in the genome, whereas the region flanking smez was unstable, as expected from the results of GAS. We also detected a previously undescribed staphylococcal SAg gene, selW, and a staphylococcal SAg-like gene, ssl, in the core genomes of all Staphylococcus aureus strains sequenced. Amino acid substitution analyses, based on dN/dS window analysis of the products encoded by speG, selW and ssl, suggested that all three genes have been subjected to strong positive selection. Evolutionary analysis based on the Bayesian Markov chain Monte Carlo method showed that each clade included at least one direct descendant.
Our findings reveal a plausible model for the comprehensive evolutionary pathway of streptococcal and staphylococcal SAgs.
Bacterial superantigens (SAgs) have been shown to cause the massive activation of host T cells, strongly influencing immunological disorders. To date, nearly 50 bacterial SAgs and related molecules have been described, primarily from Gram-positive bacteria [1–3]. Streptococcus pyogenes (GAS) is one species of bacteria that harbors SAg genes. Analyses of the entire genomes of 13 GAS isolates have shown that each contains two to seven SAg genes (Additional file 1), almost all located in the prophage regions of the genome. In contrast, genes encoding the SAgs speG and smez in GAS strains are not located on these mobile genetic elements, although some are surrounded by transposons. Thus, speG and smez in GAS may have been inherited from an ancestor by horizontal gene transfer. Although speJ in M1 GAS is not located on these mobile genetic elements, speJ is not conserved in the genome sequence of other GAS isolates, except for MGAS6180 (data not shown); in some strains, an SAg similar to speC is called speJ. We recently sequenced the entire genome of Streptococcus dysgalactiae subsp. equisimilis (SDSE) [DDBJ: AP010935], a bacterium that causes life-threatening infectious diseases, including sepsis and streptococcal toxic shock syndrome, similar to GAS [5–7]. Analyses of its sequence showed that SDSE is the closest relative of GAS sequenced to date, with 65% identity (Additional file 2). Streptococcal bacteria other than GAS, such as S. dysgalactiae subsp. dysgalactiae and S. equi [9–11], have been reported to harbor more than one gene encoding proteins similar to SAgs. In contrast, targeted microarray analyses of 216 GAS virulence genes including SAgs in 58 SDSE strains isolated from human infections showed that the only SAg gene present in SDSE was speG, with about 50% of SDSE strains not harboring this gene [13–15].
Other representative bacterial SAgs and their related products have been identified in Staphylococcus aureus. At least 20 distinct staphylococcal SAgs have been described, including toxic shock syndrome toxin-1 (TSST-1), staphylococcal enterotoxins (SEs), and staphylococcal enterotoxin-like proteins (SEls) [1–3]. Almost all staphylococcal SAg genes are located in mobile genetic elements, such as prophages, transposons, plasmids, and pathogenicity islands (PIs). The distribution of these mobile elements among S. aureus isolates varies considerably. PIs that harbor the gene encoding TSST-1 can be excised and transduced with high efficiency by a staphylococcal phage.
In addition to these staphylococcal SAgs, recent studies have identified staphylococcal superantigen-like proteins (SSLs, also known as SETs), which have structural features similar to those of SAgs but do not possess SAg activity. All of the SSLs described to date are located in mobile genetic elements. Interestingly, TSST-1, a functional SAg, shows higher sequence and structural similarity to SSLs than to staphylococcal SAgs.
Structural analysis of SAgs has suggested that they evolved through the recombination of two smaller β-strand motifs, similar to the immunoglobulin binding motifs of streptococcal proteins G and L and the oligosaccharide/oligonucleotide binding family, such as the B subunits of AB(5) heat-labile enterotoxins, including cholera toxin, pertussis toxin, and verotoxin [19, 20]. However, the origin and evolutionary pathways of streptococcal and staphylococcal SAgs have not been well described.
To elucidate the origin of streptococcal SAgs based on de novo sequencing of SDSE strains and whole genome sequences, we analyzed the synteny of the regions surrounding speG and smez in 13 GAS and 9 SDSE genomes. We also analyzed the genomic structures of all S. aureus strains for which whole genome data are available. We detected a previously undescribed gene that encodes a SEA-like protein (designated selW) and genes encoding SSL-like proteins, all of which are conserved in all S. aureus strains sequenced to date and are located in the core chromosome, not in any mobile elements. Using these findings, together with amino acid substitution analyses based on window analysis, cumulative TA-skew analysis, and evolutionary analysis according to the Bayesian Markov chain Monte Carlo method, which allows the evolutionary path of SAgs to be determined in chronological order, we were able to trace the origin and molecular evolution of streptococcal and staphylococcal SAgs.
To exclude the possibility that speG was acquired from a streptococcal phage, we compared the 50 kb sequences surrounding speG, a size sufficient to detect sequences derived from prophages. Synteny maps of the respective speG regions were essentially conserved in GAS strains and GGS_124 (Figure 1A), except for MGAS10750, which did not harbor the speG sequence present in the corresponding speG regions of GAS and GGS_124. We found that the speG region of each GAS genome contains two to ten genes, which encode factors similar to mobile elements and phage-related genes, such as transposase, IS and co-activator of prophage gene expression. In contrast, these mobile elements could not be detected in the corresponding speG region of GGS_124 (Figure 1A). The synteny of the regions surrounding the speG gene was highly conserved in eight GAS genomes (i.e., SF370, MGAS5005, MGAS2096, MGAS9429, MGAS10270, SSI-1, NZ131, and MGAS6180), each of which contains seven to eight transposase- and phage-related genes. These regions were 94% to 100% identical with each other. In the Manfredo genome, we found that IS1239, which is widely distributed in various isolates of GAS, had been inserted into the speG coding sequence, resulting in speG being a pseudogene in this strain (Figure 1A). In the MGAS8232 genome, IS1239 flanked speG.
Although a previous study suggested that speG transferred from SDSE to GAS, our results clearly indicate that the synteny surrounding speG in the GAS and GGS_124 genomes has been essentially conserved and that modifications of this context, by insertion of mobile elements, occurred only in GAS strains. These results strongly suggest that speG in GAS and SDSE is an orthologous, not a xenologous, gene, the latter defined as a gene displaced by horizontal transfer from another lineage. Moreover, speG in GAS and SDSE is a descendant of an ancestral streptococcal SAg and has been conserved in evolution.
Although the synteny of speG regions of GAS strains and GGS_124 has been highly conserved, about 50% of SDSE strains do not harbor speG [13–15]. We therefore selected nine SDSE isolates, five with (GGS_124, 163, 164, 168 and 170) and four without (RE378, SDSE_118, 160, and 165) speG (Additional file 1). Following direct genome sequencing (GGS_124 and RE378) or PCR amplification using speG-specific primers (Additional file 3), we compared the sequences of these nine strains. Each of these isolates harbored a different emm type (Additional file 1), widely used to type GAS and SDSE strains. We also included the full sequence of the ATCC 12394 genome [GenBank: CP002215], an SDSE that does not harbor speG.
When we analyzed the genetic structures surrounding speG (12 to 60 kb) in these SDSE isolates (Figure 1B), we found that, in general, these structures were highly conserved, especially in the 12 kb regions between pgi (pink) and perR (blue), but that speG itself and its corresponding regions were not. Outside these 12 kb regions, we found that most of these strains contained 1 or 2 coding sequences similar to transposase or IS elements, including several that appeared to be common to the sequenced GAS genomes.
Remarkably, all four speG-negative strains (i.e., RE378, SDSE_118, 160, and 165) showed the insertion of an approximately 20 kb fragment between the hypothetical protein gene (locus_tag: SDEG_1990) and the gene similar to peptidoglycan endo-beta-N-acetylglucosaminidase (locus_tag: SDEG_1992) present in the GGS_124 genome, replacing speG (locus_tag: SDEG_1991) at the exact same site. These 20 kb fragments were composed of 19 or 22 coding sequences, which were similar to genes derived from evolutionarily distant species such as Clostridium botulinum and C. tetani. However, the arrangements of these genes did not exactly match those of the clostridial genomes (data not shown), with most coding sequences sharing <60% similarity (e.g., Additional file 4). In contrast, genetic structures other than these 20 kb fragments were highly conserved among the speG-positive and -negative strains (Figure 1B). These findings indicated that synteny had been conserved in the regions surrounding speG, or the inserted 20 kb fragments, of these SDSE strains.
We also analyzed the genome context surrounding smez, another chromosomally encoded SAg, in GAS and SDSE strains. We found that the synteny of the approximately 20 kb regions containing smez (see details for MGAS10270 and NZ131, below) was highly conserved in all GAS strains sequenced (Figure 2). Of the 13 completely sequenced GAS strains, 11 harbored smez genes, primarily at approximately 1.7 Mb. Although smez was present at the same site in SSI-1 as in the other GAS genomes, the former was functionally inactive due to a frame-shift mutation. In contrast, the MGAS10270 and NZ131 genomes did not contain smez fragments, even at other locations, despite their corresponding surrounding genome structures being highly conserved when compared with the other GAS strains. All GAS genomes contained highly similar dpp operons (dppA, dppB, dppC, dppD, and dppE) immediately downstream of smez, and all contained flaR and trpG, located upstream of smez coding sequences (Additional file 5).
Analysis of the GGS_124, RE378, and ATCC 12394 strains revealed that none contained fragments similar to the 702 bp smez coding sequence derived from the SF370 genome. We therefore searched for flaR and the dpp operon, which were highly conserved in the smez flanking regions of GAS genomes (Figure 2). In these three SDSE genomes, flaR was located at about 0.2 Mb, whereas the dpp operon was located at about 0.9 Mb, far from the position of flaR (Figure 2 and Additional file 5). Furthermore, synteny of the regions surrounding flaR and the dpp operon was not well conserved in these three SDSE genomes, suggesting rearrangement of the genome context. The flaR gene and the dpp operon show high similarities in GAS and SDSE (Additional file 5), with concomitant sequences observed only in GAS and SDSE but not in other streptococci (data not shown).
We next plotted cumulative TA-skew diagrams of the three sequenced SDSE chromosomes (GGS_124, RE378, and ATCC 12394). Use of a similar method on 12 sequenced GAS genomes showed that all cumulative TA-skew curves of GAS genomes displayed a V-shape, interrupted by segments of steeper slopes, called steep-slope regions (SSRs). Diagram distortions including SSRs are thought to correspond to positions in which foreign genetic elements are integrated, including prophage-related genes, horizontally acquired elements, and pathogenicity islands, and in which genome rearrangements occur. The SSR was conserved among GAS strains, with smez at the border of the SSR, suggesting that this region is predisposed to be unstable.
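As an illustration of how a cumulative TA-skew curve of this kind can be computed, the following is a minimal sketch and not the GenSkew implementation. It assumes that the skew of each window is defined as (T − A)/(T + A) and that the curve is the running sum of the per-window values; the function name `cumulative_ta_skew` and the toy sequence are hypothetical.

```python
def cumulative_ta_skew(seq, window=1000, step=1000):
    """Return (window_start, cumulative_skew) pairs along a sequence.

    Assumption: per-window skew is (T - A) / (T + A); windows with no
    T or A contribute 0. A trailing partial window is ignored.
    """
    seq = seq.upper()
    running_total = 0.0
    points = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        t_count = chunk.count("T")
        a_count = chunk.count("A")
        total = t_count + a_count
        skew = (t_count - a_count) / total if total else 0.0
        running_total += skew
        points.append((start, running_total))
    return points


# Toy input: a T-rich window followed by an A-rich window.
toy_points = cumulative_ta_skew("TTTTAAAA", window=4, step=4)
```

On a real chromosome the default window and step of 1000 bp would be used, and a steep-slope segment would appear as a run of consecutive windows whose cumulative value changes faster than in the flanking regions.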
S. aureus also harbors many types of SAgs, such as TSST-1, SEs, and SSLs. We identified a relatively unknown staphylococcal SAg, selW, and an ssl gene cluster, both of which are conserved in all S. aureus genomes examined to date. Moreover, we found that each of these genes was located in the same chromosomal region of the S. aureus genomes, not within any mobile elements. The highly syntenic conservation of selW and the ssl gene cluster among S. aureus genomes and their similarity to SEs and SSLs, respectively, suggest that they are likely the direct descendants of common ancestral SEs and SSLs, respectively.
The physiological activities and three-dimensional structures of SAgs are quite similar in streptococci and staphylococci. Although many studies have focused on staphylococcal SAgs in mobile elements, little is known about staphylococcal SAg-related gene(s) located on the core chromosome. To analyze the relationship among SAgs, we employed the Bayesian MCMC method. Although the phylogenetic tree we obtained was similar to that reported previously, the method we used makes it possible to determine the temporal evolution of SAgs. Evolutionary analysis of the streptococcal and staphylococcal SAgs, and their related products, SSLs, showed that those molecules could be divided into three clades, each of which contains at least one direct descendant of an ancestor. Clades I and II consist of streptococcal and staphylococcal SAgs, respectively. In contrast, clade III is a mixture of streptococcal and staphylococcal SAgs, containing only SELW of S. aureus.
Streptococcal SAgs are among the important virulence factors involved in life-threatening diseases such as streptococcal toxic shock syndrome (STSS) and scarlet fever. At present, a total of 11 SAgs have been identified by GAS genome sequencing, with most GAS isolates possessing several SAg genes in their genomes. Although the diversity of SAgs is thought to arise during horizontal transfer, their evolutionary pathway has not been determined. To better understand SAg evolution, we sequenced the entire genome of SDSE, the closest relative of GAS, which harbors speG as its only SAg gene. Genome-wide comparisons of GAS and SDSE provided evidence that speG is the direct descendant of a common ancestor of the streptococcal SAgs. Furthermore, we detected previously undescribed inter-species horizontal SAg gene transfer events among three pathogens, S. pyogenes, S. dysgalactiae subsp. equisimilis and S. aureus. This study is the first to describe the origin and evolution of SAgs in pathogenic streptococci and staphylococci. These findings suggest that horizontal gene transfer is a more ubiquitous genetic exchange system than previously known, and that it sometimes crosses interspecies barriers.
All S. dysgalactiae subsp. equisimilis (SDSE) strains used in this study were isolated from patients with invasive infections in different hospitals throughout Japan (Additional file 1). Each SDSE isolate was cultured on 5% sheep blood agar or in Brain Heart Infusion medium at 37°C under 5% CO2 as described.
Streptococci were lysed as described, and genomic DNA was purified using Wizard® Genomic DNA Purification Kits (Promega).
PCR reactions were performed in volumes of 50 μl containing TaKaRa ExTaq DNA polymerase (TaKaRa), with amplification on a GeneAmp PCR System 9700 (Applied Biosystems). Primer sets for direct sequencing were based on GGS_124 and RE378 genome sequence data, with each set designed to amplify 5 kbp PCR products with 500 bp overlapping regions. The PCR primer set for the speG-specific region has been described (see also Additional file 3). PCR products were electrophoresed on 1.0% agarose gels and purified using QIAquick PCR Purification kits (QIAGEN). All DNA fragments were sequenced on an ABI3100 DNA sequencer with a redundancy of 4.
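The tiling scheme described for the direct-sequencing primers (5 kbp products with 500 bp overlaps) can be sketched as a simple coordinate calculation. This is a hypothetical illustration only: the function name `tile_amplicons`, the half-open coordinate convention, and the 12,000 bp example region are assumptions, and no actual primer design (melting temperature, specificity) is attempted.

```python
def tile_amplicons(region_start, region_end, product_len=5000, overlap=500):
    """Return half-open (start, end) coordinates of overlapping amplicons.

    Consecutive amplicons share `overlap` bases; the last amplicon is
    truncated at the end of the region.
    """
    if region_end <= region_start:
        raise ValueError("region_end must be greater than region_start")
    if not 0 <= overlap < product_len:
        raise ValueError("overlap must be non-negative and shorter than the product")
    step = product_len - overlap
    tiles = []
    start = region_start
    while True:
        end = min(start + product_len, region_end)
        tiles.append((start, end))
        if end >= region_end:
            break
        start += step
    return tiles


# Example: cover a hypothetical 12,000 bp region.
example_tiles = tile_amplicons(0, 12000)
```

With these defaults each new amplicon starts 4500 bp after the previous one, so every junction is covered by two independent products.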
All isolates were grown overnight in 10 ml of Brain Heart Infusion medium at 37°C under 5% CO2 in 15 ml conical tubes. The cells were harvested, total RNA was purified using RNeasy Mini Kits (QIAGEN), and RNA concentrations were measured using a NanoDrop™ 1000 spectrophotometer (Thermo Scientific).
Total RNA was reverse transcribed into cDNA using Superscript III reverse transcriptase kits (Invitrogen) and oligo dT primers. PCR amplifications were performed using the primer sequences in Additional file 3. The PCR products were electrophoresed on 1.0% agarose gels and detected by UV-fluorescence after ethidium bromide staining.
Homology searches and IS searches were performed using BLAST (http://blast.ncbi.nlm.nih.gov/Blast.cgi) and IS finder (http://www-is.biotoul.fr/is.html), respectively. Cumulative TA skew analysis was performed using GenSkew (http://genskew.csb.univie.ac.at/). Both the window size and the step size were set at 1000 bp. Rates of evolution were estimated by a Window Analysis of dN and dS, using the online interface of WINA 0.34, in a sliding window size of 60 bp (20 codons) at 3 bp intervals. A phylogenetic tree was constructed with CLUSTALW (http://clustalw.ddbj.nig.ac.jp/top-j.html), MrBayes 3.1.2 (http://mrbayes.csit.fsu.edu/index.php) and TreeView X (http://darwin.zoology.gla.ac.uk/~rpage/treeviewx/index.html) software. Sequences were manually corrected using GENETYX-Mac (GENETYX Co.) and gene analysis was performed by in silico molecular cloning (in silico Biology Co.).
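The sliding-window geometry used for the dN/dS analysis (60 bp, or 20 codons, moved in 3 bp steps) can be sketched as follows. This only enumerates the window coordinates; it does not estimate dN or dS and is not the WINA implementation. The function name `codon_windows` is hypothetical, and the 702 bp length (the smez coding sequence length quoted above) is used purely as an example input.

```python
def codon_windows(cds_length, window=60, step=3):
    """Return half-open (start, end) nucleotide coordinates of codon-aligned windows.

    All three arguments must be multiples of 3 so that every window
    starts and ends on a codon boundary.
    """
    if cds_length % 3 or window % 3 or step % 3:
        raise ValueError("lengths must be multiples of 3 (whole codons)")
    if step <= 0 or window <= 0:
        raise ValueError("window and step must be positive")
    return [(start, start + window)
            for start in range(0, cds_length - window + 1, step)]


# Example: windows over a 702 bp coding sequence.
windows = codon_windows(702)
```

Because the step is a single codon, adjacent windows share 19 of their 20 codons, which smooths the profile at the cost of strongly correlated neighboring estimates.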
The DNA sequences of the region surrounding speG (12–60 kb) in each strain have been deposited in the DDBJ under the accession numbers listed in Additional file 1. Accession numbers for SAgs used in this study are listed in Additional file 9.
The authors thank Dr. Omoe at Iwate University for valuable discussions about the evolution of SEs. This work was partly supported by a grant for Research on Emerging and Reemerging Infectious Diseases (H22 Shinkouh-013). T. M. A. was supported by JSPS KAKENHI Grant Numbers 21590503 and 24390109.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.