Genetic variability of mutans streptococci revealed by wide whole-genome sequencing

Background Mutans streptococci are a group of bacteria significantly contributing to tooth decay. Their genetic variability is however still not well understood. Results Genomes of 6 clinical S. mutans isolates of different origins, one isolate of S. sobrinus (DSM 20742) and one isolate of S. ratti (DSM 20564) were sequenced and comparatively analyzed. Genome alignment revealed a mosaic-like structure of genome arrangement. Genes related to pathogenicity are found to have high variations among the strains, whereas genes for oxidative stress resistance are well conserved, indicating the importance of this trait in the dental biofilm community. Analysis of genome-scale metabolic networks revealed significant differences in 42 pathways. A striking dissimilarity is the unique presence of two lactate oxidases in S. sobrinus DSM 20742, probably indicating an unusual capability of this strain in producing H2O2 and expanding its ecological niche. In addition, lactate oxidases may form with other enzymes a novel energetic pathway in S. sobrinus DSM 20742 that can remedy its deficiency in citrate utilization pathway. Using 67 S. mutans genomes currently available including the strains sequenced in this study, we estimates the theoretical core genome size of S. mutans, and performed modeling of S. mutans pan-genome by applying different fitting models. An “open” pan-genome was inferred. Conclusions The comparative genome analyses revealed diversities in the mutans streptococci group, especially with respect to the virulence related genes and metabolic pathways. The results are helpful for better understanding the evolution and adaptive mechanisms of these oral pathogen microorganisms and for combating them.


Background
Traditionally and supported by 16S rRNA gene and rnpB gene sequence analyses, the genus Streptococcus is divided into several groups, with the mutans group streptococci consisting of the species S. mutans, S. sobrinus, S. ratti, S. criceti, S. downei, S. macacae, andbut controversially discussed -S. ferus (for update refer to www.bacterio.net/s/streptococcus.html) [1]. Mutans streptococci are considered significant contributors to the development of dental caries [2]. By attaching to the tooth surfaces and forming biofilms, they can tolerate and adapt to the harsh and rapidly changing physiological conditions of the oral cavity such as extreme acidity, fluctuation of nutrients, reactive oxygen species, and other environmental stresses [3]. They occasionally also cause bacteremia, abscesses, and infective endocarditis [4,5]. Many strains of mutans streptococci are genetically competent, i.e. they can take up DNA fragments from the environment and recombine them into their chromosome, an important mechanism for horizontal gene transfer (HGT). The ability of some bacteria to generate diversity through HGT provides a selective advantage to these microbes in their adaptation to host eco-niches and evasion of immune responses [6,7]. Due to diversities in the genetic contents between different isolates, the genome content of a single isolate does not necessarily represent the genomic potential of a certain species. With the rapid development of DNA sequencing technologies, the steadily increasing genome data enable us to dig the evolutionary and genetic information of a species from a pan-genome perspective. In 2002, the release of the genome sequence of S. mutans UA159, the first genome sequence of mutans group streptococci, has greatly helped in understanding the robustness and complexity of S. mutans as an oral and odontogenic (e.g. infective endocarditis and abscesses) pathogen [8]. Later, after the genome sequence of S. mutans NN2025 became available, a comparative genomic analysis of S. mutans NN2025 and UA159 provides insights into chromosomal shuffling and species-specific contents [9]. Recently, Cornejo et al. have studied the evolutionary and population genomics of S. mutans based on 57 S. mutans draft genomes and revealed a high lateral gene transfer (LGT) rate of S. mutans [10].
In this study, we determined the whole genome sequences of six S. mutans strains (5DC8, KK21, KK23, AC4446, ATCC 25175 and NCTC 11060), one S. ratti strain (DSM 20564) and one S. sobrinus strain (DSM 20742) and performed cross-comparison with the genome contents of S. mutans UA159 and NN2025, focusing on issues that are highly related to pathogenicity. The coreand pan-genome of S. mutans was analyzed by including 67 currently available S. mutans genome sequences. By constructing and comparing the genome-scale metabolic networks, the diversities in sub-networks (pathways) are systematically revealed. The results should be helpful for understanding the evolution and pathogenicity, as well as for prevention and treatment, of these very common opportunistic pathogens.

Results and discussion
Genome sequencing, assembly and annotation of eight mutans streptococci strains As expected, the overall genomic features of the eight S. mutans strains are more close to each other than to S. ratti and S. sobrinus. This is consistent with the results of the phylogenetic analysis, as visualized by the phylogenetic tree constructed based on 16S rRNA and core genes single-nucleotide polymorphisms (SNPs) information shown in Figure 1. An overview of the genome assemblies and annotations of the 6 S. mutans isolates as well as S. ratti DSM 20564 and S. sobrinus DSM 20742 is summarized in Table 1 in comparison with two previously sequenced S. mutans strains, namely UA159 and S. mutans NN2025. The average GC contents are in the range of low GC organisms [8]. The genome sizes are very close to each other, with the largest one from S. sobrinus DSM 20742 and the smallest one from S. mutans KK23 showing merely 5.7% differences. The total numbers of protein-coding sequences per genome are also similar among all the strains compared.

Chromosomal rearrangement of the S. mutans strains
Genome rearrangements have important effects on bacterial phenotypes and the evolution of bacterial genomes [12]. A comparison of the genomes of S. mutans NN2025 and UA159 discovered a large genomic inversion across the replication axis and similar genomic variations were also confirmed among 95 clinical S. mutans isolates using long-PCR analysis [9]. In this study, genome rearrangements among the eight S. mutans strains are determined by genome alignment using the MAUVE software [13]. The results are presented in Figure 2, which shows the locally collinear blocks (LCBs) representing the landmarks, i.e. the homologous/conserved regions shared by all the input sequences in the chromosomes. A LCB is defined as a collinear (consistent) set of exactly matched subsequences (multiple maximal unique matches, namely 'multi-MUMs') which are shared by all the chromosomes considered, appear once in each chromosome and are bordered on both sides by mismatched nucleotides. The weight (the sum of the lengths of the included multi-MUMs) of a LCB serves as a measure of confidence that it is a true homologous region rather than a random match.
As shown in Figure 2, 16 LCBs (marked as A to P) are identified by multi-alignment of the eight S. mutans genome sequences. Compared to UA159 and NN2025, the chromosome segment represented by LCB E is reversely inserted between the LCB G and H in the strain AC4446, and between the LCB L and M in the strain KK23. This segment is related to the genomic island "SMU.100-SMU.116" of S. mutans UA159 which mainly contains sorbitol phosphotransferase system (PTS), transposase and hypothetical proteins [15]. LCB N is found to be reversed and relocated to the position between LCB A and B in the strain AC4446. A cluster of tRNA genes is found to be located downstream of LCB N. In KK23, LCB I and J are moved to position between LCB F and G. A tRNA-Gln and a tRNA-Tyr is found to be adjacent to the left of LCB I. LCB K in NCTC 11060, AC4446, KK23 and NN2025 are very similar to each other but much smaller than those of other strains (with sequence length reduced about two-thirds). The missing sequence corresponds to the genomic island TnSmu2 of S. mutans which harbors a nonribosomal peptide synthetase-polyketide synthase gene cluster responsible for the biosynthesis of pigments [16]. Using the known information about genomic islands in S. mutans UA159, additional genomic islands were found to be present/absent in the mutans streptococci strains of this study [15,17] (see Additional file 1). Furthermore, there are much more diversities as shown by the white areas inside the LCBs which show regions with low similarities. However, it should be noticed that there might be more genome rearrangements among the strains, because draft genome sequences are used in current analysis and all contigs in each genome are sorted according to the reference genome sequence of the strain UA159.
Core-and pan-genome analysis of S. mutans The genetic variability within species in the domain Bacteria is much larger than that found in other domains of life. The gene content between pairs of isolates can diverge by as much as 30% in species like Streptococcus Figure 1 Phylogenetic analysis of the 10 mutans streptococci strains compared in this study and their phylogenetic relationship to other Streptococcus species with genomes known before 01/01/2011. a) 16S rRNA phylogenetic tree of Streptococcus species with genomes known before 01/01/2011 (Since the 16S rRNA sequences were almost identical between the different S. mutans strains, only UA159 is shown as representative). b) Phylogenetic tree of mutans streptococci constructed with the core-genome SNPs obtained by PGAP pipeline [11]. All phylogenetic trees were constructed using ClustalX and Phylip by applying the maximum likelihood (ML) method with bootstrap value set to 100. Values beside branches depict ML bootstrap support values. The scale bars in the unit of "substitution/site" are shown below the trees.
pneumoniae [18]. This unexpected finding led to the introduction of the pan-genome concept, which describes the sum of genes that can be found in a given bacterial species [19,20]. The genome of any isolate is thus composed of a "core-genome" shared with all strains of this particular species, and a "dispensable genome" that accounts for the phenotypic differences between strains. The pan-genome is usually much larger than the genome of any single isolate, constituting a reservoir that could enhance the ability of many bacteria to survive in stressful environments. The pan-genome concept has important consequences for the way we understand bacterial evolution, adaptation, and population structure, as well as for more applied issues such as vaccine design or the identification of virulence genes [21]. In this study, we performed core-genome and pan-genome analyses of 67 S. mutans strains, including the eight S. mutans strains sequenced in this study and 59 S. mutans strains with genome available in NCBI till April 2013.

Core-genome
The core-genome size of the 67 S. mutans strains was calculated to be 1,373. Detailed information of the core genes are provided in the Additional file 2. To estimate the theoretical core-genome size achievable with an infinite number of S. mutans genomes, core-genome size medians Figure 2 Comparison of local collinear blocks (LCBs) of chromosomal sequences of the eight S. mutans strains. In total 16 local LCBs, marked as A to P, were generated and compared by applying the MAUVE software [13,14] with default settings and using strain UA159 as reference. The red vertical bars indicate contig ends. The white areas inside each LCB show regions with low similarities. corresponding to different genome numbers as shown in Figure 3a by the red rectangles were first calculated by random sampling 1,000 genome combinations of n genomes out of the 67 S. mutans genomes. Then, we applied the exponential regression core-genome model Tettelin et al. [19,20] to fit the median data points of the core-genome sizes, where K c , τ c and Ω are parameters, n represents the number of genomes, and Ω stands for the theoretical core-genome size. To take into consideration the different deviations of the core-genome size medians, as clearly indicated by the blue error bars in Figure 3a, we modified the fitting process by introducing the genome number as weight to the corresponding data point. The fitting parameters thus obtained are as follows: r 2 = 0.97403, K c = 325.74718 ± 10.00912,Ω = 1,369.41225 ± 1.986,τ c = 15.90248 ± 0.66807 (Detailed information of all core and pan-genome modeling are given in Additional file 3). Using this fitting result to describe the core-genome of S. mutans, the theoretical core-genome size (Ω) was estimated to be around 1,370 genes, which is slightly lower than the calculated core-genome size (1,373) using 67 genomes. Compared with other streptococcus species, the core-genome of S. mutans is at the same level to the core-genome of S. pyogenes (1,400 genes determined using 11 strains), less than that of S. pneumoniae (1,647 genes determined using 47 strains) and S. agalactiae (1,800 genes determined using eight strains) [19,22,23]. However, we should be cautious with such comparison. In a recent study of Cornejo et al. [10], the core genome size of S. mutans was determined as 1,490 by using 57 S. mutans genomes which is obviously different to the core genome size of S. mutans we estimated, although we included the 57 S. mutans genomes used by Cornejo et al. in our study. The difference can be caused by different reasons, such as difference in the correction step for core gene determination and, very likely, different methods and parameter settings used for determining orthologs. Apparently, we have used a more stringent process to determine orthologs which led to smaller core genome size of S. mutans estimated.

Pan-genome
We applied three models, namely y = a + bx c , y = a − bln (x + c) and y = a × e − x/b + c (where a, b and c are parameters) for modeling the pan-genome of S. mutans, as shown in Figure 3b by green, blue and red curves respectively (all fitting results are detailed in Additional file 3). Both the fitting results of using y = a + bx c and y = a − bln(x + c) indicated an infinite pan-genome, while the fitting result of using y = a × e − x/b + c resulted in a negative value of the parameter a, suggesting a finite pan-genome However, the last fitting shows obvious deviations to many of the data points. Especially, the deviations even become larger with increased genome numbers, indicating that this model is not suitable. The best fitting result obtained with the model y = a + bx c shows fittings to all the data points with very high confidence. Based on this model, the pan-genome of S. mutans is still "open" while 67 genomes were included, for S. agalactiae based on the use of 9 S. agalactiae genomes. The three regression models used in this study are all based on the assumption that contingency genes are independently sampled from the pan-genome with equal probability, except in the case of "specific/unique genes", which are modeled as unique events that appear only once in the entire global population. Hogg et al. [24] proposed a finite supragenome model for pan-genome based on a different supposition that contingency genes are sampled from the pan-genome with unequal probability. By applying this finite supragenome model to 44 S. pneumoniae genomes, the predicted number of new genes drops sharply to zero when the number of genomes exceeds 50. However, in the case of S. mutans we could not observe such sharp decrease of new gene number even after 67 genomes were included. In the study of Cornejo et al. [10], they proposed a finite pan-genome for S. mutans, after they used a special "pseudogene cluster" identification process to exclude about 30% of the rare genes that are considered to be pseudogenes. However, they didn't provide detailed parameters they obtained from fitting. Our modeling using the 67 S. mutans genomes by applying the model described above without any restrictions pointed to an infinite pan-genome of S. mutans. However, we would like to understand this predicted "infinite" pan-genome as follows: 1) a "pan-genome" should be considered as "dynamic" rather than "static", which means the pan-genome content is changing during the evolution, it does not matter if its size is infinite or finite; 2) The change of a pan-genome content can be caused by the acquirement of new genes or by the loss of genes; 3) The actual pan-genome size can be more stable than the content of the pan-genome but can also change during evolution coupled with the change of the environment. Thus, without considering the "gene loss events", it's quite understandable to have a "growing" or "infinite" pan-genome as gene acquirement occurs no matter how slow it might be. Interestingly, Cornejo et al. found a high rate of LGT in S. mutans, where many genes were acquired from related streptococci and bacterial strains predominantly residing not only in the oral cavity, but also in the respiratory tract, the digestive tract, cattle, genitalia, in insect pathogens and in the environment in general [10]. Such high rate of LGT might also lead to a continuously growing pan-genome.
Gene content-based comparative analysis of 10 mutans streptococci strains The annotated protein sequences of all the genomes studied were cross-compared based on alleles/ortholog groups established by the program OrthoMCL [25]. In total, 2,211 putative alleles/ortholog groups are established, as documented in Additional file 4. A pair-wise comparison of the protein coding sequences between each two strains is shown in Table 2. It is clear to see that remarkable differences in protein coding sequences exist between the strains, even inside the same species of S. mutans. In the following sections, systems that are highly related to stress resistance and pathogenicity are presented and discussed. As all the following results are based on putative alleles/ortholog groups established by OrthoMCL, if not otherwise specified, the word "putative allele(s)/ ortholog(s)" is omitted in the following text.
High diversities of the competence development regulation module In a previous study we have systematically discussed the two-component signal transduction systems (TCSTSs) in the 10 mutans streptococci strains [26]. ComDE, one of the TCSTS is directly related to competence development. Competence development is a complex process involving sophisticated regulatory networks that trigger the capacity of bacterial cells to take up exogenous DNA from the environment. This phenomenon is frequently encountered in bacteria of the oral cavity, e.g., S. mutans [27]. In S. mutans, ComX, an alternative sigma factor, drives the transcription of the so called 'late-competence genes' required for genetic transformation. ComX activity is modulated by the inputs from two types of signal pathways, namely the competence-stimulating peptide (CSP) dependent competence regulation system and CSP-independent competence regulation system. ComX and the 'late-competence genes' regulated by ComX as labeled by boldface in Table 3, are highly conserved even between the species, indicating that all mutans streptococci studied here might have the potential ability of transforming to genetic competence state. On the other hand, the upstream signal pathways regulating the activity of ComX show high variety as discussed in details below.

CSP-dependent competence regulation system
It has been reported that the ComABCDE system in S. mutans combines the action of the two ortholog systems which are present as ComABCDE and BlpABCRH in S. pneumoniae and involved in competence regulation and bacteriocins regulation, respectively. It should be noticed that, ComAB have been primarily considered to be the transporter of ComC, the precursor of CSP. Later, ComAB have been renamed as NlmTE as they were found to function together as transporter of nonlantibiotic bacteriocins, while another gene pair CslAB was supposed to be the transporter of ComC [28]. However, a recent study confirms that ComAB is indeed a transporter both for nonlantibiotic bacteriocin and the peptide pheromone CSP [29].
In S. mutans, the comC-encoded prepeptide of CSP has a leader sequence containing a conserved double glycine (GG), at which the leader sequence is cleaved during transporting by ComAB to generate the mature signal peptide (CSP-21) containing 21 amino acid residues [28,30,31]. Recent studies show that an extracellular protease, SepM (SMU.518), is involved in the further processing of CSP-21 by removing the "LGK" residues in the C-terminal to generate a 18-residue peptide (CSP-18), which can work at a concentration much lower than that of CSP-21 [29,32]. SepM is identified in all the 10 strains compared in this study, although putative comC alleles are present only in the eight S. mutans strains, not in the S. sobrinus DSM 20742 and S. ratti DSM 20564. Multi-alignment of the ComC sequences shows clear variations among different S. mutans strains ( Figure 4a). Genetic variation of ComC in S. mutans has been reported previously [33]. Interestingly, the C-terminal amino acid sequence "LGK" of ComC is absent in the ComC prepetides of S. mutans KK23 and AC4446, which have also been observed previously in other S. mutans strains by Allan et al. [33]. ATCC 25175 possesses a unique ComC sequence ended with "LGKIR" at its C-terminal. In addition to the variations at the carboxyl end, substitutions of single amino acid residues at different positions are also found.
We have verified all the variants of comC revealed in this study by PCR experiments. Although Allan et al. pointed out that different comC alleles in some clinical strains of S. mutans exist but their products are functionally equivalent and there is no evidence of phenotype specificity [33], considering the complexity of phenotype evaluation, whether and how the variations found in this study may affect the natural genetic competence of these S. mutans strains requires further investigation.
The CSP-initiated activation of the response regulator ComE, through its cognate receptor kinase ComD, leads to the induction of competence through the alternative sigma factor ComX, and at the same time ComE directly induces a set of bacteriocin-related genes [28,30,[34][35][36][37][38]. In our previous study focused on the comparison of the twocomponent signal transduction systems of these mutans streptococci strains, we have reported the complete missing of ComDE in S. ratti DSM 20564 and the low similarities of putative ComDE in S. sobrinus DSM 20742 to the ComDE of S. mutans strains [26]. Accordingly, no comC-like genes could be identified in S. ratti DSM 20564 and S. sobrinus DSM 20742. Thus, it can be inferred that S. ratti DSM 20564 and S. sobrinus DSM 20742 are totally different to the S. mutans strains regarding cellular functions including genetic competence associated with the ComABCDE system.
In S. mutans, no binding motif for ComE is present in the promoter region of ComX, suggesting that ComE is not a direct regulator of ComX, whereas a new peptide regulator system (ComSR) downstream of ComE that directly activates ComX has been identified by Mashburn-Warren et al. ComR activates the expression   In the case that more than one paralogs were found, the most similar ortholog is marked in italic. The rows related to highly conserved 'late-competence genes' were shown in boldface. The missing of ComDE in S. ratti DSM 20564 has been published in our previous study [26]. of the ComS, which is secreted, processed, and internalized through the peptide transporter OppD. The processed peptide, designated XIP (for sigma X-inducing peptide), modulates the activity of ComR, which in turn activates the expression of ComX. Deletion of comR or comS gene completely abolished the competence in S. mutans [39]. In this study, the ComSR regulating system is identified in most of the strains, except for S. sobrinus DSM 20742 which lacks the ComSR-coding genes. This well explains the fact that despite the presence of comX and the 'latecompetence genes' we were not able to obtain the genetic competence state of S. sobrinus DSM 20742 (see discussion later in the "Variability and specificity in metabolic pathways and network" part). It is also worth to mention that the putative ComS ortholog found in S. ratti DSM 20564 is quite different to those of S. mutans strains, as shown in Figure 4b.

CSP-independent competence regulation system
It has been reported that a basal level of competence remains (referred as CSP-independent competence) after the deletion of comE from S. mutans, suggesting that the CSP-dependent regulation system is one of the several signaling pathways involved in ComX activation [34].
Indeed, under conditions of biofilm growth the HdrMR system, a novel two-gene regulatory system, has been shown to contribute to competence development through the activation of ComX by a yet unknown signal [40]. Moreover, microarray analysis revealed that both regulators, ComE and HdrR, activate a large set of overlapping genes [40,41]. Recently, Xie et al. identified in S. mutans another regulatory system, designated BsrRM, that primarily regulates bacteriocin-related genes but also affects the HdrMR system and thus indirectly contributes to competence development [42]. In this study, HdrR, the response regulator of the HdrMR system, is found neither present in S. ratti DSM 20564 nor in S. sobrinus DSM 20742. Furthermore, the response regulator BrsR of the BsrRM system is also absent in S. ratti DSM 20564, whereas S. sobrinus DSM 20742 lacks the complete BsrRM system. It's also worth to mention that a competence damageinducible protein CinA, which is regulated via ComX and has been proven to be related to DNA damage, genetic transformation and cell survival [43], is present in all strains. Taking together, both the CSP-dependent and CSPindependent competence regulation systems in S. ratti DSM 20564 and especially in S. sobrinus DSM 20742 are very different to those of the S. mutans strains.

Bacteriocin-related proteins
Bacteriocins are proteinaceous toxins produced by bacteria to kill or inhibit the growth of similar or closely related bacterial strain(s). Bacteriocins produced by mutans streptococci are named "mutacins". As dental plaque, the dominating niche of mutans streptococci, is a multispecies biofilm community that harbors many microorganism species, mutans group strains have developed a variety of mutacins to inhibit the growth of competitors, such as mitis group streptococci [44][45][46]. In this study, information about known mutacins as well as mutacin-immunity proteins was collected from the NCBI (http://www.ncbi.nlm.nih.gov) and Oralgen (http://www.oralgen.lanl.gov/) databases, as well as by searching for related publications. The collected protein sequences, as listed in Additional file 5, were used to blast against the proteomes of the 10 strains to see whether or not these known mutacins and mutacin-immunity proteins do exist in the mutans streptococci strains of this study. Distributions of identified mutacins and mutacin-immunity proteins are summarized in Table 4. Using this approach it is, however, not possible to identify any new types of mutacins.
Diversity of Streptococcus bacteriocins has been reported previously [57,58]. The mutacin assortments among the 10 strains in this study also demonstrate certain variations. An interesting result is that in contrast to S. mutans strains and S. ratti DSM 20564, S. sobrinus DSM 20742 does not possess any genes coding for mutacin-like proteins. Mutacin-SMB has been identified in S. mutans and S. ratti previously [47,48]. In our study, mutacin-SMB cluster was only identified in S. ratti DSM 20564 comprising 7 genes, including the mutacin-coding genes smbA and smbB, as well as 5 mutacin-related genes (smbG-> D822_07603, smbT-> D822_07593, smbM-> D822_07578, smbF-> D822_07588, and smbM2-> D822_07598). Lantibiotic mutacins, mutacin-I [49], mutacin-II [51] and mutacin-III [52], are completely absent in the 10 mutans streptococci strains. However, three gene copies possibly encoding the precursor of the lantibiotic mutancin mutacin-K8 are identified in the S. mutans strains KK23 and NN2025. Mutacin-K8 is an ortholog of the bacteriocin Streptococcin A-FF22 identified in group-A streptococci [59], and its production system has previously also been identified in the S. mutans strain K8 [53]. By carefully examining the genes surrounding mutacin-K8 precursor genes the gene cluster coding for a complete mutacin-K8 production system is also revealed in the strains KK23 and NN2025 ( Figure 5). A partial ortholog of the mutacin-K8 production system is found in S. mutans UA159, 5DC8 and KK21, with only genes responsible for the immunity (scnFEG) left behind. Orthologous genes coding for a part of the mutacin-K8 production system are also found in S. mutans AC4446, consisting of only scnFEG, scnT (coding a lantibiotic exporter) and a part of scnM (coding the lantibiotic synthetase). Since a gene encoding ISSmu2-type transposase is found to be located upstream of mutacin-K8 precursor genes, we infer that the variety of mutacin-K8 production system in S. mutans strains studied here is highly possible to be caused by transposase actions.
Mutacin-IV, nonlantibiotic bacteriocins coded by nlmA/ B (SMU.150/151, Note: hereinafter whenever needed/ possible the locus_tag of the reference strain S. mutans UA159 is given for convenience) was discovered first in S. mutans UA140 to be active against the mitis group streptococci [54]. In this study, nlmA/B are found to be present in six of the S. mutans strains, including UA159, 5DC8, KK21, KK23, ATCC 25175 and NCTC 11060, but not in S. mutans NN2025 and AC4446, nor in S. ratti DSM 20564 and S. sobrinus DSM 20742. On the other hand, the immunity protein for mutacin-IV (SMU.152), is identified in all strains, consistent with the fact that no inhibition phenomenon has been observed yet among different mutans streptococci strains. A mutacin-IV like protein found before in the strain UA159 (SMU.283) is identified in all strains except for S. sobrinus DSM 20742.
Mutacin-V, another nonlantibiotic peptide coded by cipB (SMU.1914) is found, in addition to S. sobrinus DSM 20742, also absent in the S. mutans strains ATCC 15175 and NCTC 11060. There are two homologs of mutacin-V immunity protein in S. mutans UA159, namely CipI (SMU.925) and SMU.1913 [8,60]. These two immunity proteins share a sequence identity of 82%. However, it has been reported that though very likely co-transcribed with cipB, SMU.1913 cannot prevent CipB-caused cell lysis in S. mutans UA159, and the key immunity factor of mutacin-V has been supposed to be CipI (SMU.925) rather than SMU.1913 [60]. All the 10 strains including S. sobrinus DSM 20742 possess at least one orthologous gene encoding one of the two mutacin-V immunity proteins. Based on the similarity scores S. mutans NN2025 does not have an ortholog of CipI (SMU.925), but it possesses an ortholog (GI|290579764) of SMU.1913, which is possibly co-transcribed with GI|290579764, the cipB ortholog in S. mutans NN2025. Furthermore, the only putative immunity protein D822_3349 in S. ratti DSM 20564 shows very close similarities to SMU.925 (61%) and SMU.1913 (56%) and is possibly co-transcribed with D822_03354, the CipB ortholog in S. ratti DSM 20564. From these results, we suppose that SMU.1913, which is co-transcribed with cipB (SMU.1914), might be the ancestor gene coding for the mutacin-V immunity factor. The additional copy, like SMU.925 in S. mutans UA159, might be generated by duplication action and evolved as the dominant immunity factor in some of the mutans streptococci strains.
Furthermore, a possible nonlantibiotic bacteriocin peptide (SMU.423) is found to be present in all strains except for S. ratti DSM 20564. Putative ComAB, which has been proved to be the transporter complex of mutacin IV in S. mutans [28], are identified in all strains, supporting the suggestion that ComAB might function as a common transporter for multi-type nonlantibiotic bacteriocins rather than merely for mutacin IV. Moreover, an additional paralog of ComA is present in most of the strains except for S. mutans KK23 and S. mutans ATCC 25175.
To summarize, a differed distribution of mutacin/bacteriocin encoding genes accompanied with a high conservation of genes coding for mutacin-immunity proteins are revealed for the 10 mutans streptococci strains/species. The conservation of mutacin immunity proteins apparently plays an important role for the survival of mutans streptococci strains under a bacteriocin-rich environment.

Antibiotic resistance-related proteins
Bacteria and other microorganisms that cause infections are remarkably resilient and can develop ways to survive drugs meant to kill or weaken them. Antibiotic resistance can be a result of horizontal gene transfer [61], and also of unlinked point mutations in the pathogen genome at a rate of about 1 in 10 8 per chromosomal replication [62]. The antibiotic action against the pathogen can be seen as an environmental selective pressure and bacteria which have developed mutations allowing them to survive will live on to reproduce. They will then pass this trait to their offsprings, which will result in the evolution of fully resistant colonies. Putative resistance-related genes are identified and listed in Table 5. The S. mutans species is known to be intrinsically resistant to bacitracins produced by Bacillus subtilis. We confirmed this by testing all the 10 strains with a bacitracin-E-test (data not shown). All strains including S. ratti DSM 20564 and S. sobrinus DSM 20742 had a minimum inhibitory concentration between 128 and >256 μg/l. In fact, this antibiotic is used to isolate mutansstreptococci from highly heterogeneous oral microflora. It has been reported that bceABRS (also named as mbrABCD) system, encoding a two component signal transduction system and an ABC-transporter, is required for bacitracin resistance in S. mutans [63,64]. As expected, ortholog of bceABRS system is found to be present in all strains. Furthermore, an ortholog of a putative bacitracin resistance protein UppP (SMU.244, undecaprenyl-diphosphatase) is present in all strains. It has been proved that overexpression of UppP in Escherichia coli and Bacillus subtilis results in bacitracin resistance [65,66]. However, the function of UppP in bacitracin resistance in mutans streptococci has not yet been investigated. Based on its conservation in all strains studied here, we suppose that UppP might play an important role in bacitracin resistance for mutans streptococci species as well.
Two penicillin-binding proteins (SMU.75 and SMU.455) are identified in all strains, indicating that they are potentially all susceptible to penicillin. Phenotypically all strains were tested to be susceptible to penicillin (data not shown). On the other hand, all the strains possess orthologs of SMU.368c, SMU.400, SMU.1444c and SMU.1515, which are homologs to beta-lactamases (EC 3.5.2.6), as well as orthologs of two so called beta-lactam resistance factors (SMU.716, SMU.717). Thus, all the strains are potentially capable of resistance against beta-lactam antibiotics. Orthologs of macrolide-efflux transporter proteins, as coded by GI|290581182 and GI|290581181 in S. mutans NN2025, are found to be also present in S. mutans 5DC8 and S. mutans KK21. A vancomycin b-type resistance-associated protein (D822_01634) is uniquely present in S. ratti DSM 20564, but our phenotypic testing showed as expected that S. ratti DSM 20564 is susceptible to vancomycin. Furthermore, several putative multidrug resistance-associated proteins (SMU.745, SMU.1611c and SMU.905 except for SMU.1286c) are found to be present in all strains.

Oxidative stress defense systems in mutans streptococci
For protection against reactive oxygen species (such as O 2 -, H 2 O 2 , HO·) or adaptation to oxidative stresses aerobes and facultative anaerobes have evolved efficient defense systems, comprising an array of antioxidant enzymes such as catalase, superoxide dismutase (SOD), Dps-like peroxide resistance protein, alkylhydroperoxide reductase (AhpCF), glutathione reductase, and thiol reductase, which have been identified in many bacterial species.
Although the first genome sequence of S. mutans UA159 has already been published in 2002, the oxidative stress defense systems in the group of mutans streptococci have not yet been systematically discussed. By searching for known antioxidant systems in the genomes of the sequenced mutans streptococci strains of this study, we Figure 5 Cluster structure of the mutacin-K8 production system across six S. mutans strains. The ORFs colored in yellow are the possible mutacin-K8 precursor genes. scnGEF: bacteriocin related ABC element; possible immunity system; scnK: histidine kinase of two component system; scnR: response regulator of two component system (Note: mutacin-K8 production system was failed to be identified in S. mutans NCTC 11060, S. mutans ATCC 25175, S. ratti DSM 20564 and S. sobrinus DSM 20742). obtained an overview of putative oxidative defense systems in these mutans streptococci strains/species which are composed of superoxide dismutase (SOD), AhpF/AhpC system, Dpr, thioredoxin system and glutaredoxin system, as shown in Table 6. SOD, which catalyzes the dismutation of superoxide into oxygen and hydrogen peroxide, is an important antioxidant defense in nearly all cells exposed to oxygen [67]. SOD is found in all strains of this study. Catalase, which catalyzes the decomposition of hydrogen peroxide, is not found in any of the mutans streptococci strains of this study. It is known that although most streptococci can grow in the presence of air, they do not possess a catalase, implying that hydrogen peroxide defense mechanism, by which lactic acid bacteria established their growth in air, are very different to those of aerobes. It has been reported that both the bi-component peroxidase system AhpF/AhpC and Dps-like peroxide resistance protein confer tolerance to oxidative stress in S. mutans [68].
The AhpF/AhpC system catalyzes the NADH-dependent reduction of organic hydroperoxides and/or H 2 O 2 to their respective alcohol and/or H 2 O. Both AhpF and AhpC are present in all S. mutans strains of this study and in S. ratti DSM 20564, but are absent in S. sobrinus DSM 20742. The natural missing of AhpF and AhpC in S. sobrinus indicates that AhpF/AhpC system is not an essential peroxide tolerance system for some mutans streptococci species. While studying a ahpF and ahpC double deletion mutant of S. mutans, Higuchi et al. [69] found that the mutant still showed the same level of peroxide tolerance as did the wild-type strain that led them to the finding of the dpr gene, which encodes a ferritin-like iron-binding protein involved in oxygen tolerance by limiting the nonenzymatic hydroxyl radical synthesis via iron-catalyzed 'Fenton reaction' in S. mutans. Their further studies on the biological function of dpr found that dpr gene from S. mutans chromosome was capable of complementing an alkyl hydroperoxide reductase-deficient mutant of E. coli, as well as complementing the defect in peroxidase activity caused by the deletion of ahpF/ahpC in S. mutans, indicating that dpr plays an indispensable role in oxygen tolerance of S. mutans [68,70]. Dpr homologs were found in all strains as expected by the supposed essential function of dpr gene in oxygen tolerance.
Thioredoxins are a class of small redox mediator proteins known to be present in all organisms. They are involved in many important biological processes, including redox signaling. Thioredoxins are kept in the reduced state by the flavor enzyme thioredoxin reductase in a NADPH-dependent reaction [71]. They act as electron donors to many proteins including thiol peroxidases [72]. Thioredoxin, thioredoxin reductase and thiol peroxidase, the components of thioredoxin system, are identified in all the strains of this study. Two putative thioredoxin reductases (SMU.463 and SMU.869) are found in all strains/species. It has been reported that in some species thioredoxin reductases have been evolved to be activated by both NADPH and NADH [73]. We speculate that SMU.463 and SMU.869 might have been evolved to have different preferences to NADPH and NADH (SMU.463 and SMU.869 shares less than 20% similarities). If it holds true, this could be advantageous for these mutans streptococci, as the extra amount of NADH produced from glycolysis/gluconeogenesis pathway under anaerobic conditions could be directly used for oxidative stress resistance. Thioredoxin (SMU.1869) and two thioredoxin family proteins (SMU.1971c and SMU.1169c) are found to be present in nearly all strains, except for S. sobrinus DSM 20742, which lacks any ortholog of SMU1169c. An ortholog of a thiol peroxidase-coding gene (tpx) is identified in all strains.
Glutaredoxins share many functions of thioredoxins but are reduced by glutathione (L-γ-glutamyl-L-cysteinylglycine, GSH) rather than by a specific reductase. This means that glutaredoxins are oxidized by their corresponding substrates, and reduced non-enzymatically by GSH [74]. Oxidized glutathione (GSSG) is then regenerated by glutathione reductase. Together, these components comprise the glutathione system [75]. GSH is a well-characterized antioxidant in eukaryotes and Gram-negative bacteria, where it is synthesized by the sequential action of two enzymes, γ-glutamylcysteine synthetase (γ-GCS) and glutathione synthetase (GS). Among Gram-positive bacteria only a few species contain GSH. It has been reported that streptococci lack the moderate-to-high levels of intracellular glutathione normally found in Gram-negative bacteria [76]. Using Streptococcus agalactiae as a model, it has been discovered that in GSH-containing Gram-positive bacteria GSH synthesis is catalyzed by one bifunctional protein, γ-glutamylcysteine synthetase-glutathione synthetase (γ-GCS-GS), encoded by one gene, gshAB. Homologs of γ-GCS-GS have been identified in the genomes of 19 mostly studied Gram-positive bacteria, including S. mutans [77]. All components of the glutathione system were identified in all the 10 strains of this study. Several S. mutans strains, namely UA159, 5DC8, KK21, KK23, ATCC 25175, and NCTC 11060, as well as S. ratti DSM 20564, possess two glutathione reductase orthologs (SMU.140 and SMU.838). This could possibly convey these strains certain advantages in the re-generation of GSH from GSSG, which in turn would be helpful for oxidative resistance.

Variability and specificity in metabolic pathways and network
In order to reveal the metabolic variability of the mutans streptococci systematically, we have reconstructed and analyzed the genome-scale metabolic networks of all the strains sequenced with the method proposed by Ma and Zeng [79] and an updated database [79]. All annotated protein sequences having EC numbers are considered for the network reconstruction. From the functional annotation discussed above, total EC numbers identified in the 10 strains are very close to each other, as shown in Table 7. A summary of the total numbers of the reactions and metabolites in each of the reconstructed metabolic networks is shown in Table 7, and all the constructed metabolic networks are provided in Additional file 6 in *. cys format which can be opened with Cytoscape [80], a software for visualization and analysis of biological networks. The sizes of the constructed metabolic networks of the eight S. mutans strains are very close to each other, with UA159, NN2025, AC4446, 5DC8 and KK21 having almost exactly the same size, and the networks of KK23, ATCC 25175 and NCTC 11060 being merely about 2% larger. While the size of the metabolic network of S. ratti DSM 20564 is comparable to those of the S. mutans strains, the metabolic network of S. sobrinus with 833 reactions and 853 metabolites is the smallest one, which have 62 less reactions and 60 less metabolites compared to the largest one of S. mutans NCTC 11060 (895 reactions and 913 metabolites). Despite the comparable network sizes, however, all the strains possess or lack certain reactions/metabolites, as revealed by detailed comparative analyses. Using the metabolic network of S. mutans UA159 as reference, the presence and absence of reactions in each of the strains/ species compared are discovered and mapped into subpathways based on the KEGG pathway classification (http://www.genome.jp/kegg/pathway.html). As a result, among the 416 sub-pathways defined in the KEGG pathway database 46 sub-pathways demonstrated certain variations between the strains/species, as summarized in Additional file 7.
A key feature of the oral environment is that the nutrients available to the oral bacteria are always fluctuating between abundance and famine associated with human diet. Thus, the ability to quickly acquire and metabolize carbohydrates to produce energy and precursors for biosynthesis is essential for the survival of all oral bacteria. Due to their key roles in carbohydrates metabolism and energy production, glycolysis/gluconeogenesis, TCA cycle and pyruvate metabolism pathways are generally considered to be highly conserved among these oral bacteria. Although mutans streptococci strains/species are closely related species as revealed by phylogenetic tree analysis in this study (Figure 1), differences in the central carbon metabolic pathways are found as shown in Figure 6.
Facultative anaerobes such as lactic acid bacteria including Streptococcus lack cytochrome oxidases required for energy-linked oxygen metabolism and energy (in the form of ATP) required for survival and growth are generated by substrate level phosphorylation in the glycolysis pathway [69]. L-lactate oxidase (D823_06598) with a similarity of 73% to YP_003064450.1 (accession number) of Lactobacillus plantarum JDM1 and lactate oxidase (D823_06595) with a similarity of 65% to ZP_09448656.1 (accession number) of Lactobacillus mali KCTC 3596, are found to be uniquely present in S. sobrinus DSM 20742. These two enzymes catalyze the reaction of L-Lactate + O 2 = > Pyruvate + H 2 O 2 and/or D-Lactate + O 2 = > Pyruvate + H 2 O 2 . It has been reported that in S. pneumoniae concerted action of lactate oxidase and pyruvate oxidase forms a novel energy-generation pathway by converting lactate acid to acetic acid under aerobic growth conditions [81]. Because there is no pyruvate oxidase identified in S. sobrinus DSM 20742, the function of the lactate oxidases in S. sobrinus DSM 20742 should be different to that of S. pneumoniae. By a close examination we hypothesize that lactate oxidase, together with pyruvate dehydrogenase, phosphate acetyl transferase and acetate kinase, could form a novel energy production pathway to convert lactate acid to acetate and simultaneously produce one additional ATP, as depicted Figure 6. By doing so, the lactate oxidases of S. sobrinus DSM 20742 could also play a role in consuming lactate to regulate pH, which would be an advantage for S. sobrinus DSM 20742 in resistance to acid stress. In addition, this pathway could replenish Acetyl-CoA, an important intermediate for the biosynthesis of fatty acids and amino acids. This is for the first time that such an energy production pathway is proposed in Streptococcus species. Furthermore, lactate oxidase and lactate dehydrogenase could form a local NAD + regeneration system, which would be certainly advantageous to S. sobrinus DSM 20742 under aerobic growth conditions. Moreover, it is known that mutans group streptococci and the mitis group streptococci are competitors, with S. mutans producing mutacins to kill the mitis group streptococci and the mitis group streptococci in turn produce H 2 O 2 to kill mutans group streptococci [16,82]. Favored by possessing the lactate oxidases, S. sobrinus DSM 20742 has the potential ability of producing H 2 O 2 to kill not only competitors (oxygen sensitive S. mutans, oral anaerobes) but also macrophages [83], and defend its ecological niche. The unique presence of lactate oxidases in S. sobrinus DSM 20742 was verified by PCR experiments as shown in Additional file 8. Later, we also found that another S. sobrinus strain AC153 also harbors homologous genes of lactate oxidase, suggesting that lactate oxidase may be conserved and play an important role in S. sobrinus. In the effort to clarify the functionality of lactate Novo energy production pathway formed by lactate oxidase in S.sobrinus DSM20742 Figure 6 Central metabolism pathways of mutans streptococci.  Figure 6 by the blue dotted lines. It has been reported that citrate lyase functions as a key enzyme in initiating the anaerobic utilization of citrate by a number of bacteria, further catabolism of oxaloacetate formed taking place either by decarboxylation or by reduction. In some organisms, oxaloacetate is decarboxylated to pyruvate by oxaloacetate decarboxylase, which is also induced in the presence of citrate. The two enzymatic reactions, which occur sequentially, constitute the 'citrate fermentation pathway' [84]. The absence of citrate lyase and oxaloacetate decarboxylase implies that S. sobrinus DSM 20742 might lacks the ability in anaerobic utilization of citrate as a substrate. The disadvantages of S. sobrinus DSM 20742 in citrate utilization could be offset by the novel energy production pathway from lactate to acetate proposed above.
A putative pyruvate-phosphate dikinase (EC 2.7.9.1), which catalyzes the interconversion between PEP and pyruvate, is found to be uniquely present in S. ratti DSM 20564. Pyruvate-phosphate dikinase has been found in propionic acid bacteria [85]. The large difference in the standard free energy of hydrolysis for ATP to AMP and pyrophosphate (−7.6 kcal/mole) and for PEP to pyruvate (−13.6 kcal/mole) at pH 7.0 indicates that the equilibrium for the reaction it catalyzes would strongly favor pyruvate formation. But studies in Acetobacter xylinum clearly indicate that the function of this enzyme under physiological conditions favors the process of gluconeogenesis [86]. Metabolite interconversion at the PEP-pyruvate-oxaloacetate node involves a structurally entangled set of reactions that interconnect the major pathways of carbon metabolism and thus, is responsible for the distribution of the carbon flux among catabolism, anabolism and energy supply of the cell [87]. Under glycolytic conditions oxaloacetate is generated by carboxylation of PEP and/or pyruvate catalyzed by PEP carboxylase (PEPCx) and/or pyruvate carboxylase (PCx). In this study PCx is not found in any of the mutans streptococci strains.
All the 10 strains of this study possess similarly an incomplete TCA cycle and the primary role of the existing TCA enzymes is most likely the synthesis of amino acid precursors as has been reported previously [8,88].

Conclusion
In the present study, the genomes of 8 mutans streptococci strains, including six S. mutans strains, one S. ratti strain and one S. sobrinus strain were sequenced, annotated and compared together with S. mutans UA159 and NN2025. Multiple genome alignment showed extensive genome rearrangement among the eight strains of S. mutans. The core-genome size of S. mutans was determined to be around 1,370 genes by including 67 S. mutans genomes. A possibly open pan-genome of S. mutans was inferred.
Systematic comparative analyses were focused on competence regulation, bacteriocin (mutacin) production, antibiotic resistance, oxidative stress resistance, as well as central carbon metabolism and energy production pathways. Most of these systems show remarkable differences between the strains, except for oxidative stress resistance systems which are found to be well conserved. CSP-dependent and independent competence regulation systems are highly diverse in mutans streptococci: no comC-like genes could be identified in S. ratti and S. sobrinus; putative ComC amino acid sequences of S. mutans show clear variations; ComS and ComR are absent in S. sobrinus which well explains the fact that we were not able obtain genetic competence state of S. sobrinus by experiment, even though the ComX and the downstream competence development genes are well reserved; furthermore, the response regulators of the HdrMR and BsrRM systems, which are known to be involved in competence development, are missing in both S. ratti and S. sobrinus.
Variation in mutacin-encoding genes is accompanied with the conservation of mutacin immunity proteins, which indicates apparently important roles of the mutacin immunity proteins for the survival of these mutans streptococci in a bacteriocin rich environment. The presence of various antibiotic resistance factors, together with the open pan-genome inferred, implies that attention should be paid to the potential of mutans streptococci in the development of antibiotic resistance.
The sizes of the genome-scale metabolic networks of the 10 strains are very close to each other. Comparative analysis of sub-pathways reveals that 46 sub-pathways of all 416 sub-pathways defined in KEGG pathway database show variation using S. mutans UA159 as reference. By identifying lactate oxidases to be uniquely present in S. sobrinus DSM 20742, we proposed for the first time a novel energy production pathway in S. sobrinus. Additional functions of the lactate oxidases in connection with the proposed energy production pathway are also discussed.
In conclusion, the genomes of mutans streptococci display remarkable differences, especially among different species. We believe that the strain-specific information provided in this study should be helpful to understand the evolution and adaptive mechanisms of those oral pathogens.

Genome sequences and strains
All the newly sequenced strains were described previously [26]. Briefly, serotype c strain S. mutans 5DC8 was isolated from root caries by David Beighton (London, UK); serotype c strain S. mutans AC4446 was isolated from a proven case of infective endocarditis in Dillingen (Germany), serotype c strain S. mutans KK21 was isolated from enamel caries of an adult by Susanne Kneist (Jena, Germany), serotype c strain S. mutans KK23 was isolated from enamel caries of a child by Susanne Kneist (Jena, Germany), Serotype c, type strain S. mutans ATCC 25175 was isolated from carious dentine, serotype f strain S. mutans NCTC 11060 was isolated in Denmark from a patient's blood, serotype b strain S. ratti DSM 20564(=ATCC 19645) was isolated from caries lesion in rat, and finally, serotype non-d & non-g strain S. sobrinus DSM 20742 (= ATCC 33478) was isolated from human dental plaque. Serotype c is overrepresented because 70-80% of all S. mutans isolates are of this serotype. However, non-c serotypes seem to be associated with cardiovascular diseases and this is represented in our study by the serotype f strain. Besides S. mutans, S. sobrinus is considered as a relevant cariogenic species in human. The genome sequences of S. mutans UA159 and S. mutans NN2025 were genome sequenced previously and obtained from NCBI genome database (http://www.ncbi.nlm.nih.gov/genome/, consulted at the beginning of January 2011).

Genome sequencing, assembly and annotation
All of the strains were sequenced using the Solexa sequencing platform at the Helmholtz Center for Infection Research in Braunschweig, Germany. The "high-quality draft" [89] genome sequences of these mutans streptococci strains were assembled by a combined use of the sequence assembly tools SOAPdenovo [90], Maq [91] and Phrap [92]. In brief, we first use SOAPdenovo to obtain the optimal assembly result by using different k-mer from 17 to 41 without scaffolding. Then we map all reads to reference sequence of S. mutans UA159 using Maq and break down the low quality area to obtain a collection of long contigs. Finally, the long contigs were used to close partial gaps of the initial assembly to improve the assembly quality using Phrap. The first version genome annotations were performed using mauve [13,93], tRNAscan-SE 1.21, Glimmer3.02 [94] and Blast2GO [95], and then released through our central genome database (http://biosystem. bt1.tu-harburg.de/) established with PathwayTools [26,96]. This version was used before for the study of TCSTSs of the 10 strains [26]. During this study, all genomes were re-annotated using the NCBI Prokaryotic Genomes Automatic Annotation Pipeline (PGAAP, http://www. ncbi.nlm.nih.gov/genomes/static/Pipeline.html) and the whole-genome shotgun sequences have been deposited at DDBJ/EMBL/GenBank under the accessions of AOBX00000000 (S. mutans 5DC8), AOBY00000000 (S. mutans KK21), AOBZ00000000 (S. mutans KK23), AOCA00000000 (S. mutans AC4446), AOCB00000000 (S. mutans ATCC 25175), AOCC00000000 (S. mutans NCTC 11060), AOCD00000000 (S. ratti DSM 20564) and AOCE00000000 (S. sobrinus DSM 20742). The present study used the new version deposited at DDBJ/ EMBL/GenBank. As we found out that in the annotated results from PGAAP some coding genes are missing, we did manual curation based on blast searches using known coding nucleotide sequences, the location of the missing coding sequences are given in Additional file 9.

Genome alignment
Multiple genome alignments were computed by using the progressive Mauve algorithm of the Mauve software [13] with default options.
Data pre-processing for the core and pan-genome analysis were performed using a self-written perl script (Additional file 10), which is similar as described previously by Tettelin et al. [20]. Briefly, an iterative procedure was carried out to estimate total genes/core genes to be discovered per additional genome sequenced. The number of total genes/core genes provided by each added new genome depends on the selection of previously added genomes. All possible combinations of genomes from 1 to M (the maximal number of available genomes) were calculated. In the case more than 1000 combinations are possible, only 1000 random combinations were used. In order to take into consideration of core genes that are possibly missed during genome sequencing and assembly, for the calculation of core-genome size, an additional correction step was introduced, in which any one gene that is only absent in one of the 63 draft genomes was still regarded as core gene. During the fitting step of the core genome model, the inputted genome numbers were used as fitting weight for corresponding data point.

Gene content-based comparative analysis of 10 mutans streptococci strains
In this work, if not otherwise specified, the uniqueness of genes from "organism A" is defined according to the ortholog groups constructed by using the OrthoMCL program [25]. If the ortholog of a gene from organism A is absent in "organism B", we define that this gene is unique or specific to organism A in comparison to organism B. This does not imply that there is no homolog (namely paralog) of the gene from organism A in organism B. In some cases, this gene is just an additional copy of another gene whose alleles/orthologs are found in both organisms. This does further not imply that this gene is found in organism A only. For example, the ortholog of this gene may be found in organism C from the relationship table or another strain or species that is not compared in this work.

Genome-scale metabolic networks construction
The bipartite metabolic networks were constructed based on the connection matrix of updated KEGG reactions database according to Stelzer and Zeng [97] with addition of newly identified reaction catalyzed by lactate oxidase (Lactate + O 2 = > Pyruvate + H 2 O 2 ) with provisional R numbers of R10001 (C00186 + C00007 = > C00022 + C00027) and R10002 (C00256 + C00007 = > C00022 + C00027). Compared to the reaction graph or the metabolite graph, wherein either reactions or metabolites (called "nodes") are shown in an interconnected way, the bipartite network is more comprehensible because, similar to the biochemistry textbook, both the reactions and metabolites are visualized at mean time. Seventy-six non-enzymatic automatic reactions were also considered for the network construction. The construction of sub-networks was based on the KEGG pathway classification (http://www.genome. jp/kegg/pathway.html) with slight modification of addition of reaction catalyzed by lactate oxidase into Glycolysis/ Gluconeogenesis pathway (MAP00010) and Pyruvate metabolism pathway (MAP00620). The software Cytoscape [80] was used for the visualization and comparative analysis of the genome-scale metabolic networks.

PCR verification
To verify the unique presence of the lactate oxidase (consecutive) coding genes D823_06595 and D823_06598 respectively and to exclude the possibility of contamination with e. g. human DNA during the process of genome sequencing, PCR amplification (using one primer pair covering both genes) with newly isolated DNA from S. sobrinus DSM 20742 as well as a second S. sobrinus strain (AC153) and from strains S. mutans UA159 as well as S. ratti DSM 20564 (as negative controls) was performed. The primers used were: 5′-GAGCAGGATAATTGACAGTC -3′ (forward primer) and 5′-ACTCAGTGACGAATCAGTT -3′ (reverse primer), which were designed by using Primer Premier http://www.premierbiosoft.com/primerdesign/index. html) and Vector NTI 9.0 (InforMax), respectively. Conditions for this conventional PCR were: 94°C, 2 min; followed by 32 cycles of 94°C for 30s; annealing temperature 48°C for 30s; and 72°C for 90s; final extension at 72°C for 5 min; length of amplicon 1,175 bp.

Constructs for lactate oxidase deletion mutants and transformation of S. sobrinus DSM 20742
To clarify the functionality of the two lactate oxidases, namely D823_06598 (Llod) and D823_06595 (lod), PCR ligation mutagenesis according to the method of [98] was used to separately replace the two genes encoding the two enzymes by an erythromycin resistance cassette via double homologous recombination. Primers P1Llod (TTACCGTTATCCGCGAATTAT) and P2Llod (GGCG CGCCAACCACCCAAGGTTGAATC), P1lod (GGCTG GTTTCCTCCATGATA) and P2lod (GGCGCGCCCCA AAACCACCTTGAGGAAT) were used to amplify the 5′flanking regions of both genes, respectively, introducing an AscI restriction site. To amplify the 3′flanking regions of both genes, the primers P3Llod (GGCCGGCCGGGA GCTCAAGGTGTTCAAA) and P4Llod (CAAATTGTTC AAAGCGGGAAC), P3lod (GGCCGGCCGGCAGCAGC CGGTAGTATT) and P4lod (GGGTGCCAACTTATGTC ACGA) were used, thereby introducing restriction site for FseI. The erythromycin resistance cassette was amplified from previously constructed gene deletion mutant [99]