Comparative genomics of the dairy isolate Streptococcus macedonicus ACA-DC 198 against related members of the Streptococcus bovis/Streptococcus equinus complex

Background: Within the genus Streptococcus, only Streptococcus thermophilus is used as a starter culture in food fermentations. Streptococcus macedonicus though, which belongs to the Streptococcus bovis/Streptococcus equinus complex (SBSEC), is also frequently isolated from fermented foods mainly of dairy origin. Members of the SBSEC have been implicated in human endocarditis and colon cancer. Here we compare the genome sequence of the dairy isolate S. macedonicus ACA-DC 198 to the other SBSEC genomes in order to assess in silico its potential adaptation to milk and its pathogenicity status.
Results: Despite the fact that the SBSEC species were found tightly related based on whole genome phylogeny of streptococci, two distinct patterns of evolution were identified among them. Streptococcus macedonicus, Streptococcus infantarius CJ18 and Streptococcus pasteurianus ATCC 43144 seem to have undergone reductive evolution resulting in significantly diminished genome sizes and increased percentages of potential pseudogenes when compared to Streptococcus gallolyticus subsp. gallolyticus. In addition, the three species seem to have lost genes for catabolizing complex plant carbohydrates and for detoxifying toxic substances previously linked to the ability of S. gallolyticus to survive in the rumen. Analysis of the S. macedonicus genome revealed features that could support adaptation to milk, including an extra gene cluster for lactose and galactose metabolism, a proteolytic system for casein hydrolysis, auxotrophy for several vitamins, an increased ability to resist bacteriophages and horizontal gene transfer events with the dairy Lactococcus lactis and S. thermophilus as potential donors. In addition, S. macedonicus lacks several pathogenicity-related genes found in S. gallolyticus. For example, S. macedonicus has retained only one (i.e. the pil3) of the three pilus gene clusters which may mediate the binding of S. gallolyticus to the extracellular matrix. Unexpectedly, similar findings were obtained not only for the dairy S. infantarius CJ18, but also for the blood isolate S. pasteurianus ATCC 43144.
Conclusions: Our whole genome analyses suggest traits of adaptation of S. macedonicus to the nutrient-rich dairy environment. During this process the bacterium gained genes presumably important for this new ecological niche. Finally, S. macedonicus carries a reduced number of putative SBSEC virulence factors, which suggests a diminished pathogenic potential.


Keywords: Streptococcus, Genome, Adaptation, Gene decay, Pseudogene, Horizontal gene transfer, Pathogenicity, Virulence factor, Milk, Niche

Background
Lactic acid bacteria (LAB) constitute a very important group of microorganisms for the food industry, as well as the health of humans and animals [1,2]. Several species in this group have a long history of safe use in fermented foods and thus belong to the very few bacteria that may qualify for the "generally recognized as safe" (GRAS) or the "qualified presumption of safety" (QPS) status according to FDA and EFSA, respectively [3]. Other LAB species are commensals of the skin, the oral cavity, the respiratory system, the gastrointestinal tract (GIT) and the genitals of mammals or other organisms. Furthermore, the presence of specific LAB strains, called "probiotics", in certain niches of the body is considered to promote the health of the host [2]. This benign nature of LAB, as well as their economic value, often obscures the existence of notorious LAB pathogens that are among the leading causes of human morbidity and mortality worldwide [4].
This oxymoron about the vast differences in the pathogenic potential within the LAB group is probably best exemplified by streptococci. The genus basically consists of commensals that include several severe pathogens, like group A streptococci (GAS), group B streptococci (GBS) and Streptococcus pneumoniae [5]. Streptococcal pathogens are implicated in a plethora of diseases, ranging from mild (e.g. pharyngitis) to invasive and life-threatening (e.g. necrotizing fasciitis) infections [6]. In contrast, Streptococcus thermophilus is one of the most frequent starter LAB consumed by humans in yogurt and cheese [7]. It is believed that this is the only streptococcal species that, during its adaptation to the nutrient-rich milk environment, underwent extensive genome decay, resulting in the loss of pathogenicity-related genes present in members of the genus [7,8].
Apart from S. thermophilus, other streptococci can grow in milk and milk products. Such streptococci mainly belong to the Streptococcus bovis/Streptococcus equinus complex (SBSEC) [9]. The exact route that would explain their presence in milk is yet unidentified. In theory, since some of them can naturally occur in the GIT or on the teat skin of lactating animals, they could be passively transmitted to raw milk. In addition, species of the SBSEC are known to be involved in human cases of endocarditis, meningitis, bacteremia and colon cancer [10][11][12].
However, Streptococcus macedonicus, which is a member of this specific complex, has been suggested to be adapted to milk and it has been hypothesized that it could be non pathogenic. These assumptions were based on the fact that the primary ecological niche of S. macedonicus appears to be naturally fermented foods, mostly of dairy origin similarly to S. thermophilus [13]. Initial in vitro and in vivo evaluation did not support virulence of S. macedonicus ACA-DC 198 [14]. PCR and Southern blotting analyses indicated the absence of several Streptococcus pyogenes pathogenicity genes. In addition, oral administration of the organism at high dosages (8.9 log cfu daily) for an extended period of time (12 weeks) to mice did not result in any observable adverse effects including inflammation in the stomach or translocation from the GIT to the organs of the animals [14]. Moreover, strains of S. macedonicus have been shown to present important technological properties of industrial cultures like the production of texturizing exopolysaccharides and anti-clostridial bacteriocins [13].
Streptococcus macedonicus was originally isolated from traditional Greek Kasseri cheese [15] and it is phylogenetically related to Streptococcus gallolyticus subsp. gallolyticus and Streptococcus pasteurianus (formerly known as S. bovis biotypes I and II.2, respectively), as well as to Streptococcus infantarius (formerly known as S. bovis biotype II.1). The inclusion of S. macedonicus and S. pasteurianus as subspecies of S. gallolyticus subsp. gallolyticus (from this point on S. gallolyticus) has been previously suggested [16], but this taxonomic reappraisal has not been formally accepted so far [17]. Streptococcus gallolyticus and S. pasteurianus are considered pathogenic. Preliminary investigations concerning the mechanisms by which S. gallolyticus causes endocarditis indicated that S. macedonicus may lack at least some of the pathogenic determinants implicated in this disease [18,19]. Furthermore, the recent study of the genome of S. infantarius subsp. infantarius CJ18 (from this point on S. infantarius) isolated from spontaneously fermented camel milk in Africa has indicated strain-dependent traits of adaptation to the dairy environment despite the fact that the species is considered as a putative pathogen [20]. Overall, the presence in fermented foods of SBSEC species with a currently unresolved pathogenicity status, such as S. macedonicus and S. infantarius, may represent an underestimated cause of concern in terms of food safety and public health, which needs to be addressed.
Here we present the first complete genome sequence of S. macedonicus in order to shed light on the biology of the species. We are particularly interested in assessing niche adaptation and in investigating the pathogenic potential of the strain analyzed based on comparative genomics against other complete genomes within the SBSEC. This is an important step to rationally deduce whether the bacterium is safe to be used as a starter or if extra technological measures are needed to avoid its presence in food fermentations.

Results and discussion
General features of Streptococcus macedonicus ACA-DC 198 genome
The circular chromosome of S. macedonicus ACA-DC 198 consists of 2,130,034 bp (Figure 1) with a G + C content of 37.6%, which is among the lowest values within the available complete streptococcal genomes (39.3% ± 1.7%, n = 95 by May 2013). A total of 2,192 protein coding DNA sequences (CDSs) were annotated, covering 87.3% of the S. macedonicus chromosome. Of these, 192 were identified as putative pseudogenes according to GenePRIMP [21] analysis followed by manual curation. The bacterium also carries 18 rRNA genes organized in 5 clusters co-localized with most of the 70 tRNA genes. The S. macedonicus genome was found to be 220-232 kb smaller and only 30 kb larger than the genomes of S. gallolyticus and S. pasteurianus, respectively. Streptococcus infantarius has one of the smallest genome sizes within the SBSEC reported up to now (i.e. 141 kb smaller than that of S. macedonicus). The percentage of potential pseudogenes in S. macedonicus was 8.7%, in S. pasteurianus 7.7% and in S. infantarius 4.9%. In contrast, the percentage of pseudogenes in at least two S. gallolyticus strains (i.e. strains UCN34 and ATCC 43143) has been found to be 2.1% or less. This analysis is in accordance with previous findings [9,22]. Based on the close phylogenetic relationship among the four species, these observations suggest that the genome of S. macedonicus, as well as those of S. pasteurianus and S. infantarius, may be evolving under selective pressures that allow gene loss events and genome decay processes when compared to the S. gallolyticus genomes.
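The summary statistics quoted above (G + C content, pseudogene percentage, coding density) are simple ratios. The following minimal sketch shows how such figures could be derived; the function names and the 10-bp toy fragment are hypothetical and are not taken from the annotation pipeline used in this study.

```python
def gc_content(seq: str) -> float:
    """Return the G + C percentage of a DNA sequence, ignoring ambiguous bases."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Hypothetical toy fragment: 4 of its 10 bases are G or C.
toy_fragment = "ATGCGCATTA"
toy_gc = gc_content(toy_fragment)

# Figures reported in the text for S. macedonicus ACA-DC 198.
GENOME_BP = 2_130_034
TOTAL_CDS = 2_192
PSEUDOGENES = 192

pseudogene_pct = percent(PSEUDOGENES, TOTAL_CDS)
coding_bp = round(0.873 * GENOME_BP)  # bases covered by CDSs at 87.3% density

print(f"toy G + C: {toy_gc:.1f}%")
print(f"pseudogenes: {pseudogene_pct:.2f}% of annotated CDSs")
print(f"coding sequence: about {coding_bp:,} bp")
```

The ratio 192/2,192 evaluates to roughly 8.76%, in line with the approximately 8.7% reported for S. macedonicus.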
Whole genome phylogeny, comparative genomics, and core genome analysis
A phylogenetic tree based on the currently available complete streptococcal genome sequences was constructed using the EDGAR software [23]. On this tree, S. gallolyticus, S. macedonicus, S. pasteurianus, as well as S. infantarius formed a single, monophyletic branch, providing strong evidence for the taxonomic integrity of the SBSEC (Additional file 1: Figure S1).
Figure 1 The circular map of the genome of Streptococcus macedonicus ACA-DC 198. Genomic features appearing from the periphery to the centre of the map: 1. Forward CDSs (red); 2. Reverse CDSs (blue); 3. Putative pseudogenes (cyan); 4. rRNA genes (orange); 5. tRNA genes (green); 6. % GC plot; 7. GC skew.
Subsequently, full chromosome alignments were performed using progressiveMAUVE [24]. The analysis revealed a mosaic pattern of homology organized in local collinear blocks (LCBs) among S. gallolyticus, S. macedonicus and S. pasteurianus (Figure 2A). Evidently, a significant portion of the genetic information has been overall conserved, as the majority of the LCBs are shared by all species. In addition, chromosomal rearrangements seem to have been rather minimal, as the number of LCBs showing a change in relative genomic position among the strains was low and their length short. Nevertheless, numerous differences were also detected. Some LCBs were common only among some of the strains, while some regions were identified as strain-specific (and hence not included within an LCB). The presence of such strain-specific regions suggests that, in addition to gene loss mentioned earlier, gene acquisition events mediated by horizontal gene transfer (HGT) may have played a role during the evolution of the three species (see below). Interestingly, the inclusion of the S. infantarius genome in the MAUVE analysis resulted in an increased number of LCBs with a decreased average length. As the level of sequence conservation of individual LCBs among the four species remains relatively high, this observation suggests that specific genome structure reorganization events occurred specifically in S. infantarius ( Figure 2B). Analysis with the EDGAR software revealed a core genome of only 1,372 orthologous genes based on the sequence and the current annotation of S. gallolyticus, S. pasteurianus and S. macedonicus ( Figure 3A, Additional file 2: Table S1) [23]. Once more, inclusion of S. infantarius increased the diversity, resulting in reduction of the core genome by more than 100 genes among the four species ( Figure 3B, Additional file 3: Table S2). The significant percentage of variable genes within the four SBSEC species may underpin their adaptation to specific environments.
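Conceptually, a core genome is the intersection of the ortholog sets of the genomes compared, so adding a more divergent genome can only shrink it. The sketch below illustrates this with invented ortholog-group identifiers; it is not the EDGAR implementation, and all names and data are hypothetical.

```python
from functools import reduce


def core_genome(gene_sets: dict) -> set:
    """Ortholog groups present in every genome of the comparison."""
    return reduce(set.intersection, (set(groups) for groups in gene_sets.values()))


def singletons(gene_sets: dict, target: str) -> set:
    """Ortholog groups found in the target genome and in no other genome."""
    others = set().union(
        *(set(groups) for name, groups in gene_sets.items() if name != target)
    )
    return set(gene_sets[target]) - others


# Hypothetical ortholog-group assignments (toy data, not from the study).
three_species = {
    "S_gallolyticus": {"og1", "og2", "og3", "og4"},
    "S_pasteurianus": {"og1", "og2", "og4"},
    "S_macedonicus": {"og1", "og2", "og5"},
}
four_species = dict(three_species, S_infantarius={"og1", "og5", "og6"})

core3 = core_genome(three_species)
core4 = core_genome(four_species)
unique3 = singletons(three_species, "S_macedonicus")
unique4 = singletons(four_species, "S_macedonicus")

print(sorted(core3), sorted(core4))
print(sorted(unique3), sorted(unique4))
```

In the toy data the core drops from two groups to one when the fourth genome is added, mirroring the direction (not the magnitude) of the reduction described above.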

Genes involved in the survival in the GIT
It has been established that S. gallolyticus displays the notable ability to accumulate and metabolize a broad range of complex carbohydrates from plants when compared to other streptococci [25]. The necessity for this repertoire of carbohydrate-degrading activities has been considered to reflect the adaptation of S. gallolyticus to the rumen of herbivores [22,25]. Preliminary analysis indicated that at least some of the relevant genes are either entirely absent or have been converted into pseudogenes in the genomes of S. macedonicus, S. pasteurianus and S. infantarius (Table 1). The presence of pseudogenes related to carbohydrate metabolism reinforces the notion that S. macedonicus, S. pasteurianus and S. infantarius have undergone genome decay processes during adaptation to their ecological niches. The entire glycobiome of the SBSEC members was further analyzed based on the data available in the CAZy database (Additional file 4: Table S3) [26]. Important differences in the distribution of enzymes among the SBSEC members were observed for all CAZy categories including glycoside hydrolases (GHs), glycosyl transferases, polysaccharide lyases, carbohydrate esterases and carbohydrate-binding modules (CBMs). Streptococcus macedonicus and Streptococcus infantarius had the smallest glycobiome within the SBSEC. The two strains had only 24 and 23 GHs, respectively, while the remaining SBSEC members had more than 40. Given that most of these GHs are potentially involved in plant and dietary carbohydrate catabolism (e.g. GH1, GH3, GH13, GH36 etc.) [27], it could be hypothesized that S. macedonicus and S. infantarius have a diminished necessity for such enzymes in their ecological niche. Streptococcus pasteurianus had the highest number of GHs, some of which were unique among SBSEC (i.e. GH35, GH78, GH79, GH85, GH92, GH125). This observation indicates differences in the range of carbohydrates the strain is able to catabolize in comparison to the other members of the complex.
Interestingly, none of the SBSEC members were found to carry GHs that are implicated in the degradation of host derived oligosaccharides (e.g. GH33 and GH98) [27]. In contrast, Streptococcus gallolyticus strains, S. macedonicus and S. infantarius had hits in the CBM41 family, while S. pasteurianus in the CBM32 family, both of which have been associated with the recognition of host glycans [27,28].
Furthermore, S. gallolyticus can detoxify toxic compounds met in the rumen and other environments. Again, S. macedonicus, S. pasteurianus and S. infantarius miss some of the genes involved in detoxification ( Table 1). None of them carry genes for tannin hydrolysis similar to GALLO_0933 or GALLO_1609. The potential to degrade additional phenolic compounds like gallic acid seems to be comparable between S. gallolyticus and S. pasteurianus. In contrast, S. infantarius has no orthologs of either PadC (GALLO_2106) or GALLO_0906, i.e. the two gallic acid decarboxylases found in S. gallolyticus UCN34, while S. macedonicus has retained only PadC. Furthermore, the bsh gene (GALLO_0818), coding for a bile salt hydrolase, is present in all four species with the exception of S. macedonicus, in which it appears as a pseudogene. Thus, our findings clearly suggest that not only S. macedonicus, but also S. pasteurianus and S. infantarius have deviated from S. gallolyticus in their potential to cope with the harsh environment of the GIT of herbivores.
Genes involved in the growth in milk or dairy products
Dairy LAB are considered fastidious microorganisms due to their adaptation to growth in milk that is particularly nutritious by nature. Lactose and milk proteins (both caseins and whey proteins) are characteristic of the dairy environment. LAB are able to ferment lactose to lactic acid and they have evolved a proteolytic system for the degradation of milk proteins down to amino acids [1,29].
All SBSEC species are able to utilize lactose and to catabolize galactose. Sequence similarity searches revealed a gene cluster (SMA_0197-SMA_0211) dedicated to lactose metabolism with a unique organization in SBSEC when compared to those previously reported for other LAB (Table 2). The typical sequence of lac genes is interrupted in the majority of SBSEC strains by genes coding for the IIA, IIB and IIC components of a PEP-PTS (SMA_0202-SMA_0204). Annotation of this PEP-PTS varies among the SBSEC species/strains and for this reason functional analysis is required to properly determine its exact function. In contrast to other SBSEC species, these three PTS genes are absent from S. infantarius. The lactose-specific PTS found at the end of the lac gene cluster (SMA_0206-SMA_0210) is also inactivated in S. infantarius through disruption of the lacT antiterminator gene by transposases [20]. Interestingly, the lac gene cluster in S. macedonicus contains two 6-phospho-beta-galactosidase (lacG) genes that may be indicative of adaptation of this particular species to milk. Galactose can also be catabolized through the Leloir pathway and a galRKTE operon coding for the relevant enzymes was previously determined in S.
A partial gal-lac operon galT(truncated)/galE1M/lacSZ with high sequence identity to S. thermophilus is also present in the genome of S. infantarius [30]. It has been demonstrated that the lactose and galactose permease (lacS) and the β-galactosidase (lacZ) are responsible for the uptake and initial hydrolysis of lactose in S. infantarius in a manner similar to that employed by S. thermophilus [20]. This gal-lac operon of S. infantarius is missing from the other SBSEC strains as a whole. A LacZ ortholog (SGPB_0344) is only present in S. pasteurianus and dispersed galE and galM genes can be found in the S. gallolyticus and S. pasteurianus genomes. Similarly to the presence of the extra gal-lac operon in S. infantarius, we detected a second lac gene cluster in S. macedonicus (SMA_1156 -SMA_1165), also suggesting adaptation to the milk environment. This second gene cluster is solely present in S. macedonicus and not in any other SBSEC member. Surprisingly, an additional lacTFEG region coding for a complete lactose PEP-PTS and a 6-phospho-beta-galactosidase is present in the genomes of S. gallolyticus and S. pasteurianus. This is an unexpected finding since S. gallolyticus and S. pasteurianus have hardly ever been related to milk up to now [9].
We then investigated the proteolytic system of S. macedonicus and the rest of the SBSEC members adapting the scheme previously described by Liu and co-workers (i.e. excluding housekeeping proteases or proteases involved in specific cellular processes other than the acquisition of amino acids) [29]. In milk, casein utilization by LAB is initiated after hydrolysis by a cell-envelope associated proteinase (CEP) releasing oligopeptides. The oligopeptides are then transferred intracellularly via specialized peptide transport systems where they are systematically degraded into amino acids by an array of intracellular peptidases. The four species have essentially the same proteolytic system, albeit showing some differences (Table 3). None of them has a typical PrtP CEP, but S. gallolyticus and S. infantarius carry a lactocepin coding gene. The lactocepin of the SBSEC shows ≥ 63% sequence similarity to the PrtS CEP involved in the degradation of milk proteins in S. thermophilus [31,32]. The exact role of lactocepin in SBSEC species needs to be experimentally examined. SBSEC strains like S. macedonicus may require CEP activity to be provided by other bacteria when growing in milk. This is a common strategy of nonstarter LAB that rely on starter CEP-producing strains for casein hydrolysis [33].
Streptococcus infantarius carries two oligopeptide transport systems (Opp) [20], but all the other SBSEC species have only one such system. All SBSEC strains possess a proton motive force (PMF)-driven DtpT transporter for the transport of di- and tri-peptides, as well as an entire repertoire of proteolytic enzymes including endopeptidases, general aminopeptidases and specialized peptidases (Table 3). They only lack enzymes of the PepE/PepG (endopeptidases) and the PepI/PepR/PepL (proline peptidases) superfamilies, in accordance with previous observations for streptococci and lactococci [29]. The conservation of this proteolytic system among streptococci in the SBSEC despite their presumed adaptation to different ecological niches [20,22,25] indicates that it may somehow be essential. Furthermore, S. macedonicus and the other SBSEC members are prototrophs for several amino acids (data not shown) and only S. pasteurianus has been reported to be unable to synthesize tryptophan [22]. Thus, the preservation of an entire proteolytic system by SBSEC members while retaining the ability to synthesize most, if not all, amino acids is puzzling, especially when considering that some of them have obviously undergone extensive genome decay processes. It could be hypothesized that this property of SBSEC species may provide a competitive advantage in poor environments, but this needs to be further investigated. Apart from amino acids, S. gallolyticus UCN34 also carries complete pathways for the synthesis of a number of vitamins including riboflavin, nicotinamide, pantothenate, pyridoxine, and folic acid, while the biosynthetic pathways for biotin and thiamine are partial [25]. The genes potentially involved in the de novo biosynthesis of pyridoxine in the SBSEC strains were determined based on the respective pathway of S. pneumoniae D39 [34]. The corresponding loci are conserved among S. gallolyticus strains but once more S. macedonicus, S. pasteurianus and S. infantarius appear to have undergone a heterogeneous gene loss process, indicating the necessity for exogenous supply of some of these vitamins (Table 4). For example, S. macedonicus lacks the bioBDY, panBCD and ribDEAH loci involved in the biosynthesis of biotin, pantothenate and riboflavin, respectively. In addition, the presence of pseudogenes or truncated/split genes may have disrupted the biosynthesis of pyridoxine, nicotinamide and thiamine through the routes analyzed here. It is not uncommon for LAB to be auxotrophic for several vitamins [35], though milk and other dairy products may contain all essential vitamins to sustain the growth of these microorganisms.

Genomic islands (GIs) and unique genes of Streptococcus macedonicus
GIs are sites of HGT that can uncover important features of the plasticity of a bacterial genome and they are primarily linked to gene gain processes. We used the IslandViewer application [36] to identify GIs of the SBSEC members in parallel. Streptococcus macedonicus had 14 predicted GIs with an average length of 18,109 bp corresponding to a total sequence of 253,523 bp or 11.9% the size of the bacterium's genome (Additional file 5: Figure S2). This percentage of externally acquired DNA is higher compared to the other SBSEC members, in which it ranged from 8.8% in S. gallolyticus ATCC BAA-2069 down to 5.9% in S. gallolyticus UCN34.
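The GI figures quoted above are mutually consistent, which can be checked with a few lines of arithmetic. This is only a sanity check on the reported numbers; the variable names are illustrative.

```python
# Values reported in the text for S. macedonicus ACA-DC 198.
GENOME_BP = 2_130_034   # chromosome length
GI_COUNT = 14           # predicted genomic islands
GI_TOTAL_BP = 253_523   # combined length of all predicted islands

mean_gi_bp = GI_TOTAL_BP / GI_COUNT
gi_fraction_pct = 100.0 * GI_TOTAL_BP / GENOME_BP

print(f"mean GI length: {mean_gi_bp:.0f} bp")
print(f"GI share of the chromosome: {gi_fraction_pct:.1f}%")
```

Both derived values agree with the stated average length of 18,109 bp and the 11.9% share of the genome.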
As could be expected, the highest degree of sequence conservation among GIs was observed in the S. gallolyticus strains (e.g. S. gallolyticus UCN34 GIs 2, 6, 7, 8 and 9). When different SBSEC species were compared, a number of GIs were only partially conserved (e.g. S. gallolyticus UCN34 GIs 1, 3, 6, 7, 8 and 9). Unique GIs were also present in most genomes analyzed (e.g. S. pasteurianus GIs 2, 4, 6 and 8). Partially conserved GIs may be remnants of GIs acquired before speciation events in the SBSEC and their subsequent gene decay may be the result of adaptation to diverged ecological niches. The existence of unique GIs among the SBSEC species, whose acquisition must have been more recent (i.e. most probably after speciation), also points in the same direction. Furthermore, our analysis suggests that S. macedonicus shares stretches of GI sequences exclusively with S. infantarius among the SBSEC members (e.g. in S. macedonicus GIs 1, 4, 5, 6, 7, 8 and 14) in accordance with previous findings [20]. Potential donors of GI sequences were identified from best BLASTN hits showing sequence identity > 90%. In several instances sequence segments within S. macedonicus GIs may have derived from more than one donor (Additional file 6: Figure S3). Potential donors of the S. macedonicus GIs were Streptococcus agalactiae, Streptococcus intermedius, Streptococcus suis, Streptococcus uberis, Enterococcus faecium, Lactococcus garvieae and Pediococcus pentosaceus. Most importantly, Lactococcus lactis or S. thermophilus were found among these donors in 9 out of 14 S. macedonicus GIs and the same applies for S. infantarius in 6 out of 12 GIs. None of the GI sequences of the other SBSEC members could be linked to L. lactis or S. thermophilus apart from the S. gallolyticus ATCC BAA-2069 GI 6 that exhibited a 96% identity over an approximately 3 kb genomic region of S. thermophilus JIM 8232 (data not shown). These observations constitute additional evidence that S. macedonicus and S. infantarius are the only members of the complex that have extensively interacted with the dairy L. lactis and S. thermophilus. We then calculated the unique genes (also referred to here as singleton genes) of S. macedonicus against the other SBSEC species twice, with and without the genome of S. infantarius. Results from singleton gene analysis using EDGAR [23] were manually curated to relieve the set from the high numbers of transposable elements. There was an important overlap between the list of genes found in GIs of S. macedonicus and the singleton genes (Additional file 7: Table S4 and Additional file 8: Table S5). Again, S. macedonicus and S. infantarius were found to share a number of genes that are absent from the other SBSEC genomes (Additional file 8: Table S5).
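The donor assignment described above (best BLASTN hit with sequence identity > 90%) can be sketched as a filter over tabular BLAST output. The sketch assumes the standard 12-column tabular format (`-outfmt 6`) and keeps, for each query segment, the highest-bitscore hit among those above the identity threshold; the exact BLAST settings and tie-breaking used in the study are not stated, and every query and subject name in the toy input is hypothetical.

```python
import csv
import io

# Column order of the standard BLAST tabular format (-outfmt 6).
FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_donor_hits(handle, min_identity=90.0):
    """Map each query to its top-scoring hit with identity above the threshold."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        hit = dict(zip(FIELDS, row))
        pident = float(hit["pident"])
        bitscore = float(hit["bitscore"])
        if pident <= min_identity:
            continue  # the text requires identity strictly above 90%
        query = hit["qseqid"]
        if query not in best or bitscore > best[query][2]:
            best[query] = (hit["sseqid"], pident, bitscore)
    return best


# Hypothetical toy hits: two for one GI segment, one low-identity hit for another.
toy_blast = io.StringIO(
    "GI1_seg1\tL_lactis_hypothetical\t98.5\t1200\t18\t0\t1\t1200\t500\t1699\t0.0\t2100\n"
    "GI1_seg1\tS_suis_hypothetical\t91.0\t800\t72\t0\t1\t800\t10\t809\t0.0\t1100\n"
    "GI2_seg1\tE_faecium_hypothetical\t85.0\t900\t135\t0\t1\t900\t1\t900\t1e-150\t700\n"
)
donors = best_donor_hits(toy_blast)
for query, (subject, pident, bitscore) in donors.items():
    print(query, subject, pident, bitscore)
```

Because segments within a single GI may point to different donors, running such a filter per segment rather than per island would preserve the mosaic pattern noted in the text.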
According to the aforementioned analysis S. macedonicus carries the complete biosynthetic pathways for two lantibiotic bacteriocins, i.e. the macedocin and the macedovicin peptides [37,38]. The presence of both antimicrobials can provide an additional link between S. macedonicus and the milk environment. Production of macedocin has been observed only in milk up to now and proteolytic fragments of casein may trigger biosynthesis of this peptide [39]. In addition, the entire macedovicin gene cluster is practically identical (99% sequence identity over the entire length of the ~9.8 kb cluster) to the respective clusters of thermophilin 1277 and bovicin HJ50 found in the dairy isolates S. thermophilus SBT1277 and S. bovis HJ50, respectively [37]. The locus seems to have spread among the three strains by HGT and their common dairy origin increases the possibility that this exchange of genetic material has taken place in milk [37].
Another evident characteristic of the S. macedonicus genome was the presence of multiple restriction modification (RM) systems among the singleton genes (Additional file 9: Figure S4). Streptococcus macedonicus possesses the highest number of RM systems within the SBSEC and it is the only member of the group with all three types of RM systems. A yet unresolved difference in the number and the type of RM systems between commensal and dairy LAB has been previously observed [40,41]. As mentioned earlier, phages are present in milk and dairy products often in high numbers [42] and traditional practices (e.g. backslopping) may promote the selection of phage resistant strains [40,41]. In S. thermophilus RM systems are considered as important technological traits [8] and it has been previously suggested that genes of the type III RM system may provide a signature for milk adaptation [40]. Streptococcus macedonicus has two type III RM systems, one of which is inactive since it consists of pseudogenes. The increased number of RM systems of S. macedonicus compared to the other SBSEC members suggests that it should be particularly competent in resisting invading DNA. These findings coincide with the fact that S. macedonicus carries the highest number of spacers in its CRISPR (clustered regularly interspaced short palindromic repeats) locus within the SBSEC (Additional file 10: Table S6). Furthermore, BLASTN analysis of the spacers in the S. macedonicus CRISPR revealed that four of them, namely spacers 3, 5, 17 and 18, had hits in S. thermophilus phages (e.g. phages O1205, 7201, Abc2, etc.), S. thermophilus plasmids (e.g. pER36) or S. thermophilus CRISPR spacer sequences (data not shown). In contrast, among the 140 spacers of the different CRISPR found in the other SBSEC species, only one had a hit in L. lactis phage 1706 (spacer 35 in the CRISPR of S. pasteurianus). According to these findings the occurrence of S. macedonicus in the same habitat as that of S. thermophilus can be supported.
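Counting and searching CRISPR spacers, as described above, presupposes that the spacers have been separated from the repeats flanking them. A minimal sketch of that step follows, assuming a perfectly conserved repeat; real arrays often contain degenerate repeats that call for approximate matching or a dedicated CRISPR finder. The repeat and array sequences are invented for illustration and do not come from any SBSEC genome.

```python
def extract_spacers(array_seq: str, repeat: str) -> list:
    """Return the sequences lying between consecutive exact copies of the repeat."""
    parts = array_seq.upper().split(repeat.upper())
    # parts[0] is the leader before the first repeat and parts[-1] the trailer
    # after the last repeat; everything in between is a spacer.
    return parts[1:-1]


# Hypothetical toy array: leader, three repeats, two spacers, trailer.
toy_repeat = "GTTTTAGAGCTGTG"
toy_array = "TTT" + toy_repeat + "AAAC" + toy_repeat + "GGGT" + toy_repeat + "CC"

spacers = extract_spacers(toy_array, toy_repeat)
for index, spacer in enumerate(spacers, start=1):
    print(f"spacer {index}: {spacer}")
```

Each extracted spacer could then be written to FASTA and searched with BLASTN against phage and plasmid sequences, which is the kind of comparison reported for spacers 3, 5, 17 and 18.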
In addition, S. macedonicus contains singleton genes (several copies in some instances) coding for proteins involved in the transport and homeostasis of metal ions (Additional file 7: Table S4 and Additional file 8: Table S5). Some, but not all, of these genes are also shared by S. infantarius. These genes may play a role in the transport of copper (e.g. copA and copB), cadmium (e.g. cadA and cadC), manganese (e.g. mntH) and magnesium (e.g. SMA_2044). Copper and cadmium are of no evident biological role for Lactobacillales [43] and thus transport systems for such metals in S. macedonicus should be perceived as a protective mechanism against their deleterious effects (e.g. through oxidative stress). The presence of metal transport genes has been previously reported in several LAB including L. lactis and S. thermophilus strains [43][44][45][46][47][48]. In our opinion the high number of metal transport associated genes in S. macedonicus was an unexpected observation and further investigation is required regarding their physiological relevance.

Distribution of virulence factors (VFs) within species of the SBSEC
One of the main goals behind the genome sequencing of S. macedonicus was to clarify its pathogenic potential. Unfortunately, despite the well-known association of S. bovis with human disease, especially endocarditis and colon cancer, there is very little knowledge about the pathogenicity mechanisms employed by members of the SBSEC. In Table 5 we have gathered genes previously assigned as potential VFs in SBSEC. The available studies have shed some light on the ability of S. gallolyticus to colonize host tissues, a step that is considered as a prerequisite for the initiation of the infection by this bacterium. Streptococcus gallolyticus UCN34 contains three pilus gene clusters which may mediate binding to the extracellular matrix (ECM), similarly to the clinical isolate TX20005 whose genome is partially characterized [25,49]. The pil1 and pil3 of strain UCN34 have been found identical to the acb-sbs7-srtC1 and sbs15-sbs14-srtC3 loci of strain TX20005, respectively, but their additional predicted pilus gene cluster (i.e. pil2 vs. sbs12-sbs11-srtC2) was only distantly related [25]. While all three strains of S. gallolyticus carry the three pilus loci (as found in strain UCN34), S. macedonicus, S. pasteurianus and S. infantarius carry only the pil3 locus. Functional analysis indicated that pil1 is a crucial factor of S. gallolyticus UCN34 for binding to ECM, especially to collagen [18]. The preference of S. gallolyticus to bind to collagen is of particular importance, since it may allow the adherence of the bacterium to the collagen-rich surfaces of damaged heart valves and (pre)cancerous sites [50]. Besides the pilus loci, additional MSCRAMM (microbial surface components recognizing adhesive matrix molecules) proteins have been predicted in S. gallolyticus, most of which are either absent or pseudogenes in S. macedonicus, S. pasteurianus and S. infantarius (Table 5) [49]. The cell surface protein antigen c (PAc) also appears exclusively in the S. gallolyticus genomes, sometimes in more than one copy.
Only the surface-exposed histone-like protein A (HlpA) and the autolysin (AtlA) are universally conserved in the SBSEC. HlpA has been shown to be a major heparin-binding protein regulating the ability of S. gallolyticus to adhere to the heparan sulfate proteoglycans at the colon tumor cell surface [51]. AtlA is a fibronectin-binding protein which is a VF of S. mutans associated with infective endocarditis [52]. Furthermore, S. gallolyticus UCN34 carries loci for the biosynthesis of insoluble glucan polymers from sucrose and the synthesis of hemicellulose [25]. Insoluble glucan polymers may contribute to feedlot bloat in cattle [25], while hemicellulose could play a role in biofilm formation [53]. It is possible that the production of these polymers varies among strains of S. gallolyticus (Table 5). Streptococcus macedonicus is devoid of the glucan biosynthetic gene cluster, while its hemicellulose synthesis operon seems to consist of pseudogenes. Similarly, S. pasteurianus and S. infantarius also seem unable to synthesize either sugar polymer, owing to full or partial absence of the genetic loci. Additional genes could be identified whose products may be implicated in interactions with host cells beyond adherence. Despite the fact that the SBSEC members are considered non-hemolytic (as members of the group D streptococci), S. gallolyticus ATCC BAA-2069 has been reported to cause alpha-hemolysis on Schaedler Agar with 5% sheep blood [54]. Three hemolysins are conserved among the SBSEC members (Table 5). Sequence analysis of Sinf_1513 and Sinf_1683, also annotated as hemolysin genes, was not supportive of a hemolysin protein product (data not shown). Apart from hemolysins, a putative exfoliative toxin B (Sinf_0933) and a macrophage infectivity potentiator protein (Sinf_0931) are present in the S. infantarius genome [20]. Similar genes can be found in S. thermophilus strains but not in the other SBSEC species, and in our opinion functional analysis is required to verify these annotations.
In order to expand our investigation for putative pathogenicity traits, we screened the genomes of S. macedonicus and its related SBSEC species using the VFDB (virulence factors database) [55]; the genes determined to encode putative VFs during this analysis are presented in Additional file 11: Table S7. Current results of comparative pathogenomics have allowed the classification of available streptococcal VFs in nine categories, i.e. adhesion factors, DNases, exoenzymes, immune evasion factors, immunoreactive antigens, factors involved in metal transport, proteases, superantigens and toxins [56]. The general profile of VFs for the six streptococci under investigation was rather similar and we determined a number of previously unidentified potential VFs dispersed among all or some of the SBSEC members. Several of these genes coding for putative VFs, like the agglutinin receptor, the fibronectin/fibrinogen-binding protein (fbp54/pavA), the lipoprotein rotamase A (slrA), the plasmin receptor/GAPDH multifunctional protein, the streptococcal enolase exoenzyme, the pneumococcal surface antigen A and specific proteases (i.e. cppA, htrA/degP and tig/ropA), have been experimentally correlated with the virulence of pathogenic streptococci beyond SBSEC members [57-67]. Some genes were also involved in the production of a capsule that enables bacterial cells to evade phagocytosis (Additional file 11: Table S7) [68]. According to our analysis, all SBSEC streptococci carry a main gene cluster spanning practically the same position in the chromosome that could be involved in the biosynthesis of a capsule (Additional file 12: Figure S5). Even though the cps clusters are identical between S. gallolyticus UCN34 and ATCC BAA-2069 [54], multiple sequence alignment indicates significant structural diversity in the rest of the strains. The existence of dispersed pseudogenes in the gene clusters of S. infantarius and S. macedonicus (e.g. SMA_0865 and SMA_0866) may prevent the production of capsule substances. It should be emphasized that the SBSEC strains lacked hits in several major categories of streptococcal VFs (e.g. DNases, immunoreactive antigens, superantigens and toxins), supporting a reduced pathogenic potential for the SBSEC in general.

Conclusions
In this study we presented the analysis of the first complete genome sequence of a dairy isolate of S. macedonicus. While comparative analysis among specific subgroups of the SBSEC species has been previously presented [20,22,25,54], comparative genomics of the six complete genome sequences was missing. Most importantly, the inclusion of S. macedonicus into this analysis provided a better opportunity to assess niche adaptation of the SBSEC species that was so far limited by the presence of only one dairy isolate (i.e. S. infantarius CJ18) among four clinical strains.
Our findings clearly support two distinct evolutionary patterns within the SBSEC. On the one hand, S. gallolyticus is a species without apparent genome decay and the available genomes suggest that it is a robust bacterium able to thrive in the rumen of herbivores. On the other hand, the remaining SBSEC species, i.e. S. macedonicus, S. pasteurianus and S. infantarius exhibit decreased genome sizes accompanied by increased percentages of potential pseudogenes due to extensive genome decay, suggesting adaptation to nutrient-rich environments. This does not necessarily mean that the environment to which the three species have been adapted is the same. The three species appear with a reduced ability to catabolize complex plant carbohydrates and to detoxify substances met in the rumen, which indicates that they must have deviated from this niche. It has been proposed that S. pasteurianus may now reside in the human gut [22], while S. infantarius presents adaptations to milk [20]. Streptococcus macedonicus also possesses traits that may contribute to growth in the dairy environment, like the extra lactose gene cluster and its proteolytic system. However, all SBSEC strains, including clinical isolates, seem to be competent in the metabolism of lactose and galactose or the degradation of milk proteins. Taking into account these shared characteristics of all SBSEC species, it is tempting to speculate that their common ancestor may have been able to grow in milk.
In our opinion, several genome traits per se suggest adaptation of S. macedonicus to milk. This hypothesis is also supported by the predicted interspecies interactions of S. macedonicus with other bacteria. As it has been recently reported for S. infantarius [20], the S. macedonicus genome may have acquired genes originating from L. lactis and S. thermophilus through HGT. The predicted exposure of S. macedonicus to S. thermophilus phages, based on our CRISPR sequence analysis, is also in favor of this theory. No such evidence was found for the rest of the SBSEC members apart from S. infantarius. These findings are in accordance with the frequent isolation of S. macedonicus from dairy products [13] and the prevalence of S. infantarius in certain African fermented milks [20]. One additional question that arises is whether S. macedonicus and S. infantarius are specialized dairy microbes like S. thermophilus. We believe that the available data does not support this idea. Traits of milk adaptation have been shown to be strain-specific in S. infantarius [20]. In addition, the genome size of S. macedonicus is significantly larger, containing a higher number of functional genes in comparison to S. thermophilus. Streptococcus macedonicus and S. infantarius may thus represent intermediate evolutionary stages analogous to those followed by the ancestors of S. thermophilus before it became today's starter culture.
Thus, the safety concerns raised by the presence of SBSEC members in foods remain, even if reports implicating S. macedonicus in disease are rather scarce [69,70]. Our comparative genomic analysis showed that both S. macedonicus and S. infantarius lack several VFs that are highly conserved in S. gallolyticus. However, the interpretation of these findings is complicated by the fact that the available genome of the human blood isolate S. pasteurianus ATCC 43144 also exhibited diminished traits of pathogenicity, similarly to the two dairy SBSEC members. Overall, our analysis provides evidence in agreement with the clinical perception that the members of the SBSEC are lower grade streptococcal pathogens [10]. In terms of food safety, the dairy SBSEC could thus constitute a risk factor similar to the presence of enterococci, which are widely found in fermented products but cause no major problem for the average healthy adult consumer. Nevertheless, it is the correlation of the SBSEC microorganisms with human endocarditis and colon cancer in particular that may require special consideration. For example, it has been proposed that members of the SBSEC like S. gallolyticus may be part of the etiology of colon cancer by causing chronic inflammation [10]. In order to assess the pathogenicity of this group of streptococci, more research is needed on the specific mechanisms employed by SBSEC members to cause disease. More comparative and functional genomics studies of SBSEC genomes are necessary, covering additional species of the complex, like the recently sequenced Streptococcus lutetiensis [71]. New clinico-epidemiological studies should also be undertaken in view of the most recent changes in the taxonomy of the SBSEC [72]. In the meantime, assuming the worst-case scenario, we propose that the presence of SBSEC members including S. macedonicus and S. infantarius in foods should be avoided until their pathogenicity status is resolved.

Methods
Sequencing and annotation of the genome of Streptococcus macedonicus ACA-DC 198
The genome of S. macedonicus ACA-DC 198 was sequenced and annotated as described previously [19]. In brief, we employed a sequencing strategy involving shotgun/paired-end pyrosequencing and shotgun Illumina sequencing with the 454 GS-FLX (Roche Diagnostics, Basel, Switzerland) and the HiSeq 2000 (Illumina, San Diego, CA), respectively. Sequences were assembled in two contigs corresponding to the complete genome sequence and the pSMA198 plasmid of S. macedonicus. The hybrid assembly was validated against an NheI optical map of the S. macedonicus genome generated at OpGen Technologies, Inc. (Madison, WI). The genome was annotated using the RAST [73] and the BASys [74] pipelines. Predictions of the two pipelines were compiled into a single annotation file after manual curation in the Kodon software environment (Applied Maths N.V., Sint-Martens-Latem, Belgium). Final corrections and quality assessment of the annotation were performed with the GenePRIMP pipeline [21]. GenePRIMP was also used for the identification of putative pseudogenes. The circular map of the S. macedonicus genome was generated by the DNAPlotter software [75].
Comparative genomics of Streptococcus macedonicus ACA-DC 198 against related members of the SBSEC
The complete genome sequence of S. macedonicus was compared to those of S. gallolyticus strains UCN34, ATCC 43143 and ATCC BAA-2069, S. pasteurianus ATCC 43144 and S. infantarius CJ18 using a variety of tools. In order to visualize conserved genomic regions or chromosomal rearrangements, whole genome sequence alignments were performed by progressiveMAUVE [24]. Estimation of the differential gene content of the genomes, as well as whole genome phylogeny of streptococci was carried out within the EDGAR software framework [23]. Venn diagrams were designed with the VennDiagram package in R [76]. The glycobiome of the SBSEC members was determined based on the pre-computed data available in the CAZy database [26].
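The idea behind a differential gene content estimate and its Venn diagram can be illustrated with plain set operations. The following is a minimal sketch only, not the EDGAR or VennDiagram implementation: it assumes that each genome has already been reduced to a set of ortholog-group identifiers, and the function name, genome labels and identifiers (og1, og2, ...) are hypothetical toy values rather than data from this study.

```python
def partition_gene_content(genomes):
    """Return (core, unique) for a dict mapping genome name -> set of
    ortholog-group IDs (hypothetical input format, assumed for illustration).

    core   : groups present in every genome
    unique : dict of genome name -> groups found in that genome only
    """
    # Core genome: intersection of all gene sets.
    core = set.intersection(*genomes.values())
    unique = {}
    for name, groups in genomes.items():
        # Union of the gene sets of all other genomes.
        others = set().union(*(g for n, g in genomes.items() if n != name))
        unique[name] = groups - others
    return core, unique


# Toy input; these identifiers are invented and carry no biological meaning.
toy = {
    "S_macedonicus": {"og1", "og2", "og3", "og7"},
    "S_gallolyticus": {"og1", "og2", "og4", "og5"},
    "S_infantarius": {"og1", "og3", "og6"},
}
core, unique = partition_gene_content(toy)
print(sorted(core))
print({name: sorted(groups) for name, groups in unique.items()})
```

The core set corresponds to the central region of a Venn diagram and each unique set to the non-overlapping part of one genome; the remaining regions follow from intersections of subsets of genomes.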

Additional analysis
Sequence similarity searches were performed with the BLAST suite [77]. Whenever necessary, protein sequences were analyzed in the CDD [78]. Figures showing similarity of gene clusters were constructed with the Easyfig comparison visualizer [79]. Potential VFs included in the VFDB [55] were identified in the SBSEC genomes with mpiBLAST, as implemented in the mGenomeSubtractor website [80]. In brief, the entire VFDB was uploaded as the reference sequence in the mGenomeSubtractor website and each genome was used as the query sequence. Only hits with H-value homology score > 0.6 were considered significant. CRISPRs were analyzed by the tools available in the CRISPRcompar web-service [81]. A general bit score cutoff value of 42.0 was applied during BLASTN of CRISPR spacers. GIs were identified and visualized by the IslandViewer application that utilizes three different prediction tools (i.e. IslandPick, SIGI-HMM and IslandPath-DIMOB) relying on either sequence composition or comparative genomics [36]. Genomic regions of RM systems were determined in the REBASE genomes database [82].
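The two thresholds mentioned above can be sketched as simple filters. This is an illustration under stated assumptions, not the mGenomeSubtractor or CRISPRcompar code: the H-value is assumed here to be the identity of the best-scoring match multiplied by the ratio of matched length to query length, a definition not given in this text, and the record fields (id, identity, match_length, query_length, bit_score) are hypothetical. Whether the bit score cutoff of 42.0 is inclusive is not stated; the sketch assumes that it is.

```python
def h_value(identity, match_length, query_length):
    """Assumed H-value: identity (0-1) x (matched length / query length)."""
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    return identity * (match_length / query_length)


def significant_vf_hits(hits, threshold=0.6):
    """Keep hits whose assumed H-value is strictly above the threshold."""
    return [
        h for h in hits
        if h_value(h["identity"], h["match_length"], h["query_length"]) > threshold
    ]


def spacer_matches(matches, min_bit_score=42.0):
    """Keep BLASTN spacer matches at or above the bit score cutoff
    (inclusive comparison is an assumption)."""
    return [m for m in matches if m["bit_score"] >= min_bit_score]


# Hypothetical toy records, not results from this study.
vf_hits = [
    {"id": "hitA", "identity": 0.9, "match_length": 90, "query_length": 100},
    {"id": "hitB", "identity": 0.7, "match_length": 80, "query_length": 100},
]
spacers = [
    {"id": "spacer1", "bit_score": 46.1},
    {"id": "spacer2", "bit_score": 38.2},
]
kept_vf = [h["id"] for h in significant_vf_hits(vf_hits)]
kept_spacers = [m["id"] for m in spacer_matches(spacers)]
print(kept_vf, kept_spacers)
```

With these toy values, hitA has an assumed H-value of about 0.81 and is retained, while hitB (about 0.56) is discarded; only spacer1 passes the bit score cutoff.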