Fermented dairy products are important in Africa as a source of nutrients and as weaning food. Fermentation is an essential preservation method in the absence of refrigeration [67–69]. Analyses of the dairy adaptations and potential virulence factors of bacteria driving spontaneous fermentation processes are therefore important to assess potential health risks for consumers and to identify novel fermentative lactic acid bacteria strains.
In this study, we report the complete genome sequence of the African dairy isolate Sii CJ18, the first completely assembled genome of an S. infantarius strain. Whole-genome comparison of Sii CJ18 with Sii ATCC BAA-102T and related streptococci revealed substantial adaptations to the dairy environment in CJ18, paralleling those of S. thermophilus. However, our data indicate that genome decay in Sii CJ18 is less advanced than in S. thermophilus: most biosynthesis pathways appear to be intact, and the proportion of pseudogenes (4.9%) is smaller than in S. thermophilus (10–19%). This suggests that CJ18 became established in the dairy environment more recently than S. thermophilus strains or S. gallolyticus subsp. macedonicus ACA-DC 198. Based on genome decay, the most recent common ancestor of S. thermophilus strains was estimated to have lived 3,000–30,000 years ago, which roughly corresponds to the duration of human dairy activity [26, 70]. Camels, however, were introduced into East Africa only around 2,500 years ago [71–73], and the less advanced genome decay in CJ18 may be related to the later start of camel milk fermentation in Africa.
Adaptation to the dairy environment in S. thermophilus consists of enhanced uptake of lactose and peptides and the loss of other metabolic pathways. CJ18 displays a similar adaptation in lactose metabolism through the transporter LacS and the β-galactosidase LacZ. Truncation of either LacS or LacZ resulted in significantly impaired growth on lactose, confirming the functionality of this acquired lactose utilization pathway. Neither the second LacS (Sinf_1514), present in both CJ18 and ATCC BAA-102T, nor the lactose PTS could take over lactose transport in the LacS knockout strain. The integration of transposases into the corresponding lactose PTS gene cluster therefore seems to result from its loss of essentiality after the acquisition of lacS and lacZ. Moreover, concurrent activity of both transporters could lead to an imbalance in the redox or phosphorylation status of the cell, and hence positive selection for truncation of the lactose PTS gene cluster might even have occurred after the acquisition of LacSZ. The release of galactose into the growth medium shows that LacS in CJ18 functions as a highly efficient antiporter, and the competitiveness of CJ18 in the dairy environment therefore seems to be based on the acquired LacSZ system. This system enables efficient lactose transport and, as a consequence, increased lactose consumption and lactate production compared to ATCC BAA-102T (isogenic with CCUG 43820T).
The role of other adaptations to the dairy environment, such as the presence of a second oppABCDF operon and an extended EPS biosynthesis cluster, is less clear. Enhanced uptake of casein-derived peptides by the second peptide transporter could contribute to increased competitiveness in milk. The enlarged cluster of Eps/Cps-related proteins could contribute to survival during the suusac back-slopping process through improved biofilm formation. Furthermore, EPS contributes to the texture of fermented dairy products, and strains may have been selected for these textural properties in the past [18, 26].
The more recent adaptation of CJ18 to the dairy environment is reflected by its lower number of pseudogenes and CRISPR spacers compared to S. thermophilus or S. gallolyticus subsp. macedonicus ACA-DC 198. CJ18 harbors nine CRISPR spacers, whereas widespread dairy starter strains of S. thermophilus such as CNRZ 1066 and LMG 18311 harbor 42 and 39 spacers, respectively [26, 65]. Phage infection and phage-related fermentation losses are major problems in dairy technology. The number of CRISPR spacers in a bacterial genome is directly linked to the phage contact history of that particular strain and to its presumptive resistance against phages. The African strain CJ18 was apparently not continuously exposed to phage infections over prolonged periods. This could result from the spontaneous nature of the traditional fermentation, which, in contrast to industrial starter culture fermentations, does not rely on selected starter strains. The absence of shared CRISPR spacers between CJ18 and ATCC BAA-102T further shows that the African CJ18 is only a distant relative of ATCC BAA-102T, as previously observed for CRISPR spacer microevolution in other genera. Additionally, the presence in CJ18 of 103 CDSs shared with other streptococci but not with ATCC BAA-102T, together with the absence from CJ18 of 310 CDSs present in ATCC BAA-102T, indicates an ancestral streptococcal origin of these CDSs and again only a distant relationship between the two Sii strains.
Another interesting feature of CJ18 is its natural competence and DNA uptake capability, paralleling that of other streptococci and lactic acid bacteria (LAB) [27, 76]. Possibly as a result, the genome displays traces of HGT events from commensal bacteria encountered in milk, such as Lactococcus spp. and S. thermophilus, but also from pathogens like S. agalactiae. Natural competence could also contribute to the uptake of mobile genetic elements and to the spread of antibiotic resistance genes. The apparently intact competence machinery is therefore probably of high importance for the persistence of the strain in the African dairy environment.
CJ18 harbors none of the typical streptococcal virulence factors of concern and fewer SBSEC-related virulence factors than, e.g., S. gallolyticus and S. bovis. Moreover, most of these potential virulence factors are related to adhesion rather than directly to infection, cytotoxicity or toxin production, and are therefore of less concern. Many factors found in CJ18 are also present in S. gallolyticus subsp. macedonicus ACA-DC 198, a strain proclaimed to be safe although the species lacks QPS approval [25, 77]. Some potential virulence factors, or remnants thereof, were even found in S. thermophilus. Consequently, based on genomic information alone, the ingestion and digestion of large amounts of Sii via suusac does not appear to pose a direct health risk to adults. However, the SBSEC-associated health risks for immunocompromised people, a major concern in Africa, and for children are less well understood, because epidemiological data on these diseases are not available. Furthermore, the uncertain association of Sii with human diseases calls for further investigation into whether Sii-specific virulence factors exist.