Skip to main content

Whole genomic comparative analysis of Streptococcus pneumoniae serotype 1 isolates causing invasive and non-invasive infections among children under 5 years in Casablanca, Morocco



Streptococcus pneumoniae serotype 1 remains a leading cause of invasive pneumococcal diseases, even in countries with PCV-10/PCV-13 vaccine implementation. The main objective of this study, which is part of the Pneumococcal African Genome project (PAGe), was to determine the phylogenetic relationships of serotype 1 isolates recovered from children patients in Casablanca (Morocco), compared to these from other African countries; and to investigate the contribution of accessory genes and recombination events to the genetic diversity of this serotype.


The genome average size of the six-pneumococcus serotype 1 from Casablanca was 2,227,119 bp, and the average content of coding sequences was 2113, ranging from 2041 to 2161. Pangenome analysis of the 80 genomes used in this study revealed 1685 core genes and 1805 accessory genes. The phylogenetic tree based on core genes and the hierarchical bayesian clustering analysis revealed five sublineages with a phylogeographic structure by country. The Moroccan strains cluster in two different lineages, the five invasive strains clusters altogether in a divergent clade distantly related to the non-invasive strain, that cluster with all the serotype 1 genomes from Africa.


The whole genome sequencing provides increased resolution analysis of the highly virulent serotype 1 in Casablanca, Morocco. Our results are concordant with previous works, showing that the phylogeography of S. pneumoniae serotype 1 is structured by country, and despite the small size (six isolates) of the Moroccan sample, our analysis shows the genetic cohesion of the Moroccan invasive isolates.


Streptococcus pneumoniae, an encapsulated commensal bacterium of the human nasopharynx, remains a major cause of mortality and morbidity worldwide, that frequently infects children under 2 years old with immature immune system and elderly with a decrease of the immune response, as well as people with underlying diseases [1, 2]. This pathogen is the well-known agent of non-invasive infections such as sinusitis and otitis media, and invasive infections such as meningitis, bacteraemia and bacteraemic pneumonia [1]. According to the World Health Organization (WHO) in 2015, of the estimated 5.83 million globally deaths among < 5 years old children, 294,000 deaths were due to pneumococcal infections [2].

Nowadays, nearly 100 serotypes have been identified based on antibody binding to specific epitopes and on structural differences in its capsular polysaccharides [3]. Some serotypes are known to cause invasive diseases and others are mainly found in nasopharyngeal carriage [4]. Pneumococci expressing serotype 1 are often associated with invasive infections and are rarely found in carriage. This serotype can cause invasive pneumococcal disease (IPD) outbreaks in small or closed communities, and lethal meningitis epidemics [5, 6]. In African countries, serotype 1 remains a leading cause of IPDs, even in countries with pneumococcal conjugated vaccines (PCV), PCV-10/PCV-13 [5].

Pneumococcal vaccination constitutes the best option for preventing IPD, and in several countries, PCVs have been recommended in childhood immunization programs for more than a decade. They have significantly reduced the global IPD burden in children as reported in the review of Izurieta [7]. In Morocco, the national immunization program introduced PCV in October 2010 firstly by PCV-13 in 2 + 1 schedule at the age of 2, 4 and 12 months, then switched to PCV-10 in July 2012 in the same schedule. In Casablanca, the study conducted by Diawara and his colleagues on IPD among ≤5 years children showed the persistence of serotype 1 after introduction of PCVs [8].

Multi Locus Sequence Typing (MLST) is a current method used to characterize pneumococcal populations. Compared to this technique, whole genome sequencing (WGS) has a highest discriminatory power for bacterial genomes analysis. Indeed, WGS has increased power to establish the evolutionary relationships between close strains across the species [9]. Multiple studies have used the high-throughput WGS to investigate PCV impact and clustering pneumococcal populations into groups using genomic variation that reflect a recent evolutionary history [10,11,12,13]. The objectives of this study, as part of the Pneumococcal African Genome project (PAGe), were to determine the phylogenetic relationships of serotype 1 isolates recovered from children patients in Casablanca, compared to isolates from other African countries (Egypt, Ethiopia, Ghana, Malawi, Mozambique, Niger, Nigeria, South Africa and The Gambia), and to investigate the contribution of accessory genes and recombination events to the genetic diversity of this serotype.


Characteristics of pneumococcal serotype 1 isolates from Casablanca, Morocco

We analyzed six strains of S. pneumoniae serotype 1 isolated among ≤5 years children (Table 1). Five isolates were from invasive infections while the remaining strain was isolated from the protected distal bronchial sample (PDBS) in the pneumonia context. Of these invasive strains, four were isolated from blood and the other from the cerebrospinal fluid (CSF). All isolates were susceptible to all antibiotics tested. The minimum inhibitory concentration (MIC) to penicillin G and ceftriaxone varied between [0.016–0.032] μg/ml for both beta lactams antibiotics.

Table 1 Sources and genome organization of S. pneumoniae serotype 1 isolates from Casablanca

General genome features

The six strains of S. pneumoniae serotype 1 were sequenced using the Illumina HiSeq 2500 system (Table 1). The average length of the six pneumococcus serotype 1 genomes analyzed was 2,227,119 bp. The minimum genome length was 2,177,478 bp corresponding to the strain isolated from CSF, while the maximum genome size was of 2,319,720 bp for the strain isolated from PDBS. The CDS content ranged from 2041 to 2161 with an average of 2113. The minimum and maximum CDS content were of the strain from CSF and PDBS respectively.

Phylogenomic analysis and population structure

The maximum likelihood reconstruction of the phylogenetic relationships among the African isolates of S. pneumoniae serotype 1 based on the core genome, and the hierarchical bayesian clustering analysis revealed 2 lineages and five sublineages. The five sublineages population structure clusters the isolates by country and geographic region (Fig. 1a). With regard to the Moroccan strains, the phylogenetic analysis showed a basal divergent clade with the five invasive strains that clusters in two sublineages, and is distantly related to the non-invasive strain (ni_Morocco_3) that is highly related with the other African genomes (Fig. 1a). All the five invasive moroccan strains were of ST306 while the remaining non-invasive strain was of ST2084. The identified ST306 matched the lineage 2 exclusively associated with invasive strains.

Fig. 1

a Phylogeny of serotype 1 S. pneumoniae isolates. The tips of the tree are colored by country, and beside there are two bars that are the result of the clustering analysis with hierBaps. One bar, grey and black, represent the Lineages; and the other, with five colors, represent sublineages. The heatmap are the accessory genes, with no singletons. b The boxplots shows the accessory genomes size, with singletons. c shows a DAPC analysis

Pangenome analysis

To explore the accessory genes composition of S. pneumoniae serotype 1, we reconstructed the pangenome of the 80 genomes. The analysis revealed a total of 3490 genes, from which 1685 were core genes, and 1805 accessory genes from which 1311 and 494 represented shell and cloud genes respectively (Table 2). Figure 1a shows the differences in the accessory genes distribution across sublineages, and Fig. 1b explicit that the accessory genomes sizes in invasive sublieanges are substantially smaller. Finally, we can see in Fig. 1c based on the discriminant analysis of principal components (DAPC) using the accessory genes that sublineage 1 is highly different from the remaining sublineages. Sublineages 2 and 3, as sublineages 4 and 5, are overlapped in the accessory genes composition (Fig. 1c). Despite the high divergence observed among the strains in the accessory genes composition, they are still being the same species. All ANI (Average Nucleotide Identity) values are all > 95%, which value is indicative of the same species of pneumococcus (Fig. 2).

Table 2 Pangenome analysis of 80 genomes of S. pneumoniae serotype 1 strains
Fig. 2

ANI values obtained comparing the S. pneumoniae genomes. The colors are by country following Fig. 1. Morocco within = all Moroccan genomes, Morocco within* = all Moroccan genomes without ni_Morocco_3, Morocco among = all Moroccan genomes against all other genomes, Morocco among* = all Moroccan genomes without ni_Morocco_3 against all other genomes, Morocco among* = ni_Morocco_3 against all other genomes from other countries

Recombination detection

The recombination events on the core genome alignment of the 80 isolates were examined to investigate their contributions to the phylogenetic diversity of serotype 1. Our results shows that all African S. pneumoniae serotype 1 lineages underwent multiple recombinations across its evolutionary history (Fig. 3a). The two sublineages (i.e. sL.4 and sL.5) identified in the invasive Moroccan isolates underwent many recombination events among it, but also with isolates of sL.2 where the non-invasive Moroccan isolate clusters. We also correlated the Jaccard distance, as a measure of the dynamic and similarty of the accessory genome, with the nucleotide diversity based on the synonymous polymorphisms (neutral evolution). As shown in Fig. 3b for whole dataset, R2 = ~ 0.5 and is significative, at hence, there is no signal of adaptive evolution. But looking at sublineages level (Fig. 3c), sL.1, sL.2 and sL.3 did not show correlation among Jaccard distance and the core genome synonymous diversity (πsynonymous) indicating adaptive evolution of accessory genome. The results are not conclusive for the Moroccan isolates due to the few isolates, sL.4 (three isolates) and sL.5 (two isolates).

Fig. 3

Recombination and accessory genome analysis of S. pneumoniae serotype 1 isolates. a Recombination using Gubbins: The tree was built from non-recombinant regions in the core genome alignment. The colored strip highlights the BAPS clusters. Red blocks indicate recombinant regions along the core genome. Blue blocks indicate recombination event that have occurred in a single isolate. b Linear regression analysis of genomic fluidity (accessory genome) against synonym diversity (core genome). Plotting the values are in ln-ln. c Same as b but done by sublineage; Sublineage 1:R2 = 0.01047, p-value = 0.01066, Sublineage 2: R2 = 0.1448, p-value = 7.31e− 7, Sublineage 3: R2 = 0.2698, p-value = 2.2 e− 17. Sublineage 4 and 5 were not plotted because they have a limited value due to the small number of isolates


Recent advances in genome sequencing technologies and the emergence of WGS use in genome comparative studies have provided powerful tools to study the genomic diversity of bacterial pathogens by using genome sequence data. In this study, we made the genomic comparison of six clinical S. pneumoniae serotype 1 isolates from Morocco to those from other African countries by using WGS. Serotype 1 still the main cause of IPD in developing countries. In African countries, serotype 1 is one of the top three of most common vaccine serotypes causing IPD among children [14, 15].

All Moroccan isolates were susceptible to all antibiotics tested. Serotype 1 is rarely linked with antibiotic resistance. This could be associated with the short duration of this serotype in asymptomatic colonization of the human nasopharynx, where exchange of genetic elements through recombination with other Streptococci took place [16,17,18].

The average length of the genome of the serotype 1 strains from Morocco was 2,227,119 bp. This genome size is greater than the size of the reference strain of serotype 1, S. pneumoniae P1041 (accession number: FQ312030) which is 2,142,122 bp. In addition to that, the average genes content found in Moroccan strains was 2113 in the range of 2041 to 2161 CDS. Compared to CDS content of S. pneumoniae P1041 (1824 CDS), the gene content of Moroccan isolates was higher. The difference of Moroccan strains in genome length and CDS content could be explained by the frequent occurrence of horizontal gene transfer events among the genus Streptococcus [19].

In our study, 1685 CDS were highlighted as core genome in all 80 isolates analyzed which corresponds on 79.7% of the total genome translation. The quantification of the bacterial core genome may vary among different isolates collections. Moreover, it is highly dependent on the core genome definition as well as the parameters fixed for the pangenome analysis [20]. In many cases, the core genome is made of genes involved in essential functions such as protein synthesis, DNA related metabolism and cellular processes [20]. The accessory genome of the serotype 1 isolates studied was highly variable. The pangenome analysis by Roary revealed a total of 1805 accessory genes found in 0% < = strains < 99% of the 80 strains analyzed. Those genes are mainly associated with evasion of the host immune system and virulence factors [21]. Nevertheless, our findings showed that the accessory genomes in invasive sublineages were substantially smaller. This result could be associated with low recombination rate in invasive strains as the human nasopharynx is the main reservoir of genetic exchanges driven by recombination, horizontal gene transfer and events of gene loss or addition occur. The IPD associated genes have been reported shorter than those in the soft-accessory genome [22].

Phylogenetic tree and the hierarchical bayesian clustering analysis revealed 2 lineages and five sublineages with a geographic structure by country. In the large study conducted on the whole genome phylogeny of serotype 1 pneumococcal isolates [23], the authors showed a high level of geographical structure especially among African isolates, with multiple inter country transfers between bordering countries, in particular Mozambique, Malawi and South Africa. Many phylogenetic studies worldwide grouped different pneumococcal collections in many clusters [18, 24, 25]. The study on phylogenetic analysis in South Africa by du Plessis et al.(2016) [26], the global invasive serotype 1 population was grouped in 9 clusters and showed a strong phylogeographic structure. According to our phylogenetic tree, the invasive strains from Morocco formed one clade, suggesting that there are genetically cohesive. Moreover, some genes were absent only in that clade. Since pneumococcal strains have substantial genomic variation, the locations of genes within their respective genomes are not constant [27].

The sublineage 1 is exclusively formed by strains from Malawi and Mozambique. This sublineage is highly different from the remaining sublineages based on the DAPC using the accessory genes. As reported elsewhere, the Malawi and Mozambique serotype 1 pneumococci formed a single and genetically stable clade over the sampling period without detectable impact from imported clones [23]. As published by Chaguza et al. [28], the identified sequence Clusters (SCs) in African pneumococcal serotype 1 population matched the phylogenetic clades from the phylogeny and were predominantly associated with geographical origin of the isolates. With regards to Moroccan isolates, the identified ST306 matched the lineage 2 exclusively associated with invasive strains. The ST306 is a worldwide distributed pneumococcal serotype 1, suspected to be an important pathogen behind the increase of the serotype 1 IPD expansion and is responsible for more than 80% of cases of serotype 1diseases [29, 30].

As expected, the strains analyzed underwent multiple recombinations. S. pneumoniae is naturally transformable and recombination events play a major role in its molecular evolution [31, 32]. Some studies reported low presence of recombination in serotype 1, and that the rates with which this phenomenon occurs vary greatly among pneumococcal serotypes and lineages [33, 34]. Compared to invasive strains, the strain isolated from PDBS were associated with the largest genome and the highest number of CDS as well as the accessory genes. This finding could suggest that the strain was probably carried for long periods enough to result in extensive genetic exchanges with other closely related species as Streptococcus mitis as reported in the large study of Lessa et al. [17]. In that study, wzy1 operons of S. mitis clones were found highly similar to wzy1 of the serotype 1 S. pneumoniae reference strain.


This study constitutes the first pneumococcal genome analysis by whole genome sequencing in Morocco, providing increased resolution on the analysis of the highly virulent serotype 1. Our results confirmed the phylogeographic structure by country of S. pneumoniae serotype 1 despite the small size of the Moroccan sample, and showed it basal position on the phylogenetic tree of African isolates. The Moroccan isolates are structured in three sublineages, with the non-invasive strain which clusters in sL.2 being highly divergent from the invasive strains (sL.4 and sL.5).


Bacteria isolates

The microbiology laboratory of Ibn Rochd University Hospital Centre (IR-UHC) of Casablanca carries out surveillance of invasive and non-invasive pneumococcal infections in children ≤5 years [8]. All pneumococcal strains were isolated and identified according to the standard bacteriology procedures. Serogrouping was done by the checkerboard method with Pneumotest-latex (Statens Serum Institute antisera, Copenhague, Denmark). Serotyping was performed by Quellung capsule swelling using Statens Serum Institute antisera (Statens Serum Institute antisera, Copenhague, Denmark).

Antibiotic susceptibility tests were performed on Mueller-Hinton agar additioned with 5% of sheep blood (BioMérieux, Marcy-l’Etoile, France) and interpreted according to the Clinical Laboratory Standard Institute (CLSI, 2012) recommendations [35]. Oxacillin (1 μg) was used for screening of penicillin non-susceptible S. pneumoniae. Erythromycin, chloramphenicol, clindamycin, vancomycin, cotrimoxazole, rifampicin, tetracycline and levofloxacin were tested by disc diffusion method. The MIC of penicillin G and ceftriaxone were determined by E-test method with E-Tests from Oxoid (Oxoid, Basingstoke, UK) on Mueller-Hinton agar additioned with 5% of sheep blood (BioMérieux, Marcy-l’Etoile, France). The breakpoints recommended by CLSI in 2012 were used for interpretation: ≤ 0.06 μg/ml and ≥ 2 μg/ml for penicillin, ≤ 0.5 μg/ml and ≥ 2 μg/ml, for ceftriaxone for meningeal isolates and ≤ 1 μg/ml and ≥ 4 μg/ml for non-meningeal isolates. Quality control was conducted using S. pneumoniae ATCC 49619.

From 2007 to 2014, 9 strains (invasive and non-invasive) of serotype 1 were isolated in children under 5 years old. Three of them were lost. Six (6) isolates (5 invasive and 1 non-invasive) of S. pneumoniae serotype 1 causing infections among children under 5 years, were randomly selected from the data bank of the microbiology laboratory of IR-UHC of Casablanca, to perform the WGS analysis. All isolated strains were stored in brain heart infusion broth with 15% of glycerol at − 80 °C.

Bacterial DNA preparation and whole genome sequencing

The genomic DNA of the six strains selected for this surveillance was purified with the QIAamp DNA Mini Kit (Hilden, Germany) following the manufacturer’s recommendations. DNA quality and quantity were estimated by measuring the absorbance of the sample using NanoVue™ Plus Spectrophotometer (GE Healthcare UK Limited, UK) at wavelengths 260 nm and 280 nm following the manufacturer’s instructions. Extracted DNA were stored at − 20 °C. The DNAs of the six strains were whole-genome-sequenced using an Illumina HiSeq 2500 platform at the Wellcome Trust Sanger Institute, as part of the PAGe project. Libraries were constructed using the Nextera XT DNA Library Preparation Kit and sequenced with the HiSeq Reagent Kit (pair-end reads of 150 bp).

Genome assembly and annotation

The quality of the generated reads from high throughput NGS was assessed using FastQC v0.11.8 [15]. After removal adaptor sequences, reads of each isolate were de novo assembled using SPAdes v3.11.1 [36] with a k-mer size automatically determined by the package. The obtained draft assemblies were annotated using the Prokka (Prokaryotic annotation) software, which predicts genes, based on available annotation informations such as proteins and coding sequences (CDS) [37]. Average Nucleotide Identity (ANI), a whole-genome similarity metric was used to investigate the relatedness among isolates genomes.

Recombination, phylogenetic and population structure analysis

In the study of by Chaguza et al. [28], the phylogenic analysis of the global population structure of serotype 1 in Africa showed that all isolates were grouped in five distinct clades. From those clades, we selected a balanced genomic data of 74 public genomes of serotype 1 from nine African countries (Egypt, Ethiopia, Ghana, Malawi, Mozambique, Niger, Nigeria, South Africa and Gambia) previously published [28]. Data were extracted in the European Nucleotide Archive (ENA) database (Additional file 1). Recombination was analyzed among the strains from Morocco, and 74 public genomes of serotype 1 using Gubbins algorithm [38] over the core genome alignment generated by progressiveMauve [39], a software package that attempts to align orthologous and xenologous regions among genome sequences. First, we removed inconsistent alignment columns with trimAl [40] in all concatenated locally collinear blocks, and then Gubbins was run over the core genome alignment. For the inference of the phylogenetic relationships among the 80 isolates, Maximum Likelihood (ML) phylogenetic analyses were performed by using RAxML v8.2.12 [41] based on core genome obtained with progressiveMauve (recombinations were filtered out with Gubbins), with 1000 bootstrap iterations. The clustering analysis was done with hierBaps (Bayesian clustering tool for population genetics). Sequence types (STs) of moroccan S. pneumoniae isolates were determined by the sequences of seven housekeeping genes (aroE, gdh, gki, recP, spi, xpt, and ddl) obtained from the results of WGS. Allelic numbers and STs were assigned by using the pneumococcal Multilocus Sequence Typing (MLST) website (

Pangenome reconstruction

To accurately reconstruct the pangenome of the whole dataset, all 80 assembled and annotated genomes (6 genomes from Morocco and 74 public genomes) were analyzed by Roary v3.11.2 [42]. DAPC [43] was used to investigate the accessory genes distribution among sublineages in order to explore its differences.

Finally, we did a linear regression of the Jaccard distance based on the accessory genes and the nucleotide diversity of synonym sites of core genes for each pair of genomes to provide insights of accessory genome adataptive evolution. The analysis was done using the R package pagoo (, computing the Jaccard distance between each pair of organisms by the vegan::vegdist function [44] and the pairwise nucleotide diversity by the pegas::nuc.div function [45].

Availability of data and materials

Sequence information of the 6 Moroccan pneumococcal serotype 1 genomes are available in the European Nucleotide Archive (ENA) database under the accession number given in Additional file 1. The 74 public genomes of S. pneumoniae serotype 1 used for the phylogeny reconstruction in this study were all downloaded from the ENA database; their accession numbers are listed in Additional file 1.



Coding DNA sequence


Bayesian clustering tool for population genetics


Discriminant analysis of principal components


Average Nucleotide Identity


European Nucleotide Archive


Clinical Laboratory Standard Institute


World Health Organization


Pneumococcal conjugated vaccines


Invasive pneumococcal diseases


Whole genome sequencing


Protected distal bronchial sample




Pulse Field Gel Electrophoresis


Multi Locus Sequence Typing


Sequence Type


Minimum inhibitory concentration


Single nucleotide polymorphisms


  1. 1.

    Feldman C, Anderson R. Recent advances in our understanding of Streptococcus pneumoniae infection. F1000Prime Reports. 2014;6. doi:

  2. 2.

    WHO | Weekly Epidemiological Record, 22 February 2019, vol. 94, 08 (pp. 85–104). Accessed 26 Feb 2019.

  3. 3.

    Geno KA, Gilbert GL, Song JY, Skovsted IC, Klugman KP, Jones C, et al. Pneumococcal capsules and their types: past, present, and future. Clin Microbiol Rev. 2015;28:871–99.

    CAS  Article  Google Scholar 

  4. 4.

    Desai AP, Sharma D, Crispell EK, Baughman W, Thomas S, Tunali A, et al. Decline in pneumococcal nasopharyngeal carriage of vaccine serotypes after the introduction of the 13-Valent pneumococcal conjugate vaccine in children in Atlanta, Georgia. Pediatric Infectious Disease J. 2015;34:1168–74.

    Article  Google Scholar 

  5. 5.

    Kwambana-Adams BA, Asiedu-Bekoe F, Sarkodie B, Afreh OK, Kuma GK, Owusu-Okyere G, et al. An outbreak of pneumococcal meningitis among older children (≥5 years) and adults after the implementation of an infant vaccination programme with the 13-valent pneumococcal conjugate vaccine in Ghana. BMC Infect Dis. 2016;16.

  6. 6.

    Dagan R, Gradstein S, Belmaker I, Porat N, Siton Y, Weber G, et al. An outbreak of Streptococcus pneumoniae serotype 1 in a closed Community in Southern Israel. Clin Infect Dis. 2000;30:319–21.

    CAS  Article  Google Scholar 

  7. 7.

    Izurieta P, Bahety P, Adegbola R, Clarke C, Hoet B. Public health impact of pneumococcal conjugate vaccine infant immunization programs: assessment of invasive pneumococcal disease burden and serotype distribution. Expert Review Vaccines. 2018;17:479–93.

    CAS  Article  Google Scholar 

  8. 8.

    Diawara I, Zerouali K, Katfy K, Zaki B, Belabbes H, Najib J, et al. Invasive pneumococcal disease among children younger than 5 years of age before and after introduction of pneumococcal conjugate vaccine in Casablanca, Morocco. Int J Infectious Dis. 2015;40:95–101.

    Article  Google Scholar 

  9. 9.

    Pan F, Zhang H, Dong X, Ye W, He P, Zhang S, et al. Comparative genomic analysis of multidrug-resistant Streptococcus pneumoniae isolates. Infection Drug Resistance. 2018;11:659–70.

    CAS  Article  Google Scholar 

  10. 10.

    Jindal HM, Ramanathan B, Le CF, Gudimella R, Razali R, Manikam R, et al. Comparative genomic analysis of ten clinical Streptococcus pneumoniae collected from a Malaysian hospital reveal 31 new unique drug-resistant SNPs using whole genome sequencing. J Biomed Sci. 2018;25.

  11. 11.

    Chaguza C, Cornick JE, Andam CP, Gladstone RA, Alaerts M, Musicha P, et al. Population genetic structure, antibiotic resistance, capsule switching and evolution of invasive pneumococci before conjugate vaccination in Malawi. Vaccine. 2017;35:4594–602.

    CAS  Article  Google Scholar 

  12. 12.

    Chang B, Morita M, Lee K, Ohnishi M. Whole-genome sequence analysis of Streptococcus pneumoniae strains that cause hospital-acquired pneumonia infections. J Clin Microbiol. 2018;56.

  13. 13.

    Demczuk WHB, Martin I, Hoang L, Van Caeseele P, Lefebvre B, Horsman G, et al. Phylogenetic analysis of emergent Streptococcus pneumoniae serotype 22F causing invasive pneumococcal disease using whole genome sequencing. PLoS One. 2017;12:e0178040.

    Article  Google Scholar 

  14. 14.

    Johnson HL, Deloria-Knoll M, Levine OS, Stoszek SK, Freimanis Hance L, Reithinger R, et al. Systematic evaluation of serotypes causing invasive pneumococcal disease among children under five: the pneumococcal global serotype project. PLoS Med. 2010;7:e1000348.

    Article  Google Scholar 

  15. 15.

    Hammitt LL, Etyang AO, Morpeth SC, Ojal J, Mutuku A, Mturi N, et al. Effect of ten-valent pneumococcal conjugate vaccine on invasive pneumococcal disease and nasopharyngeal carriage in Kenya: a longitudinal surveillance study. Lancet. 2019.

  16. 16.

    Williams TM, Loman NJ, Ebruke C, Musher DM, Adegbola RA, Pallen MJ, et al. Genome analysis of a highly virulent serotype 1 strain of Streptococcus pneumoniae from West Africa. PLoS One. 2012;7:e26742.

    CAS  Article  Google Scholar 

  17. 17.

    Lessa FC, Milucky J, Rouphael NG, Bennett NM, Talbot HK, Harrison LH, et al. Streptococcus mitis expressing pneumococcal serotype 1 capsule. Sci Rep. 2018;8.

  18. 18.

    Chaguza C, Andam CP, Harris SR, Cornick JE, Yang M, Bricio-Moreno L, et al. Recombination in Streptococcus pneumoniae Lineages Increase with Carriage Duration and Size of the Polysaccharide Capsule. mBio. 2016;7. doi:

  19. 19.

    Thompson CC, Emmel VE, Fonseca EL, Marin MA, Vicente ACP. Streptococcal taxonomy based on genome sequence analyses. F1000Research. 2013;2:67.

    Article  Google Scholar 

  20. 20.

    Donkor ES, Stabler RA, Hinds J, Adegbola RA, Antonio M, Wren BW. Comparative phylogenomics of Streptococcus pneumoniae isolated from invasive disease and nasopharyngeal carriage from west Africans. BMC Genomics. 2012;13:569.

    Article  Google Scholar 

  21. 21.

    Hiller NL, Sá-Leão R. Puzzling over the pneumococcal Pangenome. Front Microbiol. 2018;9:2580.

    Article  Google Scholar 

  22. 22.

    Obolski U, Gori A, Lourenço J, Thompson C, Thompson R, French N, et al. Identifying Streptococcus pneumoniae genes associated with invasive disease using pangenome-based whole genome sequence typing. preprint. Bioinformatics; 2018. doi:

  23. 23.

    Cornick JE, Chaguza C, Harris SR, Yalcin F, Senghore M, Kiran AM, et al. Region-specific diversification of the highly virulent serotype 1 Streptococcus pneumoniae. Microbial Genomics. 2015;1.

  24. 24.

    Donati C, Hiller NL, Tettelin H, Muzzi A, Croucher NJ, Angiuoli SV, et al. Structure and dynamics of the pan-genome of Streptococcus pneumoniae and closely related species. Genome Biol. 2010;11:R107.

    CAS  Article  Google Scholar 

  25. 25.

    Golden AR, Adam HJ, Karlowsky JA, Baxter M, Nichol KA, Martin I, et al. Molecular characterization of predominant Streptococcus pneumoniae serotypes causing invasive infections in Canada: the SAVE study, 2011–15. J Antimicrobial Chemotherapy. 2018;73(suppl_7):vii20–31.

    CAS  Article  Google Scholar 

  26. 26.

    du Plessis M, Allam M, Tempia S, Wolter N, de Gouveia L, von Mollendorf C, et al. Phylogenetic analysis of invasive serotype 1 pneumococcus in South Africa, 1989 to 2013. J Clin Microbiol. 2016;54:1326–34.

    Article  Google Scholar 

  27. 27.

    Blomberg C, Dagerhamn J, Dahlberg S, Browall S, Fernebro J, Albiger B, et al. Pattern of accessory regions and invasive disease potential in Streptococcus pneumoniae. J Infect Dis. 2009;199:1032–42.

    CAS  Article  Google Scholar 

  28. 28.

    for the PAGe Consortium, Chaguza C, Cornick JE, Harris SR, Andam CP, Bricio-Moreno L, et al. Understanding pneumococcal serotype 1 biology through population genomic analysis. BMC Infect Dis. 2016;16:649.

    Article  Google Scholar 

  29. 29.

    Harvey RM, Hughes CE, Paton AW, Trappetti C, Tweten RK, Paton JC. The impact of Pneumolysin on the macrophage response to Streptococcus pneumoniae is strain-dependent. PLoS One. 2014;9:e103625.

    Article  Google Scholar 

  30. 30.

    Hanachi M, Kiran A, Cornick J, Harigua-Souiai E, Everett D, Benkahla A, et al. Genomic characteristics of invasive Streptococcus pneumoniae serotype 1 in New Caledonia prior to the introduction of PCV13. Bioinform Biol Insights. 2020;14:117793222096210.

    Article  Google Scholar 

  31. 31.

    Straume D, Stamsås GA, Håvarstein LS. Natural transformation and genome evolution in Streptococcus pneumoniae. Infect Genet Evol. 2015;33:371–80.

    CAS  Article  Google Scholar 

  32. 32.

    Chaguza C, Cornick JE, Everett DB. Mechanisms and impact of genetic recombination in the evolution of Streptococcus pneumoniae. Computational Structural Biotechnol J. 2015;13:241–7.

    CAS  Article  Google Scholar 

  33. 33.

    Chewapreecha C, Harris SR, Croucher NJ, Turner C, Marttinen P, Cheng L, et al. Dense genomic sampling identifies highways of pneumococcal recombination. Nat Genet. 2014;46:305–9.

    CAS  Article  Google Scholar 

  34. 34.

    Croucher NJ, Finkelstein JA, Pelton SI, Mitchell PK, Lee GM, Parkhill J, et al. Population genomics of post-vaccine changes in pneumococcal epidemiology. Nat Genet. 2013;45:656–63.

    CAS  Article  Google Scholar 

  35. 35.

    M100-S21: Performance Standards for Antimicrobial Susceptibility Testing; Twenty-First Informational Supplement:172.

  36. 36.

    SPAdes 3.11.1 Manual. Accessed 12 Aug 2019.

  37. 37.

    Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics. 2014;30:2068–9.

    CAS  Article  Google Scholar 

  38. 38.

    Croucher NJ, Page AJ, Connor TR, Delaney AJ, Keane JA, Bentley SD, et al. Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins. Nucleic Acids Res. 2015;43:e15.

    Article  Google Scholar 

  39. 39.

    Darling AE, Mau B, Perna NT. progressiveMauve: Multiple Genome Alignment with Gene Gain, Loss and Rearrangement. PLoS ONE. 2010;5:e11147.

    Article  Google Scholar 

  40. 40.

    Capella-Gutierrez S, Silla-Martinez JM, Gabaldon T trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses Bioinformatics 2009;25:1972–1973.

  41. 41.

    Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30:1312–3.

    CAS  Article  Google Scholar 

  42. 42.

    Page AJ, Cummins CA, Hunt M, Wong VK, Reuter S, Holden MTG, et al. Roary: rapid large-scale prokaryote pan genome analysis. Bioinformatics. 2015;31:3691–3.

    CAS  Article  Google Scholar 

  43. 43.

    Jombart T, Devillard S, Balloux F. Discriminant analysis of principal components: a new method for the analysis of genetically structured populations. BMC Genet. 2010;11:94.

    Article  Google Scholar 

  44. 44.

    vegdist: Dissimilarity Indices for Community Ecologists in vegan: Community Ecology Package. Accessed 19 July 2020.

  45. 45.

    Paradis E. Pegas: an R package for population genetics with an integrated-modular approach. Bioinformatics. 2010;26:419–20.

    CAS  Article  Google Scholar 

Download references


We would like to thank the sequencing teams at the Wellcome Trust Sanger Institute for their expertise. We are also grateful to the Pneumococcal African Genome project (PAGe) core for the support.


This study is a part of the Pneumococcal African Genome project (PAGe) project. PAGe is a Bill and Melinda Gates Foundation funded project (OPP1023440).

Author information




Study Design: NE, KZ, ID, MT, HB. Data Collection and Laboratory analysis: ID, NN, KK, FM, KN. Analyses: NN, ID, PF, MM, JC, CC, ID, FM, KN. Manuscript Writing: NN and ID wrote the manuscript and all authors contributed to revisions. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Néhémie Nzoyikorera.

Ethics declarations

Ethics approval and consent to participate

This study was approved by the Ethical committee for biomedical research of the University Mohammed V-Soussi, Faculty of Medicine, Pharmacy and Dental Medicine of Rabat, Morocco (No. 408, 10 May 2013). The patients/legal guardians were informed about the study; they signed a consent form, and the study was carried out in an anonymous way.

Consent for publication

Not Applicable.

Competing interests

Not applicable.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Nzoyikorera, N., Diawara, I., Fresia, P. et al. Whole genomic comparative analysis of Streptococcus pneumoniae serotype 1 isolates causing invasive and non-invasive infections among children under 5 years in Casablanca, Morocco. BMC Genomics 22, 39 (2021).

Download citation


  • Streptococcus pneumoniae
  • Serotype 1
  • Whole genome sequencing
  • Comparative genomic analysis