Genomic analysis of nontypeable pneumococci causing invasive pneumococcal disease in South Africa, 2003–2013

Background The capsular polysaccharide is the principal virulence factor of Streptococcus pneumoniae and a target for current pneumococcal vaccines. However, some pathogenic pneumococci are serologically nontypeable [nontypeable pneumococci (NTPn)]. Due to their relative rarity, NTPn are poorly characterized, and, as such, limited data exist which describe these organisms. We aimed to describe disease and genotypically characterize NTPn causing invasive pneumococcal disease in South Africa. Results Isolates were detected through national, laboratory-based surveillance for invasive pneumococcal disease in South Africa and characterized by whole genome analysis. We predicted ancestral serotypes (serotypes from which NTPn may have originated) for Group I NTPn using multilocus sequence typing and capsular region sequence analyses. Antimicrobial resistance patterns and mutations potentially causing nontypeability were identified. From 2003–2013, 39 (0.1 %, 39/32,824) NTPn were reported. Twenty-two (56 %) had partial capsular genes (Group I) and 17 (44 %) had complete capsular deletion of which 15 had replacement by other genes (Group II). Seventy-nine percent (31/39) of our NTPn isolates were derived from encapsulated S. pneumoniae. Ancestral serotypes 1 (27 %, 6/22) and 8 (14 %, 3/22) were most prevalent, and 59 % (13/22) of ancestral serotypes were serotypes included in the 13-valent pneumococcal conjugate vaccine. We identified a variety of mutations within the capsular region of Group I NTPn, some of which may be responsible for the nontypeable phenotype. Nonsusceptibility to tetracycline and erythromycin was higher in NTPn than encapsulated S. pneumoniae. Conclusions NTPn are currently a rare cause of invasive pneumococcal disease in South Africa and represent a genetically diverse collection of isolates. 
Electronic supplementary material The online version of this article (doi:10.1186/s12864-016-2808-x) contains supplementary material, which is available to authorized users.


Background
Streptococcus pneumoniae frequently colonizes the nasopharynx asymptomatically and is also a significant human pathogen causing diseases such as otitis media, pneumonia and meningitis [1]. The capsule is a major virulence factor of S. pneumoniae, protecting it from host cell-mediated phagocytosis [2]. The capsule induces antibodies that protect against pneumococcal disease, and it serves as the basis for current pneumococcal vaccines [3]. Based on the structure and antigenicity of the capsule, more than 90 serotypes have been identified [4,5].
Nontypeable pneumococci (NTPn) are not assigned a serotype when using the Quellung reaction [6]. The inability to do so may be due to low-level capsule expression or novel capsule types not detectable by the Quellung reaction, or absence of capsule due to genetic modifications in the capsular polysaccharide synthesis locus (cps) [7][8][9][10][11]. NTPn are predominantly detected in carriage or non-invasive disease episodes, and rarely cause invasive pneumococcal disease (IPD) [12][13][14]. In vitro studies have shown that NTPn display increased adherence to epithelial cells and are more easily transformable compared to encapsulated S. pneumoniae (Ec-Sp) isolates [15,16].
NTPn have been categorized into two groups based on the contents of the cps locus [7]. Group I has at least partial cps genes, whereas in Group II the cps genes are completely deleted and may be replaced with other genes. Group II is further subdivided into four null capsule clades (NCC) [17]. NCC1 has the psk gene, which has been shown to play a role in adherence to epithelial cells and colonization [18]. NCC2 has aliC and aliD genes either with (NCC2a) or without (NCC2b) a putative toxin-antitoxin system (encoded by ntaAB genes). The aliC and aliD genes facilitate upregulation of competence for genetic transformation and mediate colonization, respectively [19]. NCC3 has the aliD gene, and NCC4 contains only transposable elements in the cps locus.
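The grouping scheme above can be read as a simple decision rule on cps locus content. The following is a minimal sketch of that rule, not the procedure used in this study; the function name, its inputs and the use of "ntaAB" as a single label are illustrative assumptions.

```python
def classify_cps_locus(has_cps_genes, other_genes):
    """Assign an NTPn group/clade from the contents of the cps locus.

    has_cps_genes: True if at least partial capsular genes are present.
    other_genes:   set of non-capsular gene labels found in the locus
                   (hypothetical input format for this sketch).
    """
    if has_cps_genes:
        return "Group I"
    if "psk" in other_genes:
        return "Group II NCC1"
    if {"aliC", "aliD"} <= other_genes:
        # ntaAB encodes the putative toxin-antitoxin system
        return "Group II NCC2a" if "ntaAB" in other_genes else "Group II NCC2b"
    if "aliD" in other_genes:
        return "Group II NCC3"
    # Assumption: anything left is treated as a locus holding only
    # transposable elements, i.e. NCC4.
    return "Group II NCC4"


print(classify_cps_locus(False, {"aliC", "aliD", "ntaAB"}))  # Group II NCC2a
```

The order of the checks matters: aliC plus aliD is tested before aliD alone so that NCC2 loci are not reported as NCC3.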
There are limited studies describing invasive NTPn [10,12], possibly due to the rarity of these isolates in IPD. The prevalence of NTPn as a cause of IPD has been reported to range from 0.6 to 3 % [10,12]. The majority (~90 %) of invasive NTPn described in the literature is Group I and is thought to be derived from Ec-Sp [10,12]. In contrast, almost all Group II isolates are related to established NTPn lineages [12]. Exploring the potential changes in cps structure and understanding the genomic diversity of NTPn is of importance, especially because current pneumococcal conjugate vaccines (PCV) are not effective against these isolates. We aimed to describe disease and genotypically characterize NTPn causing IPD in South Africa.

Invasive pneumococcal disease surveillance in South Africa, 2003-2013
From 2003 through 2013, 46,485 cases of IPD were reported, of which 32,824 (71 %) had viable isolates. Thirty-nine (0.1 %) were nontypeable by the Quellung reaction and were thus classified as NTPn. Case characteristics are shown in Table 1. Forty-four percent (17/38) of cases were in children <5 years. Two patients with known vaccination status had received one dose of PCV13 and two doses of PCV7, respectively. Thirteen percent (2/15) of patients with known immune status were immunocompromised. Of ten patients with known HIV status, 50 % (5/10) were seropositive. Twenty-two percent (4/18) of patients with known outcome data died.
A total of 28 STs were identified (Table 2). Among Group I isolates, 18 STs were identified: 6 were novel STs and 12 were STs usually associated with Ec-Sp. Among Group II isolates, 10 STs were identified: 7 were novel STs and 3 were known STs, namely ST344 (NCC2a), which is usually associated with carriage NTPn, and ST105 and ST218 (NCC4), which are associated with Ec-Sp. Two CCs were identified among Group I isolates: CC217 (n = 6) and CC53 (n = 3) (Fig. 1). The remainder (n = 12) of Group I isolates were not assigned to any CC, except for ST5604, which is a single-locus variant of a ST105 isolate from Group II. CC344 (n = 6) was identified among Group II isolates. The remainder (n = 10) of Group II isolates were not assigned to any CC.

cps mutations in Group I isolates
Comparisons between Group I isolates and their ancestral serotype cps regions revealed a variety of mutations within the cps region of Group I isolates. These mutations are summarized in Table 2, described in detail in Additional file 1 and sketched in Additional file 2. Briefly, predicted ancestral serotype 1 isolates had identical partial deletions of cps genes (NT11, NT12, NT17, NT18, NT45 and NT224). Isolate NT5 also had partial deletions. The remaining isolates had almost all cps genes present, as in their ancestral serotypes; however, we identified a variety of mutations [single nucleotide polymorphisms (SNPs), insertions and deletions] in some of the cps genes. Most isolates had a combination of mutations. The vast majority of SNPs resulted in amino acid changes and a small number in stop codons. Base deletions and insertions were not as common as SNPs.
Table footnotes: (f) Immunocompromising conditions were defined as medical record-documented pre-existing history of head injury, connective tissue disease, asplenia, pregnancy, premature birth, malignancy, burns, gastric acid suppression, aplastic anemia, organ transplant, primary immunodeficiency conditions, chromosomal conditions, protein energy malnutrition, alcohol dependency, current smoking, or immunosuppressive therapy. (g) Comorbid conditions were defined as medical record-documented pre-existing history of pulmonary disease, renal disease, cerebrovascular accident, hepatic disease, cardiac disease, or diabetes mellitus.

Phylogenomic analysis
Ancestral serotype prediction was further confirmed by phylogenomic clustering of Group I isolates with representative genomes of their predicted ancestral serotypes (Fig. 2). An exception was NT30, which was predicted to be derived from serotype 16F but was more closely related to serotype 11A. For some isolates, there was more divergence between the NTPn and their ancestral serotypes, as indicated by the long branches. Group II NCC1 (NT38), NCC4 (NT28 and NT33) and two carriage NTPn (MNZ37 and MNZ11b) from other studies [20] also clustered with invasive Ec-Sp and appear to be related to serotypes 3, 7F, 25A and 15A, respectively. Group II NCC2 NTPn belonging to ST9811 (n = 5) and ST10240 formed a distinct clade separate from other Group II NCC2 isolates and seem to be more closely related to invasive Ec-Sp. The remaining Group II NCC2 NTPn (n = 8) formed a distinct clade with carriage NTPn from other studies (n = 4) [20,21]. Overall, 31/39 (79 %) of NTPn from this study (22 Group I isolates, Group II NCC1 (n = 1), NCC4 (n = 2) and six NCC2 isolates belonging to ST9811 (n = 5) and ST10240) appear to have been derived from Ec-Sp.

Discussion
We characterized NTPn causing IPD in South Africa, a middle-income, high HIV prevalence country with PCV introduction in 2009. NTPn represented 0.1 % of IPD cases, much lower than in other studies, which have reported a prevalence of between 0.6 and 3 % [10,12]. Our study showed that among patients with NTPn infection and for whom data were available, 44 % were children less than 5 years, proportionately more than Ec-Sp (29 %), 50 % were HIV positive versus 76 % for Ec-Sp, and the case-fatality ratio (22 %) was similar to cases with Ec-Sp (29 %). Fifty-six percent of our NTPn were Group I, for which the most common predicted ancestral serotypes were 1 (27 %) and 8 (14 %). Fifty-nine percent of the predicted ancestral serotypes were PCV13 serotypes. The cps loci of the Group I isolates harbored a variety of mutations. Nonsusceptibility to tetracycline and erythromycin was significantly higher in NTPn than encapsulated S. pneumoniae. Nonsusceptibility to penicillin and erythromycin was also significantly higher in Group II NTPn than Group I.
The mechanisms by which NTPn are able to cause IPD without the capsule are not clear. Immune suppression of the host and/or other lesser known or novel virulence factors might explain the success of NTPn in causing IPD. However, in this study the majority of patients with NTPn infection and known immune status (87 %, 13/15) were not immunocompromised, and this was similar to patients with Ec-Sp strains (69 %). NTPn appear to be as virulent as Ec-Sp, as the patients with NTPn infection did not seem to be more susceptible to infection.
We observed a higher prevalence (44 %) of Group II isolates amongst our NTPn than has been described in other studies, where 4 to 10 % of NTPn were Group II isolates [10,12]. The majority of Group II isolates (82 %) were NCC2 and no isolates belonging to NCC3 were identified, a finding similar to invasive NTPn collected through the Active Bacterial Core surveillance program in the United States during 2006-2009 [12]. In addition, two isolates were classified as NCC4, a newly defined clade in Group II, characterized by deletion of all cps genes [12]. Similar to what has been shown in the US Active Bacterial Core surveillance, NCC4 isolates in this study also appeared to be derived from Ec-Sp [12].
We were able to predict ancestral serotypes for all Group I isolates because they had partial cps genes, suggesting that these isolates were derived from Ec-Sp which, at some point, lost their capsule through mutations in the cps locus. Indeed we found a diverse range of mutations in the cps locus of Group I isolates, some of which may be responsible for the nontypeable phenotype. For predicted ancestral serotype 1 isolates, all cps mutations were identical. The deleted region is flanked by identical IS1167 repeats and this, together with the fact that the predicted ancestral serotype 1 isolates were genotypically related, indicates that the cps mutations probably occurred as a result of a single deletion event in a particular strain rather than multiple independent deletions in different strains. The same mutations have been described for serotype 1-derived NTPn in other studies [9,10].
Serotype predictions were confirmed by the phylogenomic analyses, as these isolates clustered with encapsulated strains expressing the same serotype as the predicted ancestral serotypes. One isolate (NT30), however, was more closely related to serotype 11A than its predicted serotype 16F. This isolate has a novel ST, which shares three of seven alleles with the 11A isolate, suggesting that NT30 may have been derived from a common ancestor with serotype 11A at some point in time. For some isolates, there was more divergence between the NTPn and their ancestral serotypes. This could be because certain serotypes are inherently diverse while others are more stable.
Predicted ancestral serotypes 1 (27 %) and 8 (14 %) were most prevalent among our NTPn isolates. Analysis of invasive NTPn isolates from Native American communities collected from 1994-2007 showed that predicted ancestral serotypes 1 (19 %) and 7F (15 %) were most prevalent [10]. In addition, serotype 8 (25 %) was most prevalent among invasive NTPn from the US Active Bacterial Core surveillance [12]. These data suggest that serotypes 1 and 8 may be more prone to mutations in their cps region than other serotypes.
Two of the NTPn isolates (NT224 and NT225) were detected in mixed infections in two patients with encapsulated serotype 1 and 18C isolates, respectively. For the serotype 1 and NT224 mixed infection, both variants were identified from the same culture, with the NT isolate representing approximately 98 % of the culture (using the Quellung reaction). After several attempts to separate the two variants, only the NTPn isolate could be obtained; however, real-time PCR confirmed the presence of serotype 1 in the mixed culture. NT224 was confirmed to be ST217, which is associated with serotype 1. For the serotype 18C and NT225 mixed infection, both isolates were available. Compared to its encapsulated counterpart (serotype 18C), NT225 had the same cps region with the exception of a single SNP, the same ST and an identical core genome (unpublished observations).
Pneumococcal serotypes targeted by PCV have the potential to switch from vaccine serotype to non-vaccine serotype or to nontypeable, thereby providing a mechanism whereby they may be able to evade vaccine pressure. However, no evidence of adaptation to PCV occurred among invasive NTPn from Native American communities, where 45 % of NTPn isolates had ancestral PCV7 serotypes pre-vaccine and none of their NTPn isolates were ancestral PCV7 serotypes post-vaccine [10]. Although 59 % of the predicted ancestral serotypes of NTPn in our study were PCV13 serotypes, there is no evidence to suggest that serotype switching occurred as a result of PCV pressure as the majority of NTPn were already present prior to PCV introduction.
Antimicrobial nonsusceptibility data for invasive NTPn are limited. In our study, there was a significantly higher prevalence of nonsusceptibility among NTPn than Ec-Sp for erythromycin and tetracycline. These higher nonsusceptibility rates, coupled with higher transformation rates of NTPn [16], enable these strains to serve as a reservoir of antibiotic resistance genes. A serotype 19F clone from Switzerland became increasingly resistant to penicillin by acquisition of a pbp2x gene from NTPn [23]. Recently, a comparative genome analysis of over 3000 carriage pneumococci from a refugee camp in Thailand showed that the highest rates of receipt and donation of recombinant DNA occurred in NTPn and that the most commonly exchanged genes were those associated with antibiotic resistance and immune interactions [24].
It is possible that our NTPn isolates may have lost the ability to express capsule during sub-culturing in the laboratory, but we believe this was not the case as we would expect NTPn to be detected more frequently. We did not perform in vitro experiments to confirm whether the mutations identified in the cps locus of our isolates were responsible for the lack of capsule expression. However, in some of these isolates, there were complete deletions of cps genes, making expression of a capsule impossible. For Ec-Sp isolates, different methods [agar dilution or Etest (2003-2008) or broth microdilution (2009-2013)] were used for MIC testing compared to NTPn (broth microdilution). This may have overestimated the prevalence of β-lactam resistance in NTPn for the 2003-2008 period, as broth microdilution detects more resistance than agar dilution, as was previously shown [25].

Conclusion
NTPn are currently a rare cause of IPD in South Africa and are genetically diverse but have a higher prevalence of antimicrobial resistance than Ec-Sp. The majority of NTPn were derived from Ec-Sp, of which a significant proportion was PCV13 serotypes. Further studies which explore capsule-independent virulence mechanisms of NTPn should be considered.

Bacterial isolates
Isolates were obtained through the GERMS-SA (Group for Enteric, Respiratory and Meningeal Disease Surveillance in South Africa) network, which conducts active, national, laboratory-based surveillance for IPD. Surveillance began in 1999 and was enhanced in 2003 [26,27]. Over 200 diagnostic laboratories across South Africa routinely submit clinical isolates and basic patient demographic data to the network. A case of IPD was defined as the detection of S. pneumoniae from a normally sterile site (e.g., cerebrospinal fluid, blood) from January 2003 through December 2013. Isolates were cultured on 5 % horse blood agar (Diagnostic Media Products, Johannesburg, South Africa) and incubated at 37°C in 5 % CO2 for 18-24 h. Optochin sensitivity and bile solubility assays were performed to confirm S. pneumoniae identification. In addition, for NTPn isolates, real-time PCR detecting lytA was performed [28]. A mixed infection was defined as simultaneous isolation of at least two serotypes (including nontypeable isolates) either from the same normally-sterile site specimen or from two or more normally-sterile site specimens obtained from the same patient within 21 days of each other. Such cases were detected during routine serotyping by Quellung if an isolate reacted partially with a specific antiserum pool.

Serotyping and antimicrobial susceptibility testing
Serotyping was performed by the Quellung reaction using serotype-specific antisera (Statens Serum Institut, Copenhagen, Denmark). Minimum inhibitory concentrations (MIC) for NTPn isolates were determined by broth microdilution using commercially customized TREK panels (Trek Diagnostics Inc., Ohio, United States).
MICs for Ec-Sp isolates were determined as previously described [25,27]. MIC breakpoints were interpreted using Clinical and Laboratory Standards Institute guidelines [29]. For penicillin, isolates with MICs of ≥0.12 mg/L were defined as nonsusceptible. Isolates that were intermediately resistant or resistant to any of the antibiotics were regarded as nonsusceptible. Multidrug resistance was defined as nonsusceptibility to three or more classes of antibiotics.
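The nonsusceptibility and multidrug resistance definitions above can be sketched in a few lines. This is an illustration of the stated rules only; the antibiotic-to-class mapping and the example results are assumptions made for the sketch, not data from this study.

```python
PENICILLIN_NONSUSCEPTIBLE_MIC = 0.12  # mg/L, threshold stated in the text


def penicillin_nonsusceptible(mic_mg_per_l):
    """Penicillin: MIC >= 0.12 mg/L counts as nonsusceptible."""
    return mic_mg_per_l >= PENICILLIN_NONSUSCEPTIBLE_MIC


def is_nonsusceptible(category):
    """Intermediate ('I') or resistant ('R') counts as nonsusceptible."""
    return category in {"I", "R"}


def is_multidrug_resistant(results, drug_class):
    """Nonsusceptible to three or more antibiotic classes.

    results:    dict antibiotic -> 'S', 'I' or 'R'
    drug_class: dict antibiotic -> class name (illustrative mapping)
    """
    classes = {drug_class[drug] for drug, cat in results.items()
               if is_nonsusceptible(cat)}
    return len(classes) >= 3


# Hypothetical class mapping and isolate, for illustration only.
DRUG_CLASS = {"penicillin": "beta-lactam", "ceftriaxone": "beta-lactam",
              "erythromycin": "macrolide", "tetracycline": "tetracycline"}
isolate = {"penicillin": "I", "ceftriaxone": "R",
           "erythromycin": "R", "tetracycline": "R"}
print(is_multidrug_resistant(isolate, DRUG_CLASS))  # True
```

Counting classes rather than individual antibiotics means that two nonsusceptible β-lactams contribute only once towards the threshold of three.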

DNA preparations, genome sequencing and assembly
Brain heart infusion broth (Diagnostic Media Products) was inoculated with overnight cultures and incubated overnight at 37°C in 5 % CO2. A pre-lysis step was performed by suspending bacterial pellets in 200 μl of 10 mg/ml lysozyme (Sigma-Aldrich, St. Louis, MO, USA) and incubating at 37°C for 1 h. Genomic DNA was extracted using the QIAamp® DNA Mini kit (QIAGEN, Hilden, Germany). Paired-end libraries (2 x 300 bp) were prepared using the Nextera XT DNA sample preparation kit (Illumina, San Diego, CA, USA) and sequencing was performed on an Illumina MiSeq. The resulting paired-end reads were quality trimmed and mapped to the reference genome of S. pneumoniae ATCC 700669 [GenBank accession no. FM211187, serotype 23F, sequence type (ST) 81] using CLC Genomics Workbench v8 (CLC bio, Aarhus, Denmark), giving on average 121x depth of coverage and 88 % coverage of the reference genome. De novo assembly was performed for all isolates using CLC Genomics Workbench and contigs were ordered relative to S. pneumoniae ATCC 700669 using Mauve [30]. The ordered contigs were concatenated and annotated using Prokka v1.11 [31].

cps regions and multi-locus sequence typing (MLST)
The cps locus and seven MLST alleles were derived from whole genome data. The cps loci were used to categorize NTPn into different groups/clades [12,17]. MLST allele numbers and sequence type (ST) were assigned using the Bio-MLST-Check module (http://search.cpan.org/~ajpage/Bio-MLST-Check-1.133090/lib/Bio/MLST/Check.pm). The eBURST v3 algorithm (http://eburst.mlst.net) was used to determine genetic relationships and to group isolates into clonal complexes (CC) [32]. A CC was defined as a group of isolates sharing six of seven alleles with any other isolate in the group.
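The CC definition above amounts to single-linkage grouping of allelic profiles. The sketch below illustrates only that grouping rule; it is not the eBURST implementation (which also predicts founders), and the profile labels and allele numbers are invented for the example.

```python
from itertools import combinations


def shared_alleles(p, q):
    """Number of loci at which two seven-locus profiles carry the same allele."""
    return sum(a == b for a, b in zip(p, q))


def clonal_complexes(profiles, min_shared=6):
    """Group profiles so that each member shares >= min_shared alleles
    with at least one other member (single linkage, via union-find).

    profiles: dict label -> tuple of seven allele numbers.
    Returns a list of sets of labels; singletons come back as one-member
    sets and would not be called a CC.
    """
    parent = {label: label for label in profiles}

    def find(label):
        while parent[label] != label:
            parent[label] = parent[parent[label]]  # path compression
            label = parent[label]
        return label

    for a, b in combinations(profiles, 2):
        if shared_alleles(profiles[a], profiles[b]) >= min_shared:
            parent[find(a)] = find(b)

    groups = {}
    for label in profiles:
        groups.setdefault(find(label), set()).add(label)
    return list(groups.values())


# Hypothetical profiles: B is a single-locus variant of A, C of B.
profiles = {
    "A": (1, 1, 1, 1, 1, 1, 1),
    "B": (1, 1, 1, 1, 1, 1, 2),
    "C": (1, 1, 1, 1, 1, 3, 2),
    "D": (9, 9, 9, 9, 9, 9, 9),
}
print(sorted(sorted(g) for g in clonal_complexes(profiles)))
# [['A', 'B', 'C'], ['D']]
```

Note that A and C share only five alleles yet fall in the same group, because membership requires a six-allele match with any other member rather than with every member.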

In silico prediction of ancestral serotypes for Group I NTPn
To predict ancestral serotypes of Group I isolates (serotypes from which NTPn may have originated), two methods were used: serotype-ST associations from the MLST database (http://pubmlst.org/spneumoniae, accessed March 2015) and BLAST searches of the cps locus against reference cps loci for the 94 known serotypes (GenBank accession nos. CR93162-CR931722, EF538714, HM171374, GU074953 and JQ653094). The serotype that gave the highest BLAST bit score was assigned as the likely ancestral serotype.
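The "highest bit score" rule can be illustrated as follows. This sketch assumes BLAST results in the standard 12-column tabular format (query, subject, percent identity, length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score); how BLAST was actually run in this study is not stated, and the query and reference names and scores below are invented.

```python
import csv
import io


def best_hit_by_bitscore(tabular_text):
    """Return {query: (subject, bitscore)} keeping the top-scoring hit per query."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


# Hypothetical hits for one isolate against two reference cps loci.
rows = [
    ["isolate_X", "cps_ref_A", "99.1", "15000", "120", "3",
     "1", "15000", "1", "15000", "0.0", "27000"],
    ["isolate_X", "cps_ref_B", "92.4", "9000", "650", "12",
     "1", "9000", "1", "9000", "0.0", "12500"],
]
example = "\n".join("\t".join(r) for r in rows)
print(best_hit_by_bitscore(example))
# {'isolate_X': ('cps_ref_A', 27000.0)}
```

When a reference yields several alignments, this sketch ranks references by their single best-scoring alignment; summing scores per reference would be an alternative design.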

cps mutations in Group I NTPn
To determine the genetic variations within the cps locus of Group I isolates that may be responsible for their phenotypic nontypeability, we used the alignment module in CLC to align the cps locus of each isolate together with a number of its predicted ancestral serotype cps loci shown in Additional file 3.

Phylogenomic tree construction
To confirm predicted ancestral serotypes for Group I isolates, a maximum likelihood phylogenetic tree was constructed based on core genome SNPs of NTPn from this study (n = 39), together with invasive Ec-Sp with different serotypes from South Africa (n = 42; Additional file 4) matching the predicted ancestral serotypes for the NTPn isolates, and carriage NTPn from other countries (n = 6) [20,21]. The core genome alignment module in the rapid large-scale prokaryote pan genome analysis pipeline was used to extract predicted coding regions from annotated assemblies and convert them to protein sequences [33]. All protein sequences were compared with each other using BLASTP. Proteins that had alignment similarity of ≥70 % and were present in at least 90 % of the isolates were defined as the core genome. RAxML was used to create a phylogenetic tree from the resulting core genome alignment and this was visualized in Figtree v1.4.2 [34].
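The presence threshold in the core genome definition can be sketched as below. The sketch assumes that proteins have already been clustered at ≥70 % similarity by the pipeline and only applies the "present in at least 90 % of isolates" filter; cluster and isolate names are hypothetical.

```python
def core_clusters(presence, n_isolates, min_percent=90):
    """Return clusters found in at least min_percent of isolates.

    presence:   dict cluster name -> set of isolates carrying a member protein
    n_isolates: total number of isolates in the analysis
    """
    # Integer arithmetic avoids floating-point edge cases at the threshold.
    return sorted(cluster for cluster, isolates in presence.items()
                  if len(isolates) * 100 >= min_percent * n_isolates)


# Hypothetical example with 10 isolates.
isolates = [f"iso{i}" for i in range(10)]
presence = {
    "cluster_1": set(isolates),       # 10/10 isolates
    "cluster_2": set(isolates[:9]),   # 9/10 isolates, exactly at threshold
    "cluster_3": set(isolates[:8]),   # 8/10 isolates, below threshold
}
print(core_clusters(presence, len(isolates)))  # ['cluster_1', 'cluster_2']
```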

Statistical analysis
Differences in antimicrobial nonsusceptibility between NTPn and Ec-Sp, and between Group I and Group II NTPn, were analyzed using the chi-square test and Fisher's exact test, respectively, with statistical significance assessed at P < 0.05. Statistical analyses were performed with GraphPad InStat 3.
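For readers who wish to reproduce this kind of comparison, both tests can be written for a 2 x 2 table using only the Python standard library. This is a sketch, not the GraphPad InStat procedure: the chi-square version omits any continuity correction (whether InStat applied one is not stated here), the Fisher test uses the common "sum of tables no more probable than the observed one" two-sided rule, and the counts are invented.

```python
from math import comb, erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]], 1 degree of freedom.

    Returns (statistic, p_value); no continuity correction is applied.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return stat, erfc(sqrt(stat / 2))


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum tables at least as extreme (small tolerance for float ties)
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-7))


# Hypothetical counts: rows = groups, columns = nonsusceptible / susceptible.
stat, p_chi = chi_square_2x2(10, 20, 20, 10)
p_fisher = fisher_exact_2x2(3, 1, 1, 3)
print(round(stat, 3), p_chi < 0.05, round(p_fisher, 4))
```

Fisher's exact test is the usual choice when expected cell counts are small, which is consistent with its use here for the comparison between Group I and Group II NTPn.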