Non-typeable pneumococci circulating in Portugal are of cps type NCC2 and have genomic features typical of encapsulated isolates

Background: Pneumococcus is a major human pathogen and the polysaccharide capsule is considered its main virulence factor. Nevertheless, strains lacking a capsule, named non-typeable pneumococcus (NT), are maintained in nature and frequently colonise the human nasopharynx. Interest in these strains, not targeted by any of the currently available pneumococcal vaccines, has been rising as they seem to play an important role in the evolution of the species. Currently, there is a paucity of data regarding this group of pneumococci. Also, questions have been raised on whether they are true pneumococci. We aimed to obtain insights into the genetic content of NT and the mechanisms leading to non-typeability and to genetic diversity.
Results: A collection of 52 NT isolates representative of the lineages circulating in Portugal between 1997 and 2007, as determined by pulsed-field gel electrophoresis and multilocus sequence typing, was analysed. The capsular region was sequenced and comparative genomic hybridisation (CGH) using a microarray covering the genome of 10 pneumococcal strains was carried out. The presence of mobile elements was investigated as a source of intraclonal variation. NT circulating in Portugal were found to have similar capsular regions, of cps type NCC2, i.e., having aliB-like ORF1 and aliB-like ORF2 genes. The core genome of NT was essentially similar to that of encapsulated strains. Also, competence genes and most virulence genes were present. The few virulence genes absent in all NT were the capsular genes, type-I and type-II pili, choline-binding protein A (cbpA/pspC), and pneumococcal surface protein A (pspA). Intraclonal variation could not be entirely explained by the presence of prophages and other mobile elements.
Conclusions: NT circulating in Portugal are a homogeneous group belonging to cps type NCC2. Our observations support the theory that they are bona-fide pneumococcal isolates that do not express the capsule but are otherwise essentially similar to encapsulated pneumococci. Thus we propose that NT should be routinely identified and reported in surveillance studies.
Electronic supplementary material: The online version of this article (doi:10.1186/1471-2164-15-863) contains supplementary material, which is available to authorized users.

Background
Pneumococcus is a major human pathogen, causing a wide range of infections from otitis media to bacteraemia and meningitis. Its main virulence determinant is a polysaccharide capsule that surrounds pneumococcal cells, providing protection against phagocytosis [1]. Together with colony morphology, susceptibility to optochin, and bile solubility, assignment of a serotype (based on the capsular type) has been traditionally the ultimate assay to identify pneumococcus [2]. To date, more than 95 serotypes have been described and, with the exception of type 37, the genes responsible for the expression of the capsule are located in the chromosome between the dexB and aliA genes (capsular region) [1,3]. The pneumococcal capsule is also the target of all currently available pneumococcal vaccines [4].
Pneumococci lacking a polysaccharide capsule are known to exist in nature and are frequent inhabitants of the upper respiratory tract of humans [5]. Although these isolates, often named non-typeable pneumococcus (NT), are mostly asymptomatically carried in the nasopharynx, they have also been associated with conjunctivitis outbreaks and sporadically associated with other disease manifestations including invasive disease [6][7][8][9]. Studies have suggested, using a combination of phenotypic and genotypic methods, that some of these isolates are bona-fide pneumococci and share common properties with encapsulated pneumococci [5,10]. Also, in vitro studies with non-encapsulated pneumococci have shown that these strains display increased adherence to epithelial tissue, increased capacity for biofilm formation, and are highly transformable [11][12][13]. Hence, high carriage rates combined with high transformability rates may provide NT with the features needed to play an important role in the evolution of pneumococcus as recently proposed by Chewapreecha, et al. [14].
In a previous study, we described the population structure of NT strains in Portugal and identified major lineages associated with them [5]. In parallel, others have identified the same lineages in circulation in other geographical settings and the capsular region of NT has been characterised [10,15,16,17]. Based on the capsular region, NT have been proposed to be divided into two groups: Group I includes isolates with a disrupted or non-functional capsular locus and Group II includes isolates with genes not found in conventional capsular types [17]. Group II NT have been proposed to be further divided into cps types NCC1, when isolates have the pspK gene (pneumococcal surface protein Korea, also referred to as nspA, non-typeable pneumococcal surface protein A), encoding a novel pneumococcal surface protein with several features suggesting a role in cell adhesion and enhanced colonisation, and NCC2, when isolates have both the aliB-like ORF1 and aliB-like ORF2 genes, predicted to encode lipoproteins [15][16][17][18]. A cps type NCC3 has also been described for isolates with aliB-like ORF2 but not aliB-like ORF1, but these were shown not to be pneumococci [15].
The observation that several distinct clonal lineages lacking the capsule operon have been in circulation for decades and are not derived from encapsulated strains has raised the question of how different the genome of these strains is from that of encapsulated pneumococci [5,8]. The aim of this study was to characterise a carriage collection of NT circulating in Portugal over a period of 11 years to obtain insights into the genetic basis of non-typeability and into their genomic content and diversity.

Capsular region of NT
To obtain insights into the genetic basis of non-typeability, the capsular region was characterised for a set of 42 NT strains representative of the lineages detected in cross-sectional colonisation studies conducted in Portugal among children between 1997 and 2007 (Table 1). Amplification of this region yielded, in all strains, a fragment of 6,000-8,500 bp. To investigate the heterogeneity of the capsular region, restriction fragment length polymorphism (RFLP) patterns were determined by digestion with HinfI. Nine different patterns could be distinguished after digestion with HinfI (Figure 1, Table 1). We then selected 13 isolates, representative of the different capsular RFLP patterns found in each CC, for sequencing. The findings are summarised in Figure 2, which shows a schematic organisation of the locus compared to strains previously described by Hathaway, et al. [17]. All strains had aliB-like ORF1, aliB-like ORF2, and capN-like regions; eight had the doc-like region between capN-like and aliA. Based on the classification previously proposed by Park, et al. [15], the strains were therefore classified as belonging to cps type NCC2a (eight isolates containing the doc-like region) or NCC2b (the remaining five isolates). Of the eight strains belonging to cps type NCC2a, two had a previously described insertion of a tnp region of ~1.7 kb between dexB and aliB-like ORF1 [15,16].
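The classification rules stated in the Background and applied here can be summarised as a small decision function. The following is a minimal Python sketch, assuming that each isolate is described by a set of marker labels for the regions found between dexB and aliA; the function name and the label strings are illustrative choices and not part of the original analysis.

```python
def classify_cps(regions):
    """Assign a Group II cps type from the marker regions found in the dexB-aliA locus.

    `regions` is a set of label strings. The labels are hypothetical names
    chosen for this sketch; they mirror the regions named in the text.
    """
    has_orf1 = "aliB-like ORF1" in regions
    has_orf2 = "aliB-like ORF2" in regions
    if "pspK" in regions:
        return "NCC1"  # pspK (nspA) present
    if has_orf1 and has_orf2:
        # NCC2 is split by the doc-like region between capN-like and aliA
        return "NCC2a" if "doc-like" in regions else "NCC2b"
    if has_orf2:
        return "NCC3"  # aliB-like ORF2 without aliB-like ORF1
    return "unclassified"


with_doc = {"aliB-like ORF1", "aliB-like ORF2", "capN-like", "doc-like"}
without_doc = {"aliB-like ORF1", "aliB-like ORF2", "capN-like"}
print(classify_cps(with_doc), classify_cps(without_doc))
```

Under these rules, the eight sequenced isolates carrying the doc-like region fall into NCC2a and the remaining five into NCC2b.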

Candidate core genome
To determine if the genome content of NT strains is comparable to that of encapsulated strains, 34 NT representing the diversity of profiles identified by PFGE, MLST, and characterisation of the capsular region, were characterised by CGH using an array that covers the genome of nine encapsulated pneumococcal strains and R6 (a non-encapsulated derivative of D39) (Additional file 1). Of the 3,052 genes present in the array, 1,666 (54.6%) were present in all NT tested, 839 (27.5%) were present in some, and 547 (17.9%) were absent in all (Additional file 2). In an independent analysis, conducted in the framework of an ongoing study, 180 encapsulated strains were analysed by CGH. These strains were representative of 20 serotypes and included all strains in the array (except R6). Results from this analysis were used for comparison. In this collection, 1,654 genes (54.2%) were present in all strains, essentially the same proportion as found for the NT isolates. Of these 1,654 genes, 1,499 (90.6%) were also present in all NT isolates (Additional file 2). Among the remaining 155 genes, 149 were present in some (but not all) NT and only 6 were absent in all. The proportion of these 155 genes present in the NT strains ranged between 58.7% and 80.0% (Additional file 3). The 149 genes with variable presence among NT strains could be grouped into the following functions: 22.8% cellular metabolism, 16.1% transporters, 8.7% DNA metabolism, 7.4% phages and mobile elements, 2.0% surface proteins, 2.0% signalling and communication, and 41.0% were annotated as hypothetical proteins. The six genes absent in all NT were SP_0346 (annotated as capsular polysaccharide biosynthesis protein Cps4A), SP_0368 (cell wall surface anchor family protein), SP_1153 (hypothetical protein), SP_2157 (alcohol dehydrogenase, iron-containing), SP_2158 (L-fucose isomerase), and SP_2168 (fucose operon repressor, putative).
Furthermore, NT isolates contained between 2,049 and 2,120 genes detected by CGH with an average of 2,095 genes, while the 180 encapsulated strains had between 2,119 and 2,306 genes with an average of 2,235. Based on these experiments, although the size of the "core" genomes of NT and encapsulated strains was comparable, NT strains characterised in this study had 6.3% fewer genes detected by CGH than encapsulated strains.
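The partition of array genes into those present in all, some, or none of the strains, and the percentages quoted above, can be reproduced with a few lines of Python. This is a minimal sketch, assuming presence calls are stored as a nested dictionary (strain, then gene, then 1 or -1); the data layout and function names are assumptions made for illustration.

```python
def partition_genes(calls):
    """Split genes into (core, variable, absent) lists.

    `calls` maps strain name -> {gene: 1 (present) or -1 (absent)}.
    All strains are assumed to carry calls for the same genes.
    """
    strains = list(calls)
    core, variable, absent = [], [], []
    for gene in calls[strains[0]]:
        n_present = sum(1 for s in strains if calls[s][gene] == 1)
        if n_present == len(strains):
            core.append(gene)
        elif n_present == 0:
            absent.append(gene)
        else:
            variable.append(gene)
    return core, variable, absent


def percent(part, total):
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100.0 * part / total, 1)


# Figures taken from the text
print(percent(1666, 3052))         # genes present in all NT
print(percent(1499, 1654))         # encapsulated core genes also present in all NT
print(percent(2235 - 2095, 2235))  # relative deficit in mean gene count of NT
```

The three printed values correspond to the 54.6%, 90.6%, and 6.3% reported above.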
Nine choline binding proteins have been implicated in virulence, and all were present on the array [20,27,33,34]. Of these, cbpD, cbpE/pce, lytA, lytB, and lytC were present in all strains, with cbpA/pspC and pspA being absent in all strains. Variation between CCs was found for cbpF, cbpG and pcpA (Table 3).
In addition, 12 genes implicated in colonisation were present on the array. Of these, pavA, eno, pyrR, strH, trpG, rr01, and SPY2053 were present in all NT, while rlrA was absent in all strains. Clonal variation was found for genes hyl, nanA, bgaA, and phoU (Table 3).
Among other major virulence factors, ply, psaA, htrA, IgA, and spxB were present in all strains with variations between clones found for the operons piuA-D and piaA-D and zmpB.
Further details on the variable presence of virulence genes can be found in Additional file 2.

Intraclonal variation
Comparison of SmaI-PFGE patterns of NT strains resulted in an unexpectedly high diversity of profiles for strains belonging to the same ST (Figure 5) [5]. Likewise, there were also strains with similar PFGE profiles belonging to different STs. This lack of concordance was puzzling, as previous studies have found a good general agreement between PFGE and MLST for encapsulated pneumococci [35]. To investigate possible genomic variations that could account for the lack of concordance found between PFGE and MLST results, CGH results were compared for strains belonging to the same CC. For any given CC, all strains analysed shared at least 72% of the genes detected in the NT pool (Figure 6).
When we looked at intraclonal diversity, within each CC, variation between strains was mostly due to only a few (if any) genes. Still, exceptions were found: strains PT944 of CC344, PT4014 of CC1156, and DCC2787 of CC941 had 162, 144, and 244 genes, respectively, uniquely present in their genomes compared to other strains of the same CC. Also, the two strains of CC1618 were found to differ from each other in more than 400 genes.
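The notion of genes "uniquely present" in one strain of a CC amounts to a set difference against the union of the other strains of that CC. A minimal Python sketch follows, assuming each strain is represented by the set of genes called present by CGH; the strain and gene names in the example are placeholders, not data from this study.

```python
def unique_genes(cc_strains):
    """Return, for each strain, the genes present in it and in no other strain.

    `cc_strains` maps strain name -> set of genes called present,
    restricted to the strains of a single clonal complex.
    """
    result = {}
    for name, genes in cc_strains.items():
        others = set().union(*(g for n, g in cc_strains.items() if n != name))
        result[name] = genes - others
    return result


# Placeholder strains and genes, for illustration only
example = {"s1": {"a", "b", "c"}, "s2": {"a", "b"}, "s3": {"a", "d"}}
print(unique_genes(example))
```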
When looking for the functions of genes uniquely present in one strain of a given CC, most were found to encode hypothetical proteins (51.3%). Other genes had the following functions: transport and secretion (13.4%), cell metabolism (9.9%), phages and mobile elements (9.5%), DNA metabolism (7.8%), cell wall, cell membrane, and cell division (3.8%), signalling and communication (2.7%), and stress (1.5%). Furthermore, only 10.2% of this latter group of genes have been described as virulence genes. Not surprisingly, close to half of these genes were found in ARs (44.4%).
[Figure legend, colour key for ARs: dark yellow, AR is present; light yellow, more than 50% of the genes in the AR are present; white, 50% of the genes in the AR are present; light blue, more than 50% of the genes in the AR are absent; dark blue, AR is absent.]
To investigate if the high variability of PFGE types found could be due to the presence of prophages, as previously reported [36], or the presence of other mobile elements, we evaluated their distribution among NT strains (Figure 7). In some cases, e.g. NT1, NT2, and NT6 of ST344 or NT22 and NT24 of ST1153, the content of mobile elements was indeed distinct between strains, which might explain the variability found. However, in other cases, such as NT2, NT3, NT5, NT8, and NT11 of ST344 and NT15 and NT16 of ST448, the strains shared the same mobile elements. On the other hand, examples of strains belonging to the same PFGE type and ST but with different mobile elements' profiles were also observed (e.g. NT17 of ST448). To complement this analysis, the presence of prophages was also determined by lytA hybridisation (Additional file 4). In ST344, the six PFGE types tested exhibited three lytA hybridisation patterns, whereas the two ST448 PFGE types tested showed the same lytA hybridisation pattern. According to these results, the high variability of PFGE types observed within STs could not be entirely explained by the presence of prophages or other mobile elements.

Discussion
In this study we aimed to characterise the genomic content of a collection of NT strains representative of the carriage lineages circulating in Portugal in a period of 11 years (1997-2007). Strains were analysed by CGH against a panel of 10 pneumococcal strains and their capsular region was sequenced. According to their capsular regions, strains in this study could be classified as NCC2, as they all contained aliB-like genes [15]. Strains with similar capsular regions have also been identified in carriage and disease isolates circulating in Switzerland, the Netherlands, UK, USA, Brazil, South Korea, Thailand, and the Gambia [15][16][17]37]. In our collection we did not find isolates of cps type NCC1 (containing the pspK/nspA gene) and we did not include NT strains derived from encapsulated lineages that had alterations in the capsular operon leading to absence of capsular production (Group I NT). Of interest, a recent study by Park, et al. aimed to characterise invasive NT strains from the USA. The authors reported that these strains are rare, accounting for less than 1% of the invasive pneumococcal disease cases, and most are of Group I NT, with only a few cases caused by NCC2 NT. Nonetheless, it has been clearly demonstrated that NCC2 NT are capable of causing invasive disease and therefore should not be disregarded [17,37].
In relation to the core genome, 54.6% of the genes represented on the array were found in all NT strains, essentially the same proportion as found for a collection of 180 encapsulated strains used for comparison (54.2%). However, the average number of total genes detected in the NT strains (2,095) was 6% lower than the corresponding value found for encapsulated strains. Still, this result should be interpreted with caution as, by using a CGH approach, genes present in NT but not represented on the array were probably missed to an unknown extent.
Twenty-five new ARs, dispersed around the TIGR4 genome, were identified in this study. Of the 66 ARs identified to date, only seven were absent in all NT; these comprised genes associated with sugar metabolism, capsular synthesis, type-I pilus, and hypothetical proteins [19]. Also, more than 90% of the virulence factors identified in pneumococcus were found in NT. The most relevant virulence factors absent from all NT were the capsular genes and type-I pilus (referred to above), type-II pilus, choline-binding protein A (cbpA/pspC), and pneumococcal surface protein A (pspA) [23]. Also absent in the majority of NT was the major iron ABC transport system piaA-D. However, piuA-D, a second iron ABC transport system, was present in the majority of NT. Mutations in these systems have been shown to result in mild (piuA-D) to moderate (piaA-D) reduction in virulence [38]. Together with the lack of capsule and other important virulence genes, the absence of these genes in NT should contribute to a lower propensity of NT to cause disease.
As expected, all strains had all competence genes, including the newly described transformation pilus [14,31,39]. According to the type of competence stimulating peptide (CSP, encoded by comC) secreted by pneumococcal strains, strains can be divided in pherotypes. The dominant pneumococcal pherotypes are CSP1 and CSP2, respectively found in 60-75% and 25-40% of carriage or clinical isolates [40,41]. In NT, the dominant pherotype was CSP2 (65% of the strains), with the remaining strains belonging to pherotype CSP1. In our study, pherotype was a clonal property, with all strains within a CC belonging to the same pherotype. The same association was previously observed in encapsulated pneumococcus [42]. These results further support that NT are bona-fide pneumococci, in contrast with atypical strains of ambiguous speciation, where multiple ComC alleles can be found [43].
To explore the reasons underlying the observation that NT had highly variable PFGE profiles in contrast to relatively conserved STs, we assessed whether the presence of prophages or other mobile elements could account for these observations. Although that seemed to be the case in some strains, the presence of these mobile elements could not entirely explain the variability found in NT isolates, at least with the approaches that were used. A more detailed characterisation of phage presence, such as the prophage typing system proposed by Romero, et al., could have provided additional information but was beyond the purpose of this study [44,45]. Our study has a major limitation. Information obtained by CGH is restricted to what is present in the array and therefore limited by nature. Still, interesting information regarding variability and presence/absence of pneumococcal genes implicated in virulence was obtained, providing further hypotheses related to the low disease capacity of these strains. Our study also has some strengths. The thorough characterisation of a representative collection of NT circulating in Portugal for over a decade provided insight on the most frequent features of the lineages in circulation and definitely supported the inclusion of these strains as part of the pneumococcal population.

Conclusions
NT circulating in Portugal are a homogeneous group belonging to cps type NCC2. Our observations support the view that these are bona-fide pneumococcal isolates that do not express the capsule but are otherwise essentially similar to encapsulated pneumococci, having a comparable core genome and most virulence factors. Given that NT are not targeted by current pneumococcal vaccines and that they are highly transformable, we recommend that these isolates be routinely identified and reported in surveillance studies monitoring pneumococcal serotype evolution.

Ethics statement
Approval for the original studies [5,46,47] was obtained from the Ministry of Education. The studies were registered and approved at the Health Care Centre of Oeiras that reports to Administração Regional de Saúde (ARS; "Regional Health Administration") of Lisboa and Vale do Tejo from the Ministry of Health. Signed informed consent was obtained from parents/guardians of participating children. All samples were coded numerically upon collection and processed anonymously. In the present study, only bacterial isolates were characterised (no human subjects, human material or human data were used). Thus, ethical approval was not required.

Study collection
We selected 52 NT strains for detailed characterisation. This collection was extracted from a total of 422 NT strains isolated between 1997 and 2007 from the nasopharynx of preschool children attending day-care centres in Lisbon, Portugal. The isolates were previously characterised by PFGE, MLST, and antibiotic susceptibility to penicillin, amoxicillin, ceftriaxone, erythromycin, clindamycin, tetracycline, chloramphenicol, and trimethoprim sulfamethoxazole (SXT) [5,46,47]. The 52 strains characterised in this study were selected to cover the diversity of profiles observed among the 422 isolates, as determined by PFGE, MLST and antibiotyping. CCs were defined based on goe-BURST classification [48].

DNA extraction
Total genomic DNA was isolated using either the DNeasy Blood & Tissue kit (Qiagen, Hilden, Germany), or the High Pure PCR Template Preparation kit (Roche Diagnostics GmbH, Mannheim, Germany), according to the manufacturers' recommendations.

Characterisation of the capsular (dexB-aliA) region
The dexB-aliA region, corresponding to the capsular region in encapsulated pneumococci, was amplified by PCR using the primers described by Kilian RFLP signatures of the capsular region were determined after digestion of 15 μL of purified PCR fragments with HinfI or StyI for 3 h at 37°C. For a total volume of 20 μL, 5U of enzyme, 1x NEBuffer (New England Biolabs), and 2 μg of BSA (for StyI) were added. Results were analysed by gel electrophoresis and Bionumerics software (version 3.0, Applied Maths, Gent, Belgium). Patterns were clustered by UPGMA and a dendrogram was generated from a similarity matrix calculated using the Dice similarity coefficient with an optimisation of 0.5% and a tolerance of 1.0%. RFLP patterns determined by digestion with HinfI were arbitrarily named A to H.
Sequences were analysed and aligned using the Lasergene software (DNASTAR Inc., Madison, WI, USA). Nucleotide sequences of the capsular region were further analysed by performing a nucleotide BLAST search at the National Center for Biotechnology Information Website against the nucleotide database and also against the capsular region sequences previously described for NT strains [15][16][17]50].
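The Dice similarity coefficient used to compare RFLP patterns is twice the number of shared bands divided by the total number of bands in the two patterns. The following is a minimal Python sketch under stated assumptions: bands are given as fragment sizes in bp, and two bands are considered matched when their sizes differ by no more than a relative tolerance. This matching rule is a simplification introduced for illustration; it does not reproduce the exact tolerance and optimisation procedure implemented in Bionumerics.

```python
def dice_similarity(bands_a, bands_b, tolerance=0.01):
    """Dice coefficient between two band patterns (lists of fragment sizes).

    Bands are matched greedily in size order; a pair matches when the size
    difference is within `tolerance` (relative, an assumption of this sketch).
    """
    a, b = sorted(bands_a), sorted(bands_b)
    if not a and not b:
        return 1.0  # two empty patterns are treated as identical
    i = j = shared = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tolerance * max(a[i], b[j]):
            shared += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return 2.0 * shared / (len(a) + len(b))


# Hypothetical fragment sizes; two of three bands match
print(dice_similarity([1000, 2000, 3000], [1005, 2000, 4000]))
```

A matrix of such pairwise values is the input from which a UPGMA dendrogram is built.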

CGH
Microarrays used in this study were 12x135K NimbleGen arrays (Roche). Labelling, hybridisation, and washing of the samples were done as recommended by the manufacturer using a NimbleGen microarray workflow (Roche): 1 μg of DNA from each strain was fluorescently labelled with Cy3 Random Nonamers using the NimbleGen One-Color DNA Labeling kit, samples were hybridised to the microarray slide using the NimbleGen Hybridization System, slides were washed using the NimbleGen Wash Buffer kit, and CGH data was acquired on a NimbleGen MS 200 Scanner. Normalisation and background correction of data was done by quantile RMA analysis using the ArrayStar software (DNASTAR). A cut-off of 512 was determined by plotting the frequency distribution of signal intensities for all strains. Genes with signal intensities of 512 or above were considered present (assigned 1) and genes with signal intensities below that value were considered absent (assigned −1) from a given strain.
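The presence/absence call described above is a simple threshold on the normalised signal intensity. A minimal Python sketch, assuming the normalised intensities of one strain are held in a dictionary keyed by gene; the function name and the example values are illustrative.

```python
CUTOFF = 512  # signal-intensity cut-off stated in the text


def call_presence(intensities, cutoff=CUTOFF):
    """Map each gene to 1 (present, signal >= cutoff) or -1 (absent)."""
    return {gene: (1 if signal >= cutoff else -1)
            for gene, signal in intensities.items()}


# Hypothetical intensities for three genes of one strain
print(call_presence({"geneA": 4000.0, "geneB": 512.0, "geneC": 130.5}))
```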

Validation of the microarray
The microarray used was designed based on the genome sequence of 10 pneumococcal strains: TIGR4, R6, D39, BHN100, CBR206, LGST215, BHN191, BHN418, Sp14-BS69, and Sp3-BS71 [51][52][53][54][55][56][57][58]. Triplicates of probes representing genes present in these strains were added sequentially resulting in 3,052 non-redundant ORFs. Nine of the 10 strains represented in the array were hybridised with it for validation. Only 16 of 3,052 (0.52%) ORFs present in the microarray gave false negative results (Additional file 6). Most of these genes encoded hypothetical proteins or mobile elements that might have been lost (during repeated handling). None of the 16 genes was part of the core genome, related to virulence, or located in ARs.

ARs
The presence of ARs (or regions of diversity) previously identified (reviewed in [19]) was investigated for NT strains. New ARs were identified as defined by Tettelin and Hollingshead: three or more contiguous genes in the TIGR4 genome that were absent from at least one of the analysed strains [59]. Classification of new ARs followed the nomenclature proposed by Blomberg, et al. and was done sequentially [59].
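The AR definition above (three or more contiguous TIGR4 genes absent from at least one strain) corresponds to scanning the reference gene order for runs of absent calls. The following is a minimal Python sketch for a single strain, assuming genes are listed in TIGR4 chromosomal order and calls use the 1/-1 coding; the function name and example locus tags are illustrative and it is not the procedure actually used by the authors.

```python
def absent_runs(gene_order, calls, min_len=3):
    """Return runs of at least `min_len` contiguous absent genes.

    `gene_order` lists genes in reference (TIGR4) order;
    `calls` maps gene -> 1 (present) or -1 (absent) for one strain.
    """
    runs, current = [], []
    for gene in gene_order:
        if calls[gene] == -1:
            current.append(gene)
        else:
            if len(current) >= min_len:
                runs.append(current)
            current = []
    if len(current) >= min_len:  # run reaching the end of the gene list
        runs.append(current)
    return runs


# Hypothetical locus tags and calls, for illustration only
order = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
calls = {"g1": 1, "g2": -1, "g3": -1, "g4": 1,
         "g5": -1, "g6": -1, "g7": -1, "g8": 1}
print(absent_runs(order, calls))
```

Applying the scan to every strain and merging the resulting runs gives candidate regions absent from at least one strain.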

Detection and characterisation of genes by PCR
The presence of genes comC, comD, and piaA and the presence of type-I and type-II pili was assessed by PCR and characterised by sequencing when needed. The presence of comC and comD was assessed as previously described [40,60]; the presence of piaA was assessed as described by Whalan, et al. [61], and the presence of type-I and type-II pili as described by Zahner, et al. [62].

Prophage detection by Southern hybridisation of PFGE restriction profiles with a lytA probe
Preparation of chromosomal DNA, digestion with SmaI endonuclease, and separation of DNA fragments by PFGE were carried out as previously described [63]. Southern blotting of PFGE gels with a probe for the lytA gene was performed as previously described [36].