The variome of pneumococcal virulence factors and regulators
BMC Genomics volume 19, Article number: 10 (2018)
In recent years, the idea of a highly immunogenic protein-based vaccine to combat Streptococcus pneumoniae and its severe invasive infectious diseases has gained considerable interest. However, the target proteins to be included in a vaccine formulation have to meet several genetic and immunological criteria (such as conservation, distribution, immunogenicity and protective effect) to ensure their suitability and effectiveness. This study aimed to gain comprehensive insights into the genomic organization, population distribution and genetic conservation of all pneumococcal surface-exposed proteins, genetic regulators and other virulence factors whose important function and role in pathogenesis have been demonstrated or hypothesized.
After retrieving the complete set of DNA and protein sequences reported in the databases GenBank, KEGG, VFDB, P2CS and UniProt for pneumococcal strains whose genomes have been fully sequenced and annotated, a comprehensive bioinformatic analysis and systematic comparison were performed for each virulence factor, stand-alone regulator and two-component regulatory system (TCS) encoded in the pan-genome of S. pneumoniae. A total of 25 S. pneumoniae strains, representing different pneumococcal phylogenetic lineages and serotypes, were considered. A set of 92 different genes and proteins was identified, classified and studied to construct a pan-genomic variability map (variome) for S. pneumoniae. Both pneumococcal virulence factors and regulatory genes were well distributed in the pneumococcal genome and exhibited a conserved feature of genome organization, in which replication and transcription are co-oriented. The analysis of the population distribution of each gene and protein showed that 49 of them are part of the core genome in pneumococci, while 43 belong to the accessory genome. Estimating the genetic variability revealed that pneumolysin, enolase and Usp45 (SP_2216 in S. p. TIGR4) are the pneumococcal virulence factors with the highest conservation, while TCS08, TCS05 and TCS02 represent the most conserved pneumococcal genetic regulators.
The results identified well-distributed and highly conserved pneumococcal virulence factors as well as regulators, representing promising candidates for a new generation of serotype-independent protein-based vaccine(s) to combat pneumococcal infections.
Streptococcus pneumoniae, also known as the pneumococcus, is a Gram-positive, α-hemolytic and facultative aerobic bacterium. This microorganism is normally found as a harmless commensal in the upper respiratory tract of humans. Pneumococci are of great epidemiological importance due to their high impact on public health, causing more than one and a half million deaths per year worldwide. S. pneumoniae is the main etiologic agent of community-acquired pneumonia. However, this is not its only clinical manifestation: other diseases such as otitis media, sinusitis, septicemia and meningitis are also caused by this pathogen and are associated with high mortality rates.
Given the particular biochemical and molecular features of Streptococcus pneumoniae (a Gram-positive, catalase-negative, optochin-sensitive and bile-soluble bacterium), its identification in the laboratory is relatively simple. Nevertheless, the great molecular, biochemical and immunological diversity of its capsule and other antigens, such as choline-binding proteins, makes it one of the hardest bacterial pathogens to combat [3, 4]. The “Quellung reaction”, developed over 100 years ago by Neufeld, allows the specific and reliable identification of each of the more than 94 serotypes discovered to date. The capsular polysaccharide is the sine qua non virulence factor; however, the pathogenic potential of serotypes may vary, and their prevalence likewise varies from one geographic region to another. Despite this, the capsule is not the only factor required by S. pneumoniae to induce disease. In fact, the surface of the pneumococcus is decorated with various proteins that have already been associated with its high pathogenic potential. In addition, their interaction with host cellular receptors has been demonstrated, and they fulfill crucial pathogenic functions such as adhesion, colonization, breaching of tissue barriers and immune evasion.
An important group of regulatory proteins of great interest are the histidine kinases (HK), which are located on the bacterial surface and function as the sensors of two-component regulatory systems (TCS). The sensing of environmental signals via TCS regulates the expression of genes for important cellular processes such as natural competence, antibiotic resistance, adaptation to different environmental conditions, expression of surface proteins, and others [7, 8]. In general, a TCS is composed of a histidine kinase, a membrane protein that senses extracellular signals, and a cytoplasmic regulator/effector protein referred to as the response regulator (RR), to which these signals are transmitted. Signal transmission occurs via autophosphorylation of the HK and a subsequent trans-phosphorylation of the RR. In Streptococcus pneumoniae, 13 TCS and one orphan RR have been identified.
The relevance of the cellular, physiological and pathogenic functions that these pneumococcal proteins fulfill has aroused great scientific and biotechnological interest, given their potential pharmaceutical application as vaccine candidates. Nowadays, antibiotic treatment of pneumococcal infections is often complicated by increasing antibiotic resistance. Furthermore, prevention with pneumococcal polysaccharide vaccines and/or pneumococcal conjugate vaccines only helps to control the disease caused by some of the serotypes and has an indirect impact on colonization. Thus, there is an urgent need to define more global and effective strategies for treatment and/or prevention to fight the pneumococcus and its local and invasive diseases. Consequently, the idea of a protein-based vaccine has gained great importance in recent years. However, in order to be considered for inclusion in a recombinant vaccine formulation, a bacterial protein has to fulfill specific criteria: (1) playing an important role in the fitness and/or pathogenesis of S. pneumoniae, (2) being widely distributed among circulating strains and clinical isolates, (3) exhibiting a high degree of conservation at the gene and protein sequence level, (4) being immunogenic, (5) conferring protection in experimental assays, and (6) having favorable physico-chemical properties for the expression and purification of its recombinant products.
Streptococcus pneumoniae is a pathogen exhibiting fratricide behavior and an enormous capacity for natural competence, acquiring foreign genetic material and integrating it into its genome. These processes, in addition to the mutation rates [12, 13], greatly stimulate horizontal gene transfer with other microorganisms and explain pneumococcal genetic variability and genome plasticity [14, 15]. This model of pneumococcal population evolution, where recombination greatly outpaces mutation, is also driven by the relatively high number of repetitive sequences in the genome, which facilitate the incorporation of foreign DNA into the chromosome [15,16,17,18]. As a consequence, these events contribute to structural reorganizations and influence the presence or absence of protein-encoding genes in different subsets of the global pneumococcal population, making them highly heterogeneous from the core- and pan-genomic point of view. Likewise, the generation and fixation of particular changes in the genome affect the mutation rates, which in turn influence the evolution and conservation of genes and contribute to adaptive changes that potentially lead to increased virulence and a more complex interaction with the host.
Due to these molecular events and their importance, there is a need to fully and globally understand the genetic heterogeneity and variability among the different pneumococcal strains/serotypes (variome), and to gain a deeper and more detailed molecular understanding of the different physiological and pathogenic mechanisms that this microorganism uses to cause severe and life-threatening diseases. Obtaining this knowledge will make it possible to identify potential pharmaceutical targets for new antimicrobial therapies. Determining the degree of conservation and distribution of proteins among pneumococcal strains will, in turn, confirm candidates for vaccines. However, despite the availability of a large number of completely sequenced genomes and the importance of analysing the genetic differences among pneumococci, only a few studies have focused on its variability from a global perspective, as the human variome databases do. To date, only the “Microbial Variome Database”, which organizes the available information on the variome of the two Gram-negative bacterial species Escherichia coli and Salmonella enterica, provides such information for microorganisms. Remarkably, there are no open-source data of this nature for any Gram-positive bacterial genome. Hence, this study focused on the construction of the first S. pneumoniae variome model, starting with the identification of all allelic and protein variants and a mutation and distribution (presence and absence) analysis of the virulence factors and regulators among a set of pneumococcal strains that possess a fully sequenced and annotated genome.
Definition of the study population set and determination of the optimal representation of the entire population of pneumococci
The search for and selection of the Streptococcus pneumoniae strains analysed in this study was done using the microbial genome database of the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov/genome). In order to ensure an optimal representation of the global pneumococcal population, a genomic BLAST of 8290 available S. pneumoniae genomes was carried out. In brief, DNA alignments were performed for all currently reported draft or completely sequenced genomes using the tool “Microbial Nucleotide BLAST”, available at http://blast.ncbi.nlm.nih.gov/Blast.cgi. The comparative data were then used to construct a DNA-based phylogenetic tree (dendrogram) with the Genome Tree Report Tool of the NCBI (ncbi.nlm.nih.gov/genome/tree/176). Afterwards, the file containing the dendrogram constructed for the 8290 strains was downloaded from the NCBI database. Finally, the dendrogram file was viewed, analyzed and adapted to generate circular, slanted and/or rectangular cladograms using the online NCBI tool “Tree Viewer 1.17.0”, available at ncbi.nlm.nih.gov/projects/treeview (Fig. 1).
Definition of the virulence factors and two-component regulatory systems to be studied in S. pneumoniae
Genes and proteins widely known as virulence factors, or genes encoding factors with a proven interaction with the human host, were searched for and selected by exhaustive bioinformatic screening of the “Virulence Factors DataBase - VFDB”, available at http://www.mgc.ac.cn/VFs. Additionally, the virulence factors and proteins involved in interactions with the host were confirmed and completed by a systematic review of the literature [14, 25]. The common name of each selected virulence factor was then entered in the UniProt database, available at http://www.uniprot.org/, to obtain its locus tag in the S. pneumoniae TIGR4 genome. In addition, the genes encoding the HK or RR of the pneumococcal TCS were identified using the database Prokaryotic Two-Component Systems - P2CS, available at http://www.p2cs.org/index.php. The corresponding locus tags in the S. pneumoniae TIGR4 genome for each histidine kinase gene (hk) and response regulator gene (rr) were also retrieved from the same database.
Chromosomal localization of the virulence factor and two-component regulatory systems genes in S. pneumoniae
The chromosomal location of all the genes in the genome of S. pneumoniae TIGR4 was determined, and the genomic maps in linear or circular representation were constructed, using the software SnapGene® (GSL Biotech), available at http://www.snapgene.com. In brief, the studied genomes of S. pneumoniae were imported through their corresponding GenBank accession numbers (e.g. NC_003028.3 for TIGR4). Then, the chromosomal location was identified for each virulence factor gene, each gene for a factor involved in the interaction with the host, and each gene encoding a protein of a simple (stand-alone) or two-component regulatory system. Finally, the linear maps showing the to-scale genomic localization of the virulence factors and the circular maps showing the genomic surroundings of the genes that form the two-component regulatory systems were constructed.
Distribution of the virulence factors and two-component regulatory systems in the different strains of S. pneumoniae
The gene and protein sequences of interest for the comparative analysis were identified using as reference their locus tags in the genomes of S. pneumoniae TIGR4 and/or R6 in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database, available at http://www.kegg.jp/kegg/. Once every gene of interest had been located in the database, a series of BLAST comparisons was performed using GenomeNet, available at http://www.genome.jp/, considering only the fully sequenced and annotated genomes of S. pneumoniae. For the nucleotide sequences, the search was performed with the program BLASTN 2.2.29+, which performs nucleotide vs nucleotide alignments based on a BLOSUM62 scoring matrix [23, 30]. In the same way, the amino acid sequences were searched with the program BLASTP 2.2.29+ [31, 32], which performs amino acid vs amino acid alignments based on a similar matrix. Once the BLAST was completed for each virulence factor, the hit list was filtered using an expectation value (e-value) of 0 as the selection criterion. Hits with an e-value > 0 were included only after direct visual inspection of the alignments confirmed that they were indeed the same sequence. Once the list of genes and proteins fulfilling the selection criteria was defined, the S. pneumoniae strains to which they belong were determined. All DNA and protein sequences were downloaded and stored in an organized way in FASTA format.
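The filtering step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' procedure: it assumes the BLAST results were exported in the 12-column tab-separated layout of BLAST+ (`-outfmt 6`), whereas the study used the GenomeNet web interface, and all identifiers and values in `example` are invented.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    identity: float
    evalue: float


def parse_hits(tabular_text):
    """Parse BLAST tabular lines (assumed: 12 tab-separated standard columns)."""
    hits = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        # Column 1 = query, 2 = subject, 3 = % identity, 11 = e-value.
        hits.append(Hit(cols[0], cols[1], float(cols[2]), float(cols[10])))
    return hits


def triage(hits):
    """Accept hits with e-value == 0; flag all others for visual inspection."""
    accepted = [h for h in hits if h.evalue == 0.0]
    needs_review = [h for h in hits if h.evalue > 0.0]
    return accepted, needs_review


# Hypothetical example data (not taken from the study).
example = "\n".join([
    "query_gene\tstrainA_0001\t99.8\t1416\t3\t0\t1\t1416\t1\t1416\t0.0\t2600",
    "query_gene\tstrainB_0042\t91.2\t310\t27\t0\t1\t310\t5\t314\t2e-120\t430",
])
accepted, needs_review = triage(parse_hits(example))
```

In this sketch the second hit is not discarded; it is only set aside, mirroring the manual inspection of alignments with an e-value above 0.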
Genetic variability (variome) of the virulence factors and two-component regulatory systems among the different pneumococcal strains
The multiple comparative alignments of pneumococcal sequences were done using the web tool MultAlin, available at http://multalin.toulouse.inra.fr/multalin/, with a 1–0 identity matrix so that even the slightest change in the nucleotide or amino acid sequences was penalized, covering substitutions, deletions, insertions and variations in length. From these analyses, the number of allelic and protein variants was determined for each gene according to the registry value assigned by the program to each sequence, where identical sequences have the same registry value while different sequences have different values. The results of the alignments were manually curated and stored for further analysis. Finally, the total, synonymous and nonsynonymous mutations were determined using the software DnaSP V.5.1 [34, 35], available at http://www.ub.edu/dnasp/. All the sequences found for a given gene were loaded into the program, and the calculations were performed for each of these types of mutation.
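The two ideas in this step, numbering variants so that identical sequences share a value and separating synonymous from nonsynonymous changes, can be illustrated with a minimal Python sketch. It is a deliberate simplification and does not reproduce the algorithms of MultAlin or DnaSP: sequences are compared by exact string identity, codons are compared pairwise between two in-frame sequences of equal length, and the toy alleles are invented.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def assign_variants(sequences):
    """Give identical sequences the same variant number; any change is a new variant."""
    seen = {}
    labels = {}
    for name, seq in sequences.items():
        key = seq.upper()
        if key not in seen:
            seen[key] = len(seen) + 1
        labels[name] = seen[key]
    return labels


def translate(dna):
    """Translate an in-frame DNA sequence, ignoring a trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def classify_codon_changes(ref, alt):
    """Count differing codons as synonymous or nonsynonymous (simplified)."""
    if len(ref) != len(alt) or len(ref) % 3:
        raise ValueError("sequences must be in frame and of equal length")
    synonymous = nonsynonymous = 0
    for i in range(0, len(ref), 3):
        a, b = ref[i:i + 3].upper(), alt[i:i + 3].upper()
        if a == b:
            continue
        if CODON_TABLE[a] == CODON_TABLE[b]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous


# Hypothetical alleles of one gene in four strains (not real data).
alleles = {
    "strain1": "ATGGCTAAA",
    "strain2": "ATGGCTAAA",
    "strain3": "ATGGCCAAA",  # GCT -> GCC, both alanine
    "strain4": "ATGGCTGAA",  # AAA -> GAA, lysine -> glutamate
}
allele_labels = assign_variants(alleles)
protein_labels = assign_variants({s: translate(d) for s, d in alleles.items()})
n_allelic_variants = len(set(allele_labels.values()))
n_protein_variants = len(set(protein_labels.values()))
```

The toy data show why a gene can have more allelic variants than protein variants: the synonymous change in `strain3` creates a new allele but leaves the protein unchanged.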
Results and discussion
Hundreds to thousands of S. pneumoniae strains and clinical isolates recovered from the nasopharynx, blood or cerebrospinal fluid (CSF) have been included to date in genomic sequencing projects worldwide. However, this study focuses on pneumococcal strains whose genomes are fully sequenced, annotated and publicly available. Therefore, a set of 25 pneumococcal strains was selected from the NCBI database as the study population to perform the bioinformatic analysis needed to construct the variome of the virulence factors and two-component regulatory systems of Streptococcus pneumoniae (Table 1).
A variome model of the pneumococcal virulence factors and regulators is an intraspecific study that aims to highlight variable genetic loci in the genome of Streptococcus pneumoniae. A perfect and ultimate variome model would be one constructed from 100% of the genomic information, correctly assessed, of the entire pneumococcal population. However, the current state of the art is far from this scenario, and an optimal representation of the pneumococcal sets assessed to date is appropriate to validate these genomic analyses. Currently, 8290 pneumococcal sequencing projects are reported as draft or complete genomes in the Genome Assembly and Annotation Report of the NCBI database. Therefore, a global genomic BLAST (DNA alignment) of those 8290 available S. pneumoniae genomes/strains was performed and a DNA-based phylogenetic tree was constructed using the Genome Tree Report Tool of the NCBI. The topology of this phylogenetic tree (slanted cladogram) showed different pneumococcal lineages, in which the selected set of 25 pneumococcal genomes/strains can be identified as well-distributed external nodes (highlighted in red), evidencing an optimal representation of the pneumococcal population (Fig. 1). In addition, it is important to highlight that the serotypes represented in this study population (1, 2, 3, 4, 5, 6B, 11A, 14, 19A, 19F and 23F) have been described as the pneumococcal types with the highest pathogenic potential, due to the high burden of invasive pneumococcal diseases (IPDs) they cause worldwide. For this reason, the majority of them (except serotypes 2 and 11A) have been included in the pneumococcal conjugate vaccines (PCVs) currently used for immunization.
An initial, considerable number of pneumococcal virulence factor genes was identified using the VFDB database. This database provided further detailed information to establish their function, pathogenic role and type of interaction with a receptor in the human host. Additionally, a systematic screening of the literature not only allowed the confirmation of the identified factors, but also made it possible to complement the list with additional factors that are not included in the databases. Likewise, the number of tcs genes (27) was determined using the database Prokaryotic 2-Component Systems - P2CS. In total, 92 different genes encoding 61 surface proteins, 4 stand-alone transcriptional regulators, 13 HKs and 14 RRs were selected and included in this work for the construction of the variome, after being classified by function and grouped according to their molecular mechanisms of surface exposure (Table 2).
The genomes of the 25 analyzed pneumococcal strains range in size from 2,024,476 bp in SPN034156 to 2,245,615 bp in Hungary 19A-6. Likewise, the G + C content varies between 39.50% in CGSP14 and 39.90% in SPN034156. Strain 670-6B has the highest number of genes (2430) and proteins (2352), and SPN034156 has the lowest number of genes (1956) and proteins (1799). Hence, the genomes can differ by up to 474 genes and 553 proteins. The overall number of genes for each pneumococcal genome evaluated here exceeds the overall number of proteins because the reported number of genes includes all the tRNA-, rRNA- and protein-encoding genes.
The pneumococcal virulence factor genes are distributed all along the pneumococcal genome (Fig. 2). Interestingly, these genes are located in a co-oriented manner relative to the origin of replication (oriC: 2,160,822–196). During the bidirectional replication of the genome, gene transcription takes place simultaneously. Hence, for genes oriented in the opposite direction to the corresponding replication fork, the two molecular machineries run into a head-on collision that might affect at least one of the processes. For replication, this phenomenon implies genomic instability, while gene transcription is probably inefficient. Previous studies have shown that essential and highly constitutively expressed genes are co-oriented. In the pneumococcus, 30 of the 36 virulence factor genes localized in the first half of the genome are on the forward strand and co-oriented with the clockwise-moving replication fork. Similarly, 21 of the 27 virulence factor genes localized in the second half of the genome are on the reverse strand and co-oriented with the replication fork moving anti-clockwise (Fig. 2). A similar genome organization is observed for the 27 genes that encode the TCSs in S. pneumoniae, where only one operon, the tcs04 genes (TCS04), is not co-oriented with the replication fork (Fig. 3). These data reinforce the idea that the virulence factor genes and the tcs genes are highly important for the pneumococcal interaction with the human host and for its pathogenic potential in processes such as adherence, colonization, invasion, immune evasion, fitness, antibiotic resistance and natural competence (Table 2).
The analysis of the distribution of genes associated with virulence and host-pathogen interactions among the studied pneumococcal strains revealed that only 26 of the 65 genes considered here are present in all 25 strains. These genes encode products involved in different functions, such as cell wall hydrolysis, ABC transporters and structural proteins involved in adherence to host tissue, the so-called adhesins. Interestingly, after preliminary inspection (by locus tag, identifier names and/or product sizes) of the datasets and supplementary material reported by van Tonder and colleagues in 2017, only a few of the pneumococcal virulence factors (PspC, KsgA, and 4 hypothetical lipoproteins) and regulators (RR04, HK08, RR08, RR09, RR10) were found in the pneumococcal “supercore” genomic list of 303 genes, which is based on the analysis of 3121 pneumococci recovered from healthy individuals from four different subsets of the global pneumococcal population. These findings, if confirmed by deeper sequence-based analysis of the datasets, may indicate that pneumococcal pathogenesis is a much more complex process than previously thought. While most of the genes have a single copy in the genome, the lytA gene, encoding the major pneumococcal autolysin, is also found in two and even three copies in 13 and 2 strains, respectively. This is most likely due to the multiple integration of prophages in the chromosomal DNA (Table 3). In strain SPNA45, the gene gnd, encoding the enzyme 6-phosphogluconate dehydrogenase, is duplicated and fused with a second copy of its downstream neighbor gene, which encodes the orphan response regulator (rr14). The remaining 39 of the 65 virulence factor genes were found to belong to the accessory genome, being absent from a varying number of the 25 strains. Thus, none of these genes is essential, but they are beneficial for fitness and pathogenesis.
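The co-orientation rule described above can be written as a small Python function. The sketch rests on assumptions that are not taken from the study: a circular chromosome replicated bidirectionally from the origin, a terminus placed directly opposite the origin unless one is supplied, a hypothetical genome length, and invented gene coordinates. Only the counts in the final tally come from the text.

```python
def is_co_oriented(gene_start, strand, genome_length, ori=0, terminus=None):
    """Return True if a gene is transcribed in the direction of its replication fork.

    Assumptions (not from the study): circular chromosome, bidirectional
    replication from `ori`, terminus opposite the origin unless given.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if terminus is None:
        terminus = (ori + genome_length // 2) % genome_length
    # Clockwise distance from the origin to the gene and to the terminus.
    gene_offset = (gene_start - ori) % genome_length
    ter_offset = (terminus - ori) % genome_length
    on_clockwise_replichore = gene_offset < ter_offset
    # Forward-strand genes follow the clockwise fork; reverse-strand genes
    # follow the anti-clockwise fork.
    return (strand == "+") == on_clockwise_replichore


GENOME_LENGTH = 2_000_000  # hypothetical length, for illustration only
genes = [
    ("geneA", 100_000, "+"),    # first half, forward strand
    ("geneB", 1_500_000, "-"),  # second half, reverse strand
    ("geneC", 1_500_000, "+"),  # second half, forward strand (head-on)
]
orientation = {
    name: is_co_oriented(start, strand, GENOME_LENGTH)
    for name, start, strand in genes
}

# Tally using the counts reported in the text for the virulence factor genes.
co_oriented = 30 + 21
total = 36 + 27
fraction = co_oriented / total
```

With the counts reported in the text, 51 of the 63 mapped virulence factor genes (about 81%) are co-oriented with their replication fork.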
Striking examples are the genes encoding the Pilus-1 and Pilus-2 structures, which have been shown to mediate adherence, contribute to virulence and promote invasion [38,39,40,41,42]. These genes are located on pathogenicity islands (PAI), which also contain the genes required for cell surface anchoring and regulation [38,39,40,41]. Remarkably, strains like ST556, Taiwan19F-14 and TCH8431/19A were found here to be positive for both types of pili (1 and 2). The other genes with restricted presence in some strains encode sortase-anchored proteins or choline-binding proteins (CBPs), as well as histidine triad proteins (pht genes). These gene products are associated with different processes of bacterial fitness and pathogenesis (Tables 2 and 3) [6, 43, 44]. Regarding the TCS, most of them were found in all 25 pneumococcal strains. Exceptions are TCS07 and TCS12, which contribute to fitness and competence, respectively [7, 45]. These TCS are absent in a couple of strains (Table 4). In some other strains, genes like hk01, hk12 and rr04 presented incomplete sequences, an artefact leading to truncated and hence non-functional proteins/regulators (Tables 2 and 3). Interestingly, only the genes hk08, rr08, rr09, rr06 and rr04 were found to belong to the “supercore” genomic set of genes reported by van Tonder et al. in 2017, indicating the important role these highly conserved and well-distributed regulatory proteins play in the pneumococcus and in its interplay with the environment.
The estimation of the variability of each individual virulence factor and pneumococcal regulator (at the DNA and protein level) allowed the construction of a partial variome for the 25 analysed pneumococcal strains. Briefly, the variome takes into consideration (1) the presence, absence or number of copies of genes in the different strains, (2) the number of total, synonymous and nonsynonymous mutations, and (3) the number of allelic and protein variants explaining the variability of each factor. The results summarized in Tables 5 and 6 contain the data for the genes and proteins associated with virulence and host-pathogen interaction, and also the data for the stand-alone and TCS regulators. Specifically, the factors identified with the best distribution and highest evolutionary conservation were (1) the ply gene, encoding the sole pneumococcal cytolysin and cytotoxin pneumolysin, (2) the enolase gene, which encodes the enzyme enolase (2-phosphoglycerate dehydratase) and has an essential function in metabolism, but whose product also interacts specifically with plasmin(ogen) and is therefore involved in fibrinolytic processes, adherence and virulence, and (3) the pcsB (Usp45) gene, which encodes a 45-kDa secreted and immunogenic protein that is involved in cell division and stress response. As for the mutations, these three proteins presented a small number of changes in comparison with the other proteins analyzed. The variome of the TCS (Table 6) led to the conclusion that the most conserved genes from the evolutionary point of view are the genes hk05 and rr05 of ciaR/H (tcs05). The TCS CiaRH is involved in resistance to cefotaxime, regulation of genetic competence and increased pathogenicity in the respiratory tract in murine models [7, 49, 50]. Meanwhile, hk02 and rr02 (WalR/K, MicA/B or VicR/K) have been associated with resistance to erythromycin and are essential for bacterial growth.
Nevertheless, this essentiality was shown to be due to a member of its regulon (pcsB), and the TCS was no longer essential upon ectopic expression of PcsB [7, 48]. Pneumococcal TCS08 is involved in the genetic regulation of pilus-1. The mutation analysis showed that the response regulators exhibited a lower rate of variation than the histidine kinases, with rr05, rr02, rr06 and rr08 being the most conserved response regulators. All the results obtained in this study support the global idea of a new generation of protein-based and serotype-independent vaccines for Streptococcus pneumoniae. The basis is the high degree of distribution and conservation of the virulence proteins in combination with the importance of their functions and immunogenic capacities. This probably makes them ideal pharmacological targets to treat the pneumococcus and its diseases. This might be an alternative to immunization with the conjugated serotypes, or represent a strategy to combine immunogenic and highly conserved proteins with capsular polysaccharides to generate a serotype-independent immune response.
The construction of this “low-scale” variome model for the virulence factors and regulators of Streptococcus pneumoniae was achieved from 25 pneumococcal strains with fully sequenced and annotated genomes. According to the molecular phylogenetic analysis performed on the NCBI website, this selected set of pneumococcal genomes ensured an optimal representation of the pneumococcal population (8290 strains) reported in the NCBI database to date. Similarly, this study population also represented an important group of highly pathogenic pneumococcal serotypes (1, 2, 3, 4, 5, 6B, 11A, 14, 19A, 19F and 23F), which (except serotypes 2 and 11A) are included in the current pneumococcal conjugate vaccine formulations used to prevent pneumococcal infections. A total of 92 different genes and proteins were identified, classified and studied for the construction of the variome. The genes of the pneumococcal virulence factors and TCS are distributed along the genome and are located in such a manner that transcription is co-oriented with replication. The analysis of the gene distribution in this study population showed that 26 of the 65 virulence-associated genes were found in 100% of the 25 pneumococcal genomes/strains (core genome), while 39 are part of the flexible genome. The estimation of the variability of each individual virulence factor, stand-alone regulator or TCS indicated that the virulence factors with the lowest variability in the pneumococcus are pneumolysin, enolase and PcsB, while the regulators with the highest conservation are TCS05 (CiaR/H), TCS02 (VicR/K) and TCS08. Finally, all the results obtained here with the bioinformatic analysis constitute the first model to compare, visualize and understand the future flood of new genomic data on the genetic variation (in terms of gene presence/absence or mutation) of pneumococcal virulence factors and regulators [51,52,53].
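The distribution analysis described above amounts to classifying each gene from a table of copy numbers per strain. The following Python sketch shows one way to do this; it is not the authors' pipeline, and the strain names, gene names and copy numbers are placeholders rather than data from Tables 3 and 4.

```python
def classify_genes(copy_numbers, strains):
    """Split genes into core and accessory sets and list multi-copy genes.

    copy_numbers maps gene -> {strain: number of copies}; a strain missing
    from the inner dict is treated as lacking the gene.
    """
    core, accessory, multi_copy = [], [], []
    for gene, per_strain in sorted(copy_numbers.items()):
        counts = [per_strain.get(strain, 0) for strain in strains]
        if all(count > 0 for count in counts):
            core.append(gene)       # present in every strain
        else:
            accessory.append(gene)  # absent from at least one strain
        if any(count > 1 for count in counts):
            multi_copy.append(gene)
    return core, accessory, multi_copy


# Hypothetical strains and copy numbers (placeholders, not study data).
strains = ["s1", "s2", "s3", "s4"]
copy_numbers = {
    "geneA": {"s1": 1, "s2": 1, "s3": 1, "s4": 1},  # single copy everywhere
    "geneB": {"s1": 1, "s3": 1},                    # missing in s2 and s4
    "geneC": {"s1": 1, "s2": 2, "s3": 3, "s4": 1},  # extra copies in some strains
}
core, accessory, multi_copy = classify_genes(copy_numbers, strains)
```

Keeping copy numbers rather than plain presence or absence lets the same table capture cases like the multi-copy lytA gene mentioned in the text.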
The applicability offered by this variome model, together with further population genomic analysis of pneumococci, will provide relevant information on potential targets for vaccines, supporting the idea of a new generation of protein-based formulations to combat Streptococcus pneumoniae and its disease burden.
BLAST: Basic local alignment search tool
BLOSUM: Blocks substitution matrix
CBPs: Choline-binding proteins
DnaSP: DNA sequence polymorphism
IPDs: Invasive pneumococcal diseases
KEGG: Kyoto encyclopedia of genes and genomes
Multiple sequence alignment
NCBI: National center for biotechnology information
Non-classical surface proteins
P2CS: Prokaryotic two-component systems
S. p.: Streptococcus pneumoniae
TCS: Two-component regulatory systems
UniProt: The universal protein resource
Variome: Pan-genomic variability map
VFDB: Virulence factors database
World Health Organization. The global burden of disease: 2004 update. Geneva: WHO; 2008.
Bridy-Pappas AE, Margolis MB, Center KJ, Isaacman DJ. Streptococcus pneumoniae: description of the pathogen, disease epidemiology, treatment, and prevention. Pharmacotherapy. 2005;25(9):1193–212.
Brueggemann AB, Griffiths DT, Meats E, Peto T, Crook DW, Spratt BG. Clonal relationships between invasive and carriage Streptococcus pneumoniae and serotype- and clone-specific differences in invasive disease potential. J Infect Dis. 2003;187(9):1424–32.
Johnson HL, Deloria-Knoll M, Levine OS, Stoszek SK, Freimanis Hance L, Reithinger R, Muenz LR, O'Brien KL. Systematic evaluation of serotypes causing invasive pneumococcal disease among children under five: the pneumococcal global serotype project. PLoS Med. 2010;7(10):1–13.
Jedrzejas MJ. Pneumococcal virulence factors: structure and function. Microbiol Mol Biol Rev. 2001;65(2):187–207.
Voss S, Gamez G, Hammerschmidt S. Impact of pneumococcal microbial surface components recognizing adhesive matrix molecules on colonization. Mol Oral Microbiol. 2012;27(4):246–56.
Throup JP, Koretke KK, Bryant AP, Ingraham KA, Chalker AF, Ge Y, Marra A, Wallis NG, Brown JR, Holmes DJ, et al. A genomic analysis of two-component signal transduction in Streptococcus pneumoniae. Mol Microbiol. 2000;35(3):566–76.
McCluskey J, Hinds J, Husain S, Witney A, Mitchell TJ. A two-component system that controls the expression of pneumococcal surface antigen A (PsaA) and regulates virulence and resistance to oxidative stress in Streptococcus pneumoniae. Mol Microbiol. 2004;51(6):1661–75.
Gamez G, Hammerschmidt S. Combat pneumococcal infections: adhesins as candidates for protein-based vaccine development. Curr Drug Targets. 2012;13(3):323–37.
Centers for Disease Control and Prevention. Active Bacterial Core Surveillance Report, Emerging Infections Program Network, Streptococcus pneumoniae. Atlanta: CDC; 2015.
Eldholm V, Johnsborg O, Straume D, Ohnstad HS, Berg KH, Hermoso JA, Havarstein LS. Pneumococcal CbpD is a murein hydrolase that requires a dual cell envelope binding specificity to kill target cells during fratricide. Mol Microbiol. 2010;76(4):905–17.
Donkor ES. Understanding the pneumococcus: transmission and evolution. Front Cell Infect Microbiol. 2013;3:7.
Feil EJ, Smith JM, Enright MC, Spratt BG. Estimating recombinational parameters in Streptococcus pneumoniae from multilocus sequence typing data. Genetics. 2000;154(4):1439–50.
Donati C, Hiller NL, Tettelin H, Muzzi A, Croucher NJ, Angiuoli SV, Oggioni M, Dunning Hotopp JC, Hu FZ, Riley DR, et al. Structure and dynamics of the pan-genome of Streptococcus pneumoniae and closely related species. Genome Biol. 2010;11(10):R107.
van Tonder AJ, Bray JE, Jolley KA, Quirk SJ, Haraldsson G, Maiden MCJ, Bentley SD, Haraldsson A, Erlendsdottir H, Kristinsson KG, et al. Heterogeneity among estimates of the core genome and pan-genome in different pneumococcal populations. bioRxiv. 2017. https://doi.org/10.1101/133991.
Aras RA, Kang J, Tschumi AI, Harasaki Y, Blaser MJ. Extensive repetitive DNA facilitates prokaryotic genome plasticity. Proc Natl Acad Sci U S A. 2003;100(23):13579–84.
Chewapreecha C, Harris SR, Croucher NJ, Turner C, Marttinen P, Cheng L, Pessia A, Aanensen DM, Mather AE, Page AJ, et al. Dense genomic sampling identifies highways of pneumococcal recombination. Nat Genet. 2014;46(3):305–9.
Croucher NJ, Harris SR, Fraser C, Quail MA, Burton J, van der Linden M, McGee L, von Gottberg A, Song JH, Ko KS, et al. Rapid pneumococcal evolution in response to clinical interventions. Science. 2011;331(6016):430–4.
Sokurenko EV, Gomulkiewicz R, Dykhuizen DE. Source-sink dynamics of virulence evolution. Nat Rev Microbiol. 2006;4(7):548–55.
Ring HZ, Kwok PY, Cotton RG. Human Variome project: an international collaboration to catalogue human genetic variation. Pharmacogenomics. 2006;7(7):969–72.
Chattopadhyay S, Taub F, Paul S, Weissman SJ, Sokurenko EV. Microbial variome database: point mutations, adaptive or not, in bacterial core genomes. Mol Biol Evol. 2013;30(6):1465–70.
Tatusova TA, Karsch-Mizrachi I, Ostell JA. Complete genomes in WWW Entrez: data representation and analysis. Bioinformatics. 1999;15(7–8):536–43.
Zhang Z, Schwartz S, Wagner L, Miller W. A greedy algorithm for aligning DNA sequences. J Comput Biol. 2000;7(1–2):203–14.
Chen L, Yang J, Yu J, Yao Z, Sun L, Shen Y, Jin Q. VFDB: a reference database for bacterial virulence factors. Nucleic Acids Res. 2005;33(Database issue):D325–8.
Engel P, Goepfert A, Stanger FV, Harms A, Schmidt A, Schirmer T, Dehio C. Adenylylation control by intra- or intermolecular active-site obstruction in Fic proteins. Nature. 2012;482(7383):107–10.
Apweiler R, Bairoch A, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, et al. UniProt: the universal protein knowledgebase. Nucleic Acids Res. 2004;32(Database issue):D115–9.
Barakat M, Ortet P, Jourlin-Castelli C, Ansaldi M, Mejean V, Whitworth DE. P2CS: a two-component system resource for prokaryotic signal transduction research. BMC Genomics. 2009;10:315.
Kanehisa M, Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000;28(1):27–30.
Kanehisa M. Linking databases and organisms: GenomeNet resources in Japan. Trends Biochem Sci. 1997;22(11):442–4.
Morgulis A, Coulouris G, Raytselis Y, Madden TL, Agarwala R, Schaffer AA. Database indexing for production MegaBLAST searches. Bioinformatics. 2008;24(16):1757–64.
Schaffer AA, Aravind L, Madden TL, Shavirin S, Spouge JL, Wolf YI, Koonin EV, Altschul SF. Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. Nucleic Acids Res. 2001;29(14):2994–3005.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25(17):3389–402.
Corpet F. Multiple sequence alignment with hierarchical clustering. Nucleic Acids Res. 1988;16(22):10881–90.
Librado P, Rozas J. DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics. 2009;25(11):1451–2.
Sokurenko EV, Feldgarden M, Trintchina E, Weissman SJ, Avagyan S, Chattopadhyay S, Johnson JR, Dykhuizen DE. Selection footprint in the FimH adhesin shows pathoadaptive niche differentiation in Escherichia coli. Mol Biol Evol. 2004;21(7):1373–83.
Srivatsan A, Tehranchi A, MacAlpine DM, Wang JD. Co-orientation of replication and transcription preserves genome integrity. PLoS Genet. 2010;6(1):e1000810.
Morales M, Garcia P, de la Campa AG, Linares J, Ardanuy C, Garcia E. Evidence of localized prophage-host recombination in the lytA gene, encoding the major pneumococcal autolysin. J Bacteriol. 2010;192(10):2624–32.
van Kooyk Y, Geijtenbeek TB. DC-SIGN: escape mechanism for pathogens. Nat Rev Immunol. 2003;3(9):697–709.
Figueira M, Moschioni M, De Angelis G, Barocchi M, Sabharwal V, Masignani V, Pelton SI. Variation of pneumococcal Pilus-1 expression results in vaccine escape during experimental Otitis media [EOM]. PLoS One. 2014;9(1):e83798.
Soriani M, Telford JL. Relevance of pili in pathogenic streptococci pathogenesis and vaccine development. Future Microbiol. 2010;5(5):735–47.
Song XM, Connor W, Hokamp K, Babiuk LA, Potter AA. The growth phase-dependent regulation of the pilus locus genes by two-component system TCS08 in Streptococcus pneumoniae. Microb Pathog. 2009;46(1):28–35.
Iovino F, Hammarlöf DL, Garriss G, Brovall S, Nannapaneni P, Henriques-Normark B. Pneumococcal meningitis is promoted by single cocci expressing pilus adhesin RrgA. J Clin Invest. 2016;126(8):2821–6.
AlonsoDeVelasco E, Verheul AF, Verhoef J, Snippe H. Streptococcus pneumoniae: virulence factors, pathogenesis, and vaccines. Microbiol Rev. 1995;59(4):591–603.
Blue CE, Paterson GK, Kerr AR, Berge M, Claverys JP, Mitchell TJ. ZmpB, a novel virulence factor of Streptococcus pneumoniae that induces tumor necrosis factor alpha production in the respiratory tract. Infect Immun. 2003;71(9):4925–35.
Martin B, Granadel C, Campo N, Henard V, Prudhomme M, Claverys JP. Expression and maintenance of ComD-ComE, the two-component signal-transduction system that controls competence of Streptococcus pneumoniae. Mol Microbiol. 2010;75(6):1513–28.
Shak JR, Ludewick HP, Howery KE, Sakai F, Yi H, Harvey RM, Paton JC, Klugman KP, Vidal JE. Novel role for the Streptococcus pneumoniae toxin pneumolysin in the assembly of biofilms. MBio. 2013;4(5):e00655–13.
Bergmann S, Schoenen H, Hammerschmidt S. The interaction between bacterial enolase and plasminogen promotes adherence of Streptococcus pneumoniae to epithelial and endothelial cells. Int J Med Microbiol. 2013;303(8):452–62.
Ng WL, Robertson GT, Kazmierczak KM, Zhao J, Gilmour R, Winkler ME. Constitutive expression of PcsB suppresses the requirement for the essential VicR (YycF) response regulator in Streptococcus pneumoniae R6. Mol Microbiol. 2003;50(5):1647–63.
Sebert ME, Patel KP, Plotnick M, Weiser JN. Pneumococcal HtrA protease mediates inhibition of competence by the CiaRH two-component signaling system. J Bacteriol. 2005;187(12):3969–79.
Muller M, Marx P, Hakenbeck R, Bruckner R. Effect of new alleles of the histidine kinase gene ciaH on the activity of the response regulator CiaR in Streptococcus pneumoniae R6. Microbiology. 2011;157(Pt 11):3104–12.
Gámez G, Castro F, Gómez-Mejia A, Gallego M, Bedoya A, Hammerschmidt S. Bioinformatic analysis and construction of the variome of the virulence factors and genetic regulators in Streptococcus pneumoniae. In: Annual Conference of the Association for General and Applied Microbiology (VAAM). Marburg, Germany: Biospektrum; 2015.
Castro AF, Gómez-Mejia A, Gallego M, Bedoya A, Hammerschmidt S, Gámez GA. Variome of the Pneumococcal Surface-Exposed Proteins and other Virulence Factors: A Bioinformatics Analysis. [Abstract EuroPneumo-P1.27]. Pneumonia. 2015;7:17.
Gámez GA, Castro AF, Gómez-Mejia A, Gallego M, Bedoya A, Hammerschmidt S. Análisis Bioinformático y Construcción del Varioma de los Factores de Virulencia y Sistemas de Regulación por Dos-Componentes de Streptococcus pneumoniae. [Abstract 3rd Colombian Congress on Computational Biology and Bioinformatics-CCBCOL3]. Medellín - Colombia; 2015, Oral Presentation 129.
The authors thank Prof. Vanessa Cienfuegos, School of Microbiology, University of Antioquia, for her critical evaluation of this research work and manuscript. We also thank the peer reviewers for their critical review of the manuscript; their suggestions and comments significantly improved the quality of this work.
Funding for this research was provided by the Committee for Development of Research at the University of Antioquia (CODI, CIEMB-097-13) in Colombia, and the DFG GRK 1870/1 (Bacterial Respiratory Infections) in Germany. Neither funding source had any involvement in the design of this study, in the collection, analysis and interpretation of data, in the writing of this manuscript, or in the decision to submit this article for publication.
Availability of data and materials
The sequence data supporting the findings of this study were previously published and were retrieved from GenBank (accession numbers are provided in Table 1). All data generated by the bioinformatic analyses are included in this published article. Supplementary raw files (mainly DNA and protein sequence comparisons) are available from the corresponding author on reasonable request.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests in relation to this research work and manuscript.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Gámez, G., Castro, A., Gómez-Mejia, A. et al. The variome of pneumococcal virulence factors and regulators. BMC Genomics 19, 10 (2018). https://doi.org/10.1186/s12864-017-4376-0