Study of inter- and intra-individual variations in the salivary microbiota
© Lazarevic et al. 2010
Received: 7 May 2010
Accepted: 28 September 2010
Published: 28 September 2010
Skip to main content
© Lazarevic et al. 2010
Received: 7 May 2010
Accepted: 28 September 2010
Published: 28 September 2010
Oral bacterial communities contain species that promote health and others that have been implicated in oral and/or systemic diseases. Culture-independent approaches provide the best means to assess the diversity of oral bacteria because most of them remain uncultivable.
The salivary microbiota from five adults was analyzed at three time-points by means of the 454 pyrosequencing technology. The V1-V3 region of the bacterial 16S rRNA genes was amplified by PCR using saliva lysates and broad-range primers. The bar-coded PCR products were pooled and sequenced unidirectionally to cover the V3 hypervariable region. Of 50,708 obtained sequences, 31,860 passed the quality control. Non-bacterial sequences (2.2%) were removed leaving 31,170 reads. Samples were dominated by seven major phyla: members of Firmicutes, Proteobacteria, Actinobacteria, Bacteroidetes and candidate division TM7 were identified in all samples; Fusobacteria and Spirochaetes were identified in all individuals, but not at all time-points. The dataset was represented by 3,011 distinct sequences (100%-ID phylotypes) of ~215 nucleotides and 583 phylotypes defined at ≥97% identity (97%-ID phylotypes). We compared saliva samples from different individuals in terms of the phylogeny of their microbial communities. Based on the presence and absence of phylotypes defined at 100% or 97% identity thresholds, samples from each subject formed separate clusters. Among individual taxa, phylum Bacteroidetes and order Clostridiales (Firmicutes) were the best indicators of intraindividual similarity of the salivary flora over time. Fifteen out of 81 genera constituted 73 to 94% of the total sequences present in different samples. Of these, 8 were shared by all time points of all individuals, while 15-25 genera were present in all three time-points of different individuals. Representatives of the class Sphingobacteria, order Sphingobacteriales and family Clostridiaceae were found only in one subject.
The salivary microbial community appeared to be stable over at least 5 days, allowing for subject-specific grouping using UniFrac. Inclusion of all available samples from more distant time points (up to 29 days) confirmed this observation. Samples taken at closer time intervals were not necessarily more similar than samples obtained across longer sampling times. These results point to the persistence of subject-specific taxa whose frequency fluctuates between the time points. Genus Gemella, identified in all time-points of all individuals, was not defined as a core-microbiome genus in previous studies of salivary bacterial communities. Human oral microbiome studies are still in their infancy and larger-scale projects are required to better define individual and universal oral microbiome core.
Bacterial communities in the mouth have a significant impact on the general health by either preventing or causing infections. Recent data suggest a causative relationship between oral microbiota profiles and specific diseases including periodontitis [1, 2]. Bacterial species that are more prevalent in healthy subjects as well as those that have higher counts in diseased individuals have been identified. Based on an extensive literature review, Siqueira and Rôças  concluded that some oral pathologies may have a polymicrobial aetiology and that different types of infection are represented by various mixed bacterial consortia.
Most of the bacteria in saliva are attached to exfoliated human epithelial cells . In addition to its clinical importance as a diagnostic indicator of oral cancer  and possibly other diseases, the human salivary microbiome may provide insights into human populations structure and migrations. Surprisingly, initial studies suggest there is little geographic structure of the human salivary microbiome, although specific bacterial genera e.g. Serratia and Enterobacter vary significantly across geographic locations .
Using traditional and molecular approaches, more than 700 bacterial species have been identified in the human oral cavity and approximately half of them are not yet cultivated . The study of complex oral microbiotas with a classical approach would require new culturing technologies whose development is laborious and intrinsically limited. Metagenomics offers an attractive alternative, bypassing the need to culture bacteria. The sequencing of the entire microbiome, used to characterize communities dominated by a small number of species  cannot be readily applied for the analysis of highly-complex human microbiota. Therefore, high-throughput sequencing of amplified partial 16S rDNA sequences of a bacterial community currently provides the best compromise between sequence coverage, analytical speed and experimental costs.
Recent studies of oral microbiota using high-throughput sequencing estimated the number of species-level phylotypes between 540 and about 10,000 [9–11]. However, these figures were obtained using different sequencing coverage, sampling different anatomical sites and analyzing samples pooled from different number of individuals. Therefore, not all of the identified taxa are expected to be present in the same subject and at the same time . In the current study, we assessed the inter- and intra-individual variations of salivary microbiota, by means of a culture-independent approach based on the pyrosequencing of the bacterial 16S rDNA V3 region. We compared salivary bacterial communities of five individuals at three time-points using 16S rDNA pyrosequencing in order to assess their short-term stability and interindividual differences.
We explored the microbial diversity of the saliva samples from five individuals by targeting the 16S rDNA hypervariable V3 region. Of 50,708 obtained reads, 31,860 passed the quality control. They were submitted to the MG-RAST server  for taxonomic analysis. The BLAST-based MG-RAST analysis with a minimum alignment length of 105, the RDP dataset option and a maximum e-value of 10-30 removed 690 sequences (2.2%), leaving 31,170 sequences which were further analyzed. The majority of removed sequences were identical or nearly-identical to human sequences and two sequences were recognized by the MG-RAST as chimeras. One additional sequence was recognized as a putative chimera after multiple sequence alignment (see below).
To get an estimate of the pyrosequencing errors we calculated the potential errors that can be derived from the most abundant sequence in the entire dataset. This 216-nt long sequence, which matches the relevant 16S rDNA segment of several species of genus Streptococcus exactly, occurred 3291 times and was found in all samples. Then we identified sequences that: (i) differed from the most frequent sequence by deleting, inserting or changing any single nucleotide and (ii) lacked exact matches in the reference database. As expected, nucleotide substitutions were the rarest error type with 8 examples (6 distinct sequences). Deletions were more frequent with 26 counts (19 distinct sequences), followed by insertions represented by 58 sequences (35 distinct sequences), which is similar to the trend observed by Huse et al. . The longest homopolymer stretch associated with a putative insertion or deletion was a 4T which became 5T (3 occurrences). All potential error products represented together 2.8% of the most abundant sequence. However, this may be an overestimation since the single-read accuracy of pyrosequencing with the GS FLX System is 99.5% and the majority of errors are undercalling or overcalling the length of homopolymeric stretches .
The impact of pyrosequencing errors on classification has been shown to be very small: an insertion/deletion rate of 2% would affect classification of 0.2% reads . In line with this observation, all 92 single-nucleotide derivatives of the most occurring sequence were (correctly) classified using RDP classifier as genus Streptococcus with over 98% confidence.
The most abundant phylum in each sample was either Firmicutes or Proteobacteria (Figure 2). In four subjects there was a clear dominance of one of these over the other phylum in the three time-point samples. Subject #4 had the lowest Bacteroidetes content. In subject #5, a low proportion of Proteobacteria was associated with a higher abundance of Actinobacteria.
The dataset was represented by 3012 phylotypes defined at 100% identity (100%-ID phylotypes or distinct sequences). One 100%-ID phylotype was discarded because it was placed more distantly than the standard archaeal Methanocaldococcus jannaschii DSM 2661 outgroup sequence in the MUSCLE alignment-based clustering. The BLAST and RDP analyses showed that this sequence was obviously chimeric, consisting of two distinct domains. The 5' domain (residues 1-175) was assigned to Veillonellaceae (100% RDP confidence) whereas the remaining 3' domain (residues 176-219) corresponded to Lachnospiraceae (69% RDP confidence).
The number of genera in each subject ranged from 23 to 46 and the number of OTUs defined at 97% identity (97%-ID phylotypes or OTUs003) ranged from 56 to 259 for the different samples. However, due to different sampling sizes, these figures cannot be readily compared (Additional File 2).
We also investigated whether the samples could be grouped based on sequences of individual phyla. We constructed phylogenetic trees from the sequences for each of the 5 phyla identified across all samples (Firmicutes, Proteobacteria, Bacteroidetes, Actinobacteria and TM7). Individual samples were then clustered within each of the five trees using the unweighted UniFrac method. The results presented in Figure 5A show that intrapersonal differences were globally smaller than interpersonal ones for all examined phyla except TM7. Of all phyla, Bacteroidetes were the best indicator of intraindividual similarity over time; the three time points of four subjects and two of one subject grouped closely. When bacterial communities were compared based on sequences of each of the six individual orders present in all samples, the best clustering by subject was found for Clostridiales (Figure 5A).
Different clustering patterns were obtained for different taxon-specific sequences. Samples from subjects #2 and #3 formed distinct clusters in 4/5 phylum-specific datasets. Samples from subject #2 were also grouped in 6/6 analyzed order subsets (not presented) which suggests the high stability of the microbiota in this individual. Taken together, these results indicate that the rate of oral microbiota changes differs between taxonomic groups of bacteria as well as between individuals.
Number of common and subject-specific taxa
Total # of taxa
# of common taxa (%)
# of subject-specific taxa (%)
Distinct sequences (100%-ID phylotypes)
Firmicutes, which are generally the most abundant in the oral metagenome, also have the highest numbers of both universal core and subject-specific phylotype representatives (Additional File 3). Representatives of the class Sphingobacteria, order Sphingobacteriales and family Clostridiaceae were found only in subject #1. In the three time points from subject #1, Sphingobacteria had a frequency of 0.04%, 0.1% and 3.7%. Therefore, Sphingobacteria may be temporal colonizers of susceptible individuals. Sequences corresponding to family Peptostreptococcaceae were detected in all samples except those from subject #3. The only subject-specific genus was Olsenella. This genus, found only in subject 5, is diverse, encompassing five different OTUs003.
The salivary bacterial community comparisons using UniFrac revealed that samples from the same individual were clustered, i.e. the salivary microbial community appeared to be stable over at least 5 days. Including samples from more distant time points (15-29 days) performed for three subjects confirmed subject-specific grouping. Moreover, our results show that within the same subject, samples taken at closer time intervals were not necessarily more similar than samples obtained across longer sampling times. These results point to the persistence of subject-specific taxa whose frequency fluctuates between the time points. Because of its relative stability, salivary microbiota may be potentially applied as an alternative or complementary approach in forensics for person identification, as it has been recently proposed for skin bacterial communities .
Recently, Costello et al.  showed that whole-body bacterial communities may be perfectly clustered by host. The three-month time point samples share many taxa, and the oral microbiota are less variable than other investigated body sites. Indeed, in another study twenty-six percent of distinct sequences were shared in oral microbiomes when single samples of three unrelated individuals were compared .
Although the present study did not reach a stable value for phylotype richness, several universal core taxa were identified and putative subject-specific taxa were proposed. A deeper sample coverage is expected to increase the number of universal core taxa whereas the effect on subject-specific taxa frequency remains more difficult to predict: A richer sampling depth may reveal new subject-specific taxa members, and some of those defined in this study may no longer appear specific to a given individual or group of individuals. The fact that the same genera are not uniformly considered as universal core members across different studies shows that the metagenomic studies of oral microbiota require larger-scale high-throughput approaches to better define their individual and universal core.
Although the stringent phylotype identity level cutoff of 100% inflates diversity estimates due to pyrosequencing errors  it may, as shown in this study, lead to a better clustering of samples by subject than a 97%-ID phylotype-based approach which includes the removal of hypervariable 16S rDNA regions. Applying the latter procedure (partly in order to minimize the influence of sequencing errors) some of the sample diversity is masked. Therefore, the impact of the sequence alignment procedure and the identity threshold used for phylotype grouping on clustering of bacterial communities may depend not only on the frequency of sequencing errors but also on the bacterial community composition.
Factors influencing the oral microbiota composition include age, gender, dietary habits, smoking, oral hygiene, use of antibiotics and, probably, genetic factors. Since salivary microbiota analysis revealed subject-specific grouping over time, it will greatly benefit the field to conduct a long-term survey of a large number of subjects in order to provide insight into the impact of different factors and the dynamics of the microbiota changes. Improvements in high-throughput sequencing techniques, including longer and more accurate reads, will enable better classification of bacteria. Direct metagenome sequencing without a PCR amplification step will eventually provide a less biased measure of the microbial diversity.
Unstimulated saliva was obtained from five adult individuals with informed consent. Individuals without obvious signs of oral disease recruited between the laboratory staff were as follows: subject 1, 44-year, male, non-smoker; subject 2, 30-year, pregnant female with well-controlled Type 1 diabetes, non-smoker; subject 3, 34-year, male, non-smoker; subject 4, 30-year, male, smoker; subject 5, male, 30-year, smoker. Samples were taken between 10 and 11 am at three time-points ranging within a 29-day period. Samples were collected by spitting in sterile plastic 50-ml tubes, 100 μL saliva was mixed with the same volume of 2× lysis buffer [Tris 20 mM, EDTA 2 mM (pH 8), Tween 1%] and kept frozen at -20°C until processing. We added proteinase K (Fermentas) 200 μg/ml and incubated samples for 2.5 hours at 55°C . Proteinase K was inactivated by a 10 min incubation at 95°C and the samples were stored at -20°C. The lysis procedure used allows for the detection of many hard-to-lyse species .
We aligned 753 16S rDNA sequences from the HOMD (October 2008) using MAFFT (- FFT-NS-2, v6.531b) . Primers were selected from the conserved areas of the alignment flanking the V1-V3 region in order to match most sequences. With a 100% match, primers 8Fhomd (5'-GAGTTTGATCMTGGCTCAG) and 534RhomdDEGa (5'-CCGCGRCTGCTGGCAC) produced 723 and 741 hits, respectively, or 713 (94.7%) of the HOMD sequences. Species coverage was over 90% for the phyla Proteobacteria, Firmicutes, Fusobacteria, Bacteroidetes, Spirochaetes and TM7, and 84.7% for the phylum Actinobacteria. The PCR primers were not designed to amplify single HOMD representatives of phyla Chlamydiae and SR1. In a more general way, the 16S forward and reverse primers used produced 81.3% and 96.23% hits in the RDP database . The V1-3 amplicons corresponded to E. coli 16S rDNA positions 28 to 514, excluding primer sequence.
PCR amplification was carried out in two steps. The first PCR included 2 μL of lysate and 0.5 μM of each forward (8Fhomd) and reverse (534RhomdDEGa) primer in 25 μl PrimeStar HS Premix (Takara). The samples were run for 11 cycles using the following parameters: 98°C for 10 s, 60°C for 15 s, and 72°C for 1 min.
The second PCR contained 0.4-4 μl aliquot from the first reaction and 0.5 μM of both forward B-8fhomd (5'-gccttgccagcccgctcag- ac -GAGTTTGATCMTGGCTCAG-3') and reverse A-[601-to-615]-534RhomdDEGa primer (gcctccctcgcgccatcag -NNNNNNNN -at- CCGCGRCTGCTGGCAC-3') in 50 μl of PrimeStar HS Premix. The composite PCR primers included: (i) the 454 Life Science 19-base adaptors A (lowercase, underlined) or B (lowercase); (ii) an eight-base sample specific barcode sequence (NNNNNNNN; designated 601 to 615 in ); (iii) the sequence of the broad range 16S forward or reverse primer (uppercase) used in the first PCR, and (iv) a dinucleotide sequence (lowercase italic) introduced between the 16S primer and the barcode sequence designed to prevent pairing of different barcodes with rDNA targets. The amplicons were generated in PCR reactions using 28 cycles of 98°C for 10 seconds, 60°C for 15 seconds, and 72°C for 1 min. The negative control was amplified by 35 PCR cycles. PCR products were purified using the MinElute PCR purification kit (Qiagen). Subsequently, 1 μl of the amplified reaction mix was run on the Agilent 2100 Bioanalyzer using a DNA1000 lab chip to confirm the proper size of the generated products and assess their concentration. Hundred ng of each of the purified sample were pooled and sequenced on a Genome Sequencer FLX system (Roche).
Sequences containing uncalled bases, incorrect primer sequences or runs of ≥10 identical nucleotides were removed. Reads with the 16S rDNA reverse oligonucleotide sequence CCGCGRCTGCTGGCGC, containing G instead of A at the penultimate position of the 3' end, were relatively frequent (35%). They are likely due to a sequencing artefact and were not removed from the dataset if other quality criteria were met. After trimming primer sequences, reads shorter than 200 nt were discarded.
After removing sequence duplicates, we created a multiple alignment of the sequences using MUSCLE  (using the following parameters: -maxiters 2 and -diags). Sequences corresponding to E. coli 16S rDNA positions 300-514 were extracted from the multiple alignment and each distinct sequence was assigned as a 100%-ID phylotype. Sequences were assigned to representative phylotypes at 97% identity (OTUs003) using CD-HIT  with a minimum coverage of 99%. Distances between 100%-ID and 97%-ID phylotypes were calculated using MUSCLE (-maxiters 2 and -diags) . Alternatively, the 97%-ID phylotypes were aligned using Aligner of the RDP pyrosequencing pipeline  and the hypervariable regions were removed leaving 187 comparable positions. Then, a phylogenetic tree of these phylotypes (OTUs003-hv) was constructed using FastTree . Bacterial diversity was assessed using RDP pyrosequencing tools: Aligner, Complete Linkage Clustering, Rarefaction, Shannon Index and Chao1 Estimator [24, 35]. Clustering and principal coordinates analysis were carried out using UniFrac .
Trimmed dataset (31,169 reads) is publicly available at the MG-RAST repository  under ID 4445506.3. The fasta identifiers of this dataset are described in Additional file 4. Data will also be available through the GenBank Short Read Archive via accession number SRA012505.
We thank for Antoine Huyghe for help with Microsoft Excel and Bernard Berger for advice and assistance in the 454 sequencing. JS and PF are supported by the Swiss National Science Foundation grants 3100A0-112370/1 and 3100A0-116075, respectively.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.