Pyrosequencing analysis of the human microbiota of healthy Chinese undergraduates

Background Elucidating the biogeography of bacterial communities on the human body is critical for establishing healthy baselines from which to detect differences associated with disease; however, little is known about the baseline bacterial profiles from various human habitats of healthy Chinese undergraduates. Results Using parallel barcoded 454 pyrosequencing targeting on the 16S rRNA gene V3 region, the bacterial diversity of the nasopharynx, saliva, dominant hands, and feces were investigated from 10 healthy Chinese junior boarding undergraduates at Zhejiang University. The participants were 21–24 years of age with a body mass index (BMI) < 24 kg/m2. A total of 156,717 high-quality pyrosequencing reads were obtained for evaluating bacterial diversity, which represented 29,887 unique phylotypes. The overall taxonomic distribution of the 16S rRNA gene-based amplicons demonstrated that these 4 habitats of the human body harbored distinct microbiota and could be divided into different clusters according to anatomic site, while the established patterns of bacterial diversity followed the human body habitat (feces, hands, saliva, and nasopharynx). Although significant inter-individual variation was observed, the healthy microbiota still shared a large number of phylotypes in each habitat, but not among the four habitats, indicating that a core microbiome existed in each healthy habitat. The vast majority of sequences from these different habitats were classified into different taxonmies that became the predominant bacteria of the healthy microbiota. Conclusions We first established the framework of microbial communities from four healthy human habitats of the same participants with similar living environments for the Chinese undergraduates. Our data represent an important step for determining the diversity of Chinese healthy microbiota, and can be used for more large-scale studies that focus on the interactions between healthy and diseases states for young Chinese adults in the same age range.


Background
An enormous number of microorganisms, the vast majority of which are bacterial species, are known to colonize and form complex communities, or microbiota, at various sites within the human body [1,2]. Microbial cells that thrive on and within the human body are approximately 10 times more numerous than our own cells and contain, in aggregate, approximately 100 times more genes, leading to the suggestion that humans and our microbial symbionts be considered "supraorganisms" [3]. The microbiota can be viewed as a forgotten "organ" exquisitely tuned to our physiology that performs functions that we have not had to evolve on our own [4]. A growing body of evidence suggests that the composition and function of microbiota in different human body habitats, including nasal passages, oral cavities, skin, gastrointestinal tract, and urogenital tract, play vital roles in human development, physiology, immunity, and nutrition [1,[5][6][7][8][9][10][11][12][13][14][15]. With the advent of molecular techniques, especially the development of nextgeneration high-throughput sequencing techniques, more and more bacterial phylotypes, which were undiscovered by traditional microbiological techniques, were found in these habitats of the human body. Prior studies have shown that the taxonomic composition of the microbial community may affect the propensity to develop obesity, inflammatory bowel disease, type 1 diabetes, cardiovascular diseases, bacterial vaginosis, dental caries and so on [7,[16][17][18][19][20][21][22]. Studies of healthy individuals, including the Chinese population, have focused on particular body habitats, such as the gastrointestinal tract, nasopharynx, skin, and oral cavity, using high-throughput sequencing techniques [16,[20][21][22][23][24][25][26][27][28], and have revealed microbial communities with significant intra-and inter-individual variations [27,29]. However, the five microbial habitats in the human body mentioned above are not isolated from one another; instead, each person comprises a complex, yet interconnected landscape consisting of many body habitats harboring distinctive microbiota [30]. Recently, the Human Microbiome Project (HMP) has analyzed the largest cohort and set of distinct, clinically relevant body habitats from healthy adults of a Western population to characterize the ecology of human-associated microbial communities using 454 pyrosequencing and Illumina sequencing [31,32]. However, little is known about the overall structure of microbiota from different body habitats in healthy Chinese individuals. A good definition of commensal microbiota in these habitats and an understanding of the relationship to health are essential in preventing and combating disease.
The purpose of the present study was to characterize the baseline bacterial communities from various human habitats, such as the nasopharynx, saliva, dominant hand and feces, of healthy male and female Chinese junior undergraduates boarding at Zhejiang University using parallel barcoded 454 pyrosequencing targeting the 16S rRNA gene V3 hypervariable region to gain a "whole-body" view [33,34]. The bacterial profiles of these healthy undergraduates, with the same living environmental circumstances and nearly the same living habits, such as diet, rest, exercise, and hygiene in the university, could represent the real healthy microbiota prevailing in Chinese undergraduates, which might be useful for human microbiota-related diseases analyzing the same age range.

Results and discussion
Distinct microbial diversity of different healthy human habitats Determining the role of our microbiota in disease predisposition and pathogenesis will critically depend upon first defining it in healthy states. Previous studies have shown that there are fundamental differences in the composition and structure of bacterial communities in the same habitats within the subjects of different ethnic and racial backgrounds [35,36]. However, most of the studies investigating microbial composition within a specific habitat using deep sequencing analysis have focused on sampling people from the West and some Asian countries [26,27,31,37,38], and little is known about the overall structure and composition of the healthy Chinese microbiota. This imbalance needs to be addressed to more fully understand the interactions between humans and their microbiota. Human microbial communities were strongly affected by external factors, such as lifestyle, dietary patterns, antibiotic usage, and host genotype [16,[39][40][41][42][43][44]. In China, undergraduate students, often boarding at the same university for 4 or 5 years, have adapted to the specific environmental circumstances of the university and formed almost the same lifestyle features, such as diet, rest, exercise, and hygiene after 2 years. Thus, Chinese undergraduate boarders might be an excellent model to explore the representative microbiota of the human body in the same age range. Unlike previous studies, the well-controlled similar living environmental circumstances for these participants could reduce the influence of various environmental factors on the healthy human microbiota and minimize the variations between participants [27,31,32]. The present study characterized the composition and structure of the healthy microbiota of the nasopharynx, saliva, dominant hands and feces from Chinese male and female boarding undergraduates, and established the baselines for human healthy microbiota, which might be useful in understanding the relationship between health and prevailing diseases in the Chinese population in the same age range. Because most of the undergraduates were not married, the vaginal microbiota from female participants was not included, as gynecologic examinations are not included in routine physical examinations for female participants in China. With nearly the same living environments and habits, as mentioned above, the diversity profiles of bacterial communities from these habitats could be considered as representative microbiota of healthy human bodies. Samples were only collected from a single time point, our data are in agreement with the conventional sampling habits in other studies [23,26,45,46]. Recent studies have also shown that withinsubject variation over time is consistently lower than between-subject variation, and that the uniqueness of the microbial community in each individual appears to be relatively stable over time (relative to the population as a whole), which suggests that a set of microbiomes in a given healthy habitat can be considered a core microbiome [2,31].
After removing sequences of insufficient quality and sequences that could not be adequately assigned, nearly 156,000 sequences remained from the four human habitats, including 30,521 reads from nasopharynx, 28,309 reads from saliva, 43,708 reads from dominant hands, and 54,179 reads from feces respectively ( Table 1). The average length of the sequences was 145 bp after trimming the primers. These high-quality pyrosequencing reads (95.41% of the total sequences) were classified using the RDP Classifier to assign taxonomic classifications to the sequences for ecologic analysis. Although the total number of sequences in each group was no more than in previous studies that mainly focused on a single human habitat [20][21][22][23][24], the average number of nearly 3000 sequences from each habitat in the same participant with a Good's coverage > 96% could represent the overall structure and composition of human microbiota in the specific healthy habitat. The indices of bacterial richness and diversity of OTUs at a 3% sequence dissimilarity level are summarized in Table 1 and Figure 1, and indicate that the bacterial alpha diversity inter-individual variations (Additional file 1: Table S2 and Additional file 2: Figure S1) among participants in each group]: feces > dominant hand > saliva > nasopharynx. However, the established patterns of the bacterial diversity from healthy Chinese undergraduates were not consistent with the results of the HMP, which showed that the oral and fecal communities were especially diverse in terms of community membership, while skin microbiota had less complex alpha diversity [31]. The disparity might be due to less Chinese participant enrollment, fewer sampling sites in each habitat, different sequencing targets, and less sequencing depths in our study. In combination with our previous studies, the diversity of nasopharyngeal microbiota might be as low as vaginal microbiota from healthy Chinese women of reproductive-age [20] when compared with the other three major habitats. In fact, the number of phylotypes detected in a sample, or the number of microorganisms discerned at any given phylogenetic level, was strongly affected by the sequencing depth. With a relatively larger total number of sequences, previous studies have shown that the bacterial diversity in the oral cavity was much more complex than the skin [23,24]. However, Costello et al. also claimed that several skin locations harbored more diverse communities than the gastrointestinal tract and oral cavity, which was consistent with the data herein [30]. The richness of total bacterial communities in the habitats was estimated by rarefaction analysis. The trend of the rarefaction curves also confirmed the differences in these habitats mentioned above; however, the unsaturated shapes of the rarefaction curves indicated that bacterial richness of the nasopharyngeal swabs, saliva, palmar swabs and feces was not yet completely sampled ( Figure 2).
We assessed the differences in overall bacterial community composition using a phylogeny-based metric (UniFrac) [47]. A relatively small UniFrac distance implied that two communities were compositionally similar, harboring lineages sharing a common evolutionary history. In turn, a large UniFrac distance indicated a quite different structure of microbiota in the habitat. A The operational taxonomic units (OTUs) were defined with 3% dissimilarity level. 2 The coverage percentage (Good's), richness estimators (ACE and Chao1) and diversity indices (Shannon and Simpson) were calculated using Good's method [66] and the mothur program, respectively.

3
The Shannon index of evenness was calculated with the formula E = H/ln(S), where H is the Shannon diversity index and S is the total number of sequences in that group. Figure 1 The Shannon index was used to estimate diversity (i.e., a combined assessment of the number of 3% dissimilar bacterial taxa and their abundance) among nasopharynx, saliva, dominant hand and feces from healthy Chinese undergraduates (Data shown as mean with SEM).
clear clustering according to anatomic site was observed using unweighted UniFrac, despite significant interindividual variability ( Figure 3). With the same living environmental circumstances and nearly the same living habits for these subjects, each anatomic location within a healthy individual would shape the composition of a microbial community specifically adapted to that site. Principal Coordinates Analysis (PCoA) was performed using weighted, normalized abundance data, demonstrating that primary clustering was by anatomic site, with the saliva, feces, dominant hands, and nasopharynx separate ( Figure 4). The data herein were also consistent with the results of a large-scale study from the HMP [30,31]. The current study showed that microbiota from one human habitat was distinct from other human habitats in the same individuals, visually and directly. In spite of significant inter-individual variations, the healthy male and female students still shared a great number of OTUs, and the shared OTUs were represented by the overlapping areas of circles in Venn diagrams, which could be used to understand the relative stable, consistent components across complex microbial assemblages ( Figure 5). The results herein indicated that there might be a core microbiome in the specific habitat of the healthy human body. However, the Venn diagrams showed that bacteria from the four different habitats of the healthy human body could not share a distinct core microbiome (only 37 OTUs shared) in each individual ( Figure 6), which was consistent with the results of HMP [31].

Taxonomic analysis of healthy microbiota in different habitats
The sequences collected for this study provide an overview of the healthy human microbiota. These sequences could be clustered into 29,887 unique 16S rRNA gene V3 tagged sequences, representing 15 known phyla or candidate divisions and 424 genera (Additional file 3: Table S1). In male or female students, the vast majority of sequences in the human body belonged to one of the  . Community differentiation was measured by using the unweighted UniFrac algorithm; the scale bar indicated the distance between clusters in UniFrac units. All of the branch nodes shown here were found to be significant, indicating that bacterial community from each habitat was divided into a distinctive cluster. five following major phyla: Actinobacteria, Bacteroidetes, Firmicutes, Fusobacteria, and Proteobacteria ( Figure 7). In the nasopharynx, 14 phyla, including Actinobacteria, Bacteroidetes, Cyanobacteria, Deinococcus-Thermus, Firmicutes, Fusobacteria, Proteobacteria, Tenericutes, and candidate division TM7 were found. However, most sequences (76.5%) were related to the following four phyla: Actinobacteria (38.1%), Bacteroidetes (1.4%), Firmicutes (31.7%), and Proteobacteria (5.4%), which was consistent with a previous study [48]. The healthy nasopharyngeal microbiota was the only site in which Actinobacteria predominated. It was unexpected that a great number of sequences (22.1%, most often in female students) in the nasopharynx are still unknown. At the genus level, sequences from nasopharyngeal swabs represented 194 different genera (148 genera from females and 132 genera from males). The most frequently detected genera (OTU proportions > 1%) were Corynebacterium (32.4%), Dolosigranulum (15.3% [male only]), Staphylococcus (7.7%), Lactobacillus (5.4%), Propionibacterium (2.0%), Gardnerella (1.5%), unclassified Actinomycetales (1.1%), Anaerococcus (1.1%) and Prevotella (1.1%) ( Figure 8A), which were considered the predominant bacteria, and were also detected using PCR-DGGE and cloning and sequencing techniques. The other 185 genera accounted for the remainder of the genus-level diversity (approximately 32.5% of the total number of sequences), indicative of a low abundance. At this stage, it is difficult to predict the role bacteria present in low numbers, play in nasopharyngeal ecology. However, it would be an oversimplification to neglect their presence. Laufer et al. (2011) reported that members of the nasopharyngeal healthy microbiota, i.e., Corynebacterium, Dolosigranulum, Staphylococcus, Lactococcus, and Propionibacterium, are protective against Streptococcus pneumoniae colonization and otitis media [49]. Streptococcus load is negatively correlated with Corynebacterium and Dolosigranulum, suggesting a competitive interaction between this potential otitis media pathogen and these two genera. Surprisingly, the data herein showed that Dolosigranulum (Carnobacteriaceae family) was only detected in male participants, and characterized the overall structure and composition of human nasopharyngeal healthy microbiota in a relatively deeper level with high-throughput sequencing techniques. This could be used to explore the relationship between  nasopharyngeal microbiota and allergic diseases in the future [50,51].
It is necessary to define the salivary microbiota of the healthy human body before determining the role of oral bacteria in diseases. Nine phyla, including Actinobacteria, Bacteroidetes, Firmicutes, Fusobacteria, Proteobacteria, Spirochaetes, Tenericutes, and two candidate divisions (SR1, TM7), were found [29]. The vast majority of sequences in saliva (99.2%) belonged to one of the following five major phyla: Actinobacteria (4.9%), Bacteroidetes (16.8%), Firmicutes (58.6%), Fusobacteria (3.0%) and Proteobacteria (15.9%) (Figure 7). In fact, the diversity of salivary microbiota was quite different from those in different ages and ethnic groups [23]. When compared with the salivary microbiota from Chinese undergraduates, Bacteroidetes (28.8%), Fusobacteria (5.2%), and Proteobacteria (19.5%) were overrepresented in healthy Chinese children [21], while Firmicutes sequences were more abundant in adult. These disparities might be due to different hygiene habits, different dietary constitution, and different physiologic changes, which were clearly associated with age. When compared with healthy adult Americans reported by Keijser (2008), a higher relative abundance of Bacteroidetes (26.6%) and Proteobacteria (19.9%), and a lower relative abundance of Firmicutes (38.5%) were found in salivary microbiota of Americans, which might be ascribed to different lifestyles and different host genotypes [23]. At the genus level, sequences from saliva represented 105 different genera (88 genera from females and 80 genera from males). The most frequently detected genera in saliva (OTU proportions > 1%) were Streptococcus (45.5%), Prevotella (10.1%), Neisseria (7.4%), Haemophilus (5.1%), Porphyromonas (4.7%), Gemella (3.9%), Rothia (3.0%), Granulicatella (2.2%), Fusobacterium (2.1%), Actinomyces (1.8%), Veillonella (1.7%) and Aggregatibacter (1.3%) (Figure 7 and Figure 8B). The overall taxonomic distribution of the 16S rRNA gene-based amplicons at the genus level mentioned above was in general agreement with previous findings [23], although the relative abundance of several genera was not consistent with the data herein, especially the relative abundance of Streptococcus. The higher relative abundance of Streptococcus, an acidproducing cariogenic bacterium, was more abundant in the Chinese population. The above-mentioned genera provide a framework for Chinese healthy salivary microbiota diversity and investigations of the environmental, dietary, and genetic factors that influence salivary microbiota in individuals. The continuous flow of saliva made it impossible to obtain a stable salivary microbiota, but only captured the temporal stability of salivary microbiota at a single time point.
In the dominant hands, 14 phyla, including Actinobacteria, Bacteroidetes, Chloroflexi, Cyanobacteria, Deinococcus-Thermus, Firmicutes, Fusobacteria, Proteobacteria, Spirochaetes, Tenericutes, Verrucomicrobia, and three candidate divisions (OP10, SR1, and TM7) were found. Most of the sequences (97.6%) were divided into one of the following five major phyla: Actinobacteria (19.7%), Bacteroidetes (7.5%), Deinococcus-Thermus (9.8%), Firmicutes (23.7%) and Proteobacteria (36.9%) (Figure 7). Interestingly, the healthy skin microbiota from Chinese was the only site in which Proteobacteria predominated when compared with other human habitats. One predominant phylum, Deinococcus-Thermus, was detected in healthy skin microbiota with a relatively high abundance, which has been recognized solely in this habitat. Bacteria from the phylum Deinococcus-Thermus are known for their resistance to extreme stresses, including radiation, oxidation, desiccation, and high temperature. Perhaps due to the special environmental conditions of the hand (e.g., sebum production, salinity, and hydration), Deinococcus-Thermus was found with a higher relative abundance in the skin microbiota. However, the phylum of Deinococcus-Thermus could not be recognized as the predominant bacteria in skin through conventional 16S rRNA gene full-length cloning and sequencing technique [52], which might be ascribed to the methodologic constraints, such as lower resolution [34]. At the genus level, sequences from skin represented 351 different genera (277 genera from females and 262 genera from males). The most frequently detected genera in skin (OTU proportions > 1%) were Enhydrobacter (19.8%), Deinococcus (8.1%), Staphylococcus (7.6%), Propionibacterium (7.6%), Corynebacterium (6.5%), Lactobacillus (6.1%), Acinetobacter (4.3%), Chryseobacterium (3.8%), Streptococcus (2.6%), Finegoldia (1.8%), Sphingomonas (1.2%), and Prevotella (1.1%) (Figure 7 and 8C). Compared with the study conducted by Fierer et al. (2008), the taxonomic composition of skin microbiota of healthy Chinese undergraduates was quite different from healthy Americans [24]. In the present study, the genus, Enhydrobacter (Proteobacteria phylum), was the most predominant bacteria; however, the role of Enhydrobacter on skin microbiota is still unknown. Other genera, as mentioned above, have previously been found to be abundant in other molecular surveys of skin bacteria [24,25,53] and are considered to be common skin residents. A previous study showed that the male and female students harbored significantly different bacterial communities on the hand surfaces [24]. However, similar results between the two genders could not be obtained in Chinese adults. Our current pyrosequencing analysis of bacterial 16S rRNA gene amplicons generated directly from superficial human palms raises the possibility of better understanding of the normal microbial ecology of the skin, and for studying the role of novel microbes or microbial communities.
There were some limitations to the current study. First, a relatively small number of healthy unmarried Chinese undergraduates participated in the study, which would affect the comprehensive understanding of healthy human microbiota. The microbiota from healthy male and female urogenital tracts of Chinese adults would effectively complete the framework of the human microbiome. Second, only samples from these habitats at a single time point were included; samples from more time points would verify the representative healthy human microbiota. Third, it still could not be ignored the wide variations observed among participants in present study. Living in the similar conditions, the wide OTU differences reported (104-777 in saliva; 219-1215 in feces; 95-1148 in dominant hand; 103-439 in nasopharynx) might be due to the different sequencing depth and different host genotype. A much greater sequencing depth was needed for establishing a database of Chinese healthy microbiota.

Conclusions
In summary, the framework of microbial communities from four healthy human habitats of the same participants has been established for Chinese undergraduates for the first time. The overall structure of healthy microbiota from these habitats was quite different from each other, which might be due to the specific microenvironments of these anatomic locations. As commensal or symbiotic bacteria, these healthy microbial communities co-exist with their specific habitats in the long-term evolutionary process. Hence, the data herein represent an important step for determining the diversity of the Chinese healthy microbiota, and can be used for additional large-scale studies focusing on the interactions between healthy and disease states for young Chinese adults in the same age range. In fact, the essential roles of microbiota in human health and disease, especially novel microbes, are gaining recognition. With an integrated "whole-body" view, many human complex diseases are associated with microbiota from more than one habitat, such as cardiovascular diseases [55]. The role of microbiota in such human diseases will provide new insights regarding pathogenesis and treatment, and can be utilized as a novel therapeutic target for antibiotics, prebiotics, and probiotics, which indicates that the maintenance or restoration of normal microbiota plays a vital role in human health.

Study population
Ten healthy junior undergraduates 21-24 years of age boarding at the College of Medicine of Zhejiang University, including five male and five female students from two dormitories, were recruited in the present study. Having stayed in the university for > 2 years, these participants were adapted to the same living environmental circumstances, such as the same living spaces, dining hall, and drinking water, and formed nearly the same living habits, such as diet, rest, exercise, and hygiene (including brushing teeth and washing in the morning and evening). Before sampling, the students presented to our hospital in July 2010 for routine medical examinations to confirm each subject's health status. The following exclusion criteria were followed: body mass index (BMI) > 24 kg/m 2 ; hypertension; diabetes mellitus; dyslipidemia; allergic rhinitis; asthma; pharyngitis; dental caries; periodontal disease; various dermatoses; gastrointestinal diseases (diarrhea, constipation, ulcers, bleeding, and tumors); the use of antibiotics, probiotics, prebiotics, or synbiotics in the last 30 days; vegetarian; smoking; alcohol consumption; menstruation at the time of sampling; and known active bacterial, fungal, or viral infections. Informed written consent was obtained from all participants prior to enrollment with approval of the Ethics Committee of the First Affiliated Hospital, College of Medicine of Zhejiang University (Zhejiang province, China).

Sample collection
After enrollment, specimens (nasopharyngeal swabs, saliva, dominant palmar swabs, and feces) were collected in our hospital from every participant before washing hands, brushing teeth, or having their morning breakfast. Nasopharyngeal specimens were obtained with flexible, sterile, dry swabs, which could reach the posterior nasopharynx easily (approximately 2 inches) and were gently rotated five times around the inside of both nostrils while applying constant pressure. The palmar surfaces of the dominant hands (usually the right hand) were swabbed with cotton-tipped swabs moistened with a solution of 0.15 M NaCl and 0.1% Tween-20. The entire palmar surface was swabbed in two perpendicular directions to ensure that the maximum surface area of the palm was represented in the sample. A 1-ml spontaneous, unstimulated whole saliva specimen was collected into a sterile cryogenic vial (Corning, NY, USA) and fecal samples (300 mg/tube) were collected in sterile tubes. Saliva and fecal samples were put in aliquots before freezing. All samples for bacterial genomic DNA extraction were transferred immediately to the laboratory in an ice box, and stored at −80°C after preparation within 15 min.

Total bacterial genomic DNA extraction
The bacterial cells retrieved on nasopharyngeal swabs and palmar swabs were submerged in 1 ml of sterile normal saline (prepared with RNase-free H 2 O [pH 7.0]) and vigorously agitated to dislodge cells, while unused swabs served as negative controls, which were handled in the same way. The cells were pelleted by centrifugation (Thermo Electron Corporation, Boston, MA, USA) at full speed (≥ 10,000 g) for 10 min, washed by re-suspending the cells in sterile normal saline, and centrifuged at full speed for 5 min. Then, bacterial DNA was extracted from the swabs using a QIAamp® DNA Mini Kit (QIAGEN, Hilden, Germany) according to the manufacturer's instructions, with the following minor modification: samples were agitated with 100 mg of zirconium beads (0.1 mm) in a Mini-beadbeater (FastPrep, Thermo Electron Corporation) for 2 min and then incubated at 56°C for 1 h in lysis solution containing proteinase K [20,21]. Bacteria were pelleted from saliva by centrifugation at full speed for 10 min. Then, bacterial genomic DNA was extracted using a QIAamp® DNA Mini Kit with the aforementioned modifications. Stool bacterial genomic DNA was extracted using a QIAamp® DNA Stool Mini Kit (QIAGEN) according to the manufacturer's instructions [22]. Bacterial genomic DNA was eluted with elution buffer and stored at −20°C for further analysis.

PCR and pyrosequencing
PCR amplification of the 16S rRNA gene hypervariable V3-region was performed with universal bacterial primers, which correspond to positions 341 to 534 of the conserved Escherichia coli 16S rRNA gene sequence. Amplicon pyrosequencing was performed with standard 454/Roche GS-FLX Titanium protocols, where equal molar of PCR product from each sample was pooled. To pool and sort multiple samples in a single 454 GS-FLX run, we used a set of 8-bp barcodes designed according to Fierer et al. [20,21,24,[56][57][58][59]. The main criterion of the barcodes is that the adjoining nucleotides are different because the single nucleotide repeats are the main source of errors in pyrosequencing technology. The forward primer was a fusion of the 454 Life Science adaptor A, the barcode, and the 341F primer (5′-GCCTCCCTCGCGCCATCAG-NNNNNNNN-CCTACGGGAGGCAGCAG-3′). The reverse primer was a fusion of the 454 life science adaptor B, the same barcode with the forward primer, and 534R (5′-GCCTTGCCAGCCCGCTCA-NNNNNNNN-ATTACCGCGG CTGCTGG-3′). The PCR amplicon library was created for each individual DNA sample. The PCRs were carried out in triplicate 50 μL reactions with 0.6 μM each of the primer, 20 ng of template DNA, and 1× PCR reaction buffer, 2.5 U of pfu DNA polymerase (MBI, Fermentas, USA). The amplification program consisted of an initial denaturation step at 94°C for 4 min, followed by 25 cycles, where 1 cycle consisted of 94°C for 30 s (denaturation), 55°C for 30 s (annealing) and 72°C for 30 s (extension), and a final extension of 72°C for 10 min. During amplification, negative controls were also performed. Replicate PCR products of the same sample were assembled within a PCR tube. PCR was performed with a PCR thermocycler (Bio-Rad Laboratories, Hercules, California, USA). Prior to sequencing, the DNA concentration of each PCR product was extracted with the MiniElute® Gel Extraction Kit (QIAGEN) and quantified on a NanoDrop ND-1000 spectrophotometer (Thermo Electron Corporation). The products from different samples were mixed at equal ratios for pyrosequencing with the Genome Sequencer FLX (GS-FLX) system (Roche, Basel, Switzerland) at 454 Life Sciences. Emulsion PCR and pyrosequencing were performed according to the manufacturer's recommendations [60]. Because samples were pooled by equal mass, a variation in the number of sequences recovered from each sample likely reflected slight biases in PCR efficiency among the primer barcodes.

Bioinformatic analysis
Raw pyrosequencing reads obtained from the sequencer were denoised using the "pre.cluster" command (http:// www.mothur.org/wiki/Pre.cluster) in mothur platform (version 1.25.0; http://www.mothur.org) [61] to remove sequences that are likely due to pyrosequencing error [62]. Using mothur implementation of ChimeraSlayer (http://www.mothur.org/wiki/ChimeraSlayer), chimera sequences arising from the PCR amplification were detected and excluded from the denoised sequences [63]. All pyrosequencing reads were filtered using barcode and primer sequences using a combination of tools from mothur and custom Perl scripts. The resulting sequences were further screened and filtered for quality and length. Sequences were included in the subsequent analysis only if the sequences met all four of the following criteria: (1) the sequence carries the correct barcode and exact match to the primer in at least one end; (2) the sequence carries the correct primer sequence at the other end, even though the barcode is absent; (3) the sequence has a length > 160 nucleotides (excluding barcode and primer A sequences) [64]; and (4) the sequence has no ambiguous bases (Ns) and homopolymers longer than 8 nucleotides. A total of 156,717 high-quality pyrosequencing reads were produced according to barcode-and primer-sequence filtering, which were used for the following analysis.
The high-quality pyrosequencing reads were assigned to samples according to barcodes (details shown in Additional file 3: Table S1). Sequences were aligned in accordance with the SILVA alignment [61,65] and clustered into operational taxonomic units (OTUs). The OTUs that reached at a 97% similarity level were used for alpha diversity (Shannon, Simpson, and Evenness), richness (ACE and Chao1), Good's coverage [66], Venn diagram, and rarefaction curve analysis using the mothur program. Taxonomy-based analyses were performed by classifying each sequence using the Naïve Bayesian Classifier program of the Michigan State University Center for Microbial Ecology Ribosomal Database Project (RDP) database (http://rdp.cme.msu.edu/) [67,68] with a 50% bootstrap score. Phylogenetic beta diversity measures such as unweighted UniFrac distance metrics analysis was performed using OTUs for each sample using the mothur program [47,69], and principal coordinate analysis (PCoA) was conducted according to the distance matrix created by the mothur program and visualized using SigmaPlot.

Statistical analysis
Statistical analyses were performed using the SPSS Data Analysis Program (version 16.0; SPSS Inc., Chicago, IL, USA) with a 2-tailed t-test or One-Way ANOVA. All tests for significance were two-sided, and p values < 0.05 were considered statistically significant.

Accession numbers
The high-quality sequence data from this study has been deposited in the GenBank Sequence Read Archive with accession number SRP016050.