Using pyrosequencing to shed light on deep mine microbial ecology
© Edwards et al; licensee BioMed Central Ltd. 2006
Received: 04 November 2005
Accepted: 20 March 2006
Published: 20 March 2006
Contrasting biological, chemical and hydrogeological analyses highlights the fundamental processes that shape different environments. Generating and interpreting the biological sequence data was a costly and time-consuming process in defining an environment. Here we have used pyrosequencing, a rapid and relatively inexpensive sequencing technology, to generate environmental genome sequences from two sites in the Soudan Mine, Minnesota, USA. These sites were adjacent to each other, but differed significantly in chemistry and hydrogeology.
Comparisons of the microbes and the subsystems identified in the two samples highlighted important differences in metabolic potential in each environment. The microbes were performing distinct biochemistry on the available substrates, and subsystems such as carbon utilization, iron acquisition mechanisms, nitrogen assimilation, and respiratory pathways separated the two communities. Although the correlation between much of the microbial metabolism occurring and the geochemical conditions from which the samples were isolated could be explained, the reason for the presence of many pathways in these environments remains to be determined. Despite being physically close, these two communities were markedly different from each other. In addition, the communities were also completely different from other microbial communities sequenced to date.
We anticipate that pyrosequencing will be widely used to sequence environmental samples because of the speed, cost, and technical advantages. Furthermore, subsystem comparisons rapidly identify the important metabolisms employed by the microbes in different environments.
Banded iron formations started appearing ~3,700 million years ago when localized sea floor cyanobacterial photosynthesis raised oxygen concentrations high enough that dissolved iron precipitated. That iron powered the industrial revolution. The Soudan Iron Mine in Minnesota, USA was active from 1884 to 1962, and during this period 17.9 million tons of iron ore, primarily hematite, were removed. Nowadays the mine is used as a state park and as a facility for high-energy physics experiments.
Metagenomics is a term used to describe "the functional and sequence-based analysis of the collective microbial genomes contained in an environmental sample" [1, 2]. Random shotgun sequencing of DNA from natural communities has been used to characterize seawater, sediment, and fecal viral communities [2–5], as well as the microbial communities in soil, whale falls, seawater and the Iron Mountain acid mine drainage (AMD) [6–10]. Comparative metagenomics was introduced recently, identifying those sets of genes that distinguish environmental samples. For example, samples from the surface of the ocean contain significantly more photosynthetic genes than soil or other samples [6, 8, 10]. We have used comparative metagenomics to characterize the metabolic potential of different environments, and identify those genes, pathways, and subsystems that are more common in any particular environment.
Most current sequencing is a modification of the classical Sanger method, where extending DNA fragments are stopped by the random incorporation of a fluorescently labeled ddNTP. The different-sized fragments are then separated using capillary gel electrophoresis and detected with a laser. Pyrosequencing is a fundamentally different methodology because only one dNTP is added into the reaction at a time [12–14]. If the added dNTP is complementary to the next template base, the DNA polymerase incorporates it and releases pyrophosphate. ATP sulfurylase uses the pyrophosphate to produce ATP in the presence of adenosine 5' phosphosulfate (APS). A Charge-Coupled Device (CCD) measures the light produced when the ATP is used by luciferase to convert luciferin to oxyluciferin. 454 Life Sciences has scaled this process up to be massively parallel, determining the composition of more than 300,000 sequences at once, for approximately the same price as 96 to 192 sequencing reactions performed using traditional chemistries. In addition to the massive parallelization, the 454 technology does not require cloning of the environmental samples, thus eliminating many of the problems that are associated with this step of metagenomics.
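The one-nucleotide-at-a-time chemistry described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the 454 base-calling software: the cyclic flow order `TACG`, the idealized noise-free signal (light proportional to the number of bases incorporated in a flow), and the function names `simulate_flowgram` and `call_bases` are all illustrative choices that do not come from this paper.

```python
def simulate_flowgram(synthesized, flow_order="TACG", max_flows=400):
    """Simulate idealized pyrosequencing flows.

    `synthesized` is the strand being built by the polymerase (i.e. the
    complement of the template). Each flow adds a single dNTP; the signal
    is the number of consecutive bases incorporated during that flow
    (0 when the dNTP does not match the next base).
    """
    pos = 0
    signals = []
    for i in range(max_flows):
        if pos >= len(synthesized):
            break  # strand complete, no further light is produced
        nucleotide = flow_order[i % len(flow_order)]
        run = 0
        # A homopolymer run is incorporated within one flow.
        while pos < len(synthesized) and synthesized[pos] == nucleotide:
            run += 1
            pos += 1
        signals.append((nucleotide, run))
    return signals


def call_bases(signals):
    """Convert (nucleotide, signal) pairs back into a base sequence."""
    return "".join(nucleotide * count for nucleotide, count in signals)


if __name__ == "__main__":
    flows = simulate_flowgram("TTACGG")
    print(flows)
    print(call_bases(flows))
```

In this idealized model a homopolymer such as `TT` yields a single flow with signal 2, which hints at why real instruments, where the signal is noisy, can find long homopolymer runs hard to call.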
This report describes the first application of pyrosequencing to environmental samples. From this sequence data, we identify the 16S rDNA sequences present in the sample, and apply new annotation methods to this data using the SEED database. This paper also describes a comprehensive statistical treatment of the genes identified in each sample using a completely novel methodology that exploits the differences between metagenome sequences. We demonstrate that completely unique microbial communities inhabit proximate environments joined by a common watercourse, and that using metagenomics we can identify the unique metabolic potentials prevalent in each environment such as their mechanisms of iron acquisition and respiration. The integration of pyrosequencing, subsystems analysis, comparative metagenomics, statistics, hydrogeology, and chemistry provides a comprehensive systems analysis of the Soudan Mine.
Results and discussion
Description of the environmental samples
The first two pyrosequences of environmental samples
Summary of pyrosequence data from the Soudan Mine (table rows: Number of Sequences; Total Length of Sequences; Average Length of Sequences; Average Quality Score)
The two samples produced more than 70 Mbp of sequence data from over 700,000 sequences, and there was no significant skew in the sequence data (as measured by dinucleotide frequency) when the data generated by pyrosequencing was compared to complete genome sequences.
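The text does not specify how dinucleotide frequencies were compared, so the following is only a sketch of one simple way to compute them and to summarize the difference between two sequence sets. The function names and the choice of the maximum absolute difference as a summary are assumptions made for illustration.

```python
from collections import Counter
from itertools import product

DINUCLEOTIDES = ["".join(pair) for pair in product("ACGT", repeat=2)]


def dinucleotide_frequencies(sequences):
    """Return the relative frequency of each of the 16 dinucleotides.

    Overlapping pairs are counted; pairs containing ambiguous bases
    (anything other than A, C, G, T) are skipped.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - 1):
            pair = seq[i:i + 2]
            if pair[0] in "ACGT" and pair[1] in "ACGT":
                counts[pair] += 1
    total = sum(counts.values())
    return {d: (counts[d] / total if total else 0.0) for d in DINUCLEOTIDES}


def max_abs_difference(freq_a, freq_b):
    """Largest absolute difference between two frequency tables."""
    return max(abs(freq_a[d] - freq_b[d]) for d in DINUCLEOTIDES)


if __name__ == "__main__":
    reads = dinucleotide_frequencies(["ACGTACGT", "GGCCAATT"])
    genome = dinucleotide_frequencies(["ACGTACGTGGCCAATT"])
    print(max_abs_difference(reads, genome))
```

A value near zero would indicate that the reads sample dinucleotides in roughly the same proportions as the reference sequences.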
16S rDNA analysis of the samples
A 16S clone library was created from the Red sample to validate the 454 sequencing approach. Ninety-six clones were sequenced using traditional techniques, and compared to the 16S rDNA database from the Ribosomal Database Project. The congruity between the 16S genes sequenced in the 454 library and the 16S sequences from the clone library, as shown in Fig. 2, is quite remarkable.
We also used the 16S sequences to evaluate the randomness of the library. An analysis of 160 bacterial genome sequences in the SEED database [15, 19] with annotated 16S genes showed that about 1 in 105 bases is from a 16S gene. Based on this estimate, as a rule of thumb the Soudan samples are expected to contain approximately 3,000 bases of 16S sequence in total, or approximately 30 sequences. Twenty-four sequences from the Black sample and seventy-six sequences from the Red sample had significant similarity to 16S rDNA (an E value less than 1 × 10⁻⁵ and a match of 50 bp or more).
Metabolic potential from the metagenome library
Subsystems enriched in the Black or Red samples
Subsystems statistically more likely to be present in either the Red or Black samples. These subsystems are more frequently found among sequences from either the Red or Black samples with a sample size of 5,000 proteins, 20,000 repeated samples, and P < 0.05.
Red Sample (Oxidized, pH 4.37, Eh -8)
Black Sample (Reduced, pH 6.70, Eh -142)
Amino Acids and Derivatives
Branched-chain amino acid biosynthesis
Leucine degradation and HMG-CoA metabolism
Cell Division and Cell Cycle
Cell Wall and Capsule
N-linked glycosylation in Bacteria
Teichoic acid biosynthesis
Cofactors, Vitamins, Prosthetic Groups, Pigments
Coenzyme A biosynthesis in pathogens
Pyruvate metabolism I: anaplerotic rx, PEP
Ubiquinone menaquinone-cytochrome c reductase
NAD and NADP cofactor biosynthesis global
Coenzyme PQQ synthesis
Pyrroloquinoline quinone biosynthesis
Siderophore enterobactin biosynthesis
Siderophore enterobactin biosynthesis and ferric enterobactin transport
DNA repair, bacterial
Fatty Acids and Lipids
Fatty acid metabolism
Glycerolipid and glycerophospholipid metabolism
Fatty acid oxidation pathway
ABC transporter maltose
ABC transporter ferrichrome
ABC transporter heme
CbiQO-type ABC transporter systems
Sodium hydrogen antiporter
Metabolism of aromatic compounds
Phenylacetate pathway of aromatic compound degradation
Homogentisate pathway of aromatic compound degradation
Motility and Chemotaxis
Nucleosides and Nucleotides
De novo purine biosynthesis
Ribosome LSU bacterial
Ribosome SSU bacterial
Translation factors bacterial
F0F1-type ATP synthase
NiFe hydrogenase maturation
Terminal cytochrome C oxidases
Membrane-bound Ni, Fe-hydrogenase
Na(+)-translocating NADH-quinone oxidoreductase and rnf-like group of electron transport complexes
Respiratory complex I
Respiratory dehydrogenases 1
RNA polymerase bacterial
Glutathione redox metabolism
Resistance to fluoroquinolones
Water chemistry from Soudan Mine. No significant differences were found for Ca, Mg, Na, K, Li, Al, Mn, Sr, Ba, Si, Cr, Co, Ni, Cu, Zn, As, Se, Rb, Cd, Cs, Pb, total alkalinity, lactate, acetate, formate, chlorate, oxalate, and trace elements.
Total N (ppm)
This analysis demonstrates that by combining pyrosequencing, subsystems analysis, and comparative metagenomics the microbiology of different environments can be correlated with the chemistry and hydrogeology of those environments to identify significant ecological differences between them.
Comparisons between Soudan and Iron Mountain communities
A previous study used Sanger sequencing to determine the metagenome of the Iron Mountain community. The environmental differences (such as the difference in temperature) account for the predominant differences between the microbial communities. The organismal differences are reflected in the individual biochemistries of the samples [see Additional files 4 and 5]. For example, the AMD metagenome contains significantly more occurrences of Archaea-specific subsystems such as those involved in protein biosynthesis than the Soudan samples. The AMD sample has a preference for CO2 fixation and simple carbohydrate metabolism when compared to either of the Soudan samples. There are also many currently unexplained differences between subsystems found in these environments that must relate the biology of the organisms to the chemistry of the environment.
Comparisons between Soudan and other metagenome sequences
This is the first metagenome analysis performed using pyrosequencing, which is approximately 10 to 30 times cheaper than current Sanger sequencing. Pyrosequencing also eliminates the need for cloning, thus removing the potential for both aberrant recombinants in the surrogate host and for cloning-related artifacts such as counterselection against potentially toxic genes such as those found on phages. The main concerns with current pyrosequencing technology are the short length of sequence fragments (average of 105 bp in this study), and the requirement to use whole genome amplification to generate sufficient DNA for sequencing from environmental libraries. The former may make it difficult to accurately assemble genomes in the absence of a scaffold, while the latter may bias these analyses. Our preliminary unpublished data suggest that the whole genome amplification bias is minimal, and is preferentially towards the ends of linear pieces of DNA (Haynes, Rayhawk, Edwards, Rohwer; unpublished). Since these biases are applied equally to both libraries, they will be negated during the comparative study to highlight differences between metagenomes. Nonetheless, the short fragments are sufficient to determine statistically significant differences between metagenomes that reflect the most likely biology occurring in each environment. The low cost and high yield of pyrosequencing, combined with statistical analyses on the abundance of subsystems in the samples, allow the rapid identification of key processes driving the metabolism of different environments.
The systems approach of integrating biology, chemistry, and geology has yielded significant insights into the metabolism of two environments in the Soudan Mine. The oxidized sample is using aerobic respiratory pathways while the reduced sample is using anaerobic pathways. Nitrogen assimilation, iron acquisition, and sulfur metabolism are all differentiated between these two samples from close proximity within the same mine. However, many more significant differences between the samples remain unexplained by our current knowledge of bacterial physiology and metabolism. Explaining these differences will be a grand challenge for the future. By combining pyrosequencing, subsystems analysis, comparative metagenomics, and statistics, Occam has used his razor on metagenomics.
Sample collections, microbial enumeration, and DNA extraction
Samples were collected from several sites in the Soudan Mine. This analysis concerns the sample collection at two sites on Level 27 (714 m below the surface; Figure 1). Water and sediments were sampled from the two locations shown in Figure 1 giving the "Black" (reduced) sample and "Red" (oxidized) sample. Microbes were concentrated from these samples by filtration with 0.22 μm Sterivex units. Microbial counts were enumerated by staining the samples with SYBR-Gold (Invitrogen, Carlsbad, CA) and visualization with an epifluorescent microscope. DNA was extracted from the microbial sample using either the Ultra Clean Soil DNA Kit or Power Soil Kit (MolBio, Boulder, CO). The DNA was amplified with GenomiPhi (GE Healthcare, Piscataway, NJ) in an Eppendorf thermal cycler (Eppendorf, Westbury, NY) using multiple reactions containing 50–100 ng of the isolated DNA as template and the manufacturer's recommended protocols. After amplification, the resulting DNA was purified with silica columns (Qiagen, Valencia, CA) and concentrated by ethanol precipitation. The DNA was resuspended in water to a final concentration of 0.3 mg/ml. Approximately 10 μg of each sample was sequenced using the pyrosequencing technology (454 Life Sciences, Branford, CT).
Bacterial-specific primers 27F (5'-AGAGTTTGATCMTGGCTCAG-3') and the universal 1492R primer (5'-TACGGYTACCTTGTTACGACTT-3') were used to amplify the 16S rDNA genes. PCR products were cloned into the pCR®4-TOPO® vector as recommended by the manufacturer (Invitrogen, Carlsbad, CA).
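Both primers contain a degenerate position written in the standard IUPAC nucleotide code (M for A or C in 27F, Y for C or T in 1492R), so each primer is really a mixture of two oligonucleotides. The sketch below enumerates the concrete sequences a degenerate primer represents; the function name `expand_primer` is illustrative and not part of the original methods.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def expand_primer(primer):
    """List every unambiguous sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    for name, primer in [("27F", "AGAGTTTGATCMTGGCTCAG"),
                         ("1492R", "TACGGYTACCTTGTTACGACTT")]:
        print(name, expand_primer(primer))
```

Because each primer has one two-fold degenerate position, each expands to exactly two sequences.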
Water and mineral analyses
Water samples were collected by filtering the water through 0.2 μm filters into clean bottles. Field measurements of pH, Eh, temperature and conductivity were conducted in situ. The sediment samples were collected as slurries with a pipette resting on and in the sediments. Those slurries were transferred to clean centrifuge tubes, allowed to settle by gravity and then the fluid was decanted.
Major anions in the water were determined by GC (Dionex IGS-2000, Sunnyvale, CA) and major and trace elements by ICP/MS (Thermo Electron PQ ExCell, Franklin, MA). The mineral identifications are based on XRD (Bruker-AX D500 X-ray Diffractometer, Germany) measurements. The X-ray peaks were relatively small. Much of the sediment was apparently not well crystallized.
The unassembled sequences provided by 454 were compared to the SEED database using the BLASTX algorithm on the Teragrid cluster at Argonne National Laboratories [15, 23]. All BLAST searches were performed using an expect value cutoff of 1 × 10⁻⁵. At this cutoff approximately 3 of the observed hits would be expected to occur at random.
The BLASTN algorithm was used to identify 16S genes from release 9 of the RDP database [16, 24]. These BLAST searches were also performed using an expect value cutoff of 1 × 10⁻⁵ and a minimum sequence match length of 50 nt.
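The two 16S thresholds (E value and minimum match length) can be applied as a simple post-filter on BLAST results. The sketch below assumes the common 12-column tab-separated BLAST output, in which column 4 is the alignment length and column 11 is the E value; the paper does not state which output format was used, and the read and subject identifiers in the example are hypothetical.

```python
def filter_hits(lines, max_evalue=1e-5, min_length=50):
    """Keep hits with E value below max_evalue and length >= min_length.

    Assumes 12-column tabular BLAST output:
    query, subject, %identity, length, mismatches, gap opens,
    q.start, q.end, s.start, s.end, E value, bit score.
    """
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        length = int(fields[3])
        evalue = float(fields[10])
        if evalue < max_evalue and length >= min_length:
            kept.append((fields[0], fields[1], length, evalue))
    return kept


if __name__ == "__main__":
    # Hypothetical example rows, not data from the Soudan Mine libraries.
    example = [
        "read_1\trdp_A\t98.0\t95\t2\t0\t1\t95\t10\t104\t2e-40\t170",
        "read_2\trdp_B\t100.0\t30\t0\t0\t1\t30\t5\t34\t1e-08\t60",
        "read_3\trdp_C\t85.0\t80\t12\t0\t1\t80\t1\t80\t0.002\t40",
    ]
    for hit in filter_hits(example):
        print(hit)
```

In the example only the first row passes both thresholds: the second is too short and the third has too large an E value.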
Statistical analyses of metagenome datasets
The statistical analysis of subsystems present in each sample was performed essentially as described elsewhere. The presence or absence of subsystems between two data sets was determined using 20,000 replicates of samples of 5,000 subsystems each. The 95% confidence interval for the median was constructed using the 0.025 and 0.975 percentiles.
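The full procedure is given in the cited statistics paper rather than here, so the following is only a sketch of one plausible reading of the resampling scheme: repeatedly draw a fixed number of subsystem assignments from each sample, record the difference in how often a given subsystem appears, and take the 0.025 and 0.975 percentiles of those differences. The function name, the sampling with replacement, the seeded generator, and the toy data are all assumptions made for illustration.

```python
import random


def resampled_difference_interval(sample_a, sample_b, subsystem,
                                  sample_size=5000, replicates=20000,
                                  seed=0):
    """Return (lower, median, upper) of the resampled count difference.

    sample_a and sample_b are lists of subsystem labels, one per hit.
    Each replicate draws `sample_size` labels with replacement from each
    list and records count_in_a - count_in_b for `subsystem`.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    diffs = []
    for _ in range(replicates):
        draw_a = rng.choices(sample_a, k=sample_size)
        draw_b = rng.choices(sample_b, k=sample_size)
        diffs.append(draw_a.count(subsystem) - draw_b.count(subsystem))
    diffs.sort()
    lower = diffs[int(0.025 * (replicates - 1))]
    upper = diffs[int(0.975 * (replicates - 1))]
    median = diffs[replicates // 2]
    return lower, median, upper


if __name__ == "__main__":
    # Toy data only; reduced replicate count to keep the run short.
    red = ["siderophore"] * 80 + ["other"] * 20
    black = ["siderophore"] * 20 + ["other"] * 80
    print(resampled_difference_interval(red, black, "siderophore",
                                        sample_size=500, replicates=200))
```

Under this reading, a subsystem would be called enriched in one sample when the interval excludes zero. The defaults mirror the 5,000 and 20,000 figures in the text, but in pure Python that many draws is slow, so the example uses smaller values.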
The authors are grateful to Bill Miller, Director of the Soudan Facility, for arranging the sampling trips, ploughing through paperwork, and bringing these fascinating microbial communities to our attention. Thanks to Jim Essig for guiding us to Level 10, arranging our sampling protocols, and generally helping out. Tony Zavodnick, Paul Paulisich, and Jack Zorman provided invaluable guidance and explanations of mining at the Soudan Mine site, and we thank the whole Soudan Mine and facility crews for making our sampling trip so enjoyable. In addition, we thank Robert Olson (Argonne National Labs) for assistance with the computational analysis. This work was supported by grant NSF DEB-BE 04-21955 from the NSF Biocomplexity program (to FR).
- Riesenfeld CS, Schloss PD, Handelsman J: Metagenomics: genomic analysis of microbial communities. Annu Rev Genet. 2004, 38: 525-552. 10.1146/annurev.genet.38.072902.091216.
- Edwards RA, Rohwer F: Viral metagenomics. Nat Rev Microbiol. 2005, 3 (6): 504-510. 10.1038/nrmicro1163.
- Breitbart M, Felts B, Kelley S, Mahaffy JM, Nulton J, Salamon P, Rohwer F: Diversity and population structure of a near-shore marine-sediment viral community. Proc R Soc Lond B Biol Sci. 2004, 271 (1539): 565-574. 10.1098/rspb.2003.2628.
- Breitbart M, Hewson I, Felts B, Mahaffy JM, Nulton J, Salamon P, Rohwer F: Metagenomic analyses of an uncultured viral community from human feces. J Bacteriol. 2003, 185 (20): 6220-6223. 10.1128/JB.185.20.6220-6223.2003.
- Breitbart M, Salamon P, Andresen B, Mahaffy JM, Segall AM, Mead D, Azam F, Rohwer F: Genomic analysis of uncultured marine viral communities. Proc Natl Acad Sci U S A. 2002, 99 (22): 14250-14255. 10.1073/pnas.202488399.
- Tringe SG, von Mering C, Kobayashi A, Salamov AA, Chen K, Chang HW, Podar M, Short JM, Mathur EJ, Detter JC, Bork P, Hugenholtz P, Rubin EM: Comparative metagenomics of microbial communities. Science. 2005, 308 (5721): 554-557. 10.1126/science.1107851.
- Tyson GW, Chapman J, Hugenholtz P, Allen EE, Ram RJ, Richardson PM, Solovyev VV, Rubin EM, Rokhsar DS, Banfield JF: Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature. 2004, 428 (6978): 37-43. 10.1038/nature02340.
- DeLong EF: Microbial community genomics in the ocean. Nat Rev Microbiol. 2005, 3 (6): 459-469. 10.1038/nrmicro1158.
- Cann AJ, Fandrich SE, Heaphy S: Analysis of the virus population present in equine faeces indicates the presence of hundreds of uncharacterized virus genomes. Virus Genes. 2005, 30 (2): 151-156. 10.1007/s11262-004-5624-3.
- Venter JC, Remington K, Heidelberg JF, Halpern AL, Rusch D, Eisen JA, Wu D, Paulsen I, Nelson KE, Nelson W, Fouts DE, Levy S, Knap AH, Lomas MW, Nealson K, White O, Peterson J, Hoffman J, Parsons R, Baden-Tillson H, Pfannkoch C, Rogers YH, Smith HO: Environmental genome shotgun sequencing of the Sargasso Sea. Science. 2004, 304 (5667): 66-74. 10.1126/science.1093857.
- Rodriguez-Brito B, Rohwer F, Edwards RA: An application of statistics to comparative metagenomics. BMC Bioinformatics. 2006,
- Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, Braverman MS, Chen YJ, Chen Z, Dewell SB, Du L, Fierro JM, Gomes XV, Godwin BC, He W, Helgesen S, Ho CH, Irzyk GP, Jando SC, Alenquer ML, Jarvie TP, Jirage KB, Kim JB, Knight JR, Lanza JR, Leamon JH, Lefkowitz SM, Lei M, Li J, Lohman KL, Lu H, Makhijani VB, McDade KE, McKenna MP, Myers EW, Nickerson E, Nobile JR, Plant R, Puc BP, Ronan MT, Roth GT, Sarkis GJ, Simons JF, Simpson JW, Srinivasan M, Tartaro KR, Tomasz A, Vogt KA, Volkmer GA, Wang SH, Wang Y, Weiner MP, Yu P, Begley RF, Rothberg JM: Genome sequencing in microfabricated high-density picolitre reactors. Nature. 2005
- Ronaghi M, Karamohamed S, Pettersson B, Uhlen M, Nyren P: Real-time DNA sequencing using detection of pyrophosphate release. Anal Biochem. 1996, 242 (1): 84-89. 10.1006/abio.1996.0432.
- Ronaghi M, Uhlen M, Nyren P: A sequencing method based on real-time pyrophosphate. Science. 1998, 281 (5375): 363, 365. 10.1126/science.281.5375.363.
- Overbeek R, Begley T, Butler R, Choudhuri J, Chuang H, Cohoon M, de Crécy-Lagard V, Diaz N, Disz T, Edwards R, Fonstein M, Frank E, Gerdes S, Glass E, Goesmann A, Hanson A, Iwata-Reuyl D, Jensen R, Jamshidi N, Krause L, Kubal M, Larsen N, Linke B, McHardy A, Meyer F, Neuweger H, Olsen G, Olson R, Osterman A, Portnoy V, Pusch G, Rodionov D, Rückert C, Steiner J, Stevens R, Thiele I, Vassieva O, Ye Y, Zagnitko O, Vonstein V: The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes. Nucleic Acids Res. 2005
- Cole JR, Chai B, Farris RJ, Wang Q, Kulam SA, McGarrell DM, Garrity GM, Tiedje JM: The ribosomal database project (RDP-II): sequences and tools for high-throughput rRNA analysis. Nucleic Acids Res. 2005, 33 (Database issue): D294-6. 10.1093/nar/gki038.
- Klubek B, Schmidt C, Burnham H: Characterization of soil bacteria that desulphurize organic sulphur compounds. 1. Classification and growth studies. Microbios. 1996, 88 (357): 223-236.
- Turtura GC, Perfetto A, Lorenzelli P: Microbiological investigation on black crusts from open-air stone monuments of Bologna (Italy). Microbiologica. 2000, 23 (2): 207-228.
- The SEED. [http://theseed.uchicago.edu/FIG/index.cgi]
- Overbeek R, Disz T, Stevens R: The SEED: A peer-to-peer environment for genome annotation. Commun ACM. 2004, 47 (11): 46-51. 10.1145/1029496.1029525.
- Noble RT, Fuhrman JA: Use of SYBR Green I for rapid epifluorescence counts of marine viruses and bacteria. Aquat Microb Ecol. 1998, 14 (2): 113-118.
- Amann RI, Ludwig W, Schleifer KH: Phylogenetic identification and in situ detection of individual microbial cells without cultivation. Microbiol Rev. 1995, 59 (1): 143-169.
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215 (3): 403-410. 10.1006/jmbi.1990.9999.
- Ribosomal database project - II. [http://rdp.cme.msu.edu/]
- FIG Subsystem Forum. [http://www.subsys.info]
- Ewing B, Green P: Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 1998, 8 (3): 186-194.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.