Genome signature analysis of thermal virus metagenomes reveals Archaea and thermophilic signatures
© Pride and Schoenfeld; licensee BioMed Central Ltd. 2008
Received: 10 April 2008
Accepted: 17 September 2008
Published: 17 September 2008
Metagenomic analysis provides a rich source of biological information for otherwise intractable viral communities. However, study of viral metagenomes has been hampered by its nearly complete reliance on BLAST algorithms for identification of DNA sequences. We sought to develop algorithms for examination of viral metagenomes to identify the origin of sequences independent of BLAST algorithms. We chose viral metagenomes obtained from two hot springs, Bear Paw and Octopus, in Yellowstone National Park, as they represent simple microbial populations where comparatively large contigs were obtained. Thermal spring metagenomes have high proportions of sequences without significant Genbank homology, which has hampered identification of viruses and their linkage with hosts. To analyze each metagenome, we developed a method to classify DNA fragments using genome signature-based phylogenetic classification (GSPC), where metagenomic fragments are compared to a database of oligonucleotide signatures for all previously sequenced Bacteria, Archaea, and viruses.
From both Bear Paw and Octopus hot springs, each assembled contig had more similarity to other metagenome contigs than to any sequenced microbial genome based on GSPC analysis, suggesting a genome signature common to each of these extreme environments. While viral metagenomes from Bear Paw and Octopus share some similarity, the genome signatures from each locale are largely unique. GSPC using a microbial database predicts that most of the Octopus metagenome has archaeal signatures, while bacterial signatures predominate in Bear Paw, a finding consistent with Genbank BLAST searches. When using a viral database, the majority of the Octopus metagenome is predicted to belong to the archaeal virus Families Globuloviridae and Fuselloviridae, while none of the Bear Paw metagenome is predicted to belong to archaeal viruses. As expected, when microbial and viral databases are combined, each of the Octopus and Bear Paw metagenomic contigs is predicted to belong to viruses rather than to any Bacteria or Archaea, consistent with the apparent viral origin of both metagenomes.
That BLAST searches identify no significant homologs for most metagenome contigs, while GSPC suggests their origin as archaeal viruses or bacteriophages, indicates GSPC provides a complementary approach in viral metagenomic analysis.
The study of metagenomes has provided important insights into physiological processes and into the diversity of microbial and viral communities in different environments [1, 2]. Metagenomic analysis is based on high-throughput DNA sequencing of clone libraries of mass-isolated cells or viral particles from different ecological environments, and is strictly defined as the study of those organisms that inhabit a given biological niche. Such community analysis has contributed to an improved understanding of microbial community structure, and can provide a broader perspective on microbial community composition and function than analysis of 16S rDNA.
Over the past decade, it has become increasingly clear that viruses are a significant component of every ecological niche in which cellular life exists. Abundances ranging from 10⁴ to 10⁸ virus-like particles per milliliter have been detected in virtually every aquatic environment studied , although abundances in hot springs are generally at the lower end of this range . Estimates of viral diversity suggest that several thousand different viral types exist in a given pool, probably having a profound impact on population structure and genomic content of host populations [5–8].
Studies of viral diversity have been hampered by the absence of universal signature sequences (e.g. 16S rDNA). Metagenomic analysis has provided much of the population-level insight into diversity and distribution of viruses in the environment . The few studies addressing bacteriophage and archaeal viral assemblages have led to a deeper understanding of the diversity present in these communities and may help determine how the presence of certain viruses shapes microbial communities [7, 10]; however, these studies also have highlighted the need for improved approaches in the analysis of viral metagenomes. In each of the studied viral metagenomes, a large proportion of sequences had no significant homologs in the Genbank non-redundant database [9, 11–13]. Furthermore, in a recent viral metagenome survey in thermal environments, half of the sequences had no BLASTx homolog in the Genbank nr database , similar to results found in marine and estuarine environments [9, 11], presumably due to the relative dearth of annotated thermophilic viral sequences in Genbank. While all of the unidentified sequences in thermal virus metagenomes presumably represent bacteriophages or archaeal viruses, neither their hosts nor their virus types can be ascertained [4, 12]. Since, to date, BLAST alignments  have been the predominant means of associating a viral metagenomic sequence with a likely host, the lack of significant homology between most viral metagenomic sequences and sequences in Genbank has impeded our understanding of host-virus relationships.
Genome signature analysis of DNA sequences measures biases in DNA oligonucleotide composition rather than sequence similarity, and is performed in an alignment-independent manner [15–18]. For each genome or portion of a genome with detectable differences, the genome signature of each sequence analyzed will be unique [15, 19]. Previous data have demonstrated that after their divergence, microbes retain patterns of genome signature reflective of their recent common ancestry, similar to 16S rDNA . Utilizing this property of the genome signature, the technique has now been adapted to predict the ancestry of eukaryal, archaeal, and bacterial metagenomic sequences .
The classification of viruses has traditionally been based on morphological characteristics [21, 22]. This classification system is widely used for cultivated viruses, which significantly biases our view of diversity . Attempts have been made to correlate sequences and morphologies , but these have proven less useful in extreme thermal environments. The absence of a universal signature gene has hampered classification of viral genomes based on genomic sequences. Recent studies of bacteriophages have identified conserved patterns of oligonucleotides used as genome signatures unique to each genome analyzed that appear to be co-evolving with their hosts . In contrast, these patterns are shared among groups of eukaryotic viruses in a manner largely independent of their host .
Terrestrial thermal aquifers are vast ecosystems with abundances of microbes and viruses approaching those of the ocean [4, 12]. At temperatures of 74°C and above, the hot springs in this study are well above the temperature limit for eukaryotic life, generally accepted to be around 62°C, and therefore harbor communities composed strictly of Bacteria, Archaea, and their respective viruses . While comprehensive studies of viral communities in these extreme environments are just beginning, culture-based studies have indicated the presence of bacteriophages of the bacterial Genus Thermus [27, 28], as well as viruses of the archaeal Genera Sulfolobus, Pyrobaculum, Acidianus, and Thermoproteus .
We sought to develop new methods based on genome signature to apply to analysis of viral communities from two separate thermal pools, Bear Paw and Octopus Springs, in Yellowstone National Park. Our goals were to: 1) develop the technique of genome signature-based phylogenetic classification (GSPC) to accurately predict the presumed host/virus relationships of known bacteriophages, 2) analyze the differences between viral metagenomes from Bear Paw and Octopus hot springs, 3) apply the GSPC technique to viral metagenomes to predict the microbial host of unknown members of the viral community, and 4) apply GSPC to classify the viruses present in Bear Paw and Octopus hot springs.
GSPC for known bacteriophages based on a microbial database
To compare viral genome signatures with those of Bacteria and Archaea, we constructed a microbial database of oligonucleotide frequencies for all currently sequenced Bacteria and Archaea. The database contains frequencies of all dinucleotide, trinucleotide, tetranucleotide, pentanucleotide, and hexanucleotide combinations for each genome. To determine similarity between known bacteriophages and their potential hosts in the microbial database, Euclidean distances based on the sum of the differences for all oligonucleotide combinations were determined for each known bacteriophage and each database genome. The resulting distance matrix was subjected to neighbor-joining analysis, and phylogenies were used to classify the known bacteriophages. In cases where the known bacteriophages were positioned monophyletically, the bacteriophages were classified based on the Kingdom, Phylum, Class, Order, Family, and Genus of that monophyletic group. In cases where the known bacteriophages were positioned paraphyletically, the bacteriophages were classified based on the branches deep to that paraphyletic position.
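The neighbor-joining step described above can be illustrated with a minimal sketch of the standard Saitou and Nei algorithm. This is not the Swaap Genome Search implementation used in the study; the function name `neighbor_joining`, the dict-based bookkeeping, and the Newick output are our own illustrative choices. It assumes a symmetric distance matrix with zeros on the diagonal.

```python
from typing import Dict, List


def neighbor_joining(names: List[str], dist: List[List[float]]) -> str:
    """Build an unrooted neighbor-joining tree and return it as a Newick string.

    Illustrative sketch only; `dist` must be symmetric with a zero diagonal,
    ordered like `names`.
    """
    # Dict-of-dicts copy keyed by node label, so merged clusters can be added.
    d: Dict[str, Dict[str, float]] = {
        a: {b: dist[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)
    }
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence r(a): sum of distances from a to every other node.
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Choose the pair minimising Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        best_q, best_a, best_b = None, "", ""
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best_q is None or q < best_q:
                    best_q, best_a, best_b = q, a, b
        a, b = best_a, best_b
        # Branch lengths from the new internal node u to its two children.
        len_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        u = f"({a}:{len_a:g},{b}:{len_b:g})"
        # Distance from u to every remaining node c.
        d[u] = {u: 0.0}
        for c in nodes:
            if c not in (a, b):
                d[u][c] = d[c][u] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    # Join the last two nodes; the connecting edge is split evenly for display.
    a, b = nodes
    half = d[a][b] / 2
    return f"({a}:{half:g},{b}:{half:g});"
```

In a GSPC-style analysis the matrix would hold distances among database genomes plus one query sequence, and the query's sister clade in the returned tree is what the monophyletic/paraphyletic rule would then inspect.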
GSPC for hot spring metagenomes based on a microbial database
We analyzed two separate hot springs, Bear Paw and Octopus in Yellowstone National Park, to gain a deeper understanding of the viral populations native to each habitat. The two hot springs are located within 5 kilometers of each other. Bear Paw is characterized by a surface temperature of 74°C, visible pigmented microbial growth at the surface, a pH near 8.0, and an estimated phage abundance of 10⁵ to 10⁶ particles per milliliter. The Octopus hot spring is characterized by a pH near 8.0, a surface temperature of 93°C, an estimated viral abundance of 10⁵ to 10⁶ particles per milliliter, and no visible growth at the surface . Both thermophilic Bacteria and Archaea previously have been identified in Octopus hot spring, with Thermocrinis and Aquificales predominating among the sediment and filament Bacteria [31–33]. Due to the high temperatures, both hot springs are devoid of eukaryotic life .
Collective analysis of Bear Paw and Octopus hot springs
GSPC for known bacteriophages based on a viral database
Previous data indicate that, when diverse groups of viruses are analyzed by genome signature, they segregate largely according to their Family designation . Furthermore, double stranded DNA viruses, including bacteriophages and archaeal viruses, cluster separately from other types of viruses such as single stranded DNA viruses and RNA viruses. Based on genome signature, bacteriophages typically segregate either according to their bacterial hosts or according to their Family designation (e.g. Podoviridae, Myoviridae, or Siphoviridae) .
GSPC for hot spring metagenomes based on a viral database
GSPC based on a combination database
Previous data indicate that when bacteriophages and their bacterial hosts are included in genome signature-based phylogenies, bacteriophages tend to cluster together near their bacterial hosts . We hypothesize that this clustering reflects a limit on the ability of bacteriophages to fully ameliorate to their host genome signature, and that this may be necessary for the bacteriophages to maintain host range . The metagenomes from Bear Paw and Octopus hot springs are limited to bacteriophages and archaeal viruses based on previous analysis of the contigs . We constructed a database containing all sequenced Archaea, Bacteria, and viruses to determine whether Bear Paw and Octopus metagenomic contigs have viral or microbial signatures. Using GSPC based on this combination database, each of the contigs from Bear Paw and Octopus was classified as it had been with the viral database (Additional file 3), further suggesting their origin as bacteriophages and archaeal viruses.
Other methods of metagenomic classification
Contig Identification Summary (table; column: Number of Sequences)
Another method for identification of metagenomic sequence fragments based on oligonucleotide sequence biases is the application Phylopythia, which uses a support vector machine to classify sequence fragments according to its database of oligonucleotide biases that includes Archaea, Bacteria, and Eukarya . Previous data have demonstrated that for both bacterial and archaeal DNA fragments, the technique is quite robust in assigning fragments to different taxonomic classes . Although Phylopythia was not developed specifically for bacteriophages, we applied it to the identification of metagenomic contigs from Bear Paw and Octopus. Phylopythia classified 91% of the contigs from Bear Paw and 91% of those from Octopus (Table 1). Many of the sequences could not be identified beyond the level of Kingdom or Class (Additional file 2). Some sequences were classified to eukaryotic Classes (including Ascomycota, Insecta, Sordariomycetes, and Arthropoda), bacterial Classes (including Clostridia, Bacteroidetes, Gammaproteobacteria, Epsilonproteobacteria, Alphaproteobacteria, and Spirochaetes), and archaeal Classes (including Thermoprotei and Methanomicrobia) (Additional file 2). Since no Eukaryotes previously have been isolated at these temperatures , their viruses are unlikely to be members of these communities.
Correlation between classification techniques by Class
Discussion and conclusion
The exploration of microbial assemblages through microbiome genome analysis has provided insights into both community structure and physiology [1, 2], but also has revealed a greater need for advances in technology to identify community constituents without significant homology in Genbank. Viral metagenomic analysis currently is substantially less well developed than that of cellular populations, and is limited by a low proportion of viral sequences compared to cellular sequences. Genome signature analysis is independent of nucleotide or amino acid alignments, and predicts relationships based on separate principles from those of BLAST search algorithms .
We sought to create two separate databases based on oligonucleotide frequencies of all sequenced microbial cellular and viral genomes, respectively, to determine whether known viruses could be used for accurate prediction of host microbe or virus ancestry. Our data demonstrate that when longer nucleotide sequences are available, GSPC makes more accurate predictions of both host and viral ancestry (Figures 1 and 8). As the length of nucleotide sequences decreases, GSPC accuracy also decreases (Figures 1 and 8). The heterogeneity of oligonucleotide signatures across certain bacteriophage genomes may explain why individual bacteriophage fragments are not always representative of their viral or host genomes [19, 25].
Because previously sequenced viral metagenomes [13, 40] consist mostly of single reads or smaller contigs, they generally are not amenable to GSPC analysis. Bear Paw and Octopus metagenomes are less diverse and have larger contigs , thus providing a more suitable dataset for GSPC. The GSPC method predicts many of the hot spring metagenomic contigs as archaeal viruses and thermophilic Bacteria (Figure 2), a finding that is consistent with the environment from which they were recovered. When homologs to the metagenomic contigs could be identified in Genbank, the presumptive hosts were generally consistent with the findings of GSPC (Additional file 2 and Table 2). While there was some agreement in contig prediction between GSPC and Phylopythia, many of the contigs were predicted to be derived from dissimilar organisms. Because GSPC predicts the origin of many of the contigs to be consistent with the known flora of these hot springs, while Phylopythia predicts many to have eukaryotic origin, we believe GSPC may provide a more specific methodology for contigs from such extreme environments.
We chose to analyze the metagenomes of two separate hot springs, Bear Paw and Octopus in Yellowstone National Park. Their conditions at the surface differ, suggesting there may be differences between the microbial flora present in each environment. Genbank BLAST and GSPC based on a microbial database both predict many Bear Paw contigs to be of bacterial origin, while the viral database suggests the contigs are from bacteriophages (Additional file 3). In contrast, Genbank BLAST and GSPC based on a microbial database predict many Octopus contigs to be of archaeal origin, with the viral database indicating that many contigs may belong to the archaeal virus Families Fuselloviridae and Globuloviridae (Additional file 3). In support of this finding, a previous metagenomic study of these hot springs detected homologs to nearly the entire genome of Pyrobaculum spherical virus , a member of the archaeal virus Family Globuloviridae. Although geochemistry has a large influence on the microbial composition of hot springs, microbial populations are highly temperature dependent . We believe the bacterial predominance in Bear Paw hot spring compared to Octopus may be related to the lower temperature present in Bear Paw.
As greater numbers of viral communities are studied, new techniques for assessing metagenomic constituents are necessary. Previous studies of viral metagenomes have underscored the need for new techniques, as most of the available metagenomic sequences have limited detectable similarity to sequences in Genbank [9, 11–13]. GSPC provides an approach complementary to BLAST search algorithms, taking advantage of patterns of nucleotide usage in DNA rather than nucleotide alignments. Although GSPC is not yet applicable to most lower temperature viral metagenomes because typical contigs in those studies are short, it will likely become more suitable for analysis of these environments as next generation sequencing platforms allow collection of much larger amounts of sequence data and assembly of larger contigs. This should substantially increase the sensitivity of viral metagenomic studies, both in predicting hosts and in classifying the types of viruses in the community. GSPC is a facile approach for classifying viral metagenomic sequences and inferring host relationships, and is a highly complementary alternative to traditional BLAST searches, particularly when those searches fail to identify significant homology.
Virus collection and sequencing
Samples were collected in October 2003 from both Bear Paw and Octopus hot springs in the Lower Geyser Basin in Yellowstone National Park. Viral particles were isolated, libraries were constructed and sequenced, and sequences were assembled as described . Libraries from each hot spring were constructed using methods that select only for double stranded DNA viruses. We previously have based our minimum genome sequence length for analysis on the assumption that 95% of tetranucleotide combinations should occur at least 10 times [18, 25]. The minimum genome length analyzed in this study was 1.9 kb (3.8 kb when analyzing both strands), which corresponds to the assumption that 95% of tetranucleotide combinations should occur at least 7.5 times. Approximately 19.3% of the Bear Paw metagenomic contigs and 39.0% of the Octopus metagenomic contigs met these criteria. Since hundreds or thousands of viral types inhabit Bear Paw and Octopus hot springs , these contigs represent only the most abundant viral types. Both metagenomes are available from the NCBI trace archive using CENTER_NAME = "JGI" and SEQ_LIB_ID = "AOIX" for Bear Paw sequences and SEQ_LIB_ID = "APNO" and SEQ_LIB_ID = "ATYB" for Octopus sequences.
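The 1.9 kb cutoff above rests on an assumed tetranucleotide coverage (95% of the 256 tetranucleotides seen at least 7.5 times across both strands). As a hedged illustration, the sketch below checks that property empirically for a single contig; the names `tetra_coverage` and `passes_length_criteria` are hypothetical, and this is not how the authors derived the cutoff, only a way to see what the criterion means for a given sequence.

```python
from collections import Counter
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]  # all 256 words


def tetra_coverage(seq: str, min_count: float = 7.5) -> float:
    """Fraction of the 256 tetranucleotides seen at least `min_count` times,
    counting both the given strand and its reverse complement."""
    seq = seq.upper()
    counts: Counter = Counter()
    for strand in (seq, seq.translate(_COMPLEMENT)[::-1]):
        for i in range(len(strand) - 3):
            word = strand[i:i + 4]
            if all(base in "ACGT" for base in word):  # skip windows with N etc.
                counts[word] += 1
    return sum(counts[w] >= min_count for w in TETRAMERS) / len(TETRAMERS)


def passes_length_criteria(seq: str, min_len: int = 1900,
                           min_fraction: float = 0.95,
                           min_count: float = 7.5) -> bool:
    """Hypothetical filter: minimum contig length plus the empirical coverage check."""
    return len(seq) >= min_len and tetra_coverage(seq, min_count) >= min_fraction
```

A low-complexity contig such as a 2,000 nt poly-A run passes the length cutoff but covers only 2 of the 256 tetranucleotides, which is the kind of sequence a coverage criterion is meant to flag.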
To determine oligonucleotide frequencies for genomes and metagenomic contigs, a zero-order Markov algorithm  was used, in which the expected number of each oligonucleotide W was determined by removing biases in mononucleotide frequencies, according to the equation $E(W) = (A^{a} \, C^{c} \, G^{g} \, T^{t}) \times N$, where A, C, G, and T represent the frequencies of the four nucleotides within the window being evaluated, a, c, g, and t represent the numbers of A, C, G, and T nucleotides in the oligonucleotide W, and N represents the length of the genome or contig being evaluated . The divergence of each oligonucleotide from expectation, F(W), is expressed as the ratio of its observed to its expected count, $F(W) = O(W)/E(W)$, and these ratios were determined for each genome studied using Swaap Genome Search version 1.0.1 .
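A minimal sketch of the zero-order Markov calculation described above, computing F(W) = observed/expected for every k-mer. It follows the E(W) definition in the text, including using the full sequence length N as the multiplier; the function name `oe_ratios` is hypothetical and this is not the Swaap Genome Search code.

```python
from collections import Counter
from itertools import product
from typing import Dict


def oe_ratios(seq: str, k: int) -> Dict[str, float]:
    """Observed/expected ratio F(W) for every k-mer W under a zero-order Markov model.

    Expected count E(W) = (A^a * C^c * G^g * T^t) * N, where A..T are the
    mononucleotide frequencies of `seq`, a..t the base counts of W, and N the
    sequence length.
    """
    seq = seq.upper()
    n = len(seq)
    freq = {base: seq.count(base) / n for base in "ACGT"}
    observed = Counter(seq[i:i + k] for i in range(n - k + 1))
    ratios: Dict[str, float] = {}
    for word in ("".join(p) for p in product("ACGT", repeat=k)):
        expected = float(n)
        for base in word:  # multiplying once per base yields A^a C^c G^g T^t
            expected *= freq[base]
        ratios[word] = observed[word] / expected if expected > 0 else 0.0
    return ratios
```

For example, in the 100 nt sequence `"ACGT" * 25` every base has frequency 0.25, so each dinucleotide has E(W) = 6.25; "AC" occurs 25 times (F = 4.0) while "AA" never occurs (F = 0). Because only N − k + 1 windows exist, ratios for random sequence sit very slightly below 1, a negligible effect for contigs of several kilobases.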
Microbial and viral databases
A database was constructed containing all di-, tri-, tetra-, penta-, and hexanucleotide frequencies for all fully sequenced bacterial and archaeal genomes available in the Genbank database (383 genomes in the database as of 21 May 2007). Including separate chromosomes for certain organisms, there were 440 separate entries in the microbial database. A separate database was constructed containing all di-, tri-, tetra-, penta-, and hexanucleotide frequencies for all known fully sequenced viruses in the Genbank database (3866 genomes in the database as of 10 August 2007).
Genome signature-based phylogenetic classification
Genome signature-based phylogenetic classification (GSPC) was performed on individual metagenomic contigs, collective groups of metagenomic contigs, and viral fragments. Briefly, oligonucleotide frequencies were determined for all viral sequences, and Euclidean distances between each fragment and all frequency profiles in the databases were determined using the equation $D_t = \frac{1}{4^{N}} \sum_{W} \lvert F_1(W) - F_2(W) \rvert$, where $F_1(W)$ and $F_2(W)$ represent F(W) of each oligonucleotide W for organisms or fragments 1 and 2, the sum runs over all $4^{N}$ possible oligonucleotides, and N is the length of the oligonucleotide under evaluation [15, 16]. Bootstrapping was performed by resampling the oligonucleotide frequencies with replacement; phylograms were created by neighbor-joining analysis of the resulting distance matrices using Swaap Genome Search 1.0.1 , viewed with Paup 4.0b10  or Treeview , and portions of phylogenies containing branches of interest were displayed using Corel Draw 11 (Corel Corp., Ottawa, Canada).
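The distance and bootstrap steps above can be sketched as follows. Although the text calls these Euclidean distances, the formula given averages absolute differences, and the sketch follows the formula; taking the normalizing constant as the number of possible oligonucleotides of length N (4^N) is our assumption. Resampling the same k-mer columns for every pair within a replicate mirrors the usual character-resampling bootstrap. The helper names `signature_distance` and `bootstrap_matrices` are hypothetical.

```python
import random
from itertools import product
from typing import Dict, Iterator, List, Optional, Tuple

Profile = Dict[str, float]  # k-mer W -> F(W), the observed/expected ratio


def all_kmers(k: int) -> List[str]:
    return ["".join(p) for p in product("ACGT", repeat=k)]


def signature_distance(f1: Profile, f2: Profile, k: int,
                       columns: Optional[List[str]] = None) -> float:
    """D = (1 / 4^k) * sum over W of |F1(W) - F2(W)|.

    By default W runs over all 4^k k-mers; a bootstrap replicate passes a
    resampled list of k-mers instead (same length, drawn with replacement).
    Missing k-mers are treated as F(W) = 0 (an assumption of this sketch).
    """
    words = columns if columns is not None else all_kmers(k)
    return sum(abs(f1.get(w, 0.0) - f2.get(w, 0.0)) for w in words) / 4 ** k


def bootstrap_matrices(profiles: Dict[str, Profile], k: int, replicates: int,
                       seed: int = 0) -> Iterator[Tuple[List[str], List[List[float]]]]:
    """Yield (names, distance matrix) per replicate; all pairs share one k-mer resample."""
    rng = random.Random(seed)
    names = sorted(profiles)
    words = all_kmers(k)
    for _ in range(replicates):
        columns = [rng.choice(words) for _ in words]  # sampling with replacement
        matrix = [[signature_distance(profiles[a], profiles[b], k, columns)
                   for b in names] for a in names]
        yield names, matrix
```

Each replicate matrix could be passed to a neighbor-joining routine, and clade support counted across the resulting trees; that consensus step is not shown here.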
For the microbial database, contigs were classified based on their phylogenetic position, either monophyletic or paraphyletic. In cases where contigs were grouped monophyletically, they were classified based on the Kingdom, Phylum, Class, Order, Family, and Genus of that monophyletic group. When contigs were grouped paraphyletically, they were classified based on the Kingdom, Phylum, Class, Order, Family, and Genus of branches deep to that paraphyletic position. Example output of the sequence classification for the microbial database is shown in Additional file 4. For the viral database, contigs were classified by DNA type, host type (bacterial or archaeal vs. eukaryal), viral type (Caudovirus vs. other), Family, and virus designation (e.g. T7-like virus), following the same principles as classification based on the microbial database.
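One way to turn a clade into a classification, in the spirit of the rule above, is to assign only the ranks on which all database genomes in the clade agree. The sketch assumes the tree search has already collected the lineages of the genomes in the query's monophyletic group (or, in the paraphyletic case, in the deeper branches); the helper `shared_lineage` is hypothetical, and for the viral database the same logic would apply with a different rank list.

```python
from typing import Dict, List

RANKS = ["Kingdom", "Phylum", "Class", "Order", "Family", "Genus"]
Lineage = Dict[str, str]  # rank -> taxon name


def shared_lineage(clade: List[Lineage]) -> Lineage:
    """Ranks on which every database genome in the clade agrees.

    Walks from Kingdom down to Genus and stops at the first rank where the
    members disagree or a name is missing; deeper ranks are left unassigned.
    """
    assigned: Lineage = {}
    for rank in RANKS:
        names = {member.get(rank) for member in clade}
        if len(names) != 1 or None in names:
            break
        assigned[rank] = names.pop()
    return assigned
```

For instance, a clade containing a Thermus genome and a Meiothermus genome would be assigned down to Family (Thermaceae) but left unassigned at Genus, while a clade mixing Bacteria and Archaea would receive no assignment.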
Analysis of known viruses
Oligonucleotide frequencies for known complete and partial viral genomes were determined using Swaap Genome Search version 1.0.1 . A collection of 77 bacteriophages with well-described hosts was used for analysis of known viruses (Additional file 1). Each viral genome was classified by GSPC using the microbial database, and each classification was compared with the virus's known host. The percentage of viruses correctly identified at the Kingdom, Phylum, Class, Order, Family, and Genus of their known hosts was then determined.
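The per-rank percentages described above amount to a simple tally. In this hedged sketch (the helper name `percent_correct_by_rank` is hypothetical), a prediction that stops above a given rank is counted as not identified at that rank, which is our assumption about how partial classifications were scored.

```python
from typing import Dict, List

RANKS = ["Kingdom", "Phylum", "Class", "Order", "Family", "Genus"]


def percent_correct_by_rank(predicted: List[Dict[str, str]],
                            known: List[Dict[str, str]]) -> Dict[str, float]:
    """Percentage of viruses whose predicted host taxon matches the known host at each rank.

    A prediction with no entry for a rank counts as not identified at that rank.
    """
    if len(predicted) != len(known) or not known:
        raise ValueError("need equal-length, non-empty lists")
    result = {}
    for rank in RANKS:
        hits = sum(1 for p, k in zip(predicted, known)
                   if rank in p and p[rank] == k.get(rank))
        result[rank] = 100.0 * hits / len(known)
    return result
```

With two phages of the same known host, one classified to Genus and one only to Phylum, the tally gives 100% at Kingdom and Phylum and 50% at each deeper rank.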
For analysis of known viruses with the viral database, fragments rather than full-length viral genomes were used. Random bacteriophage and viral genomic fragments were generated because the viral database contains all known fully sequenced viruses, including the 77 bacteriophages used in our dataset, so a full-length query genome would simply match its own database entry. Random bacteriophage fragments of 10,000, 5,000, and 2,000 nucleotides were generated using Swaap Genome Search 1.0.1 . Five random fragments of each size were generated for each genome, and each was subjected to GSPC using the viral database. The percentages of viruses classified correctly according to DNA type, virus type (bacteriophage or archaeal virus vs. eukaryotic virus), viral type (Caudovirus vs. other phage type), viral Family, and viral designation (e.g. T7-like viruses) were then determined. The standard error was calculated from 5 separate experiments.
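The fragment-sampling and standard-error steps above might look like the following sketch. Allowing overlapping fragments, skipping genomes shorter than the fragment size, and treating each of the five draws per genome as one replicate experiment are our assumptions; `random_fragments` and `standard_error` are hypothetical helper names, not Swaap Genome Search functions.

```python
import random
import statistics
from typing import List, Optional


def random_fragments(genome: str, size: int, count: int = 5,
                     rng: Optional[random.Random] = None) -> List[str]:
    """Draw `count` fragments of exactly `size` nt at uniformly random start positions.

    Fragments may overlap; genomes shorter than `size` yield no fragments
    (both are assumptions of this sketch).
    """
    rng = rng or random.Random(0)
    if len(genome) < size:
        return []
    # randint is inclusive, so the last possible start still leaves `size` bases.
    starts = [rng.randint(0, len(genome) - size) for _ in range(count)]
    return [genome[s:s + size] for s in starts]


def standard_error(values: List[float]) -> float:
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(values) / len(values) ** 0.5
```

Repeating the draw for sizes 10,000, 5,000, and 2,000 and computing the percentage of correct classifications in each replicate gives five percentages per size, whose spread `standard_error` summarizes.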
Other analysis of metagenomic contigs
All metagenomic contigs also were subjected to classification analysis using Phylopythia and to Genbank tBLASTx analysis against the nonredundant database [14, 20]. Hits were considered significant if their Expect values were less than 10⁻³.
Correlations between classification methods were tested using SPSS (SPSS Corp., Chicago, IL). Briefly, metagenome contigs were classified using Genbank, GSPC, or Phylopythia. The results of each method were compiled using the predicted Class of each contig, and each Class was coded using the numbers 1 to 41. The resulting tables were then subjected to Spearman's rho or Kendall's tau correlation tests. Results were considered significant when p < 0.01.
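For readers who want to reproduce the coefficient outside SPSS, Spearman's rho on the numerically coded Class labels can be computed as the Pearson correlation of tie-averaged ranks. This minimal sketch computes only the coefficient, not the p-value; the helper names are hypothetical, and any class codes passed to it would be illustrative.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of tie-averaged ranks.

    Undefined (raises ZeroDivisionError) if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

Tie handling matters here because many contigs share the same predicted Class, so many coded values are tied.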
GSPC: Genome Signature-based Phylogenetic Classification
Supported in part by the Burroughs Wellcome Fund, the Robert Wood Johnson Foundation and the UNCF-Merck Science Initiative to DP. Also supported by NSF Grants 0109756 and 0215988, and NIH-NHGRI grant 1 R43 HG002714-01 to TS.
- Gill SR: Metagenomic analysis of the human distal gut microbiome. Science. 2006, 312 (5778): 1355-9. 10.1126/science.1124234.
- Sonnenburg JL, Chen CT, Gordon JI: Genomic and metabolic studies of the impact of probiotics on a model gut symbiont and host. PLoS Biol. 2006, 4 (12): e413. 10.1371/journal.pbio.0040413.
- Wommack KE, Colwell RR: Virioplankton: viruses in aquatic ecosystems. Microbiol Mol Biol Rev. 2000, 64 (1): 69-114. 10.1128/MMBR.64.1.69-114.2000.
- Breitbart M: Phage community dynamics in hot springs. Appl Environ Microbiol. 2004, 70 (3): 1633-40. 10.1128/AEM.70.3.1633-1640.2004.
- Canchaya C: Phage as agents of lateral gene transfer. Curr Opin Microbiol. 2003, 6 (4): 417-24. 10.1016/S1369-5274(03)00086-9.
- Suttle CA: Marine viruses–major players in the global ecosystem. Nat Rev Microbiol. 2007, 5 (10): 801-12. 10.1038/nrmicro1750.
- Weinbauer MG, Rassoulzadegan F: Are viruses driving microbial diversification and diversity? Environ Microbiol. 2004, 6 (1): 1-11. 10.1046/j.1462-2920.2003.00539.x.
- Filee J, Forterre P, Laurent J: The role played by viruses in the evolution of their hosts: a view based on informational protein phylogenies. Res Microbiol. 2003, 154 (4): 237-43. 10.1016/S0923-2508(03)00066-4.
- Breitbart M: Genomic analysis of uncultured marine viral communities. Proc Natl Acad Sci USA. 2002, 99 (22): 14250-5. 10.1073/pnas.202488399.
- Kunin V: A bacterial metapopulation adapts locally to phage predation despite global dispersal. Genome Res. 2008, 18 (2): 293-7. 10.1101/gr.6835308.
- Bench SR: Metagenomic characterization of Chesapeake Bay virioplankton. Appl Environ Microbiol. 2007, 73 (23): 7629-41. 10.1128/AEM.00938-07.
- Schoenfeld T: Assembly of viral metagenomes from Yellowstone hot springs. Appl Environ Microbiol. 2008, 74 (13): 4164-74. 10.1128/AEM.02598-07.
- Breitbart M: Metagenomic analyses of an uncultured viral community from human feces. J Bacteriol. 2003, 185 (20): 6220-3. 10.1128/JB.185.20.6220-6223.2003.
- Altschul SF: Basic local alignment search tool. J Mol Biol. 1990, 215 (3): 403-10.
- Pride DT: Evolutionary implications of microbial genome tetranucleotide frequency biases. Genome Res. 2003, 13 (2): 145-58. 10.1101/gr.335003.
- Karlin S, Cardon LR: Computational DNA sequence analysis. Annu Rev Microbiol. 1994, 48: 619-54. 10.1146/annurev.mi.48.100194.003155.
- Burge C, Campbell AM, Karlin S: Over- and under-representation of short oligonucleotides in DNA sequences. Proc Natl Acad Sci USA. 1992, 89 (4): 1358-62. 10.1073/pnas.89.4.1358.
- Reva ON, Tummler B: Global features of sequences of bacterial chromosomes, plasmids and phages revealed by analysis of oligonucleotide usage patterns. BMC Bioinformatics. 2004, 5: 90. 10.1186/1471-2105-5-90.
- Pride DT, Blaser MJ: Identification of horizontally acquired genetic elements in Helicobacter pylori and other prokaryotes using oligonucleotide difference analysis. Genome Letters. 2002, 1 (1): 2-15. 10.1166/gl.2002.003.
- McHardy AC: Accurate phylogenetic classification of variable-length DNA fragments. Nat Methods. 2007, 4 (1): 63-72. 10.1038/nmeth976.
- Buchen-Osmond C, ed: Manual of Clinical Microbiology. Taxonomy and Classification of Viruses. Edited by: Buchen-Osmond C. 2003, ASM Press: Washington DC, 2: 1217-1226. 8
- Lawrence JG, Hatfull GF, Hendrix RW: Imbroglios of viral taxonomy: genetic exchange and failings of phenetic approaches. J Bacteriol. 2002, 184 (17): 4891-905. 10.1128/JB.184.17.4891-4905.2002.
- Snyder JC: Effects of culturing on the population structure of a hyperthermophilic virus. Microb Ecol. 2004, 48 (4): 561-6. 10.1007/s00248-004-0246-9.
- Rohwer F, Edwards R: The Phage Proteomic Tree: a genome-based taxonomy for phage. J Bacteriol. 2002, 184 (16): 4529-35. 10.1128/JB.184.16.4529-4535.2002.
- Pride DT: Evidence of host-virus co-evolution in tetranucleotide usage patterns of bacteriophages and eukaryotic viruses. BMC Genomics. 2006, 7: 8. 10.1186/1471-2164-7-8.
- Brock TD, ed: Thermophilic Microorganisms and Life at High Temperatures. 1978, Springer-Verlag: New York
- Sakaki Y, Oshima T: Isolation and characterization of a bacteriophage infectious to an extreme thermophile, Thermus thermophilus HB8. J Virol. 1975, 15 (6): 1449-53.
- Yu MX, Slater MR, Ackermann HW: Isolation and characterization of Thermus bacteriophages. Arch Virol. 2006, 151 (4): 663-79. 10.1007/s00705-005-0667-x.
- Prangishvili D, Garrett RA, Koonin EV: Evolutionary genomics of archaeal viruses: unique viral genomes in the third domain of life. Virus Res. 2006, 117 (1): 52-67. 10.1016/j.virusres.2006.01.007.
- Inskeep WP, McDermott TR, eds: Geothermal Biology and Geochemistry in Yellowstone National Park. Geochemistry and Dynamics of the Yellowstone National Park Hydrothermal System. Edited by: Fournier RO. 2005, Proceedings of the Thermal Biology Institute Workshop
- Reysenbach AL, Wickham GS, Pace NR: Phylogenetic analysis of the hyperthermophilic pink filament community in Octopus Spring, Yellowstone National Park. Appl Environ Microbiol. 1994, 60 (6): 2113-9.
- Blank CE, Cady SL, Pace NR: Microbial composition of near-boiling silica-depositing thermal springs throughout Yellowstone National Park. Appl Environ Microbiol. 2002, 68 (10): 5123-35. 10.1128/AEM.68.10.5123-5135.2002.
- Jahnke LL: Signature lipids and stable carbon isotope analyses of Octopus Spring hyperthermophilic communities compared with those of Aquificales representatives. Appl Environ Microbiol. 2001, 67 (11): 5179-89. 10.1128/AEM.67.11.5179-5189.2001.
- Stahl DA: Characterization of a Yellowstone hot spring microbial community by 5S rRNA sequences. Appl Environ Microbiol. 1985, 49 (6): 1379-84.
- Munster MJ: Isolation and preliminary taxonomic studies of Thermus strains isolated from Yellowstone National Park, USA. J Gen Microbiol. 1986, 132 (6): 1677-83.
- Reysenbach AL, Gotz D, Yernool D, eds: Microbial Diversity of Marine and Terrestrial Thermal Springs. Biodiversity of Microbial Life. Edited by: Reysenbach AL, Staley JT. 2002, Wiley Liss New York
- Friedman R, Drake JW, Hughes AL: Genome-wide patterns of nucleotide substitution reveal stringent functional constraints on the protein sequences of thermophiles. Genetics. 2004, 167 (3): 1507-12. 10.1534/genetics.104.026344.
- Singer GA, Hickey DA: Thermophilic prokaryotes have characteristic patterns of codon usage, amino acid composition and nucleotide content. Gene. 2003, 317 (1–2): 39-47. 10.1016/S0378-1119(03)00660-7.
- Haring M: Morphology and genome organization of the virus PSV of the hyperthermophilic archaeal genera Pyrobaculum and Thermoproteus: a novel virus family, the Globuloviridae. Virology. 2004, 323 (2): 233-42. 10.1016/j.virol.2004.03.002.
- Angly FE: The marine viromes of four oceanic regions. PLoS Biol. 2006, 4 (11): e368. 10.1371/journal.pbio.0040368.
- Almagor H: A Markov analysis of DNA sequences. J Theor Biol. 1983, 104 (4): 633-45. 10.1016/0022-5193(83)90251-5.
- Pride DT: Swaap Genome Search. A tool for predicting prokaryote hosts of bacteriophages and discerning virus types from metagenome data. Edited by: Pride DT. 2007, [http://asiago.stanford.edu/SWAAP/SwaapPage.htm]
- Swofford DL: Paup 4.0b10. Phylogenetic Analysis Using Parsimony and Other Methods. Edited by: Swofford DL. 1998, Sinauer Associates: Sunderland, Massachusetts
- Page RD: TreeView: an application to display phylogenetic trees on personal computers. Comput Appl Biosci. 1996, 12 (4): 357-8.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.