Differences in sequencing technologies improve the retrieval of anammox bacterial genome from metagenomes
© Gori et al.; licensee BioMed Central Ltd. 2013
Received: 19 March 2012
Accepted: 13 December 2012
Published: 16 January 2013
Sequencing technologies have different biases, both in single-genome sequencing and in metagenomic sequencing; these can significantly affect ORF recovery and the population distribution of a metagenome. In this paper we investigate how well different technologies represent information related to an organism of interest in a metagenome, and whether it is beneficial to combine information obtained using different technologies. We comparatively analyze three metagenomic datasets acquired from a sample containing the anammox bacterium Candidatus ‘Brocadia fulgida’ (B. fulgida). These datasets were obtained using Roche 454 FLX and Sanger sequencing with two different libraries (shotgun and fosmid).
In each dataset, the abundance of the reads annotated to B. fulgida was much lower than the abundance expected from available cell count information. This was due to the overrepresentation of GC-richer organisms, as shown by GC-content distribution of the reads. Nevertheless, by considering the union of B. fulgida reads over the three datasets, the number of B. fulgida ORFs recovered for at least 80% of their length was twice the amount recovered by the best technology. Indeed, while taxonomic distributions of reads in the three datasets were similar, the respective sets of B. fulgida ORFs recovered for a large part of their length were highly different, and depth of coverage patterns of 454 and Sanger were dissimilar.
Precautions should be taken to prevent the overrepresentation of GC-rich microbes in the datasets. This overrepresentation and the consistency of the taxonomic distributions of reads obtained with different sequencing technologies suggest that, in general, abundance biases might be mainly due to other steps of the sequencing protocols. The results show that biases against organisms of interest could be compensated for by combining different sequencing technologies, because their genome-level sequencing biases differ even when the species is present in similar abundances in the metagenomes.
Metagenomics studies the genomic content of microbial communities, acquired through DNA sequencing technology. The main advantage of this discipline is that it can overcome the limitations of individual genome sequencing, which requires isolation and cultivation of individual microbes. By bypassing the cultivation step, metagenomics can acquire microbial genomes unattainable through individual sequencing, since less than 1% of the microbes present in nature can be cultured.
Previous studies showed that sequencing technologies have different biases in acquiring the DNA sequences of a microbial community and of a single organism. Indeed, biases in the population distribution of a metagenome may differ according to the approach adopted to obtain sequence data. Moreover, key members of a community might be poorly represented in the sequenced data. Studies of single DNA samples showed that different technologies can also have different sequencing biases, and hence different coverage patterns for the same sequence of an organism. Even sequencing errors and artifacts depend on the technology.
Here we focus on the comparative analysis of metagenomic sequencing data: we investigate how well different technologies represent information related to an organism of interest, and whether it is beneficial to combine information obtained using different technologies. The chosen microbe, Candidatus ‘Brocadia fulgida’, belongs to the important bacterial group of the anammox bacteria. Anaerobic ammonium oxidizing (anammox) bacteria obtain energy via oxidation of ammonium to dinitrogen gas in the absence of oxygen. They belong to the order Brocadiales within the phylum Planctomycetes [8–10]. Many studies in the last decade showed that anammox bacteria are present in many oxygen-limited marine and fresh-water ecosystems, and that the process contributes significantly to the global loss of fixed nitrogen [11–15]. Moreover, the anammox process has been applied successfully as an environmentally friendly and cost-effective alternative to conventional wastewater-treatment plants [16, 17].
The choice of an anammox bacterium as the organism of interest is motivated by the lack of genomic information for this bacterial group, which is partly due to the difficulty of acquiring it. Among the candidate genera of anammox bacteria that have been identified [10, 18, 19], detailed genomic information is available only for Candidatus ‘Kuenenia stuttgartiensis’ (henceforth referred to as Kuenenia). Indeed, standard sequencing approaches cannot be applied to acquire the genomes of these bacteria: the cultivation of anammox bacteria is challenging due to their long generation times (2-3 weeks) and low biomass yields [18, 21]; moreover, no anammox species has been isolated in pure culture so far. Therefore metagenomics has been used for acquiring the genomic content of anammox bacteria.
We used the genomic information of the anammox bacterium Candidatus ‘Brocadia fulgida’ (henceforth referred to as B. fulgida) as a model for comparing three single-technology approaches and the multi-technology approach resulting from their combination. Metagenomic data containing this bacterium were acquired through three metagenomic sequencing projects conducted on the same microbial community. These metagenomes were generated by the following DNA sequencing technologies: Roche 454 FLX, Sanger sequencing with a shotgun library [24, 25], and Sanger sequencing with a fosmid library (henceforth, we refer to these technologies as 454, Shotgun and Fosmid, respectively). We reported earlier a qualitative analysis of these metagenomes focused on anammox metabolic genes.
First we studied the metagenomes with respect to their taxonomic population distributions and the GC-content of the reads. Then we analyzed comparatively the sets of B. fulgida ORFs that were recovered by the different sequencing technologies; the recovered ORFs were compared with respect to the coverage pattern, and the percentage of covered amino acids (here called mapping). We also studied the ORFs with respect to their functional content and their location on the genome.
Results and discussion
Taxonomic annotation and GC-content analysis of annotated reads
Reads assigned to B. fulgida had low GC-content, consistent with their annotation. Nevertheless, a possible hypothesis is that other AT-rich reads belonging to B. fulgida were wrongly assigned by BLASTX to other species. However, less than 1.50% of the reads were assigned to other bacteria belonging to B. fulgida’s phylum, Planctomycetes. Moreover, the population distributions obtained from different sequencing technologies were very similar; therefore, this hypothesis would require a significant difference in ORF composition between B. fulgida and the other Planctomycetes, Kuenenia included. For each technology, the GC-content of the reads assigned to B. fulgida roughly followed a normal distribution, centered between 45% and 48%. This result is in accordance with the expected GC-content of B. fulgida, estimated to be close to 41%, that is, Kuenenia’s GC-content. However, from 42% to 50% of the reads had GC-content below 55%; since the corresponding distribution was centered between 38% and 50% of GC-content, there were other reads of this distribution with a GC-content compatible with B. fulgida.
In summary, these results show that GC-rich bacteria were overrepresented in the metagenomic data, for all the considered sequencing technologies. This indicates that adjustments of sequencing protocols are desirable in order to prevent overrepresentation of these microbes in the data at the expense of AT-rich B. fulgida. This bias toward GC-rich organisms might depend on the DNA-fragmentation procedure, as speculated in the literature. The agreement among the three population distributions is consistent with the hypothesis that they are biased because of the shared DNA-extraction method. Nevertheless, one cannot exclude that other steps of the sequencing protocol also contribute to these phenomena.
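As an illustration of the GC-content analysis discussed in this subsection, the following minimal sketch computes the GC percentage of each read and bins the reads into a histogram. The function names, the 5% bin width and the toy reads are assumptions made for this example; the text does not describe the actual implementation or binning.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Return the GC percentage of a nucleotide sequence.

    Ambiguous bases (e.g. N) are ignored, which is an assumption of this sketch.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return 100.0 * gc / total if total else 0.0


def gc_histogram(reads, bin_width=5):
    """Count reads per GC-content bin; keys are the lower bin edges in percent."""
    hist = Counter()
    for read in reads:
        lower = int(gc_content(read) // bin_width) * bin_width
        # Reads with exactly 100% GC are placed in the last bin.
        hist[min(lower, 100 - bin_width)] += 1
    return dict(sorted(hist.items()))


# Toy reads, for illustration only.
reads = ["ATATGC", "GGCCGA", "ATTTTA", "GCGCNN"]
histogram = gc_histogram(reads)
```

Comparing such histograms for the reads assigned to one organism and for the whole dataset is one simple way to see whether GC-rich reads dominate.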
Comparative analysis of recovered B. fulgida ORFs
Comparing the sets of recovered ORFs for different mapping thresholds, we can see that the higher the threshold was, the more the technology biases diverged (see Additional file 1: Section 4). Indeed, the higher the mapping threshold was, the smaller the intersections between sets of ORFs recovered with a feasible mapping by different technologies became (Figure 3, Additional file 1: Table S5). This trend was particularly clear for 454 and it affected its intersections with Fosmid and with Shotgun in the same way. For a threshold value of 0%, 454 recovered about 90% of each of the sets of ORFs recovered by another technology; for mapping thresholds of 50% and 80%, this percentage dropped to about 55% and 14%, respectively. The number of recovered ORFs that were shared by Shotgun and Fosmid decreased as well, but at a lower rate. While for a mapping threshold of 0% these two technologies shared about 70% of their recovered ORFs, for mapping thresholds of 50% and 80%, this percentage dropped to about 59% and 38%, respectively.
The coverage variability obtained with different technologies was compared using the Pearson correlation coefficient. The correlation analysis of the per-amino acid sequence coverage depths performed on each B. fulgida ORF recovered by a pair of technologies indicated that the coverage patterns of the Sanger-based technologies and of 454 were not related (Additional file 1: Figure S2 and Section 3). Indeed, for more than 50% of the ORFs recovered by 454 and Shotgun/Fosmid, the correlation was between -0.3 and 0.3, and hence not significant. In contrast, there was a significantly positive correlation (above 0.3) for about half of the ORFs recovered by both Shotgun and Fosmid. This indicates that the coverage depths obtained with the two technologies increased or decreased together for the same ORF.
Comparative analysis of functional content and ORF location distribution
Functional content distributions based on COG classification did not show significant differences across technologies (Additional file 1: Figure S3). For all the technologies, the most abundant characterized category was COG category C (Energy production and conversion). All the categories related to Information storage and processing (A, J, K, L) were equally abundant. The only category for which there were significant differences was T (Signal transduction mechanisms), which accounted for less than 2% for 454 and around 6% for the other two technologies.
The location distribution of the recovered ORFs on the putative B. fulgida genome was quite uniform (Additional file 1: Figure S4). However, some areas of the genome had a lower coverage depth than the others, and these biases were consistent among different sequencing technologies (Additional file 1: Section 6).
However, these two analyses could be affected more than the others by a potential loss of B. fulgida genomic information resulting from the adopted annotation method. Indeed, since B. fulgida proteins had not previously been described, we assumed that all reads assigned to the related anammox bacterium Kuenenia and all recovered Kuenenia ORFs belonged to B. fulgida. However, given that the two anammox bacteria are phylogenetically related, but not closely enough to belong to the same genus [7, 32], B. fulgida might contain ORFs not present in Kuenenia. If such B. fulgida ORFs exist, they would not be recovered by our method; in particular, the functional content and the genome location biases would differ from what we found. Nevertheless, as mentioned before, few reads were assigned to other members of B. fulgida’s phylum. Recovering B. fulgida information not present in Kuenenia through a de novo assembly of the metagenomes can lead to unreliable results, given that the coverage is below 20X.
Anammox bacteria are present in many ecosystems and have important applications in industrial wastewater-treatment. However, genomic information about these bacteria is still very limited. We analyzed the genomic information of the anammox bacterium B. fulgida contained in three metagenomes; the metagenomes were acquired from the same community but with different sequencing technologies.
Our analysis indicates that adjustments of sequencing protocols are desirable in order to prevent underrepresentation of B. fulgida in the data. This underrepresentation does not seem to be related to a genome location sequencing bias. Sequenced data alone would have given a distorted view of population distributions in the studied community, as observed for other metagenomes. The adoption of the PacBio platform could be beneficial for B. fulgida genome acquisition, because it seems less biased by GC content.
The population distributions of the three metagenomes were not very dissimilar, even though different sequencing technologies were adopted. This phenomenon is compatible with the hypothesis that the DNA-extraction method contributes more to the bias in the population distributions than the sequencing technology. However, one cannot exclude that other steps of the sequencing protocol also contribute to the bias; indeed, the DNA-fragmentation procedure might have induced the bias toward GC-rich microbes. Nevertheless, our metagenomic data did not allow us to directly confirm any of these hypotheses, because the three protocols differ only from the library preparation step onward.
Our results show that combining data obtained by different sequencing technologies can allow the recovery of relevant information about underrepresented organisms. Indeed, even if different technologies recover a microbe in similar abundance, they may do so with significantly different genome-level biases. In our case, the coverage patterns of the technologies proved to be unrelated for many B. fulgida ORFs; moreover, the sets of ORFs recovered by the technologies for a large part of their lengths were vastly different.
Metagenome sequencing was performed on three sequencing libraries made from the same DNA sample from the freshwater propionate enrichment described previously [23, 27]. Sixty 384-well plates of clones were end sequenced from a 3 kb short-insert Sanger library constructed in pUC18 (henceforth referred to as Shotgun), and 62 plates of clones from a 40 kb Fosmid library constructed in pCC1Fos (for detailed library construction and sequencing protocols see ). This procedure generated a total of 34 Mb and 30 Mb of raw data, respectively. A 454 library was also constructed and sequenced on the FLX platform, yielding 59 Mb from 1.25 runs. Raw sequence reads were trimmed with LUCY. The sequences we analyzed are available in the DOE JGI Genome sequencing projects database under the name ‘Freshwater-Propionate Anammox bacterial enrichment’, Project ID: 4083784.
Although the size of these data is not very large (Additional file 1: Table S1), it is sufficient for the type of comparative study conducted in this paper. Indeed, data of comparable size were studied in a previous work on the comparative analysis of data generated with different technologies from the same microbial community.
With respect to the length distribution of reads, a strong similarity between the data acquired by Shotgun and Fosmid could be observed (Additional file 1: Figure S1 and Table S1). The main difference between these two datasets concerned the number of reads they contained: Shotgun acquired about 23% more reads than Fosmid. However, the average length of Shotgun reads was 8% greater than that of Fosmid reads. As expected, 454 produced significantly shorter reads than Sanger, but at a higher throughput. The median length of 454 reads was 182 bp, about one fourth of the respective value of the other two datasets. The number of 454 reads was sixfold and fivefold the number of reads of Shotgun and Fosmid, respectively.
All reads of the considered datasets were submitted as NCBI-BLASTX queries against the NCBI-NR protein sequence database (version of 3 March 2009). Default BLASTX parameters were used, adding an E-value cutoff and a neighborhood word score threshold. Since we wanted to focus only on highly significant alignments, low E-value cutoff values were chosen. Specifically, for Sanger-based technologies the E-value cutoff was set to 10⁻⁶. As the 454 reads were shorter and the E-value of an alignment is directly proportional to the product of the lengths of the two aligned parts, we used an E-value cutoff of 10⁻⁷ for 454 read alignments. The word score threshold was set to 14 (default value is 12), in order to increase the speed more than twofold while maintaining a high sensitivity (see , Paragraph 188.8.131.52).
Annotation of reads was based on BLASTX results, adopting what is considered the best stand-alone method: each read was assigned to its best BLASTX hit, at protein and hence at species level. Since B. fulgida had not yet been sequenced, its reads could be assigned by BLASTX only to proteins of other organisms present in the reference database. Nevertheless, the reference database we used contained ORFs of another related anammox bacterium, namely Kuenenia. Therefore in our analysis we considered all recovered Kuenenia ORFs and all reads assigned to these ORFs as belonging to B. fulgida.
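The proportionality between the E-value and the sequence lengths mentioned above can be made explicit with the standard Karlin-Altschul relation used by BLAST. The symbols below are not defined in the text and are introduced here only for illustration: m is the query length, n the (effective) database length, S the raw alignment score, and K and λ are statistical parameters that depend on the scoring system.

```latex
E = K \, m \, n \, e^{-\lambda S}
```

Under this relation, for a fixed score S a shorter query (smaller m) yields a smaller E, which is consistent with choosing a stricter cutoff for the shorter 454 reads than for the Sanger reads.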
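The best-hit annotation described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: it assumes BLAST tabular output with the usual 12 columns, takes "best" to mean the highest bit score, and relies on a hypothetical `protein_to_species` lookup table; none of these details are stated in the text.

```python
import csv
import io
from collections import Counter


def best_hits(tabular_text):
    """Map each read to the subject of its highest-scoring hit.

    Assumed columns: query, subject, % identity, alignment length, mismatches,
    gap opens, q.start, q.end, s.start, s.end, e-value, bit score.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        read, subject, bits = row[0], row[1], float(row[11])
        if read not in best or bits > best[read][1]:
            best[read] = (subject, bits)
    return {read: subject for read, (subject, _) in best.items()}


def species_counts(read_to_protein, protein_to_species,
                   proxy_from="Kuenenia stuttgartiensis",
                   proxy_to="B. fulgida"):
    """Count reads per species, relabelling the proxy organism."""
    counts = Counter()
    for protein in read_to_protein.values():
        species = protein_to_species.get(protein, "unknown")
        counts[proxy_to if species == proxy_from else species] += 1
    return counts


# Toy input with invented identifiers, for illustration only.
example = "\n".join([
    "read1\tkust_0001\t85.0\t60\t9\t0\t1\t180\t10\t69\t1e-20\t95.0",
    "read1\tother_01\t40.0\t55\t33\t0\t1\t165\t5\t59\t1e-08\t50.0",
    "read2\tother_01\t70.0\t50\t15\t0\t1\t150\t1\t50\t1e-15\t80.0",
])
assignments = best_hits(example)
counts = species_counts(
    assignments,
    {"kust_0001": "Kuenenia stuttgartiensis", "other_01": "Some other species"},
)
```

The relabelling step mirrors the assumption made in the text that reads assigned to Kuenenia proteins belong to B. fulgida.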
ORF recovering: assessment criteria
We used two main quantitative measures to assess the performance of the three technologies with respect to their capability to recover B. fulgida ORFs: per-amino acid sequence coverage depth and mapping.
The per-amino acid sequence coverage depth quantifies how well B. fulgida ORFs were covered at the amino-acid level by the reads generated by a technology. Specifically, for a technology and an ORF, we considered the reads (generated by that technology) aligned with BLASTX to a particular ORF; the per-amino acid sequence coverage depth of an amino acid of that ORF is defined as the number of times that the given amino acid of the subject ORF was covered by the assigned reads. We considered as covered all the amino acids between the start and the end of a read-ORF alignment. Consequently, if an alignment had gaps, the corresponding amino acids of the ORF were considered covered as well.
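A minimal sketch of the per-amino acid coverage depth defined above is given below. It assumes that each read-ORF alignment is summarized by its start and end positions on the ORF (1-based, inclusive); the function name and the input layout are choices made for this example.

```python
def coverage_depth(orf_length, alignments):
    """Return the coverage depth of every amino acid of an ORF.

    `alignments` is an iterable of (start, end) positions on the ORF,
    1-based and inclusive. Every position between start and end counts as
    covered, so gapped alignments cover the gap positions as well.
    """
    depth = [0] * orf_length
    for start, end in alignments:
        lo, hi = min(start, end), max(start, end)
        # Clip to the ORF boundaries before counting.
        for pos in range(max(lo, 1), min(hi, orf_length) + 1):
            depth[pos - 1] += 1
    return depth


# Two overlapping alignments on a 10-residue ORF.
depth = coverage_depth(10, [(1, 4), (3, 6)])
```

In this toy case positions 3 and 4 are covered twice, positions 1, 2, 5 and 6 once, and the last four positions are not covered.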
The notion of mapping measures the part of a B. fulgida ORF that can be recovered by the reads generated by a technology. Specifically, the mapping is defined as the percentage of the ORF’s amino acids that were covered (i.e. percentage of amino acids with coverage depth ≥1). Clearly, the mapping can be directly computed from the per-amino acid sequence coverage depths.
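The mapping can be derived from a vector of per-amino acid coverage depths as in the following sketch; the function name and the example vector are illustrative assumptions.

```python
def mapping(depth):
    """Return the percentage of ORF positions with coverage depth >= 1."""
    if not depth:
        return 0.0
    covered = sum(1 for d in depth if d >= 1)
    return 100.0 * covered / len(depth)


# Six of ten amino acids are covered at least once.
example_mapping = mapping([1, 1, 2, 2, 1, 1, 0, 0, 0, 0])
```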
For computing the per-amino acid sequence coverage depths and the mapping of ORFs, we considered only those alignments having an identity score greater than or equal to 30%. This additional filtering criterion had a very small effect on the recovering performance of each technology (see Additional file 1: Tables S3 and S4).
ORF recovering: comparison methods
The coverage variability obtained with different technologies was compared using the Pearson correlation coefficient. Given two technologies, we considered all the B. fulgida ORFs recovered by both; then we computed the correlation of the per-amino acid sequence coverage depths obtained by the two technologies for the same ORF. A similar method for comparing the coverage variability was used in a previous work.
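The per-ORF correlation described above can be sketched as follows. The data layout (one dictionary per technology, mapping an ORF identifier to its depth vector) and the handling of constant vectors, for which the coefficient is undefined, are assumptions of this example.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length vectors, or None if undefined."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have the same length (at least 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # a constant vector has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def per_orf_correlations(depths_a, depths_b):
    """Correlate depth vectors for the ORFs present in both technologies."""
    shared = depths_a.keys() & depths_b.keys()
    return {orf: pearson(depths_a[orf], depths_b[orf]) for orf in sorted(shared)}


# Toy depth vectors with invented ORF names.
tech_a = {"orf1": [1, 2, 3, 4], "orf2": [1, 2, 3]}
tech_b = {"orf1": [2, 4, 6, 8], "orf2": [3, 2, 1], "orf3": [1, 1, 1]}
correlations = per_orf_correlations(tech_a, tech_b)
```

The distribution of the resulting coefficients over all shared ORFs can then be inspected, for instance by counting how many fall between -0.3 and 0.3.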
We also performed a comparative analysis of the sets of B. fulgida ORFs recovered by different technologies. For each technology, we computed the sets of ORFs with mapping above a given threshold; 10 different thresholds were used (0% and all the multiples of 10%).
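The set comparison described above can be sketched as follows. The exact inclusion rule is an assumption: here an ORF counts as recovered at threshold t if its mapping is greater than zero and at least t, and the thresholds are taken to be 0%, 10%, ..., 90%. The ORF names and mapping values are invented.

```python
def orfs_above(mappings, threshold):
    """Return the ORFs whose mapping is positive and at least `threshold`."""
    return {orf for orf, m in mappings.items() if m > 0 and m >= threshold}


def shared_percentage(set_a, set_b):
    """Percentage of the ORFs in `set_b` that are also in `set_a`."""
    if not set_b:
        return 0.0
    return 100.0 * len(set_a & set_b) / len(set_b)


# Toy mapping values (percent of ORF length covered) for two technologies.
tech_a = {"orf1": 95.0, "orf2": 40.0, "orf3": 10.0}
tech_b = {"orf1": 85.0, "orf2": 90.0, "orf4": 5.0}

overlap_by_threshold = {
    t: shared_percentage(orfs_above(tech_a, t), orfs_above(tech_b, t))
    for t in range(0, 100, 10)
}
```

In this toy example the overlap shrinks as the threshold grows, which is the kind of trend the comparison is designed to reveal.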
The sets of B. fulgida ORFs recovered by different technologies were also compared with respect to their functional annotation. For each technology, we focused our analysis on the ORFs mapped for at least 70% of their length because we assumed that if an ORF was mapped for such a large part of its length, then all its protein domains could be considered as present in the B. fulgida genome. These ORFs were assigned to Clusters of Orthologous Groups of proteins (COG) [41, 42] using the Signature web server introduced in .
We assessed the improvement achieved by combining different technologies, for pairwise combinations of technologies as well as for the union of all of them. To this end we estimated the resulting B. fulgida ORF mapping derived from each technology combination, where an amino acid of the ORF was considered to be covered by a certain combination of technologies if it was covered by at least one of them. Moreover, for each combination of technologies, we computed the sets of B. fulgida ORFs with mapping above a given threshold, by varying this threshold as described above.
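The combination rule described above, where an amino acid is covered if at least one technology covers it, can be sketched as follows. The function name and the toy depth vectors are assumptions made for illustration.

```python
def combined_mapping(*depth_vectors):
    """Mapping of an ORF for a combination of technologies.

    Each argument is the per-amino acid depth vector of the same ORF for one
    technology. A position is covered if any technology has depth >= 1 there.
    """
    if not depth_vectors:
        return 0.0
    length = len(depth_vectors[0])
    if any(len(d) != length for d in depth_vectors):
        raise ValueError("all depth vectors must describe the same ORF")
    if length == 0:
        return 0.0
    covered = sum(
        1 for column in zip(*depth_vectors) if any(d >= 1 for d in column)
    )
    return 100.0 * covered / length


# Two technologies covering different parts of a 5-residue ORF.
depth_a = [1, 1, 0, 0, 0]
depth_b = [0, 0, 0, 2, 1]
union_mapping = combined_mapping(depth_a, depth_b)
```

Here each technology alone maps 40% of the ORF, while their union maps 80%, which illustrates how complementary coverage patterns can raise the combined mapping.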
We performed an analysis to check whether the sequencing technologies had a location bias, i.e., whether some areas of the genome were covered more than others. To this end, we built an approximate representation of the B. fulgida genome and compared the per-amino acid sequence coverage of the genome obtained with different technologies. The approximate genome was obtained by concatenating all Kuenenia ORFs into one long amino acid sequence; the ORF amino acid sequences were concatenated in the same order in which they occur in the genome of Kuenenia. Then, from the ORF coverage, we computed the per-amino acid coverage of the genome for each sequencing technology.
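The construction of the approximate genome coverage can be sketched as follows. The inputs (`orf_order`, `orf_lengths`, `orf_depths`) and the windowed averaging used to summarize the profile are assumptions of this example; the text does not specify how the coverage profile was summarized.

```python
def genome_coverage(orf_order, orf_lengths, orf_depths):
    """Concatenate per-ORF depth vectors in genome order.

    ORFs without any assigned read contribute a run of zeros of their length.
    """
    genome = []
    for orf in orf_order:
        genome.extend(orf_depths.get(orf, [0] * orf_lengths[orf]))
    return genome


def windowed_mean(depth, window):
    """Mean depth in consecutive, non-overlapping windows."""
    means = []
    for i in range(0, len(depth), window):
        chunk = depth[i:i + window]
        means.append(sum(chunk) / len(chunk))
    return means


# Toy genome made of three ORFs; "b" was not recovered.
order = ["a", "b", "c"]
lengths = {"a": 3, "b": 2, "c": 3}
depths = {"a": [1, 2, 1], "c": [0, 3, 3]}

profile = genome_coverage(order, lengths, depths)
smoothed = windowed_mean(profile, 4)
```

Computing such a profile for each technology makes it possible to compare which regions of the approximate genome are covered more or less deeply.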
The anammox research of MJ was supported by ERC Advanced Grant 232937. The work conducted by the U.S. Department of Energy Joint Genome Institute is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. The sequencing was done through the Community Sequencing Program. We also thank Kerrie Barry (JGI project manager) for managing library construction and Alex Copeland (JGI analyst) for assistance with raw reads trimming.
- Wooley JC, Godzik A, Friedberg I: A Primer on Metagenomics. PLoS Comput Biol. 2010, 6 (2): e1000667-10.1371/journal.pcbi.1000667.
- Amann R, Ludwig W, Schleifer K: Phylogenetic identification and in situ detection of individual microbial cells without cultivation. Microbiol Rev. 1995, 59: 143-169.
- Morgan JL, Darling AE, Eisen JA: Metagenomic Sequencing of an In Vitro-Simulated Microbial Community. PLoS ONE. 2010, 5 (4): e10209+.
- DeLong E, Preston C, Mincer T, Rich V, Hallam S, Frigaard N: Community genomics among stratified microbial assemblages in the ocean’s interior. Science. 2006, 311 (5760): 496-503. 10.1126/science.1120250.
- Harismendy O, Ng P, Strausberg R, Wang X, Stockwell T, Beeson K: Evaluation of next generation sequencing platforms for population targeted sequencing studies. Genome Biol. 2009, 10 (3): R32-10.1186/gb-2009-10-3-r32.
- Luo C, Tsementzi D, Kyrpides N, Read T, Konstantinidis KT: Direct Comparisons of Illumina vs. Roche 454 Sequencing Technologies on the Same Microbial Community DNA Sample. PLoS ONE. 2012, 7 (2): e30087-10.1371/journal.pone.0030087.
- Francis CA, Beman JM, Kuypers MMM: New processes and players in the nitrogen cycle: the microbial ecology of anaerobic and archaeal ammonia oxidation. ISME J. 2007, 1: 19-27. 10.1038/ismej.2007.8.
- Broda E: Two kinds of lithotrophs missing in nature. Zeitschrift für Allgemeine Mikrobiologie. 1977, 17 (6): 491-493. 10.1002/jobm.3630170611.
- Strous M, Fuerst JA, Kramer EH, Logemann S, Muyzer G, van de Pas-Schoonen KT: Missing lithotroph identified as new planctomycete. Nature. 1999, 400 (6743): 446-449. 10.1038/22749.
- Jetten MSM, Op den Camp HJM, Kuenen J, Strous M: Family I. “Candidatus Brocadiaceae” fam. nov. Bergey’s Manual of Systematic Bacteriology, Volume 4. Edited by: Krieg N, Staley J, Brown D, Hedlund B, Paster B, Ward N, Ludwig W, Whitman W. 2010, New York: Springer, 596-602.
- Kuypers MM, Sliekers AO, Lavik G, Schmid M, Jørgensen BB, Kuenen JG: Anaerobic ammonium oxidation by anammox bacteria in the Black Sea. Nature. 2003, 422 (6932): 608-611. 10.1038/nature01472.
- Kuypers MM, Lavik G, Woebken D, Schmid M, Fuchs BM, Amann R: Massive nitrogen loss from the Benguela upwelling system through anaerobic ammonium oxidation. PNAS. 2005, 102 (18): 6478-6483. 10.1073/pnas.0502088102.
- Hamersley MR, Lavik G, Woebken D, Rattray JE, Lam P, Den Burg AB: Anaerobic ammonium oxidation in the Peruvian oxygen minimum zone. Limnology And Oceanography. 2007, 52: 923-933. 10.4319/lo.2007.52.3.0923.
- Jaeschke A, Hopmans EC, Wakeham SG, Schouten S, Sinninghe Damsté JS: The presence of ladderane lipids in the oxygen minimum zone of the Arabian Sea indicates nitrogen loss through anammox. Limnology And Oceanography. 2007, 52: 780-786. 10.4319/lo.2007.52.2.0780.
- Schmid MC, Risgaard-Petersen N, Van De Vossenberg J, Kuypers MMM, Lavik G, Petersen J: Anaerobic ammonium-oxidizing bacteria in marine environments: widespread occurrence but low diversity. Environ Microbiol. 2007, 9 (6): 1476-1484. 10.1111/j.1462-2920.2007.01266.x.
- Jetten MSM, Horn SJ, van Loosdrecht MCM: Towards a more sustainable municipal wastewater treatment system. Water Sci Technol. 1997, 35 (9): 171-180. 10.1016/S0273-1223(97)00195-9.
- Kartal B, Kuenen JG, van Loosdrecht MCM: Sewage Treatment with Anammox. Science. 2010, 328 (5979): 702-703. 10.1126/science.1185941.
- Kartal B, Rattray J, van Niftrik LA, van de Vossenberg J, Schmid MC, Webb RI, Schouten S, Fuerst JA, Sinninghe Damsté J, Jetten MSM, Strous M: Candidatus ‘Anammoxoglobus propionicus’ a new propionate oxidizing species of anaerobic ammonium oxidizing bacteria. Syst Appl Microbiol. 2007, 30: 39-49. 10.1016/j.syapm.2006.03.004.
- Quan ZX, Rhee SK, Zuo JE, Yang Y, Bae JW, Park JR: Diversity of ammonium-oxidizing bacteria in a granular sludge anaerobic ammonium-oxidizing (anammox) reactor. Environ Microbiol. 2008, 10 (11): 3130-3139. 10.1111/j.1462-2920.2008.01642.x.
- Strous M, Pelletier E, Mangenot S, Rattei T, Lehner A, Taylor MW: Deciphering the evolution and metabolism of an anammox bacterium from a community genome. Nature. 2006, 440 (7085): 790-794. 10.1038/nature04647.
- Strous M, Heijnen JJ, Kuenen JG, Jetten MSM: The sequencing batch reactor as a powerful tool for the study of slowly growing anaerobic ammonium-oxidizing microorganisms. Appl Microbiol Biotechnol. 1998, 50 (5): 589-596. 10.1007/s002530051340.
- Kartal B, van Niftrik L, Keltjens JT, Op den Camp HJM, Jetten MSM: Anammox-Growth Physiology, Cell Biology, and Metabolism. Applied Microbiology and Biotechnology, Volume 60 of Advances in Microbial Physiology. 2012, Waltham, Massachusetts: Academic Press, 211-262.
- Kartal B, Rattray J, Van De Vossenberg JL, Schmid MC, Sinninghe Damsté J, Van Niftrik L: Candidatus Brocadia fulgida: an autofluorescent anaerobic ammonium oxidizing bacterium. FEMS Microbiol Ecol. 2008, 63: 46-55. 10.1111/j.1574-6941.2007.00408.x.
- Messing J, Crea R, Seeburg PH: A system for shotgun DNA sequencing. Nucleic Acids Res. 1981, 9 (2): 309-321. 10.1093/nar/9.2.309.
- Anderson S: Shotgun DNA sequencing using cloned DNase I-generated fragments. Nucleic Acids Res. 1981, 9 (13): 3015-3027. 10.1093/nar/9.13.3015.
- Kim UJ, Shizuya H, de Jong PJ, Birren B, Simon MI: Stable propagation of cosmid sized human DNA inserts in an F factor based vector. Nucleic Acids Res. 1992, 20 (5): 1083-1085. 10.1093/nar/20.5.1083.
- Gori F, Green ST, Kartal B, Marchiori E, Jetten MSM: The metagenomic basis of anammox metabolism in Candidatus ‘Brocadia fulgida’. Biochem Soc Trans. 2011, 39: 1799-1804. 10.1042/BST20110707.
- Bernaola-Galván P, Oliver J, Carpena P, Clay O, Bernardi G: Quantifying intrachromosomal GC heterogeneity in prokaryotic genomes. Gene. 2004, 333: 121-133.
- Bohlin J, Snipen L, Hardy S, Kristoffersen A, Lagesen K, Donsvik T: Analysis of intra-genomic GC content homogeneity within prokaryotes. BMC Genomics. 2010, 11: 464-10.1186/1471-2164-11-464.
- Temperton B, Field D, Oliver A, Tiwari B, Muhling M, Joint I: Bias in assessments of marine microbial biodiversity in fosmid libraries as evaluated by pyrosequencing. ISME J. 2009, 3 (7): 792-796. 10.1038/ismej.2009.32.
- Kestler HA, Müller A, Gress TM, Buchholz M: Generalized Venn diagrams: a new method of visualizing complex genetic set relations. Bioinformatics. 2005, 21 (8): 1592-1595. 10.1093/bioinformatics/bti169.
- Harhangi HR, Le Roy M, van Alen T, Bl Hu, Groen J, Kartal B, Tringe SG, Quan ZX, Jetten MSM, Op den Camp HJM: Hydrazine Synthase, a Unique Phylomarker with Which To Study the Presence and Biodiversity of Anammox Bacteria. Appl Environ Microbiol. 2012, 78 (3): 752-758. 10.1128/AEM.07113-11.
- Luo C, Tsementzi D, Kyrpides NC, Konstantinidis KT: Individual genome assembly from complex community short-read metagenomic datasets. ISME J. 2012, 6 (4): 898-901. 10.1038/ismej.2011.147.
- Pacific Biosciences. [http://www.pacificbiosciences.com/]
- JGI - Protocols in Production Sequencing. [http://www.jgi.doe.gov/sequencing/protocols/prots_production.html]
- Chou HH, Holmes MH: DNA sequence quality trimming and vector removal. Bioinformatics. 2001, 17 (12): 1093-1104. 10.1093/bioinformatics/17.12.1093.
- Altschul S, Madden T, Schaffer A, Zhang J, Zhang Z, Miller W: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25 (17): 3389-3402. 10.1093/nar/25.17.3389.
- Sayers EW, Barrett T, Benson DA, Bryant SH, Canese K, Chetvernin V: Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2009, 37 (suppl 1): D5-D15.
- Korf I, Yandell M, Bedell J: BLAST. 2003, Sebastopol, CA, USA: O’Reilly & Associates, Inc.
- Brady A, Salzberg SL: Phymm and PhymmBL: metagenomic phylogenetic classification with interpolated Markov models. Nature Methods. 2009, 6 (9): 673-676. 10.1038/nmeth.1358.
- Tatusov RL, Koonin EV, Lipman DJ: A Genomic Perspective on Protein Families. Science. 1997, 278 (5338): 631-637. 10.1126/science.278.5338.631.
- Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV: The COG database: an updated version includes eukaryotes. BMC Bioinformatics. 2003, 4: 41+-10.1186/1471-2105-4-41.
- Dutilh BE, He Y, Hekkelman ML, Huynen MA: Signature, a web server for taxonomic characterization of sequence samples using signature genes. Nucleic Acids Res. 2008, 36 (suppl 2): W470-W474.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.