Whole-genome sequencing has profoundly impacted the field of prokaryotic genetics since its first demonstration. Almost all economically and medically important microbes now have at least one representative with a sequenced genome. This achievement was initially seen as the main goal of bacterial genomics, but that view is now strongly challenged by two observations. First, most microbial diversity is represented by uncultivated organisms, so the genomes sequenced to date represent only a small fraction of the microbial gene space. Second, gene content can vary widely between members of the same bacterial "species"[2, 3]. Defining the proteome of a given taxon may therefore require sequencing numerous related genomes, and new technologies are needed to sequence a larger number of prokaryotic genomes than previously anticipated. A number of new methods have reached the commercialization stage in the last few years. They are based on principles that differ from the dideoxy termination and electrophoretic separation used in the Sanger method[4, 5]. As such, they display different error rates and error types, and produce assemblies with different characteristics. The most commonly used method, which relies on highly parallelized pyrosequencing, has an inherently higher error rate around mononucleotide tracts (homopolymers)[7, 8]. This translates into a higher rate of insertion-deletion errors in the assembly consensus, and consequently into frameshifts and in-frame stop codons in genes.
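To illustrate how a single miscalled base in a mononucleotide tract can produce an in-frame stop codon, the following minimal Python sketch may help. It is only an illustration under stated assumptions: the toy coding sequence, the function names `homopolymer_tracts` and `first_in_frame_stop`, and the 3-base tract threshold are all hypothetical and are not taken from this study.

```python
import re

# Standard stop codons on the coding strand (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def homopolymer_tracts(seq, min_len=3):
    """Return (start, base, length) for every run of a single base
    that is at least min_len long (min_len is an illustrative threshold)."""
    pattern = r"([ACGT])\1{%d,}" % (min_len - 1)
    return [(m.start(), m.group(1), len(m.group(0)))
            for m in re.finditer(pattern, seq)]


def first_in_frame_stop(seq):
    """Return the index of the first stop codon read in frame 0, or None."""
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


# Hypothetical toy gene: ATG AAA AAC CTG ACC GGT TAA
gene = "ATGAAAAACCTGACCGGTTAA"

# Locate the homopolymer tract where an undercall is most likely.
tracts = homopolymer_tracts(gene)
start, base, length = tracts[0]

# Simulate an undercall: one base of the tract is missing in the consensus.
consensus = gene[:start] + gene[start + 1:]

true_stop = first_in_frame_stop(gene)
false_stop = first_in_frame_stop(consensus)
print(tracts, true_stop, false_stop)
```

In this toy case the one-base deletion shifts the reading frame, and the consensus sequence is read as terminating earlier than the true gene, which is the kind of artefact that an excess of indel errors would introduce into gene predictions.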
For de novo sequencing, these technologies have two main drawbacks beyond the sequencing error issue. First, they were developed in the framework of human genome resequencing, and thus produce mostly short reads that are useful for detecting substitution polymorphisms against a reference genome, but are more difficult to use for de novo assembly of a new genome. Second, their initial implementation permitted only unpaired reads. Links between two reads are a major element of de novo sequencing: they make it possible both to link contigs separated by a sequence gap and to build robust contigs by detecting assembly problems caused by repeated elements. For these reasons, the accuracy and continuity of assemblies obtained with data from the new sequencing technologies were lower than those traditionally obtained with the Sanger approach. Recent improvements to these technologies hold the promise of a better final product for WGS projects. Here, we evaluated how assemblies made with such improvements compare with assemblies produced with Sanger data, and how a mix of Roche/454 and Solexa/Illumina technologies performed in whole-genome sequencing of a reference bacterial genome.
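The role of read pairs in linking contigs across sequence gaps can be sketched in a few lines of Python. This is a simplified illustration, not the procedure used by any particular assembler: the function name `contig_links`, the input format (one tuple of contig names per read pair) and the `min_support` threshold are assumptions introduced here.

```python
from collections import Counter


def contig_links(pair_hits, min_support=2):
    """Count read pairs whose two mates map to different contigs.

    pair_hits   -- iterable of (contig_of_mate1, contig_of_mate2) tuples
                   (hypothetical input format)
    min_support -- minimum number of pairs required to report a link
                   (illustrative threshold)
    Returns a dict mapping a sorted contig pair to its number of supporting pairs.
    """
    counts = Counter()
    for contig_a, contig_b in pair_hits:
        if contig_a != contig_b:
            # Sort so that (c1, c2) and (c2, c1) count as the same link.
            counts[tuple(sorted((contig_a, contig_b)))] += 1
    return {link: n for link, n in counts.items() if n >= min_support}


# Hypothetical mapping results for four read pairs.
pairs = [("c1", "c1"), ("c1", "c2"), ("c2", "c1"), ("c2", "c3")]
links = contig_links(pairs)
print(links)
```

Requiring more than one supporting pair is a common safeguard against chimeric or mismapped pairs; real scaffolding additionally uses read orientation and expected insert size, which this sketch leaves out.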