This study is one of the first to use Applied Biosystems SOLiD sequencing for genomic sequencing of bacteria. Whole genome analysis has progressed considerably since the publication of the first complete DNA sequence of the pathogenic bacterium Haemophilus influenzae. Until recently, the wealth of complete genomes available in public databases was decoded via the large-scale industrialization of the Sanger dideoxy chain-termination sequencing method [45, 46]. The prospect of quickly and inexpensively resequencing large segments of the human genome or whole genomes of populations or species is driving development of a new generation of sequencing technologies with impacts in microbiology, functional genomics, ecology and evolutionary biology, human health, and beyond [45, 47–51]. In particular, bacterial sequencing has been advanced by the high-throughput, parallel format of the 454 Sequencer, the first 'next-generation' technology to sequence and assemble whole bacterial genomes de novo, including that of Mycoplasma genitalium, in a single machine run. Bacterial comparative genomics has expanded rapidly owing to the speed of the 454 Sequencer compared with Sanger sequencing, as well as to combinations of the two technologies, while assessment of microbial diversity in complex communities (metagenomics) has yielded insights into complex interactions such as mammalian obesity and the microbiome [56, 57], the ocean biosphere, and the role of microbes in colony collapse disorder in honeybees.
More recently, 'second-generation' sequencing technologies such as the Illumina GA2X Genome Analyzer (GA) and the ABI SOLiD system have been released. To date, these next-generation sequencing technologies generate shorter reads than Sanger sequencing, which complicates de novo sequence assembly and the definition of large chromosomal rearrangements [49, 51]. So far in prokaryotes, high-quality draft sequences have been assembled in the absence of Sanger sequencing by combining the 454 and GA technologies [60–64]. Studholme et al. utilized the Illumina platform alone for the de novo assembly of the draft genome sequence of Pseudomonas syringae pathovar tabaci strain 11528, revealing insights into the nature of type III protein-mediated pathogenicity.
The improved throughput from the massively parallel format of the new platforms (billions of bases in a single run) is ideal for revealing patterns of genetic variation among individuals by resequencing. For example, Srivatsan et al. employed Illumina sequencing to improve the existing draft of the extensively studied model bacterium Bacillus subtilis, while also identifying polymorphisms between other well-studied laboratory strains and their isolates. Moreover, this method was sensitive enough to identify suppressor mutations, which are typically difficult to isolate, in a single strain. Using the same platform, whole-genome analysis of 12 isolates of the monomorphic human pathogen Salmonella enterica serovar Typhi revealed evolutionary loss of gene function consistent with the effects of genetic drift on a small effective population size. Resequencing of the Caenorhabditis elegans N2 Bristol strain and SNP discovery in another strain demonstrated the effectiveness of this technology in eukaryotes, and single-base mutations in a mutant C. elegans strain were mapped without traditional genetic mapping efforts.
As one of the newer second-generation sequencers currently available (although 'third-generation' single-molecule sequencers are set to be marketed in 2010), the ABI SOLiD platform has been used more with eukaryotes than with prokaryotes. One of the first studies focused on assessing cross-platform performance for sequence detection of known mutations in C. elegans. Comparable accuracy between GA and SOLiD was reported for mapping the same C. elegans mutant strain studied by Sarin et al. Similarly, comparable accuracy was reported among the 454, GA, and SOLiD methods for comparing a mutant strain of yeast to a reference genome. At present, the utility of the SOLiD platform is reflected in several resequencing studies in humans, including haplotype analysis, breakpoint mapping in disease-associated chromosomal rearrangements, and polymorphism discovery in protein-coding exons [72–74]. With bacteria, SOLiD sequencing has been limited to verifying an E. coli reference strain sequence in conjunction with traditional sequencing, as well as resequencing of Bacillus anthracis strains for rapid and accurate forensic typing. In the study described here, the SOLiD platform was successfully utilized for rapid comparative genomic analysis of clade-specific and core genome sequences of the opportunistic pathogen V. vulnificus.
By examining the genomic DNA of each of four V. vulnificus strains on one-fourth of a SOLiD slide, we obtained 3.16 × 10^7 to 3.50 × 10^7 35-nt reads. This level of sequencing yielded approximately 100-fold coverage of each genome. Although the total numbers of reads would have predicted over 200-fold coverage, a substantial proportion of the reads were of low complexity or could not be mapped to the reference genomes.
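The gap between predicted and achieved coverage can be checked with simple arithmetic. The following is a minimal sketch, not the pipeline used in the study. It assumes a genome size of roughly 5.1 Mb, a value that is not stated in this section and is used here only for illustration. The function and constant names are hypothetical.

```python
# Rough coverage arithmetic for the read counts reported above.
# ASSUMPTION: a genome of about 5.1 Mb; this figure is illustrative and
# does not come from the text.
ASSUMED_GENOME_SIZE_BP = 5_100_000
READ_LENGTH_NT = 35


def fold_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Theoretical fold coverage if every read mapped uniquely."""
    return num_reads * read_length / genome_size


def usable_fraction(observed_fold: float, theoretical_fold: float) -> float:
    """Fraction of sequenced bases that ended up contributing to coverage."""
    return observed_fold / theoretical_fold


if __name__ == "__main__":
    for reads in (31_600_000, 35_000_000):
        theoretical = fold_coverage(reads, READ_LENGTH_NT, ASSUMED_GENOME_SIZE_BP)
        frac = usable_fraction(100.0, theoretical)
        print(f"{reads:.2e} reads: ~{theoretical:.0f}-fold predicted, "
              f"~{frac:.0%} of bases usable at 100-fold observed")
```

Under the assumed genome size, both read counts predict more than 200-fold coverage, so an observed coverage of about 100-fold implies that somewhat under half of the sequenced bases were usable.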
We identified sequences that are unique to the highly virulent clade 2 strains. These 80 genes represent the set that could contain virulence genes responsible for the ability of clade 2 strains to cause systemic infection and death in subcutaneously inoculated iron dextran-treated mice (Thiaville, P.C., et al., Infect. Immun. submitted). Furthermore, we identified 61 additional genes that are common to the clade 2 strains and an unusual highly virulent clade 1 strain but absent from a typical attenuated clade 1 strain and a biotype 2 eel isolate. These 61 genes represent a very interesting set that could contain generally clade 2-specific genes that were acquired by a clade 1 strain and increased its virulence to that of typical clade 2 strains. Among these putative virulence genes was genomic island XII identified by Cohen et al., and most interesting was a set of genes involved in sialic acid catabolism. Jeong et al. recently determined that the ability to utilize sialic acid for metabolism was essential for virulence of V. vulnificus. We are currently examining the possible roles of several of these loci in virulence.
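The presence/absence logic behind these gene sets can be expressed as set operations. The sketch below is illustrative only. The gene identifiers (`g1` to `g6`) and the labels `ref_A`, `ref_B`, and `attenuated_clade1` are hypothetical placeholders, the toy data are not results from this study, and the function is not the method used to produce the reported gene lists.

```python
from typing import Dict, Iterable, Set


def group_specific_genes(
    presence: Dict[str, Set[str]],
    in_group: Iterable[str],
    out_group: Iterable[str],
) -> Set[str]:
    """Genes present in every in-group strain and absent from every out-group strain."""
    shared = set.intersection(*(presence[s] for s in in_group))
    seen_outside = set().union(*(presence[s] for s in out_group))
    return shared - seen_outside


# Toy presence/absence data (HYPOTHETICAL gene IDs, for illustration only).
presence = {
    "ref_A": {"g1", "g2", "g3", "g4"},      # placeholder for a clade 2 reference
    "ref_B": {"g1", "g2", "g3", "g5"},      # placeholder for a clade 2 reference
    "M06-24/O": {"g1", "g2", "g3"},         # clade 2
    "99-738 DP-B5": {"g1", "g3"},           # virulent clade 1
    "attenuated_clade1": {"g1"},            # placeholder for the attenuated clade 1 strain
    "ATCC 33149": {"g1", "g6"},             # biotype 2
}

clade2 = ["ref_A", "ref_B", "M06-24/O"]
others = ["99-738 DP-B5", "attenuated_clade1", "ATCC 33149"]

# Genes found only in clade 2 strains.
clade2_specific = group_specific_genes(presence, clade2, others)

# Genes shared by clade 2 and the virulent clade 1 strain, absent elsewhere.
virulence_associated = group_specific_genes(
    presence, clade2 + ["99-738 DP-B5"], ["attenuated_clade1", "ATCC 33149"]
)

# Core genome: genes present in all strains.
core = set.intersection(*presence.values())
```

With the toy data, `clade2_specific` is `{"g2"}`, `virulence_associated` is `{"g3"}`, and `core` is `{"g1"}`.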
At the time we performed this genomic sequence analysis, we had not performed virulence studies of biotype 2 ATCC 33149 in our subcutaneously inoculated iron dextran-treated mouse model of infection. However, Amaro et al. previously reported that ATCC 33149 was of modest virulence in a different mouse model involving intraperitoneal infection. Based on our results indicating that ATCC 33149 lacked the genes shared among virulent clade 2 strains or clade 2 strains plus virulent clade 1 99-738 DP-B5, we hypothesized that ATCC 33149 would be attenuated for virulence in our model. In fact, when administered at the standard lethal dose of 1,000 CFU for virulent strains, ATCC 33149 caused only minimally detectable skin infection in one of five mice. Furthermore, when administered at 100 times the typical lethal dose (10^5 CFU/mouse), skin infection but no systemic infection or death ensued. Therefore, our genomic analysis of ATCC 33149 correctly predicted its attenuated virulence. It should be noted that Amaro and Biosca reported that some biotype 2 strains are virulent for mammals, so the attenuation of ATCC 33149 was not a foregone conclusion.
Because phenotypic differences are not only rooted in presence or absence of whole genes, but also nucleotide polymorphisms, we generated a set of SNPs among the shared sequences of the reference and newly sequenced genomes (Table 4 and Additional Files 6, 7, 8, 9, 10, 11, 12, and 13). By examining Sanger-derived sequences for a subset of SNPs, we determined that 98.4% of our reported SNPs are accurate. Of the 128 SNPs examined, only two in one gene of one strain were not confirmed by Sanger sequencing.
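The reported accuracy follows directly from the validation counts given above. A minimal sketch of the arithmetic, with a hypothetical function name:

```python
def validation_rate(examined: int, unconfirmed: int) -> float:
    """Fraction of examined SNP calls confirmed by an independent method."""
    if examined <= 0 or not 0 <= unconfirmed <= examined:
        raise ValueError("counts must satisfy 0 <= unconfirmed <= examined, examined > 0")
    return (examined - unconfirmed) / examined


# Counts taken from the text: 128 SNPs examined, 2 not confirmed by Sanger sequencing.
rate = validation_rate(examined=128, unconfirmed=2)
percent = round(100 * rate, 1)
```

Here 126 of 128 calls were confirmed, which rounds to the 98.4% stated in the text.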
Although the sample size of newly sequenced strains was small and each strain is a single representative of a unique genotype/virulence phenotype combination, some interesting relationships in SNPs were observed. Most notably, the rate of SNPs was significantly higher for genes encoded on chromosome 2 than for those on chromosome 1. Given that chromosome 1 of Vibrio is believed to encode most of the essential genes and that chromosome 2 is believed to have been acquired exogenously, it is reasonable that the highest rate of polymorphisms would occur on chromosome 2. The number of SNPs between M06-24/O and the reference genomes was much lower than that for each of the other three genomes (Table 4), even though slightly more genes were identified in M06-24/O. Because M06-24/O is in the same clade as the reference genomes, this result would be expected. Significant differences were observed in the frequencies of SNPs among nearly every subset of genes examined, e.g., clade 2-specific, core genome, and hypothetical proteins (Figure 2). However, it must be noted that the number of strains contributing to the SNP pool differs among these subsets of genes. For example, the core genome is shared among all six strains, so all four newly sequenced strains contributed SNPs and could have generated a higher frequency of SNPs. In contrast, for the clade 2-specific genes, the only newly sequenced strain contributing SNPs, by definition of the subset, was M06-24/O.
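One way to make the caveat about unequal strain contributions concrete is to express SNP frequency per kilobase and then divide by the number of newly sequenced strains that could contribute SNPs to each subset. This is a crude illustrative normalization, not an analysis performed in the study. The function names are hypothetical, and any counts passed to them would need to come from the actual data (for example, Table 4).

```python
def snps_per_kb(snp_count: int, total_length_bp: int) -> float:
    """SNP frequency per kilobase of sequence compared."""
    if total_length_bp <= 0:
        raise ValueError("total_length_bp must be positive")
    return snp_count * 1000 / total_length_bp


def snps_per_kb_per_strain(
    snp_count: int, total_length_bp: int, contributing_strains: int
) -> float:
    """SNP frequency per kilobase, divided by the number of contributing strains.

    ASSUMPTION: dividing by strain count is a rough correction only; it
    ignores SNPs shared between strains and differences in divergence.
    """
    if contributing_strains <= 0:
        raise ValueError("contributing_strains must be positive")
    return snp_count * 1000 / (total_length_bp * contributing_strains)
```

For instance, a core-genome subset would be normalized with `contributing_strains=4`, whereas the clade 2-specific subset would use `contributing_strains=1`, since only M06-24/O contributes SNPs there.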
By comparing the sequences shared among all six genomes, we identified the core V. vulnificus genome consisting of 3,459 genes. Gu et al.  examined the genomic sequences of all Vibrio species as of 2008 and identified 1,882 genes common to the genus. We are presently examining the core V. vulnificus genome to deduce possible metabolic and virulence characteristics of the species. We identified 20 genes previously unreported in V. vulnificus by using MAQ to compare the unmapped reads to the V. cholerae N16961 genome. If the clade 1 or biotype 2 genomes possessed sequences with sufficient similarity to the V. cholerae genome, we should have been able to identify and assemble them exactly as we did for the V. vulnificus reference genomes.
Most recently, Chun et al. examined the genomes of 23 V. cholerae strains collected over 98 years. Their newly sequenced genomes were obtained using a combination of Sanger and 454 sequencing. Like us, they based their phylogenetic relationships primarily on presence or absence of ORFs. Their analysis enabled the division of that species into 12 lineages, with one comprising the O1 strains and the seventh pandemic comprising a nearly identical clade. They determined that horizontal gene transfer significantly contributed to the evolution of the species.