Shotgun sequencing of Yersinia enterocolitica strain W22703 (biotype 2, serotype O:9): genomic evidence for oscillation between invertebrates and mammals

Background Yersinia enterocolitica strains responsible for mild gastroenteritis in humans are very diverse with respect to their metabolic and virulence properties. Strain W22703 (biotype 2, serotype O:9) was recently found to possess nematocidal and insecticidal activity. To better understand the relationship between pathogenicity towards insects and humans, we compared the W22703 genome with that of the highly pathogenic strain 8081 (biotype 1B, serotype O:8), the only Y. enterocolitica strain sequenced so far.
Results We used whole-genome shotgun data to assemble, annotate and analyse the sequence of strain W22703. Numerous factors assumed to contribute to enteric survival and pathogenesis, among them osmoregulated periplasmic glucan, hydrogenases, cobalamin-dependent pathways, iron uptake systems and the Yersinia genome island 1 (YGI-1) involved in tight adherence, were found to be common to the 8081 and W22703 genomes. However, sets of ~550 genes were found to be specific to each strain in comparison with the other. The plasticity zone (PZ) of 142 kb in the W22703 genome carries an ancient flagellar cluster Flg-2 of ~40 kb, but it lacks the pathogenicity island YAPIYe, the secretion systems ysa and yts1, and other virulence determinants of the 8081 PZ. Its composition underlines the prominent variability of this genome region and demonstrates its contribution to the higher pathogenicity of biotype 1B strains with respect to W22703. A novel type three secretion system of mosaic structure was found in the genome of W22703 that is absent in the sequenced strains of the human pathogenic Yersinia species, but conserved in the genomes of the apathogenic species. We identified several regions of difference in W22703 that mainly code for transporters, regulators, metabolic pathways, and defence factors.
Conclusion The W22703 sequence analysis revealed a genome composition distinct from other pathogenic Yersinia enterocolitica strains, thus contributing novel data to the Y. enterocolitica pan-genome. This study also sheds further light on the strategies of this pathogen to cope with its environments.

We have recently shown that strain W22703 (biotype 2, serotype O:9) confers lethality towards nematodes and Manduca sexta larvae upon oral infection, and that this insecticidal activity is correlated with the presence of the so-called pathogenicity island TC-PAI Ye [13,14]. This 20-kb fragment is present in biotype 2-5 strains, but absent in most biotype 1A and 1B strains, and carries the toxin complex (TC) genes tcaA, tcaB, tcaC and tccC with homology to TC genes of Photorhabdus luminescens. However, the absence of TC-PAI Ye does not result in a loss of toxicity upon subcutaneous infection, indicating the presence of as yet unknown insecticidal determinants in Y. enterocolitica [15].
To investigate the genomic heterogeneity of the species Y. enterocolitica, we have chosen to sequence the genome of the low-pathogenicity strain W22703. We report the annotation of this second genome sequence of a Y. enterocolitica strain, and a detailed comparative genome analysis of the W22703 genome with that of strain 8081, a representative of the highly pathogenic biotype 1B group. The data obtained provide novel insights into the biology, metabolism, adaptation strategies and evolutionary relationships of Y. enterocolitica.

General features
The shotgun sequencing of the Y. enterocolitica strain W22703 genome yielded a total of 243656 reads with an average read length of 363 bp. Assembly of 232502 reads resulted in 305 contigs larger and 705 contigs shorter than 1,000 base pairs (bp) with a median level of coverage in contigs > 5 kb of 16.49 (Additional file 1); one contig (1796) exceeds this coverage level more than twofold (40x). The genome has an average G + C content of 46.9% (Table 1). Upon PEDANT-based annotation [16] and search against a non-redundant protein database, 4003 genes corresponding to a coding density of 84.4% could be identified, but an unknown number of genes might have been missed due to the short contigs that were not assembled (Additional file 2). The analysis also revealed at least 68 tRNA genes. The lower number of tRNA genes compared to finished Yersinia genomes is probably due to the collapsing of reads from repeat sequences into fewer contigs [17]. The exact number of rRNA operons could not be estimated from this draft assembly, as reads from identical copies probably assemble into the same contigs. The risk of frameshifts due to sequencing errors in longer homo-oligomers was reduced by the high coverage of the assembly. We have determined 111 pairs of consecutive ORFs having best similarity to the same protein. However, this number also includes real pseudogenes not affected by any sequencing error.
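The screen for consecutive ORFs sharing the same best database hit can be illustrated with a short Python sketch. It is not the pipeline used in this study: the input layout (ORF records with a contig, a start coordinate and the accession of the best hit) and all names such as `Orf` and `split_gene_candidates` are assumptions made for this example, and the identifiers are invented.

```python
from typing import NamedTuple, Optional


class Orf(NamedTuple):
    contig: str               # contig identifier (invented values below)
    start: int                # start coordinate on the contig
    name: str                 # ORF label
    best_hit: Optional[str]   # accession of the best database hit, None if no hit


def split_gene_candidates(orfs):
    """Return pairs of adjacent ORFs on the same contig with an identical best hit."""
    ordered = sorted(orfs, key=lambda o: (o.contig, o.start))
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        if left.contig != right.contig:
            continue  # neighbours on different contigs are not consecutive
        if left.best_hit is not None and left.best_hit == right.best_hit:
            pairs.append((left.name, right.name))
    return pairs


# Toy input with invented identifiers
example_orfs = [
    Orf("contigA", 100, "orf1", "P001"),
    Orf("contigA", 900, "orf2", "P001"),
    Orf("contigA", 1800, "orf3", "P002"),
    Orf("contigB", 50, "orf4", "P002"),
    Orf("contigB", 700, "orf5", None),
    Orf("contigB", 1500, "orf6", None),
]
candidate_pairs = split_gene_candidates(example_orfs)
print(candidate_pairs)
```

Such a pair only marks a candidate: it may reflect a frameshift introduced by a sequencing error or a genuine pseudogene, and the sketch cannot distinguish between the two.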

Genome comparison with strain 8081
Y. enterocolitica 8081 is one of three strains of this species whose sequences were available until February 2011 [18][19][20]. It belongs to the biotype 1B group with higher pathogenicity potential to humans than the biotype 2-5 group. To delineate the most relevant features of the W22703 genome, we decided to base our further analysis on a genome comparison between the shotgun sequence of strain W22703 and the linear genome sequence of Y. enterocolitica strain 8081. The alignment of both genomes using Mauve [21] shows long syntenic regions with few rearrangements and a generally high sequence conservation, but also regions in both genomes that are not shared with the other (Additional file 3). Upon automatic and manual BLAST analysis, we identified 550 genes present in the 8081 genome but absent in that of W22703, and 551 genes that are specific for W22703 with respect to strain 8081. The virulence plasmid pYV [22] was not considered here. Figure 1 shows the categories under which the W22703 genes absent in 8081 are summarized. Besides hypothetical genes and those of unknown function, the largest numbers of gene-encoded factors fall into the functional groups transporter, metabolism and DNA/RNA processing. The latter group comprises 18 regulatory genes. The motility and phage sections are mainly composed of the ancestral flagellar locus flgII (see below) and one specific prophage.

Regions of difference
We then searched for regions of difference (ROD) between the genome sequences of 8081 and W22703. By definition, those ROD genes do not belong to the core genome of the two Y. enterocolitica strains compared here, but might constitute additional metabolic or virulence-associated properties contributing to the overall strain fitness. Twelve ROD present in strain W22703 are shown in Figure 2A. While the average GC-content of the W22703 genome sequence is 46.9%, the ROD on contigs 1240, 1162, 1764, 1812, and 1280 show an at least 2% higher or lower GC-content, suggesting their acquisition by lateral gene transfer (LGT) [23]. Phylogenetic tree analysis, however, revealed closely related genes of these contigs in other Yersinia strains, with the exception of contig 1280 that harbours phage related genes. The 8081 genes flanking the ROD might give additional information about the underlying recombination events. For example, a glycosyltransferase operon of 8081 (YE3070-YE3087) might have been replaced by a related operon on contig 1878 (or vice versa) that possibly contributes to O-antigen synthesis. A substitution of hypothetical genes by a non-homologous cluster of functionally unknown genes is observed in contig 1764. The ROD on contigs 1186, 1240, 1280 and 1973 obviously interrupt gene linearity with respect to the 8081 genome, indicating that loss and substitution, or insertion, of genes might have taken place in these cases. Contig 1280 harbours several phage-related genes and is therefore assumed to represent a second prophage region. Transposase genes were found in the 8081 genome between YE2773 and YE2779, a region covered by the LPS synthesis in W22703 (contig 1162), and a similar observation was made for the PTS encoding cluster on contig 1884. More functional details on these ROD are described below.
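The GC-content criterion used above to suggest lateral gene transfer (a deviation of at least 2% from the genome average of 46.9%) can be sketched in Python as follows. The function names and the toy sequences are hypothetical and serve only to illustrate the comparison; they are not ROD sequences from W22703, and sequences this short would not give a meaningful GC estimate in practice.

```python
def gc_percent(seq):
    """GC content in percent, counting only A, C, G and T."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return 100.0 * gc / total if total else 0.0


def deviates(region_gc, genome_gc=46.9, threshold=2.0):
    """True if the region differs from the genome average by at least `threshold` percent."""
    return abs(region_gc - genome_gc) >= threshold


# Invented toy sequences, 20 nt each
toy_regions = {
    "region_1": "ATGCGCGCCGGCTAGCGGCC",  # 80% GC
    "region_2": "ATGCATGCATGCGCGTTAAT",  # 45% GC
    "region_3": "ATATTTAAGCATTAATTTAA",  # 10% GC
}
flagged = [name for name, seq in toy_regions.items() if deviates(gc_percent(seq))]
print(flagged)
```

As noted in the text, a deviating GC content alone is not conclusive; the phylogenetic analysis placed most of these regions close to genes of other Yersinia strains.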

Virulence genes or clusters present or absent in W22703 compared with 8081
While YGI-1, which is responsible for adhesion and includes a T4SS, is completely encoded on contig 1802, the high pathogenicity island (HPI) encoding yersiniabactin [24] is missing in the genome of strain W22703. In contrast, we identified 18 functions probably involved in defence or virulence mechanisms present in W22703, but absent in 8081 (Table 2). Two autotransporter or type V secretion proteins (contig 1177), one of them with homology to an AidA-like adhesin, might play a role in (non-mammalian) host recognition by W22703. The insecticidal pathogenicity island TC-PAI Ye is characteristic for biotype 2-5 strains, but absent in biotype 1A and 1B strains with the exception of WA314 (biotype 1B, serotype O:8) [14,15]. Another homology group of TC genes was found to be prevalent among clinical biotype 1A strains [25]. Besides a second, nonclustered tccC2 locus [15], no further factors with homology to toxin complex genes could be identified in W22703.

Secretion/transfer systems and transporters
Two distinct, chromosomally located type three secretion systems (T3SS) mark one of the most striking differences between the two genomes. While ysa of 8081 is absent in W22703, this strain harbours another T3SS, which we termed ysa2, located on contigs 1804 and 1807 (Figure 2B). PCR targeting the flanking regions resulted in a ~200 bp fragment, justifying the linkage of both contigs. Sequence analysis of ysa2 revealed a mosaic structure with a G/C content of 49.4% between the flanking genes YE0315 and YE0312, and a G/C content of 40.9% between YE0312 and YE0311, indicating two independent LGT events. The whole cluster is collinear to the respective regions in apathogenic yersiniae such as Y. frederiksenii and Y. intermedia. For the left part of this 29 kb island, we found homologs of the plasmid-encoded T3SS of Y. pestis and Y. pseudotuberculosis, and partial collinearity with it. The right part carries yscC and yscD homologs, but no homologs of YopB, YopD or LcrV involved in translocon formation [26]. The functionality of ysa2 is unknown. Of the two type 2 secretion system (T2SS) clusters yst1 and yst2 in 8081, only yst2, responsible for a general secretion pathway (GSP), is present in strain W22703. Contig 1170 encodes factors involved in conjugal DNA transfer, namely TraD, a MobA/MobL protein, and a putative type IV prepilin, that might contribute to LGT. As iron is often a rate-limiting factor for pathogenic bacteria during infection, W22703 requires iron-scavenging systems for survival in the host. Besides the two iron and enterobactin transport systems within the PZ, these comprise a hemophore cluster for heme binding and uptake (Table 2), and a putative iron binding protein (contig 1360), the latter being absent in 8081.
We also identified eight ABC transporters in W22703, two phosphotransferase systems (PTS), four permeases, two major facilitators, a sodium:bile acid symporter, and other transporters listed in Table 2 or mentioned in the sections on metabolism and virulence. All of these are without counterparts in the genome of strain 8081. A glucitol/sorbitol-specific transporter (contig 1884; YE1093-YE1098) and a sorbose uptake system are also present in 8081, but not in Y. pestis and Y. pseudotuberculosis, and have homologs in all or most non-pathogenic species sequenced so far. A cellobiose uptake system was also identified (contig 1882). In total, our analysis identified a higher number of putative transporters with respect to strain 8081.

Plasticity zone (PZ)
The plasticity zone of strain 8081 ranges from YE3447 to YE3644 and has a total length of approximately 199 kb with 186 coding sequences (CDS). It was defined by Thomson et al. [18] as the largest region of species-specific genomic variation among Y. enterocolitica biotypes, and it is absent from Y. pestis and Y. pseudotuberculosis. Four contigs were found to carry PZ genes. We linked contigs 1088/1891 and 1803/1802 due to the presence of truncated hypF and fepG genes, respectively, at their ends. The primer combination 5'-GTTTCTTTATGGGCGCG-3'/5'-TTGGCATGGAGGCCTG-3' hybridizing to the ends of contigs 1891 and 1803 resulted in a PCR product of approximately 1500 bp, thus allowing the linear reconstruction of the W22703-specific PZ (Figure 3). With a total length of ~142 kb, it is significantly shorter than that of 8081 and exhibits a comparatively low density of virulence genes. Many discrete functional units of the 8081 PZ are indeed missing in the W22703 genome, as confirmed by BLAST search of every PZ-encoded protein against the translated shotgun sequence. The most prominent ones are (i) the pathogenicity island YAPI Ye of 66 CDS including a putative hemolysin, a toxin/antitoxin system ccdA/ccdB, and a type IV pilus operon, (ii) the T3SS ysa important for pathogenicity in a mouse oral infection model [27] and (iii) the T2SS yst1 required for full virulence [28]. Further 8081 PZ genes absent in W22703 are the two putative two-component systems (TCS) YE3561/YE3563 and YE3578/YE3579, a chitinase (YE3576), a putative lipase (YE3614), a putative copper/silver efflux system (YE3626-YE3630), and the arsenic resistance operon. However, it is also worthy of note that PZ loci assumed or known to play a role in pathogenicity towards invertebrates or vertebrates are present in the W22703 genome. Examples are the YGI-1 mentioned above, the hydrogenase 2 (hyb) locus, fecBCDE encoding an iron transporter, the ferric enterobactin transport system fepBDGC/fes, and proP encoding a betaine/proline transporter involved in osmoprotection and osmoregulation. The recently identified flagellar cluster Flg-2 [29] within the PZ of W22703 is absent in the genome of 8081. It comprises 44 genes encoding factors for the flagellar motility apparatus and for flagellar biosynthesis, but lacks chemotaxis genes (Figure 3). A region of approximately 11,300 bp flanked by a transposase gene and the replicon of an IncF plasmid RepFIB comprises an ABC transporter and a regulatory gene; however, the functionality of this region, which is unique with respect to all Yersinia sequences available so far, is in doubt due to its low coding density.
Figure 3 legend, gene abbreviations: potE, putrescine-ornithine antiporter; mdtN, multidrug resistance protein; map, methionine aminopeptidase; ydfJ, putative metabolite transporter; yjbB, putative Na+/Pi cotransporter; proP, betaine/proline transporter; ydcL, putative lipoprotein; thiM, hydroxyethylthiazole kinase; fic, filamentation induced by cAMP protein; motA/motB, flagellar motor proteins; tnp, transposase; MACPII, methyl-accepting chemotaxis protein; fes, enterochelin esterase; virF, virulence regulon transcriptional activator; rep, IncF plasmid RepFIB replicon; alkB, α-ketoglutarate-dependent dioxygenase; exbB, MotA/TolQ/ExbB proton channel family protein; asterisks: low-temperature induced genes as identified by luciferase reporter assays [60]. Shaded regions are absent in 8081. See text for more details.
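Basic properties of the primer pair used to link contigs 1891 and 1803 can be checked with a few lines of Python. The sketch below is illustrative only: it assumes that the blank inside the first primer sequence as printed is a typesetting artifact, and it estimates the melting temperature with the simple Wallace rule, Tm = 2(A+T) + 4(G+C), which is a rough approximation for short oligonucleotides and not necessarily the method used to design these primers. The function names are hypothetical.

```python
def clean(seq):
    """Upper-case a primer sequence and drop whitespace (assumed typesetting artifact)."""
    return "".join(seq.split()).upper()


def wallace_tm(seq):
    """Rough melting temperature in degrees Celsius by the Wallace rule."""
    seq = clean(seq)
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(clean(seq)))


# Primer sequences as given in the text (5' to 3')
primers = {
    "first": "GTTTCTTTATGGGC GCG",
    "second": "TTGGCATGGAGGCCTG",
}
for label, seq in primers.items():
    print(label, len(clean(seq)), "nt, Tm approx.", wallace_tm(seq), "degrees C")
```

Under these assumptions both primers give the same estimate, which is consistent with their use as a pair in a single PCR.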

Surface components
The main flagella and chemotaxis gene cluster I (flg-1) of 8081 is present in W22703 (contigs 1361, 1428, 1469 and 1890), but only one of two type-1 fimbrial operons was found (contig 1271; YE0782-YE0786). The surface-exposed lipopolysaccharide (LPS) molecule constituting the serotype O:9 O-antigen is synthesized by the O-polysaccharide gene cluster (contig 1162) [30]; a second glycosyltransferase gene cluster is located on contig 1878 (Figure 2A). The role of the O-polysaccharide and the outer core hexasaccharide in resistance of Y. enterocolitica to human complement and polymyxin B has been described recently [30].

Metabolism
Several enzymatic activities common to both Y. enterocolitica genomes compared here are involved in nitrogen metabolism. One example is urease activity, which is encoded by seven genes on contig 1225. The assimilation of the urease product ammonia for amino acid and nucleotide synthesis is then achieved by glutamine synthetase. Two ornithine decarboxylases forming putrescine, and a putrescine/ornithine antiporter are encoded on the PZ and also contribute to amino acid metabolism. Like 8081, strain W22703 carries the cel gene cluster for cellulose production (contig 1798) and the genes mdoC, mdoG and mdoH for osmoregulated periplasmic glucan (OPG) biosynthesis (contig 1967). The capabilities to produce cellulose and to synthesize OPGs have both been lost or inactivated in Y. pestis and Y. pseudotuberculosis. OPG mutants exhibit deficiencies in virulence, biofilm formation and antibiotic resistance, as well as hypersensitivity towards bile salts [18].
The capability to utilize propanediol in a cobalamin (vitamin B12)-dependent manner is encoded on contigs 2012, 1667, 1476, 1555, 1999, and 1235, and the respective cob/cbi/pdu genes are collinear to the 8081 genes YE2707-YE2750. In line with the yersiniae core genome, ttr genes responsible for tetrathionate reduction are present (contig 1975), and the eut genes allowing B12-dependent ethanolamine utilization are absent. The mtn genes located on contig 1812 are involved in methionine salvage. This cobalamin-dependent pathway recycles methylthioadenosine derived from spermidine, spermine and N-acylhomoserine lactone synthesis. The hydrogenases Hyd-4 encoded within the hyf locus (contigs 1162, 1947) and Hyd-2 within the hyb locus (PZ; contig 1891, Figure 3) are also present in W22703.

Distinct metabolic properties of W22703
W22703 is endowed with several metabolic enzymes that are unique in comparison to strain 8081 (Figure 2A), among them a serine-pyruvate transaminase involved in glycine, serine and threonine metabolism (contig 1812). The reductases encoded on contigs 1186 and 1976 suggest that W22703, but not 8081, is able to use nitrate and dimethylsulfoxide (DMSO) as alternative electron acceptors under anaerobic conditions. In contrast, pathways for trimethylamine and thiosulfate oxidation are present in both genomes. Besides the DMSO reductase, contig 1186 harbours another ROD encoding an ABC transporter, a putative nitrilase or cyanide hydratase that converts nitriles into amino acids and ammonium or hydrogen cyanide into formamide, and a putative acetamidase or formamidase. Contig 1973 carries a gene cluster enabling W22703 to take up N-acetylgalactosamine, which is then isomerized to tagatose. In addition to YE0550A-YE0555, we identified a second operon for sucrose utilization on contig 1240.

Metabolic pathways lost in W22703
We identified only a few enzymes or capabilities that are missing in W22703 in comparison to 8081 (Table 3). Examples are the absence of a chitinase, and of the lipase YE3614, which probably accounts for the lipase-negative reaction of W22703 as a biotype 2 strain [31]. In addition to the arsenic resistance operon on YAPI Ye, homologs of a second operon with this function (YE3364-YE3366) are absent in W22703.

Dynamic genomes: further regions absent in W22703 or 8081
Whole-genome comparison allows one to follow the dynamic processes by which genomes diverge from a common ancestor. In addition to the genomic islands or clusters already mentioned above, the genomes of the two Y. enterocolitica strains compared here differ by a set of regions, indicating the dynamics of sequence acquisition and loss. The prophage regions YE98 (YE0854-YE0888), YE185 (YE1667-YE1693), YE200 (YE1799-YE1819) and YE250 (YE2292-YE2363) of strain 8081 are absent in strain W22703, which, however, carries another prophage of 37 CDS in contig 1796. Furthermore, the Yersinia genome islands YGI-2, YGI-3 and YGI-4 [18] are missing in W22703. YGI-2 carries genes for the synthesis, modification, and export of an outer membrane anchored glycolipoprotein. Of this island, only homologs of YE0912 encoding a 2,5-diketo-D-gluconic acid reductase B and of YE0911 encoding a 3-oxo-acyl-(acyl carrier protein) synthase II are present in W22703. On contig 1854, we identified two homologs of YE0979, which encodes a DNA-binding protein, and the hypothetical gene YE0980 from YGI-3 harbouring a putative integrated plasmid.

Discussion
The genome analysis and genome comparison performed here were intended to contribute to a better understanding of the ecology, pathogenicity and evolution of Y. enterocolitica. Adding, rearranging and reducing or losing DNA has been proposed as the general recipe for Yersinia genome evolution when eight less-pathogenic strains were compared [17]. Several ROD shown in Figure 2A-B and Figure 3 might be recent acquisitions due to the significant deviation of their G/C content, while others might have been acquired early after separation of both strains, or indicate regions that have been lost or substituted in 8081. Besides the virulence plasmid, the PZ is a second example for the acquisition of virulence genes. Remarkably, this region is approximately 55 kb shorter in W22703 and lacks several determinants proposed or known to contribute to pathogenicity towards humans. Together with the presence of a large ancient flagellar gene cluster and a region of obvious genetic degeneration, this finding strongly reflects the lower virulence potential of W22703 in comparison to 8081 [32], and confirms the importance of this region for the manifestation of virulence properties of Y. enterocolitica [18].
Other (virulence) regions absent in W22703 might have been acquired by 8081 after separation of both strains. The novel T3SS (ysa2) that is absent in the pathogenic species Y. pestis and Y. pseudotuberculosis, but present in apathogenic species such as Y. intermedia, Y. frederiksenii and Y. kristensenii, might play a role in the interaction with non-mammalian hosts. The TC proteins have been shown to be secreted upon activity of the plasmid-encoded T3SS of Y. pestis [33]. Since we used a pYV-free W22703 derivative to demonstrate TC-based insecticidal activity of this strain [13], ysa2 is a candidate for TcaA secretion by W22703. This finding supports the assumption that T3SS are not unique vehicles for delivering anti-vertebrate factors but ancient secretion systems for the transport of effector molecules across host membranes, with the potential to play a role in a wide range of bacteria-host interactions [34].
Together with a number of putative virulence factors of W22703 (Table 2), ysa2 contributes to the pathosphere of yersiniae, a concept hypothesizing that all of the pathogenic genes shared by enteric bacteria form a "pool" [35]. The pathogen Y. enterocolitica has a complex life cycle encompassing aquatic and biological environments. Due to its known capability to interact with invertebrates and mammals, it exhibits a multiphasic phenotype upon colonizing and potentially killing more than one host species [36]. Although little is known about putative signals and regulatory circuits required to switch or modulate necessary changes from one state to the other, candidates are genes listed in Table 2 or induced at low temperature [37].
Although some of their functions remain to be experimentally confirmed, the metabolic pathways present in strain W22703 confirm the relevance of several metabolic traits for gut-adapted Y. enterocolitica. Examples are cobalamin-dependent utilization of propanediol, hydrogenase activities, cellulose production, tetrathionate reduction and ornithine decarboxylase activity, all of which are absent, lost or inactivated in systemic Y. pestis and Y. pseudotuberculosis [17,18]. Interestingly, tetrathionate, which acts as a terminal electron acceptor during anaerobic degradation of 1,2-propanediol or ethanolamine, is formed in the inflamed gut upon infection [38]. The hydrogenases Hyd-4 and Hyd-2 might contribute to the adaptation of yersiniae to gut environments [39,40]. Two additional reductases of W22703 that allow the use of nitrate and DMSO are in line with this assumption. A potentially insect-specific resistance mechanism and/or catabolic trait of strain W22703 is provided by the ROD of contig 1186 (Figure 2A) inserted between YE0815 and YE0816. It encodes an acetamidase/formamidase, a branched chain amino acid transporter, an ABC transporter and a putative nitrilase/cyanide hydratase. These predicted functions point to a role of this chromosomal region in the acquisition of nitrogen sources. Indeed, insects such as Zygaena filipendulae produce cyanogenic glucosides that might then be used by W22703 as nitrogen source via a metabolic route of bacteria that includes cyanide nitrilase, hydratase and formate dehydrogenase activities [41][42][43]. Although speculative so far, the functions of contig 1186 might represent a further determinant contributing to invertebrate host adaptation of strain W22703. The operon on contig 1973 including a PTS is responsible for tagatose utilization; this metabolic trait is not only common to human intestinal bacteria, but was found to be specifically induced during insect infection by P. luminescens [44,45].
Interestingly, the PZ gene bsh encodes a choloylglycine hydrolase or bile salt hydrolase (Figure 3) that catalyzes the deconjugation of conjugated bile salts to liberate amino acids and free primary bile acids [46].
Recently, a genome comparison between the genomes of the insect pathogen P. luminescens and Y. enterocolitica 8081 revealed a huge number of common genes that might contribute to the adaptation of yersiniae to invertebrate hosts [37]. Interestingly, we identified nearly all of these factors also in the genome of W22703, underlining the assumption that Y. enterocolitica strains share the capability to interact with nematodes or insects.

Conclusion
Although further genome sequences are required to learn more about the evolution of Y. enterocolitica strains, this study indicates that, besides the Yersinia virulence plasmids, the highly flexible PZ indeed contributes to the acquisition of determinants that might increase the pathogenicity towards humans. On the other hand, insecticidal toxins, the novel T3SS or specific metabolic properties might play a crucial role in the adaptation of Y. enterocolitica strains to non-mammalian hosts.

Methods
Bacterial strains and growth conditions
Y. enterocolitica W22703(pYVe227) is a nalidixic acid-resistant (NalR) restriction mutant (Res- Mod+) isolated from strain W227 [47]. A plasmidless isogenic derivative (W22703 pYV-) was used. To avoid contamination and to validate the strain cultured for DNA isolation, strain W22703 pYV- was streaked from a glycerol stock on Yersinia selective agar plates (CIN agar base; Becton Dickinson, Heidelberg, Germany). A single colony was used to inoculate Luria-Bertani (LB) broth (10 g l-1 tryptone, 5 g l-1 yeast extract, and 5 g l-1 NaCl) containing 20 μg ml-1 nalidixic acid, and the culture was grown for twelve hours at a selective temperature of 15°C. When the culture had reached stationary phase, aliquots were plated in parallel on LB and Yersinia selective agar plates. PCRs targeting the W22703-specific genes tcaA, tcaC and two genes of Flg-2 were performed as a further control.

Genome sequencing and accession
High-throughput sequencing of a shotgun library on the GS FLX system (Roche, 454 Life Sciences, Branford, USA) using the Titanium series with approximately 20-fold coverage, and the assembly, were performed by Eurofins MWG GmbH, Ebersberg, Germany. According to the newly defined standards for classification of genome sequences [49], the Y. enterocolitica genome sequence belongs to the category "Annotation-Directed Improvement". The EMBL accession numbers for the sequences reported in this paper are FR718488-FR718797. The raw sequence data files are deposited in the ENA trace archive as ERP000495. The annotated sequence is available under the URL address http://pedant.gsf.de.

Genome annotation and analyses
The PEDANT software system (http://pedant.gsf.de; [50]) was used for automatic genome sequence analysis and annotation [16]. Protein coding genes were predicted with the GeneMarkS program using default settings [51]. Biochemical pathway prediction and reconstruction were performed using the KEGG [52], BRENDA [53], and MicrobesOnline [54] databases. tRNAs were identified using tRNAscan-SE [55], and rRNA homologs with blastn [56]. Additional manual homology searches of predicted proteins were performed by BLAST analysis (http://www.ncbi.nlm.nih.gov/BLAST/) to ascribe a protein function or domain.
Comparison with the genome sequence of Y. enterocolitica 8081 (EMBL accession numbers are AM286415 for the chromosome and AM286416 for the virulence plasmid pYV; PEDANT database name is p3_p190_Yer_enter) was performed using the Y. enterocolitica Blast Server from the Sanger Institute (http://www.sanger.ac.uk/cgi-bin/blast/submitblast/yersinia). The criterion applied was an 80% identity of the amino acid sequence. Incomplete proteins encoded on contig ends were considered to be present in strain W22703 if the lacking sequence could be identified on another contig. Genome sequences of Yersinia strains were obtained from the NCBI database and compared using the homepage http://www.microbesonline.org/. Protein sequence alignment was done with the ClustalW program [57]. Phylogenetic trees for ROD and T3SS were automatically calculated using the software PhyloGenie [58] and the default parameters according to its documentation. We used NCBI nr [59] as reference database and excluded proteins of unclassified taxa.
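The presence/absence call based on the 80% amino acid identity criterion can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the procedure run on the BLAST server: the function name `strain_specific`, the gene labels and the identity values are invented, and treating exactly 80% as "present" is an assumption, since the text does not say whether the cutoff is inclusive.

```python
def strain_specific(genes, best_identity, min_identity=80.0):
    """Return genes with no hit, or a best hit below the cutoff, in the other genome.

    best_identity maps a gene to the percent amino acid identity of its best
    hit in the comparison genome; genes without any hit are missing from it.
    """
    specific = []
    for gene in genes:
        identity = best_identity.get(gene)  # None means no hit at all
        if identity is None or identity < min_identity:
            specific.append(gene)
    return sorted(specific)


# Invented toy data: genes of one strain and their best hits in the other
w22703_genes = ["geneA", "geneB", "geneC", "geneD"]
best_hit_in_8081 = {"geneA": 98.5, "geneB": 80.0, "geneC": 62.3}

w22703_specific = strain_specific(w22703_genes, best_hit_in_8081)
print(w22703_specific)
```

Running the same function in the opposite direction, with the 8081 genes and their best hits in the W22703 contigs, would give the reciprocal set. In a draft assembly, proteins truncated at contig ends need the additional check described above before they are called absent.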

Additional material
Additional file 1: FR accession numbers of all W22703 contigs.
Additional file 2: W22703 gene names, locus tags, and protein accession numbers.
Additional file 3: Mauve-type genome alignment between the reference genome of strain 8081 (chromosome and plasmid; top) and the draft genome of strain W22703 (contigs; bottom). Red lines indicate chromosome and contig borders. Similar regions are indicated by frames and assigned to each other by connecting lines. The degree of sequence similarity is shown within each region as a similarity plot.