Complete genome sequence of the fire blight pathogen Erwinia pyrifoliae DSM 12163T and comparative genomic insights into plant pathogenicity

Background: Erwinia pyrifoliae is a newly described necrotrophic pathogen that causes fire blight on Asian (Nashi) pear and is geographically restricted to Eastern Asia. Relatively little is known about its genetics compared with the closely related main fire blight pathogen E. amylovora. Results: The genome of the E. pyrifoliae type strain DSM 12163T was sequenced using both 454 pyrosequencing and Solexa sequencing, and annotated. The genome contains a circular chromosome of 4.026 Mb and four small plasmids. Based on their respective roles in virulence in E. amylovora or related organisms, we identified several putative virulence factors, including type III and type VI secretion systems and their effectors, flagellar genes, sorbitol metabolism, iron uptake determinants, and quorum-sensing components. A deletion in the rpoS gene covering the most conserved region of the protein was identified, which may contribute to the difference in virulence/host-range compared to E. amylovora. Comparative genomics with the pome fruit epiphyte Erwinia tasmaniensis Et1/99 showed that both species are overall highly similar, although specific differences were identified, for example the presence of some phage gene-containing regions and a high number of putative genomic islands containing transposases in the E. pyrifoliae DSM 12163T genome. Conclusions: The E. pyrifoliae genome is an important addition to the published genome of E. tasmaniensis and the unfinished genome of E. amylovora, providing a foundation for re-sequencing additional strains that may shed light on the evolution of host-range and virulence/pathogenicity in this important group of plant-associated bacteria.


Background
Fire blight, caused by the enterobacterium Erwinia amylovora (a quarantine pathogen in Europe), is the most important global threat to pome fruit production (e.g., apple, pear) and to a wide variety of Rosaceae (Maloideae), including amenity and forest species [1]. E. amylovora is native to the northeastern USA, and it was the first phytopathogenic bacterium described (see [2]). This invasive pathogen was first introduced into the UK in the late 1950s and has spread across Europe over the past three decades; its continuing eastward advance threatens the centre of origin of apple in Central Asia [3]. Fire blight symptoms include recurvature of shoots (shepherd's crook), necrosis, ooze, and cankers. The pathogen can enter through nectaries, hydathodes and wounds, resulting in blossom, shoot or rootstock blight syndromes [4]. E. amylovora growth is highly dependent on weather conditions, a dependence that has been exploited to develop disease forecasting models, and the pathogen is passively vectored by flower-foraging insects [5]. Epidemics can develop rapidly and kill individual plants or entire orchards within a single season, leading to severe economic losses [2]. E. amylovora has poor ecological fitness away from host or surrogate plants [4].
Erwinia pyrifoliae is a newly described pathogen closely related to the main fire blight pathogen E. amylovora. E. pyrifoliae is primarily a pathogen of Asian or Nashi pear (Pyrus pyrifolia) and is considered to have a geographic distribution restricted to East Asia (Korea and Japan) [6-8]. Monitoring for E. pyrifoliae is rarely conducted, although a quantitative PCR method based on differential 16S rRNA and 16S-23S ITS sequences [9,10] has recently been developed; the precise distribution of this pathogen is therefore somewhat uncertain. E. pyrifoliae causes fire blight symptoms essentially indistinguishable from those of E. amylovora infection. Limited host germplasm screening with E. pyrifoliae strains indicates a slightly wider host-range than originally thought, with virulence observed on selected commercial pear (Pyrus communis) and apple (Malus domestica 'Idared') varieties [9]. The level of virulence on non-Asian pear hosts is markedly lower than that typically observed for E. amylovora, suggesting that important genetic differences remain to be elucidated.
In contrast to the considerable genetic information available for E. amylovora [11,12], relatively little is known about the genetic basis of virulence and environmental fitness in E. pyrifoliae [13]. As in E. amylovora, intraspecies genotypic diversity in E. pyrifoliae appears low, with strain differences observed primarily in plasmid content [14]. Thus far, only minor genetic differences have been found between E. pyrifoliae and E. amylovora [13,15].
The objectives of our work were to sequence the complete genome of the type strain of E. pyrifoliae, strain DSM 12163T, and to compare it with the recently sequenced genome of the non-pathogenic pome fruit epiphyte E. tasmaniensis Et1/99 [16]. Identifying sequences that differ among these related bacteria may provide useful insights into host-pathogen interactions that could eventually be exploited for fire blight control.
Annotation was performed using the genome annotation system GenDB [17] and optimized manually. In total, 4,038 open reading frames (ORFs) were predicted. The chromosome contains seven rRNA operons and 74 tRNAs, which is within the range observed for most free-living enterobacterial genomes. Several regions of low G+C content indicated the presence of horizontally acquired sequences, and inspection of some of these revealed several prophages in the chromosome.
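Low-G+C regions such as those mentioned above are commonly located by scanning the chromosome in sliding windows and comparing each window with the genome-wide average. The following Python sketch illustrates that idea. The function names are hypothetical, and the window size, step and 5-percentage-point threshold are illustrative assumptions, not the parameters used to annotate this genome.

```python
from typing import Iterator, Tuple


def gc_fraction(seq: str) -> float:
    """Fraction of G+C among unambiguous bases (A, C, G, T); other symbols are ignored."""
    s = seq.upper()
    acgt = sum(s.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (s.count("G") + s.count("C")) / acgt


def low_gc_windows(genome: str, window: int = 5000, step: int = 1000,
                   threshold: float = 0.05) -> Iterator[Tuple[int, int, float]]:
    """Yield (start, end, gc) for windows whose G+C fraction lies at least
    `threshold` below the genome-wide average (defaults are hypothetical)."""
    average = gc_fraction(genome)
    last_start = max(len(genome) - window, 0)
    for start in range(0, last_start + 1, step):
        end = start + window
        gc = gc_fraction(genome[start:end])
        if gc <= average - threshold:
            yield start, end, gc


if __name__ == "__main__":
    # Toy chromosome: a G+C-rich backbone with an A+T-rich "island" in the middle.
    toy = "GC" * 5000 + "AT" * 2500 + "GC" * 5000
    for start, end, gc in low_gc_windows(toy, window=5000, step=5000):
        print(f"low-G+C window {start}-{end}: G+C = {gc:.2f}")
```

In practice, windows flagged this way are only candidates. They still need to be inspected for features such as phage genes, transposases or tRNA-adjacent insertion sites before being called horizontally acquired.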

Plasmids
Earlier reports identified four plasmids in a large group of E. pyrifoliae strains, including strain DSM 12163T [18,19]. The complete sequences of plasmids pEP36 and pEP2.6 from E. pyrifoliae Ep1/96 were recently published [8]. Both plasmids are present in E. pyrifoliae DSM 12163T, with slightly different sizes (35,901 bp and 2,610 bp, respectively). We sequenced and annotated two additional small plasmids of 4,960 bp (pEP5) and 3,070 bp (pEP3).
From the read coverage of the individual plasmids, the copy number of each plasmid in E. pyrifoliae DSM 12163T can be estimated (Table 1). According to this calculation, plasmids pEP36 and pEP5 are present at 5-6 copies per cell, while the small plasmids pEP3 and pEP2.6 have a medium copy number of 18-20 copies per cell. This explains the high plasmid yields reported in earlier studies, whereas plasmid pEA29 of E. amylovora, present at approximately two copies per cell, is more difficult to obtain (T.H.M. Smits and B. Duffy, unpublished data).
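The copy-number estimate rests on a simple ratio. If each cell carries one chromosome, a plasmid's mean read depth divided by the chromosome's mean read depth approximates its copies per cell. A minimal Python sketch follows. The function name is hypothetical, the depth values are invented for illustration (they are not the values in Table 1), and the ratio assumes unbiased library preparation and a single chromosome copy per cell.

```python
from typing import Dict


def plasmid_copy_numbers(chromosome_depth: float,
                         plasmid_depths: Dict[str, float]) -> Dict[str, float]:
    """Estimate copies per cell as plasmid read depth / chromosomal read depth.

    Assumes one chromosome per cell and no coverage bias between replicons.
    """
    if chromosome_depth <= 0:
        raise ValueError("chromosome depth must be positive")
    return {name: depth / chromosome_depth for name, depth in plasmid_depths.items()}


# Hypothetical mean depths (not the measured values for DSM 12163T).
estimates = plasmid_copy_numbers(30.0, {"plasmid_A": 165.0, "plasmid_B": 570.0})
for name, copies in estimates.items():
    print(f"{name}: ~{copies:.1f} copies per cell")
```

Because reads near the replication origin are over-represented in growing cultures, such ratios are best treated as approximate.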
Plasmid pEP3 shares a region (Fig. 2) with the ColE1-type plasmid pEP2.6 [8], with 99.2% sequence identity over approximately 800 bp. This region includes the RNA I modulator gene but not the oriV region. The rest of pEP3 consists largely of a region similar (78% identity) to part of plasmid pEP5 (Fig. 2). pEP5 contains mobCABD genes similar to those on the ColE1-type plasmid pSW100 of the phytopathogenic bacterium Pantoea stewartii subsp. stewartii SW2 [20], indicating that it may be a mobilizable element. The region shared between pEP3 and pEP5 encodes three ORFs (Fig. 2): two hypothetical proteins and Hcp, a putative type VI secretion system (T6SS) effector protein (see below).
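Identity values such as the 99.2% and 78% quoted above are computed over an alignment. The sketch below shows one common definition: identical columns divided by alignment columns, with a gap opposite a residue counted as a mismatch. It assumes the two sequences have already been aligned. Tools differ in the denominator they use, so identity figures from different programs are not always directly comparable; the function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns in which both sequences carry a gap are skipped; a gap opposite
    a residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if not (x == "-" and y == "-")
    ]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


print(f"{percent_identity('ACGT-A', 'ACGTTA'):.1f}%")  # 5 of 6 columns identical
```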

rpoS RNA polymerase sigma factor deletion
During genome assembly, we observed that E. pyrifoliae DSM 12163T has a 140 bp deletion in the rpoS gene (EPYR_03050) compared with the rpoS sequences of E. amylovora OT1 and E. tasmaniensis Et1/99. Because 140 is not a multiple of three, this deletion causes a frameshift and interrupts the coding sequence after RpoS subregion 1.2 [21]. Downstream of the deletion, the remaining rpoS sequence resumes in a different frame, starting from subregion 2.4 [21]. The deletion thus encompasses a highly conserved region that putatively interacts with the core RNA polymerase.
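Whether an indel disrupts the reading frame depends only on its length modulo three. Since 140 = 3 × 46 + 2, the rpoS deletion leaves a residual two-base offset, and every downstream codon is read out of frame. The toy Python sketch below illustrates this with a short invented sequence rather than the actual rpoS sequence; the helper names are hypothetical.

```python
from typing import List


def codons(seq: str) -> List[str]:
    """Split a sequence into complete codons in frame 0."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def delete(seq: str, start: int, length: int) -> str:
    """Return seq with `length` bases removed from 0-based position `start`."""
    return seq[:start] + seq[start + length:]


def is_frameshift(indel_length: int) -> bool:
    """An indel shifts the reading frame unless its length is a multiple of 3."""
    return indel_length % 3 != 0


orf = "ATGAAACCCGGGTTT"
print(is_frameshift(140), 140 % 3)  # the rpoS deletion leaves a 2-base offset
print(codons(delete(orf, 3, 3)))    # in-frame: downstream codons preserved
print(codons(delete(orf, 3, 2)))    # 2-base offset: downstream codons change
```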
Using PCR primers designed in the rpoS region, we confirmed that this deletion is present in E. pyrifoliae DSM 12163T (Fig. 3). However, the rpoS amplicon of E. pyrifoliae CFBP 4174 has the same size as those of E. amylovora strains CFBP 1232T and OR29 and Erwinia billingiae LMG 2613T, indicating that the deletion is specific to E. pyrifoliae DSM 12163T. This mutation may have occurred before the strain was deposited in the culture collection, or it may represent an environmental adaptation in this strain, possibly associated with host-pathogen interactions or ecological fitness [22-24].

Figure 1 Circular representation of the chromosome of E. pyrifoliae DSM 12163T. Circles (from outside to inside): first, scale bar in kb; second and third, predicted coding sequences of the E. pyrifoliae DSM 12163T chromosome on the leading and lagging strand, respectively (colors according to COGs); fourth, coding sequences of the hrp/hrc T3SS (red), the inv/spa T3SS (green), the T6SS clusters (blue), the flagellar genomic island (yellow), dispersed flagellar gene clusters (purple) and the EPS biosynthetic cluster (orange); fifth, G+C content; sixth, G+C skew.
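The G+C skew shown in the innermost circle of Figure 1 is conventionally calculated per window as (G - C)/(G + C). In many bacterial chromosomes the cumulative skew reaches its minimum near the origin of replication and its maximum near the terminus, which is why it is often plotted alongside genome maps. The Python sketch below is a generic illustration with an assumed window size and hypothetical function names, not the software used to draw Figure 1.

```python
from typing import List


def gc_skew(seq: str) -> float:
    """(G - C) / (G + C) for one window; 0.0 if the window has no G or C."""
    s = seq.upper()
    g, c = s.count("G"), s.count("C")
    return 0.0 if g + c == 0 else (g - c) / (g + c)


def cumulative_gc_skew(genome: str, window: int) -> List[float]:
    """Running sum of per-window G+C skew along the sequence."""
    values: List[float] = []
    total = 0.0
    for start in range(0, len(genome), window):
        total += gc_skew(genome[start:start + window])
        values.append(total)
    return values


toy = "CCCC" + "GGGG" + "GGGG" + "CCCC"
cumulative = cumulative_gc_skew(toy, window=4)
print(cumulative)  # [-1.0, 0.0, 1.0, 0.0]
print("minimum at window", cumulative.index(min(cumulative)),
      "maximum at window", cumulative.index(max(cumulative)))
```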

Type III secretion system operons
Type III secretion systems (T3SSs) are widely distributed among proteobacterial pathogens of plants, animals, and humans, and constitute a fundamental virulence determinant [25]. In other bacteria, T3SSs have been found to be critical for establishing non-pathogenic relationships with plant [26,27] and insect [28] hosts. Mutation or absence of T3SS genes impairs or abolishes bacterial-host interactions [29]. Generally, the genes encoding the T3SS machinery are conserved among bacterial species, but each system includes specific effectors targeting its respective hosts. Proteins secreted by the Hrp complex of E. amylovora and other plant pathogenic bacteria are required for virulence on susceptible host plants and for elicitation of a hypersensitive response in resistant or non-host plants [25,30]. The genome of E. pyrifoliae DSM 12163T contains two distinct T3SSs. One is closely related to the hrp/dsp cluster of E. amylovora, while the other is more similar to the inv/spa cluster of Salmonella typhimurium [31]. The hypersensitive response and pathogenicity (hrp) region of E. pyrifoliae DSM 12163T is composed of 25 genes (EPYR_03319-EPYR_03343) organized in four operons encoding the T3SS machinery. The number and arrangement of these genes are similar to those of the known hrp pathogenicity island of E. amylovora [32], except for the absence of ORFU1 and ORFU2, which are located between the hrpA and hrpS genes in E. amylovora. Both ORFU1 and ORFU2 are also absent from the genome of E. tasmaniensis Et1/99 [16].
The genomic arrangement of the hrp region in E. pyrifoliae DSM 12163T is identical to the T3SS arrangement found in E. pyrifoliae strain WT3 [33]. In E. pyrifoliae DSM 12163T, the hrp region is bordered by the Hrp effectors and elicitors (HEE, EPYR_03311-EPYR_03318) region and the Hrp-associated enzymes (HAE, EPYR_03344-EPYR_03349) region. Both regions are organized as in E. amylovora [12], whereas the HAE region is missing from the genome of the non-pathogenic E. tasmaniensis Et1/99 [16]. The HEE region includes the harpin genes hrpN and hrpW; dspA/E, which encodes a secreted effector essential for E. amylovora virulence [34]; and the chaperone genes orfA, orfB, orfC, and dspB/F. The HAE region includes the hrp-associated systemic virulence genes (hsvABC), hrpK, which encodes a putative type III translocator [35], and two genes encoding proteins of currently unknown function.
Genes of an inv/spa system are organized in a single gene cluster (EPYR_02139-EPYR_02160) in the genome of E. pyrifoliae DSM 12163T (Fig. 4). This cluster is similar to those present in E. amylovora Ea273 [36], E. tasmaniensis Et1/99 [16], and the insect endosymbiont Sodalis glossinidius str. morsitans [28]. The highest sequence and structural similarity is to the PAI-3 system of E. amylovora Ea273 [36], in which all the genes are likewise condensed into a single cluster. In E. tasmaniensis Et1/99, the inv/spa system is composed of two separate clusters that complement each other to form a complete injection apparatus similar to that encoded by Salmonella typhimurium pathogenicity island 1 [16]. Compared with E. amylovora Ea273 [36], only the hypothetical orf43 is absent from E. pyrifoliae DSM 12163T. The inv/spa system of E. pyrifoliae DSM 12163T includes genes with no currently known function in plant pathogenicity or plant associations, which are more closely related to T3SSs of endosymbionts and animal pathogens. We propose that E. pyrifoliae DSM 12163T may use this second T3SS to facilitate associations with insect vectors.

Type VI secretion system
T6SS gene clusters have recently been found to be widespread among pathogenic and non-pathogenic Gram-negative bacteria [37]. Bacteria living in close association with eukaryotic cells are thought to use the T6SS to maintain pathogenic or symbiotic interactions with their hosts. T6SS gene clusters consist of a set of 14 currently recognized core genes together with conserved genes whose composition varies between species [38]. Two putative effector proteins of the human pathogen Vibrio cholerae, Hcp (haemolysin co-regulated protein) and Vgr (Val-Gly repeats), are secreted by a T6SS [39]. In the plant pathogen Pectobacterium atrosepticum SCRI1043 (syn. Erwinia carotovora subsp. atroseptica), the vgr and hcp genes are up-regulated when the pathogen is grown in the presence of potato extract, suggesting a potential role in virulence [40]. Two mutants, each defective in a gene of the T6SS gene cluster, showed reduced virulence compared with wild-type P. atrosepticum in potato tuber and stem virulence bioassays [41]. In the pea symbiont Rhizobium leguminosarum, a locus linked to the T6SS was found to negatively affect pea nodulation [42].
The T6SS gene cluster of E. pyrifoliae DSM 12163T is composed of 30 genes (EPYR_00645-EPYR_00674; Fig. 5A). Of these, 15 were identified as core genes, four as genes conserved between species, two as putative signal transducers, and the remaining nine as hypothetical genes. Conserved blocks of genes were observed among the different bacterial species, interspersed with hypothetical genes. E. pyrifoliae DSM 12163T and E. tasmaniensis Et1/99 have closely related gene organizations, with two exceptions: the genes between hcp and COG3456 differ (EPYR_00656-EPYR_00659 and ETA_06210-06250, respectively), and VgrG2 of E. pyrifoliae DSM 12163T is more closely related to VgrG1 of E. tasmaniensis Et1/99. The genes EPYR_00656-EPYR_00659 have a lower G+C content than the chromosomal backbone, suggesting that they were acquired from a foreign source. The genes downstream of vgrG2 in E. pyrifoliae DSM 12163T are homologous to the genes downstream of vgrG1 in E. tasmaniensis Et1/99 (EPYR_00676, EPYR_00677 and EPYR_00679, and ETA_06380-06400, respectively).

Figure 4 Comparison of inv/spa-type type III secretion systems in E. pyrifoliae DSM 12163T and E. tasmaniensis Et1/99. For the comparison, the annotation of the E. tasmaniensis Et1/99 clusters was checked manually. Previously designated pseudogenes were shown to have close orthologs in E. pyrifoliae DSM 12163T, and genes reported as "missing" in E. tasmaniensis Et1/99 [16] could be found. Blocks of related genes are shaded grey. Putative T3SS core genes are colored green, with low-homology genes in light green. Effectors are colored red, regulatory genes black and chaperones blue. Genes with no homology are in white.
The Hcp encoded in the T6SS cluster of E. pyrifoliae DSM 12163T is 91.9% identical to its ortholog in E. tasmaniensis Et1/99. In addition to this chromosomal hcp gene, E. pyrifoliae DSM 12163T possesses two further hcp genes, on plasmids pEP3 and pEP5 (Fig. 2). The plasmid-encoded Hcp proteins are more closely related to each other (82.5% identity) than to the Hcp proteins of the large chromosomal T6SS clusters. The related enterobacterium Serratia proteamaculans 568 has a similar T6SS gene organization (Fig. 5A) but lacks the additional genes downstream of hcp1, carries a second hcp gene (hcp2), and has only one vgrG gene. S. proteamaculans Hcp1 is more closely related to the Hcp proteins of E. pyrifoliae DSM 12163T and E. tasmaniensis Et1/99 than to Hcp2. The T6SS gene cluster of P. atrosepticum SCRI1043 differs completely in organization from that of E. pyrifoliae DSM 12163T [37,43].
A second set of putative T6SS-related genes was found in the genomes of E. pyrifoliae DSM 12163T (EPYR_02109-EPYR_02113) and E. tasmaniensis Et1/99 (ETA_18657-ETA_18659) (Fig. 5B); these genes are absent from S. proteamaculans 568. In E. pyrifoliae DSM 12163T, three putative core genes (EPYR_02109, EPYR_02110/02111, EPYR_02113) and one conserved gene (EPYR_02112) were found, while in E. tasmaniensis Et1/99 one of these core genes is absent. Genes in this second set show low homology to each other and to the large T6SS clusters. Within this cluster of E. pyrifoliae DSM 12163T, a frameshift was identified in the COG3523 gene (EPYR_02110/02111) that leads to a premature stop codon. This suggests that the gene cluster is inactive and may be subject to further mutations and deletions.
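The consequence of a frameshift such as the one in EPYR_02110/02111 can be checked by scanning the coding sequence codon by codon for the first stop codon in the original frame. The Python sketch below uses the standard bacterial stop codons TAA, TAG and TGA and an invented toy sequence. It is an illustration with a hypothetical function name, not the pipeline used in this study.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop(cds: str, frame: int = 0) -> Optional[int]:
    """Return the 0-based position of the first stop codon in `frame`, or None."""
    s = cds.upper()
    for i in range(frame, len(s) - 2, 3):
        if s[i:i + 3] in STOP_CODONS:
            return i
    return None


reference = "ATGCTGAAACCCGGGTAA"        # toy ORF with its stop codon at the 3' end
mutant = reference[:3] + reference[4:]  # delete one base after the start codon
print(first_stop(reference))  # 15: full-length product
print(first_stop(mutant))     # 3: premature stop right after ATG
```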

Flagellar genes
Two sets of genes encoding flagellar assembly and chemotaxis-related proteins were found in the genome of E. pyrifoliae DSM 12163T. One set is tightly clustered (EPYR_00976-EPYR_01028), and the encoded proteins show higher identity with the corresponding proteins of Salmonella and Escherichia spp. than with those of E. tasmaniensis Et1/99 [16]. This suggests that the entire region was acquired as a genomic island via horizontal gene transfer. Only the right boundary of the genomic island is discernible in E. pyrifoliae DSM 12163T. It appears as a region of reduced G+C content bordered by an IS2-type integrase (EPYR_01040) flanking smpB (EPYR_01041), a gene also found in the E. tasmaniensis Et1/99 genome (ETA_09720). The left border of the genomic island does not display distinctive insertion features. The first upstream gene with high homology to E. tasmaniensis Et1/99 is tsx (EPYR_00974), which corresponds to the E. tasmaniensis Et1/99 gene encoding a nucleoside-specific channel-forming protein (ETA_09440). Notably, in E. tasmaniensis Et1/99 the region between ETA_09440 and ETA_09720 contains the quorum-sensing genes expRI, which are absent from E. pyrifoliae DSM 12163T.
A second complete set of flagellar genes is present but split among different clusters in the genome of E. pyrifoliae DSM 12163T (EPYR_01609-EPYR_01614, EPYR_01651-EPYR_01668, EPYR_02267-EPYR_02282, EPYR_02322-EPYR_02336). This second set closely resembles the flagellar genes of E. tasmaniensis Et1/99 and appears to be ancestral in E. pyrifoliae DSM 12163T. Compared with the acquired flagellar assembly apparatus, the native gene clusters contain two extra genes (EPYR_01615 and EPYR_02267). These encode FliZ, a putative alternative sigma factor regulatory protein, and the putative chemotaxis signal transduction protein CheV, respectively.

Sorbitol metabolism
Metabolism of sorbitol has been described as a contributing virulence factor of E. amylovora [44] on pome fruit trees, which use sorbitol as a primary carbon storage compound [45]. The genome of E. pyrifoliae DSM 12163T contains a complete sorbitol gene cluster (EPYR_00602-EPYR_00606), and the organism can use sorbitol as a sole carbon source [19]. The non-pathogenic E. tasmaniensis Et1/99 lacks a sorbitol gene cluster [16] and may thus be disadvantaged in competitive interactions in sorbitol-rich environments. The exact role of sorbitol utilization in virulence is unresolved, since sorbitol content in apple trees has no specific effect on fire blight severity [46]. Associations between sorbitol and fire blight [47] rather suggest an osmotic-potential effect that could be due to various carbon compounds. Transgenic apple trees with down-regulated sorbitol synthesis compensate by producing more of other sugars (i.e., sucrose and glucose) [45]. Both of these sugars are also substrates for E. pyrifoliae and E. amylovora, which carry the respective metabolic genes.
In E. amylovora, mutation of wceB, one of the exopolysaccharide (EPS) biosynthesis genes, impaired EPS biosynthesis and reduced virulence on apple [48]. The EPS biosynthetic gene cluster of E. pyrifoliae DSM 12163T (EPYR_01479-EPYR_01490) is more closely related to the corresponding cluster of E. amylovora Ea7/74 [49] than to that of E. tasmaniensis Et1/99 [16]. The latter has different glycosyltransferases in the central part of the cluster (WceNM; Fig. 6) and is more similar to the stewartan biosynthetic cluster of P. stewartii subsp. stewartii DC283 [50]. Overall, the proteins encoded by the E. pyrifoliae DSM 12163T EPS cluster are 87.7-96.3% identical to the corresponding proteins of E. amylovora Ea7/74. By contrast, identities to E. tasmaniensis Et1/99 proteins range from 68.2 to 92.8%, with no match for WceN and WceM (Fig. 6). The E. pyrifoliae amylovoran genes are therefore expected to play a role in virulence similar to that demonstrated in E. amylovora.
Levan, a second EPS produced by E. amylovora strains, is also reported to contribute to virulence [51]. E. pyrifoliae DSM 12163T lacks levan biosynthetic genes and does not produce levan. Levan production has also been reported to be lacking in the related fire blight Erwinia sp. from Japan [9]. We propose that the lack of levan may contribute to the restricted host-range of these species compared with E. amylovora, although in vitro levan biosynthesis has been reported for the non-pathogenic E. tasmaniensis Et1/99.

Quorum-sensing apparatus
Quorum-sensing (QS) refers to the ability of bacteria to regulate gene expression in accordance with the presence of extracellular signalling molecules, or autoinducers (AI), that are produced in a cell density-dependent manner [52]. Two principal QS systems are known in Gram-negative bacteria and are defined by the chemical nature of the AI involved [53].
The AI-1 system uses N-acyl homoserine lactones (AHLs), produced by the LuxI family of proteins, as signal molecules. This system was first described in the marine bacterium Vibrio fischeri [54], where it controls bioluminescence. Above a threshold concentration, the secreted AHL binds the LuxR receptor, and the resulting complex activates transcription of target genes by binding a specific promoter site (the lux box). Many AHLs have since been described in a wide variety of Gram-negative pathogenic and non-pathogenic bacteria. They differ principally in the N-acyl side chain, and each AHL signal is controlled by its own pair of luxRI gene orthologs [55].
In E. pyrifoliae, the proteins encoded by one gene set (EPYR_02385-EPYR_02386) have low sequence identity with the autoinducer synthase and receptor proteins PhzI/PhzR of Pseudomonas chlororaphis (22% and 26%, respectively) and ExpI/ExpR of P. atrosepticum SCRI1043 (17% and 22%, respectively). However, EPYR_02386 also shows 45% identity with the DNA-binding transcriptional activator SdiA, which regulates cell division in Escherichia coli and S. typhimurium [56] and is known to respond to AHLs produced by other microbial species. Neither E. coli nor S. typhimurium carries a cognate gene encoding a signal-generating enzyme for SdiA [57]. Thus, an SdiA-like protein may play a similar role in E. pyrifoliae and other Erwinia species.
Given the very low identity of EPYR_02385 to known AHL synthases, and the failure of E. pyrifoliae DSM 12163T to give a positive reaction with the AHL biosensors Chromobacterium violaceum CV026 and Agrobacterium tumefaciens NTL4/pZLR4 (data not shown), we hypothesize that the gene pair EPYR_02385/EPYR_02386 contributes to the production and detection of an as yet unknown extracellular signal not obviously related to known QS signals.
A second QS system, based on production of the AI-2 signal molecule by the LuxS protein, is also widespread among Gram-negative and Gram-positive bacteria [58] and is putatively involved in cross-species bacterial communication [59]. However, LuxS, an S-ribosylhomocysteine lyase, is also a central component of the activated methyl cycle (AMC). This metabolic cycle recycles methionine and generates the major methyl donor S-adenosyl-L-methionine (SAM) in bacterial cells [60,61]. Whether an AI-2 system is present in a given species should therefore be judged not only on the presence of luxS but also on the presence and functionality of genes encoding AI-2 receptors [61]. E. pyrifoliae DSM 12163T has a functional luxS gene (EPYR_03027) but lacks orthologs of both known AI-2 receptors: the LuxPQ receptor of Vibrio harveyi [62] and the Lsr ABC transporter of Salmonella typhimurium [63]. Thus, as in E. amylovora Ea273 [64], which also carries luxS but lacks AI-2 receptors, this gene probably has a metabolic rather than a QS role in E. pyrifoliae.
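The reasoning above amounts to a simple decision rule: luxS alone is not evidence for AI-2 quorum sensing, because LuxS also functions in the activated methyl cycle. A minimal Python sketch of that rule follows. The class and function names are hypothetical, and the rule encodes only the presence or absence of genes, not their functionality, which would need experimental confirmation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ai2Genes:
    """Presence/absence of genes relevant to AI-2 signalling in a genome."""
    lux_s: bool    # AI-2 synthase, also part of the activated methyl cycle
    lux_pq: bool   # LuxPQ-type receptor (Vibrio harveyi)
    lsr_abc: bool  # Lsr ABC-transporter-type receptor (Salmonella typhimurium)


def likely_luxs_role(genes: Ai2Genes) -> str:
    """Classify the probable role of luxS from gene content alone."""
    if not genes.lux_s:
        return "no luxS"
    if genes.lux_pq or genes.lsr_abc:
        return "possible AI-2 quorum sensing"
    return "likely metabolic (activated methyl cycle)"


# Gene content reported above for E. pyrifoliae DSM 12163T.
print(likely_luxs_role(Ai2Genes(lux_s=True, lux_pq=False, lsr_abc=False)))
```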

Iron uptake determinants
Iron is an essential nutrient for plant pathogenic bacteria, which meet this need with high-affinity iron acquisition systems [65]. Iron-dependent regulation of a wide range of genes is coordinated by Fur (encoded by EPYR_02663), which specifically regulates the biosynthesis and uptake of siderophores [66]. The genome of E. pyrifoliae DSM 12163T contains four TonB-dependent receptor genes, one of which encodes a putative copper receptor (oprC; EPYR_01920).
Another TonB-dependent receptor gene (EPYR_03492) encodes an ortholog of the E. amylovora ferrioxamine receptor FoxR [67], which is directly involved in the uptake of ferrioxamines into the periplasm. In E. amylovora, foxR mutants retain the ability to synthesize desferrioxamine E but can no longer use it as an iron source. They are also impaired in growth under iron-limited conditions such as those found on flower surfaces, have reduced virulence, and are less resistant to plant defence responses [67,68]. Adjacent to foxR, E. pyrifoliae also has a biosynthetic gene cluster, dfoJAC (EPYR_03489-EPYR_03491), containing an ortholog of dfoA, one of the desferrioxamine biosynthetic genes of E. amylovora CFBP1430 [67]. This indicates that E. pyrifoliae DSM 12163T synthesizes desferrioxamine E and has an iron-acquisition system similar to that of E. amylovora [69]. E. pyrifoliae DSM 12163T also has a ferrihydroxamate TonB-dependent receptor and ABC transport system (fhuACDB; EPYR_00903-EPYR_00906), which may contribute to the uptake of ferrihydroxamate siderophores (Stockwell et al. 2008).
A fourth TonB-dependent receptor (EPYR_01969) in E. pyrifoliae DSM 12163T is predicted to belong to Transporter Classification Database (TCDB; http://www.tcdb.org/) group TC 1.B.14.2.-, which consists primarily of heme and porphyrin uptake systems. A homolog of the TonB-dependent receptor gene yncD (EPYR_02731-EPYR_02732) is also present in the E. pyrifoliae DSM 12163T genome but is inactivated by frameshifts. The ent-fep-fec gene cluster, which encodes enterobactin biosynthesis and transport in many Enterobacteriaceae, is absent from the genome of E. pyrifoliae DSM 12163T, as has been observed for E. tasmaniensis Et1/99 [16].

Comparative genomics with the non-pathogenic Erwinia tasmaniensis Et1/99
The chromosome of E. tasmaniensis Et1/99 is 3,883,467 bp [16], compared with the 4,026,286 bp chromosome of E. pyrifoliae DSM 12163T. The complete genomes of the two related bacteria were compared using a Mauve progressive alignment [70], and large gaps were examined against the annotation (Fig. 7). Apart from the plasmids, the two genomes are essentially collinear, with only a few genomic blocks rearranged.
The difference in chromosome size between E. pyrifoliae DSM 12163T and E. tasmaniensis Et1/99 (about 143 kb) is largely due to the insertion of mobile genetic elements in E. pyrifoliae DSM 12163T. At least three regions contain phage genes, indicating insertion of prophages or their remnants. Additionally, a large number of genomic islands have inserted into the genome of E. pyrifoliae DSM 12163T, as indicated by the large number of possibly active or inactivated transposases identified. Of the remaining regions, two encode restriction systems and one carries a complete additional set of flagellar genes (EPYR_00976-EPYR_01028; see above). Conversely, E. tasmaniensis Et1/99 contains a large number of fimbrial gene clusters spread across the chromosome that are lacking in E. pyrifoliae DSM 12163T. In addition, the nas/nir cluster is present in E. tasmaniensis but absent from E. pyrifoliae DSM 12163T, which explains the inability of E. pyrifoliae to generate nitrite from nitrate [7,19]. The chromosome of E. tasmaniensis Et1/99 encodes part of the central region of E. pyrifoliae plasmid pEP36, carrying the thiOSGF and betB genes, but not the entire plasmid. E. tasmaniensis Et1/99 also carries genes for the ferric citrate uptake system fecRABCDE and for achromobactin uptake, all of which are lacking in E. pyrifoliae DSM 12163T. The alignment (Fig. 7) also showed that E. pyrifoliae DSM 12163T contains the same set of T6SS genes as E. tasmaniensis Et1/99; these genes had not been reported previously. The incomplete inv/spa T3SS cluster located close to the mutS gene in E. tasmaniensis Et1/99 (Fig. 4) is absent from E. pyrifoliae DSM 12163T.
Overall, the large-scale analysis of genomic differences between E. pyrifoliae DSM 12163T and E. tasmaniensis Et1/99 does not clearly reveal why E. pyrifoliae, and not E. tasmaniensis, is pathogenic. Mutational analysis of the differing genomic features will be needed to determine their roles in pathogenicity. Our results also suggest that comparing the genome sequences of E. pyrifoliae and E. amylovora, and of different strains within each species, could reveal host-range determinants.

Conclusions
Compared with E. amylovora, the genome of E. pyrifoliae DSM 12163T encodes many of the same virulence factors, including two T3SSs, sorbitol metabolism, exopolysaccharides, and desferrioxamine biosynthesis. However, E. pyrifoliae lacks levan production and a third T3SS cluster. Whether these absences contribute to the reduced host-range of this pathogen remains to be elucidated. Comparison with the genome of the non-pathogenic E. tasmaniensis Et1/99 revealed an additional flagellar gene cluster, the absence of AHL biosynthesis genes, and a modified range of iron acquisition systems, which may play a role in pathogenicity. More of the species identified with fire blight-like symptoms or as epiphytes on Rosaceae [9,13,18,36,71-77] will be sequenced and compared on a whole-genome scale. As this happens, further clues to the evolution and origin of necrotrophic Erwinia, and insights into host-pathogen interactions, can be anticipated.

Whole-genome sequencing
Genomic DNA was isolated using the Wizard Genomic DNA Purification Kit (Promega, Madison, WI, USA).
Whole-genome sequencing was performed by GATC (Konstanz, Germany) using both a 454 GS-FLX and a Solexa sequencer.
From a single 454 GS-FLX run, a total of 533,966 high-quality filtered sequence reads with an average read length of 232 bp were generated (Table 2), equivalent to approximately 31-fold coverage. Quality-filtered sequences from whole-genome shotgun sequencing were assembled using the 454 Newbler assembler. In total, 727 contigs were generated, of which 171 were larger than 500 bp; the average size of these large contigs was 5,480 bp. Reassembling the individual data sets from each region of the PicoTiterPlate (Roche) produced fewer contigs that were longer on average. Co-assembly of all contig sets allowed bad or contaminant sequences, which made up the majority of the small contigs, to be excluded, on the assumption that genuine contigs should be covered by the assemblies from all data sets.
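Sequencing depth of this kind is usually estimated as total sequenced bases divided by genome length, which gives the mean depth in the Lander-Waterman sense. The Python sketch below applies that ratio to the 454 read statistics given above. It assumes a genome length equal to the chromosome plus the four plasmids (4,072,827 bp); with these inputs it gives about 30.4×, close to the reported 31×, and the exact value depends on which reads and which genome length are used. The function name is hypothetical.

```python
def mean_depth(num_reads: int, mean_read_length: float, genome_length: int) -> float:
    """Mean sequencing depth = total sequenced bases / genome length."""
    if genome_length <= 0:
        raise ValueError("genome length must be positive")
    return num_reads * mean_read_length / genome_length


# Assumed genome length: chromosome plus pEP36, pEP2.6, pEP5 and pEP3.
GENOME_LENGTH = 4_026_286 + 35_901 + 2_610 + 4_960 + 3_070  # 4,072,827 bp

depth_454 = mean_depth(533_966, 232, GENOME_LENGTH)
print(f"454 GS-FLX: ~{depth_454:.1f}x")
```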
A total of 4,835,468 high-quality filtered sequence reads with an average read length of 36 bp were generated by Solexa sequencing (Table 2), corresponding to an average coverage of 43.5×. The reads were assembled using the E. pyrifoliae Ep1/96 chromosome and plasmids pEP36 and pEP2.6 as references. All reads were additionally assembled de novo using the program EDENA [78] with manually optimized settings. This yielded 3,162 contigs (N50 = 22,267; mean length = 1,206; maximum length = 12,612; minimum length = 100). The assembled contigs were aligned to the reference sequence using the SeqMan program of the LASERGENE package (DNASTAR, Madison, WI, USA).
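N50 summarizes assembly contiguity: sort the contigs from longest to shortest and report the length at which the running total first reaches half of all assembled bases. A minimal Python sketch of that definition (not EDENA's own code):

```python
def n50(lengths: list[int]) -> int:
    """Return the N50: the contig length L such that contigs of length >= L
    together contain at least half of all assembled bases."""
    if not lengths:
        raise ValueError("no contigs")
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    raise AssertionError("unreachable: the full sum always reaches half")


print(n50([2, 3, 4, 5, 6]))  # prints 5
```

For the toy lengths above (20 bases in total), the running sum 6 + 5 = 11 is the first to reach 10, so the N50 is 5. By construction the N50 is always one of the contig lengths and therefore never exceeds the longest contig.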

Assembly
Assembly of the E. pyrifoliae DSM 12163T genome sequence was done in two steps. For the chromosome, GenBank accession FP236842 was used as the reference. The sequences of plasmids pEP36 and pEP2.6 (AY123045, AY123048) had been published previously [8] and were used as templates for assembling these plasmids. In the first step, the longer pyrosequencing reads were mapped to the available references using Roche's gsMapper application. Where both sequence coverage and quality indicated, with high confidence, a difference between the two strains, the reference sequence was modified to reflect this change. Afterwards, both the 454 reads and the shorter Solexa reads were mapped to the modified sequences using several custom scripts. In most cases, the Solexa reads supported the information from the 454 sequences, and in some cases the Solexa data could be used to close gaps in the references where no 454 data were available. Based on the mapping information from both the 454 and Solexa reads, contigs were created wherever coverage was sufficient and at least 75% of all mapped reads agreed on the base at a given position. Where no consensus base could be determined, an 'N' was inserted.
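The consensus rule can be pictured as a per-position vote over the mapped reads. In this hypothetical Python sketch, `min_fraction=0.75` corresponds to the 75% agreement threshold, while `min_depth` is an assumed stand-in for the unspecified "sufficient" coverage; for simplicity, low-coverage positions are also reported as 'N' here instead of ending a contig. The custom scripts actually used are not described, so this is only an illustration.

```python
from collections import Counter


def consensus_base(column: list[str], min_depth: int = 3, min_fraction: float = 0.75) -> str:
    """Majority base for one pileup column, or 'N' if support is insufficient.

    min_fraction mirrors the 75% agreement rule; min_depth is a hypothetical
    placeholder for the minimum coverage (not given in the text).
    """
    depth = len(column)
    if depth < min_depth:
        return "N"
    base, count = Counter(column).most_common(1)[0]
    return base if count / depth >= min_fraction else "N"


def call_consensus(pileup: list[list[str]], **kwargs) -> str:
    """Apply consensus_base to each reference position of a pileup."""
    return "".join(consensus_base(column, **kwargs) for column in pileup)


# Toy pileup: unanimous A, 3 of 4 reads say C, and a position with only 2 reads.
print(call_consensus([list("AAAA"), list("CCCG"), list("GT")]))  # prints "ACN"
```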
The previously assembled contigs were also aligned to the reference sequence using SeqMan. Most contigs aligned to the template; the non-aligning contigs mainly resulted from misassembly at an earlier stage and could be resolved by realignment. The remaining contigs were compared with the template using BlastN, and two contigs that were not covered by any part of the template were extended by several rounds of assembly against the 454 reads until they were closed. This assembly was then confirmed by assembly with the Solexa reads and by shifting the zero-point of the newly assembled plasmid sequences (pEP5 and pEP3).
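Shifting the zero-point works because a circular sequence can be linearized at any position: after rotation, the former end-to-start junction lies in the middle of the sequence, where reads can be mapped across it. The Python sketch below illustrates the idea with exact substring matching on toy data; it assumes that reads are on the same strand as the sequence and is not a substitute for a real assembler or read mapper.

```python
def rotate(seq: str, offset: int) -> str:
    """Shift the zero-point (origin) of a circular sequence by `offset` bases."""
    offset %= len(seq)
    return seq[offset:] + seq[:offset]


def junction_supported(seq: str, reads: list[str], flank: int) -> bool:
    """True if some read spans the end-to-start junction of the circular
    sequence with `flank` bases on each side (same-strand reads assumed)."""
    junction = seq[-flank:] + seq[:flank]
    return any(junction in read for read in reads)


plasmid = "AAACCCGGGTTT"  # toy sequence, not real plasmid data
print(rotate(plasmid, len(plasmid) // 2))  # prints "GGGTTTAAACCC"
print(junction_supported(plasmid, ["GTTTAAAC"], flank=2))  # prints "True"
```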

Genome annotation
Genes were predicted using a combined strategy [79] based on the CDS prediction programs Glimmer [80] and Critica [81]. Subsequently, a putative function was automatically assigned to each predicted gene using the GenDB annotation pipeline [17]. The resulting genome annotation was curated manually, and metabolic pathways were identified using the KEGG pathway [82] tool in GenDB.
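The details of the combined Glimmer/Critica strategy are given in [79], not here. As a purely hypothetical illustration of how the outputs of two gene finders can be reconciled, the Python sketch below treats predictions on the same strand that share a stop codon as the same gene (they differ only in the chosen start) and keeps the longest one. This merge rule is an assumption made for illustration, not the published procedure.

```python
from typing import NamedTuple


class CDS(NamedTuple):
    strand: str  # '+' or '-'
    start: int   # leftmost genomic coordinate (1-based)
    end: int     # rightmost genomic coordinate (1-based, inclusive)

    @property
    def stop(self) -> int:
        """Coordinate of the stop-codon end: right end on '+', left end on '-'."""
        return self.end if self.strand == "+" else self.start

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def merge_predictions(*prediction_sets: list[CDS]) -> list[CDS]:
    """Union of several gene-finder outputs (hypothetical merge rule).

    Predictions with the same strand and stop coordinate are treated as one
    gene; the longest of them (the most upstream start) is kept.
    """
    best: dict[tuple[str, int], CDS] = {}
    for predictions in prediction_sets:
        for cds in predictions:
            key = (cds.strand, cds.stop)
            current = best.get(key)
            if current is None or cds.length > current.length:
                best[key] = cds
    return sorted(best.values(), key=lambda c: (c.start, c.end))


critica = [CDS("+", 100, 400)]                     # toy predictions
glimmer = [CDS("+", 70, 400), CDS("-", 500, 800)]  # toy predictions
print(merge_predictions(critica, glimmer))
# [CDS(strand='+', start=70, end=400), CDS(strand='-', start=500, end=800)]
```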

Software
Routine sequence manipulations were done using several programs of the LASERGENE package. Whole-genome comparisons were done using the progressive alignment option of the Mauve comparison software (Version 2.0 [70]).
For the detailed analysis of the E. pyrifoliae DSM 12163T genome sequence, the Transporter Classification Database (TCDB; http://www.tcdb.org/) was used to improve the automated annotation of transporters. For comparative classification of the T3SS effectors, the Pseudomonas syringae Hop database (http://pseudomonas-syringae.org/Hop_database.xls) was compared by BLAST at the protein level against the annotated proteins of the E. pyrifoliae DSM 12163T genome.
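A protein-level comparison like the Hop database search typically ends in a best-hit table. The Python sketch below assumes that BLASTP was run with the standard tabular output (`-outfmt 6`, the 12 default columns) using the Hop proteins as queries, and keeps the highest-scoring E. pyrifoliae protein for each Hop query. The E-value cut-off and the rows in the usage example, including the locus tags, are made up for illustration.

```python
def best_hits(tabular: str, max_evalue: float = 1e-10) -> dict[str, tuple[str, float, float]]:
    """Keep the highest-bitscore subject per query from BLAST tabular output
    (-outfmt 6: qseqid sseqid pident length mismatch gapopen qstart qend
    sstart send evalue bitscore), discarding hits above max_evalue.
    The default cut-off is a hypothetical choice."""
    best: dict[str, tuple[str, float, float]] = {}
    for line in tabular.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Made-up rows; the locus tags are hypothetical.
sample = "\n".join([
    "HopA1\tEPYR_00001\t45.2\t300\t150\t5\t1\t300\t1\t298\t1e-50\t250.0",
    "HopA1\tEPYR_00002\t30.1\t200\t130\t8\t1\t200\t5\t204\t1e-20\t120.0",
    "HopX1\tEPYR_00003\t25.0\t80\t60\t2\t1\t80\t1\t80\t0.01\t30.0",
])
print(best_hits(sample))  # {'HopA1': ('EPYR_00001', 1e-50, 250.0)}
```

Choosing the best hit by bit score rather than E-value keeps the ranking independent of database size.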
Authors' contributions
THMS and SJ conducted the genome assembly. THMS performed the annotation, comparative genomics and wrote the manuscript. SJ, AG, FR and TK participated in the genome annotation, pathway identification and in writing the manuscript. JEF contributed to the project conception. BD conceived of and supervised the project and participated in writing the manuscript. All authors read and approved the final manuscript.