An improved, high-quality draft genome sequence of the Germination-Arrest Factor-producing Pseudomonas fluorescens WH6

Background Pseudomonas fluorescens is a genetically and physiologically diverse species of bacteria present in many habitats and in association with plants. This species of bacteria produces a large array of secondary metabolites with potential as natural products. P. fluorescens isolate WH6 produces Germination-Arrest Factor (GAF), a predicted small peptide or amino acid analog with herbicidal activity that specifically inhibits germination of seeds of graminaceous species. Results We used a hybrid next-generation sequencing approach to develop a high-quality draft genome sequence for P. fluorescens WH6. We employed automated, manual, and experimental methods to further improve the draft genome sequence. From this assembly of 6.27 megabases, we predicted 5876 genes, of which 3115 were core to P. fluorescens and 1567 were unique to WH6. Comparative genomic studies of WH6 revealed high similarity in synteny and orthology of genes with P. fluorescens SBW25. A phylogenomic study also placed WH6 in the same lineage as SBW25. In a previous non-saturating mutagenesis screen we identified two genes necessary for GAF activity in WH6. Mapping of their flanking sequences revealed genes that encode a candidate anti-sigma factor and an aminotransferase. Finally, we discovered several candidate virulence and host-association mechanisms, one of which appears to be a complete type III secretion system. Conclusions The improved high-quality draft genome sequence of WH6 contributes towards resolving the P. fluorescens species, providing additional impetus for establishing two separate lineages in P. fluorescens. Despite the high levels of orthology and synteny to SBW25, WH6 still had a substantial number of unique genes and represents another source for the discovery of genes with implications in affecting plant growth and health. Two genes are demonstrably necessary for GAF and further characterization of their proteins is important for developing natural products as control measure against grassy weeds. Finally, WH6 is the first isolate of P. fluorescens reported to encode a complete T3SS. This gives us the opportunity to explore the role of what has traditionally been thought of as a virulence mechanism for non-pathogenic interactions with plants.


Background
Pseudomonas fluorescens is a diverse species of bacteria that is found throughout natural habitats and associated with plants. Contributing to their diverse lifestyles is their ability to produce an equally diverse array of secondary metabolites that affect interactions with hosts and other inhabitants of their ecosystems. Some isolates benefit plants by producing growth promoting hormones or antimicrobial compounds to control against pathogens [1]. Others are deleterious and have the capacity to synthesize and secrete novel compounds that negatively affect growth of plants [2][3][4].
The physiological diversity of P. fluorescens is mirrored by its tremendous genetic diversity. However, the genetic diversity may reflect the possibility that P. fluorescens is not a single species, but rather a complex of at least two lineages. Molecular phylogenetic studies of 16 isolates suggested P. fluorescens should be represented by the P. chlororaphis and P. fluorescens lineages [5]. Alternatively or additionally, P. fluorescens may have an open pan genome [6,7]. Finished genome sequences are available for the SBW25, Pf-5, and Pf0-1 isolates of P. fluorescens [8,9]. Their genomes exceed 6.4 megabases and their relatively large sizes are not unexpected for free-living bacteria [10]. Comparative analyses of the three isolates of P. fluorescens revealed substantial variation in diversity of genome content and heterogeneity in genome organization [9]. Each genome has 1,000 to nearly 1,500 unique genes when compared to each other.
Plant-associated isolates of P. fluorescens potentially have mechanisms for interacting with plants. Many Gram-negative bacteria use a type III secretion system (T3SS) to interact with their hosts [11]. The T3SS is the most complex of the bacterial secretion systems and is typically encoded by a large cluster of genes arranged as a single superoperon. Its function is to inject type III effector proteins directly into host cells [11,12]. These type III effectors are important host-range determinants of plant pathogenic bacteria because they perturb and potentially elicit plant defenses [13].
It is unclear as to how prevalent T3SS-encoding regions are in P. fluorescens. Nearly 60% of a surveyed collection of P. fluorescens strains had a homolog of rscN, which encodes the ATPase of the T3SS [14]. However, it is not known whether all genes necessary to complete the T3SS are present in these isolates. Of the three completed genomes, genes encoding the T3SS are present only in SBW25. Several important or typically conserved genes are missing or truncated in the T3SS-encoding locus of SBW25 [15]. Despite the cryptic appearance of the T3SS, when constitutively expressing the transcriptional regulator of the T3SS, SBW25 could deliver a heterologous type III effector into plant cells, suggesting the T3SS may still be functional [15].
The role of the T3SS for the lifestyle of P. fluorescens is still unclear. In SBW25, despite the cryptic T3SS, single mutants of some but not all the remaining T3SSencoding genes were reduced in fitness in the rhizosphere of sugar beets [16]. This is not unheard of, as mutants of seemingly cryptic T3SS in pathogens are compromised in virulence [17]. However, in the case of SBW25, the T3SS mutants were also compromised in growth in vitro [16]. A T3SS mutant of the biocontrol isolate P. fluorescens KD was compromised in its ability to protect cucumbers against damping-off disease caused by Pythium ultimum [18]. This may be a result of KD requiring a functional T3SS to elicit host defenses, thereby indirectly protecting against P. ultimum or potentially as a direct mechanism against the pathogen.
We are interested in exploiting P. fluorescens for control of grassy weeds. We have previously reported the selection, isolation, and characterization of five strains of P. fluorescens that inhibit germination of seeds of grassy weeds [19]. Further characterizations led to the identification of Germination-Arrest Factor (GAF) produced by these isolates. GAF is a small, extremely hydrophilic secreted herbicide that reacts with ninhydrin and possesses an acid group, suggestive of a small peptide or amino acid analog [4,20]. In particular, the high specificity of GAF towards grasses and inhibitory activity at only certain developmental stages during seed germination provides promise for its potential as a natural herbicide for the control of grassy weeds in grass seed production and turf management settings.
We selected P. fluorescens WH6 as the archetypal GAF-producing isolate. WH6 was extracted from the rhizosphere of Poa sp. and Triticum aestivum at the Hyslop Research Farm in Benton County, Oregon, USA [19]. We sequenced and developed an improved highquality draft sequence for WH6 using a hybrid Illumina and 454-based sequencing approach. This standard is considered sufficient for our purposes of assessing gene inventory and comparing genome organization [21].
Comparative genomic analysis showed a high number of orthologous genes and strong similarity in genome organization between WH6 and SBW25. Phylogenomic analysis supported this observation and placed WH6 in the same lineage as SBW25, or the proposed P. fluorescens lineage. The high similarity in orthology and genome organization is in contrast to previous observations of P. fluorescens and in comparisons of WH6 to Pf-5 or Pf0-1 [9]. From a non-saturating Tn5-mutagenesis screen of WH6, we previously identified two mutants compromised in GAF activity (WH6-2::Tn5 and WH6-3::Tn5; [4]). Mapping of DNA sequences flanking the two mutants revealed genes encoding proteins with potential functions in regulation and biosynthesis of GAF. Finally, inspection of the WH6 genome revealed several candidate host-association mechanisms, including what appears to be a complete type III and two complete type VI secretion systems.

Results and Discussion
Sequencing and developing an improved, high-quality draft genome sequence We used an Illumina and a 454 FLX GS LR70 to sequence the genome of WH6 (Table 1). The theoretical coverage using all filtered reads was estimated to be 316× assuming a genome size of approximately 6.5 megabases. We employed a number of steps to meet the standards of an improved, high-quality draft genome sequence of WH6 for comparative purposes. We used Velvet 0.7.55, combinations of short-reads, and a variety of parameter settings to de novo assemble the short reads to generate approximately 75 different assemblies [22]. We developed and used ad hoc Perl scripts with an associated visualization tool to compare each of the different assemblies to each other. This step enabled us to eliminate entire assemblies with large contigs not supported by any other assembly.
We identified a single high-quality de novo assembly based on nearly 24 million reads from all three sequencing methods ( Table 1). The Velvet parameters were hash length of 31, expected coverage of 104, and a coverage cutoff of 20. Actual coverage of this assembly based on the total number of used reads was 65~120×, which was less than one-third the theoretical coverage. This assembly had 189 contigs greater than one kb and a total of 256 contigs greater than 100 bp. The largest contig was 264 kb and the N50 number and size were 26 contigs and 78 kb, respectively.
We used experimental and in silico approaches to improve the draft assembly by reducing the number of physical gaps. Of the 189 contigs greater than one kb, 139 contigs (74%) had significant homology to a reference sequence shared by the end of another contig. These 139 contigs potentially flanked 111 physical gaps (See Additional file 1: Table S1). We were able to amplify across 86 (77.5%) of the gaps using PCR. Physical gaps were subsequently resolved by reassembling the nearly 24 million short-reads with the 86 Sanger reads. Of the remaining scaffolds, we associated more based on in silico evidence. Some contigs shared long-range synteny to a reference genome (see below) and their ends had fifteen or more basepairs of sequence with 100% overlap to each other. This phenomenon is a result of Velvet failing to extend the contig because of low coverage. Secondly, some contigs could be paired together because their ends had partial coding regions with homology to a common reference gene. In total, nineteen more contigs were associated, resulting in a final assembly of 115 scaffolds greater than 100 bp. The largest scaffold was 814 kb and the N50 number and size were 8 and 203 kb, respectively.
The improved, high-quality draft genome sequence had 67 sequence gaps totaling 258,650 Ns. There were 45 large sequence gaps with more than 300 Ns of which eight had more than 10,000 Ns each. We presumed these were artifacts of the Velvet assembly because the fragment size of our paired-end library was no larger than 300 bp. We corrected the sizes for 31 gaps to their corresponding length found in homologous reference sequences. In the other 14 cases, we simply reduced the number of Ns in the region to 300 bp, to reflect the maximum size of our paired-end library. Both approaches to correct the size of sequence gaps were validated using PCR of randomly selected regions (data not shown). In total, we reduced the number of Ns to 6,049 or~2% of the original number of Ns.
The release of the finished genome sequence of SBW25 fortuitously coincided with our efforts of improving the draft genome sequence of WH6 [9]. We noted that nearly 90% of the homologous sequences we found in the NCBI nt dataset using our BLASTN-based approaches were to P. fluorescens SBW25. We therefore surmised that the genome of WH6 would be similar to the finished genome of P. fluorescens SBW25 and used it as a reference for Mauve Aligner to reorder the 115 WH6 scaffolds [9,23].
The genome of WH6 is presumed to be a single circular chromosome (Figure 1). A total of 53 scaffolds greater than one kb could be ordered using Mauve Aligner. The remaining 62 contigs could not be reordered and were excluded from our circular representation of the genome. These 62 contigs were all smaller than one kb and their sum total was only 13 kb. Attempts to use Pf0-1 or Pf-5 as a reference for Mauve Aligner were largely unsuccessful, supporting our observation that WH6 and SBW25 had higher synteny than previously detected in P. fluorescens and suggesting our WH6 de novo assembly was of high quality. We found no evidence of plasmids in the genome of WH6.
This Whole Genome Shotgun project has been deposited at DDBJ/EMBL/GenBank under the accession AEAZ00000000. The version described in this paper is the first version, AEAZ01000000. The WH6 genome sequence and its associated tools can also be accessed from our website at: http://changbugs.cgrb.oregonstate. edu/microbes/org_detail.html?org=WH6-G3. One challenge with de novo assembly is dealing with repeated sequences [24]. Small repeated sequences are present in genomes of P. fluorescens but were not expected to have a large effect on our ability to assemble the WH6 genome because of the size of our paired-end fragments [9]. Larger repeats, however, could not be resolved. We only observed one rRNA operon in the genome of WH6. We suspect that WH6 has five rRNA operons similar to SBW25 and Pf-5, but they collapsed into one contig. There was approximately 5× more coverage for the contig containing the one rRNA operon of WH6 compared to the other contigs. Similarly, nonribosomal peptide synthases (NRPSs) are encoded by large genes with repeated modules [25,26]. The modular domains either collapsed on one another in the assembly, or were assembled into short contigs that we could not extend. A large fraction of these partial NRPS-encoding genes were found in the small contigs that we could not reorder using Mauve Aligner. Here too, we noticed higher coverage than the other scaffolds.

Comparative and phylogenomic analyses of P. fluorescens
At a large scale, the genome of WH6 was similar to the genomes of the other P. fluorescens isolates ( Table 2). The size of the genome is slightly smaller, which may be a consequence of the draft nature of our genome assembly. Nonetheless, the 5876 predicted coding sequences (CDSs) and 89.2% coding capacity were very similar. Figure 1 Circular representation of the improved, high-quality draft genome sequence of WH6. The outer scale designates the coordinates in half million base pair increments. The red ticks indicate physical gaps. Circles 2 and 3 show the predicted coding regions of WH6 on the positive and negative strands, respectively. Coding regions are colored to highlight orthologous (blue) and 1567 unique (red) coding regions of WH6 relative to the other sequenced P. fluorescens. Circles 4, 5, and 6 show orthologs (BLASTP e-value ≤ 1 × 10 -7 ) of SBW25, Pf-5, and Pf0-1, respectively. The extent of homology relative to WH6 is depicted using a heat map of arbitrarily chosen bins; dark blue: orthologs with greater than 80% homology over the length of the gene; green: orthologs with between 60-80% homology over the length of the gene; pink: orthologs with between 20-60% homology over the length of the gene; white: no homology (less than 20% homology over the length of the gene). The positions of loci of interest are also denoted (see corresponding text for more details). Circles 7 and 8 show GC% (gold >60.6% average; gray < 60.6% average) and GC-skew.
Previous analyses of P. fluorescens found SBW25, Pf-5, and Pf0-1 to be divergent, with only~61% of the genes shared and little long-range synteny [9]. We used HAL to carry out similar analyses to determine the effect of the WH6 genome on the phylogenetic relationship of the P. fluorescens species and potential changes to the size of its pan genome [27]. HAL uses a Markov Clustering algorithm based on e-values from reciprocal allby-all BLASTP analysis to create clusters of orthologs. Core sequences from each species are concatenated and the super alignment is used in phylogenomic analysis. Using a core of 1966 translated sequences common to P. fluorescens, representative strains of P. syringae, and P. aeruginosa PAO1, HAL clustered the different species of Pseudomonas as expected [8,9,[28][29][30]. Further, HAL clearly defined two separate lineages for P. fluorescens, placing WH6 with SBW25 ( Figure 2).
Within the P. fluorescens species as presently defined, 3115 genes formed the core and represented 53%, 52.6%, 50.7%, and 54.3% of the genomes of WH6, SBW25, Pf-5, and Pf0-1, respectively ( Figure 3). This was nearly a 10% reduction relative to previous analysis of three genomes [9]. A large fraction of the core genes was assigned to categories with general cellular processes such as energy production and conversion, amino acid transport and metabolism, translation, and transcription ( Figure 4). Approximately 90% of the 3115 core genes clustered with orthologs sharing identical COG designations suggesting our automated annotation pipeline was accurate. There were some exceptions but their rarity and subtle differences did not warrant manual curation. For example, one cluster of orthologs had genes annotated as "arabinose efflux permeases" (COG2814) for genes from the published isolates of P. fluorescens but "permease of the major facilitator superfamily" (COG0477) for the ortholog of WH6.
A total of 4309 of the translated products of WH6 had an orthologous sequence in another isolate of P. fluorescens. Almost 69% of the WH6 genes had an orthologous sequence in SBW25, as compared to Pf-5 and Pf0-1 with 62% and 59%, respectively (Figures 1 and 3). We found similar levels of overlap using reciprocal BLASTP (data not shown). The 69% orthology between WH6 and SBW25 is much higher than previously observed between isolates of P. fluorescens [9]. These levels were still lower than those between different pathovars of P. syringae, which had greater than 80% orthology    [29,31,32]. Therefore, the generalization that P. fluorescens have highly variable genomes still holds true.
The genomes of WH6 and SBW25 also showed extensive long-range synteny ( Figure 5). This amount of synteny was unexpected given previous comparisons [9]. When compared to Pf-5 or Pf0-1, we found little longrange synteny, which tended to be near the origin of replication. Synteny rapidly degraded away from the origin with an increase in inversions between the genomes [9]. Taken together these lines of evidence all suggest WH6 and SBW25 to be similar and support, though perhaps prematurely, a redefinition of the P. fluorescens species [5,9].
It could be argued that the high level of synteny we found with SBW25 was an artifact of using SBW25 to reorder the WH6 scaffolds. Though we cannot exclude this possibility, we highlight several points that suggest otherwise. We used a de novo approach to assemble the genome of WH6. The long-range synteny to the SBW25 genome was observed within each and across the de novo assembled scaffolds of WH6 ( Figure 5). Furthermore, synteny with SBW25 was also supported by our

COG Functional Category
Core WH6  ability to use SBW25 to successfully and substantially reduce the number of WH6 scaffolds and improve the WH6 genome sequence (Figure 1). Finally, analysis of GC skew gave higher confidence in the reordering of WH6 scaffolds (Figure 1, track 8). Genomes often have a bias of guanine in the leading strand [33,34]. Inversions of GC skew in regions distant from the replication origins and termini are indicative of a recent recombination event [35]. Barring these events, inversions of GC skew could also potentially indicate large-scale misassemblies or incorrect reordering of contigs. For the most part, the genome of WH6 showed the expected bias of guanine in the leading strand; there are perhaps two small inversions in GC skew flanked by physical gaps between scaffolds near the terminator. Our use of SBW25 as a reference for reordering scaffolds is therefore acceptable and the observed synteny between WH6 and SBW25 appeared to reflect true similarities in genome organization.
More than 30% of the WH6 coding regions were unique (Figures 1 and 3). Examinations of their annotated functions suggested greater diversity in metabolic and hostassociation functions such as carbohydrate transport and metabolism, inorganic ion transport and metabolism, secondary metabolite biosynthesis, transport and catabolism, intracellular trafficking, secretion and vesicular transport, as well as defense mechanisms (Figure 4).
Examples of CDSs unique to WH6 and enriched in these functional categories include 35 candidate permeases of the major facilitator superfamily, a large and diverse superfamily of secondary active transporters that control movement of substrates across membranes (COG0477; [36]). WH6 also had 12 unique CDSs that encode for putative TonB-dependent receptors, involved in uptake of iron and potentially other substrates (COG1629; [37]; see also section entitled, "Regulators of gene expression"). Restriction modification (RM) systems are widespread defense mechanisms that protect prokaryotes from attack by foreign DNA [38]. RM systems are diverse and can vary dramatically in numbers. WH6 has at least 30 CDSs with annotated functions or domains common to proteins of RM systems. PFWH6_5037-5039, for example, encode for a type I RM system that appears to be unique to WH6. Finally, other CDSs unique to WH6 and of direct interest to us are described in the following sections. The greater than 1500 genes unique to WH6 were dispersed throughout its genome with only a slight bias in location closer to the terminators (Figure 1). This bias was previously generalized for P. fluorescens [9].

Mapping GAF mutants
We previously identified two WH6 mutants from a nonsaturating Tn5-mutagenesis screen for those affected in arresting the germination of Poa seeds [4]. We cloned, sequenced and mapped their flanking sequences to identify the disrupted genes. Mutant WH6-3::Tn5 had an insertion in PFWH6_3687. This CDS is annotated as a "predicted transmembrane transcriptional regulator (anti-σ factor)". Its closest homolog, with 94% similarity is PrtR encoded by P. fluorescens LS107d2 [39]. The Tn5 element had inserted at nucleotide position 417 within codon Asp139. Because loss of prtR led to a loss of GAF activity, PrtR is likely an activator rather than a repressor, as was the case in P. fluorescens LS107d2 [39].
Just upstream of prtR in WH6 is prtI, which encodes a candidate ECF σ 70 factor. This arrangement is reminiscent of many sigma-anti-sigma factor pairs and suggests that the genes are potentially co-regulated and both may have roles in regulating GAF gene expression [40]. It is peculiar that we failed to identify an insertion in prtI but one obvious explanation is that our screen was not saturating. Regardless, it will be important to examine the necessity of PrtI for GAF activity to resolve its role.
Mutant WH6-2::Tn5 had an insertion in PFWH6_5256, a gene encoding a candidate aminotransferase class III. The identification of an aminotransferase as necessary for GAF supports our previous findings suggesting that GAF contains an amino group and may be a small peptide or amino acid analog [4]. Aminotransferases are pyridoxal phosphate (PLP)-dependent enzymes that catalyze the transfer of an amino group from a donor group (commonly an amino acid) to an acceptor molecule [41]. The Tn5 element had inserted at nucleotide position 1124 within codon Lys375. Based on comparisons to the acetyl ornithine aminotransferase family the insertion is distal to the conserved residues that compose pyridoxal 5'-phosphate binding sites, the conserved residues that compose inhibitor-cofactor binding pockets, and the catalytic residue [42]. Further characterization of WH6-2::Tn5 is necessary to examine its enzymatic properties and role in biosynthesis of GAF.

Regulators of gene expression
Bacteria with large genomes tend to have complex regulatory networks to integrate and respond to a multitude of environmental signals. The extracytoplasmic function (ECF) σ 70 factors are a class of important transcriptional regulators of cell-surface signaling systems. Using a Hidden Markov Model (HMM) for ECF-encoding genes, we found 19, 26, 28, and 22 ECFs in WH6, SBW25, Pf-5 and Pf0-1, respectively [43]. Of the 19 identified in WH6, ten are part of the core set common to all four sequenced P. fluorescens isolates and included prtI and prtR, which we identified as necessary for GAF activity. Because we had previously shown that Pf-5 and Pf0-1 do not have GAF activity, these results suggest that the putative PrtI/R-regulon may be different between the different isolates of P. fluorescens [4]. Four of the 19 ECF-encoding genes were exclusive to the plant-associated strains WH6, SBW25, or Pf-5. Two of these were only shared with SBW25, of which one was rspL (see below). The other two lacked sufficient annotations to speculate on their functions. The remaining five ECFs were unique to WH6 and all are potentially coexpressed with genes encoding outer membrane receptors involved in iron perception or uptake (chuA, fhuA, and fhuE).

Virulence factors
Pseudomonads produce a wide-range of secondary metabolites with potential benefit or detriment to plants and microbes [25,44]. Many are synthesized by non-ribosomal peptide synthases (NRPS) or polyketide synthases [25,26,44]. We found evidence for several NRPS-encoding genes. Because of their modular architecture, most NRPS-encoding genes of WH6 were fragmented and found on small contigs that failed to assemble or reorder. Therefore, it was not possible to determine the structure of the repeats or infer functions based on homology. We were, however, able to identify several other candidate toxins and virulence factors (Table 3).
We identified several secretion systems in WH6 unique to host-associated bacteria and/or necessary for full virulence of pathogenic bacteria. WH6 appears to encode a complete and functional type III secretion system (PFWH6_0718-0737; Figure 6a). We named its genes according to the nomenclature first proposed for SBW25 [15,45]. There is strong homology and synteny between the T3SS-encoding regions of WH6 and P. syringae, raising the possibility of a recent acquisition of the T3SS-encoding locus by WH6, similar to KD [14]. Phylogenetic analyses of rscN, however, placed WH6 with the group 8 of biocontrol isolates of P. fluorescens (data not shown; [14]). Additionally, 15 kb of sequences on either side of the T3SS-encoding region of WH6 were highly syntenic to regions flanking the T3SSencoding region of SBW25 with the exception of the type III effector gene, ropE. Together, these data argue against a recent acquisition of the T3SS-encoding region by WH6.
There were some differences between T3SS-encoding regions of WH6 and P. syringae. The rspR, rspZ, and rspV genes of WH6 were not present and we failed to detect any homology between the rspF/hrpF, rspA/hrpA, and rspG/hrpG genes. Data, however suggest these differences likely have little to no effect on T3SS function. HrpR and HrpS are highly similar and are functionally redundant. In some Erwinia strains, HrpS by itself is demonstrably sufficient for T3SS function [46,47]. Deletion mutants of hrpZ are still functional and HrpV functions as a negative regulator of the T3SS [48][49][50]. HrpF and HrpA are homologous to each other and are structural components of the T3SS. They are the most polymorphic proteins encoded within the T3SS-cluster and the absence of significant homology between rspF and rspA to their counterparts of P. syringae was therefore not surprising [51,52]. Our automated annotation  [95] 0396 -0398 *TAT-encoding region Secretion apparatus [96,97] 4428, 2727, 0116-0120 Synthesis of mangotoxin Antimetabolite toxin [98] 2331-2334 Synthesis of hydrogen cyanide Inhibitor of cytochrome c oxidase [25] approach failed to identify rspG but upon visual inspection, we noted a small CDS that encodes a potential product of 63 amino acids. BLAST searches failed to detect homology to hrpG, but given its position in the T3SS-encoding region and similarity in size to the translated product of hrpG, we have annotated it as rspG. In total, these data support the notion that WH6 encodes a complete and functional T3SS, although, its role in the lifestyle of WH6 remains unknown.

Candidate type III effectors of WH6
We used a homology-based approach to search for type III effector genes in the genome of WH6. Our database of type III effectors included those from T3SS-using phytopathogens and some mammal pathogens. We only identified one translated sequence with homology to PipB from Salmonella, and another with homology to HopI1 from P. syringae (e-value < 1 × 10 -7 , > 33% identity; [53]). However, neither appeared to be strong candidates for a type III effector. We identified a homolog of pipB in the genome of Pf-5, which does not encode the T3SS. HopI1 encodes a J domain and its homolog in WH6 was annotated as the molecular chaperone, dnaJ [54]. These results suggest that if WH6 does encode type III effectors, they are very divergent in sequence. SBW25, in contrast, had at least five genes with

C l p V t s s G t s s F t s s E t s s C t s s B t s s A p p k A p p p A t s s M t s s K h c p V g r G f h a 1 t s s L t s s J t a g F t a g J 1
WH6 PAO1 B C A G * * * * * ** ** σ Figure 6 Schematic Representations and Comparisons of Type III and Type VI Secretion Systems. A) The type III secretion system of WH6 (top) compared to that of P. syringae (middle), and SBW25 (bottom). Co-transcribed genes and orthologous transcriptional units are colored similarly. *No detectable homology but inferred based on location in the T3SS-encoding locus. **Truncated coding regions. B) The type VI secretion system-1 of WH6 (top) compared to HSI-I of P. aeruginosa PAO1. C) The type VI secretion system-2 of WH6 (top) compared to a candidate type VI secretion system of P. syringae pv tomato DC3000 (bottom). Orthologous transcriptional units are colored similarly. For A-C, the directions of transcription are represented. Gray boxes highlight homologous regions. homology to known type III effectors, of which two were expressed to sufficiently high levels and delivered by a heterologous T3SS-encoding bacterium [55].
Computational approaches have been successfully used to identify candidate type III effectors from P. syringae, based in part on identifying a cis-regulatory element upstream of their genes and also some genes of the T3SS [56][57][58]. This so-called hrp-box is recognized by HrpL, an extracytoplasmic function (ECF) σ 70 factor encoded within the T3SS-encoding region of P. syringae [59]. We therefore used a Hidden Markov Model (HMM) trained using 38 known HrpL-regulated genes of P. syringae pv tomato DC3000 to mine the genome of WH6 for hrp-boxes [56,60].
We found 115 hrp-boxes in the genome of WH6 (bit score ≥ 3.0) but only 24 were within 500 bp of a CDS. Two were located upstream of rspF and rscR in the T3SS-encoding region, with bit scores of 7.9 and 3.2, respectively. We also identified a hrp-box upstream of rspJ but it had a lower bit score of 1.2. Fifteen of the CDSs downstream of candidate hrp-boxes had annotated functions not typically associated with type III effectors and we did not list them as possible candidates (data not shown). The remaining eight CDSs downstream of hrpboxes were annotated as hypothetical proteins and the five with the highest bit scores for their corresponding hrp-boxes were not present in the genomes of Pf-5 and Pf0-1; all but PFWH6_1942 were unique to WH6 (Table  4). Further investigation of their first amino-terminal residues indicated that three have characteristics suggestive of T3SS-dependent secretion [56,61,62].
Our two computational approaches yielded very few candidate type III effectors. One possible explanation is that because RspL and HrpL have only 50% identity (70% similarity), they recognize slightly different cis-regulatory sequences and our HMM was not adequately trained for the cis-regulatory sequence recognized by RspL. This is an unlikely scenario. Three sequences with strong similarity to the hrp-box of P. syringae were found in the T3SS-encoding regions for WH6 and SBW25 [15]. Additionally, it has been observed that all HrpL-dependent phytopathogenic bacteria share an identical motif in the hrp-box despite having as little as 52% similarity [63]. Furthermore, in σ 70 factors, DNA binding specificity is conferred by the helix-turn-helix domain 4.2 [64,65]. Domain 4.2 of the WH6 RspL is highly similar (82.5%) to the corresponding domain of HrpL. An alternative explanation is that WH6 encodes very few type III effectors with little homology to those that have been identified. This is not unheard of. P. aeruginosa for example, has only three type III effectors [66,67].

Type VI secretion systems
The type VI secretion system (T6SS) is another secretion apparatus that is common to host-associated bacteria. Computational approaches suggest the T6SS may also be in P. fluorescens [68,69]. We found evidence for two complete and functional T6SSs in WH6. We have named these two systems T6SS-1 (Figure 6b; PFWH6_5796-5812) and T6SS-2 (Figure 6c; PFWH6_3251-3270). It is not uncommon for organisms to possess multiple T6SSs that are of different lineages and acquired independently [68]. Additionally, in other strains that have been characterized, different T6SSs appear to be independently regulated, suggesting each T6SS may have functions specific to different aspects of the lifestyle of the bacterium [70]. Whether this is also the case with WH6 awaits further characterization.
T6SS-1 belongs to the group A lineage and shares homology and synteny to HSI-I of P. aeruginosa PAO1 [68]. We therefore named the corresponding genes in WH6 according to the nomenclature established for HSI-I (Figure 6b). Synteny extended beyond the T6SSencoding region and included the tagQRST genes bordering ppkA. We did not, however, find evidence for tagJ1 in WH6 [68,71]. T6SS-2 is a group B secretion system [68]. Less is understood about the group B Only coding sequences with no predicted homology or identifiable orthologs in Pf-5 or Pf0-1 are shown. *PFWH6_# unless otherwise noted;^N-terminal translocation signal scores were assigned based on "A"; >10% serine in first 50 amino acids, "B"; absence of aliphatic amino acids in position 3 or 4, and/or "C"; absence of negatively-charged amino acids in the first 12 amino acids [62].
secretion systems but T6SS-2 showed strong homology and synteny to a corresponding T6SS encoded in the genome of the phytopathogen P. syringae pv tomato DC3000 (Figure 6c; [28]).
There are few proteins that are demonstrable type VI effectors. Three homologs of VgrG and Hcp have been shown to be secreted by the T6SS but both likely have functions for the T6SS itself [72][73][74]. We found four vgrG genes, of which only one was associated with T6SS-1. The other three genes were found elsewhere in the genome. Whether products from these latter three are secreted proteins of the T6SS or merely homologous in sequence is unknown. Both T6SSs of WH6 had a homolog of hcp. Recently, three additional proteins from P. aeruginosa PAO1 were shown to be secreted by the T6SS, but their orthologs were not found in WH6 [75].

Conclusions
P. fluorescens is a genetically and physiologically diverse species found in many habitats. We sequenced the genome of the isolate WH6 because it produces Germination-Arrest Factor (GAF), an herbicide that specifically arrests seed germination of graminaceous species. Comparisons of the WH6 genome to genomes of SBW25, Pf-5, and Pf0-1 helped better define this species, with WH6 and SBW25 forming one lineage. Comparative studies revealing substantial similarity in gene inventory and synteny supported its placement and the argument of at least two major lineages of P. fluorescens [5].
With the genome sequence, we were able to deduce potential functions for two genes necessary for GAF activity. One encoded a candidate anti-sigma factor. Our previous results suggest that PrtR is an activator and suggests it has a role in regulating expression of genes necessary for GAF. The second gene encoded a candidate aminotransferase, which tentatively supports our previous speculation that GAF is a small peptide or amino acid analog. Further studies are required to confirm their functions. A less labor-intensive and saturating screen will also be necessary for a fuller understanding of the pathway controlling GAF expression and biosynthesis. The genome sequence will certainly facilitate such future endeavors.
We also identified a number of mechanisms that potentially affect plant health and some typically associated with host-associated bacteria. One of the more extensively characterized mechanisms is the type III secretion system. WH6 appears to encode the necessary repertoire of genes for a complete and functional T3SS. We also identified two T6SSs in WH6. Further studies are necessary to identify the role these secretion systems and their effectors play in the lifestyle of WH6.

Sequencing DNA flanking Tn5-insertions
To determine the sites of Tn5 insertion, genomic DNA from the two GAF mutants, WH6-2::Tn5 and WH6-3:: Tn5 was digested with BamHI or PstI, respectively. We used Southern blotting with a biotinylated probe of the Tet R gene from pUTmini-Tn5gfp to identify the fusion fragments between the Tet R gene and flanking WH6 DNA [76]. DNA fragments of corresponding size were cloned into pBluescript SK+ (Stratagene, La Jolla, CA), transformed into E. coli DH5∝, selected based on tetracycline resistance, isolated, and sequenced outwards using primers to the Tet R gene.

P. fluorescens WH6 Genome Sequencing
We used the ZR Fungal/Bacterial DNA Kit to isolate genomic DNA from P. fluorescens WH6 grown overnight in LB at 28°C (Zymo Research, Orange, CA). Purity and concentration were determined using a Nanodrop ND1000 (Thermo Scientific, Waltham, MA). For Illumina-based sequencing, we prepared the DNA according to the instructions of the manufacturer and sequenced the DNA fragments on the Illumina GA I and II using 36-cycle (4 channels) and paired-end 76cycle (1 channel) sequencing, respectively (Illumina, San Diego, CA). The Sanger and Illumina sequencing was done at the Center for Genome Research and Biocomputing Core Labs (CGRB; Oregon State University, Corvallis, OR). We also sequenced genomic DNA using the 454 FLX GS LR70 (454, Branford, CT). Preparation and sequencing by 454 was done at the Consortium for Comparative Genomics (University of Colorado Health Sciences Center, Denver, CO).

Short-read assembly
For Illumina-derived reads, the last four and six bases were trimmed from the 36 mer and 76 mer reads, respectively. We filtered out all Illumina-derived short reads that had ambiguous bases. For the paired-end reads, both reads were filtered out if one read of a pair had ambiguous bases. We used Velvet 0.7.55 to de novo assemble the reads [22]. We assembled short reads from the different sequencing platforms independently, as well as in combination. We wrote ad hoc shell scripts to test different Velvet parameters of hash length, coverage cutoff, and expected coverage. In total, we generated approximately 75 different genome assemblies of WH6. Shell scripts are available for download (http://changlab.cgrb.oregonstate.edu).

Improvements to the high-quality draft assembly
We developed ad hoc Perl scripts to use BLASTN to compare between each of the WH6 assemblies and used congruency in contigs from the various assemblies to cull those with potential misassemblies (see next section for description of scripts; data were visualized using blast_draw.pl). We used Tablet 1.10.01.28 to inspect the remaining genome assemblies for depth of coverage and potential misassemblies [77]. Finally, we used Mauve Aligner 2.3 and the genome sequence of P. fluorescens SBW25 as a reference to reorder WH6 contigs greater than 100 bp from our assembly with highest confidence [9,23]. Default settings were used for Mauve Aligner 2.3.

Physical and sequence gap closure
To identify contigs that potentially flanked a physical gap, we wrote and used Contig_end_blast_A.pl, to extract 300 bp of sequence from the ends of each contig greater than one kb in size and use the contig ends as queries in a BLASTN search against the NCBI nt database. We also wrote and used Contig_end_blast_B.pl to find contig ends that shared significant homology (evalue ≤ 0.02) to the same reference sequence but aligned to different regions no more than one kb apart. The contigs corresponding to these ends were thus predicted to be physically linked in the genome of WH6. PCR using contig-specific primers and subsequent Sanger sequencing were used to close the physical gaps (See Additional file 2: Table S2). To correct the sizes for sequence gaps larger than 300 bp, we used a similar approach. PCR was used to validate our corrections for sequence gaps.

Genome Annotation
We used a custom pipeline to annotate the improved high-quality draft assembly of WH6 as previously described [78]. The only exceptions were that we used Glimmer 3.02 rather than Glimmer 2 to predict coding regions and gene models were trained using the "longorfs" option ( [79]; http://www.cbcb.umd.edu/software/ glimmer/).

Bioinformatic analyses
For analysis of synteny, we first parsed the genomes of SBW25, Pf-5 and Pf0-1 into all possible 25 mers and identified their unique 25 mer sequences. Next, we used CASHX to align all unique 25 mers from each of three genomes to both strands of a formatted database from the WH6 genome sequence [80]. Only perfect matches were allowed. We identified the corresponding genome coordinates for each 25 mer and the matching 25 mer in the WH6 genome and used R to plot the start coordinates of each matching pair in an XY graph [81].
Phylogenomic relationships were determined using HAL ( [27]; http://aftol.org/pages/Halweb3.htm). HAL uses an all-by-all reciprocal BLASTP to create a similarity matrix from e-values. These are then used to group proteins into related clusters using a Markov Clustering algorithm. Clusters containing one protein sequence from each genome that identified each other as best hits were extracted, concatenated within each proteome, and used to infer phylogenetic relationships. Phylogenetic trees were visualized using the Archaeopteryx & Forester Java application ( [82]; http://www.phylosoft.org/archaeopteryx/).

Additional material
Additional file 1: Table S1: De novo assembled contigs of WH6 that are candidates for physical gap closure.
Additional file 2: Table S2: Sequences of oligonucleotides used to close physical gaps.