Genomic analysis and relatedness of P2-like phages of the Burkholderia cepacia complex

Background The Burkholderia cepacia complex (BCC) is comprised of at least seventeen Gram-negative species that cause infections in cystic fibrosis patients. Because BCC bacteria are broadly antibiotic resistant, phage therapy is currently being investigated as a possible alternative treatment for these infections. The purpose of our study was to sequence and characterize three novel BCC-specific phages: KS5 (vB_BceM-KS5 or vB_BmuZ-ATCC 17616), KS14 (vB_BceM-KS14) and KL3 (vB_BamM-KL3 or vB_BceZ-CEP511). Results KS5, KS14 and KL3 are myoviruses with the A1 morphotype. The genomes of these phages are between 32317 and 40555 base pairs in length and are predicted to encode between 44 and 52 proteins. These phages have over 50% of their proteins in common with enterobacteria phage P2 and so can be classified as members of the Peduovirinae subfamily and the "P2-like viruses" genus. The BCC phage proteins similar to those encoded by P2 are predominantly structural components involved in virion morphogenesis. As prophages, KS5 and KL3 integrate into an AMP nucleosidase gene and a threonine tRNA gene, respectively. Unlike other P2-like viruses, the KS14 prophage is maintained as a plasmid. The P2 E+E' translational frameshift site is conserved among these three phages and so they are predicted to use frameshifting for expression of two of their tail proteins. The lysBC genes of KS14 and KL3 are similar to those of P2, but in KS5 the organization of these genes suggests that they may have been acquired via horizontal transfer from a phage similar to λ. KS5 contains two sequence elements that are unique among these three phages: an ISBmu2-like insertion sequence and a reverse transcriptase gene. KL3 encodes an EcoRII-C endonuclease/methylase pair and Vsr endonuclease that are predicted to function during the lytic cycle to cleave non-self DNA, protect the phage genome and repair methylation-induced mutations. 
Conclusions KS5, KS14 and KL3 are the first BCC-specific phages to be identified as P2-like. As KS14 has previously been shown to be active against Burkholderia cenocepacia in vivo, genomic characterization of these phages is a crucial first step in the development of these and similar phages for clinical use against the BCC.


Background
The Burkholderia cepacia complex (BCC) is a group of at least seventeen species of Gram-negative opportunistic pathogens. Although these organisms can infect patients with a broad range of chronic conditions, the majority of infections occur in those with cystic fibrosis (CF) [1,2]. Because the lungs of these individuals contain thick mucus that cannot be cleared by the mucociliary escalator, they are susceptible to pulmonary infections by microorganisms such as Pseudomonas, Staphylococcus, Haemophilus and Burkholderia [3,4]. The prevalence of BCC infection in American CF patients was 3.1% in 2005 [5]. Although this prevalence is low compared to that of Pseudomonas aeruginosa (56.1% in 2005), there are three reasons why the BCC is a serious problem for the CF population [5]. First, BCC bacteria cause severe and potentially fatal respiratory infections. When compared to patients infected with Pseudomonas, those with BCC infections have reduced lung function and, depending on the species present, increased mortality [6,7]. In approximately 20% of cases, these individuals develop a rapidly fatal condition called 'cepacia syndrome,' which is characterized by lung abscesses and septicemia [2,8]. Second, BCC bacteria can spread from person-to-person. It has been shown that at least five BCC species can be transmitted in this manner: Burkholderia cepacia, Burkholderia multivorans, Burkholderia cenocepacia, Burkholderia dolosa and Burkholderia contaminans [9][10][11]. Because of the potential for these organisms to spread among a susceptible population, BCC culture-positive patients are isolated from other individuals with CF, a measure that has serious social and psychological implications [12,13]. Finally, BCC bacteria are resistant to most antibiotics. These species have a variety of resistance mechanisms including β-lactamases, efflux pumps and biofilm formation [14][15][16]. 
The most effective anti-BCC antibiotics (ceftazidime, meropenem and minocycline) inhibit only 23-38% of clinical isolates [17].
Because conventional antibiotics are largely ineffective against the BCC, phage therapy is being explored as a possible alternative treatment. Phage therapy is the clinical administration of bacteriophages (or phages) to prevent and/or to treat bacterial infections [18]. Although phages have been used therapeutically for almost a century, this treatment fell out of favor in North America and Western Europe when penicillin and other chemical antibiotics became widely available in the 1940s [18]. However, there has been renewed interest in this field following the emergence of multidrug resistant bacteria such as those of the BCC [18]. Three recent studies have shown that phages are active against the BCC in vivo. Seed and Dennis showed that treatment of B. cenocepacia-infected Galleria mellonella larvae with phages KS14, KS4-M or KS12 increased survival 48 hours post-infection, even when treatment with the latter two phages was delayed for 6 to 12 hours [19]. Carmody et al. showed that intraperitoneal administration of phage BcepIL02 to B. cenocepacia-infected mice decreased bacterial density in the lungs and led to decreased expression of the pro-inflammatory cytokines MIP-2 and TNF-α [20]. Lynch et al. published the first description of an engineered BCC phage and showed that this mutant (a repressor knockout of phage KS9) was able to increase survival of B. cenocepacia-infected G. mellonella 48 hours post-infection [21].
Before a phage can be safely used clinically, its complete genome sequence must be determined to assess whether the phage is obligately lytic or temperate, and to determine by homology whether the phage genome encodes any putative virulence factors. This report describes the genome sequence of three novel BCC phages and their relatedness to enterobacteria phage P2. P2 is a temperate myovirus that was isolated from E. coli strain Li by Bertani in 1951 [22]. P2 has recently been classified as part of a novel subfamily, placing it in the order Caudovirales, family Myoviridae, subfamily Peduovirinae and genus "P2-like viruses" [23]. This genus includes phages P2, Wϕ, 186 and PsP3 of enterobacteria, L-413C of Yersinia, Fels-2 and SopEϕ of Salmonella, ϕ-MhaA1-PHL101 of Mannheimia, ϕCTX of Pseudomonas, ϕRSA1 of Ralstonia, ϕE202 of Burkholderia thailandensis and ϕ52237 and ϕE12-2 of Burkholderia pseudomallei [23]. Based on sequence analysis, it is proposed that the BCC-specific phages KS5 (vB_BceM-KS5 or vB_BmuZ-ATCC 17616), KS14 (vB_BceM-KS14) and KL3 (vB_BamM-KL3 or vB_BceZ-CEP511) should also be classified as part of this genus [24].
KL3 was isolated from a single plaque on a lawn of B. cenocepacia CEP511, an Australian CF epidemic isolate [29]. Phage induction from CEP511 was stochastic, as treatment with inducing agents such as UV or mitomycin C was not necessary. On LMG 17828, KL3 forms small turbid plaques 0.5-1.0 mm in diameter. KL3 has a narrow host range, infecting B. ambifaria LMG 17828.
Electron microscopy of KS5, KS14 and KL3 indicates that these phages belong to the family Myoviridae (Figure 1). These three phages exhibit the A1 morphotype, with icosahedral capsids and contractile tails [30]. KS5, KS14 and KL3 have similarly sized capsids, each 65 nm in diameter (Figure 1). In contrast, their tails vary in length: 140 nm for KS14, 150 nm for KS5 and 160 nm for KL3 (Figure 1). These tail lengths follow the same order as the lengths of the tail tape measure proteins of the three phages: KS14 gp12 is 842 amino acids (aa) in length, KS5 gp15 is 920 aa and KL3 gp17 is 1075 aa (Tables 1, 2 and 3).
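The relationship between tail length and tape measure protein length can be checked with a few lines of arithmetic. The sketch below uses only the values reported above; the function and variable names are illustrative, and the nm-per-residue ratio is a derived quantity that the text itself does not report.

```python
# Values reported in the text: tail length (nm) and tape measure protein length (aa).
phages = {
    "KS14": {"tail_nm": 140, "tmp_aa": 842},
    "KS5": {"tail_nm": 150, "tmp_aa": 920},
    "KL3": {"tail_nm": 160, "tmp_aa": 1075},
}


def nm_per_residue(tail_nm, tmp_aa):
    """Tail length contributed per tape measure residue (derived, not reported)."""
    return tail_nm / tmp_aa


ratios = {name: nm_per_residue(**values) for name, values in phages.items()}

# The claim in the text is about rank order: longer tape measure, longer tail.
by_tail = sorted(phages, key=lambda name: phages[name]["tail_nm"])
by_tmp = sorted(phages, key=lambda name: phages[name]["tmp_aa"])
same_rank_order = by_tail == by_tmp

for name in by_tail:
    print(f"{name}: {ratios[name]:.3f} nm per residue")
print("Same rank order:", same_rank_order)
```

The ratios are close but not identical across the three phages, so the comparison supports a correspondence in rank order rather than a fixed conversion factor.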

Genome characterization
KS5
The KS5 genome is 37236 base pairs (bp) in length and encodes 46 proteins (including the transposase of a predicted insertion sequence, discussed below) (Table 1). This genome has a 63.71% G+C content. Forty-three of the start codons are ATG, 2 are GTG and 1 is TTG (Table 1). As KS5 was isolated from an environmental sample, it was predicted that this phage might be obligately lytic [25]. However, KS5 encodes an integrase and a repressor and is found as a prophage in chromosome 2 of the fully sequenced B. multivorans strain ATCC 17616 (GenBank:NC_010805.1; BMULJ_03640-BMULJ_03684, bp 477496-514731) (Table 1). Because of this similarity, the possibility exists that KS5 originated from ATCC 17616 or a closely related strain found in the soil enrichment. Excluding the ATCC 17616 prophage, KS5 is most similar to a putative prophage element in Burkholderia multivorans CGD1. Twenty-three of 46 KS5 proteins are most closely related to a protein from CGD1, with percent identities ranging from 72-99% (Table 1).

KS14
The KS14 genome is 32317 bp in length and encodes 44 proteins (Table 2). This genome has a 62.28% G+C content. Forty-one of the start codons are ATG, 2 are GTG and 1 is TTG (Table 2). All predicted KS14 proteins show similarity to at least one protein in the database (as determined by a BLASTP search) except for gp38 and gp42. The protein with the most similarity to others in the database is the terminase large subunit, gp35, which has 75% identity with a protein of unknown function DUF264 of Burkholderia sp. CCGE1001. Aside from gp38 and gp42, the least similar protein is the hypothetical protein gp39, which has 29% identity with the flagellar hook-associated protein FlgK of Acidovorax ebreus TPSY (Table 2).

KL3
The KL3 genome is 40555 bp in length and encodes 52 proteins (Table 3). This genome has a 63.23% G+C content. Fifty-one of the start codons are ATG and 1 is GTG (Table 3). Similarly to KS14, all predicted KL3 proteins show similarity to at least one protein in the database except for gp43. The proteins with the most similarity to others in the database are the terminase large subunit (gp41) and the portal protein (gp42) that have 99% identity with Burkholderia glumae BGR1 proteins and the hypothetical protein gp50 which has 99% identity with a B. ambifaria MEX-5 protein. Aside from gp43, the least similar protein is the hypothetical protein gp14, which has 50% identity with the hypothetical protein BuboB_27112 of Burkholderia ubonensis Bu (Table 3).
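The summary statistics given for each genome (G+C content and start codon usage) are straightforward to compute from a sequence and its annotated gene starts. The following is a minimal sketch, not the pipeline used in this study: the helper names, the toy sequence and the toy start coordinates are assumptions for illustration. The `reported` dictionary holds only the counts stated in the text and is used to confirm that the start codon tallies add up to the number of predicted proteins.

```python
from collections import Counter


def gc_content(seq):
    """Percent G+C over unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


def start_codon_usage(seq, cds_starts):
    """Tally start codons; cds_starts are 0-based forward-strand positions
    (a hypothetical input format, e.g. parsed from an annotation table)."""
    seq = seq.upper()
    return Counter(seq[s:s + 3] for s in cds_starts)


# Counts stated in the text for each phage.
reported = {
    "KS5": {"proteins": 46, "starts": {"ATG": 43, "GTG": 2, "TTG": 1}},
    "KS14": {"proteins": 44, "starts": {"ATG": 41, "GTG": 2, "TTG": 1}},
    "KL3": {"proteins": 52, "starts": {"ATG": 51, "GTG": 1}},
}
consistent = {
    name: sum(entry["starts"].values()) == entry["proteins"]
    for name, entry in reported.items()
}

# Toy sequence (not phage data) with three short ORFs starting at 0, 12 and 24.
toy = "ATGGCGCCGTAAGTGCCCGGGTGATTGAAACGCTAG"
toy_gc = gc_content(toy)
toy_starts = start_codon_usage(toy, [0, 12, 24])
print(f"toy G+C: {toy_gc:.2f}%", dict(toy_starts), consistent)
```

Genes on the minus strand would need their start codons read from the reverse complement; that case is left out to keep the sketch short.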

Similarity to P2
KS5, KS14 and KL3 all show similarity to enterobacteria phage P2 [GenBank:NC_001895.1]. A four-way comparison of the P2, KS5, KS14 and KL3 genomes prepared using PROmer/MUMmer/Circos is shown in Figure 3. In this comparison, regions of similarity on the same strand are shown in green, while regions of similarity on the opposite strand are shown in red. The majority of similar regions among these phages are on the same strand, except for a short conserved region in KS5 and KL3 containing DNA methylase genes (KS5 20 and KL3 28, discussed below) on the minus strand in KS5 and on the plus strand in KL3 (Tables 1 and 3 (Table 4). In addition, KS5 gp8 and KL3 gp9 are similar to Ogr (transcriptional activator), KS5 gp28 is similar to Old (phage immunity protein), KS14 gp17 is similar to G (tail fiber assembly protein) and KS14 gp26/gp25 and KL3 gp32/gp31 are similar to LysBC (Rz/Rz1-like lysis proteins, discussed below) ( Table 4). The percent identity of the similar proteins ranges from 25-64% in KS5, 24-64% in KS14 and 31-62% in KL3 ( Table 4).
The genes in common between P2 and the P2-like BCC phages are almost exclusively limited to structural genes involved in virion formation (Table 4). Other P2 genes, such as those involved in DNA replication, phage immunity, lysogeny and lysis are dissimilar among these phages. A similar pattern is observed (with some exceptions) following CoreGenes analysis of the P2-like phages ϕE202 of B. thailandensis and ϕ52237 and ϕE12-2 of B. pseudomallei (data not shown) [23]. A likely explanation for this pattern is that, while phage structural components predominantly interact with each other, components from other phage systems may interact with host-specific proteins (such as those involved in transcription and DNA replication) [31,32]. KS5, KS14 and KL3 appear to have retained P2 modules for the closely interacting capsid and tail proteins, while acquiring new modules for carrying out Burkholderia host-specific processes. These genes replace P2 genes at the right end of the P2 genome (the TO-region), P2 Z/fun (the Z-region) and P2 orf30 (Table 4) [33]. As it is very common for genes not found in P2 to be identified in these three regions in other P2-like phages, it is predicted that these loci contain genes that have been acquired via horizontal transfer [33].
Although a phage may show relatedness to a well-characterized phage such as P2, specific guidelines must be used to determine both the degree of relatedness of two phages and whether the novel phage can be classified as a "P2-like virus" in a strict taxonomic sense. Lavigne et al. proposed the use of the comparison program CoreGenes to aid in phage taxonomic analysis [34]. This program can be used to compare the proteomes of two or more phages [34]. If a phage shares at least 40% of its proteins (those with a BLASTP score ≥ 75) with a reference phage such as P2, then these two phages can be considered as part of the same genus, while if it shares 20-39% of its proteins with a reference phage, then they can be considered as part of the same subfamily [34]. When KS5, KS14 and KL3 were analyzed with CoreGenes using P2 as a reference genome, the percentages of proteins in common with respect to P2 were 51.16%, 53.49% and 53.49%, respectively. These are similar to the percentages for ϕE202 (55.81%), ϕ52237 (51.16%) and ϕE12-2 (48.84%) [23]. Based on these results, KS5, KS14 and KL3 can be classified as members of the Peduovirinae subfamily and "P2-like viruses" genus [23].
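The CoreGenes thresholds described above amount to a simple decision rule, sketched here in Python. The function name and rank labels are illustrative. The shared-protein counts in `inferred_counts` are not stated in the text: they are an inference from the observation that every reported percentage equals a whole number of proteins out of 43, and should be read as an assumption.

```python
def classify_by_shared_percent(pct):
    """Apply the Lavigne et al. cutoffs quoted in the text to a percentage
    of reference-phage proteins shared (BLASTP score >= 75)."""
    if pct >= 40.0:
        return "same genus"
    if pct >= 20.0:
        return "same subfamily"
    return "not assigned"


# Percentages reported in the text, relative to the P2 reference proteome.
reported_pct = {
    "KS5": 51.16, "KS14": 53.49, "KL3": 53.49,
    "phiE202": 55.81, "phi52237": 51.16, "phiE12-2": 48.84,
}
ranks = {name: classify_by_shared_percent(p) for name, p in reported_pct.items()}

# ASSUMPTION: counts out of a 43-protein reference, inferred from the percentages.
inferred_counts = {
    "KS5": 22, "KS14": 23, "KL3": 23,
    "phiE202": 24, "phi52237": 22, "phiE12-2": 21,
}
reproduced = {
    name: round(100.0 * n / 43, 2) == reported_pct[name]
    for name, n in inferred_counts.items()
}

for name, rank in ranks.items():
    print(name, reported_pct[name], rank)
```

All six phages clear the 40% genus cutoff, which matches the classification given in the text.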

Integration site characterization
In E. coli, P2 is able to integrate at over 10 different loci, but certain sites may be used more commonly than others [35]. None of the three P2-like BCC phages characterized here were found to integrate into a locus similar to that of P2. Only KL3 was found to have a previously characterized integration site. Following PCR amplification and sequencing from the B. cenocepacia CEP511 chromosome (where KL3 is carried as a prophage), it was determined that, like many phages, KL3 integrates into a tRNA gene. Specifically, it integrates into the middle of a threonine tRNA gene: bp 1 of the [36][37][38]. KL3 integration should not affect threonine tRNA synthesis as bp 1-45 of KL3 has an identical sequence to bp 32-76 of the tRNA gene. In both B. multivorans ATCC 17616 and B. cenocepacia C6433, KS5 integrates into the 3' end of an AMP nucleosidase gene. AMP nucleosidases convert AMP into adenine and ribose 5-phosphate [39]. This gene has not been previously identified as a phage integration site. KS5 bases 1-815 (including the integration site and the integrase gene sequence) show similarity to sequences encoding pairs of adjacent AMP nucleosidase and integrase genes in several Burkholderia genomes. For example, in B. pseudomallei K96243 chromosome 2, the AMP nucleosidase (BPSS1777) and integrase (BPSS1776) genes are adjacent to genes annotated as encoding a putative phage capsid related protein (fragment) (BPSS1775) and putative phage-related tail protein (fragment) (BPSS1774A). Similarly, in B. pseudomallei 1106a chromosome 2, the AMP nucleosidase (BURPS1106A_A2416) and integrase (BURPS1106A_A2415) genes are adjacent to genes annotated as encoding a phage portal domain protein (BURPS1106A_A2414) and phage tail completion protein (BURPS1106A_A2413). The identification of phage-related genes at this site in other Burkholderia genomes suggests that the AMP nucleosidase gene may be a conserved integration site among some Burkholderia-specific temperate phages.
KS14 is different from other P2-like phages in that it does not encode a tyrosine integrase. Most temperate phages use a tyrosine recombinase (or, in rare cases, a serine recombinase) to facilitate recombination between the phage attP site and the host attB site [40]. KS14 encodes a serine recombinase (gp6), but this protein is unlikely to mediate prophage integration for three reasons. First, gp6 is more closely related to invertases such as Mu Gin (49% identity, E-value: 8e-44) and P1 Cin (49% identity, E-value: 7e-43) than to integrases such as those from Streptomyces lividans phage ϕC31 (29% identity, E-value: 1.2) and Mycobacterium smegmatis phage Bxb1 (29% identity, E-value: 3e-4) [41][42][43][44]. Second, gp6 lacks the conserved cysteine-rich and leucine/isoleucine/valine/methionine-rich regions found in other serine integrases [45]. Third, gp6 is only 225 aa in length, which is substantially smaller than the serine integrases, which are typically 450-600 aa in length [45]. We did not believe KS14 to be obligately lytic because it encodes a putative repressor protein (gp5) and because previously collected KS14-resistant C6433 isolates were predicted to be lysogenized based on PCR-positivity with KS14-specific primers (Figure 4) [19].
Phages such as P1, P7 and N15 of enterobacteria, ϕ20 of Bacillus anthracis, ϕBB-1 of Borrelia burgdorferi, LE1 of Leptospira biflexa, pGIL01 of Bacillus thuringiensis and pKO2 of Klebsiella oxytoca lysogenize their hosts as plasmids [46][47][48][49][50][51][52][53]. Because KS14 gene 39 encodes a putative ParA protein (involved in partitioning in other plasmid prophages), we predicted that the KS14 prophage might exist as a plasmid [54,55]. To test this hypothesis, we used a standard protocol for the QIAprep Spin Miniprep plasmid isolation kit with cells of C6433 (a KS14 host), ATCC 17616 (a KS5 lysogen), CEP511 (a KL3 lysogen), K56-2 (a lysogen of KS10, a previously characterized BCC-specific phage) and five putatively lysogenized KS14-resistant C6433 isolates [19,56]. These preparations were then treated with EcoRI and the resulting fragments were separated using agarose gel electrophoresis. For each of the four control strains, no distinct bands were observed (Figure 5, left). In contrast, preparations from each of the five putatively lysogenized strains contained identical distinct bands (Figure 5, right). Furthermore, these bands were the same size as those predicted and observed for an EcoRI digest of KS14 DNA (with predictions based on a circular genome sequence) (Figure 5, far right) and sequences from selected bands matched the KS14 genome sequence. Based on these results, we predict that KS14 is a temperate phage that, in contrast to other P2-like phages, lysogenizes host strains as a plasmid.
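Predicting the band pattern of an EcoRI digest from a circular sequence, as was done for comparison with the gel, can be sketched as follows. This is an illustrative implementation rather than the tool used in the study; the function names and toy sequences are assumptions. EcoRI recognizes GAATTC and cuts after the first G. On a circular molecule, n cut sites yield n fragments, and a site that spans the sequence origin must still be detected.

```python
def cut_positions(seq, site="GAATTC", offset=1):
    """0-based cut positions on a circular sequence.

    offset=1 places the cut after the first base of the site (G^AATTC).
    """
    seq = seq.upper()
    n = len(seq)
    # Append the first len(site)-1 bases so sites spanning the origin are found.
    wrapped = seq + seq[:len(site) - 1]
    return sorted(
        {(i + offset) % n for i in range(n) if wrapped[i:i + len(site)] == site}
    )


def circular_fragments(seq, site="GAATTC", offset=1):
    """Fragment sizes (largest first) from a complete digest of a circle."""
    n = len(seq)
    cuts = cut_positions(seq, site, offset)
    if not cuts:
        return [n]  # uncut circle
    sizes = [
        (cuts[(k + 1) % len(cuts)] - cuts[k]) % n or n  # single cut: full length
        for k in range(len(cuts))
    ]
    return sorted(sizes, reverse=True)


# Toy circles (not KS14 sequence).
two_sites = "GAATTC" + "A" * 10 + "GAATTC" + "C" * 18
origin_site = "TTC" + "A" * 10 + "GAA"  # the only site spans the origin

print(circular_fragments(two_sites))
print(circular_fragments(origin_site))
```

The fragment sizes always sum to the genome length, which is a quick sanity check when comparing a predicted pattern against observed bands.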
It is important to note that, although one of these phages has been shown to be active in vivo, temperate phages are generally considered to be suboptimal for use in a phage therapy protocol [19,21]. In contrast to obligately lytic phages, temperate phages are associated with superinfection immunity, lysogenic conversion and specialized transduction [reviewed in [21]]. In a previous study, we have shown that temperate BCC-specific phages can be engineered to their lytic form by inactivating the repressor gene [21]. This strategy could potentially be used with the three phages described here, thus making them more appropriate candidates for clinical use.
Figure 4 Detection of lysogeny in KS14-resistant B. cenocepacia C6433 isolates [19]. Bacterial genomic DNA was amplified using KS14-specific primers.

Morphogenesis genes
As discussed above, the KS5, KS14 and KL3 structural genes are related to those from P2 and function to construct a P2-like myovirus with a contractile tail. The only virion morphogenesis genes of P2 that these phages lack are G (encoding the tail fiber assembly protein, missing in KS5 and KL3) and H (encoding the tail fiber protein) ( Table 4). Because the tail fibers are involved in host recognition, it is expected that these proteins would be dissimilar in phages infecting E. coli and those infecting the BCC.
A commonly identified characteristic in tailed phages is the expression of two tail proteins from a single start codon via a translational frameshift [57]. These proteins (encoded in a region between the genes for the tail tape measure and the major tail protein) share the same N-terminus but have different C-termini because translation bypasses the first stop codon by continuing in the -1 frame [57]. In P2, this -1 frameshift occurs at a TTTTTTG sequence and produces the 91 aa protein E and the 142 aa protein E+E' from the same translational start site (Figure 6) [57,58]. KS5, KS14 and KL3 all encode proteins similar to both E and E+E' with percent identities ranging from 49-59% (Table 4). Despite the relatively low degree of similarity, the P2 frameshift site appears to be conserved amongst these phages, suggesting that they likely use a similar frameshifting mechanism (Figure 6). In rare cases, RNA secondary structure can be identified downstream of the phage frameshift sequence [21,57]. When the KS5, KS14 and KL3 E+E' sequences 60 bp downstream of the TTTTTTG sequence were screened for secondary structure, no predicted hairpins were identified (data not shown). This result was anticipated based upon the absence of these structures in the P2 phage E+E' gene [57].
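How a -1 frameshift yields two proteins with a shared N-terminus can be illustrated with a short translation sketch. Two assumptions are made and should be read as such: the slip is modelled with the canonical X XXY YYZ heptamer geometry (the ribosome decodes through the heptamer in the zero frame, then re-reads its last base), and the ORF is a toy sequence, not a phage gene. The exact position of the P2 heptamer relative to the reading frame is not given in the text.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq, start=0):
    """Translate from `start` until the first stop codon or the end (ACGT only)."""
    protein = []
    for pos in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[pos:pos + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def frameshift_products(orf, slippery="TTTTTTG"):
    """Return (zero-frame product, -1 frameshift product or None)."""
    orf = orf.upper()
    zero_frame = translate(orf)
    h = orf.find(slippery)
    # Canonical geometry: the heptamer must end on a zero-frame codon boundary.
    if h == -1 or (h + 7) % 3 != 0:
        return zero_frame, None
    upstream = translate(orf[:h + 7])
    if len(upstream) != (h + 7) // 3:
        return zero_frame, None  # a stop codon precedes the slippery site
    # After the slip, the last base of the heptamer is read a second time.
    return zero_frame, upstream + translate(orf, h + 6)


# Toy ORF: ATG GCA GG[T TTT TTG] GAC TGA ... with the heptamer in brackets.
toy_orf = "ATGGCAGGTTTTTTGGACTGAAACGTTAA"
short_product, long_product = frameshift_products(toy_orf)
print(short_product, long_product)
```

In the toy example the two products are identical through the slippery site and diverge afterwards, with the shifted product extending past the zero-frame stop codon, which is the pattern described for E and E+E'.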

Lysis genes
In P2, the lysis module consists of five genes: Y (holin), K (endolysin), lysA (antiholin), lysB (Rz) and lysC (Rz1) [59,60]. The P2-like BCC phages are predicted to encode endolysins, holins and antiholins that are dissimilar to those of P2 (Table 4). KS5 gp33, KS14 gp27 and KL3 gp33 are putative endolysins as they all have the conserved domain pfam01471 (PG_binding_1, putative peptidoglycan binding domain; E-values: 3e-11, 3e-10 and 9e-10, respectively) and show similarity to other phage endolysins. P2 Y is a type I holin with three transmembrane domains [61]. Although KS5 34, KS14 28 and KL3 34 are dissimilar to P2 Y, it is predicted that these three genes encode holins because they are each immediately upstream of a putative endolysin gene and they each encode proteins that a) have three transmembrane domains based on OCTOPUS analysis and b) show similarity to other phage holins.
Antiholins such as P2 LysA inhibit holin activity and delay lysis of infected cells in order to optimize the phage burst size [59,62]. Although some phages such as λ express antiholins from a second translational start site two codons upstream of the holin start codon, phages such as P2 and ϕO1205 of Streptococcus thermophilus encode an antiholin from a separate gene [63,59,64]. The location of the putative antiholin genes KS5 35, KS14 29 and KL3 35 is similar to that in ϕO1205, in which the holin and antiholin genes are adjacent immediately upstream of the endolysin gene (as opposed to P2, in which gene K separates Y and lysA) [64,59]. Based on OCTOPUS analysis, KS5 gp35 has three transmembrane domains, while KS14 gp29, KL3 gp35 and P2 LysA have four. Based on gene organization and protein transmembrane structure, it is predicted that the P2-like BCC phages have separate antiholin genes in their lysis modules.
P2 encodes two proteins, LysB and LysC, that are predicted to function similarly to λ Rz and Rz1 [60]. Rz is an inner membrane protein with an N-terminal transmembrane domain and Rz1 is a proline-rich outer membrane lipoprotein [65]. Rz/Rz1 pairs fuse the inner and outer membranes following holin and endolysin activity and facilitate phage release [65]. The P2 lysC start codon is in the +1 frame within the lysB gene, while the lysC stop codon is out of frame in the downstream tail gene R [66]. In contrast, the Rz1 gene in λ is entirely contained within the Rz gene [67]. KS14 and KL3 LysBC pairs (gp26/gp25 and gp32/gp31, respectively) are similar to that of P2 (Table 4). In KS14 and KL3, the lysC genes start approximately 160 bp upstream from the lysB stop codon and extend into the first 8 bp of R (gene 24 in KS14 and 30 in KL3) (Figure 7). Both KS14 and KL3 LysC proteins are predicted to have a signal peptidase II cleavage site between positions 20 (alanine) and 21 (cysteine). Signal peptidase II cleavage would produce a 72 aa lipoprotein with 7 prolines (9.7% proline) for KS14 LysC and a 74 aa lipoprotein with 7 prolines (9.5% proline) for KL3 LysC.
In contrast to the P2-like lysBC gene organization found in KS14 and KL3, the KS5 genes 32/31 have a similar organization to λ Rz/Rz1. KS5 Rz1 is encoded in the +1 frame within the Rz gene (Figure 7). It is predicted to have a signal peptidase II cleavage site between positions 18 (alanine) and 19 (cysteine), which would produce a 46 aa lipoprotein with 12 prolines (26.1%). The differences in both gene organization and proline content between the P2-like KS14 and KL3 LysC proteins and the λ-like KS5 Rz1 protein suggest that KS5 may have acquired genes 31 and 32 (and potentially the entire lysis module) through horizontal transfer from a phage similar to λ.
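The proline percentages quoted for the mature lipoproteins follow directly from the residue counts, and the mature sequence itself is obtained by removing everything before the lipobox cysteine. The sketch below checks the reported arithmetic and shows the trimming step on a toy precursor; the function names and the toy sequence are assumptions, and no real LysC or Rz1 sequence is used.

```python
def proline_percent(n_pro, length):
    """Percent proline given a proline count and a protein length."""
    return 100.0 * n_pro / length


def mature_lipoprotein(precursor, cys_pos):
    """Sequence left after signal peptidase II cleavage.

    cys_pos is the 1-based position of the cysteine that becomes residue 1
    of the mature protein (cleavage occurs immediately before it).
    """
    if precursor[cys_pos - 1] != "C":
        raise ValueError("residue at cys_pos is not a cysteine")
    return precursor[cys_pos - 1:]


# (prolines, mature length in aa, percentage) as stated in the text.
reported = {
    "KS14 LysC": (7, 72, 9.7),
    "KL3 LysC": (7, 74, 9.5),
    "KS5 Rz1": (12, 46, 26.1),
}
checks = {
    name: round(proline_percent(n_pro, length), 1) == pct
    for name, (n_pro, length, pct) in reported.items()
}

# Toy precursor (hypothetical): alanine at position 6, cysteine at position 7.
toy_precursor = "MKTLLACPPSPQ"
toy_mature = mature_lipoprotein(toy_precursor, 7)
toy_pct = proline_percent(toy_mature.count("P"), len(toy_mature))
print(checks, toy_mature, toy_pct)
```

The KS5 value is nearly three times that of the KS14 and KL3 proteins, which is the contrast the text draws on when proposing a λ-like origin for the KS5 genes.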
Sequence elements unique to KS5 and/or KL3
Insertion sequences
Insertion sequences (ISs) are short genetic elements that can insert into nonhomologous regions of DNA [68]. These elements, composed of a transposase gene and inverted repeats, create flanking direct repeats following insertion [68]. Many mutants of well-characterized phages, including λ and Mu, have been found to carry ISs [69,70]. However, it is relatively rare for wild-type phages to carry ISs because they can interfere with gene expression [71]. Sakaguchi et al. sequenced the genome of the Clostridium botulinum phage c-st and found that it carries 12 ISs (5 of which are incomplete) [71]. Of the 284 genomes sequenced at the time, one IS was found in each of eight phages: Burkholderia phages ϕE125 and Bcep22, enterobacteria phages P1 and HK022, Lactobacillus phages ϕAT3 and LP65, Rhodothermus phage RM378 and Shigella phage Sf6 [71].
A novel insertion sequence (named ISBmu23 in vB_BmuZ-ATCC 17616) is found in the KS5 genome between gene 12, encoding a membrane protein and gene 13, encoding the tail protein D (Table 1). This IS does not appear to disrupt any putative ORFs and so may not have any significant effect on phage gene expression. ISBmu23 is 1210 bp in length and contains two imperfect 16 bp inverted repeats (Table 1, Figure 8). In KS5, it is flanked by two copies of a 5 bp direct repeat, CCTAA. ISBmu23 encodes a 330 aa transposase that has the conserved domain COG3039 (transposase and inactivated derivatives, IS5 family; E-value: 8e-29). This protein is most similar to the transposase of ISBmu2 (85% identity), an IS5-like IS present in nine copies in ATCC 17616 [72]. ISBmu2 and ISBmu23 are very similar as they a) are present in the same genome, b) are both 1210 bp in length, c) encode similar 330 aa transposases, d) have similar 16 bp inverted repeats (the right inverted repeats of ISBmu2 and ISBmu23 are identical, while the left repeats differ by 3 bp) and e) preferentially integrate into CTAA sequences (Figure 8). Ohtsubo et al. found that the transposition of ISs in ATCC 17616 increased when the cells were grown at high temperatures [72]. Because these temperatures are similar to what the cell may encounter during infection of an animal or human, it is suggested that this change may provide a selective advantage to ATCC 17616 by modifying its genome under in vivo conditions [72]. Further experiments are required to determine if ISBmu23 transposition is affected by temperature and if this IS may provide a selective advantage to KS5 lysogens in vivo.
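The two structural hallmarks used above to describe the IS (imperfect terminal inverted repeats and identical flanking direct repeats) can both be checked with simple string operations. The sketch below is illustrative only: the helper names are hypothetical, the 16 bp repeat and filler are invented toy sequences rather than ISBmu23 sequence, and only the CCTAA direct repeat is taken from the text.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an ACGT sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def inverted_repeat_mismatches(element, ir_len=16):
    """Mismatches between the left end and the reverse complement of the right end."""
    left = element[:ir_len].upper()
    right_rc = revcomp(element[-ir_len:])
    return sum(a != b for a, b in zip(left, right_rc))


def flanking_direct_repeat(genome, start, end, dr_len=5):
    """Return the direct repeat if the dr_len bases on each side of the
    element (0-based, half-open [start, end)) are identical, else None."""
    if start < dr_len or end + dr_len > len(genome):
        return None
    left = genome[start - dr_len:start].upper()
    right = genome[end:end + dr_len].upper()
    return left if left == right else None


# Toy element: a made-up 16 bp left repeat, filler, and a right repeat that is
# the reverse complement of the left one with a single base changed.
left_ir = "GGAAGGTGCGAATAAG"
perfect_right = revcomp(left_ir)
swap = "A" if perfect_right[3] != "A" else "C"
right_ir = perfect_right[:3] + swap + perfect_right[4:]
toy_element = left_ir + "ACGTACGTAC" + right_ir

# Toy genome context with the CCTAA direct repeat reported for KS5.
toy_genome = "TTGAC" + "CCTAA" + toy_element + "CCTAA" + "GATTC"
start = 10
end = start + len(toy_element)

print(inverted_repeat_mismatches(toy_element))
print(flanking_direct_repeat(toy_genome, start, end))
```

A mismatch count above zero corresponds to what the text calls imperfect inverted repeats; counting mismatches in this way is also how a 3 bp difference between two related left repeats could be quantified.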

Reverse transcriptases
Reverse transcriptases (RTs) are RNA-dependent DNA polymerases most commonly associated with retroviruses and retrotransposons [73]. RTs have also been identified in several phage genomes, including those of P2-like phages [74][75][76]. One function of these proteins was extensively characterized in Bordetella bronchiseptica phage BPP-1. This phage has the ability to change its host range by making amino acid substitutions in its tail fiber protein, Mtd (major tropism determinant) [77]. This switch requires the phage-encoded RT Brt (Bordetella RT) that synthesizes a DNA copy of a 134 bp locus (the template repeat, TR) that has 90% identity with a 134 bp region of the mtd gene (the variable repeat, VR) [77,74]. Adenines in the reverse transcribed copy of TR are mutagenized and the altered DNA integrates or recombines at VR by an unknown mechanism, generating a tail fiber gene with multiple base substitutions [74,75].
A second function associated with phage RTs is phage exclusion. In Lactococcus lactis, expression of the putative RT AbiK lowers the efficiency of plating of infecting phages by an unknown mechanism (potentially involving single-strand annealing recombinases) [78]. Expression of Orf570, an RT identified in the P2-like enterobacteria prophage P2-EC30, was found to inhibit T5 infection of E. coli [76]. When a region of Orf570 containing an RT conserved motif was deleted, T5 infection was no longer inhibited [76].
KS5 encodes a putative RT, gp44. This protein has the conserved domain cd03487 (RT_Bac_retron_II, reverse transcriptases in bacterial retrotransposons or retrons; E-value: 2e-45). It is unlikely that gp44 and Brt have the same function: the two proteins show minimal similarity (21% identity, E-value: 7e-4), gene 44 is located distal to the tail fiber gene (in contrast to brt and mtd), neither nucleotide substitutions in the tail fiber gene nor variations in KS5 tropism were observed and no repeated sequences were identified in the KS5 genome longer than 28 bp [77]. When compared to Orf570, gp44 shows almost no relatedness (41% over 12/546 amino acids; E-value: 2.7) but is found at the same locus (in the prophage, both orf570 and 44 would be located proximal to the portal protein gene Q). Further experiments are required to determine if the KS5 RT is involved in tropism modification, phage exclusion or some uncharacterized function.

DNA methylation, restriction and repair
DNA methylase and endonuclease genes are commonly found in phage genomes. Methylases modify the DNA such that it becomes resistant to bacterial restriction systems [79]. Although P2 does not encode any putative methylases, such proteins are encoded by both KS5 and KL3 (KS5 gp20 and KL3 gp28 and gp47) (Tables 1 and 3). All three methylases are predicted to belong to the AdoMet_MTase superfamily (cl12011; S-adenosylmethionine-dependent methyltransferases). KS5 gp20 is most similar to a DNA methylase N-4/N-6 domain protein of B. ambifaria MEX-5 (89% identity). KL3 gp28 is most similar to a site-specific DNA methyltransferase of B. pseudomallei K96243 (78% identity). Both of these proteins have the conserved domain pfam01555 (N6_N4_Mtase, DNA methylase; KS5 gp20 E-value: 5e-22, KL3 gp28 E-value: 4e-25). Because this domain is associated with both N-4 cytosine and N-6 adenine methylases, these proteins may have either cytosine or adenine methylase activity [80]. KL3 gp47 shows similarity to a modification methylase EcoRII from several bacterial species, with E-values as low as 4e-114. This protein has the conserved domain cd00315 (Cyt_C5_DNA_methylase, Cytosine-C5 specific DNA methylases; E-value: 6e-68) and so can be classified as a cytosine-C5 methylase. KS5 gp20 and KL3 gp28 are likely involved in protecting the phage DNA from BCC restriction systems. As discussed below, the function of KL3 gp47 is likely to protect the phage DNA from a phage-encoded restriction enzyme.
Phage nucleases have a number of functions, including degradation of the bacterial DNA (to both inhibit the host and provide nucleotides for the phage), phage exclusion and DNA processing [81]. KL3 encodes two endonucleases, gp45 and gp46. Gp45 is most similar to a type II restriction endonuclease, EcoRII-C domain protein of Candidatus Hamiltonella defensa 5AT (Acyrthosiphon pisum) (77% identity). This protein has the conserved domain pfam09019 (EcoRII-C, EcoRII C terminal; E-value: 6e-65). Gp46 is most similar to a DNA mismatch endonuclease Vsr of Burkholderia graminis C4D1M (77% identity). This protein has the conserved domain cd00221 (Vsr, Very Short Patch Repair [Vsr] endonuclease; E-value: 9e-38).
The organization of genes 45-47 (encoding an EcoRII-C endonuclease, Vsr endonuclease and EcoRII methylase, respectively) in a single module suggests that the proteins that they encode are functionally related. The EcoRII-C endonuclease (which has a CCWGG recognition sequence where W = A or T) is likely to degrade either bacterial DNA, to inhibit the host during the KL3 lytic cycle, or superinfecting phage DNA [81,82]. KL3 DNA would be protected from this cleavage by EcoRII methylation at the second position in the EcoRII-C recognition sequence (forming CCᵐWGG where Cᵐ = 5-methylcytosine) [83]. Expression of the Dcm methylase, which has the same recognition sequence and methylation site as EcoRII methylase, is mutagenic in E. coli because 5-methylcytosines are deaminated to thymines, causing T/G mismatches [84,85]. EcoRII methylase expression would presumably cause mismatched sites in KL3 with the sequence C(T/G)WGG. In E. coli, these mismatches are repaired by very short patch (VSP) repair, which starts with the recognition and nicking of the sequence C(T/G)WGG by a Vsr endonuclease [86]. As KL3 expresses a Vsr endonuclease, it could repair post-methylation T/G mismatches using the same mechanism.
The proposed model for methylase and endonuclease interaction during the KL3 lytic cycle is as follows. Unmethylated host DNA (or, alternatively, superinfecting phage DNA) is degraded by gp45. KL3 DNA is protected from gp45 degradation by gp47-mediated conversion of cytosine to 5-methylcytosine. These 5-methylcytosine bases are deaminated to thymine, but the resulting T/G mismatches are cleaved by gp46 and fixed using VSP repair. Although further experiments are required to test the validity of this model, KL3 appears to encode an elegant system for degradation of bacterial or superinfecting phage DNA, protection of the phage genome and repair of resulting mutations.
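The sequence logic behind this model can be illustrated with a minimal sketch. It assumes only what is stated above: the recognition sequence CCWGG (W = A or T), methylation of the second cytosine, and deamination of that 5-methylcytosine to thymine. The function names and the toy sequence are hypothetical and are not part of the analysis performed in this study.

```python
import re

# Recognition sequence shared by EcoRII-C, EcoRII methylase and Dcm: CCWGG, W = A or T.
ECORII_SITE = re.compile(r"CC[AT]GG")


def find_ecorii_sites(seq: str) -> list[int]:
    """Return 0-based start positions of every CCWGG site on the given strand."""
    return [m.start() for m in ECORII_SITE.finditer(seq.upper())]


def deaminated_strand(site: str) -> str:
    """Strand sequence after the second (5-methyl)cytosine is deaminated to thymine.

    CCWGG becomes CTWGG. The opposite strand still carries G at that position,
    so the duplex holds the T/G mismatch written as C(T/G)WGG in the text.
    """
    if not ECORII_SITE.fullmatch(site.upper()):
        raise ValueError("not a CCWGG site")
    return site[0] + "T" + site[2:]


# Toy sequence (illustrative only) containing one CCAGG and one CCTGG site.
toy = "AACCAGGTTCCTGGA"
sites = find_ecorii_sites(toy)
products = [deaminated_strand(toy[i:i + 5]) for i in sites]
```

Scanning a real genome in this way would only list candidate sites; whether a given site is cut, methylated or repaired in vivo is exactly what the model leaves to be tested experimentally.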

Conclusions
This study is the first to identify and characterize P2-like phages of the BCC. Like other previously characterized P2-like Burkholderia phages, KS5, KS14 and KL3 share structural genes with P2 but encode dissimilar accessory proteins. KS5, a 37236 bp prophage of B. multivorans ATCC 17616, integrates into an AMP nucleosidase gene, has a λ-like Rz/Rz1 cassette, carries an ISBmu2-like insertion sequence and encodes a reverse transcriptase. KS14, a 32317 bp phage previously shown to be active against B. cenocepacia in vivo, encodes a serine recombinase but is maintained as a plasmid prophage [19]. KL3, a 40555 bp prophage of B. cenocepacia CEP511, integrates into a threonine tRNA gene and encodes a series of proteins capable of degrading bacterial or superinfecting phage DNA, methylating the phage genome and repairing methylation-induced mismatches. As KS14 has already been shown to be active in vivo, characterization of these three related phages is an important preliminary step in the development of a phage therapy protocol for the BCC.

Methods
Bacterial strains and growth conditions
BCC strains used for phage isolation and propagation were obtained from the Belgium Coordinated Collection of Microorganisms LMG Bacteria Collection (Ghent, Belgium) and the Canadian Burkholderia cepacia complex Research and Referral Repository (Vancouver, BC). Many of the strains used are from the Burkholderia cepacia complex experimental strain panel and updated experimental strain panel [29,87]. Strains were grown aerobically overnight at 30°C on half-strength Luria-Bertani (LB) solid medium or in half-strength LB broth with shaking. Transformations were performed with chemically competent DH5α (Invitrogen, Carlsbad, CA), plated on LB solid medium containing 100 μg/ml ampicillin and grown aerobically overnight at 37°C. Strains were stored in LB broth containing 20% glycerol at -80°C.

Electron microscopy
To prepare samples for transmission electron microscopy, phage lysates were filter sterilized using a Millex-HA 0.45 μm syringe-driven filter unit (Millipore, Billerica, MA), incubated on a carbon-coated copper grid for 5 minutes at room temperature and stained with 2% phosphotungstic acid for 2 minutes. Micrographs were taken with the assistance of the University of Alberta Department of Biological Sciences Advanced Microscopy Facility using a Philips/FEI (Morgagni) transmission electron microscope with charge-coupled device camera at 140,000-fold magnification.

Phage isolation, propagation and DNA isolation
Isolation of KS5 from onion soil and KS14 from Dracaena sp. soil has been described previously [25,19]. KL3 was isolated from a single plaque on a lawn of B. cenocepacia CEP511. The plaque was isolated using a sterile Pasteur pipette, suspended in 1 ml of suspension medium (50 mM Tris-HCl [pH 7.5], 100 mM NaCl, 10 mM MgSO₄, 0.01% gelatin solution) with 20 μl CHCl₃ and incubated 1 hour at room temperature to generate a KL3 stock. KL3 was propagated on B. ambifaria LMG 17828 in soft agar overlays: 100 μl of phage stock and 100 μl of liquid culture were incubated 20 minutes at room temperature and 3 ml of soft nutrient agar was added to this mixture, poured onto half-strength LB solid medium and incubated overnight at 30°C. Phage genomic DNA was isolated using a modified version of a λ proteinase K/SDS lysis protocol [88]. Half-strength LB agarose plates (prepared with soft nutrient agarose) showing confluent phage lysis were overlaid with 3 ml of suspension medium and incubated for 6 hours at 4°C on a platform rocker. The lysate was pelleted by centrifugation at 10 000 × g for 2 minutes and filter-sterilized using a 0.45 μm filter. 10 ml of lysate was treated with 10 μl DNase I/10 μl DNase I buffer and 6 μl RNase I (Fermentas, Burlington, ON) and incubated 1 hour at 37°C. Following addition of 0.5 M EDTA (pH 8.0) to 20 mM, proteinase K to 50 μg/ml and SDS to 0.5%, the solution was mixed and incubated 1 hour at 37°C. Standard phenol:chloroform extraction and ethanol precipitation were then used to purify the phage DNA. Samples were resuspended in TE (pH 8.0) and quantified using a NanoDrop ND-1000 spectrophotometer (Thermo Scientific, Waltham, MA).
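As a worked illustration of the EDTA addition step, the following sketch computes the volume of 0.5 M stock needed to bring 10 ml of lysate to 20 mM. It is a generic dilution calculation, not a record of the volumes actually pipetted; the function name is hypothetical, and the small volumes of DNase and RNase already added are ignored. Stock concentrations for proteinase K and SDS are not stated in the protocol, so they are not computed here.

```python
def stock_volume_ml(sample_ml: float, stock_conc: float, final_conc: float) -> float:
    """Volume of stock (ml) to add so that the mixture reaches final_conc.

    Solves stock_conc * v = final_conc * (sample_ml + v), i.e. the added volume
    itself is counted in the final volume. Both concentrations must use the same unit.
    """
    if not 0 < final_conc < stock_conc:
        raise ValueError("final concentration must be positive and below the stock concentration")
    return final_conc * sample_ml / (stock_conc - final_conc)


# 10 ml lysate, 0.5 M (500 mM) EDTA stock, 20 mM target.
edta_ml = stock_volume_ml(sample_ml=10.0, stock_conc=500.0, final_conc=20.0)
```

Under these assumptions the result is roughly 0.42 ml; the common shortcut of ignoring the added volume (20/500 × 10 ml) gives 0.40 ml, a difference that is unlikely to matter for nuclease inactivation.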

Sequencing and bioinformatics analysis
Preliminary sequence analysis was performed using a shotgun cloning protocol. Phage DNA was digested using EcoRI (Invitrogen), separated on 0.8% (wt/vol) agarose gels, purified using the GeneClean II kit (Qbiogene, Irvine, CA), ligated into pUC19 or pGEM-7Z and transformed into DH5α (Invitrogen). Following blue-white selection on LB solid medium containing 100 μg/ml ampicillin, constructs with phage DNA inserts were isolated using a QIAprep Spin Miniprep kit (Qiagen), digested using EcoRI and viewed using gel electrophoresis. Inserts were sequenced with the assistance of the University of Alberta Department of Biological Sciences Molecular Biology Service Unit using an ABI 3730 DNA analyzer (Applied Biosystems, Foster City, CA). Sequences were edited using EditView and aligned using AutoAssembler (Perkin-Elmer, Waltham, MA). For completion of the three genomes, DNA samples were submitted for pyrosequencing analysis (454 Life Sciences, Branford, CT). Gaps between the assembled sequences were filled following PCR amplification and cloning using primers (Sigma-Genosys) designed to amplify across the gaps, TopTaq DNA polymerase and buffers (Qiagen) and the CloneJET PCR cloning kit (Fermentas). The complete genome sequences of KS5, KS14 and KL3 were deposited in GenBank with the accession numbers GU911303, HM461982 and GU911304, respectively.
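Once a genome sequence is assembled, the EcoRI fragment pattern expected on a gel can be predicted in silico and compared with the digests described above. The following is a minimal sketch of such a check and was not part of the published workflow. It assumes a linear molecule, uses the EcoRI site GAATTC with cleavage after the first G on the top strand, and runs on a toy sequence rather than any of the deposited genomes; the function name is hypothetical.

```python
ECORI_SITE = "GAATTC"
CUT_OFFSET = 1  # EcoRI cleaves between G and AATTC on the top strand


def ecori_fragment_lengths(seq: str) -> list[int]:
    """Fragment lengths (bp, largest first) from a complete EcoRI digest of a linear sequence."""
    seq = seq.upper()
    cuts = []
    pos = seq.find(ECORI_SITE)
    while pos != -1:
        cuts.append(pos + CUT_OFFSET)
        pos = seq.find(ECORI_SITE, pos + 1)
    # Fragment boundaries: both ends of the molecule plus every cut position.
    bounds = [0] + cuts + [len(seq)]
    return sorted((b - a for a, b in zip(bounds, bounds[1:])), reverse=True)


# Toy 22 bp sequence with two EcoRI sites (illustrative only).
toy = "AAA" + "GAATTC" + "TTTTT" + "GAATTC" + "CC"
fragments = ecori_fragment_lengths(toy)
```

For a circular molecule, such as a plasmid prophage, the first and last fragments would be joined into one, so the linear assumption should be revisited before comparing predictions with a gel.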
To identify the KS5 prophage insertion site in ATCC 17616, the assembled KS5 sequence was compared to the vB_BmuZ-ATCC 17616 sequence in a BLASTN search and the left prophage junction was determined.