UFV-P2 as a member of the Luz24likevirus genus: a new overview on comparative functional genome analyses of the LUZ24-like phages

Background: Phages infecting spoilage microorganisms have been considered as alternative biocontrol agents, and the study of their genomes is essential to their safe use in foods. UFV-P2 is a new Pseudomonas fluorescens-specific phage that has been tested for its ability to inhibit milk proteolysis. Results: The genome of the phage UFV-P2 is composed of bidirectional modules and presents 75 functionally predicted ORFs, forming clusters of early and late transcription. Further genomic comparisons of Pseudomonas-specific phages showed that these viruses could be classified according to conserved segments that appear to be free from genome rearrangements, called locally collinear blocks (LCBs). In addition, the genome organization of the phage UFV-P2 was shown to be similar to that of phages PaP3 and LUZ24, which have recently been classified in the Luz24likevirus genus. Conclusions: We have presented the functional annotation of UFV-P2, a new Pseudomonas fluorescens phage. Based on structural genomic comparison and phylogenetic clustering, we suggest the classification of UFV-P2 in the Luz24likevirus genus, and present a set of shared locally collinear blocks as the genomic signature for this genus.

Pseudomonas fluorescens bacteriophage UFV-P2 [5] is a virus with a high ability to reduce casein proteolysis in milk. Milk proteolysis is caused by thermo-resistant enzymes produced by psychrotrophs and is responsible for serious losses in the dairy industry, because it lowers the quality and shortens the shelf life of dairy products. In this environment, Pseudomonas spp. are prevalent contaminants [6][7][8], mainly P. fluorescens [9,10]. The use of phages in biocontrol has been suggested as an alternative to the use of chemicals. For example, P. fluorescens-specific phages have been studied to control Pseudomonas populations and as sanitation agents to efficiently remove bacterial biofilms from stainless steel surfaces similar to those used in food industries, where these contaminants are common [11][12][13]. However, they must be used with caution. In addition to studies of proteolysis reduction, biofilm inhibition, and host range, it is necessary to understand the genome and proteome of a phage before it can be used as a biocontrol agent.
To expand our understanding of the P. fluorescens-specific phage UFV-P2, we present a detailed analysis of its structural genome and its comparison to other phage genomes.

Sampling
The phage UFV-P2 was isolated from wastewater of a dairy industry in Minas Gerais, Brazil, and propagated at 30°C in LB medium in a strain of P. fluorescens 07A, courtesy of Laboratory of Food Microbiol, located at the Federal University of Viçosa, Brazil.

Genome extraction and composition
Phages were propagated in LB medium containing the bacteria in exponential phase. After incubation at 30°C for 8 h, particle assembly was induced with mitomycin and the virions were recovered by centrifugation and filtration. Phage suspensions were incubated with 75 μg/mL of proteinase K in the presence of 0.01% SDS at 56°C for 90 min. Proteins were removed by sequential extraction with phenol, phenol:chloroform (1:1), and chloroform. Genetic material was concentrated with an equal volume of isopropanol and resuspended in 30 μL of distilled water. For analysis of viral genome composition, 5 μL of the genomic extracts were digested with DNase I (50 μg/mL) or RNase A (100 μg/mL) for 60 min at 37°C, followed by 1% agarose gel electrophoresis and visualization by staining with GelRed (Biotium, USA).
Genomic DNA sequencing and assembly
The UFV-P2 genome was sequenced using an Illumina Genome Analyzer II by CD Genomics (New York, USA) and was assembled and analyzed using CLC Genomics Workbench version 5.1 (CLC bio, Cambridge, MA, USA). The sequence reads were assembled into contigs using stringent parameters, in which 90% of each read had to cover the other read with 90% identity. The data are available in the GenBank database under accession number JX863101.
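The stringency criterion described above can be illustrated with a minimal sketch. This is not the CLC Genomics Workbench implementation; the function name and its arguments are hypothetical, and it assumes that "90% of each read" is measured as aligned length over read length and "90% identity" as matching positions over aligned length.

```python
def passes_overlap_filter(read_length, aligned_length, matches,
                          min_length_fraction=0.9, min_identity=0.9):
    """Return True if an overlap satisfies both stringency thresholds.

    read_length    -- length of the read in bases
    aligned_length -- number of read bases included in the overlap
    matches        -- number of identical positions within the overlap
    (All names are illustrative assumptions, not CLC parameters.)
    """
    if aligned_length == 0:
        return False
    length_fraction = aligned_length / read_length
    identity = matches / aligned_length
    return length_fraction >= min_length_fraction and identity >= min_identity


# A 100-base read with 95 aligned bases, 90 of them identical, is accepted;
# the same read with only 80 aligned bases is rejected.
accepted = passes_overlap_filter(100, 95, 90)
rejected = passes_overlap_filter(100, 80, 80)
```

Both thresholds must hold at once, so a long but divergent overlap fails just as a short but perfect one does.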
For comparisons at the genomic level, EMBOSS Stretcher [20] and progressiveMauve [21] were employed, while at the proteomic level we used CoreGenes [22,23]. Seventeen reference genome sequences of phages were downloaded from GenBank (Table 1) and compared to the UFV-P2 genome.

Phylogenetic clustering
To place the phage UFV-P2 in an evolutionary context, a phylogenetic hypothesis was inferred by Bayesian inference (BI) using MrBayes v3.2.2 [24]. The genomic sequences of the phages were aligned using ClustalW [25], and a pairwise distance matrix was calculated in MEGA version 5 [26] (Table 1). The alignment was manually inspected, and the sites with gaps were excluded. To expedite the construction of phylogenetic trees, a model of nucleotide substitution was estimated using the jModelTest 2 program [27]. The GTR + G substitution model was selected as the best DNA evolution model for the genomic sequences, according to the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC).
The BI phylogenetic tree was calculated using the Bayesian Markov Chain Monte Carlo (MCMC) method, in two runs of 5,000,000 generations. The convergence of the parameters was analyzed in TRACER v1.5.0 (http://beast.bio.ed.ac.uk/tracer), and the chains reached a stationary distribution after 50,000 generations. Accordingly, the first 1% of the generated trees were discarded as burn-in before the consensus tree was built. To root the phylogenetic tree, the Enterobacteria phage T7 (NC_001604) was selected as the outgroup taxon.
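The relation between the stationary point and the discarded fraction can be checked with a short sketch. It is an illustration only and does not reproduce MrBayes behaviour; the sampling interval of 1,000 generations is a hypothetical value chosen for the example, since the sampling frequency is not stated here.

```python
def burnin_fraction(stationary_generation, total_generations):
    """Fraction of the chain that precedes the stationary phase."""
    return stationary_generation / total_generations


def discard_burnin(samples, burnin_generation):
    """Keep (generation, value) samples recorded after the burn-in point."""
    return [(g, v) for g, v in samples if g > burnin_generation]


# 50,000 of 5,000,000 generations corresponds to a burn-in of 1%.
fraction = burnin_fraction(50_000, 5_000_000)

# Hypothetical chain sampled every 1,000 generations (assumption).
samples = [(generation, None) for generation in range(0, 5_000_001, 1000)]
kept = discard_burnin(samples, 50_000)
```

Only the samples in `kept` would contribute to a consensus tree under this scheme.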

Results and Discussion
Transmission electron microscopy of the UFV-P2 virions (data not shown) showed that this virus has isometric capsids and very short tails, with morphological similarity to the P. aeruginosa phages PaP3 and MR299-2. Thus, UFV-P2 can be placed in the Podoviridae family, order Caudovirales.

Functional genomic organization
The viral genome was extracted and sequenced. The phage UFV-P2 has a linear 45,517 bp DNA genome with a GC content of 51.5%, and was sequenced at 30,655-fold coverage. One of the interesting characteristics of members of the Luz24likevirus genus is the presence of localized single-stranded breaks associated with the consensus sequence TACTRTGMC [28]. Fourteen of these sequences were found in the top strand of the tf DNA, while the genome of UFV-P2 contains 15.
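As a minimal sketch of how occurrences of a degenerate consensus such as TACTRTGMC can be counted, the IUPAC codes R (A or G) and M (A or C) can be expanded into a regular expression and searched on either strand. The helper names and the toy sequence below are assumptions made for illustration; this is not the procedure used to obtain the counts reported above.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "M": "[AC]", "K": "[GT]",
    "S": "[CG]", "W": "[AT]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_content(seq):
    """Return the GC content of a sequence as a percentage."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def find_motif(seq, motif="TACTRTGMC"):
    """Return 0-based start positions of an IUPAC motif on the given strand."""
    pattern = "".join(IUPAC[base] for base in motif)
    # A lookahead is used so that overlapping occurrences are also reported.
    return [m.start() for m in re.finditer(f"(?={pattern})", seq.upper())]


# Toy sequence (not UFV-P2): two sites on the top strand, one on the bottom.
toy = "GGTACTATGACCCTTACTGTGCCAAGTCATAGTAGG"
top_hits = find_motif(toy)                         # [2, 14]
bottom_hits = find_motif(reverse_complement(toy))  # [2]
```

Positions on the bottom strand are given in reverse-complement coordinates; a genome-wide count would apply the same two calls to the full sequence deposited under JX863101.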
Initial bioinformatics analyses showed that the UFV-P2 genome has a bidirectional organization with 92 predicted open reading frames (ORFs) larger than 100 bp, but only 41 ORFs (44.57%) could be identified as coding sequences (CDS) by similarity searches against known proteins in the GenBank and UniProt databases [5]. Here, we propose a new annotation of the genome of this virus based on different tools, which were able to functionally predict 75 ORFs, also bidirectionally oriented and forming clusters of early and late transcription (Figure 1 and Table 2).
The search for consensus sequences of transcriptional promoters revealed seven promoters. Five are in the positive strand and initiate the transcription of ORFs that encode early proteins, a common feature of viral genomes that need bacterial transcription factors to start their infection cycle. The other two promoters are located in the late gene modules. These genes are usually transcribed by viral transcription factors.
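To make the idea of bidirectional ORF prediction with a 100 bp length cutoff concrete, the following sketch scans the six reading frames for ATG-to-stop segments. It is a deliberately naive illustration under stated assumptions (ATG as the only start codon, the three standard stop codons, no scoring of coding potential) and is not one of the annotation tools used in this work.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def orfs_on_strand(seq, min_len=100):
    """Yield (start, end) of ATG..stop ORFs longer than min_len bp.

    Coordinates are 0-based and half-open, relative to the strand given.
    """
    n = len(seq)
    for frame in range(3):
        start = None
        for i in range(frame, n - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                     # first start codon opens an ORF
            elif start is not None and codon in STOPS:
                if i + 3 - start > min_len:   # keep ORFs larger than the cutoff
                    yield (start, i + 3)
                start = None


def find_orfs(seq, min_len=100):
    """Return sorted (start, end, strand) tuples in top-strand coordinates."""
    seq = seq.upper()
    n = len(seq)
    found = [(s, e, "+") for s, e in orfs_on_strand(seq, min_len)]
    for s, e in orfs_on_strand(reverse_complement(seq), min_len):
        found.append((n - e, n - s, "-"))     # map back to the top strand
    return sorted(found)


# Toy example: a single 126 bp ORF on the positive strand.
toy = "ATG" + "GCA" * 40 + "TAA"
toy_orfs = find_orfs(toy)
```

Reporting the strand alongside the coordinates is what allows predicted ORFs to be grouped into positive-strand and negative-strand clusters.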
Three rho-independent transcription terminators were predicted using ARNold, one in the positive and two in the negative strand ( Figure 1). A bidirectional termination region was found in the region from 25,922 to 25,964. Interestingly, this pattern of termination is also found in the genomes of the phages PaP3 [1] and LUZ24 [2]. The last terminator sequence is located at the terminal end of the gene encoding the major head protein. The low number of sequences of rho-independent terminators compared to the number of predicted ORFs may be due to the existence of other types of terminators or the presence of transcriptional modules and the generation of polycistronic mRNAs, a very common feature of viral genomes.
The predicted UFV-P2 genes were functionally classified according to their promoters, predicted order of transcription, and annotated functions.

Nucleotide biosynthesis and DNA replication (positive-stranded ORFs)
Fifty-five genes (ORFs 01-55) involved in nucleotide biosynthesis and viral replication were found in the positive strand of the UFV-P2 genome and named early genes (Figure 1). Among the viral replication genes, ORF31 encodes a primase/helicase; ORF44, a DNA-binding protein; ORF48, a 5′-3′ exonuclease; ORF50, a putative endonuclease; and ORFs 32 and 43 encode the two exons of the viral DNA polymerase, between which there is an ORF encoding a putative holin with three transmembrane domains similar to those from the phages tf and LUZ24. Holins are small membrane proteins that accumulate in the membrane until, at a specific time that is "programmed" into the holin, the membrane suddenly becomes permeabilized to the fully folded endolysin [29]. In addition, the UFV-P2 genome contains two endonucleases encoded by ORF24 and ORF50. The first is an HNH endonuclease, a group I homing endonuclease, which may be related to the presence of introns in the UFV-P2 genome [30], such as the one between the two parts of the DNA polymerase gene. Other enzymes predicted in the positive strand include ORFs 23, 25 and 28, which encode, respectively, an amidoligase, a glutamine amidotransferase and an ATP-grasp enzyme. The other 45 proteins of the early genes module are hypothetical proteins.
Virion assembly and host lysis (negative-stranded ORFs)
Twenty genes (ORFs 56-75) related to the composition and assembly of the viral particle, DNA packaging, and host lysis were found in the negative strand of the UFV-P2 genome and named late genes (Figure 1). Two transcriptional modules were found based on the predicted terminators. The first comprises ORFs 75-69, and the second corresponds to ORFs 75-56.
In the first module, ORF75 and ORF73 encode the small and large terminase subunits, respectively. The terminase is the motor component that drives the translocation of viral genomic DNA into the capsid during packaging via ATP hydrolysis. There is an ongoing discussion about the role of terminase structure in determining the points for cleavage of the viral DNA, which would influence the entire viral genome organization [31]. Recently, Shen and coworkers [32] functionally identified the two genes encoding the PaP3 terminase subunits, located in ORFs 1 and 3, respectively, which have high sequence similarity with ORFs 75 and 73 of the UFV-P2 genome. The PaP3 genome has been annotated with its transcriptional gene clusters in the opposite orientation relative to the UFV-P2 genome, which explains the difference in the numbering of similar ORFs. The same occurred in the earlier annotation of phage UFV-P2 [5], which is revised in this work to correspond to the annotation of phage LUZ24, which represents the genus.
ORF72 encodes the portal protein; ORF69 encodes the major head protein; and ORF70 encodes a scaffolding protein, which is a chaperone possibly related to viral particle assembly. In the second module, in addition to the ORFs of the first, ORFs 57-61, 64 and 67 encode particle/structural proteins; ORF65 encodes the tail fiber protein; and the other six ORFs encode hypothetical proteins.
ORF74 encodes a lysozyme that is used in the process of host cell breakage through the lysis of the peptidoglycan layer. The occurrence of a lysin, not associated with its cognate holin, is unusual but also found in other members of the Luz24likevirus genus.

Structural genomic comparisons and evolutionary clustering
Pairwise genomic comparisons have been a useful approach for genotyping and classification of viruses such as Circoviridae [33] and Geminiviridae [34]. The alignment of phage genomic sequences and pairwise comparisons revealed that vb_PaeP_p2-10_Or1, vb_PaeP_C1-14_Or, LUZ24, PaP4, PaP3, MR299-2 and tf are the phages most closely related to UFV-P2. The genomic sequences of these phages presented an identity to the UFV-P2 genome ranging from 49.5% to 57.5% (see Table 1).
The structural genomic comparisons in Mauve showed that these phages share a set of conserved locally collinear blocks (LCBs) (Figure 2 and Additional file 1: Figure S2). LCBs are conserved segments that appear to be free from genome rearrangements, since the orthologous regions of genomes can be reordered or inverted by recombination processes [21]. In addition, a specific comparison between UFV-P2 and LUZ24 showed collinearity across their genomes (Figures 2 and 3).
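The pairwise identity values discussed above are derived from global alignments. As a minimal sketch, percent identity can be computed from two already aligned, gapped sequences as shown below. The denominator used here (all alignment columns, including gap columns) is an assumption about the convention; the aligner itself is not reimplemented.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two aligned sequences of equal length.

    Gap characters ('-') never count as matches. The denominator is the
    full alignment length, gap columns included (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Toy alignment: 7 identical columns out of 9.
identity = percent_identity("ACGT-ACGT", "ACGTTAC-T")
```

Applying the same function to every pair of aligned genomes would yield a symmetric identity matrix of the kind summarized in Table 1.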
Phages LUZ24, PaP4, and UFV-P2 present a conserved bidirectional genomic organization, which is shown by the shared LCBs (blocks 3-9) (Figure 2). Phage tf also presents this organization, but with some differences in the shared LCBs. On the other hand, phages MR299-2, PaP3, vb_PaeP_p2-10_Or1, and vb_PaeP_C1-14_Or present an inverted set of LCBs (blocks 9-3), representing an opposing arrangement of the gene modules. Proteins of these seven phages were the top hits for the UFV-P2 sequences (Table 2) and can inform each other's functional annotations.
In addition to genomic comparisons, a search for direct terminal repeats (DTRs) indicated the presence of patterns at the ends of the UFV-P2 genome, as described for the phages LUZ24, tf, and vB_PaeP_C1-14_Or1. These repeats are responsible for the recognition and cleavage of the phage genome at the end of the repeat region during packaging. Interestingly, one of the unique features of this group of phages is that PaP3 possesses 20 bp 5′-protruding cohesive ends [1], while LUZ24 has 184 bp DTRs, yet there does not appear to be a significant difference in the amino acid sequences of their terminases.
As suggested by the structural genomic comparisons, the phylogenetic tree of genomic sequences grouped the phages according to the shared LCBs (Figure 2). Phages vb_PaeP_p2-10_Or1, vb_PaeP_C1-14_Or, LUZ24, PaP4, PaP3, MR299-2, tf, and UFV-P2 were included in a distinct monophyletic clade in the BI phylogenetic tree, which possibly represents the Luz24likevirus genus. The shared LCBs, blocks 3-9 (Figure 2), may be considered a genomic signature for this genus. In the UFV-P2 genome (Figure 1), as in the other phages, the genes for biosynthesis and DNA replication are included in blocks 5 and 6; genes for virion structure and assembly are in blocks 7 and 8; and genes for host lysis are in block 9. Blocks 3 and 4 include only hypothetical genes.
Therefore, we propose the classification of the phage UFV-P2 in the Luz24likevirus genus. In fact, these analyses showed that other viruses were also grouped in distinct monophyletic clades or according to specific shared locally collinear blocks (LCBs), such as those from the T7likevirus (blocks 16 and 17) and Phikmvlikevirus (blocks 22, 24, and 25) genera, as well as a possible genus including the phages PaP2 and 199X (blocks 4 and 11-15).
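One simple way to screen a linear genome sequence for a direct terminal repeat is to look for the longest sequence that occurs at both ends. The sketch below does this for exact repeats only; the function name, the length bounds, and the toy genome are assumptions for illustration, and the approach would not detect cohesive ends or repeats collapsed during assembly.

```python
def longest_terminal_repeat(genome, min_len=10, max_len=1000):
    """Length of the longest exact repeat shared by both genome ends.

    Returns 0 if no repeat of at least min_len bases is found. The repeat
    may not exceed half the genome, so the two copies never overlap.
    """
    genome = genome.upper()
    upper = min(max_len, len(genome) // 2)
    for k in range(upper, min_len - 1, -1):
        if genome[:k] == genome[-k:]:
            return k
    return 0


# Toy genome (not a real phage): a 12-base repeat flanks a unique core.
repeat = "ACGTACGTTGCA"
toy_genome = repeat + "GGGTTTAAACCC" * 3 + repeat
dtr_length = longest_terminal_repeat(toy_genome)  # 12
```

Searching from the longest candidate downwards means the first hit is the full repeat rather than a shorter prefix of it.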

Conclusions
We have presented the functional annotation of UFV-P2, a new Pseudomonas fluorescens phage. Based on structural genomic comparison and phylogenetic clustering, we suggest the classification of UFV-P2 in the Luz24likevirus genus, and present a set of shared locally collinear blocks as the genomic signature for this genus.

Figure 3 Comparison of the genomes of the phages UFV-P2 and LUZ24. The collinearity between genomes is represented by the conserved locally collinear block (left) and dot plot alignment (right). The dot plot alignment was calculated using Nucleic Acid Dot Plots (http://www.vivo.colostate.edu/molkit/dnadot/index.html), considering a window size of 13 and a mismatch limit of 0.
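With a mismatch limit of 0, a dot plot such as the one in Figure 3 reduces to finding every pair of positions at which the two genomes share an identical window. The following sketch computes those positions by indexing the windows of one sequence; it is an illustrative reimplementation under that assumption, not the code of the Nucleic Acid Dot Plots tool, and the toy sequences are invented.

```python
from collections import defaultdict


def dotplot_exact(seq_a, seq_b, window=13):
    """Return sorted (i, j) pairs where seq_a[i:i+window] == seq_b[j:j+window].

    Each pair corresponds to one dot; a run of dots along a diagonal
    indicates a collinear region shared by the two sequences.
    """
    seq_a, seq_b = seq_a.upper(), seq_b.upper()
    index = defaultdict(list)
    for i in range(len(seq_a) - window + 1):
        index[seq_a[i:i + window]].append(i)
    dots = []
    for j in range(len(seq_b) - window + 1):
        for i in index.get(seq_b[j:j + window], ()):
            dots.append((i, j))
    return sorted(dots)


# Toy example: seq_b is seq_a shifted by two bases, giving one diagonal.
seq_a = "ACGTACGTTGCAGGCTA"
seq_b = "TT" + seq_a
dots = dotplot_exact(seq_a, seq_b)
```

Indexing the windows keeps the comparison close to linear in genome length for mostly unique sequence, which matters for genomes of roughly 45 kb; allowing mismatches would require a different matching strategy.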