Phage cluster relationships identified through single gene analysis

Background Phylogenetic comparison of bacteriophages requires whole genome approaches such as dotplot analysis, genome pairwise maps, and gene content analysis. Currently mycobacteriophages, a highly studied phage group, are categorized into related clusters based on the comparative analysis of whole genome sequences. With the recent explosion of phage isolation, a simple method for phage cluster prediction would facilitate analysis of crude or complex samples without whole genome isolation and sequencing. The hypothesis of this study was that mycobacteriophage-cluster prediction is possible using comparison of a single, ubiquitous, semi-conserved gene. Tape Measure Protein (TMP) was selected to test the hypothesis because it is typically the longest gene in mycobacteriophage genomes and because regions within the TMP gene are conserved. Results A single gene, TMP, identified the known Mycobacteriophage clusters and subclusters using a Gepard dotplot comparison or a phylogenetic tree constructed from global alignment and maximum likelihood comparisons. Gepard analysis of 247 mycobacteriophage TMP sequences appropriately recovered 98.8% of the subcluster assignments that were made by whole-genome comparison. Subcluster-specific primers within TMP allow for PCR determination of the mycobacteriophage subcluster from DNA samples. Using the single-gene comparison approach for siphovirus coliphages, phage groupings by TMP comparison reflected relationships observed in a whole genome dotplot comparison and confirm the potential utility of this approach to another widely studied group of phages. Conclusions TMP sequence comparison and PCR results support the hypothesis that a single gene can be used for distinguishing phage cluster and subcluster assignments. TMP single-gene analysis can quickly and accurately aid in mycobacteriophage classification.


Background
Mycobacteriophages infect Mycobacterium species such as the clinically important Mycobacterium tuberculosis and the nonpathogenic M. smegmatis. Mycobacteriophages are the most studied of all bacteriophages, with 2,413 mycobacteriophages isolated, more than 344 genomes fully sequenced (http://phagesdb.org/) and approximately 223 full phage genome sequences available on GenBank, making the analysis of these phages a model for bacteriophage research. The number of mycobacteriophages isolated and sequenced in recent years has led to the identification of genetic relationships and subsequent assignment of phages into 17 clusters and 30 subclusters based on whole genome comparison [1-3]. The genomes vary in size from 41,441 to 164,602 bp [3]. Comparison of phages within and between clusters has revealed genes in rapid genetic flux and regions that are more likely to have undergone horizontal exchange in relatively recent evolutionary time [3,4]. This genetic mosaicism contributes to the high level of diversity observed between phages and complicates phylogenetic analysis. Thus, identifying viable genome comparison methods that reflect the multifaceted evolutionary history of phages is fraught with challenges [3,5-8]. For example, because phages differ in the number and location of genes and in genome length, maximum likelihood and other traditional methods that require positional homology cannot be used to determine phylogenetic relationships. For mycobacteriophage cluster and subcluster assignment, whole genomes are currently compared primarily by dotplot, but pairwise average nucleotide identities (ANI), pairwise genome maps, and gene content analysis are all considered [7].
This study demonstrates that a single gene can group mycobacteriophages into the same clusters and subclusters proposed by whole genome dotplot analysis. The ability to predict phylogenetic assignment allows researchers to focus on particular phages during the initial isolation and amplification before whole genome sequencing and may facilitate analysis of complex samples [7]. The Tape Measure Protein (TMP; [9,10]) which is typically encoded by the longest gene of a phage genome was selected, and the nucleotide and amino acid sequences of TMP were analyzed in 247 mycobacteriophages representing more than 42 subclusters. TMP is also used to identify mycobacteriophage cluster and subcluster by dotplot comparison and by maximum likelihood methods. In addition, PCR evidence suggests identification of cluster-specific sequence similarity in TMP is sufficient for cluster prediction. The Gepard dotplot analysis of TMP is applied to a subset of known coliphages and demonstrates that the single-gene method identifies phage relationships whether the entire genome or the single TMP gene is used for the comparison. Thus, single-gene analysis for phylogenetic prediction is feasible for the two most highly studied groups of phages, those that infect Mycobacteria and those that infect Escherichia coli. Due to the highly mosaic nature of phages, subsequent full genome sequence analysis is appropriate to ensure proper taxonomic assignment reflecting the complex evolutionary history of the phages.
These data support that a single gene can predict phage cluster and subcluster specific classification when properly compared. More specifically, the clusters observed using a single gene maximum likelihood comparison or Gepard dotplot alignment reflect the same clustering that is observed when whole genome comparison is used.

Results and discussion
Dotplot comparison of a single gene can identify clusters similar to whole genome dotplot
Hatfull et al. [7,11] demonstrated grouping patterns for mycobacteriophage clusters and subclusters A through O based on nucleotide sequence dotplots [12,13]. All fully sequenced mycobacteriophages available from GenBank or the mycobacteriophage repository www.phagesdb.org have been previously assigned to a subcluster primarily by dotplot analysis of fully sequenced genomes [7,11]. Dotplots are two-dimensional matrices with the sequences being compared along the horizontal and vertical axes. The matrix is shaded based on regions of homology, thus identical sequences appear as diagonal black lines across the regions where they are compared. The Gepard dotplot in Figure 1A includes 79 entire genome nucleotide sequences of representative phages from clusters A through O. It demonstrates the clustering pattern of the phages into their preassigned [7,11] clusters and subclusters. To determine whether a single gene could be used to identify the same clusters, the Tape Measure Protein (TMP) and the Major Capsid Protein (MCP) nucleotide and amino acid sequences were used to produce dotplots for the same 79 phages [12,13] (Figure 1B-E). TMP and MCP were chosen due to the ubiquitous nature of these mycobacteriophage genes [12,14], a necessity of single gene comparison. In addition, these genes are likely to have limited transfer to phages from diverse evolutionary origins due to their involvement in multiple protein-protein interactions within phages [15-17]. The dotplots illustrate that the same clustering of mycobacteriophages occurs when using TMP, MCP or whole genomes (Figure 1). All of the clusters and subclusters are recovered for each of the 79 phages whether using nucleotide or amino acid sequences for TMP or MCP, supporting the use of single-gene dotplots in recovering a known phylogeny. In addition to recovering clusters, single-gene dotplots also reveal similarities between phage clusters evident in the whole genome dotplots. For example, TMP of G cluster phages Halo and Hope is similar to the K1-3 subcluster phages Adephagia, Angelica, CrimD, TM4, Pixie, MacnCheese, Fionbharth and Larva. In addition, MCP from F cluster phages RockyHorror and Che9 is similar to the same K1-3 subcluster phages. These examples demonstrate that the K subcluster phage genomes are similar to part of the G phage genomes and part of the F phage genomes (Figure 1E).
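The dotplot idea described above can be illustrated with a minimal sketch. This is not Gepard's algorithm (Gepard is a dedicated tool with its own indexing and scoring); it is a naive exact word-match dotplot, and the function names, the word length, and the toy sequences are assumptions made only for illustration.

```python
def dotplot(seq_a, seq_b, word=4):
    """Return the set of (i, j) where seq_a[i:i+word] equals seq_b[j:j+word].

    Naive exact-match dotplot for illustration only; real tools such as
    Gepard use more efficient indexing and score-based shading.
    """
    # Index every word of seq_b by its start positions.
    index = {}
    for j in range(len(seq_b) - word + 1):
        index.setdefault(seq_b[j:j + word], []).append(j)
    # Look up every word of seq_a and record the matching cells.
    hits = set()
    for i in range(len(seq_a) - word + 1):
        for j in index.get(seq_a[i:i + word], []):
            hits.add((i, j))
    return hits


def render(hits, n_rows, n_cols):
    """Draw the matrix as text: '#' for a matching cell, '.' otherwise."""
    return "\n".join(
        "".join("#" if (i, j) in hits else "." for j in range(n_cols))
        for i in range(n_rows)
    )


if __name__ == "__main__":
    toy = "ATGCGTACGTTAGC"  # hypothetical sequence, not a real TMP gene
    cells = dotplot(toy, toy, word=4)
    # Identical sequences produce an unbroken main diagonal.
    print(render(cells, len(toy) - 3, len(toy) - 3))
```

Comparing a sequence with itself yields a continuous diagonal, while unrelated sequences yield an empty or sparsely dotted matrix, which is the visual signal used to group phages in the figures.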
The TMP gene is approximately 3000 bp (2200-6800 bp), making it the longest and most easily recognized gene in Siphoviridae mycobacteriophages. While this size is nearly 20 times smaller than the entire genome, the TMP plot reflects the same clustering as the entire genome. The MCP gene is approximately 1,250 bp (800-1600 bp), much smaller than TMP, yet clustering is still evident. Clustering by single gene amino acid sequences (Figure 1D, E) is slightly stronger than the nucleotide plots (Figure 1B, C), which reflects the conservation of protein structure when silent mutations occur in the nucleotide sequence. Whole genome amino acid sequence comparisons are not feasible because genes exist in different frames and orientation across the genome.
The TMP method for cluster identification was then expanded to 247 complete mycobacteriophage genomes currently available in GenBank and from http://phagesdb.org. All of these mycobacteriophages have been previously assigned to clusters through whole genome analysis [7,11] and cluster assignment is available at http://phagesdb.org.
Remarkably, the majority of the 247 phages (244/247 or 98.8%) are recovered to their assigned cluster by either TMP nucleotide or amino acid dotplot analysis, as demonstrated in Figure 2. Of the 247 phages, Armid, Benedict and Rey were the only three for which the cluster assignment was not apparent using TMP Gepard analysis. The genomes of Armid and Benedict are highly similar to one another, sharing 90-95% identity with each other and 75-80% with their assigned A5 subcluster. By TMP analysis, these phages would form their own new cluster because their TMP shares no identity with that of other phages. The third phage, Rey, appears as a singleton with TMP-only analysis. Rey shares only 10% TMP similarity with other phages in its assigned cluster M, while 30% of its whole genome is similar to cluster M phages. Of the 244 phages recovered to the correct cluster, three phages differ in their subcluster assignment with the TMP analysis, namely AnaL29, Pukovnik, and Squirt. AnaL29 is assigned as an A1 phage but its TMP is similar to that of A2 phages. Pukovnik is assigned as an A2 phage but its TMP is similar to that of A5 phages. Squirt is an F3 phage whose TMP is similar to that of F1 phages. Interestingly, the TMP gene of Dori, a singleton, shows significant identity to B2 cluster phages (almost 50%). These data indicate that mycobacteriophages can be correctly preassigned to clusters with an accuracy of 98.8±1.36%, or subclusters with an accuracy of 97.6±1.92%, by TMP sequence prior to whole genome sequencing. The low error rate of 2.4±1.9% may be due to genetic exchange between mycobacteriophages. These data support the use of a single gene dotplot analysis to predict whole genome-based cluster relationships of phages.
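The text does not state how the ± margins were obtained. As a worked check, the reported values are numerically consistent with a 95% normal-approximation (Wald) half-width on a binomial proportion; this is an inference from the numbers, not a method stated by the authors. The subcluster count of 241 is derived here as 244 minus the three phages (AnaL29, Pukovnik, Squirt) whose subcluster differed.

```python
import math


def wald_margin(successes, total, z=1.96):
    """Return (percent, half-width in percent) for a binomial proportion.

    Uses the normal-approximation (Wald) interval; z=1.96 corresponds to
    a 95% confidence level. Assumed here, not stated in the source text.
    """
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return 100 * p, 100 * half_width


# Cluster level: 244 of 247 phages recovered to their assigned cluster.
cluster_pct, cluster_margin = wald_margin(244, 247)
# Subcluster level: 244 minus 3 phages with a differing subcluster.
subcluster_pct, subcluster_margin = wald_margin(241, 247)

if __name__ == "__main__":
    print(f"cluster:    {cluster_pct:.1f} +/- {cluster_margin:.2f} %")
    print(f"subcluster: {subcluster_pct:.1f} +/- {subcluster_margin:.2f} %")
```

Under this assumption the half-widths come out at roughly 1.37 and 1.92 percentage points, agreeing with the reported 1.36 and 1.92 to within rounding.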

Use of a single gene allows global alignment and maximum likelihood comparisons
Bacteriophage genomes pose unique challenges to determining phylogenetic relationships by whole genome analysis because of the mosaic nature of phage genomes. For instance, a common and powerful method of determining genetic relationships is to utilize a global alignment of the sequences in question and perform a maximum likelihood comparison. This method is ineffective with entire phage genomes because a global alignment cannot be made on entire genomes, and sometimes not even reliably among coding sequences, since the genomes exhibit many differences in length, gene content and gene synteny [2,5]. Since the TMP gene reproduced the whole genome dotplot relationships of the phages, a global alignment and maximum likelihood comparison performed on TMP alone may demonstrate the appropriate phage clustering. Figure 3 shows a phylogeny inferred from a TMP alignment using both Maximum Likelihood (ML) and Bayesian Inference (BI). The ML phylogenetic tree was constructed using ClustalW alignment of TMP and the maximum composite likelihood of Mega4 software [18]. Using this method, TMP genes segregated phages into their pre-assigned clusters and subclusters [7,11] with substantial fidelity. Without exception, every subcluster is located within a clade (color coded for ease). The phylogeny was also inferred using BI as the optimality criterion, which resulted in a nearly identical topology (branching patterns) and similar nodal support compared to ML (bootstrap proportions were largely correlated to posterior probability values as indicated by the first and second numbers at each node). ML and BI phylogenies were compared quantitatively by estimating the Matching Splits metric, where both phylogenies differed only by 21.3% (100% different estimated against a star phylogeny). Differences in topology were noted at deeper levels in the phylogenies but not at the subcluster level where clades were successfully recovered under both inference methods.
In Figure 3, all A subclusters extend from the same branch and form consistent and well supported clades. This relationship is also true for the B subclusters. By contrast, the phylogenetic tree reveals a larger distance between the F subclusters, as they were not recovered as a monophyletic group. For instance, subcluster F1 branches with the I, E, and N clusters, while the F2 subcluster branches with the K and G clusters. The similarity between F2, G and K was identified by dotplot analysis as discussed above (Figure 1). This difference suggests that the F1 and F2 subclusters may each form a distinct cluster if TMP is used to determine the cluster relationships. Based on these data, single gene global alignment for cluster identification provides further evidence that a single gene can be used to predict phage clusters.
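To make the term "global alignment" concrete, the following is a minimal sketch of pairwise global alignment scoring by dynamic programming (the Needleman-Wunsch recurrence). It is not ClustalW, which builds a progressive multiple alignment, and the scoring values and function name are assumptions chosen only for illustration.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score of the best end-to-end (global) alignment of strings a and b.

    Classic Needleman-Wunsch recurrence with a linear gap penalty.
    Only two rows of the dynamic-programming table are kept in memory.
    """
    # Row 0: aligning the empty prefix of a against prefixes of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a against empty b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap inserted in b
            left = curr[j - 1] + gap  # gap inserted in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Hypothetical toy sequences, not real TMP genes.
    print(global_alignment_score("ACGT", "ACGT"))
    print(global_alignment_score("ACGT", "ACT"))
```

Because every position of both sequences must be placed in the alignment, large insertions, deletions or rearrangements accumulate gap penalties, which is one way to see why the approach suits a single homologous gene better than whole mosaic genomes.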

A single gene can distinguish subclusters
Dotplots of mycobacteriophages from entire clusters are capable of determining subclusters and identifying the subcluster assignment of an individual phage. The TMP nucleotide and amino acid sequences were used to generate a Gepard dotplot of the B cluster phages (Figure 4A and 4B). The plots accurately reflect the B subclusters published previously [1,7]. The dotplot comparison of TMP from a single phage against phages of various subclusters should also allow for subcluster prediction. To demonstrate this, Figure 4C and 4D plots were generated using the TMP sequence of the B1 subcluster phage KLucky39 against phages in each of the B subclusters. KLucky39 aligned with the B1 phages in the comparison, but the relationship became weaker when comparing the KLucky39 sequence with the B2, B3, B4, and B5 subclusters. These data support the use of a single gene, such as TMP, to predict mycobacteriophage phylogeny beyond cluster into a subcluster.
Fasta files of whole genome sequences were downloaded from GenBank or the http://phagesdb.org website and TMP nucleotide and amino acid sequences were identified by auto-annotating using DNA Master (http://cobamide2.bio.pitt.edu) when necessary. Gepard [12] was used to generate dotplots of TMP nucleotide and amino acid sequences.

Subcluster-conserved sequences within a single gene are identifiable
The relationship between the TMP sequence and phage clustering merited a search for short conserved sequences within the gene that were subcluster specific. Figure 5 illustrates the sporadic regions of similarity among TMP genes from phages of all subclusters (Figure 5A). However, alignment of the TMP gene sequence from phages in a single cluster identifies regions of unique similarity (Figure 5B) not found in other clusters. Consequently, we posited that a PCR primer set can be designed specifically for a single cluster or subcluster (Figure 5C). Table 1 demonstrates the overall degree of identity between TMP from phages within a single subcluster. Short conserved sequences in TMP were found to occur at the level of subcluster and nonsubdivided clusters, allowing for subcluster-specific PCR primers to be designed as listed in Table 2. In many cases, degenerate primers were selected to allow for silent mutation differences. It is notable that while all subclusters yielded regions of similarity, no conserved sequences were found between subclusters of the same cluster (such as any of the A subclusters or the B subclusters). These data are useful indicators of the robustness of TMP as a single gene to predict mycobacteriophage clustering.
Figure 3 Cluster relationships are identifiable using TMP by Maximum Likelihood comparison and Bayesian Inference. The phylogenetic tree generated from TMP nucleotide sequences for 79 mycobacteriophages provides evidence that a single gene reflects the same clustering identification as entire genome comparisons published previously [7,8]. Both Maximum Likelihood (ML) and Bayesian Inference (BI) recovered largely the same clades. Nodal support is shown as bootstrap proportions (from ML)/posterior probabilities (from BI). Clades labeled only with bootstrap proportions signify clades from ML that were not recovered in BI analysis.

PCR amplification of TMP verifies phage cluster identity
Each subcluster primer set was tested on several phage samples from the appropriate subcluster and yielded accurate bands of expected amplicon size ( Figure 6). Primer sets were also tested against DNA from phages of all other subclusters to verify their specificity and no cross-reactivity was observed. In addition, we tested the ability to use the primers on DNA extracted via simplified methods, such as boiling a diluted sample from a spot test.
The primers successfully amplified appropriate band size amplicons from DNA samples extracted by three different methods including purified DNA extracted with a commercial DNA extraction kit, DNA extracted from different concentrations of a diluted boiled spot test and DNA extracted using a high titer lysate that was diluted and boiled ( Figure 6B). The PCR data confirm that subcluster-specific primer sets can amplify the target sequences and that TMP can be used to distinguish phage clusters. In addition, the PCR from diluted boiled spot tests worked remarkably well allowing subcluster identification in the initial stages of mycobacteriophage isolation with minimal effort.
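The specificity check for degenerate, subcluster-specific primers can also be sketched in silico. The example below expands IUPAC ambiguity codes into a regular expression and reports where a primer could anneal on a template strand. The primer and template are hypothetical and are not taken from Table 2; the function names are likewise assumptions, and only exact matches on the given strand are considered.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_to_regex(primer):
    """Compile a degenerate primer into a regex allowing overlapping hits."""
    body = "".join(
        IUPAC[base] if len(IUPAC[base]) == 1 else "[" + IUPAC[base] + "]"
        for base in primer.upper()
    )
    # A lookahead lets finditer report overlapping binding sites.
    return re.compile("(?=(" + body + "))")


def find_sites(primer, template):
    """Return 0-based start positions where the primer matches the template."""
    pattern = primer_to_regex(primer)
    return [m.start() for m in pattern.finditer(template.upper())]


if __name__ == "__main__":
    # Hypothetical degenerate primer: R stands for A or G.
    print(find_sites("ACRT", "TTACATGGACGTAA"))
```

A primer set that finds sites in every TMP sequence of one subcluster and none in the others would be a candidate for the kind of subcluster-specific amplification described above; actual primer design would also need to consider melting temperature, the reverse-complement strand and mismatch tolerance.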

Alignment-free TMP phylogeny does not distinguish mycobacteriophage clusters
As mentioned previously, gene content and genetic identity are highly heterogeneous between phages and thus prevent the application of traditional phylogenetic methods using whole genome sequences. New methods of phylogenetic comparisons have been developed that determine relationships based on the frequency of 'words' or 'features' so that there is no need to rely on positional homology [19][20][21]. These feature frequency profile (FFP) approaches allow for alignment-free phylogenetic inferences. When comparing long genome sequences, the small feature length of FFP allows for relationships to be determined regardless of variety in genome length or gene content in the comparative samples. Recently, Sousa et al. demonstrated the ability of alignment-free methods to uncover the known phylogeny of T7 phage variants, all of which were similar in that they were evolved from a parental T7 phage [22]. In contrast to the highly similar T7 phage variants, mycobacteriophages are highly diverse with low sequence identity and novel gene order and content. The diversity could potentially hamper alignment-free analysis; therefore, an FFP alignment-free method was applied to the 79 diverse mycobacteriophage genome dataset with a 20-base feature length.
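The feature-frequency idea can be illustrated with a short sketch: each sequence is reduced to the relative frequencies of its overlapping words of length k, and two profiles are compared with a divergence measure. The Jensen-Shannon divergence is used here as an assumption about a commonly used choice for such profiles; the text above does not specify the distance, and the function names and toy sequences are hypothetical. The study used a 20-base feature length, whereas a much shorter k is used below because the toy sequences are short.

```python
import math
from collections import Counter


def feature_profile(seq, k):
    """Relative frequency of every overlapping k-mer ('feature') in seq."""
    if len(seq) < k:
        raise ValueError("sequence is shorter than the feature length")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {feature: n / total for feature, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two frequency profiles.

    Ranges from 0 (identical profiles) to 1 (no shared features).
    """
    div = 0.0
    for feature in set(p) | set(q):
        pf = p.get(feature, 0.0)
        qf = q.get(feature, 0.0)
        mean = 0.5 * (pf + qf)
        if pf > 0:
            div += 0.5 * pf * math.log2(pf / mean)
        if qf > 0:
            div += 0.5 * qf * math.log2(qf / mean)
    return div


if __name__ == "__main__":
    a = feature_profile("ATGCGTACGTTAGC", k=3)  # hypothetical sequences
    b = feature_profile("ATGCGTACGATAGC", k=3)
    print(js_divergence(a, b))
```

A matrix of such pairwise divergences can then be passed to a distance-based tree method such as neighbor joining. With short sequences and long features, most features occur once or not at all, which gives an intuition for why a single gene may carry too little signal for this approach.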
Since the alignment-free phylogeny using FFP is stronger when longer sequences are being compared, a whole genome should yield a more definitive relationship than a single gene. This method was applied to both whole phage genomes and TMP gene sequences and nearly all clusters and subclusters were identified using whole genomes but, as anticipated, it failed to identify clusters or subclusters using TMP only (Figure 7). Using the genealogical sorting index (gsi) as a quantitative measure reflecting monophyly, the results indicated that only L1-L2, J, and A6 remained in identifiable clades when TMP was used. No other clusters or subclusters were identifiable using TMP in this method ( Figure 7C). The Matching Splits (MS) metric was used to address the distance between phylogenies. Comparison between the genome and a completely unresolved phylogeny (star phylogeny) yielded a MS value of 722 (100% different), compared to 582 (81% different) when comparing genome and TMP phylogenies.
Altogether, these results reflected a loss of resolution and cluster structure between genome and TMP trees suggesting that the FFP method requires longer sequences (such as whole phage genomes) in the case of mycobacteriophages for reliable relationship determination by FFP. In summary, mycobacteriophage cluster relationships may be determined using either whole genomes in an alignment-free FFP analysis or predicted using single genes (such as TMP) in a global-alignment maximum likelihood analysis.

Single gene comparison of coliphages also yields identifiable clusters
After investigating the analysis methods and abilities of a single gene to identify mycobacteriophage subclusters, we applied the single gene comparison method to siphophages of another highly studied and diverse group, those that infect E. coli (for a recent review see [23]). Siphophages were chosen due to the presence of TMP. Gepard dotplots of genomes from 24 annotated siphophages that infect E. coli yielded similar relationships whether using whole genome nucleotide or TMP nucleotide sequences (Figure 8). From either the whole genome or the single gene plots, eight groups of coliphages were evident and at least two of these groups appeared to have subcluster properties (Table 3). It should be noted that TMP is not ubiquitous in enterobacteriophages, thus other ubiquitous genes must be explored for use for these phages, such as portal proteins or coat proteins [24]. Unfortunately, portal or coat proteins will be dramatically shorter than TMP, and may not lend the same strength of predictability as is possible with Siphoviridae. These data suggest that single genes may be used to predict relationships within many phage groups, not just mycobacteriophages.

Conclusions
With the explosion of recently isolated mycobacteriophages, we have access to a large data set of defined clusters and subclusters based on whole-genome analysis (344 mycobacteriophages), but an even larger number of phages have been isolated which are not yet sequenced (2,413 mycobacteriophages) (www.phagesdb.org). Our data confirm the use of a single, ubiquitous, semi-conserved gene for the prediction of mycobacteriophage cluster, which is particularly useful when a full genome sequence is unavailable. Irrespective of potential recombination events in the selected TMP gene, global alignment (Figure 1) and Maximum Likelihood or Bayesian Inference (Figure 3) of this single gene accurately recovered phage cluster and subcluster categorization already recognized by the whole-genome methods. Gepard dotplot analysis of TMP proved to be the most reliable method for determining phage relationships, capable of recovering 98.8±1.36% of 247 assigned mycobacteriophage clusters and distinguishing phages beyond cluster, down to the subcluster level with an accuracy of 97.6±1.92%. This predictive ability is most likely due to the algorithms within the dotplot that allow for alignment of sequences with a high mosaic nature, both in sequence and orientation. Caution must be used with the single-gene approach to determine phage phylogeny. Alignment-free methods, which account for high variability in genome length and gene content, are not designed for single-gene datasets and, accordingly, were not able to reconstruct mycobacteriophage clusters even when a large gene (TMP) was used. This inability reflects the requirement of the FFP method to use much longer sequences in order to capture the phylogenetic relationship among phages. With a whole genome sequence, the FFP method could reliably be used for phage classification, but the method should not be used with a single gene.
TMP sequences were compared using ClustalW [32] within MEGA4 software [18]. Some subclusters were combined. The % Identical Sites indicates the number of identical nucleotides aligned over the entire length of the gene. The % Pairwise Identity indicates the number of identical nucleotides of aligned and unaligned lengths within the gene and gives a more accurate indication of similarity.
Figure 6 Phage subclusters can be identified by PCR using subcluster-specific TMP primers. PCR products of the predicted size are amplified using cluster-specific primers as indicated in this example (A) which includes phages from subclusters A1 (lanes 2-3), A2 (4-5), A4 (6-7), B1 (8-9), B3 (11-12), D (13-14), E (15-16), G (17-18), and J (19). DNA ladder is in lanes 1 and 10. Subcluster specific TMP primers were designed using Geneious software [33] and specific primer sequences are reported in Table 2. DNA can be obtained for PCR amplification from various sources (B), including DNA extraction kits (lane 2), boiled spot test using 10 μl, 50 μl, 100 μl (4-6), or from a boiled dilution of high titer lysate (7). A negative control is in lane 3.
Using a single gene to describe evolutionary relationships was recognized as a problem very early in molecular phylogenetics literature [25-27]. Evolution is not linear and molecular and population events such as horizontal gene transfer [28], incomplete lineage sorting, and gene duplication/extinction [29] can and do affect our ability to equate gene trees to species trees [30,31]. This genetic exchange is even more pronounced in phages, which have rapid rates of gene transfer and are thus highly mosaic [3,5-8]. Cluster assignment is a simplification of evolutionary history for ease in categorization. For example, although similar phage groups appear using either whole genome sequence or TMP sequence for either mycobacteriophages (Figure 1A vs. 1B) or coliphages (Figure 8), whole genome sequence provides more detailed evolutionary relationships indicative of horizontal gene transfer. Only very weak relationships are seen between coliphage lambda and mEp234 when TMP alone is used in dotplot analysis, while over half the genome shows similarity in the whole genome dotplot.
Despite genome mosaicism, a single gene that is ubiquitous and highly conserved may provide insight into evolutionary history of phages. Hardies et al. reported that, in a 215 kb phage genome, the genes encoding TMP, TMP chaperonins, and phage tail properties are evolutionarily stable [32]. Belcaid et al. furthered the study of TMP in respect to evolutionary relationships and reported identification of repeated units and markers within TMP that could be used to assess evolutionary relationships [7]. In addition, Casjens et al. show high conservation of enterobacteriophage head coat proteins [24]. Thus, for phages, structural genes may be the best option for a single, ubiquitous, semi-conserved gene that would reflect evolutionary relationships similar to 16S rRNA sequencing for bacterial species.
Figure 7 Alignment-free phylogenetic inference can determine subcluster assignments of phages only when using entire genome sequences. As predicted, a feature frequency profile (FFP) can identify subclusters when given sufficient nucleotide sequences for the analysis, such as entire phage genomes (A); however, the TMP gene sequence is too short for the feature frequency profile to identify relationships (B). The genealogical sorting index (gsi) for clades indicates subclusters are identified well in the whole genome analysis and poorly or not at all in the TMP analysis (C). The mycobacteriophage genomes used were identical to the 79 genomes used throughout this study, which represent 30 mycobacteriophage subclusters. Feature frequency profiles [20] were used to infer phylogenetic relationships [19-21] using Bacillus cereus PBC1 phage as outgroup. The neighbor-joining method was used to infer a phylogeny which was bootstrapped 10,000 times to assess nodal support. A 50% majority-rule consensus tree was obtained using Paup* 4.0 [34] and annotated in FigTree 1.3.1 (http://tree.bio.ed.ac.uk/software/figtree).
This study is the first to examine such a large number of known phage genomes for the ability of the TMP gene to reflect genomic relationships down to cluster and subcluster. Thus, horizontal DNA transfer is not happening at a rate that obscures the existence of mycobacteriophage clusters and subclusters. The data indicate that a TMP gene tree reconstructed using Maximum Likelihood or Bayesian Inference methods reflects the current categorization of phages and thus can be used for a fast and reliable initial phage assignment.
Single-gene categorization of phages is a valuable simplification for research. For instance, a key drawback to conventional methods of determining phage phylogeny is the necessity of whole genome sequence. Whole genome sequencing generally requires purification and amplification of a phage, which can be costly, time-consuming and challenging. This study reveals several computational strategies that are able to predict phage relationships based on a single gene. The ability to rely on a single gene for initial prediction allows phylogenetic analysis of phages from complex samples without extensive effort or cost. Another advantage of a single-gene approach to phage phylogeny is the ability to determine phage relationships easily during phage isolation by PCR. PCR results confirmed that subcluster-specific primers successfully determined subclusters from diluted and boiled spot tests as well as DNA extracted using a high titer lysate that was diluted and boiled. Thus, this analysis could be performed on very crude phage samples prior to amplification and sequencing, allowing the researcher to focus on phages of particular interest, answer specific ecological questions or simply validate the purity of a sample.
The proposed use of single-gene phage phylogeny prediction can extend to other phage groups beyond mycobacteriophages as evidenced by our single-gene dotplot analysis of siphovirus coliphages. The single-gene dotplots yielded identical phage clustering when compared to the whole genome dotplots (see Figure 8). Thus, the single-gene approach works for two highly studied groups of phages, the mycobacteriophages and the Siphoviridae coliphages. The TMP prediction of relationships is particularly powerful for mycobacteriophages because there are no Podoviridae, 91% are Siphoviridae, and even the Myoviridae of mycobacteriophages contain TMP (Cluster C). Other groups of phages, such as enterobacteriophages, include Podoviridae which lack TMP. Thus a single-gene approach for such phages must utilize an alternative conserved, ubiquitous gene rather than TMP. It is noteworthy that mycobacteria, an acid-fast genus, and E. coli, a gram-negative bacterium, are very different bacterial hosts entertaining phages with little relationship to one another. It is remarkable that TMP could accurately reflect phylogenetic groupings among both mycobacteriophages and coliphages. Full genome analysis is appropriate for phylogenetic verification due to the rapid rate of gene exchange, especially among highly related phages.
Figure 8 Cluster relationships are evident in Gepard dotplot alignments of whole genome and TMP sequences from 24 Siphoviridae coliphages. Using the single-gene comparison method, a Gepard dotplot of TMP demonstrates that clusters are identifiable in coliphages based on whole genome comparisons (A) and TMP nucleotide sequences (B). Whole genome and TMP sequences were downloaded from GenBank and Gepard [12] was used to generate dotplots.
These results strongly suggest that if a single, ubiquitous, semi-conserved gene can be identified for a group of phages, simple single-gene phylogeny prediction may greatly expand our ability to identify and understand the complexity and vast society of bacteriophages.

DNA extraction and PCR amplification
DNA samples were obtained using three different methods. First, a Promega Wizard® DNA extraction kit was used to purify DNA from a high titer lysate. Second, a 1:21 dilution of a high titer lysate was boiled at 95°C for 10 min. Third, the boiling method was used to isolate DNA obtained from a plaque rather than from a high titer lysate. For direct plaque isolation, a micropipette tip was gently touched to a plaque then placed in 20 μl of phage buffer (10 mM Tris (pH 7.5), 10 mM MgSO4, 0.074 M NaCl) prior to boiling.
PCR primers were obtained from Eurofins MWG Operon (Huntsville, AL) and dissolved in sterile, nuclease-free water to 100 nM. The following PCR conditions were used: 5 μl reaction buffer, 1 μl dNTP's, 0.2 Taq DNA polymerase (Invitrogen® Taq DNA Polymerase (recombinant)), 2 μl MgCl 2 , 1 μl template DNA, 2.5 μl forward primer and 2.5 μl reverse primer and sterile nuclease-free water to a final volume of 25 μl. Reactions were run in an Applied Biosystems GeneAmp PCR System 9700 Thermocycler using an initial 5 min. denaturation at 94°C followed by 30 cycles of 30 sec. denaturation at 94°C, 30 sec. annealing at 55°C, 45 sec. extension at 72°C, and a final extension of 72°C for 5 min. A 5 μl aliquot of each PCR reaction was diluted to 10 μl and loaded in wells of a 2% agarose gel prepared with 1X TAE (0.04M Tris-acetate, 0.001M EDTA). A 100 bp ladder was used as a standard and the samples were electrophoresed at 100 V for 60 min. The gel was visualized and documented using a UVP M-20