Phage cluster relationships identified through single gene analysis
© Smith et al.; licensee BioMed Central Ltd. 2013
Received: 20 February 2013
Accepted: 12 June 2013
Published: 19 June 2013
Phylogenetic comparison of bacteriophages requires whole genome approaches such as dotplot analysis, genome pairwise maps, and gene content analysis. Currently mycobacteriophages, a highly studied phage group, are categorized into related clusters based on the comparative analysis of whole genome sequences. With the recent explosion of phage isolation, a simple method for phage cluster prediction would facilitate analysis of crude or complex samples without whole genome isolation and sequencing. The hypothesis of this study was that mycobacteriophage-cluster prediction is possible using comparison of a single, ubiquitous, semi-conserved gene. Tape Measure Protein (TMP) was selected to test the hypothesis because it is typically the longest gene in mycobacteriophage genomes and because regions within the TMP gene are conserved.
A single gene, TMP, identified the known mycobacteriophage clusters and subclusters using a Gepard dotplot comparison or a phylogenetic tree constructed from global alignment and maximum likelihood comparisons. Gepard analysis of 247 mycobacteriophage TMP sequences appropriately recovered 98.8% of the subcluster assignments that were made by whole-genome comparison. Subcluster-specific primers within TMP allow PCR determination of the mycobacteriophage subcluster from DNA samples. When the single-gene comparison approach was applied to siphovirus coliphages, phage groupings by TMP comparison reflected the relationships observed in a whole genome dotplot comparison, confirming the potential utility of this approach for another widely studied group of phages.
TMP sequence comparison and PCR results support the hypothesis that a single gene can be used for distinguishing phage cluster and subcluster assignments. TMP single-gene analysis can quickly and accurately aid in mycobacteriophage classification.
Mycobacteriophages infect Mycobacterium species such as the clinically important Mycobacterium tuberculosis and the nonpathogenic M. smegmatis. Mycobacteriophages are the most studied of all bacteriophages: 2,413 mycobacteriophages have been isolated, more than 344 genomes have been fully sequenced (http://phagesdb.org/), and approximately 223 full phage genome sequences are available in GenBank, making these phages a model for bacteriophage research. The number of mycobacteriophages isolated and sequenced in recent years has led to the identification of genetic relationships and the subsequent assignment of phages into 17 clusters and 30 subclusters based on whole genome comparison [1–3]. The genomes vary in size from 41,441 to 164,602 bp. Comparison of phages within and between clusters has revealed genes in rapid genetic flux and regions that are more likely to have undergone horizontal exchange in relatively recent evolutionary time [3, 4]. This genetic mosaicism contributes to the high level of diversity observed between phages and complicates phylogenetic analysis. Thus, identifying viable genome comparison methods that reflect the multifaceted evolutionary history of phages is fraught with challenges [3, 5–8]. For example, because phages differ in the number and location of their genes and in genome length, maximum likelihood and other traditional methods that require positional homology cannot be used to determine their phylogenetic relationships. For mycobacteriophage cluster and subcluster assignment, whole genomes are currently compared primarily by dotplot, although pairwise average nucleotide identity (ANI), pairwise genome maps, and gene content analysis are also considered.
This study demonstrates that a single gene can group mycobacteriophages into the same clusters and subclusters proposed by whole genome dotplot analysis. The ability to predict phylogenetic assignment allows researchers to focus on particular phages during initial isolation and amplification, before whole genome sequencing, and may facilitate analysis of complex samples. The Tape Measure Protein (TMP; [9, 10]), which is typically encoded by the longest gene of a phage genome, was selected, and the nucleotide and amino acid sequences of TMP were analyzed in 247 mycobacteriophages representing more than 42 subclusters. TMP is then used to identify mycobacteriophage clusters and subclusters by dotplot comparison and by maximum likelihood methods. In addition, PCR evidence suggests that cluster-specific sequence similarity in TMP is sufficient for cluster prediction. Gepard dotplot analysis of TMP is also applied to a subset of known coliphages and shows that the same phage relationships are identified whether the entire genome or the single TMP gene is used for the comparison. Thus, single-gene phylogenetic prediction is feasible for the two most highly studied groups of phages, those that infect Mycobacteria and those that infect Escherichia coli. Because phages are highly mosaic, subsequent full genome sequence analysis is still appropriate to ensure taxonomic assignments that reflect the complex evolutionary history of the phages.
These data support the conclusion that a single gene, when properly compared, can predict phage cluster- and subcluster-specific classification. More specifically, the clusters observed using a single-gene maximum likelihood comparison or Gepard dotplot alignment reflect those observed with whole genome comparison.
Results and discussion
Dotplot comparison of a single gene can identify clusters similar to those of a whole genome dotplot
The TMP gene is approximately 3,000 bp (range 2,200–6,800 bp), making it the longest and most easily recognized gene in Siphoviridae mycobacteriophages. Although this is roughly 20 times smaller than the entire genome (40–110 kbp), the TMP dotplot reflects the same clustering as the whole genome dotplot (Figure 1A, B). The major capsid protein (MCP) gene is approximately 1,250 bp (800–1,600 bp), much smaller than TMP, yet clustering is still evident. Clustering by single-gene amino acid sequences (Figure 1D, E) is slightly stronger than in the nucleotide plots (Figure 1B, C), reflecting the fact that silent mutations in the nucleotide sequence leave the encoded protein unchanged. Whole genome amino acid sequence comparisons are not feasible because genes lie in different reading frames and orientations across the genome.
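The dotplot principle behind these comparisons can be illustrated with a naive word-match sketch. This is not Gepard's algorithm (Gepard relies on suffix-array indexing to scale to genome size); it simply marks every position pair (i, j) at which two sequences share an identical word of a chosen length. The function names and the default word length of 10 are illustrative assumptions. Because it compares plain strings, the same code accepts nucleotide or amino acid sequences.

```python
def word_dotplot(seq_a: str, seq_b: str, word: int = 10) -> list[tuple[int, int]]:
    """Return every (i, j) where seq_a[i:i+word] equals seq_b[j:j+word]."""
    a, b = seq_a.upper(), seq_b.upper()
    # Index each word of seq_b by the positions where it starts.
    index: dict[str, list[int]] = {}
    for j in range(len(b) - word + 1):
        index.setdefault(b[j:j + word], []).append(j)
    # Look up each word of seq_a in the index to collect matching cells.
    hits = []
    for i in range(len(a) - word + 1):
        for j in index.get(a[i:i + word], []):
            hits.append((i, j))
    return hits


def render(hits: list[tuple[int, int]], len_a: int, len_b: int, word: int) -> str:
    """Draw a small text dotplot: rows follow seq_a, columns follow seq_b."""
    rows = max(len_a - word + 1, 0)
    cols = max(len_b - word + 1, 0)
    grid = [["."] * cols for _ in range(rows)]
    for i, j in hits:
        grid[i][j] = "*"
    return "\n".join("".join(row) for row in grid)


if __name__ == "__main__":
    a = "ATGGCTAAGCTTGCA"
    b = "ATGGCTTAGCTTGCA"  # hypothetical sequence with one substitution relative to a
    print(render(word_dotplot(a, b, 4), len(a), len(b), 4))
```

In the example, the single substitution leaves a short gap in the main diagonal; related genes produce long diagonal runs, which is what the clusters in a dotplot are built from.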
Use of a single gene allows global alignment and maximum likelihood comparisons
In Figure 3, all A subclusters extend from the same branch and form consistent, well-supported clades. The same is true for the B subclusters. By contrast, the phylogenetic tree reveals a larger distance between the F subclusters, which were not recovered as a monophyletic group. For instance, subcluster F1 branches with clusters I, E, and N, while subcluster F2 branches with clusters K and G. The similarity between F2, G, and K was also identified by dotplot analysis (Figure 1). This difference suggests that, if TMP were used to determine cluster relationships, F1 and F2 might each constitute a distinct cluster. These data provide further evidence that single-gene global alignment can be used to predict phage clusters.
A single gene can distinguish subclusters
Subcluster-conserved sequences within a single gene are identifiable
Tape Measure Protein (TMP) sequence identity between mycobacteriophages within subclusters
% identical sites
% pairwise identity
Phages included in the comparison for primer design
U2, Switzer, jc27, kssjeb
D29, Che12, Trixie, RedRock
Vix, BXZ2, Microwolf, JHC117, Methuselah, Rockstar, HelDan
Eagle, Backyardigan, Peaches, LHTSCC
George, Airmid, Benedict, Cuco
DaVinci, Gladiator, Hammer
Harvey, Colbert, Hertubise
Ares, Hedgerow, Rosebush, Arbiter, Qyrzula
Daisy, Kamiyu, Pipefish
Stinger, Zemanar, ChrisnMich, Nigel, Frederick, Cooper
Plot, PBI1, Gumball
Kostya, Lilac, Henry
Fruitloop, RockyHorror, Dotproduct
Halo, BPs, Hope
Predator, Konstantine, Barnyard
Brujita, Island3, Babsiella, Che9c
BAKA, LittleE, Omega
Angelica, Adephagia, CrimD
Upie, LeBron, Faith
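The two identity columns in this table can be computed from a multiple alignment. The sketch below assumes common conventions, which may differ in detail from those of the software used in the study: "identical sites" is the share of alignment columns in which every sequence carries the same residue (columns containing a gap count as non-identical), and "pairwise identity" is the mean, over all sequence pairs, of the percentage of matching residues among columns where neither sequence has a gap. The function names and the toy alignment are hypothetical.

```python
from itertools import combinations


def percent_identical_sites(aln: list[str]) -> float:
    """Percentage of columns in which all sequences share the same non-gap residue."""
    length = len(aln[0])
    if any(len(s) != length for s in aln):
        raise ValueError("sequences must be aligned to equal length")
    identical = sum(1 for col in zip(*aln) if col[0] != "-" and len(set(col)) == 1)
    return 100.0 * identical / length


def mean_pairwise_identity(aln: list[str]) -> float:
    """Mean percent identity over all pairs, ignoring columns with a gap in either sequence."""
    scores = []
    for a, b in combinations(aln, 2):
        shared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
        if shared:
            scores.append(100.0 * sum(x == y for x, y in shared) / len(shared))
    if not scores:
        raise ValueError("no comparable sequence pairs")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    toy = ["ACGT-A", "ACGTTA", "ACCTTA"]  # hypothetical three-sequence alignment
    print(f"identical sites: {percent_identical_sites(toy):.1f}%")  # 66.7%
    print(f"pairwise identity: {mean_pairwise_identity(toy):.1f}%")  # 87.8%
```

Pairwise identity is usually the higher of the two values, because a single divergent sequence makes a column non-identical for the whole alignment but affects only the pairs that include that sequence.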
PCR primers designed on conserved regions of TMP for mycobacteriophage subclusters
PCR amplification of TMP verifies phage cluster identity
Alignment-free TMP phylogeny does not distinguish mycobacteriophage clusters
As mentioned previously, gene content and genetic identity are highly heterogeneous among phages, which prevents the application of traditional phylogenetic methods to whole genome sequences. Newer methods determine relationships from the frequency of ‘words’ or ‘features’, so they do not rely on positional homology [19–21]. These feature frequency profile (FFP) approaches allow alignment-free phylogenetic inference. When long genome sequences are compared, the short feature length of FFP allows relationships to be determined regardless of differences in genome length or gene content among the samples. Recently, Sousa et al. demonstrated that alignment-free methods recover the known phylogeny of T7 phage variants, all of which evolved from a parental T7 phage. In contrast to these highly similar T7 variants, mycobacteriophages are highly diverse, with low sequence identity and novel gene order and content. Because this diversity could hamper alignment-free analysis, an FFP alignment-free method was applied to the diverse 79-mycobacteriophage genome dataset using a 20-base feature length.
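A feature frequency profile can be sketched as follows, assuming the usual FFP formulation: count every overlapping word of length l, normalize the counts to relative frequencies, and compare profiles with the Jensen-Shannon divergence. This is an illustration, not the FFP package used in the study; the function names are hypothetical, and options such as reverse-complement merging or feature filtering are omitted.

```python
import math
from collections import Counter


def ffp(seq: str, l: int) -> dict[str, float]:
    """Feature frequency profile: relative frequency of each overlapping l-mer."""
    seq = seq.upper()
    if len(seq) < l:
        raise ValueError("sequence shorter than the feature length")
    counts = Counter(seq[i:i + l] for i in range(len(seq) - l + 1))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def js_divergence(p: dict[str, float], q: dict[str, float]) -> float:
    """Jensen-Shannon divergence in bits: 0 = identical profiles, 1 = no shared features."""
    js = 0.0
    for word in p.keys() | q.keys():
        pw, qw = p.get(word, 0.0), q.get(word, 0.0)
        m = (pw + qw) / 2  # frequency in the averaged profile
        if pw > 0:
            js += 0.5 * pw * math.log2(pw / m)
        if qw > 0:
            js += 0.5 * qw * math.log2(qw / m)
    return js


def ffp_distance_matrix(seqs: dict[str, str], l: int) -> tuple[list[str], list[list[float]]]:
    """Pairwise Jensen-Shannon divergences between the FFPs of named sequences."""
    names = list(seqs)
    profiles = {name: ffp(seqs[name], l) for name in names}
    matrix = [[js_divergence(profiles[a], profiles[b]) for b in names] for a in names]
    return names, matrix
```

With l = 20, a TMP gene of about 3,000 bp contributes at most 2,981 words, so two divergent TMP sequences may share very few features; this sparsity is one plausible reason why a gene-length input can lose resolution compared with a whole genome.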
Altogether, these results reflect a loss of resolution and cluster structure in the TMP tree relative to the genome tree, suggesting that, for mycobacteriophages, the FFP method requires longer sequences (such as whole phage genomes) to determine relationships reliably. In summary, mycobacteriophage cluster relationships may be determined using whole genomes in an alignment-free FFP analysis or predicted using single genes (such as TMP) in a global-alignment maximum likelihood analysis.
Single gene comparison of coliphages also yields identifiable clusters
Coliphage groups identified by TMP alignment of 24 Siphoviridae
Phages included in the proposed grouping
HK75, HK633, mEpX1, HK97, mEp234, HK446
HK022, HK140, mEpX2, mEp235
HK225, N15, mEp237
HK629, lambda, HK630
mEp043 c-1, mEp213
With the explosion of recently isolated mycobacteriophages, we have access to a large data set of clusters and subclusters defined by whole-genome analysis (344 mycobacteriophages), and an even larger number of phages have been isolated but not yet sequenced (2,413 mycobacteriophages) (http://www.phagesdb.org). Our data confirm the use of a single, ubiquitous, semi-conserved gene for predicting mycobacteriophage clusters, which is particularly useful when a full genome sequence is unavailable. Irrespective of potential recombination events in the TMP gene, global alignment (Figure 1) and Maximum Likelihood or Bayesian Inference (Figure 3) of this single gene accurately recovered the phage cluster and subcluster categorization already recognized by whole-genome methods. Gepard dotplot analysis of TMP proved to be the most reliable method for determining phage relationships, recovering the cluster assignments of 98.8 ± 1.36% of 247 assigned mycobacteriophages and distinguishing phages beyond cluster, down to the subcluster level, with an accuracy of 97.6 ± 1.92%. This predictive ability is most likely due to the dotplot algorithm, which allows alignment of highly mosaic sequences, both in sequence and in orientation.
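The subcluster interval above can be reproduced with a normal-approximation (Wald) interval for a proportion, p̂ ± z·√(p̂(1 − p̂)/n) with z ≈ 1.96 for 95% confidence. The sketch below assumes that interval; the count of 241 correctly subclustered phages out of 247 is back-calculated from the reported 97.6% and is an assumption, not data taken from the study.

```python
import math


def wald_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Return the sample proportion and the half-width of its normal-approximation CI."""
    if not 0 <= successes <= n or n == 0:
        raise ValueError("need 0 <= successes <= n and n > 0")
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, half_width


if __name__ == "__main__":
    # Assumed count: 241 of 247 phages placed in the correct subcluster.
    p, h = wald_interval(241, 247)
    print(f"{100 * p:.1f} ± {100 * h:.2f}%")  # 97.6 ± 1.92%
```

Because the Wald interval can extend past 100% when the proportion is close to 1, a Wilson or exact (Clopper-Pearson) interval is often preferred for accuracies this high.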
Caution must be used with the single-gene approach to determine phage phylogeny. Alignment-free methods, which account for high variability in genome length and gene content, are not designed for single-gene datasets and, accordingly, were not able to reconstruct mycobacteriophage clusters even when a large gene (TMP) was used. This inability reflects the requirement of the FFP method to use much longer sequences in order to capture the phylogenetic relationship among phages. With a whole genome sequence, the FFP method could reliably be used for phage classification, but the method should not be used with a single gene.
Using a single gene to describe evolutionary relationships was recognized as a problem very early in the molecular phylogenetics literature [25–27]. Evolution is not linear, and molecular and population events such as horizontal gene transfer, incomplete lineage sorting, and gene duplication/extinction can and do affect our ability to equate gene trees with species trees [30, 31]. Genetic exchange is even more pronounced in phages, which have rapid rates of gene transfer and are thus highly mosaic [3, 5–8]. Cluster assignment is a simplification of evolutionary history for ease of categorization. For example, although similar phage groups appear using either whole genome or TMP sequence for mycobacteriophages (Figure 1A vs. 1B) and coliphages (Figure 8), whole genome sequence provides more detailed evolutionary relationships indicative of horizontal gene transfer. Only very weak relationships are seen between coliphage lambda and mEp234 when TMP alone is used in dotplot analysis, whereas over half the genome shows similarity in the whole genome dotplot.
Despite genome mosaicism, a single gene that is ubiquitous and highly conserved may provide insight into the evolutionary history of phages. Hardies et al. reported that, in a 215 kb phage genome, the genes encoding TMP, TMP chaperonins, and phage tail properties are evolutionarily stable. Belcaid et al. extended the study of TMP with respect to evolutionary relationships and reported repeated units and markers within TMP that could be used to assess evolutionary relationships. In addition, Casjens et al. showed high conservation of enterobacteriophage head coat proteins. Thus, for phages, structural genes may be the best option for a single, ubiquitous, semi-conserved gene that reflects evolutionary relationships, analogous to 16S rRNA sequencing for bacterial species. This study is the first to include such a large number of known phage genomes and to show the ability of the TMP gene to reflect genomic relationships down to the cluster and subcluster level. This suggests that horizontal DNA transfer is not occurring at a rate that obscures the existence of mycobacteriophage clusters and subclusters. The data indicate that a TMP gene tree reconstructed using Maximum Likelihood or Bayesian Inference reflects the current categorization of phages and thus can be used for fast and reliable initial phage assignment.
Single-gene categorization of phages is a valuable simplification for research. For instance, a key drawback of conventional methods for determining phage phylogeny is the need for a whole genome sequence. Whole genome sequencing generally requires purification and amplification of a phage, which can be costly, time-consuming and challenging. This study reveals several computational strategies that can predict phage relationships from a single gene. The ability to rely on a single gene for initial prediction allows phylogenetic analysis of phages from complex samples without extensive effort or cost. Another advantage of a single-gene approach is that phage relationships can be determined easily by PCR during phage isolation. PCR results confirmed that subcluster-specific primers successfully determined subclusters from diluted and boiled spot tests as well as from DNA obtained by diluting and boiling a high titer lysate. Thus, this analysis could be performed on very crude phage samples prior to amplification and sequencing, allowing the researcher to focus on phages of particular interest, answer specific ecological questions, or simply validate the purity of a sample.
The proposed use of single-gene phylogeny prediction can extend to phage groups beyond mycobacteriophages, as evidenced by our single-gene dotplot analysis of siphovirus coliphages. The single-gene dotplots yielded the same phage clustering as the whole genome dotplots (see Figure 8). Thus, the single-gene approach works for two highly studied phage groups, the mycobacteriophages and the Siphoviridae coliphages. TMP-based prediction of relationships is particularly powerful for mycobacteriophages because none are Podoviridae, 91% are Siphoviridae, and even the mycobacteriophage Myoviridae (Cluster C) contain TMP. Other groups of phages, such as enterobacteriophages, include Podoviridae, which lack TMP. A single-gene approach for such phages must therefore use an alternative conserved, ubiquitous gene rather than TMP.
It is noteworthy that mycobacteria, an acid-fast genus, and E. coli, a Gram-negative bacterium, are very different bacterial hosts harboring phages with little relationship to one another. It is remarkable that TMP accurately reflected phylogenetic groupings among both mycobacteriophages and coliphages. Full genome analysis remains appropriate for phylogenetic verification because of the rapid rate of gene exchange, especially among highly related phages. These results strongly suggest that if a single, ubiquitous, semi-conserved gene can be identified for a group of phages, simple single-gene phylogeny prediction may greatly expand our ability to identify and understand the complexity and vast diversity of bacteriophages.
DNA extraction and PCR amplification
DNA samples were obtained using three different methods. First, a Promega Wizard® DNA extraction kit was used to purify DNA from a high titer lysate. Second, a 1:21 dilution of a high titer lysate was boiled at 95°C for 10 min. Third, the boiling method was applied to material taken from a plaque rather than from a high titer lysate. For direct plaque isolation, a micropipette tip was gently touched to a plaque and then placed in 20 μl of phage buffer (10 mM Tris (pH 7.5), 10 mM MgSO4, 0.074 M NaCl) prior to boiling.
PCR primers were obtained from Eurofins MWG Operon (Huntsville, AL) and dissolved in sterile, nuclease-free water to 100 nM. Each reaction contained 5 μl reaction buffer, 1 μl dNTPs, 0.2 μl Taq DNA polymerase (Invitrogen® Taq DNA Polymerase, recombinant), 2 μl MgCl2, 1 μl template DNA, 2.5 μl forward primer, 2.5 μl reverse primer, and sterile nuclease-free water to a final volume of 25 μl. Reactions were run in an Applied Biosystems GeneAmp PCR System 9700 thermocycler with an initial 5 min denaturation at 94°C, followed by 30 cycles of 30 sec denaturation at 94°C, 30 sec annealing at 55°C, and 45 sec extension at 72°C, and a final extension at 72°C for 5 min. A 5 μl aliquot of each PCR reaction was diluted to 10 μl and loaded into a well of a 2% agarose gel prepared with 1X TAE (0.04 M Tris-acetate, 0.001 M EDTA). A 100 bp ladder was used as a standard, and the samples were electrophoresed at 100 V for 60 min. The gel was visualized and documented using a UVP M-20 Benchtop Transilluminator and BioDoc-It Imaging System (UVP, Upland, CA).
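The reaction recipe and cycling program can be checked with simple arithmetic. The sketch assumes that the Taq polymerase volume of 0.2 is in μl, like the other components, and it ignores thermocycler ramp times; the function names and the 10% master-mix overage are hypothetical conveniences, not part of the protocol.

```python
# Per-reaction volumes in μl, taken from the protocol above.
REACTION_UL = {
    "reaction buffer": 5.0,
    "dNTPs": 1.0,
    "Taq DNA polymerase": 0.2,  # assumption: unit taken to be μl
    "MgCl2": 2.0,
    "template DNA": 1.0,
    "forward primer": 2.5,
    "reverse primer": 2.5,
}
FINAL_VOLUME_UL = 25.0


def water_ul(components: dict[str, float], final_volume: float) -> float:
    """Nuclease-free water needed to bring one reaction to the final volume."""
    used = sum(components.values())
    if used > final_volume:
        raise ValueError("components exceed the final volume")
    return final_volume - used


def master_mix_ul(components: dict[str, float], final_volume: float,
                  n_reactions: int, overage: float = 0.10) -> dict[str, float]:
    """Scaled volumes for n reactions; template DNA is left out and added per tube.

    The default 10% overage is a hypothetical pipetting margin.
    """
    factor = n_reactions * (1 + overage)
    mix = {k: v * factor for k, v in components.items() if k != "template DNA"}
    mix["water"] = water_ul(components, final_volume) * factor
    return mix


def program_minutes(initial_s: float, cycles: int, cycle_steps_s: tuple[float, ...],
                    final_s: float) -> float:
    """Hold time of the cycling program in minutes, ignoring ramp times."""
    return (initial_s + cycles * sum(cycle_steps_s) + final_s) / 60


if __name__ == "__main__":
    print(f"water per reaction: {water_ul(REACTION_UL, FINAL_VOLUME_UL):.1f} μl")  # 10.8 μl
    # 5 min at 94°C; 30 x (30 s at 94°C, 30 s at 55°C, 45 s at 72°C); 5 min at 72°C
    print(f"program: {program_minutes(300, 30, (30, 30, 45), 300):.1f} min")  # 62.5 min
```

Leaving the template out of the master mix reflects the usual practice of adding each sample's DNA to its own tube.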
Software and comparison methods
Seventy-nine full genomes representing a large extent of the diversity of phages infecting Mycobacterium spp. were collected. The phage genome, TMP and MCP sequences were collected from GenBank and from http://phagesdb.org. The mycobacteriophages used in the 79-phage comparison included three representative phages per cluster when possible. This was accomplished for clusters A1, A2, A3, A4, A5, A6, B1, B2, B3, B4, D, E, F1, F2, G, I1, J, K1, L1, L2, N, and O, but only two phages were included for H1, K3, K5 and M, and only one for B5, H2, I2, K2, and K4. GenBank accession numbers [whole genome, TMP, MCP] for 74 of the 79 phages were: Acadian (B5) [JN699007, AER48941, AER48927], Adephagia (K1) [JF704105, AEJ95790, AEJ95782], Airmid (A5) [JN083853, AEJ93508, AEJ93499], Angelica (K1) [NC_014458, ADL71110, ADL71102], Arbiter [JN618996, AEN79530, AEN79518], Ares (B2) [JN699004, AER48651, AER48637], Avani (F2) [JQ809702], Babsiella (I1) [JN699001, AER48393, AER48384], Backyardigan (A4) [JF704093, AEJ94512, AEJ94502], Baka (J) [JF937090, AEK08089, AEK08068], Barnyard (H2) [NC_004689, AAN02087, AAN02075], Benedict (A5) [JN083852, AEJ93417, AEJ93408], Bongo (M) [JN699628, AER26079, AER26071], BPs (J) [NC_010762, ACB58175, ACB58166], Brujita (I1) [FJ168659, ACI06230, ACI06221], Bxz2 (A3) [NC_004682, AAN01780, AAN01770], Charlie (N) [JN256079, AEL19944, AEL19934], Che12 (A2) [NC_008203, ABE67347, ABE67336], Che9c (I2) [NC_004683, AAN12575, AAN12566], Che9d (F2) [NC_004686, AAN07935, AAN07925], ChrisnMich (B4) [JF704094, AEJ94590, AEJ94580], Colbert (B1) [GQ303259, ACU41174, ACU41158], Cooper (B4) [NC_008195, ABD58142, ABD58129], Corndog (O) [NC_004685, AAN01989, AAN01973], CrimD (K1) [NC_014459, ADL71367, ADL71359], Cuco (A5) [JN408459, AEL17672, AEL17663], Daisy (B3) [JF704095, AEJ94700, AEJ94686], DaVinci (A6) [JF937092, AEK08472, AEK08462], DotProduct (F1) [JN859129, AER14061, AER14053], Eagle (A4) [HM152766, ADL71284, ADL71274], Faith1 (L2) [NC_015584, AEF57198, AEF57190], Fionnbharth (K4) [JN831653, AER26314, AER26306], Firecracker (O) [JN698993, AER47481, AER47465], Fruitloop (F1) [NC_011288, ACI12328, ACI12320], Gladiator (A6) [JF704097, AEJ95030, AEJ95020], Gumball (D1) [NC_011290, ACI06400, ACI06389], Halo (G) [NC_008202, ABE67273, ABE67264], Hammer (A6) [JF937094, AEK08675, AEK08665], Harvey (B1) [JF937095, AEK08780, AEK08764], Hedgerow (B2) [JN698991, AER47261, AER47247], HelDan (A3) [JF957058, AEJ92019, AEJ92009], Henry (E) [JF937096, AEK08873, AEK08864], Hertubise (B1) [JF937097, AEK09022, AEK09006], Hope (G) [GQ303261, ACU41480, ACU41471], island3 (I1) [HM152765, ADL71200, ADL71191], JC27 (A1) [JF937099, AEK09225, AEK09216], JHC117 [JF704098, AEJ95124, AEJ95114], JoeDirt (L1) [JF704108, AEK07063, AEK07055], Konstantine (H1) [NC_011292, ACI12447, ACI12436], Kostya (E) [NC_011056, ACF34189, ACF34180], KSSJEB [JF937110, AEK10517, AEK10508], Larva (K5) [JN243855, AEL19674, AEL19666], LeBron (L1) [NC_014461, ADL70983, ADL70975], LHTSCC (A4) [JN699015, AER49866, AER49855], Lilac (E) [JN382248, AEL21642, AEL21632], LittleE (J) [JF937101, AEK09416, AEK09398], MacnCheese (K3) [JX042579], Omega (J) [NC_004688, AAN12678, AAN12659], PBI1 (D1) [NC_008198, ABD58443, ABD58433], Phlyer (B3) [NC_012027, ACM42192, ACM42178], Pipefish (B3) [NC_008199, ABD58525, ABD58511], Pixie (K3) [JF937104, AEK09832, AEK09824], PLot (D1) [NC_008200, ABD58627, ABD58616], Predator (H1) [NC_011039, ACF05127, ACF05116], Redi (N) [JN624851, AEN79917, AEN79867], RedRock (A2) [GU339467, ADB93722, ADB93712], Rey (M) [JF937105, AEK09942, AEK09934], RockyHorror (F1) [JF704117, AEK06723, AEK06715], Rumpelstiltskin (L2) [JN680858, AEO94349, AEO94341], Switzer [JF937108, AEK10324, AEK10315], TM4 (A1) [NC_003387, AAD17585, AAD17577], Trixie (A2) [JN408461, AEL17859, AEL17849], UPIE (L1) [JF704113, AEK07560, AEK07552], Yoshi (F2) [JF704115, AEK07768, AEK07758].
Five mycobacteriophage genomes for the 79-phage comparison were downloaded from http://phagesdb.org: Archie (L2), Catdawg (O), Frederick (B4), Kratio (K) and Xerxes (N). The genomes from phagesdb.org were unannotated; therefore, DNA Master (http://cobamide2.bio.pitt.edu) was used to auto-annotate the genomes and identify TMP and MCP. For the 247-mycobacteriophage comparison, the genomes included the previous 79 along with 157 sequences from GenBank and 11 sequences from the phagesdb.org website. The sequences from phagesdb.org included Bernardo, Hawkeye, HotShotFirst, JAMaL, Mendokysei, Mosby, Odin, Pegleg, Squirty, TA17A, and Whirlwhind. FASTA files of whole genome sequences were downloaded from the http://phagesdb.org website, and TMP sequences were identified by BLAST searches of the genomes. The 157 mycobacteriophage TMP sequences gathered from GenBank were as follows (cluster) [GenBank accession number]: 244 (E) [DQ398041], ABU (B1) [JF704091], Adjutor (D1) [EU676000], Aeneas (A1) [JQ809703], Akoma (B3) [JN699006], Alice (C1) [JF704092], Alma (A9) [JN699005], Anaya (K1) [JF704106.1], Angel (G) [NC_012788.1], AnnaL29 (A1) [JN572060], Ardmore (F1) [NC_013936.1], Athena (B3) [JN699003], Ava3 (C1) [JQ911768], Avrafan (G) [JN699002.1], BarrelRoll (K1) [JN643714.1], Bask21 (E) [JF937091.1], Bethlehem (A1) [AY500153], BigNuz (P) [JN412591.1], BillKnuckles (A1) [JN699000], Blue7 (A6) [JN698999], Boomer (F1) [NC_011054.1], BPBiebs31 (A1) [JF957057], Bruns (A1) [JN698998], Butterscotch (D1) [FJ168660], Bxb1 (A1) [AF271693], Bxz1 (C1) [AY129337], Cali (C1) [EU826471], Catera (C1) [DQ398053], Chah (B1) [FJ174694], Che8 (F1) [NC_004680.1], Cjw1 (E) [AY129331], Courthouse (J) [JN698997.1], D29 (A2) [AF022214], Dandelion (C1) [JN412588], DD5 (A1) [EU744252], DeadP (F1) [JN698996.1], DLane (F1) [JF937093.1], Doom (A1) [JN153085], Dori (Singleton) [JN698995.1], Drago (F1) [JN542517.1], Drazdys (C1) [JF704116], Dreamboat (A1) [JN660814], DS6A (Singleton) [JN698994.1], Elph10 (E) [JN391441.1], EricB (A6) [JN049605], ET08 (C1) [GQ303260.1], Euphoria (A1) [JN153086], Eureka (E) [JN412590.1], Fang (B1) [GU247133], Flux (A4) [JQ809701], Gadjet (B3) [JN698992], George (A5) [JF704107], Ghost (C1) [JF704096], Giles (Q) [NC_009993.2], GUmbie (F1) [JN398368.1], Ibhubesi (F1) [JF937098.1], ICleared (A4) [JQ896627], IsaacEli (B1) [JN698990], JacAttac (B1) [JN698989], Jasper (A1) [EU744251], JAWS (K1) [JN185608.1], Jebeks (P) [JN572061.1], Jeffabunny (A6) [JN699019], Kamiyu (B3) [JN699018], KBG (A1) [EU744248], Kikipoo (B1) [JN699017], KLucky39 (B1) [JF704099], Kugel (A1) [JN699016], L5 (A2) [Z18946], Lesedi (A1) [JF937100], Liefie (G) [JN412593.1], LinStu (C1) [JN412592], Llij (F1) [NC_008196.1], Lockley (A1) [EU744249], LRRHood (C1) [GQ303262.1], Marvin (S) [JF704100.1], MeeZee (A4) [JN243856], Microwolf (A3) [JF704101], MoMoMixon (C1) [JN699626], Morgushi (B1) [JN638753], Mozy (F1) [JF937102.1], MrGordo (A1) [JN020140], Murdoc (B1) [JN638752], Museum (A1) [JF937103], Mutaforma13 (F1) [JN020142.1], Myrna (C2) [EU826466], Nappy (C1) [JN699627], Nigel (B4) [EU770221], Nova (D1) [JN699014], Oline (B1) [JN192463], Oosterbaan (B1) [JF704109], Optimus (J) [JF957059.1], Orion (B1) [DQ398046], OSmaximus (B1) [JN006064], Pacc40 (F1) [NC_011287.1], PackMan (A9) [JF704110], Patience (Singleton) [JN412589.1], Peaches (A4) [GQ303263.1], Perseus (A1) [JN572689], PG1 (B1) [AF547430], Phaedrus (B3) [EU816589], Phipps (B1) [JF704102], Pio (C1) [JN699013], Pleione (C1) [JN624850], PMC (F1) [NC_008205.1], Porky (E) [NC_011055.1], Puhltonio (B1) [GQ303264.1], Pukovnik (A2) [EU744250], Pumpkin (E) [GQ303265.1], Qyrzula (B2) [DQ398048], Rakim (E) [JN006062], Ramsey (F1) [NC_011289.1], RidgeCB (A1) [JN398369], Rizal (C1) [EU826467], Rockstar (A3) [JF704111], Rosebush (B2) [AY129334], Saintus (A8) [JN831654], Scoot17C (B1) [GU247134], ScottMcG (C1) [EU826469], Sebata (C1) [JN204348], Send513 (R) [JF704112.1], Serendipity (B1) [JN006063], SG4 (F1) [JN699012.1], Shaka (A4) [JF792674], Shauna1 (F1) [JN020141.1], ShiLan (F1) [JN020143.1], SirDuracell (E) [JF937106.1], SirHarley (D1) [JF937107], SkiPole (A1) [GU247132], Solon (A1) [EU826470a], Spud (C1) [EU826468], Stinger (B4) [JN699011], Taj (F1) [JX121091.1], TallGrassMM (B1) [JN699010], Thibault (J) [JN201525.1], Thora (B1) [JF957056], ThreeOh3d2 (B1) [JN699009], Tiger (A5) [JX042578], Timshel (A7) [JF957060], TiroTheta9 (A4) [JN561150], Toto (E) [JN006061], Troll4 (D1) [FJ168662], Turbido (A2) [JN408460], Tweety (F1) [NC_009820.1], Twister (A10) [JQ512844], U2 (A1) [AY500152], UncleHowie (B1) [GQ303266.1], Violet (A1) [JN687951], Vista (B1) [JN699008], Vix (A3) [JF704114], Vortex (B1) [JF704103], Wally (C1) [JN699625], Wee (F1) [NC_014901.1], Wildcat (Singleton) [NC_008206.1], Wile (A4) [JN243857], Yoshand (B1) [JF937109], Zemanar (B4) [JF704104].
An additional 24 TMP sequences from coliphages were used: HK578 [NC_019724], mEp213 [NC_019720], vB_EcoS_Rogue1 [NC_019718], HK446 [NC_019714], HK140 [NC_019710], mEp235 [NC_019708], mEp043 c-1 [NC_019706], mEpX2 [NC_019705], HK630 [NC_019723], HK633 [NC_019719], HK225 [NC_019717], mEp234 [NC_019715], HK629 [NC_019711], mEpX1 [NC_019709], mEp237 [JQ182730], JL1 [NC_019419], HK022 [NC_002166], lambda [NC_001416], JK06 [NC_007291], T1 [NC_005833], HK97 [NC_002167], N15 [NC_001901], and Escherichia phages ADB-2 [NC_019725] and HK75 [NC_016160].
Gepard was used to generate dotplots of TMP nucleotide and amino acid sequences. All known cluster assignments of mycobacteriophages referred to here are those designated by Hatfull et al. For the Maximum Likelihood phylogeny, TMP nucleic acid sequences were aligned using ClustalW within MEGA4 software. The alignment parameters included free end gaps, a 65% similarity cost matrix (5.0/-4.0), a gap open penalty of 12, and a gap extension penalty of 3. The same alignment was used to infer a phylogeny using Bayesian Inference as implemented in MrBayes 3.2. Briefly, the best-fit substitution model (GTR+I+G) was estimated using jModelTest. The Markov Chain Monte Carlo simulation was run for 15 million generations in two independent runs (8 chains each; 10% burn-in), and the distribution of sampled trees was summarized in TreeAnnotator 1.7.2, while convergence and mixing were assessed visually in Tracer 1.5 (http://tree.bio.ed.ac.uk/software/). For primer design, 16–22 bp regions of high similarity were identified in which primers could be designed with no more than 3 degenerate positions; this was done in Geneious software. The confidence interval for the percentage of phages clustered and subclustered by TMP comparison of 247 sequences was determined using a confidence interval for proportions with an alpha level of 0.05 (95% confidence level).
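The primer-design constraint described above (16–22 bp windows with no more than 3 degenerate positions) can be illustrated by collapsing each column of an aligned window into its IUPAC ambiguity code. This is a hypothetical sketch of the constraint only: it is not the Geneious procedure used in the study, it assumes gap-free A/C/G/T input within each window, and it does not check melting temperature, GC content, or secondary structure. Function names are illustrative.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
CODE_FOR_BASES = {frozenset(bases): code for code, bases in IUPAC.items()}


def degenerate_primer(window: list[str]) -> str:
    """Collapse aligned, gap-free sequences into one IUPAC-coded primer."""
    columns = zip(*(s.upper() for s in window))
    return "".join(CODE_FOR_BASES[frozenset(col)] for col in columns)


def degenerate_positions(primer: str) -> int:
    """Number of positions that encode more than one base."""
    return sum(1 for code in primer if len(IUPAC[code]) > 1)


def expand(primer: str) -> list[str]:
    """All concrete sequences represented by a degenerate primer."""
    return ["".join(p) for p in product(*(IUPAC[code] for code in primer))]


def candidate_primers(aln: list[str], min_len: int = 16, max_len: int = 22,
                      max_degenerate: int = 3):
    """Yield (start, primer) for every window meeting the length and degeneracy limits."""
    length = len(aln[0])
    for start in range(length - min_len + 1):
        for end in range(start + min_len, min(start + max_len, length) + 1):
            window = [s[start:end] for s in aln]
            if any("-" in w for w in window):
                break  # longer windows from this start contain the same gap
            primer = degenerate_primer(window)
            if degenerate_positions(primer) > max_degenerate:
                break  # extending the window can only add degenerate positions
            yield start, primer
```

For example, the aligned fragments `ACGT`, `ACTT` and `ACGA` collapse to `ACKW`, which has two degenerate positions and represents four concrete sequences.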
For the alignment-free phylogeny, feature frequency profiles were used to infer phylogenetic relationships [19–21], using Bacillus cereus phage PBC1 as the outgroup. The phylogeny was inferred with the neighbor-joining method and bootstrapped 10,000 times to assess nodal support. A 50% majority-rule consensus tree was obtained using PAUP* 4.0 and annotated in FigTree 1.3.1 (http://tree.bio.ed.ac.uk/software/figtree). A similar procedure was used to obtain a phylogeny for the TMP gene in all 79 phages. Word size boundaries were estimated empirically using scripts and documentation provided in the feature frequency profile package. For quantitative comparison of phylogenies, the Matching Splits (MS) metric was estimated as implemented in TreeComp. The genealogical sorting index (gsi) was calculated for both genome and TMP phylogenies using an online server (http://www.genealogicalsorting.org).
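To illustrate how a tree is built from a distance matrix, the sketch below implements the standard neighbor-joining algorithm and returns an unrooted topology in Newick format. It is a hypothetical illustration only: it omits branch lengths, bootstrapping, outgroup rooting, and consensus building, which the study performed with established software (PAUP* 4.0, FigTree). The input is assumed to be a symmetric distance matrix with zeros on the diagonal and unique taxon names, such as a matrix of Jensen-Shannon divergences between feature frequency profiles.

```python
def neighbor_joining(names: list[str], dist: list[list[float]]) -> str:
    """Return an unrooted neighbor-joining topology in Newick format (no branch lengths)."""
    if len(names) < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    d = {a: {b: dist[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node from all other current nodes.
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Join the pair minimising Q(a, b) = (n - 2) * d(a, b) - r(a) - r(b).
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        new = f"({a},{b})"
        d[new] = {new: 0.0}
        for c in nodes:
            if c not in (a, b):
                # Distance from the new internal node to each remaining node.
                d[new][c] = d[c][new] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    return f"({nodes[0]},{nodes[1]});"


if __name__ == "__main__":
    # Hypothetical additive distances for the tree ((A,B),(C,D)).
    taxa = ["A", "B", "C", "D"]
    matrix = [
        [0, 3, 5, 6],
        [3, 0, 6, 7],
        [5, 6, 0, 3],
        [6, 7, 3, 0],
    ]
    print(neighbor_joining(taxa, matrix))  # ((A,B),(C,D));
```

Bootstrap support, as used in the study, would come from repeating this construction on resampled feature profiles and counting how often each split recurs.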
We are thankful for Howard Hughes Medical Institute Phage Research Initiative as well as Dr. Graham Hatfull’s laboratory at the University of Pittsburgh for providing us with various mycobacteriophage DNA samples. We appreciate their support and role in establishing and training the Phage Hunters Program at Brigham Young University. We also appreciate the careful analysis of the manuscript by Sherwood R. Casjens, University of Utah. Additional thanks to Michael Daetwyler and Michael Severson for their help with PCR and analysis of TMP gene sequences and Adam V. Gardner for his assistance with dotplot analysis. Eduardo Castro-Nallar was funded by Comisión Nacional de Investigación Científica y Tecnológica (CONICYT), Gobierno de Chile - Becas Chile.
- Pope WH, Jacobs-Sera D, Russell DA, Peebles CL, Al-Atrache Z, Alcoser TA, Alexander LM, Alfano MB, Alford ST, Amy NE, et al: Expanding the diversity of Mycobacteriophages: Insights into genome architecture and evolution. PLoS One. 2011, 6: 1-
- Hatfull GF: Mycobacteriophages: Genes and genomes. Annu Rev Microbiol. 2010, 64 (1): 331-356. 10.1146/annurev.micro.112408.134233.
- Hendrix RW, Hatfull GF, Smith MCM: Bacteriophages with tails: chasing their origins and evolution. Res Microbiol. 2003, 154 (4): 253-257. 10.1016/S0923-2508(03)00068-8.
- Hatfull GF: The secret lives of Mycobacteriophages. Adv Virus Res. 2012, 82: 179-288.
- Casjens SR: Comparative genomics and evolution of the tailed-bacteriophages. Curr Opin Microbiol. 2005, 8 (4): 451-458. 10.1016/j.mib.2005.06.014.
- Galtier N, Daubin V: Dealing with incongruence in phylogenomic analyses. Philos Trans R Soc Lond B Biol Sci. 2008, 363 (1512): 4023-4029. 10.1098/rstb.2008.0144.
- Hatfull GF, Jacobs-Sera D, Lawrence JG, Pope WH, Russell DA, Ko CC, Weber RJ, Patel MC, Germane KL, Edgar RH, et al: Comparative genomic analysis of 60 Mycobacteriophage genomes: genome clustering, gene acquisition, and gene size. J Mol Biol. 2010, 397 (1): 119-143. 10.1016/j.jmb.2010.01.011.
- Lawrence JG, Hatfull GF, Hendrix RW: Imbroglios of viral taxonomy: genetic exchange and failings of phenetic approaches. J Bacteriol. 2002, 184 (17): 4891-4905. 10.1128/JB.184.17.4891-4905.2002.
- Abuladze NK, Gingery M, Tsai J, Eiserling FA: Tail length determination in bacteriophage T4. Virology. 1994, 199 (2): 301-310. 10.1006/viro.1994.1128.
- Belcaid M, Bergeron A, Poisson G: The evolution of the tape measure protein: units, duplications and losses. BMC Bioinformatics. 2011, 12 (Suppl 9): S10. 10.1186/1471-2105-12-S9-S10.
- Hatfull GF, Pedulla ML, Jacobs-Sera D, Cichon PM, Foley A, Ford ME, Gonda RM, Houtz JM, Hryckowian AJ, Kelchner VA, et al: Exploring the mycobacteriophage metaproteome: Phage genomics as an educational platform. PLoS Genet. 2006, 2 (6): 835-847.
- Krumsiek J, Arnold R, Rattei T: Gepard: a rapid and sensitive tool for creating dotplots on genome scale. Bioinformatics. 2007, 23 (8): 1026-1028. 10.1093/bioinformatics/btm039.
- Tetart F, Desplats C, Kutateladze M, Monod C, Ackermann HW, Krisch HM: Phylogeny of the major head and tail genes of the wide-ranging T4-type bacteriophages. J Bacteriol. 2001, 183 (1): 358-366. 10.1128/JB.183.1.358-366.2001.
- Comeau AM, Krisch HM: The Capsid of the T4 Phage Superfamily: The evolution, diversity, and structure of some of the most prevalent proteins in the biosphere. Mol Biol Evol. 2008, 25 (7): 1321-1332. 10.1093/molbev/msn080.
- Cortines JR, Weigele PR, Gilcrease EB, Casjens SR, Teschke CM: Decoding bacteriophage P22 assembly: identification of two charged residues in scaffolding protein responsible for coat protein interaction. Virology. 2011, 421 (1): 1-11. 10.1016/j.virol.2011.09.005.
- Hauser R, Blasche S, Dokland T, Haggard-Ljungquist E, von Brunn A, Salas M, Casjens S, Molineux I, Uetz P: Bacteriophage protein-protein interactions. Adv Virus Res. 2012, 83: 219-298.
- Rajagopala SV, Casjens S, Uetz P: The protein interaction map of bacteriophage lambda. BMC Microbiol. 2011, 11: 213. 10.1186/1471-2180-11-213.
- Tamura K, Dudley J, Nei M, Kumar S: MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) Software Version 4.0. Mol Biol Evol. 2007, 24 (8): 1596-1599. 10.1093/molbev/msm092.
- Jun SR, Sims GE, Wu GA, Kim SH: Whole-proteome phylogeny of prokaryotes by feature frequency profiles: An alignment-free method with optimal feature resolution. Proc Natl Acad Sci U S A. 2010, 107 (1): 133-138. 10.1073/pnas.0913033107.
- Sims GE, Jun SR, Wu GA, Kim SH: Alignment-free genome comparison with feature frequency profiles (FFP) and optimal resolutions. Proc Natl Acad Sci U S A. 2009, 106 (8): 2677-2682. 10.1073/pnas.0813249106.
- Sims GE, Kim SH: Whole-genome phylogeny of Escherichia coli/Shigella group by feature frequency profiles (FFPs). Proc Natl Acad Sci U S A. 2011, 108 (20): 8329-8334. 10.1073/pnas.1105168108.
- Sousa A, Ze-Ze L, Silva P, Tenreiro R: Exploring tree-building methods and distinct molecular data to recover a known asymmetric phage phylogeny. Mol Phylogenet Evol. 2008, 48 (2): 563-573. 10.1016/j.ympev.2008.04.030.
- Casjens SR, Thuman-Commike PA: Evolution of mosaically related tailed bacteriophage genomes seen through the lens of phage P22 virion assembly. Virology. 2011, 411 (2): 393-415. 10.1016/j.virol.2010.12.046.
- Casjens SR: Diversity among the tailed-bacteriophages that infect the Enterobacteriaceae. Res Microbiol. 2008, 159 (5): 340-348. 10.1016/j.resmic.2008.04.005.
- Maddison WP: Gene trees in species trees. Syst Biol. 1997, 46 (3): 523-536. 10.1093/sysbio/46.3.523.
- Pamilo P, Nei M: Relationships between gene trees and species trees. Mol Biol Evol. 1988, 5 (5): 568-583.
- Rosenberg NA: The probability of topological concordance of gene trees and species trees. Theor Popul Biol. 2002, 61 (2): 225-247. 10.1006/tpbi.2001.1568.
- Haggard-Ljungquist E, Halling C, Calendar R: DNA sequences of the tail fiber genes of bacteriophage P2: evidence for horizontal transfer of tail fiber genes among unrelated bacteriophages. J Bacteriol. 1992, 174 (5): 1462-1477.
- Page RDM, Charleston MA: From gene to organismal phylogeny: reconciled trees and the gene tree/species tree problem. Mol Phylogenet Evol. 1997, 7 (2): 231-240. 10.1006/mpev.1996.0390.
- Leaché AD, Rannala B: The accuracy of species tree estimation under simulation: a comparison of methods. Syst Biol. 2011, 60 (2): 126-137. 10.1093/sysbio/syq073.
- Rosenberg NA, Nordborg M: Genealogical trees, coalescent theory and the analysis of genetic polymorphisms. Nat Rev Genet. 2002, 3 (5): 380-390. 10.1038/nrg795.
- Hardies SC, Thomas JA, Serwer P: Comparative genomics of Bacillus thuringiensis phage 0305phi8-36: defining patterns of descent in a novel ancient phage lineage. Virol J. 2007, 4: 97. 10.1186/1743-422X-4-97.
- Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R, et al: Clustal W and Clustal X version 2.0. Bioinformatics. 2007, 23 (21): 2947-2948. 10.1093/bioinformatics/btm404.
- Kearse M, Moir R, Wilson A, Stones-Havas S, Cheung M, Sturrock S, Buxton S, Cooper A, Markowitz S, Duran C, et al: Geneious Basic: An integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics. 2012, 28 (12): 1647-1649. 10.1093/bioinformatics/bts199.
- Ronquist F, Teslenko M, van der Mark P, Ayres DL, Darling A, Höhna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP: MrBayes 3.2: Efficient Bayesian Phylogenetic Inference and Model Choice Across a Large Model Space. Syst Biol. 2012, 61 (3): 539-542. 10.1093/sysbio/sys029.
- Posada D: jModelTest: Phylogenetic Model Averaging. Mol Biol Evol. 2008, 25 (7): 1253-1256. 10.1093/molbev/msn083.
- Swofford DL: PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). 2003, Sunderland, MA: Sinauer Associates.
- Bogdanowicz D, Giaro K: Matching split distance for unrooted binary phylogenetic trees. IEEE/ACM Trans Comput Biol Bioinform. 2012, 9 (1): 150-160.
- Bogdanowicz D, Giaro K, Wrobel B: TreeCmp: Comparison of trees in polynomial time. Evol Bioinform. 2012, 8: 475-487.
- Cummings MP, Neel MC, Shaw KL: A genealogical approach to quantifying lineage divergence. Evolution. 2008, 62 (9): 2411-2422. 10.1111/j.1558-5646.2008.00442.x.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.