Core and accessory genome architecture in a group of Pseudomonas aeruginosa Mu-like phages
© Cazares et al.; licensee BioMed Central. 2014
Received: 26 August 2014
Accepted: 11 December 2014
Published: 19 December 2014
Bacteriophages that infect the opportunistic pathogen Pseudomonas aeruginosa have been classified into several groups. One of them, which includes temperate phages with icosahedral heads and long flexible tails, has genomes whose architecture and replication mechanism, but not their nucleotide sequences, resemble those of coliphage Mu. Comparing the genomic sequences of this group of P. aeruginosa phages could shed light on their origin and evolution.
Two newly isolated Mu-like phages of P. aeruginosa are described, and their genomes were sequenced and compared with those available in public databanks. The genome sequences of the two phages are similar to each other and to those of a group of P. aeruginosa transposable phages. Comparison of twelve of these genomes revealed a common genomic architecture in the group. Each phage genome had numerous genes with homologues in all the other genomes, as well as a set of variable genes specific to each genome. The first set, which comprised most of the genes with assigned functions, was named the “core genome”, and the second, containing mostly short ORFs without assigned functions, was called the “accessory genome”. As in other phage groups, variable genes are confined to specific regions of the genome.
Based on the known and inferred functions of some of the variable genes of the phages analyzed here, these genes appear to confer selective advantages for phage survival under particular host conditions. We speculate that phages have developed a mechanism for horizontally acquiring genes that help them adapt to the selective pressures imposed by the host, and for incorporating those genes at specific loci in the genome.
Keywords: Phage; Comparative genomics; Pangenome; Mu-like phages; Phage adaptation; Horizontal gene transfer
Comparison of closely related bacterial genomes has been useful for studying genome evolution over short time scales and for identifying lateral gene transfer events and strain-specific genes. Comparative analysis of different strains of Pseudomonas aeruginosa has shown that their genomes are mosaics consisting of a conserved component (core genome) interrupted by blocks of variable genes (accessory genome) located at a limited number of chromosomal locations. It has been proposed that P. aeruginosa shapes its accessory genome to favor survival in a wide range of ecological niches, and that this represents a major evolutionary force influencing genome composition.
Comparative studies of tailed bacteriophage genome sequences have shown a pervasive mosaic genetic architecture, presumably arising from extensive horizontal exchange and non-homologous recombination of ancestral sequences (see , for a review). Interestingly, the genomes of several bacteriophage groups consist of conserved and variable genes [4–7], like those of bacteria, suggesting that they emerged through a similar evolutionary mechanism. Genes that encode interacting proteins, such as virion structural genes, are usually arranged in contiguous modules that are rarely interrupted by non-homologous recombination. There is evidence for the exchange of large blocks of genes that produce fully functional phages different from the inferred parents [8, 9]. One type of accessory gene is the so-called "moron" (for "more DNA"), usually a single coding region flanked by a transcriptional promoter and terminator and inserted between two adjacent genes in related phages [3, 7]. The nucleotide composition of morons usually differs from that of the adjacent genes, suggesting that they were recently acquired from a different source. In some cases morons are lysogenic conversion genes that are expressed from the repressed prophage and apparently confer a selective benefit on the prophage by benefiting the host.
Bacteriophage Mu, which infects Escherichia coli, is the prototype of various temperate phages found in other bacterial species. Phages D3112 and B3 of Pseudomonas aeruginosa resemble Mu in that they lysogenize by integrating their genomes at almost random positions in the host chromosome, their DNA replicates by transposition, and their viral particles contain heterogeneous segments of host DNA attached at both termini of the phage genome [10–12]. The viral particles of D3112 and B3 have isometric icosahedral heads like Mu, but their long flexible tails differ from the contractile tail typical of Mu [10, 12]. Although these three phage genomes follow the same module order (lysis-lysogeny, transposition-replication and virion morphogenesis) [10, 12, 13], their sequences diverge at the nucleotide level. Indeed, D3112 and B3 are not closely related phages: they are homologous over only 7.5 kb near their right termini, and B3 presents a notable genetic rearrangement in the left arm of its genome relative to those of D3112 and Mu.
Mu-like prophage sequences have been described in Haemophilus, Neisseria and Deinococcus. Recently, many more P. aeruginosa Mu-like phage genomic sequences have been filed in the public databases, but we are unaware of efforts to compare these genomes and investigate the degree of diversity existing among them. In this work we sequenced the genomes of two locally isolated P. aeruginosa Mu-like phages whose sequences were similar to those of the D3112 group. Analysis of these and other annotated genomes revealed that they bear a common set of conserved genes representing most of the genome and a smaller group of short variable genes located in several specific loci. Following the P. aeruginosa terminology, the common set of genes is called “the core genome”, and the group of variable ORFs, which is different for each phage, is called “the accessory genome”; the sum of core and accessory genomes is “the pangenome”. We speculate that the accessory genes are acquired by horizontal transfer and that they increase the survival capacity of the phage by improving its adaptation to the particular conditions imposed by the ecology of its host.
Results and discussion
General features of the phages PaMx73 and H70
Sequence homology among PaMx73, H70 and Mu-like related genomes
BLASTn alignments showed that the PaMx73 and H70 genomes were 87% identical to each other and homologous, to a lesser degree, to other P. aeruginosa phage genomes in public databases. These genomes corresponded either to vegetative phages (D3112, MP29, PA1/KOR/2010, DMS3, MP38 and MP22) or to putative prophages within bacterial genomes (LESB58 prophage 4, 39016, 138244 and NCGM2) (Figure 2). All these genomes are organized in functional modules similar to those described for the temperate phage D3112 (Additional file 2). D3112 and MP22 are phages whose genome structures resemble that of coliphage Mu [10, 19]: a left module containing genes involved in the control of early gene expression and in transposition-replication, a short middle region that regulates late gene expression, and a right, or late, region containing morphogenesis genes. The putative promoter positions in the PaMx73 genome map (see above) also match most of the proposed promoter sites in the Mu and D3112 genomes [13, 24]. Note that three of the analyzed genomes seemed incomplete compared with the rest of the group (Figure 2 and Additional file 2): the putative NCGM2 prophage, which lacked about 2000 bp at the left end relative to the other genomes, probably owing to a partial deletion of the prophage; PA1/KOR, which was about 2000 bp short at the left end; and D3112, which was missing about 100 bp at the right end, probably trimmed away during sequence assembly.
The mechanism of transposition-replication has been reported as the hallmark of Mu-like phages [13, 25, 26]. In phage Mu, three imperfect repeat sequences close to each genome end are recognized by the transposase [27, 28]. We therefore inspected the end regions of the phage genomes compared here for putative transposase binding sites. In all the genomes except the incomplete ones mentioned above, three conserved 22-bp putative binding sites were found in tandem next to each end. The three binding sites at the left genome end, named L1, L2 and L3, were imperfect direct repeats located 10, 93 and 124 bp from the end, whereas the imperfect repeats R1, R2 and R3 were positioned 4, 46 and 93 bp from the right genome end. At the right end, R1 was inverted relative to R2 and R3 (Additional file 3). The sequences and positions of the putative transposase binding sites were well conserved, consistent with the sequence conservation of the putative transposase A gene (see below).
The triplet 5'-TGT, identified at the genome ends of MP22, has been reported to be conserved in Mu-like phages of several bacterial groups [13, 25, 26]. We inspected the ends of the Mu-like phage genomes compared here for this terminal sequence. The triplet 5'-TGT was identified at the termini of PaMx73 and H70 during genome assembly (see Methods) and, except for the incomplete genomes mentioned above, it was present at all the genome ends examined in this work. The 5'-TG in the Mu genome has been shown to be important for assembly of a stable transposome complex [29, 30].
Short heterogeneous segments of P. aeruginosa genomic DNA, 50 to 100 nucleotides long, were found flanking the genome ends of PaMx73 and H70 beyond the 5'-TGT sequences (data not shown). As in phage Mu, these host terminal sequences are remnants of host DNA packaged in the virions, a relic of phage DNA replication by transposition. Host DNA sequences as long as 2 kb have been reported at the right end of the D3112 genome. The shorter host DNA segments observed here at the right ends of the phage genomes were therefore presumably an artifact of the shotgun fragmentation of the phage DNA before sequencing.
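Identity percentages such as the 87% above depend on how aligned columns are counted. The following minimal Python sketch assumes a BLAST-style definition (identical columns divided by alignment length, with gap columns counted as mismatches) and takes two already-aligned sequences as input; it does not reproduce how BLASTn or MUMmer summarize identity over several local alignments.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity between two gapped, equal-length aligned sequences.

    Assumed definition: identical non-gap columns / total alignment columns,
    so gap columns lower the identity. Other tools use other denominators.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    if not aln_a:
        raise ValueError("empty alignment")
    matches = sum(
        1 for x, y in zip(aln_a.upper(), aln_b.upper()) if x == y and x != "-"
    )
    return 100.0 * matches / len(aln_a)


# Toy alignment (not phage sequence): 5 identical columns out of 6.
print(round(percent_identity("ACGT-A", "ACGTTA"), 2))  # 83.33
```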
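The end-region inspection described above can be sketched as a mismatch-tolerant search on both strands. This minimal Python illustration is not the procedure the authors used: the 22-bp consensus, the 200-bp end window and the mismatch cutoff are hypothetical placeholders (the real L1-L3 and R1-R3 sequences are given in Additional file 3), and distances are reported as the number of bases between the genome terminus and the nearer edge of the site.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_end_sites(genome: str, motif: str, max_mismatches: int = 3, window: int = 200):
    """Find imperfect copies of `motif` near both genome ends, on either strand.

    Returns tuples (end, strand, distance_from_end, mismatches), where
    distance_from_end counts the bases between the terminus and the site.
    """
    genome, motif = genome.upper(), motif.upper()
    k = len(motif)
    probes = (("+", motif), ("-", revcomp(motif)))
    hits = []
    left = genome[:window]
    for i in range(len(left) - k + 1):
        for strand, probe in probes:
            d = hamming(left[i:i + k], probe)
            if d <= max_mismatches:
                hits.append(("left", strand, i, d))
    right = genome[-window:]
    for i in range(len(right) - k + 1):
        for strand, probe in probes:
            d = hamming(right[i:i + k], probe)
            if d <= max_mismatches:
                hits.append(("right", strand, len(right) - (i + k), d))
    return hits


# Hypothetical motif and synthetic genome with N filler, for illustration only.
motif = "ACGTACGTTGCAAGCTTAGGCA"
genome = "N" * 10 + motif + "N" * 500 + revcomp(motif) + "N" * 5
print(find_end_sites(genome, motif))
# [('left', '+', 10, 0), ('right', '-', 5, 0)]
```

Scanning only the terminal windows keeps the search fast and mirrors the observation that the sites lie within about 125 bp of each end.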
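Because the triplet is read 5' to 3' into the genome at each end, the left end shows TGT on the top strand and the right end shows TGT on the bottom strand, which means the top strand ends in ACA. A minimal Python sketch of this check, assuming the genome is a single top-strand string already trimmed of flanking host DNA:

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (case-insensitive)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def check_termini(genome: str, triplet: str = "TGT") -> tuple:
    """Report whether each end of a linear genome starts with `triplet`
    when read 5'->3' into the genome (left end on the top strand,
    right end on the bottom strand)."""
    top = genome.upper()
    left_ok = top.startswith(triplet)
    right_ok = revcomp(top).startswith(triplet)  # same as top ending in "ACA"
    return left_ok, right_ok


print(check_termini("TGTACGGCAAACA"))  # (True, True)
print(check_termini("TGTACGGCAAATT"))  # (True, False)
```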
The core and accessory genomes of the analyzed phages
All-vs-all BLASTn alignments of the twelve phage genomes indicated that long homologous segments were interspersed with short non-homologous regions, often located at the same relative positions (Figure 2A and B). The long homologous segments mainly contained conserved ORFs that were more than 90% identical among the compared genomes. In contrast, the non-homologous segments contained either different sets of short non-conserved ORFs or functionally conserved genes with heterogeneous sequences. The regularity of the similarity patterns, particularly clear in the right arms of the PaMx73 and DMS3 genomes (Figure 2A and B), prompted us to organize the phage genomes in a neighbor-joining tree (Figure 2C). The tree showed two main branches, each with six phage genomes: group 1, represented by the genome of PaMx73, and group 2, represented by that of DMS3. Following Mathee et al. and their description of the P. aeruginosa genome structure, the set of conserved genes will be referred to as the “core genome” and the short non-conserved ORFs as the “accessory genome” (Additional file 2). The sites where the accessory genes were found will be called “Regions of Genomic Plasticity” or RGPs (Additional file 2). The sum of the core and accessory genomes was named the “pangenome” of this group of phages. As discussed below (section Accessory genome), there seems to be more than a simple analogy between the concepts of phage and bacterial pangenomes.
Phage genomes containing core and variable components have been described for lambdoid phages, a group of T4-like phages and cyanophages [5, 6]. The structure of the genomes described here formally parallels that of these phage genomes: the compared genomes share a core interrupted by several variable regions. In the T4-like group, the core region primarily includes homologues of essential T4 genes, and the variable genome, located in specific loci named hyperplastic regions (HPRs), contains mostly small genes of unknown function. Nonetheless, some of these genes are known to encode adaptive functions that allow the phage to elude host exclusion systems (see below). It has been proposed that the core genes have evolved by vertical inheritance whereas the accessory genes have been horizontally transferred. Thus, genomes with core and accessory components seem to be a common evolutionary strategy of both temperate and virulent phages.
It has been speculated that variable gene regions in phage genomes are acquired by horizontal transfer and recombination at sites that do not interfere with the expression of essential genes. Variable gene inserts often coincide with deviations in GC content along the genome, indicating recent acquisition [10, 32, 33]. The PaMx73 and DMS3 genomes had an average GC content of 64.2%, but two of their variable regions (RGPs F and G, see below) coincided with valleys of GC content as low as 46.3% (Figure 2A and B). Inspection of the sequences flanking the different RGPs in the same genome, or the same type of RGP in different genomes, did not reveal sequences that could suggest a common recombinase target. As in the case of morons, the mechanism of acquisition of variable genes remains mysterious.
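The neighbor-joining step can be illustrated with the classic Saitou-Nei algorithm applied to a pairwise distance matrix. This Python sketch assumes the distances have already been derived from the genome alignment (how they were obtained from the Mauve alignment is not shown), and it returns an unrooted tree in Newick format; the five-taxon matrix in the example is a textbook additive case, not phage data.

```python
def neighbor_joining(names, dist):
    """Saitou-Nei neighbor joining.

    names: list of taxon labels; dist: symmetric matrix (list of lists).
    Returns an unrooted Newick string with branch lengths.
    """
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    D = {a: {b: float(dist[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    active = list(names)
    while len(active) > 3:
        n = len(active)
        r = {a: sum(D[a][b] for b in active) for a in active}
        # Choose the pair minimizing Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                a, b = active[x], active[y]
                q = (n - 2) * D[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from each joined node to the new internal node u.
        la = 0.5 * D[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = D[a][b] - la
        u = f"({a}:{la:.4f},{b}:{lb:.4f})"
        D[u] = {u: 0.0}
        for c in active:
            if c not in (a, b):
                D[u][c] = D[c][u] = 0.5 * (D[a][c] + D[b][c] - D[a][b])
        active = [c for c in active if c not in (a, b)] + [u]
    # Join the last three nodes at a single central node.
    a, b, c = active
    la = 0.5 * (D[a][b] + D[a][c] - D[b][c])
    lb = 0.5 * (D[a][b] + D[b][c] - D[a][c])
    lc = 0.5 * (D[a][c] + D[b][c] - D[a][b])
    return f"({a}:{la:.4f},{b}:{lb:.4f},{c}:{lc:.4f});"


names = ["a", "b", "c", "d", "e"]
M = [[0, 5, 9, 9, 8], [5, 0, 10, 10, 9], [9, 10, 0, 8, 7],
     [9, 10, 8, 0, 3], [8, 9, 7, 3, 0]]
print(neighbor_joining(names, M))
# (d:2.0000,e:1.0000,(c:4.0000,(a:2.0000,b:3.0000):3.0000):2.0000);
```

For an additive matrix such as this one, neighbor joining recovers the generating tree exactly, which makes it a convenient correctness check.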
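GC valleys such as those at RGPs F and G are usually located by computing GC content in a sliding window, which is what the Artemis GC plot displays. A minimal Python sketch; the window size, step and threshold are assumptions, not the settings used for Figure 2.

```python
def gc_profile(seq: str, window: int = 500, step: int = 100):
    """Return (window_start, percent_GC) for each full window along seq."""
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        profile.append((start, 100.0 * gc / window))
    return profile


def gc_valleys(seq: str, threshold: float, window: int = 500, step: int = 100):
    """Windows whose GC content falls below `threshold` percent."""
    return [(s, gc) for s, gc in gc_profile(seq, window, step) if gc < threshold]


# Synthetic sequence: GC-rich background with an AT-rich insert at 1000-1500.
demo = "GC" * 500 + "AT" * 250 + "GC" * 500
print(gc_valleys(demo, threshold=50.0, window=500, step=500))  # [(1000, 0.0)]
```

In practice the threshold would be set relative to the genome average (64.2% here) rather than at a fixed value.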
Assignment of functions to ORFs in the phage genomes
Based on homology to functional domains, and to amino acid sequences of phage proteins in the data bank, putative functions were assigned to fourteen ORFs in the genomes of PaMx73 and H70 (Additional file 2, bottom). These conserved ORFs were related to regulation of gene expression, DNA replication and virion morphogenesis.
The BLASTp comparative analysis of the twelve phage genomes examined here revealed that 47 ORFs are conserved in most of the phages (i.e., the core genome) (Figure 4, Additional files 2 and 5), the exceptions being the three incomplete genomes mentioned above (see section Sequence homology). Of these, 21 ORFs had assigned functions (Figure 4, red and violet arrows), whereas 26 remained unassigned (Figure 4, green arrows). A list of putative functions encoded by the core ORFs is presented in Additional file 6. Twelve of the thirteen proteins identified by mass spectrometry in the PaMx73 virion were encoded by core ORFs, since they had homologs in all the Mu-like genomes analyzed (Figure 4, violet arrows). In contrast, the virion protein encoded by gene h (Figure 3), located between core ORFs 27 (protease-scaffold) and 28 (major head subunit protein) (RGP H, Figure 4), had homologs only in the genomes of similarity group 1 (Figure 2C). Because gene h encodes a structural homolog of a head decoration protein (see above and section Accessory genome), it seems to belong to the head module of the phage genome, and we suggest that it could have been lost from the genomes of similarity group 2 after they diverged from a group 1 ancestor.
Most homologous core ORFs were 70 to 100% identical at the amino acid level among the compared genomes, but core ORFs 28, 29, 35, 36 and 37 were highly similar only among members of the same similarity group (Figure 2), with lower sequence identity (43-60%) between members of different groups. Additionally, some ORFs showed poor overall sequence conservation even among members of the same similarity group. These ORFs, corresponding to the putative repressors, Ner-like proteins and terminases, exhibited 54%, 65% and 36% amino acid sequence identity, respectively. Other putative gene products with variable sequence but unassigned function were those encoded by core ORFs 4 and 45 (Figure 4). High variability of repressor and antirepressor (ner) sequences has been observed in the Stx-like coliphages, whereas the sequence variability between the repressors of D3112 and MP22 has been associated with the absence of cross-immunity. Thus, the variation among the putative repressors observed here likely indicates that these P. aeruginosa Mu-like phages belong to different immunity groups.
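The core/accessory split amounts to a presence-absence rule over ORF families. The Python sketch below assumes homologous ORFs have already been grouped into families (for example, with the BLASTp criteria given in Methods) and treats a family as core if it occurs in every complete genome, so that absences from incomplete genomes do not demote it. The family names and memberships in the example are invented for illustration.

```python
def partition_pangenome(presence, genomes, incomplete=frozenset()):
    """Split ORF families into sorted core and accessory lists.

    presence: dict mapping family name -> set of genomes carrying it.
    genomes: all genomes compared; incomplete: genomes whose absences are
    ignored because part of their sequence is missing.
    """
    required = set(genomes) - set(incomplete)
    core = {f for f, carriers in presence.items() if required <= set(carriers)}
    accessory = set(presence) - core
    return sorted(core), sorted(accessory)


# Hypothetical memberships, not data from the paper.
genomes = {"PaMx73", "H70", "DMS3", "NCGM2"}
presence = {
    "orf27": {"PaMx73", "H70", "DMS3"},
    "orf28": {"PaMx73", "H70", "DMS3", "NCGM2"},
    "h": {"PaMx73", "H70"},
}
print(partition_pangenome(presence, genomes, incomplete={"NCGM2"}))
# (['orf27', 'orf28'], ['h'])
```

The pangenome is then simply the union of the two lists.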
ORFs previously overlooked in the phage genomes
The accessory genome of the analyzed Mu-like phages contained mainly ORFs of 34 to 100 codons that were either phage-specific or shared by several phage genomes (Figure 4 and Additional file 6). The accessory ORFs were always located at positions corresponding to non-homologous regions in the nucleotide sequence alignments (see Figure 2A and B). These regions of genomic plasticity (RGPs) were mainly located in the left arm of the genomes, and each contained from zero to six ORFs (Figure 4 and Additional file 2). Nine different RGPs were labeled A to I in the genome maps, and the accessory ORFs within each RGP were identified by the corresponding lower-case letter and a consecutive number (Figure 4). Each genome contained between 7 and 11 accessory ORFs distributed among 4 to 7 RGPs. Note that, with the exception of RGPs C and G, the regions contain only one ORF and could therefore be considered insertion-deletion regions, or “indels”. These accessory genes could be examples of the morons described in lambdoid phages [3, 7]. An interesting example of morons that increase phage fitness by aiding virion stability is provided by the genes encoding capsid decoration proteins in different phages. These genes have been considered accessory elements because they are absent from very closely related genomes and may be advantageous for the virions under certain conditions [3, 43]. For example, the gpD and Dec proteins of lambdoid phages confer virion stability against chelating agents [43, 44], gpD provides mechanical reinforcement to withstand external physical stress in lambda, and Soc in T4 confers capsid resistance to high pH and thermal challenges [46, 47]. Although protein h of PaMx73 and gpD of lambda show no sequence similarity, their genes are syntenic, i.e., both are located between the genes encoding the protease-scaffold and major head proteins in the corresponding genomes [13, 44].
In addition, protein h, gpD and the decoration protein Dec of phage L have similar molecular masses (~11-14 kDa; Figure 3; [43, 44]). Based on these coincidences, we suggest that protein h is a new capsid decoration protein, although further experimental characterization will be necessary to validate this proposal.
Overall, 28 types of accessory ORFs were identified (Figure 4). The accessory genome represented between 6 and 10% of each phage genome. These non-homologous regions may have been acquired by recombination mediated by Red-like functions, which can use short sequences with as little as 78% identity to recombine short DNA segments into lambdoid phage genomes. A detailed analysis of the sequences flanking the accessory ORFs in Mu-like genomes is needed before the mechanism of heterologous gene acquisition can be inferred. Furthermore, it has been shown that lambdoid phages encoding their own recombination system have more mosaic genomes and more diverse gene repertoires than lambdoid phages that do not encode any recombinase, which increases phage diversity and may facilitate adaptation to the host.
RGP C, which clusters several ORFs, is reminiscent of the ninR region of phages λ, HK97, HK022 and P22. These phage genomes bear a group of about ten ORFs between genes P and Q. Like the ORFs in RGP C, the genes in the nin region are short (less than 100 codons), closely packed, dispensable, and unique or shared only among some members of the group. As proposed for nin genes, the genes in RGPs C and G may help phages adapt to the particular host they infect. The genes rexA, rexB and ren of phage lambda may also be examples of accessory genes conferring a selective advantage. The rex genes encode a two-component exclusion system that inhibits the growth of other phages infecting lambda lysogens. The product of ren prevents lambda from self-exclusion. It has been proposed that the acquisition of novel metabolic capabilities through horizontal gene transfer is a key evolutionary force shaping the P. aeruginosa genome, and that this is reflected in the genome plasticity of individual strains. We, like other authors, propose that a similar mechanism shapes the genomes of these phages as they adapt to the particular exclusion functions of their hosts.
Distribution of homologues for the Mu-like pangenome ORFs in the data bank
To investigate the nature and origin of core and accessory genes, we determined the number of homologues of each ORF through BLASTp searches against the NCBI non-redundant database (Additional file 6 and Figure 6). Bacterial and viral sequences accounted for almost all the homologues detected (16559 and 3524, respectively). Several ORFs had homologues across a variety of bacterial species; interestingly, the core genes had matches mainly in Pseudomonas genomes, whereas the accessory genes had matches both in Pseudomonas and in more distantly related bacterial species. On average, the core ORFs had about threefold more homologues than the accessory ORFs (Figure 6 and Additional file 6), consistent with the essential nature of core gene functions. Accessory ORFs generally had few homologues, with the exception of ORFs d2, h and i, which had about 500 matches or more each. These exceptions suggest that such ORFs could originally have been core genes that were lost from some phage genomes under conditions where they were dispensable, which makes them interesting candidates for functional studies. ORF h, which may encode the capsid decoration protein in the Mu-like phages of similarity group 1 (Figures 2 and 4), could represent a core head gene that was lost from the phage genomes of similarity group 2 (Figure 4). Our analysis confirms the observation that anti-CRISPR genes have few homologues in the databases, suggesting that these genes are specific to Mu-like phages and other mobile genetic elements of P. aeruginosa. Note that these results are restricted to the sequences available in the NCBI non-redundant database; the homologue counts for core and accessory ORFs are therefore subject to sampling bias.
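The threefold comparison and the flagging of accessory ORFs with unusually many homologues can be expressed as a small summary over per-ORF hit counts. A minimal Python sketch; the counts in the example are invented, and the 500-hit cutoff simply mirrors the value quoted in the text.

```python
from statistics import mean


def summarize_homologues(counts, core, cutoff=500):
    """Compare mean homologue counts of core vs accessory ORFs.

    counts: dict ORF -> number of BLASTp homologues; core: set of core ORFs.
    Returns (core_mean / accessory_mean, accessory ORFs with >= cutoff hits).
    """
    core_counts = [n for orf, n in counts.items() if orf in core]
    accessory = {orf: n for orf, n in counts.items() if orf not in core}
    fold = mean(core_counts) / mean(list(accessory.values()))
    frequent = sorted(orf for orf, n in accessory.items() if n >= cutoff)
    return fold, frequent


# Invented counts for illustration only.
toy_counts = {"c1": 900, "c2": 300, "a1": 100, "a2": 600, "a3": 200}
print(summarize_homologues(toy_counts, core={"c1", "c2"}))  # (2.0, ['a2'])
```

Because homologue counts are highly skewed, a median-based ratio may be a useful complement to the mean.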
Genome comparison between P. aeruginosa Mu-like phage PaMx73 and coliphage Mu
Despite the genomic homology observed among the P. aeruginosa Mu-like phages analyzed here, a BLASTn comparison between the genomic sequences of PaMx73 and coliphage Mu revealed no significant sequence similarity. Yet the two genomes showed a similar functional modular organization. Based on BLASTp searches, twelve ORFs in the left arm of the PaMx73 genome corresponded to homologous genes in the left arm of the Mu genome. In Mu, these genes encode transcriptional regulation, replicative transposition and head morphogenesis proteins, and they show between 23 and 63% amino acid similarity with their homologues in PaMx73 (Figure 5). However, about twenty short ORFs without assigned functions in the left arms of Mu and PaMx73 showed no homology. Whole-genome comparison revealed that the right arms of the two phages differed in sequence, ORF number and size. This was expected because the right arms encode the tail genes, and the two phages differ strikingly in tail morphology: Mu has the contractile tail of myophages, whereas PaMx73 has the flexible tail typical of siphophages (Figure 1). Interestingly, the right genomic arms of Mu and of Mu-like myophages that infect other bacterial genera are homologous in gene distribution, size and sequence. Since H70 (Figure 1), D3112 and MP22 have also been characterized as siphophages [10, 19], the flexible tail is apparently a unique feature of this group of P. aeruginosa Mu-like viruses. This appears to be an interesting case of mosaicism between phages of different lineages.
We looked for three other genomic features of Mu in the P. aeruginosa Mu-like phages: 1) the invertible G segment, 2) the in-frame translational start of the scaffolding protein within the protease gene and 3) the translational frameshift in the overlapping tail assembly genes. As expected, no invertible G segment was identified among the tail genes, since the tail structure of this group of siphophages is totally different from the Mu tail. However, a putative internal start site was identified at codon Val177 of the protease-encoding gene 27 of PaMx73; it could be the initiator of the scaffolding protein because it is preceded by a plausible Shine-Dalgarno sequence (ACGAGGA) with a 9-nucleotide spacer. Despite the sequence divergence between these protease/scaffolding genes and those of Mu (33% amino acid similarity), they have similar lengths, their internal start sites are located at the same relative positions, and their Shine-Dalgarno sequences are located eight bases from the start codons (data not shown). Concerning the third feature, the putative overlapping tail assembly genes in PaMx73 could correspond to core ORFs 35 and 36. A slippery sequence, T TTT TTC, was located at codons 116 to 118 of core ORF 35, 43 codons upstream of its stop codon (TAA), which overlaps the putative initiation codon (AUG) of core ORF 36. This configuration would require a -1 frameshift to read both core ORFs as a single gene, which agrees with the majority of frameshift sequences analyzed for the tail assembly genes of many phages and leaves the -2 frameshift of the Mu genes as an exception.
The genomic characterization of two locally isolated Pseudomonas aeruginosa bacteriophages showed that they belong to a family of phages and putative prophages of clinical strains reported worldwide, whose prototype is D3112. The genomic nucleotide sequences of a dozen phages of this group were 50 to 90% identical, yet the order of their predicted protein-coding genes was highly conserved (syntenic). From a broader perspective, their genomic features resembled those of coliphage Mu; however, their tail is flexible rather than contractile like those of Mu and Mu-like phages of other bacterial species. The genomes compared here had long homologous regions interspersed with short heterologous blocks. The long conserved regions, which represent most of each genome, contained essential genes encoding replication and regulatory functions and structural proteins of the viral particles, whereas the genes located in the heterologous blocks were variable for each phage and presumably non-essential in the plating host used. This group of accessory genes, which seem to have been acquired by horizontal transfer, may confer a selective advantage on the phages. Remarkably, these included anti-CRISPR genes, which allow infection of certain hosts harbouring the CRISPR-Cas immunity system, and a gene encoding a putative decoration protein that could be involved in capsid stability. These observations extend the concept of the “pangenome”, the sum of core and accessory genomes, to another group of phages, not only in the way these genes are distributed along the genome but also in their functional and evolutionary implications for phage biology.
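The -1 frameshift site described above can be searched for programmatically. This Python sketch scans an upstream ORF for in-frame heptamers matching the commonly cited simplified consensus X XXY YYZ (three identical bases, then AAA or TTT, then any base except G), in which XXY and YYZ are codons of the ORF. That consensus and the synthetic test ORF are assumptions, and the scan ignores the downstream RNA structures on which many frameshift sites also depend.

```python
import re

# X XXY YYZ: three identical bases, then AAA or TTT, then A, C or T (not G).
# The lookahead makes overlapping candidates visible to finditer.
SLIPPERY = re.compile(r"(?=(([ACGT])\2\2(?:AAA|TTT)[ACT]))")


def slippery_sites(orf: str):
    """Return (codon_number, heptamer) for in-frame -1 slippery heptamers.

    orf: DNA sequence of the upstream ORF, starting at its start codon.
    The heptamer must start one base before a codon boundary so that XXY
    and YYZ are codons; codon_number is the 1-based codon holding the first X.
    """
    orf = orf.upper()
    hits = []
    for m in SLIPPERY.finditer(orf):
        start = m.start(1)
        if (start + 1) % 3 == 0:  # XXY begins at a codon boundary
            hits.append((start // 3 + 1, m.group(1)))
    return hits


# Synthetic ORF: ATG GCA GCT TTT TTC GCA TAA
print(slippery_sites("ATGGCAGCTTTTTTCGCATAA"))  # [(3, 'TTTTTTC')]
```

With this numbering, the T TTT TTC site reported at codons 116 to 118 of core ORF 35 would be returned as codon 116.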
Bacterial strains and bacteriophage isolation
The Pseudomonas aeruginosa clinical strains HIM5 and Ps33 were grown overnight with shaking at 37°C in Luria-Bertani (LB) medium. Bacteriophage PaMx73 was isolated from an environmental water sample, and bacteriophage H70 from the supernatant of a culture of the lysogenic strain HIM5 (Sepúlveda-Robles and Uc-Mass, personal communication). Strain Ps33 was used as the host to propagate PaMx73 and H70.
Bacteriophage propagation, purification and electron microscopy
Bacteriophage propagation was performed using the standard soft agar overlay method: 100 μl of phage stock (~10⁸ pfu) was mixed with 300 μl of P. aeruginosa liquid culture and 3 ml of LB top agar. The mixture was overlaid on a plate containing LB solid medium and incubated overnight at 37°C to produce confluent lysis of the host cells. The phage particles were recovered by scraping off the top agar layer and adding 5 ml of modified phage buffer (50 mM Tris-HCl pH 8, 10 mM MgSO4, 100 mM NaCl and 0.01% gelatin) to the surface of the plate. The agar-containing suspension was removed from the plate, stirred slowly for five hours at 4°C and then centrifuged at 9300 × g for ten minutes. The supernatant was treated with DNase I and RNase (1 μg/ml each, at 37°C for 30 min), and the phage particles were precipitated with 1.4 M NaCl and 16% (w/v) PEG 8000 at 4°C. The precipitated phage particles were concentrated by centrifugation at 8000 × g for 30 min and subsequently purified by CsCl gradient centrifugation as previously described. Dialyzed CsCl-purified phage stocks were used for electron microscopy. Phage suspension (10 μl) was deposited on a carbon-coated copper grid and incubated for 5 min at room temperature. The excess solution was removed with filter paper, and the grid was stained twice with 2% uranyl acetate (pH 7), for 30 s and 2 min, respectively. Grids were examined in a JEM-2000 transmission electron microscope at 80 kV. Virion dimensions were calculated from measurements of 15 viral particles.
Bacteriophage DNA extraction, sequencing and assembly
DNA was obtained by phenol-chloroform extraction from CsCl-purified phage suspensions as previously described. High-throughput DNA sequencing was carried out at the National Laboratory of Genomics for Biodiversity (CINVESTAV, Irapuato, Mexico) using the Roche/454 system for PaMx73 and SOLiD technology for H70. The 454 reads were preprocessed with the Newbler assembler using default parameters (http://www.roche-applied-science.com), whereas the SOLiD reads were preprocessed using the Applied Biosystems de novo assembly accessories. The phage genomes were assembled de novo using Velvet v1.1, and the assemblies were refined by manual inspection. In all cases, reads mapping at the genome ends were trimmed from the final sequence up to the last conserved nucleotide.
Genome annotation and sequence analysis
The coding sequences, or ORFs, in the PaMx73 and H70 genomes were predicted with heuristic hidden Markov models using GeneMark v1.1. ORF positions were further refined by identifying ribosome binding sites with rbs_finder.pl. GC content was determined and visualized with Artemis. BLASTp searches against the NCBI non-redundant protein database were carried out with the predicted ORF products to identify homologous sequences and improve the genome annotation. Conserved protein domains and protein families were searched with InterProScan and NCBI-CDD. The Artemis annotation tool was used for functional genome annotation, integrating BLAST, InterPro and CDD data. The non-coding regions of the phage genomes were screened for putative promoter sequences using BPROM (Softberry, Inc.) and Neural Network Promoter Prediction (NNPP). The promoter analysis tool hosted on the PRODORIC website was then used to scan the putative promoter sequences for transcription factor binding sites specific to P. aeruginosa.
The nucleotide genome sequences and annotations of PaMx73 and H70 were deposited in GenBank under accession numbers JQ067085 and KM233689, respectively.
SDS-PAGE and mass spectrometry analysis of the virion structural proteins
CsCl-purified phage particles were resuspended in Laemmli loading buffer and boiled for 5 min. The mixture was loaded onto a 10% SDS-PAGE gel, and the component proteins were resolved at 180 V for 1.5 h. Protein bands were visualized by staining with Coomassie Brilliant Blue R250, and a pre-stained SDS-PAGE broad-range protein standard (Bio-Rad, Hercules, CA, USA) was used to estimate the molecular weight of the observed proteins. The protein bands were carefully excised from the Coomassie-stained gel and destained for 12 h with a mixture of 50% methanol and 5% acetic acid. The destained slices were washed with deionized water, soaked for 10 min in 100 mM ammonium bicarbonate, dehydrated with 100% acetonitrile and vacuum-dried. Proteins were reduced with 10 mM DTT, and cysteines were S-alkylated with 100 mM iodoacetamide in 100 mM ammonium bicarbonate. In-gel digestion was performed by adding 600 ng of mass spectrometry-grade trypsin (Promega, Madison, WI, USA) in 50 mM ammonium bicarbonate, followed by overnight incubation at room temperature. Peptides were extracted twice with 50% acetonitrile and 5% formic acid for 30 min, and the extracts were vacuum-dried and resuspended in 20 μL of 0.1% formic acid. Tryptic peptides were analyzed using an integrated nano-LC-ESI-MS/MS system. Spectra were acquired in automated mode using data-dependent acquisition (DDA), and the DDA raw data files were processed and converted to peak lists (pkl format) using ProteinLynx Global Server v2.4 (PLGS) software (Waters Corporation). The mass spectra in the pkl files were compared with the putative protein sequences of PaMx73 using the PLGS, OMSSA and MASCOT (version 1.6b9, Matrix Science, London; available at http://www.matrixscience.com) search algorithms to identify the proteins.
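Search engines such as PLGS, OMSSA and MASCOT match spectra against peptides predicted by digesting the candidate protein sequences in silico. A minimal Python sketch of that digestion step, assuming the simplified rule that trypsin cleaves after K or R unless the next residue is P and allowing no missed cleavages (the search engines apply their own configurable rules); the protein sequence in the example is made up.

```python
import re

# Cleave after K or R unless the next residue is P (simplified trypsin rule).
TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def tryptic_peptides(protein: str, min_length: int = 1):
    """Fully cleaved tryptic peptides of `protein`, with no missed cleavages."""
    pieces = TRYPSIN_SITE.split(protein.upper())  # zero-width split (Python 3.7+)
    return [p for p in pieces if len(p) >= min_length]


print(tryptic_peptides("MKAGRPLLKWSTR"))  # ['MK', 'AGRPLLK', 'WSTR']
```

Very short peptides are rarely informative in MS/MS searches, which is why a minimum length is usually applied before matching.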
Computational modeling of PaMx73 virion proteins
Virion structural proteins that were identified by mass spectrometry but whose functions could not be inferred by sequence homology were selected for 3D structure and function prediction with the I-TASSER platform (http://zhanglab.ccmb.med.umich.edu/I-TASSER/). Their putative amino acid sequences were submitted to the I-TASSER web server following the previously described procedure. 3D models with a C-score of -3 or higher were considered reliable structures. A structural alignment was considered significant when it had a TM-score of at least 0.5 and a coverage of at least 0.6, and when the structural matches observed for each predicted model were functionally congruent.
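These acceptance criteria amount to a small set of threshold tests. The following Python sketch encodes them under stated assumptions: the StructuralMatch fields and all function names are hypothetical and do not reflect the I-TASSER output format, and "functional congruence" is read strictly as all significant matches sharing the same annotated function.

```python
from dataclasses import dataclass


@dataclass
class StructuralMatch:
    """One structural alignment of a predicted model to a known structure.

    Hypothetical fields for illustration; not an I-TASSER output format.
    """
    template_id: str
    tm_score: float   # TM-score of the structural alignment
    coverage: float   # fraction of the model covered by the alignment
    function: str     # annotated function of the matched structure


def model_is_reliable(c_score, min_c_score=-3.0):
    """A model is kept if its C-score is -3 or higher."""
    return c_score >= min_c_score


def significant_matches(matches, min_tm=0.5, min_cov=0.6):
    """Keep alignments with TM-score >= 0.5 and coverage >= 0.6."""
    return [m for m in matches if m.tm_score >= min_tm and m.coverage >= min_cov]


def inferred_function(c_score, matches):
    """Return a function only if the model is reliable and all significant
    matches agree on it (strict reading of 'functional congruence')."""
    if not model_is_reliable(c_score):
        return None
    functions = {m.function for m in significant_matches(matches)}
    return functions.pop() if len(functions) == 1 else None
```

Requiring a single shared function is the most conservative reading; a looser variant could accept a majority function instead.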
Comparative genome analysis
Genomes with sequence homology to PaMx73 and H70 were identified by BLASTn searches against the NCBI nucleotide collection. The homologous genomes were retrieved from NCBI under accession numbers [GenBank:FM209186] (LESB58), [GenBank:NC_005178] (D3112), [GenBank:NC_011613] (MP29), [GenBank:HM624080] (PA1/KOR), [GenBank:NC_008717] (DMS3), [GenBank:CM001020] (39016), [GenBank:NC_011611] (MP38), [GenBank:AEVV01000017] (138244), [GenBank:NC_009818] (MP22) and [GenBank:AP012280] (NCGM2). The full lengths of putative prophage sequences (see the section Sequence homology) were determined either by identifying the 5'-TGT triplet conserved at the termini of vegetative Mu-like phage genomes or by detecting the last prophage ORF that matched the other compared genomes. Nucleotide-level genome comparisons were performed with BLASTn to determine the extent and location of homologous regions. Percentages of nucleotide identity were calculated from alignments performed with MUMmer v3.0, and genome maps were constructed using in-house scripts. A neighbor-joining tree was constructed from a multiple genome alignment generated by progressive alignment in Mauve with default settings. Protein-level homology searches were carried out with an all-versus-all BLASTp strategy to identify the ORFs belonging to the core and accessory components of the phage genomes. Phage ORFs were considered homologous if they were syntenic among the compared genomes and their BLASTp matches had an e-value of at most 1e-05. BLASTp searches were also used to detect ORFs that had been overlooked in the annotations of the genomes retrieved from NCBI; these ORFs were then included when determining the core and accessory genomes.
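The all-versus-all BLASTp step can be sketched as single-linkage clustering of ORFs followed by a presence/absence test across genomes. The Python sketch below is not the authors' in-house script. It assumes BLAST+ tabular output with the default -outfmt 6 columns (e-value in column 11), uses hypothetical function names, and omits the synteny requirement, which would need ORF order information for each genome. A homologue group present in every genome is assigned to the core genome; all remaining groups, including singletons, are assigned to the accessory genome.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 1e-05  # maximum e-value for a BLASTp match to count


def parse_blast_tab(lines):
    """Yield (query, subject, evalue) from default -outfmt 6 lines.

    Columns: qseqid sseqid pident length mismatch gapopen
             qstart qend sstart send evalue bitscore
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        yield cols[0], cols[1], float(cols[10])


def homolog_groups(hits, orfs, cutoff=E_VALUE_CUTOFF):
    """Single-linkage clustering of ORFs with a union-find structure.

    Synteny is NOT checked here (simplification of the described method).
    """
    orfs = list(orfs)
    parent = {orf: orf for orf in orfs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for query, subject, evalue in hits:
        if query == subject or evalue > cutoff:
            continue
        if query in parent and subject in parent:
            parent[find(query)] = find(subject)

    groups = defaultdict(set)
    for orf in orfs:
        groups[find(orf)].add(orf)
    return list(groups.values())


def split_core_accessory(groups, genome_of):
    """Core: groups with an ORF in every genome; accessory: everything else."""
    genomes = set(genome_of.values())
    core, accessory = [], []
    for group in groups:
        present = {genome_of[orf] for orf in group}
        (core if present == genomes else accessory).append(group)
    return core, accessory
```

For example, with hypothetical ORFs from three genomes, a group linking one ORF from each genome lands in the core genome, while ORFs without qualifying matches in the other genomes end up in the accessory genome.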
The number of homologues deposited in GenBank for each ORF in the pangenome was determined by BLASTp searches. The core ORFs of PaMx73 were used as queries for the core genome, whereas the accessory ORFs of the different phage genomes were used as queries for the accessory genome (Additional file 6). Sequences detected in these searches were considered reliable homologues if they shared at least 75% of their total length, showed a similarity coverage of at least 75% of the total alignment, and had an e-value of at most 1e-03. The organism harboring each homologue was used to assign it to one of two main categories: Viruses or Bacteria. Matches to vector sequences were removed by inspection during the search process.
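The homologue criteria and the Viruses/Bacteria tally can be expressed as a filter followed by a count. In this Python sketch the Hit fields are illustrative, and which BLAST output columns populate them is left open. Three interpretations are assumptions: "shared at least 75% of their total length" is read as the alignment covering at least 75% of both query and subject; "similarity coverage" is read as the fraction of positive-scoring positions in the alignment; and vector matches, removed by inspection in the original procedure, are represented by a manually set flag. Each subject is counted once per query.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Hit:
    """One BLASTp hit for an ORF query (illustrative fields)."""
    query: str
    subject: str
    qlen: int          # query length (aa)
    slen: int          # subject length (aa)
    align_len: int     # alignment length (aa)
    positives: int     # positive-scoring (similar) positions in the alignment
    evalue: float
    category: str      # taxonomy of the subject organism, e.g. "Viruses"
    is_vector: bool = False  # assumed to be set manually after inspection


def is_reliable(hit, min_shared=0.75, min_similarity=0.75, max_evalue=1e-03):
    """Apply the 75% length, 75% similarity and e-value <= 1e-03 criteria."""
    if hit.is_vector or hit.align_len == 0:
        return False
    covers_query = hit.align_len / hit.qlen >= min_shared
    covers_subject = hit.align_len / hit.slen >= min_shared
    similar = hit.positives / hit.align_len >= min_similarity
    return covers_query and covers_subject and similar and hit.evalue <= max_evalue


def count_homologues(hits):
    """Count reliable homologues per query ORF, split into Viruses/Bacteria."""
    counts = defaultdict(Counter)
    seen = set()  # (query, subject) pairs already counted
    for hit in hits:
        key = (hit.query, hit.subject)
        if key in seen or not is_reliable(hit):
            continue
        if hit.category in ("Viruses", "Bacteria"):
            seen.add(key)
            counts[hit.query][hit.category] += 1
    return counts
```

Counting each subject once per query keeps multiple HSPs from the same database sequence from inflating the totals.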
This work was funded by grants from the Consejo Nacional de Ciencia y Tecnología (CONACYT number 166814) and the Instituto de Ciencia y Tecnología del Distrito Federal (ICyT number PICSA 11–107). AC was the recipient of a fellowship from CONACYT (number 233018). We thank Dr. Omar Sepúlveda for providing phage PaMx73 and M. Sc. Victor Flores for kindly supplying the scripts used for genomic analyses. We thank María de Lourdes Rojas-Morales for her technical assistance in our electron microscopy studies. We also appreciate the careful analysis of the manuscript and the valuable suggestions by Donald Court, Ry Young, Gabriel Moreno, Rosa Bermudez, Luis Kameyama and two anonymous reviewers.
- Medigue C, Moszer I: Annotation, comparison and databases for hundreds of bacterial genomes. Res Microbiol. 2007, 158 (10): 724-736. 10.1016/j.resmic.2007.09.009.
- Mathee K, Narasimhan G, Valdes C, Qiu X, Matewish JM, Koehrsen M, Rokas A, Yandava CN, Engels R, Zeng E, Olavarietta R, Doud M, Smith RS, Montgomery P, White JR, Godfrey PA, Kodira C, Birren B, Galagan JE, Lory S: Dynamics of Pseudomonas aeruginosa genome evolution. Proc Natl Acad Sci U S A. 2008, 105 (8): 3100-3105. 10.1073/pnas.0711982105.
- Hendrix RW, Lawrence JG, Hatfull GF, Casjens S: The origins and ongoing evolution of viruses. Trends Microbiol. 2000, 8 (11): 504-508. 10.1016/S0966-842X(00)01863-1.
- Comeau AM, Bertrand C, Letarov A, Tetart F, Krisch HM: Modular architecture of the T4 phage superfamily: a conserved core genome and a plastic periphery. Virology. 2007, 362 (2): 384-396. 10.1016/j.virol.2006.12.031.
- Labrie SJ, Frois-Moniz K, Osburne MS, Kelly L, Roggensack SE, Sullivan MB, Gearin G, Zeng Q, Fitzgerald M, Henn MR, Chisholm SW: Genomes of marine cyanopodoviruses reveal multiple origins of diversity. Environ Microbiol. 2013, 15 (5): 1356-1376. 10.1111/1462-2920.12053.
- Sullivan MB, Huang KH, Ignacio-Espinoza JC, Berlin AM, Kelly L, Weigele PR, DeFrancesco AS, Kern SE, Thompson LR, Young S, Yandava C, Fu R, Krastins B, Chase M, Sarracino D, Osburne MS, Henn MR, Chisholm SW: Genomic analysis of oceanic cyanobacterial myoviruses compared with T4-like myoviruses from diverse hosts and environments. Environ Microbiol. 2010, 12 (11): 3035-3056. 10.1111/j.1462-2920.2010.02280.x.
- Juhala RJ, Ford ME, Duda RL, Youlton A, Hatfull GF, Hendrix RW: Genomic sequences of bacteriophages HK97 and HK022: pervasive genetic mosaicism in the lambdoid bacteriophages. J Mol Biol. 2000, 299 (1): 27-51. 10.1006/jmbi.2000.3729.
- Hendrix RW: Bacteriophage genomics. Curr Opin Microbiol. 2003, 6 (5): 506-511. 10.1016/j.mib.2003.09.004.
- Hendrix RW: Bacteriophages: evolution of the majority. Theor Popul Biol. 2002, 61 (4): 471-480. 10.1006/tpbi.2002.1590.
- Wang PW, Chu L, Guttman DS: Complete sequence and evolutionary genomic analysis of the Pseudomonas aeruginosa transposable bacteriophage D3112. J Bacteriol. 2004, 186 (2): 400-410.
- Roncero C, Darzins A, Casadaban MJ: Pseudomonas aeruginosa transposable bacteriophages D3112 and B3 require pili and surface growth for adsorption. J Bacteriol. 1990, 172 (4): 1899-1904.
- Braid MD, Silhavy JL, Kitts CL, Cano RJ, Howe MM: Complete genomic sequence of bacteriophage B3, a Mu-like phage of Pseudomonas aeruginosa. J Bacteriol. 2004, 186 (19): 6560-6574. 10.1128/JB.186.19.6560-6574.2004.
- Morgan GJ, Hatfull GF, Casjens S, Hendrix RW: Bacteriophage Mu genome sequence: analysis and comparison with Mu-like prophages in Haemophilus, Neisseria and Deinococcus. J Mol Biol. 2002, 317 (3): 337-359. 10.1006/jmbi.2002.5437.
- Sepulveda-Robles O, Kameyama L, Guarneros G: High diversity and novel species of Pseudomonas aeruginosa bacteriophages. Appl Environ Microbiol. 2012, 78 (12): 4510-4515. 10.1128/AEM.00065-12.
- Lee DG, Urbach JM, Wu G, Liberati NT, Feinbaum RL, Miyata S, Diggins LT, He J, Saucier M, Deziel E, Friedman L, Li L, Grills G, Montgomery K, Kucherlapati R, Rahme LG, Ausubel FM: Genomic analysis reveals that Pseudomonas aeruginosa virulence is combinatorial. Genome Biol. 2006, 7 (10): R90. 10.1186/gb-2006-7-10-r90.
- Chung IY, Cho YH: Complete genome sequences of two Pseudomonas aeruginosa temperate phages, MP29 and MP42, which lack the phage-host CRISPR interaction. J Virol. 2012, 86 (15): 8336. 10.1128/JVI.01127-12.
- Kim S, Rahman M, Kim J: Complete genome sequence of Pseudomonas aeruginosa lytic bacteriophage PA1O which resembles temperate bacteriophage D3112. J Virol. 2012, 86 (6): 3400-3401. 10.1128/JVI.07191-11.
- Zegans ME, Wagner JC, Cady KC, Murphy DM, Hammond JH, O'Toole GA: Interaction between bacteriophage DMS3 and host CRISPR region inhibits group behaviors of Pseudomonas aeruginosa. J Bacteriol. 2009, 191 (1): 210-219. 10.1128/JB.00797-08.
- Heo YJ, Chung IY, Choi KB, Lau GW, Cho YH: Genome sequence comparison and superinfection between two related Pseudomonas aeruginosa phages, D3112 and MP22. Microbiology. 2007, 153 (Pt 9): 2885-2895.
- Winstanley C, Langille MG, Fothergill JL, Kukavica-Ibrulj I, Paradis-Bleau C, Sanschagrin F, Thomson NR, Winsor GL, Quail MA, Lennard N, Bignell A, Clarke L, Seeger K, Saunders D, Harris D, Parkhill J, Hancock RE, Brinkman FS, Levesque RC: Newly introduced genomic prophage islands are critical determinants of in vivo competitiveness in the Liverpool epidemic strain of Pseudomonas aeruginosa. Genome Res. 2009, 19 (1): 12-23.
- Stewart RM, Wiehlmann L, Ashelford KE, Preston SJ, Frimmersdorf E, Campbell BJ, Neal TJ, Hall N, Tuft S, Kaye SB, Winstanley C: Genetic characterization indicates that a specific subpopulation of Pseudomonas aeruginosa is associated with keratitis infections. J Clin Microbiol. 2011, 49 (3): 993-1003. 10.1128/JCM.02036-10.
- Soares-Castro P, Marques D, Demyanchuk S, Faustino A, Santos PM: Draft genome sequences of two Pseudomonas aeruginosa clinical isolates with different antibiotic susceptibilities. J Bacteriol. 2011, 193 (19): 5573. 10.1128/JB.05446-11.
- Miyoshi-Akiyama T, Kuwahara T, Tada T, Kitao T, Kirikae T: Complete genome sequence of highly multidrug-resistant Pseudomonas aeruginosa NCGM2.S1, a representative strain of a cluster endemic to Japan. J Bacteriol. 2011, 193 (24): 7010. 10.1128/JB.06312-11.
- Bidnenko EM, Akhverdian VZ, Krylov VN: [Transcriptional mapping and study of transcription regulation of the Pseudomonas aeruginosa phage-transposon D3112]. Genetika. 2000, 36 (12): 1645-1655.
- Fogg PC, Hynes AP, Digby E, Lang AS, Beatty JT: Characterization of a newly discovered Mu-like bacteriophage, RcapMu, in Rhodobacter capsulatus strain SB1003. Virology. 2011, 421 (2): 211-221. 10.1016/j.virol.2011.09.028.
- Summer EJ, Gonzalez CF, Carlisle T, Mebane LM, Cass AM, Savva CG, LiPuma J, Young R: Burkholderia cenocepacia phage BcepMu and a family of Mu-like phages encoding potential pathogenesis factors. J Mol Biol. 2004, 340 (1): 49-65. 10.1016/j.jmb.2004.04.053.
- Kahmann R, Kamp D: Nucleotide sequences of the attachment sites of bacteriophage Mu DNA. Nature. 1979, 280 (5719): 247-250. 10.1038/280247a0.
- van Drunen CM, Mientjes E, van Zuylen O, van de Putte P, Goosen N: Transposase A binding sites in the attachment sites of bacteriophage Mu that are essential for the activity of the enhancer and A binding sites that promote transposition towards Fpro-lac. Nucleic Acids Res. 1994, 22 (5): 773-779. 10.1093/nar/22.5.773.
- Lee I, Harshey RM: Importance of the conserved CA dinucleotide at Mu termini. J Mol Biol. 2001, 314 (3): 433-444. 10.1006/jmbi.2001.5177.
- Lee I, Harshey RM: The conserved CA/TG motif at Mu termini: T specifies stable transpososome assembly. J Mol Biol. 2003, 330 (2): 261-275. 10.1016/S0022-2836(03)00574-6.
- Tettelin H, Riley D, Cattuto C, Medini D: Comparative genomics: the bacterial pan-genome. Curr Opin Microbiol. 2008, 11 (5): 472-477. 10.1016/j.mib.2008.09.006.
- Cumby N, Davidson AR, Maxwell KL: The moron comes of age. Bacteriophage. 2012, 2 (4): 225-228.
- Daubin V, Lerat E, Perriere G: The source of laterally transferred genes in bacterial genomes. Genome Biol. 2003, 4 (9): R57. 10.1186/gb-2003-4-9-r57.
- Lavigne R, Ceyssens PJ, Robben J: Phage proteomics: applications of mass spectrometry. Methods Mol Biol. 2009, 502: 239-251. 10.1007/978-1-60327-565-1_14.
- Roy A, Kucukural A, Zhang Y: I-TASSER: a unified platform for automated protein structure and function prediction. Nat Protoc. 2010, 5 (4): 725-738. 10.1038/nprot.2010.5.
- Iwai H, Forrer P, Pluckthun A, Guntert P: NMR solution structure of the monomeric form of the bacteriophage lambda capsid stabilizing protein gpD. J Biomol NMR. 2005, 31 (4): 351-356. 10.1007/s10858-005-0945-7.
- Pell LG, Liu A, Edmonds L, Donaldson LW, Howell PL, Davidson AR: The X-ray crystal structure of the phage lambda tail terminator protein reveals the biologically relevant hexameric ring structure and demonstrates a conserved mechanism of tail termination among diverse long-tailed phages. J Mol Biol. 2009, 389 (5): 938-951. 10.1016/j.jmb.2009.04.072.
- Barbirz S, Muller JJ, Uetrecht C, Clark AJ, Heinemann U, Seckler R: Crystal structure of Escherichia coli phage HK620 tailspike: podoviral tailspike endoglycosidase modules are evolutionarily related. Mol Microbiol. 2008, 69 (2): 303-316. 10.1111/j.1365-2958.2008.06311.x.
- Muller JJ, Barbirz S, Heinle K, Freiberg A, Seckler R, Heinemann U: An intersubunit active site between supercoiled parallel beta helices in the trimeric tailspike endorhamnosidase of Shigella flexneri Phage Sf6. Structure. 2008, 16 (5): 766-775. 10.1016/j.str.2008.01.019.
- Xiang Y, Leiman PG, Li L, Grimes S, Anderson DL, Rossmann MG: Crystallographic insights into the autocatalytic assembly mechanism of a bacteriophage tail spike. Mol Cell. 2009, 34 (3): 375-386. 10.1016/j.molcel.2009.04.009.
- Rodriguez-Rubio L, Martinez B, Donovan DM, Rodriguez A, Garcia P: Bacteriophage virion-associated peptidoglycan hydrolases: potential new enzybiotics. Crit Rev Microbiol. 2013, 39 (4): 427-434. 10.3109/1040841X.2012.723675.
- Smith DL, Rooks DJ, Fogg PC, Darby AC, Thomson NR, McCarthy AJ, Allison HE: Comparative genomics of Shiga toxin encoding bacteriophages. BMC Genomics. 2012, 13: 311. 10.1186/1471-2164-13-311.
- Gilcrease EB, Winn-Stapley DA, Hewitt FC, Joss L, Casjens SR: Nucleotide sequence of the head assembly gene cluster of bacteriophage L and decoration protein characterization. J Bacteriol. 2005, 187 (6): 2050-2057. 10.1128/JB.187.6.2050-2057.2005.
- Wendt JL, Feiss M: A fragile lattice: replacing bacteriophage lambda’s head stability gene D with the shp gene of phage 21 generates the Mg2+-dependent virus, lambda shp. Virology. 2004, 326 (1): 41-46. 10.1016/j.virol.2004.05.024.
- Hernando-Perez M, Lambert S, Nakatani-Webster E, Catalano CE, De Pablo PJ: Cementing proteins provide extra mechanical stabilization to viral cages. Nat Commun. 2014, 5: 4520.
- Ishii T, Yanagida M: The two dispensable structural proteins (soc and hoc) of the T4 phage capsid; their purification and properties, isolation and characterization of the defective mutants, and their binding with the defective heads in vitro. J Mol Biol. 1977, 109 (4): 487-514. 10.1016/S0022-2836(77)80088-0.
- Steven AC, Greenstone HL, Booy FP, Black LW, Ross PD: Conformational changes of a viral capsid protein. Thermodynamic rationale for proteolytic regulation of bacteriophage T4 capsid expansion, co-operativity, and super-stabilization by soc binding. J Mol Biol. 1992, 228 (3): 870-884. 10.1016/0022-2836(92)90871-G.
- Martinsohn JT, Radman M, Petit MA: The lambda red proteins promote efficient recombination between diverged sequences: implications for bacteriophage genome mosaicism. PLoS Genet. 2008, 4 (5): e1000065. 10.1371/journal.pgen.1000065.
- Bobay LM, Touchon M, Rocha EP: Manipulating or superseding host recombination functions: a dilemma that shapes phage evolvability. PLoS Genet. 2013, 9 (9): e1003825. 10.1371/journal.pgen.1003825.
- Bondy-Denomy J, Pawluk A, Maxwell KL, Davidson AR: Bacteriophage genes that inactivate the CRISPR/Cas bacterial immune system. Nature. 2013, 493 (7432): 429-432.
- Pawluk A, Bondy-Denomy J, Cheung VH, Maxwell KL, Davidson AR: A new group of phage anti-CRISPR genes inhibits the type I-E CRISPR-Cas system of Pseudomonas aeruginosa. MBio. 2014, 5 (2): e00896-14.
- Tock MR, Dryden DT: The biology of restriction and anti-restriction. Curr Opin Microbiol. 2005, 8 (4): 466-472. 10.1016/j.mib.2005.06.003.
- Comeau AM, Krisch HM: War is peace–dispatches from the bacterial and phage killing fields. Curr Opin Microbiol. 2005, 8 (4): 488-494. 10.1016/j.mib.2005.06.004.
- Parma DH, Snyder M, Sobolevski S, Nawroz M, Brody E, Gold L: The Rex system of bacteriophage lambda: tolerance and altruistic cell death. Genes Dev. 1992, 6 (3): 497-510. 10.1101/gad.6.3.497.
- Xu J, Hendrix RW, Duda RL: Conserved translational frameshift in dsDNA bacteriophage tail assembly genes. Mol Cell. 2004, 16 (1): 11-21. 10.1016/j.molcel.2004.09.006.
- Sambrook J, Russell DW: Molecular Cloning: A Laboratory Manual. 3rd edition. 2001, Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory Press.
- Zerbino DR, Birney E: Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 2008, 18 (5): 821-829. 10.1101/gr.074492.107.
- Borodovsky M, Mills R, Besemer J, Lomsadze A: Prokaryotic gene prediction using GeneMark and GeneMark.hmm. Curr Protoc Bioinform. 2003, 4: 4-5.
- Suzek BE, Ermolaeva MD, Schreiber M, Salzberg SL: A probabilistic method for identifying start codons in bacterial genomes. Bioinformatics. 2001, 17 (12): 1123-1130. 10.1093/bioinformatics/17.12.1123.
- Carver T, Berriman M, Tivey A, Patel C, Bohme U, Barrell BG, Parkhill J, Rajandream MA: Artemis and ACT: viewing, annotating and comparing sequences stored in a relational database. Bioinformatics. 2008, 24 (23): 2672-2676. 10.1093/bioinformatics/btn529.
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25 (17): 3389-3402. 10.1093/nar/25.17.3389.
- Hunter S, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P, Das U, Daugherty L, Duquenne L, Finn RD, Gough J, Haft D, Hulo N, Kahn D, Kelly E, Laugraud A, Letunic I, Lonsdale D, Lopez R, Madera M, Maslen J, McAnulla C, McDowall J, Mistry J, Mitchell A, Mulder N, Natale D, Orengo C, Quinn AF, et al: InterPro: the integrative protein signature database. Nucleic Acids Res. 2009, 37 (Database issue): D211-D215.
- Marchler-Bauer A, Lu S, Anderson JB, Chitsaz F, Derbyshire MK, DeWeese-Scott C, Fong JH, Geer LY, Geer RC, Gonzales NR, Gwadz M, Hurwitz DI, Jackson JD, Ke Z, Lanczycki CJ, Lu F, Marchler GH, Mullokandov M, Omelchenko MV, Robertson CL, Song JS, Thanki N, Yamashita RA, Zhang D, Zhang N, Zheng C, Bryant SH: CDD: a conserved domain database for the functional annotation of proteins. Nucleic Acids Res. 2010, 39 (Database issue): D225-D229.
- Reese MG: Application of a time-delay neural network to promoter annotation in the Drosophila melanogaster genome. Comput Chem. 2001, 26 (1): 51-56. 10.1016/S0097-8485(01)00099-7.
- Munch R, Hiller K, Grote A, Scheer M, Klein J, Schobert M, Jahn D: Virtual Footprint and PRODORIC: an integrative framework for regulon prediction in prokaryotes. Bioinformatics. 2005, 21 (22): 4187-4189. 10.1093/bioinformatics/bti635.
- Geer LY, Markey SP, Kowalak JA, Wagner L, Xu M, Maynard DM, Yang X, Shi W, Bryant SH: Open mass spectrometry search algorithm. J Proteome Res. 2004, 3 (5): 958-964. 10.1021/pr0499491.
- Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C, Salzberg SL: Versatile and open software for comparing large genomes. Genome Biol. 2004, 5 (2): R12. 10.1186/gb-2004-5-2-r12.
- Darling AE, Mau B, Perna NT: progressiveMauve: multiple genome alignment with gene gain, loss and rearrangement. PLoS One. 2010, 5 (6): e11147. 10.1371/journal.pone.0011147.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.