Horizontal gene transfer in Histophilus somni and its role in the evolution of pathogenic strain 2336, as determined by comparative genomic analyses

Background Pneumonia and myocarditis are the most commonly reported diseases due to Histophilus somni, an opportunistic pathogen of the reproductive and respiratory tracts of cattle. Thus far only a few genes involved in metabolic and virulence functions have been identified and characterized in H. somni using traditional methods. Analyses of the genome sequences of several Pasteurellaceae species have provided insights into their biology and evolution. In view of the economic and ecological importance of H. somni, the genome sequence of pneumonia strain 2336 has been determined and compared to that of commensal strain 129Pt and other members of the Pasteurellaceae. Results The chromosome of strain 2336 (2,263,857 bp) contained 1,980 protein coding genes, whereas the chromosome of strain 129Pt (2,007,700 bp) contained only 1,792 protein coding genes. Although the chromosomes of the two strains differ in size, their average GC content, gene density (total number of genes predicted on the chromosome), and percentage of sequence (number of genes) that encodes proteins were similar. The chromosomes of these strains also contained a number of discrete prophage regions and genomic islands. One of the genomic islands in strain 2336 contained genes putatively involved in copper, zinc, and tetracycline resistance. Using the genome sequence data and comparative analyses with other members of the Pasteurellaceae, several H. somni genes that may encode proteins involved in virulence (e.g., filamentous haemagglutinins, adhesins, and polysaccharide biosynthesis/modification enzymes) were identified. The two strains contained a total of 17 ORFs that encode putative glycosyltransferases and some of these ORFs had characteristic simple sequence repeats within them.
Most of the genes/loci common to both the strains were located in different regions of the two chromosomes and occurred in opposite orientations, indicating genome rearrangement since their divergence from a common ancestor. Conclusions Since the genome of strain 129Pt was ~256,000 bp smaller than that of strain 2336, these genomes provide yet another paradigm for studying evolutionary gene loss and/or gain in regard to virulence repertoire and pathogenic ability. Analyses of the complete genome sequences revealed that bacteriophage- and transposon-mediated horizontal gene transfer had occurred at several loci in the chromosomes of strains 2336 and 129Pt. It appears that these mobile genetic elements have played a major role in creating genomic diversity and phenotypic variability among the two H. somni strains.


Background
Histophilus somni is a commensal or opportunistic pathogen of the reproductive and respiratory tracts of cattle. H. somni was initially identified as the etiologic agent of bovine thrombotic meningoencephalitis (TME), but also causes bovine shipping fever pneumonia, either independently or in association with Mannheimia haemolytica and Pasteurella multocida. Pneumonia and myocarditis are currently the most commonly reported diseases due to H. somni [1]. Infections resulting in abortion, infertility, arthritis, septicemia, and mastitis can also be caused by H. somni with varying degrees of frequency and severity in cattle [2]. Similar disease conditions associated with strains of H. somni have been described in sheep [2]. Relatively less pathogenic and/or avirulent variants of H. somni have also been isolated from cattle, most frequently from the mucosal surfaces of the genital tract [3].
Numerous in vitro and in vivo studies during the pregenomic era have shed light on the differences in virulence properties between H. somni pathogenic isolates from sick animals and serum-sensitive commensal isolates from the genital tract [4]. However, thus far only a few genes involved in lipooligosaccharide (LOS) biosynthesis and serum-resistance have been identified in H. somni using DNA/DNA and DNA/protein comparisons [5][6][7]. H. somni pneumonia strain 2336 and preputial strain 129Pt have been comprehensively characterized phenotypically and have been analyzed in several comparative studies [8][9][10]. However, a comprehensive understanding of the genetic basis that determines the phenotypic variability among H. somni strains is necessary to gain further insights into their pathogenicity.
Comparative (in silico) analysis of bacterial genomes is a powerful tool for the prediction and/or identification of biochemical differences, virulence attributes, pathogenic ability, and adaptive evolution among related species/strains [11]. Among the Pasteurellaceae, the genomes of one or more species pathogenic to humans or animals from the genera Actinobacillus, Haemophilus, Mannheimia, Pasteurella, and others have been sequenced. The availability of these genome sequences has facilitated whole genome comparisons that have provided insights into the physiology and pathogenic evolution of the corresponding bacteria [12,13].
Horizontal gene transfer (HGT: defined as the "acquisition of new genes either directly by transformation with naked DNA, transduction with phages, or the uptake of plasmids or chromosomal fragments by conjugation") plays a critical role in driving the evolution of pathogenic bacteria [14]. Reduction in genome size (referred to as reductive evolution) can occur as a result of continuous loss of genetic material due to gene deletion and/or mutation followed by DNA erosion [15]. Previous analyses by biochemical and pulsed field gel electrophoresis indicated that H. somni strains 2336 and 129Pt have common ancestry, but are non-clonal [16,17]. The following mechanisms may have engendered the genetic differences between these strains: (i) only one strain acquired genes by HGT while the other one did not; (ii) only one strain lost genes by deletion/mutation and underwent 'reductive evolution'; (iii) both strains independently and continuously acquired and lost genes, and the net loss or gain of genes is a determinant of their divergent evolution; (iv) gene convergence and the accumulation of synonymous and/or nonsynonymous nucleotide substitutions occurred across the genomes of the two strains.
The rationale for the present study was to determine, using whole genome sequencing and comparative genomics, the mechanisms responsible for genetic variability between the two strains. It was also envisaged that a comparative genomics and bioinformatics approach would facilitate identification of H. somni genes putatively involved in virulence and pathogenesis.

Methods
Genomic DNA (2 mg) from H. somni strain 2336 was purified using the Puregene protocol (Gentra Systems, Minneapolis, MN). The shotgun sequencing phase for this genome required ~35,200 sequence reads to reach 8-fold coverage [18]. Library construction, template preparation, sequencing, assembly, and data analyses were performed as described previously [19,20]. The sequence data assembled with Phred-Phrap were viewed using Consed to assess data quality and design closure experiments. Consed was also used to identify putative repeat regions so that the problems associated with assembling these regions could be resolved by way of combinatorial PCR experiments to isolate the repeat sequences on PCR amplicons. The location and exact sequence of each repeat was confirmed by isolating PCR fragments that contained each repeat in its entirety, followed by primer walking across the PCR product.
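As a rough consistency check on the coverage figure above, the usual relation coverage = (number of reads × mean read length) / genome size can be rearranged to give the mean read length implied by the reported numbers. The read length itself is not stated in the text, so the value computed here is only an inference under that simple relation; the function name is illustrative.

```python
def implied_mean_read_length(n_reads, genome_size_bp, coverage):
    """Mean read length implied by coverage = n_reads * read_length / genome_size."""
    if n_reads <= 0:
        raise ValueError("n_reads must be positive")
    return coverage * genome_size_bp / n_reads


GENOME_SIZE_2336 = 2_263_857  # chromosome size reported for strain 2336 (bp)
N_READS = 35_200              # approximate number of shotgun reads reported
COVERAGE = 8                  # reported fold coverage

read_len = implied_mean_read_length(N_READS, GENOME_SIZE_2336, COVERAGE)
print(f"Implied mean read length: {read_len:.1f} bp")
```

A value of roughly 500 bp per read would be in the range expected for capillary (Sanger) shotgun reads, although the sequencing chemistry is not specified in this section.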
For initial gap closure, Single Primer Amplification of Contig Ends (SPACE), which is similar to the single-primer PCR procedure for rapid identification of transposon insertion sites, was used [21]. Additional primers were designed, as necessary, to verify the correct assembly of contigs by confirmatory PCR. Simultaneously, a fosmid library was constructed for scaffolding purposes using the vector pCC1fos (Epicentre Biotechnologies, Madison, WI) with 40 kb inserts. Sequencing of the fosmids was necessary to close gaps across sequences that occur more than once in the genome, such as those of insertion sequences and ribosomal genes. Gaps that were not closed by SPACE-walking were closed using the sequence of H. somni strain 129Pt as a scaffold and the reads were assembled with parallel phrap (High Performance Software, LLC). Gap closure at this stage was also facilitated by AUTOFINISH [22]. Possible mis-assemblies were corrected with Dupfinisher [23] or transposon bombing of bridging clones [24] using an EZ::TN™ kit (Epicentre). The National Human Genome Research Institute standards for the Human Genome Project (1 error per 10,000 assembled bases) were followed for H. somni to obtain sufficient quality genomic data.
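The finishing standard quoted above (1 error per 10,000 bases) can be related to the Phred quality scale, in which Q = -10 log10(P) for an error probability P. The short sketch below performs that conversion; it illustrates the scale only and is not a description of how quality was actually assessed for this assembly.

```python
import math


def phred_from_error_rate(p_error):
    """Phred-scaled quality for a per-base error probability."""
    if not 0 < p_error <= 1:
        raise ValueError("p_error must be in (0, 1]")
    return -10 * math.log10(p_error)


def error_rate_from_phred(q):
    """Per-base error probability for a Phred-scaled quality."""
    return 10 ** (-q / 10)


FINISHING_ERROR_RATE = 1 / 10_000  # standard quoted in the text
finishing_q = phred_from_error_rate(FINISHING_ERROR_RATE)
print(f"1 error per 10,000 bases corresponds to about Q{finishing_q:.0f}")
```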
Final automated annotation of the genome of strain 2336 was performed at the Oak Ridge National Laboratory using methods similar to those used to annotate the strain 129Pt genome [13]. Briefly, protein domains were identified by comparing each predicted protein against a Hidden Markov Model protein family database [25]. To estimate the number of proteins specific to each strain, the Smith-Waterman algorithm [26] was used to compare all predicted proteins from strain 129Pt against those from strain 2336 and vice-versa. Proteins deemed to be specific to each strain were compared against the NCBI non-redundant protein database to determine whether they were hypothetical or conserved hypothetical. The translated ORF was named a hypothetical protein if there was less than 25% identity or an aligned region was less than 25% of the predicted protein length. Prediction of the number of subsystems and pairwise BLAST comparisons of protein sets within strains 2336 and 129Pt were carried out with the Rapid Annotation using Subsystems Technology (RAST), which is a fully automated, prokaryotic genome annotation service [27]. This platform identifies tRNA and rRNA genes using the tools tRNAscan-SE and "search_for_rnas", respectively [27].
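The naming rule for hypothetical proteins described above can be written as a simple predicate. The sketch below assumes that the best database hit for each translated ORF has already been reduced to a percent identity and an aligned length; the function and parameter names are illustrative and do not come from the annotation pipeline itself.

```python
def is_hypothetical(percent_identity, aligned_length, protein_length,
                    min_identity=25.0, min_aligned_fraction=0.25):
    """Apply the rule from the text: a translated ORF is named a hypothetical
    protein if identity is below 25% or the aligned region covers less than
    25% of the predicted protein length."""
    if protein_length <= 0:
        raise ValueError("protein_length must be positive")
    aligned_fraction = aligned_length / protein_length
    return (percent_identity < min_identity
            or aligned_fraction < min_aligned_fraction)


# Toy values (not taken from the genomes):
print(is_hypothetical(62.0, 300, 320))  # well-supported hit
print(is_hypothetical(62.0, 40, 320))   # aligned region too short
print(is_hypothetical(21.5, 300, 320))  # identity too low
```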
Multiple genome comparisons were performed using the 'progressive alignment' option available in the program MAUVE version 2.3.0 [28,29]. Default scoring and parameters were used for generating the alignment. A synteny plot was generated using the program NUCmer, which creates a dot plot based on the number of identical alignments between two genomes [30]. Prophage regions (PRs) were identified using Prophinder (http://aclame.ulb.ac.be/Tools/Prophinder/), an algorithm that combines similarity searches, statistical detection of phage-gene enriched regions, and genomic context for prophage prediction [31]. Identification and annotation of genomic islets/islands (GIs) other than prophages were performed based on sequence composition bias and comparative genomic analysis [32]. Briefly, differences in the GC content, the occurrence of 'cornerstone genes' (e.g., transposases), and/or a continuous stretch of genes encoding hypothetical proteins were used as reference points for detection of GIs. Insertion sequences (ISs) were identified by whole genome BLASTX analysis of strains 2336 and 129Pt using the IS finder (http://www-is.biotoul.fr/). Gene acquisition and loss among the two strains was determined by comparing gene order, orientation of genes (forward/reverse), GC content of genes (percent above or below whole genome average), features of intergenic regions (e.g., remnants of IS elements, integration sites), and the similarity of proteins encoded by genes at a locus of interest (> 90% identity at the predicted protein level). Putative horizontally transferred genes (HTGs; defined as genes whose greatest homology, based on BLASTP scores, is to genes from a more distant phylogenetic group than to genes from the same or a close phylogenetic group as the query genome) were compiled using the integrated microbial genomes (IMG) system (http://img.jgi.doe.gov).
This system uses not only the best hit (i.e., the homolog with the highest bitscore), but also all the matches that have bitscores equal to or greater than that of the best hit to identify putative HTGs [33]. DNA and protein sequences were aligned using the ClustalW (http://www.ebi.ac.uk/Tools/clustalw2/index.html) and BOXSHADE (http://www.ch.embnet.org/software/BOX_form.html) programs as described previously [34].
Although the chromosomes of the two H. somni strains differ in size by ~256,000 bp, their average GC content, gene density (total number of genes predicted on the chromosome), and percentage of the sequence (number of genes) that encodes proteins were similar (Table 1). H. somni strain 2336 did not contain plasmids, but H. somni strain 129Pt contained a single plasmid [34]. Some of the other relevant features of the chromosomes of H. somni strains 2336 and 129Pt are shown in Table 1.

Properties of the chromosomes
Whole genome alignment using MAUVE showed the presence of extensive blocks of homologous regions, which is typical of closely related genomes (Figure 1). To further dissect their co-linearity, a BLASTN comparison of the two genomes was performed at the Joint Genome Institute web site. This analysis indicated that there were 400 homologous regions (219 plus/plus and 181 plus/minus, sequence range > 1,000 bp, but < 30,000 bp) and the average nucleic acid identity among these homologous regions was 98.5% (the identity range was 94%-99%, E-value = 0). The plus/plus homologous regions refer to those present on the forward strand in both strains (i.e., those that have the same orientation in both the chromosomes), and the plus/minus homologous regions refer to those present on the reverse strand in strain 129Pt in relation to those present on the forward strand in strain 2336 (i.e., those that have the opposite orientation, indicative of chromosome inversion). Several large gaps, translocations, and inversions became visible in the alignment generated by NUCmer (Figure 2). Detailed sequence examination revealed that some of these gaps and/or inversions were associated with integrative and conjugative elements.
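The plus/plus versus plus/minus tally described above can be sketched as follows. The example assumes that each homologous region is available as query and subject coordinates in the style of BLAST tabular output, where a subject start greater than the subject end indicates a match on the reverse strand; the input tuples are toy values, not data from the two genomes.

```python
from collections import Counter


def classify_regions(regions, min_len=1_000, max_len=30_000):
    """Count plus/plus and plus/minus regions within the length window.

    Each region is (q_start, q_end, s_start, s_end); a start greater than
    the end marks a reverse-strand match.
    """
    counts = Counter()
    for q_start, q_end, s_start, s_end in regions:
        length = abs(q_end - q_start) + 1
        if not (min_len < length < max_len):
            continue  # outside the > 1,000 bp and < 30,000 bp window
        same_strand = (q_end >= q_start) == (s_end >= s_start)
        counts["plus/plus" if same_strand else "plus/minus"] += 1
    return counts


toy_regions = [
    (1, 5_000, 101, 5_100),         # same orientation
    (6_000, 9_000, 20_000, 17_000),  # inverted in the subject genome
    (1, 500, 1, 500),               # too short, ignored
]
toy_counts = classify_regions(toy_regions)
print(dict(toy_counts))
```

With the counts reported in the text, 219 plus/plus and 181 plus/minus regions sum to the 400 homologous regions, so roughly 45% of the regions are inverted relative to strain 2336.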

Comparison of prophage regions and genomic islands
Prophinder predicted four PRs in strain 2336, but only one in strain 129Pt (Table 2). PR I of strain 2336 was the shortest and had homology to 3 segments totaling ~4,000 bp of bacteriophage HP1 of H. influenzae (66-68% nucleotide identity, E-value = 0 to 4e-07). The GC content of all four PRs from strain 2336 (PR I, 40.23%; PR II, 43.95%; PR III, 39.63%; PR IV, 39.85%) was higher than that of the overall genome (37.38% GC). PR II of strain 2336, which had the highest GC content (43.95%) among the four PRs, also contained a 5,679 bp segment (774,075 bp to 779,754 bp) with homology to a region within the genome of H. influenzae strain 10810 (72% nucleotide identity, E-value = 0). PR III of strain 2336 was the longest and contained 38 ORFs of unknown function (annotated as encoding hypothetical proteins). The genome of H. parasuis strain SH0165 contained several short sequences that had homology to this region (e.g., 5 segments totaling ~2,700 bp in the 1,136,975 bp to 1,143,731 bp region, 69-76% nucleotide identity, E-value = 5e-152 to 7e-11). PR IV of strain 2336 was the most conspicuous and contained at least 10 ORFs encoding putative proteins related to bacteriophage structural components. PR IV contained homology to a region of the genome of H. influenzae strain 86-028NP (9 segments totaling ~4,700 bp in the 1,707,498 bp to 1,725,687 bp region, 67-84% nucleotide identity, E-value = 3e-126 to 1e-05). Furthermore, the P2 family lysogenic bacteriophage phi-MhaA1-PHL101 from M. haemolytica serotype A1 contained some of the ORFs found within PR IV of strain 2336. PR I of strain 129Pt was 6,361 bp shorter than PR III of strain 2336, but the two PRs had a similar GC content. A BLASTN analysis indicated that a sequence of ~20,000 bp was conserved between PR I of strain 129Pt and PR III of strain 2336 (96-99% nucleotide identity, E-value = 0). [...] Figure 3B).
Bacteriophages within the Prophinder database containing some of the predicted ORFs from PRs I-IV of strain 2336 and PR I of strain 129Pt are shown in Figure 4. Manual curation indicated that strains 2336 and 129Pt contained 3 and 6 GIs, comprising a total of 72,709 bp and 36,947 bp, respectively (Table 2). GI I of strain 2336 contained 17 ORFs of unknown function and a similar sequence was not found in strain 129Pt or other members of the Pasteurellaceae. GI II of strain 2336 was the longest, had a higher GC content (~43%) than the overall genome, and contained 16 ORFs of unknown function. The genome of P. multocida strain Pm70 contained several short sequences that had homology to this region (e.g., [...]). GI I of strain 129Pt contained an ORF encoding a resolvase/integrase-like protein [GenBank:HS_0445] and 6 ORFs of unknown function. A similar sequence was not found in strain 2336, but H. parasuis strain SH0165 contained several short sequences with homology to this region (e.g., 3 segments totaling 2,040 bp, 68-79% nucleotide identity, E-values = 0 to 1e-06). GI II of strain 129Pt contained an ORF encoding a putative phage DNA primase-like protein [GenBank:HS_0533] and 9 ORFs of unknown function. A similar sequence was not found in strain 2336 or other members of the Pasteurellaceae. GI III of strain 129Pt contained ORFs encoding a putative phage terminase protein [GenBank:HS_1334], a prophage regulatory element [GenBank:HS_1335], an integrase [GenBank:HS_1337], and 4 ORFs of unknown function. A similar sequence was not found in strain 2336, but Aggregatibacter aphrophilus strain NJ8700 contained several short sequences that had homology to this region (e.g., 2 segments totaling 1,334 bp in the 1,818,930 bp to 1,821,098 bp region, 74-80% nucleotide identity, E-values = 0 to 4e-35).
All predicted ORFs from GIs IV and V of strain 129Pt were of unknown function and strain 2336 contained several short sequences that had homology to these regions (e.g., 7 segments totaling 2,184 bp in the 1,965,862 bp to 1,990,890 bp region of GI II, 67-92% nucleotide identity, E-values = 9e-42 to 1e-10). The chromosome of strain 2336 had no regions of homology to GI VI of strain 129Pt, but H. parasuis strain SH0165 contained several short sequences that had homology to this region (e.g., 3 segments totaling 1,573 bp, 68% nucleotide identity, E-values = 2e-51 to 2e-13).
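The GC comparison used in this section for prophage regions and genomic islands amounts to computing the GC percentage of a region and its deviation from the genome-wide average. The sketch below shows both steps using the GC values reported for the four PRs of strain 2336; the helper name is illustrative and the example sequence is a toy string.

```python
def gc_percent(seq):
    """GC content (%) of a nucleotide sequence, ignoring non-ACGT symbols."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return 100.0 * gc / (gc + at)


GENOME_GC_2336 = 37.38  # overall GC content reported for strain 2336 (%)
PR_GC_2336 = {"PR I": 40.23, "PR II": 43.95, "PR III": 39.63, "PR IV": 39.85}

# Deviation of each prophage region from the genome-wide average.
gc_excess = {name: round(gc - GENOME_GC_2336, 2)
             for name, gc in PR_GC_2336.items()}

for name, delta in gc_excess.items():
    print(f"{name}: {delta:+.2f} percentage points relative to the genome")
print(f"Toy sequence GC: {gc_percent('ATGCGC'):.1f}%")
```

All four deviations are positive, with PR II showing the largest excess, which is consistent with the statement that every PR in strain 2336 is more GC-rich than the chromosome as a whole.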

Comparison of insertion sequences
Insertion sequence finder indicated that strains 2336 and 129Pt contained several IS elements distributed throughout the chromosomes. Not surprisingly, some of these IS elements were found within the PRs and/or GIs described above. Insertion sequence 1016 (IS1016), consisting of a transposase [217 amino acids (aa)] flanked by 18-29 bp of terminal inverted repeats, is a member of the IS1595 superfamily [35,36]. [...] necessary for the synthesis of phosphorylcholine [37], appeared to be interrupted due to an insertion-excision event involving IS1016, since two partial homologs of the transposase gene occurred upstream of licA.

BLAST comparison of protein sets
Strains 2336 and 129Pt contained 1,550 predicted protein coding genes in common (bidirectional best hits, at least 90% identity at the predicted protein level). In strain 2336, 440 ORFs could not be assigned a function based on BLAST analysis and were therefore annotated as encoding hypothetical or conserved hypothetical proteins. In strain 129Pt, 429 ORFs were annotated as encoding hypothetical or conserved hypothetical proteins. Among hypothetical proteins that were common to both strains, 30 did not have homologs outside the genus. Pairwise BLAST comparisons indicated that strain 2336 contained 311 putative protein coding genes with no homologs in strain 129Pt (Additional file 1). Within this subset, proteins encoded by 302 genes had at least 51 aa and 9 genes ([GenBank:HSM_0528], [GenBank:HSM_0530], [...]
In both strains, a vast majority of putative HTGs appeared to have had their origins among members of gammaproteobacteria (mostly within Pasteurellales and Enterobacteriales). Putative HTGs with possible origins among members of betaproteobacteria (27 in strain 2336, 11 in strain 129Pt) and alphaproteobacteria (1 in strain 2336, 6 in strain 129Pt) were also identified. Among HTGs identified were those encoding proteins putatively involved in virulence (e.g., filamentous hemagglutinins, proteases, and antibiotic resistance regulators). A complete list of these genes is available at the 'Organism Details' sections for strains 2336 and 129Pt within IMG.
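The bidirectional best hit criterion mentioned in this section can be illustrated with a minimal sketch. It assumes that pairwise protein comparisons in each direction are available as (query, subject, percent identity, bitscore) tuples; the identifiers in the example are invented placeholders rather than real locus tags, and the code is not the RAST implementation.

```python
def best_hits(hits):
    """Keep the highest-bitscore subject for each query."""
    top = {}
    for query, subject, identity, bitscore in hits:
        if query not in top or bitscore > top[query][2]:
            top[query] = (subject, identity, bitscore)
    return top


def bidirectional_best_hits(hits_a_vs_b, hits_b_vs_a, min_identity=90.0):
    """Pairs (a, b) that are each other's best hit at >= min_identity."""
    best_ab = best_hits(hits_a_vs_b)
    best_ba = best_hits(hits_b_vs_a)
    pairs = []
    for a, (b, identity_ab, _) in best_ab.items():
        back = best_ba.get(b)
        if back is None or back[0] != a:
            continue  # best hit is not reciprocal
        if min(identity_ab, back[1]) >= min_identity:
            pairs.append((a, b))
    return sorted(pairs)


# Toy comparison: A1/B1 are reciprocal and similar; A2's best hit B2 prefers A3;
# A3/B2 are reciprocal but fall below the 90% identity threshold.
a_vs_b = [("A1", "B1", 98.0, 500.0), ("A1", "B2", 40.0, 80.0),
          ("A2", "B2", 95.0, 300.0), ("A3", "B2", 85.0, 400.0)]
b_vs_a = [("B1", "A1", 98.0, 500.0), ("B2", "A3", 85.0, 400.0),
          ("B2", "A2", 95.0, 300.0)]
shared = bidirectional_best_hits(a_vs_b, b_vs_a)
print(shared)
```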
Other strain-specific genes identified encoded DNA methylases (7 in strain 2336, none in strain 129Pt), transposases (8 in strain 2336, 1 in strain 129Pt), ABC transporters (5 in strain 2336, none in strain 129Pt), ATPases (4 in strain 2336, none in strain 129Pt), transcriptional regulators (14 in strain 2336, 3 in strain 129Pt), kinases (2 in strain 2336, 1 in strain 129Pt), and several proteins related to bacteriophage functions (e.g., of the 10 integrase/resolvase-related genes found in strain 129Pt, 6 have no homologs in strain 2336 and of the 7 integrase/resolvase-related genes found in strain 2336, 2 have no homologs in strain 129Pt). Excluding intergenic regions, the total length of sequence associated with strain-specific genes was 254,052 bp (~11% of the genome) in strain 2336 and 98,016 bp (~5% of the genome) in strain 129Pt.
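The percentages quoted above follow directly from the chromosome sizes given in the abstract. The short calculation below reproduces them and also the size difference between the two chromosomes; it uses only numbers stated in the text.

```python
CHROMOSOME_BP = {"2336": 2_263_857, "129Pt": 2_007_700}
STRAIN_SPECIFIC_BP = {"2336": 254_052, "129Pt": 98_016}

# Difference in chromosome size between the two strains.
size_difference = CHROMOSOME_BP["2336"] - CHROMOSOME_BP["129Pt"]

# Share of each chromosome occupied by strain-specific genes.
specific_pct = {
    strain: 100.0 * STRAIN_SPECIFIC_BP[strain] / CHROMOSOME_BP[strain]
    for strain in CHROMOSOME_BP
}

print(f"Size difference: {size_difference:,} bp")
for strain, pct in specific_pct.items():
    print(f"Strain {strain}: {pct:.1f}% of the chromosome is strain-specific")
```

The results agree with the rounded values in the text: a difference of about 256,000 bp, and strain-specific gene content of about 11% and 5% for strains 2336 and 129Pt, respectively.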

Identification of genes encoding polysaccharide biosynthesis/modification enzymes
A search of the NCBI non-redundant protein database using the BLASTP algorithm identified 17 ORFs that encode putative glycosyltransferases (GTs) in the genomes of strains 2336 and 129Pt. Seven of these ORFs were common to both genomes (at least 96% identity at the predicted protein level), 8 were found only in strain 2336, and 2 were found only in strain 129Pt. Among the ORFs encoding putative GTs common to both strains, 5 contained simple sequence repeats (SSRs), and 4 of the 8 ORFs encoding GTs found in strain 2336 contained SSRs. A list of putative GTs and their SSRs identified in both strains is shown in Table 3. The lipooligosaccharide biosynthesis (lob) gene cluster consisting of lob1 and lob2ABCD ORFs encoding glycosyltransferases involved in attaching the outer core glycoses of the LOS was previously identified in strain 738, which is an LOS phase variant of strain 2336 [5,7]. Strain 129Pt encoded full-length homologs of lob1 and lob2D, but only the 5' ends of lob2A and lob2C and lacked lob2B [13]. Strain 2336 contained full-length homologs of lob1 and lob2ABC, but a truncated homolog of lob2D (Table 3). The variations in the lob loci in the genomes of strains 2336 and 129Pt correlate with the differences in the structures of the LOS of these strains, as determined by NMR spectroscopy and mass spectrometry [16,38]. In addition, both strains contained ORFs encoding a phosphoheptose isomerase ([...] HS_0333], glmU). These five genes/enzymes were predicted to be involved in LOS core biosynthesis in strains 2336 and 129Pt.
Strain 2336 also contained a locus that encodes proteins putatively involved in exopolysaccharide (EPS) and/or LOS biosynthesis ([...] 0707], rmlB). These three genes/enzymes were predicted to be involved in polysaccharide biosynthesis/modification.
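Simple sequence repeats of the kind noted in the GT ORFs are tandem copies of a short motif, and they can be located with a back-referencing regular expression. The sketch below is one generic way to scan a nucleotide sequence for such repeats; the unit-length and repeat-count thresholds are assumptions chosen for illustration, the example sequence is invented, and this is not the procedure used to compile Table 3.

```python
import re


def find_ssrs(seq, max_unit=6, min_repeats=4):
    """Return (start, unit, n_repeats) for tandem repeats of short motifs.

    start is a 0-based offset. Units that are themselves made of a shorter
    repeated motif (e.g. 'AAAA' for unit length 4) are skipped so that each
    repeat is reported under its smallest unit.
    """
    seq = seq.upper()
    hits = []
    for k in range(1, max_unit + 1):
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            if any(unit == unit[:d] * (k // d)
                   for d in range(1, k) if k % d == 0):
                continue
            hits.append((match.start(), unit, len(match.group(0)) // k))
    return sorted(hits)


# Invented example: five tandem copies of a tetranucleotide inside an ORF start.
example = "ATG" + "CAAT" * 5 + "GGC"
ssrs = find_ssrs(example)
print(ssrs)
```

Because slipped-strand mispairing can add or remove whole repeat units, a change in the copy number of a repeat whose unit length is not a multiple of three shifts the reading frame, which is the general mechanism by which SSRs within an ORF can switch expression on or off.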

Acquisition and loss of genes encoding filamentous hemagglutinins
The chromosome of H. somni strain 2336 contained four loci that have twelve putative genes encoding proteins homologous to FhaB, and four putative genes encoding proteins homologous to FhaC. Locus [...], which appeared to be associated with a transposon. Strain 129Pt had a locus containing genes that flanked locus II of strain 2336, but did not contain the FhaB and FhaC homologs (Figure 5, locus II).
Locus III of strain 2336 was 14,066 bp (GC content of 37%) and contained genes encoding FhaB ([GenBank:HSM_1489]) and FhaC ([GenBank:HSM_1490]). The fhaB (12,288 bp) of this locus was the second largest gene in the genome and the largest among the 12 homologs. This gene encoded a putative protein homologous to the high molecular weight immunoglobulin-binding protein of H. somni and the large supernatant proteins (Lsp1 and Lsp2) of H. ducreyi that have been previously described [39]. No transposon or phage regions were apparent in this locus. Strain 129Pt had a locus containing genes that flank locus III of strain 2336, but did not contain the FhaB and FhaC homologs (Figure 5). The twelve putative FhaB homologs found in the four loci of strain 2336 varied in size, with the smallest and largest being 83 aa and 4,095 aa, respectively. Phylogenetic comparison indicated that the four FhaB homologs within locus I were most closely related to each other as were the six FhaB homologs within locus IV (data not shown). The FhaC homologs of loci I (581 aa) and III (586 aa) were more closely related to each other than to FhaC from locus II (450 aa). Multiple sequence alignment of N-terminal fragments of FhaB homologs from the four loci of strain 2336 with those of Bordetella pertussis FHA and Proteus mirabilis HpmA showed that they contain several common features (Figure 6). Of the many residues that are shown to be involved in B. pertussis FHA secretion, 6 (four asparagine and one each of serine and glutamic acid) were conserved in all six homologs and 2 (an asparagine and a methionine) were conserved in five of the homologs (Figure 6). However, the NPNG (Figure 6, S2) and CXXC (Figure 6, S3) motifs that may play a role in stabilization of the helical structure were conserved in only three homologs.
Genomic comparison of members of the [GenBank:CLSK923564] cluster revealed that in 3 cases (Arthrobacter aurescens TC1, R. eutropha JMP134, and Xanthomonas campestris pv. vesicatoria str. 85-10), the ORF encoding subtilisin was found on plasmids. In the case of Delftia acidovorans SPH-1, Burkholderia ambifaria AMMD, Polaromonas naphthalenivorans CJ2, Chelativorans sp. BNC1, and E. coli O127:H6 str. E2348/69, the ORF encoding subtilisin was found within a prophage region. Furthermore, in the case of Photorhabdus luminescens subsp. laumondii TTO1, P. syringae pv. phaseolicola 1448A, P. fluorescens Pf0-1, Verminephrobacter eiseniae EF01-2, and Xanthomonas oryzae pv. oryzae PXO99A, the ORF encoding subtilisin appeared to be associated with genes encoding transposases. However, in Chromohalobacter salexigens DSM 3043 and Anaeromyxobacter sp. K, the ORF encoding subtilisin was not associated with prophage or transposase sequences. Interestingly, in 15 host species the subtilisin ORF formed an operon with an ORF encoding homologous ATPases of the AAA family. In the case of A. aurescens TC1, a transposon insertion appears to have disrupted the AAA ATPase-subtilisin operon. The two ORFs have a 4 bp overlap in 8 species and appear to be co-transcribed (Figure 7, left side). Furthermore, comparison of subtilisin sequences from these 8 species indicated that motifs containing the catalytic triad (Asp-His-Ser) were conserved (Figure 7, right side).
Figure 5 legend (fragment), Locus II: 1. DNA-binding transcriptional activator GutM; 2. Sorbitol-6-phosphate dehydrogenase; 3. PTS system glucitol/sorbitol-specific IIA component; 4. PTS system, glucitol/sorbitol-specific, IIBC subunit; 5. PTS system, glucitol/sorbitol-specific, IIC subunit; A. 153 bp sequence found only in strains 129Pt and 2336; B. 500 bp sequence with homology to type III restriction-modification genes; C. 173 bp sequence found only in strain 129Pt; 6. fhaC; 7. fhaB; T. Transposon-related sequence; 8. UbiH/UbiF/VisC/COQ6 family ubiquinone biosynthesis hydroxylase; 9. Polyprenyl-6-methoxyphenol 4-hydroxylase; 10. [...]

Comparison of genes encoding transferrin-binding proteins
A comparison of the genomes of H. somni strains 2336 and 129Pt revealed that both strains contained genes

Discussion
Genetic events such as deletions, duplications, insertions, and inversions are relatively common in bacterial chromosomes as a result of bacteriophage infection, integration and excision of plasmids, transpositions, and/or replication-mediated translocations [40,41]. In addition, different prophages embedded within a single chromosome can contain similar genes encoding integration and structural functions, and it is not uncommon for these genes to undergo homologous recombination. One of the consequences of such homologous recombination is the rearrangement of the host chromosome [42]. These events are known to be the precursors of evolution and can bring about a significant change in the number, linear order, and orientation of genes on the circular chromosomes of different strains/species of closely related bacteria [43].
The presence of PRs in the chromosomes of strains 2336 and 129Pt was a notable feature since the number and diversity of genes associated with these PRs far exceeded those described in H. influenzae strains Rd KW20 and 86-028NP [44,45]. The difference in the size of the chromosomes of strains 2336 and 129Pt was partly due to PRs and associated genes. Similar observations have been made in other bacteria wherein prophage-associated sequences constitute a large portion of strain-specific DNA [42]. Although one of the functions of restriction-modification (RM) systems is to afford protection against bacteriophage attack (the "cellular defense hypothesis"), it is interesting to note that both strains contain several prophage-like sequences despite the presence of genes encoding putative RM systems in their chromosomes. The lack of ORFs encoding HsdM, M.HsoI, and R.HsoI in strain 129Pt indicates that these systems are not absolutely essential for cell survival. Their absence may also partially explain the relative ease with which this strain can be transformed in the laboratory.
Biosynthesis of polysaccharides requires a multitude of GTs, which catalyze the transfer of sugars from an activated donor to an acceptor molecule and are usually specific for the glycosidic linkages created [46]. Intra- and interspecies divergence of genes encoding GTs is not uncommon. Phase-variable LOS is an important virulence factor of pathogenic strains of H. somni. Phase-variation of H. somni LOS has been shown to be due to the presence of SSRs in genes that encode GTs and enzymes involved in assembling non-glycose LOS components such as phosphorylcholine [5,7,37]. The genes lob1, lob2AB, and lob2D contain SSRs either just before the start codons or within the open reading frame [47]. In addition, [GenBank:HSM_0148], [GenBank:HSM_0164], [GenBank:HSM_0975], and [GenBank:HSM_1552] also contain SSRs that may be responsible for LOS phase variation, but require additional experimental investigation.
Most H. somni strains also produce a biofilm-associated EPS consisting primarily of mannose and galactose [47]. Although some of the genes involved in the biosynthesis and/or modification of H. somni LOS/EPS have been characterized, the identification of several more has been facilitated by comparative genome analyses [12,13,37]. It is likely that some of the observed variations among genes encoding GTs in H. somni and other Pasteurellaceae members are due to recombination events and/or selective pressure. Furthermore, variation in the composition and structure of the LOSs of strains 2336 and 129Pt may, in part, be due to different GT genes they have acquired or lost. In view of this, the role of new GT genes putatively involved in LOS biosynthesis and phase variation that have been identified in this study needs to be investigated.
Several types of two-partner secretion (TPS) pathways have been identified and characterized in Gram-negative bacteria [48]. Filamentous hemagglutinins (Fha), consisting of a membrane-anchored protein (FhaC), which is involved in the activation/secretion of the cognate hemagglutinin/adhesin (FhaB), are prototypes of two-partner virulence systems. Homologs of FhaB and FhaC that possibly play a role in pathogenesis have been found in several members of the genera Bordetella, Haemophilus, Proteus, and Pasteurella [49][50][51][52][53]. Among the 4 loci containing fha homologs in strain 2336, locus II appeared to be an acquisition mediated by a transposon and locus I appeared to be an acquisition due to homologous recombination. It appears that strain 129Pt has lost an fhaB homolog due to bacteriophage excision (Locus IV). It is possible that fhaB homologs in locus I of strain 2336 are paralogs, as are the fhaB homologs in locus IV. Together, these genes represent a large collection of fhaB and fhaC homologs in a single genome. The presence of multiple fhaB homologs in strain 2336 may, in part, be responsible for the serum resistance of this strain, in contrast to strain 129Pt, which does not contain full length fhaB homologs and is serum-sensitive [10]. Structural and functional studies of N-terminal fragments of FHA from B. pertussis and HpmA from P. mirabilis have indicated that the proteins form a right-handed parallel β-helix [52][53][54]. Several residues that mediate the interaction of B. pertussis FHA with its cognate FhaC, and facilitate secretion, have also been identified [54]. Although H. somni FhaB homologs have some features in common with FHA of B. pertussis and HpmA of P. mirabilis, a distinct region is the direct repeat 2 Fic domain in the FhaB homolog in locus III of strain 2336 that has been shown to induce cytotoxicity in human HeLa cells, bovine turbinate cells, and bovine alveolar type 2 cells [55,56].
However, the secretion determinants of these proteins and the role of FhaC proteins in their secretion remain unknown. Subtilases ([GenBank:COG1404]; subtilisin-like serine proteases) are a large superfamily of functionally diverse endo- and exo-peptidases that occur in prokaryotes and eukaryotes [57]. Bacterial subtilisins may have a role in pathogenesis besides facilitating protein degradation and nutrient acquisition [58]. However, subtilisin-like serine proteases from members of the Pasteurellaceae have not been characterized previously. The presence of a gene encoding a putative subtilase whose homologs were not found in other members of the Pasteurellaceae was yet another example of HGT in strain 2336. Although H. ducreyi strain 35000HP contains genes ([GenBank:HD1094] and [GenBank:HD1278]) encoding serine proteases that belong to the D-H-S family, they are unrelated to each other and to [GenBank:HSM_188]. In Agrobacterium tumefaciens, genes encoding AAA-ATPase and subtilisin-like serine protease have been shown to be functionally related and this pair has been proposed to constitute a toxin-antitoxin system that contributes to stability of plasmid pTiC58 [59]. A conjugative megaplasmid-encoded subtilase has been shown to be a virulence factor in E. coli and it has been suggested this toxin activates a V-ATPase in Vero cells [60,61]. Whether the ATPase-subtilisin pair identified in this study is transcriptionally and functionally coupled, and whether the protease gene contributes to the pathogenicity of strain 2336, has yet to be determined.
The ability to acquire and metabolize iron is an important determinant of bacterial survival and adaptability. In some bacteria, genes that facilitate iron uptake have been shown to be acquired by horizontal transfer [62]. Several members of the Pasteurellaceae possess special outer membrane protein (OMP) receptors consisting of two unrelated transferrin-binding proteins, TbpA and TbpB, which facilitate acquisition of transferrin-bound iron from their hosts [63]. H. somni strain 649 has a TbpA-TbpB receptor system that acquires iron only from bovine transferrin, and a second, probably redundant, TbpA2 receptor that can acquire iron from bovine, caprine, or ovine transferrins [64]. From genomic analyses, it is apparent that H. somni strains possess multiple genes for the acquisition of iron. Horizontal transfer and clustering of genes related to iron metabolism are indicative of enrichment and adaptation of pathogenic H. somni to different niches within its natural host. Furthermore, products of one or more of these genes may facilitate binding to transferrins/lactoferrins of different host species, and such a gene repertoire could enhance the ability of this bacterium to survive in a variety of ruminants. Studies using a mouse model have suggested a role for bovine transferrin and lactoferrin in increasing the virulence of strain 2336 [65].
Many bacterial pathogens contain surface proteins that facilitate adhesion to and/or invasion of the host mucosal barriers [66,67]. Some of these proteins may also be involved in bacterial aggregation to form biofilms and their evasion of the host's innate immune system [67,68]. At least three major categories of bacterial adherence proteins have been identified, two of which are hair-like structures called pili and non-pilus-associated proteins called adhesins. The plasmid-encoded YadA is a prototype non-pilus-associated protein that has been well characterized [69]. Mutation or deletion of the yadA homolog can reduce virulence in pathogenic bacteria [70,71]. Large adhesins in some bacteria are associated with ABC transporters and may be involved in biofilm formation [72]. A transposon mutagenesis approach has implicated several genes, including those encoding filamentous haemagglutinins, in H. somni biofilm formation [73]. H. somni also contains genes putatively involved in adhesin synthesis/transport, pilus formation, and quorum sensing (e.g., luxS), but their role in facilitating biofilm formation remains to be investigated.
Veterinarians use several antibiotics to treat H. somni infections and feedlot cattle enterprises frequently rely on tetracyclines for prophylaxis as well as growth promotion [1,74,75]. Although not common, H. somni resistance to tetracycline has been reported [76,77]. Copper and zinc are often included in commercial cattle diets to achieve optimal growth and reproduction. Emergence of copper/zinc resistance in bacteria of animal origin has been documented and attributed to the excessive presence of these metals in livestock feed [78]. Furthermore, the occurrence of genes related to metal and antibiotic resistance on integrative/conjugative elements and their horizontal co-transfer has been noted previously [79,80]. In view of these observations, it was not surprising to find a GI containing genes putatively involved in copper, zinc, and tetracycline resistance in strain 2336.
Transcriptional regulators play crucial roles in bacterial functions and they have been classified into a number of families [81]. The homologs of [GenBank:HSM_0806] (LysR, [NCBI:CLSK797597] cluster) in H. influenzae and P. dagmatis are associated with genes encoding proteins involved in fatty acid metabolism (e.g., acetyl-CoA acetyltransferase, 3-oxoacid CoA-transferase, and fatty acid transporters). Therefore, this cluster may represent a novel class of metabolic regulators within the LysR family. Most members of the [NCBI:PRK13756] cluster are involved in regulation of antibiotic resistance genes [81]. Homologs of [GenBank:HSM_1734] and [GenBank:HSM_1735] among members of the Pasteurellaceae encode tetracycline resistance and are associated with mobile genetic elements [82][83][84][85][86]. Homologs of [GenBank:HSM_1736] and [GenBank:HSM_1737] in other bacteria are known to be horizontally transferred and may mediate resistance to antibiotics [87]. Furthermore, homologs of [GenBank:HSM_1192] and [GenBank:HSM_1193] are predicted to be involved in multidrug resistance [88,89]. In summary, it appears that strain 2336 contained at least three different systems related to antibiotic resistance. Although the functional role of these genes remains to be established, their similarity to metal/antibiotic resistance genes associated with mobile genetic elements in other members of the Pasteurellaceae is clinically significant.
From genome comparisons, it appears that there is no correlation between chromosome size and the number of tRNA genes (the genomes of H. somni 2336, H. somni 129Pt, H. influenzae 86-028NP, H. ducreyi 35000HP, and P. multocida Pm70 contain 49, 49, 58, 46, and 57 tRNA genes, respectively). Whether the lower number of tRNA genes found in H. somni strains is due to disruptive integration of bacteriophages into tRNA genes (as in 'bacteriophage disruption of tRNA genes in Lactobacillus johnsonii' [90]) or is a result of compensatory gene loss in lieu of acquisition of new genes (as in 'genome reduction in pathogenic and symbiotic bacteria' [91]) is unknown. Nevertheless, comparison of the chromosomes of strains 129Pt and 2336 bolsters the proposition that prophages and transposons have played a major role in creating genomic diversity and phenotypic variability in the two strains. It is also apparent that strains 2336 and 129Pt have independently and intermittently acquired and lost genes since their divergence from a common ancestor, and that the net gain in strain 129Pt is less than the net gain in strain 2336.

Conclusions
H. somni strain 2336 has a larger chromosome than those of other Haemophilus and Histophilus strains whose genome sequences are available. Several regions that resemble the pathogenicity islands of other virulent bacteria are present in strain 2336. There is evidence to suggest that most of these regions were acquired by HGT mechanisms, whereas similar regions were not found in the commensal strain 129Pt. Although previous studies have discovered the genetic basis for some of the phenotypic dissimilarities between strains 2336 and 129Pt, complete genome sequence analyses have provided a comprehensive account of innate and acquired genetic traits. Furthermore, comparisons of the genomes of strains 2336 and 129Pt have contributed to our understanding of the biology and pathogenic evolution of these bacteria. The post-genomic era for H. somni poses new challenges and opportunities in terms of functional characterization of genes and deciphering their roles in colonization, survival, and pathogenesis. Continued analyses of the genomes of H. somni strains and comparison with newly sequenced genomes of other bacteria should enhance the current knowledge of virulence mechanisms. Nevertheless, the results from this study are expected to facilitate the development of improved diagnostic tests for and vaccines against H. somni.

Additional material
Additional file 1: List of H. somni strain 2336-specific genes. This table lists the strain-specific genes found in H. somni strain 2336. These data were obtained by cross-comparison of the genomes of strains 2336 and 129Pt using blastn.
Additional file 2: List of H. somni strain 129Pt-specific genes. This table lists the strain-specific genes found in H. somni strain 129Pt. These data were obtained by cross-comparison of the genomes of strains 129Pt and 2336 using blastn.
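
The blastn-based cross-comparison used to compile the strain-specific gene lists can be illustrated with a minimal sketch. It is not the pipeline used in this study: it assumes that the genes of one strain were searched against the other genome and that the hits are available in the standard 12-column BLAST tabular format (-outfmt 6). The function name, the gene identifiers, and the identity and coverage thresholds are hypothetical and chosen only for illustration.

```python
def strain_specific_genes(gene_lengths, blast_lines,
                          min_identity=90.0, min_coverage=0.5):
    """Return query genes that lack a qualifying blastn hit in the other genome.

    gene_lengths: dict mapping query gene id -> gene length in bp.
    blast_lines:  iterable of tab-separated lines in BLAST -outfmt 6 order:
                  qseqid sseqid pident length mismatch gapopen
                  qstart qend sstart send evalue bitscore
    The thresholds are illustrative assumptions, not values from the study.
    """
    matched = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        gene_id = fields[0]
        if gene_id not in gene_lengths:
            continue  # hit for a query we are not tracking
        identity = float(fields[2])
        q_start, q_end = int(fields[6]), int(fields[7])
        # Fraction of the query gene covered by this single alignment.
        coverage = (q_end - q_start + 1) / gene_lengths[gene_id]
        if identity >= min_identity and coverage >= min_coverage:
            matched.add(gene_id)
    # Genes with no qualifying hit are reported as candidate strain-specific.
    return sorted(g for g in gene_lengths if g not in matched)


# Hypothetical example: three genes from one strain searched against the other.
gene_lengths = {"geneA": 1000, "geneB": 900, "geneC": 1200}
blast_lines = [
    "geneA\tchr\t98.5\t990\t15\t0\t1\t990\t5000\t5989\t0.0\t1700",
    "geneB\tchr\t75.0\t850\t200\t12\t1\t850\t9000\t9849\t1e-50\t200",
]
candidates = strain_specific_genes(gene_lengths, blast_lines)
print(candidates)
```

In this example geneA has a high-identity, near full-length hit and is treated as shared, geneB has only a low-identity hit, and geneC has no hit at all, so geneB and geneC are reported as candidates. A sketch of this kind evaluates each alignment separately, so a gene covered only by several short alignments would also be reported; candidate lists would normally be checked manually.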