Isolates
The genomes of 1133 M. haemolytica isolates were sequenced and/or analyzed in this study. The isolates were categorized into two collections; Zoetis and KSU-USMARC, and four groups; lung clinical isolates group 1, lung clinical isolates group 2, nasopharyngeal non-clinical isolates, and clinical and non-clinical isolates (Table 1, Additional file 1). Groups 1 and 2 of lung clinical isolates consisted of 317 isolates that were part of a 10-year study of antimicrobial susceptibility of bacteria that cause BRD in the United States and Canada [14], (Additional file 1). These isolates were originally sent to Pfizer Animal Health (now Zoetis) by participating veterinary diagnostic laboratories, and their genomes were sequenced for this study with the permission of both Zoetis and the diagnostic laboratories that contributed the isolates. Group 1 of lung clinical isolates consisted of 155 isolates that originated from 35 U.S. States and 5 Canadian Provinces within the years of 2002–2011 (Additional file 1). Group 2 of lung clinical isolates consisted of 162 isolates that originated from 29 U.S. States and 3 Canadian Provinces. In order for an isolate to be included in a lung clinical isolate group, it had to originate from a different state, or a different year within a state from all other isolates in the group to minimize epidemiological relatedness.
Of the other 813 isolates used in the study, 810 were part of a longitudinal study comprised of 180 cattle that were first sampled for nasopharyngeal bacteria at the time of their purchase from sale barns located in Athens, Tennessee (n = 60 cattle), Richmond, Kentucky (n = 60 cattle), or Maryville, Missouri (N = 60 cattle) in the fall of 2013. The cattle were transported from the sale barns to a research feeding facility in Kansas where half received a metaphylactic treatment for the control of BRD (gamithromycin (6 mg/kg, subcutaneously administered), and half received a saline control. The animals were then monitored for BRD and periodically sampled over 28 days for M. haemolytica and other bacteria from nasopharyngeal swabs and/or bronchoalveolar lavage fluid and monitored for signs of BRD as previously described [34], (Additional file 1).
The 35 isolates comprising the group of nasopharyngeal non-clinical isolates from the KSU-USMARC collection were selected for minimal epidemiological relatedness to each other based on geographical origin (Table 1, Additional file 1). Each of the 35 isolates originated from the nasopharynx of a different animal that did not have clinical signs of BRD. Ten of the cattle sampled were at a sale barn in Athens, Tennessee, 12 were at a sale barn in Maryville, Missouri, and 13 were at a sale barn in Richmond, Kentucky. The isolates were obtained from the cattle prior to their transport to the Kansas research feeding facility and their genomes were sequenced for this study.
The group of clinical and non-clinical isolates consisted of 781 isolates of which many were epidemiologically close, or directly related to each other. The genomes of 505 isolates from this group were sequenced for this study. The genomes of the remaining 276 isolates were sequenced as previously described [29, 30], (Additional file 1).
Isolate culture from field collected samples and M. haemolytica identification
Isolates from the Zoetis collection were originally identified as M. haemolytica by diagnostic labs located in the U.S. or Canada, and were further characterized or confirmed as M. haemolytica as necessary [14]. Additionally, the isolates were re-confirmed as M. haemolytica in 2013 via the OmniLog microbial identification system using GEN III chemistry as per the manufacturer’s protocol (Biolog, Inc., Hayward, CA, USA).
Nasopharyngeal non-clinical isolates were cultured from two deep nasopharyngeal swabs that were collectively taken from the left and right nasopharynx of each animal with double guarded sterile uterine swabs (VetOne, Boise, Idaho, USA). Swabs from each nasopharynx of an animal were either combined in sample tubes containing liquid Amies media (Becton, Dickinson and Company, Franklin Lakes, New Jersey, USA) and transported to the US Meat Animal Research Center at 4 °C, or combined in liquid Amies media with 20% glycerol, frozen at −80 °C, and subsequently transported to the US Meat Animal Research Center. Aliquots of the media were spread plated onto both Chocolate agar and Brain Heart Infusion (BHI) agar plates containing 5% sheep blood (Hardy Diagnostics, Santa Maria, CA, USA) and cultured 16–20 h at 37 °C with 5% CO2. Hemolysis, or lack thereof, on BHI blood agar plates was not an identification criterion for M. haemolytica, as not all strains show β-hemolysis on culture plates containing bovine or ovine blood [38, 39]. Colonies that resembled M. haemolytica by morphology were cultured for 16–20 h in deep 96-well blocks containing 1 ml of liquid BHI (Becton, Dickinson and Company, Franklin Lakes, NJ, USA) per well at 37 °C without CO2 and subsequently frozen at −80 °C with 10% glycerol. The isolates were later cultured twice on chocolate agar plates and colonies from the second culture were identified as M. haemolytica via Gen III as described above.
Regarding isolates of the clinical and non-clinical group, those obtained from animals not showing clinical signs of BRD were obtained and identified as M. haemolytica as described for the group of nasopharyngeal non-clinical isolates. Isolates within the clinical and non-clinical group that came from animals diagnosed with respiratory disease were either obtained from deep nasopharyngeal swabs or bronchoalveolar lavage samples as previously described [34]. The clinical isolates were cultured on trypticase soy agar plates with 5% blood, chocolate agar plates, and MacConkey agar plates in 5% CO2 for 18–24 h as previously described [34]. The clinical isolates were identified as M. haemolytica via matrix assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF MS), per the manufacturer’s protocol (Bruker Daltonics, Billerica, MA, USA).
DNA purification, library construction, and sequencing
The genomes of 857 M. haemolytica isolates were sequenced for this study. This included all 317 isolates in the Zoetis collection, all 35 isolates in the group of nasopharyngeal non-clinical isolates, and 505 of the 781 isolates in the group of clinical and non-clinical isolates. Genomic sequencing of the remaining 276 isolates in the group of clinical and non-clinical isolates has been described elsewhere [29, 30]. The sequence data of those 276 isolates were included in this study for nucleotide polymorphism identification and downstream analyses.
For DNA purification, isolates were cultured in deep 96 well blocks containing 1 mL BHI per well at 37 °C without supplemental CO2 for 16–20 h. DNA was purified from the cultures using MO BIO UltraClean® -htp 96 Well Microbial DNA kits per the manufacturer’s instructions (Carlsbad, CA, USA). Resulting DNA samples were quantified with a Promega Quantus Fluorometer and QuantiFluor Dyes® per the manufacturer instructions (Promega, Madison, WI, USA). Illumina-based Nextera XT DNA Library kits were prepared in batches of 96 as per the manufacturer’s instructions, with Nextera XT Index Kits (V1) set A which provided 96 unique index combinations (Illumina, San Diego, CA, USA). Given that the Nextera XT DNA Library kit protocol involved a PCR step, pre- and post-PCR procedures were conducted in physically separated labs to avoid aerosol contamination of samples, reagents, and libraries. Subsequent to their construction, batches of 96 libraries were combined, and paired-end sequencing was conducted on an Illumina MiSeq instrument with either MiSeq Reagent Kit version 2 (2 × 250 bp, 500 cycles) or version 3 (2 × 300 bp, 600 cycles), (Illumina, San Diego, CA, USA).
Quality control checks of isolate sequence
Sequence coverage for each isolate was determined to ensure that a minimum of 10× sequence per genome was attained. The coverage was calculated using total bases of sequence per isolate divided by 2.6 million, which is the typical genome size for known M. haemolytica isolates in GenBank. Any isolate that did not have 10× sequence per genome was re-sequenced to achieve that coverage or higher.
The DNA sequence of each isolate was also checked for evidence of contamination and/or isolate misidentification using Geneious software and a series of BLAST analyses. Geneious (version 8.1.5), [40], was used to map the sequence reads from each isolate using the Geneious mapper program with Medium/Fast Sensitivity and fine tuning of up to 5× iterations. The reads were mapped to an available complete, finished, circular reference genome of M. haemolytica strain USDA-ARS-USMARC-183 which was isolated from a 1991 BRD case in Kansas, USA ([28], GenBank: CP004752). Because the Illumina platform generated paired-end reads of clonal sequence [41], the mapped reads could be checked for the presence of nucleotide polymorphisms where both alleles were observed between reads from the same isolate, and were not the result of mapping or sequencing errors. Detection of these polymorphisms was an indication that more than one type of M. haemolytica, or bacterial species was represented in the library. Any M. haemolytica isolate found to be represented by a library that yielded sequences indicative of different strain types of M. haemolytica, or other bacterial species, was re-cultured for isolation of all discernable M. haemolytica types. The resulting isolates were subsequently sequenced. Additionally, for each isolate sequenced in this study, full length 16S rDNA and the leukotoxin gene (lktA), as well as random regions selected throughout the genome were compared to M. haemolytica reference genomes present in NCBI GenBank using BLASTN [42], for in silico confirmation that the isolates were M. haemolytica.
Genome-wide nucleotide polymorphism identification and genotyping
MiSeq-produced sequence reads of the 857 M. haemolytica isolates sequenced in this study, as well as those of the 276 previously sequenced M. haemolytica, were mapped to the genome of M. haemolytica strain USDA-ARS-USMARC-183 using Bowtie 2 (v2,2,1) with default and the “very-sensitive-local” arguments at Intrepid Bioinformatics Inc (Louisville, KY, USA). Standard alignment post-processing of the mapped reads which included the marking of PCR duplicates and realignment of insertion deletion polymorphisms was done via done Genome Analysis Toolkit (GATK) v1.5-32-g2761da9. “Singleton” nucleotide polymorphisms, where the minor allele was observed in only one isolate, as well as insertion deletion polymorphisms, were not included in the dataset. All isolates were initially genotyped for nucleotide polymorphisms identified throughout the genome. Nucleotide polymorphisms identified within integrated phage sites and duplicated genomic regions of strain USDA-ARS-USMARC-183 were not included in subsequent analyses due the inherent difficulty in accurately mapping short sequence reads to repetitive sequence that can create polymorphism artifacts [43, 44].
Identification of ICE sequence
ICE sequences with or without the macrolide resistance genes erm(42), msr(E), and mph(E) have been reported for the 276 M. haemolytica isolates included in this study that were previously sequenced [29]. The 857 M. haemolytica isolates sequenced in this study, and the additional 276 previously sequenced isolates, were queried for ICE and antibiotic resistance gene sequences by mapping their sequence reads to three homologous ICE sequences using Geneious (version 8.1.5). The three ICEs were ICEPmu1, which was identified within the genome of P. multocida 36950 (GenBank: CP003022), ICEMh1, which was identified within the genome of M. haemolytica isolate M42548 (GenBank: CP005383), and a putative ICE identified in M. haemolytica isolate USDA-ARS-USMARC-183 that is homologous to the conserved sequence backbone of ICEPmu1 and ICEMh1 and does not contain antibiotic resistance genes [23]. The presence or absence of ICE sequence, as well as antibiotic resistance genes present in ICEPmu1 and ICEMh1 was determined for each isolate based on the observation of reads mapped to annotated regions of those reference sequences.
Trees and networks constructed from concatenated nucleotide polymorphism genotypes
For each isolate, concatenated nucleotide polymorphism genotypes were created from their corresponding alleles of genome-wide polymorphisms identified in this study. A Neighbor-Joining tree was generated from concatenated nucleotide polymorphism genotypes of all 1133 isolates involved in this study (Fig. 1). The nucleotide polymorphisms used to generate the concatenated nucleotide polymorphism genotypes for generation of the tree each had allele genotypes scored for 95% or more of the isolates. The alleles were formatted into an alignment in MacVector (version 14.5.1), [45] using clustal W. A Neighbor-Joining tree was generated from the alignment using the following programs: Seqboot, Dnadist, Neighbor, and Consense which are all part of PHYLIP (version 3.69), [46]. An F84 model of substitution with a transition transversion ratio of 2 was used to generate the tree, which was bootstrapped 100 times. The tree was viewed using the program Dendroscope (version 3.5.7), [47, 48].
A Neighbor-Joining network was generated from the concatenated nucleotide polymorphism genotypes of 521 genotype 1 isolates (Fig. 2), and another from those of 600 genotype 2 isolates (Fig. 3). The nucleotide polymorphisms used to generate the networks had allele genotypes scored for 90% or more of the isolates. The nucleotide polymorphism genotypes of the 521 genotype 1 isolates represented in Fig. 2 were tested for recombination using a phi test [49], in SplitsTree4 (version 4.12.3), [50], as were those of the 600 genotype 2 isolates represented in Fig. 3. The networks were also made in SplitsTree4 with an F84 model of substitution and a transition/transversion ratio of 2. Support for clusters identified in the networks was determined using IQ-TREE (version 1.3.10), which is a maximum-likelihood phylogeny platform [51]. For each isolate group, ModelFinder was used within IQ-TREE to find the optimal model of substitution. For the networks of Figs. 2 and 3, GTR + G and SYM + G models of substitution were used, respectively, to generate the maximum-likelihood trees. Support within the trees was determined in IQ-TREE with 1000 dataset replicates using SH-aLRTs, [52], and ultrafast bootstraps [53].
Statistical testing
Isolates comprising the two following groups were used for statistical testing; lung clinical isolates group 1 (n = 155), and the group of nasopharyngeal non-clinical isolates (n = 35). Two-way contingency table analyses were conducted to test for an association between genotype 2 M. haemolytica with 1) the lungs of diseased animals, and 2) the presence of ICE sequence homologous to the conserved sequence backbones of ICEPmu1, ICEMh1 and other homologs (http://statpages.info/ctab2x2.html), [54]. Additionally, a 2-way contingency table analysis was conducted to test for an association between genotype 2b M. haemolytica and antibiotic resistance genes found in variable regions of ICEPmu1 or ICEMh1. Only group 1 lung clinical isolates that were genotype 2 with subtypes supported by SH-aLRT and ultrafast bootstrap values were used for this analyses (n = 147). The group of nasopharygeal non-clinical isolates was not included due to the low frequency of genotype 2 M. haemolytica in the group (n = 6).