Babesia microti from humans and ticks hold a genomic signature of strong population structure in the United States
BMC Genomics volume 17, Article number: 888 (2016)
Babesia microti is an emerging tick-borne apicomplexan parasite with increasing geographic range and incidence in the United States. The rapid expansion of B. microti into its current distribution in the northeastern USA has been due to the range expansion of the tick vector, Ixodes scapularis, upon which the causative agent is dependent for transmission to humans.
To reconstruct the history of B. microti in the continental USA and clarify the evolutionary origin of human strains, we used multiplexed hybrid capture of 25 B. microti isolates obtained from I. scapularis and human blood. Despite low genomic variation compared with other Apicomplexa, B. microti was strongly structured into three highly differentiated genetic clusters in the northeastern USA. Bayesian analyses of the apicoplast genomes suggest that the origin of the current diversity of B. microti in northeastern USA dates back 46 thousand years with a signature of recent population expansion in the last 1000 years. Human-derived samples belonged to two rarely intermixing clusters, raising the possibility of highly divergent infectious phenotypes in humans.
Our results validate the multiplexed hybrid capture strategy for characterizing genome-wide diversity and relatedness of B. microti from ticks and humans. We find strong population structure in B. microti samples from the Northeast indicating potential barriers to gene flow.
Babesia microti, a tick-borne apicomplexan parasite, is the primary causative agent of human babesiosis, an emerging malaria-like illness in the northeastern and upper midwestern United States [1, 2]. Human babesiosis presents with a wide array of clinical severity, from asymptomatic infection in about a quarter of adults to fatal disease in up to 21 % of immunocompromised individuals [1, 3, 4]. Babesia microti is the most common pathogen transmitted through blood transfusion in the U.S. [5–7]. The first human case of babesiosis in an immunocompetent person was identified in 1969 on Nantucket Island, Massachusetts. The causative species was B. microti, which is now recognized as the most common Babesia spp. causing disease in humans .
In the United States, the enzootic transmission cycle of B. microti is similar to that of the etiologic agent of Lyme disease (Borrelia burgdorferi) and includes the blacklegged tick vector (Ixodes scapularis) and a range of vertebrate reservoir hosts, primarily the white-footed mouse (Peromyscus leucopus) . The current invasion of B. microti and B. burgdorferi in the Northeast has followed the spread of their shared vector, I. scapularis, out of coastal refugia in Massachusetts and Rhode Island [9, 10]. Epidemiological modeling studies have estimated that although these diseases are spreading at a similar rate, Lyme disease dissemination had preceded that of babesiosis, which may reflect differences in transmissibility, differing reservoir host competences or differential barriers to spread .
Evaluation of genetic variation and structure of B. microti across its endemic range will shed light on the evolution, spread and transmission dynamics of this pathogen. Prior attempts to describe genetic diversity and relatedness of B. microti strains have been hampered by the conserved nature of the genetic markers used; both single locus (18S ribosomal DNA, rDNA, β-tubulin) and variable nucleotide tandem repeats (VNTR) [12–14]. These loci provide limited resolution at the spatio-temporal scale necessary for describing the relatively recent B. microti invasion in northeastern U.S. because of their low genetic diversity. Whole genome level analyses provide a powerful tool to explain origin, patterns and dynamics of spread, as they contain more genetic diversity than individual genes . The first complete genome sequence of a B. microti isolate was reported in 2012  and showed that the parasite is significantly distant from other apicomplexan taxa, including Babesia bovis and Theileria species. The genome of B. microti consists of 4 chromosomes, 1 linear mitochondrial genome and 1 circular apicoplast. The nuclear genome is ~6.5 megabases (Mb), the smallest apicomplexan genome sequenced to date, while the mitochondrial and apicoplast genomes are 11.1 and 28.7 Kb long, respectively [16–18]. A recent estimate of B. microti genome diversity was made using clinical samples , however no study has estimated B. microti diversity from natural populations of both human hosts and tick vectors.
Here, we describe genome-wide diversity, population structure and phylogenetic relationships of B. microti strains obtained from both individual field-collected ticks and human blood samples in babesiosis endemic and emerging areas in the Northeast and upper Midwest. We applied a hybrid capture assay to enrich for near-full-length B. microti genomes and co-capture B. burgdorferi from naturally infected vector and human populations . Using this approach, we were able to survey pathogen genomic diversity directly from mixed DNA samples of tick vectors or human blood, precluding propagation in culture or immunodeficient rodents, methods known to introduce strain biases. Furthermore, by harnessing available genomic data we were able to assess for the first time the full spectrum of genomic diversity of co-infecting pathogens . Genome-wide analyses of the newly sequenced B. microti genomes from the continental U.S. demonstrate that B. microti is structured into subpopulation clusters and provides new insights into the evolutionary history and origin of B. microti strains.
Whole genome capture of tick- and human-derived B. microtiB. microti strains
We performed whole genome capture for 44 B. microti strains from 33 field-collected infected nymphal ticks and 11 human samples. B. microti load, measured by qPCR, was highly variable in field-collected nymphal tick samples (median = 2,314 genome equivalents; range = 14.7–22,259) and in humans samples (median: 71,551 genome equivalents range: 2344–737,340) (Additional file 1: Table S1). The mean capture efficiency for the 44 samples (the proportion of sequence reads mapping to the B. microti R1 reference genome) was 51.6 % (0.31–98.4 %) with a mean genome coverage of 161X (range: 0.10–1,161X) (Table 1). B. microti load in starting samples (prior to hybrid capture) was a significant predictor of B. microti genome coverage in a quasi-Poisson model (p < 0.001) (Additional file 1: Figure S1). Coinfection with B. burgdorferi in ticks was a significant negative predictor for B. microti capture efficiency (p = 0.0002) and average genome coverage (p = 0.0292) in a quasi-Poisson model. For analysis of polymorphisms, we included samples with at least 72.8 % coverage of the genome (4.7 Mb of the 6.4 Mb reference genome) at minimum read depth of 10X, excluding telomeric regions, resulting in 25 B. microti isolates from 12 sites in northeastern U.S. and one site in the upper Midwest (Table 1, Fig. 1a). The 25 B. microti genomes (14 tick- and 11 human-derived strains) had a mean genome coverage of 366X (range: 14–1,937X).
Using SNP calling criteria described in the methods section, we detected 3,767 variants (SNPs, Single Nucleotide Polymorphisms and Indels) at the chromosomal level for this data set. Additional file 1: Figure S2 shows the distribution of the nucleotide diversity (π) per sliding window of 10-kb regions along the four chromosomes of B. microti for the 25 sequenced genomes. On average among the four chromosomes, we found a SNP density of 5.9 SNPs per 10,000 bp.
The complete apicoplast genome sequences (genome size: 28.7 Kb) had a mean genome coverage of 441X (range: 8–2,169X) (Table 1) with a range of 11–144 SNPs/apicoplast genome. The mean pairwise differences per site (π), among the 25 B. microti apicolplast genomes was π = 9.4 × 10−4 (SD = 3.9 × 10−4) (Table 2). We identified 17 unique apicoplast haplotypes composed of 167 segregating sites. The mean haplotype diversity for the apicoplast genomes was Hd = 0.943 (SD = 0.031, Table 2). Nucleotide and haplotype diversity did not significantly differ between the three clades identified by the Bayesian phylogenetic and network analyses (see below), a result that may have been due in part to the difference in sample sizes (Additional file 1: Table S2).
Population structure and differentiation
The genome-wide SNP analyses showed strong genetic differentiation among geographic isolates, despite the relatively low genome-wide diversity across the sequenced B. microti samples. Discriminant analysis of principal components (DAPC) conducted on the genome-wide SNPs from the 25 B. microti samples revealed both continental and population specific patterns of genetic variation. Four main clusters were identified with the two principal components retained accounting for 58 % of the variance (Fig. 1a, b, Additional file 1: Figures S3 and S4). The first discriminant function distinguishes the Northeast from the Wisconsin B. microti samples. The second discriminant function differentiates the Northeast mainland (New England and NY state) and Long Island (hereafter ‘Northeast’) and the Nantucket Island from those from Martha’s Vineyard Island/Cape Cod samples in southeastern Massachusetts (MA, in Fig. 1b). The largest two clusters include both tick- and human-derived samples from the Northeast (mainland New England, New York and Long Island) (cluster 1, n = 12) and Nantucket Island (cluster 2, n = 8). The third cluster comprises tick-derived B. microti samples from Martha’s Vineyard Island/Cape Cod (MA) (cluster 3, n = 3). Cluster four encompasses the two tick-derived samples from Wisconsin (Fig. 1a, b).
To estimate population structure, probability of ancestry and admixture levels in each B. microti sample, we conducted model-based population structure analysis implemented in the ADMIXTURE program . This analysis identified the same clusters found in the DAPC analysis (Additional file 1: Figure S5, Fig. 1c). As in the DAPC analysis, human-derived samples grouped into two distinct clusters and are genetically more similar to non-human isolates than to each other, suggesting that human infectivity is not constrained to a single B. microti clade. The majority of samples did not show sign of genetic admixture with the exception of a human sample from Nantucket Island (N11-15; Fig. 1c).
F ST estimates between clusters 1 (Northeast) and 2 (Nantucket Island), which include both clinical and tick-derived isolates, were low (F ST = 0.136) and associated with a modest number (n = 95) of diagnostic (fixed in alternative states) SNPs. Similar estimates for comparison between clusters 2 (Nantucket Island) and 3 (Martha’s Vineyard Island/Cape Cod) and clusters 1 (Northeast) and 3 (Marta’s Vineyard Island/Cape Cod) revealed high levels of differentiation (Cluster 2–3: F ST = 0.456, 504 diagnostic SNPs; cluster 1–3: F ST = 0.415 and 519 diagnostic SNPs) (Additional file 1: Table S3).
Phylogenetic analysis and divergence time estimation
To investigate the phylogenetic relationships among the 25 B. microti samples, we focused on the apicoplast genome (28.7 Kb) due to the presence of few repetitive elements and a coding density of over 98 % . We constructed a maximum likelihood (ML) tree and conducted a Bayesian coalescent analysis on the 25 complete apicoplast genomes using the mutation rate estimated for the most closely related apicomplexan parasite available, Plasmodium falciparum: 1.2 × 10−8 substitutions/site/year (1.2 % per million years;). The ML tree identified three distinctive clades (Additional file 1: Figure S6). The Bayesian coalescent analysis produced a tree with concordant topology to the ML tree (Fig. 2). The most recent common ancestor between the Northeast and Wisconsin clades was estimated to have occurred 207,000 years ago (95 % Highest Posterior Density –HPD, 115–359). The most recent common ancestor of the Martha’s Vineyard Island/Cape Cod and the Northeast clades was estimated to have occurred 46,000 years ago (95 % HPD 25–70) (Fig. 2).
We constructed a TCS network  to assess the genealogical relationships between the circulating apicoplast haplotypes and to gain insight into the population level phenomena that might have contributed to the observed patterns (Additional file 1: Figure S7). This network confirms the topology obtained from the phylogenetic analyses (Additional file 1: Figure S7, Fig. 2). The statistical parsimony analysis did not connect all the 17 haplotypes into a single network (Additional file 1: Table S2). One of the three identified haplogroups (groups of related haplotypes), which includes the Wisconsin haplotypes, is separate and includes two divergent haplotypes (42 substitutions apart) and is not connected with the other two haplogroups. A second haplogroup includes all the haplotypes from Northeast except for the Martha’s Vineyard Island/Cape Cod haplotypes, which comprise the third haplogroup. The Northeast and Martha’s Vineyard Island/Cape Cod haplogroups are genetically distinct from each other (20 substitutions apart). While the Northeast haplogroup encompassed several divergent haplotypes (from 1 to 19 substitutions), the three haplotypes within the Martha’s Vineyard Island/Cape Cod haplogroup differ by up to 5 substitutions. However, nucleotide diversity and haplotype does not significantly differ between the Northeast and Martha’s Vineyard Island/Cape Cod haplogroups (Additional file 1: Table S2).
The Bayesian Skyline Plot, which describes population changes as a function of time, suggests a B. microti population expansion within the past ~1000 years (Additional file 1: Figure S8, A and B). The strength of this inference might be limited because the Bayes Factors do not favor a Bayesian Skyline model over a constant population size model, possibly because of small sample size and low diversity of the samples.
We used several multiple population genetic statistics to test for historic changes in B. microti population size (Additional file 1: Tables S2 and S3). Tajima’s D and Fu & Li’s F statistics are expected to be 0 in neutrally evolving populations of constant population size and negative in populations undergoing purifying selection or demographic expansion. Tajima’s D and Fu & Li’s F statistics on apicoplast haplotypes were negative (though not significantly different from 0) for all B. microti samples, as well samples from the Northeast clade. These findings potentially indicate historic population expansion but sample sizes were too small for accurate measurement in the Wisconsin and Martha’s Vineyard Island/Cape Cod clades. The mismatch distribution is unimodal for all samples examined together (results not shown), suggesting demographic or spatial population expansion. We tested evidence for a spatial B. microti expansion by fitting a model of spatial expansion to the mismatch distribution in ARLEQUIN  and evaluated expansion model fit using the sum of square deviations (SSD)  and the raggedness index  (Additional file 1: Table S4). Here, non-significant values for SSD and for the raggedness index signify that the data do not deviate from that expected under the model of expansion. The hypothesis of spatial expansion of B. microti could not be rejected for all samples (SSD = 0.022, P = 0.80, Harpending’s raggedness = 0.025, P = 0.70) as well as in the Northeast clade (SSD = 0.11, P = 0.11, Harpending’s raggedness = 0.05, P = 0.62) (Additional file 1: Table S3). The analysis could not be carried out on the other clades because of their small sample size. The more distant estimated expansion time of all samples (τ = 137.44) compared to the Northeast samples (τ = 3.90) likely reflects ancient divergence between the Wisconsin and other samples and a more recent spatial expansion event in Northeast.
Co-infection pattern with B. burgdorferi for human samples
We used a dual-pathogen genome capture array to enrich for multiple targeted parasite species within a single human or vector specimen to assess co-infections pattern with B. burgdorferi. Fifty percent of tick samples were known to be coinfected with B. burgdorferi. We had no a priori knowledge of the coinfection status of the human samples. Four out of the eleven human samples (~36 %; N14-004, N11-23, CT14-14, N11-16) had a mean B. burgdorferi chromosomal coverage of 7.8X (Additional file 1: Table S4) indicating that B. burgdorferi DNA was detectable in the blood of B. microti infected patients at the time of sampling.
Our sample of B. microti collected from a wide spatial range from both ticks and humans reveals higher levels of genetic diversity than previously reported using only single locus or microsatellite markers [12–14], and significantly lower diversity than other Apicomplexa parasites. This higher genetic resolution allowed us to identify fine-scale genomic differentiation and to examine the evolutionary and demographic history of an important emerging parasite.
Comparison of B. microti genomic diversity with other apicomplexan taxa
Genome-wide diversity of B. microti is lower than that of other apicomplexan parasites. For comparison, SNP density observed in our data set (5.9 SNPs per 10,000 bp) is four-fold lower than estimates in other apicomplexan species. For example, in 16 geographically diverse Plasmodium falciparum parasites, a total of 46,937 SNPs (22.4 per 10,000 bp) were identified, demonstrating substantially higher genomic diversity compared to B. microti . The low diversity of B. microti may be explained by non-mutually exclusive mechanisms as compared to P. falciparum, including the ecology of the transmission cycle and the demographic history of the parasite. B. microti is characterized by a low transmission rate compared to P. falciparum. The tick/mouse natural transmission cycle of B. microti spans a minimum of two years, with a single blood meal for pathogen acquisition as a tick larva and another blood meal for transmission as a tick nymph, typically about 11 months after the larval meal [29–31]. This low biting frequency reduces the likelihood of superinfection and thus opportunities for parasite recombination. In strong contrast, malaria mosquito vectors have a 3–4 day feeding cycle, which provide numerous opportunities for parasite infection and/or transmission and parasite diversity during the adult lifespan . The other possible mechanism for the markedly low diversity in B. microti is a recent population bottleneck that occurred during the rapid conversion of forests to farmland and intensive hunting in New England from ~1800–1930, resulting in a well-documented reduction of white-tailed deer and tick populations [14, 33–36].
Genetic differentiation among B. microti isolates
Despite low genome wide diversity, multivariate analyses of the genome-wide SNPs reveals strong population structure, with at least four major parasite genetic clusters in the USA, three of them in the Northeast (Fig. 1a and b). This significant geographic structure is consistent with the low dispersal capability of B. microti, given that the primary reservoirs are short-range migration hosts, such as rodents [11, 21]. Additionally, we found significant genomic divergence among B. microti populations within the Northeast. Samples from Northeast, Nantucket Island and Martha’s Vineyard Island/Cape Cod represented three distinct populations (Fig. 1c). Our findings on B. microti population structure in the continental US support the results of a recent study by Lemieux and colleagues . We also found very limited evidence of genetic admixture among isolates from the northeastern clades (a single human sample from the Nantucket Island population; N11-15; Fig. 1c), consistent with a low dispersal capability of B. microti and thus a limited low gene flow between these populations.
Phylogenetic analyses, timing of the divergence and demography
The ML and Bayesian phylogenetic analyses of the complete apicoplast genomes provide timing of divergence of the B. microti clades that were identified. On a continent-wide scale, our findings are consistent with the Northeast and Midwest B. microti clades having common ancestry in the Pleistocene, as long as ~200,000 years ago, and possibly more recently (Figs. 1a and 2), but cannot distinguish between either a west-to-east or east-to-west radiation of B. microti across North America. Inferring directionality of B. microti radiation requires inclusion of a more distantly related Babesia species to B. microti (i.e. Babesia rodhaini or Babesia duncani) as an outgroup, however their genomes have not been sequenced yet. At a regional scale, the ML and Bayesian phylogenetic analyses confirmed the genetic distinctiveness of the B. microti isolates from Martha’s Vineyard Island/Cape Cod compared to the isolates from Nantucket Island and from the Northeast sites, despite their geographic proximity (Fig. 2). These phylogenetic analyses also provided a time estimate for the divergence of the Martha’s Vineyard Island/Cape Cod and Nantucket Island clades. While the timing for the split between the Wisconsin and Northeastern samples dates back to ~ 207,000 years ago (95 % HPD x115-359), the split between Martha’s Vineyard Island/Cape Cod and Nantucket Island is relatively recent ~46,000 years (95 % HPD 25–70) (Fig. 2). Notably, we used an established mutation rate of 1.2 × 10−8 sub./site/year; ; and this mutation rate may be very conservative; however, there is currently a paucity of apicoplast substitution rate estimates. It is unclear which species of tick maintained B. microti over geologic timeframes. While I. scapularis is the dominant present day vector, other vectors may also have played a role, as there is evidence that B. microti was able to be maintained in the absence of I. scapularis in Martha’s Vineyard Island through a sylvatic cycle by the rodent-feeding nidicolous tick species, Ixodes muris or I. angustus [13, 14].
Sampled B. microti harbor a genomic signature of historic spatial expansion across all samples as well as for the Northeast clade (Additional file 1: Table S3), although our small sample size limits precise estimation of the timing of a historic population bottleneck (Additional file 1: Table S3). The Bayesian Skyline Plot suggests a B. microti population expansion in the northeastern U.S. that occurs within the past 1000 years. Although limited by sample size, the topology of the haplotype network (Additional file 1: Figure S7) suggests an older expansion of the Northeast clade than the one including the Martha’s Vineyard Island/Cape Cod haplogroups, which may suggest a west to east spread.
The combination of our genome-wide SNP approach, the coalescent based phylogenetic analyses of the apicoplast genomes, and the network analyses allowed us to describe patterns and timing of divergence among isolates found at both large and at fine spatial scale that were undetectable using previous methods . Further sampling will likely yield a more refined understanding of invasion history, as well as increased monitoring of the levels of genetic admixture among strains. Such information may provide important epidemiological and clinical insights.
The origin of human clinical isolates
Based on the genome-wide SNP analysis, clinical samples fall within two genetic clusters (Northeast and Nantucket Island), indicating that multiple B. microti variants circulating in tick populations are infectious to humans (Fig. 1b, c). Generating a genomic catalog of both human and tick-derived B. microti strains will provide a foundation for continued study of determinants of B. microti virulence in humans. Further genomic studies are needed to identify protein-coding loci or other genomic regions that could be more efficiently targeted for development of improved diagnostics, anti-parasitic compounds or identification of virulence factors.
Our study demonstrates that hybrid capture can simultaneously enrich and whole-genome sequence multiple targeted parasite species within a single human or vector specimen, allowing for study of the effect of multi-parasite epidemiological and of clinical outcomes. We found a pattern of coinfections with B. burgdorferi in ~36 % of the human samples. The observed coinfection rate is not unexpected given the high coinfection rate of B. burdgorferi in the northeastern tick populations (up to 18.6 % in adult I. scapularis), potentially driven by positive interactions between B. microti and B. burgdorferi in ticks and hosts . Although our multiplexed hybrid capture method is highly sensitive and scalable, we determined that B. microti copy number in initial samples is a significant predictor for capture efficiency and genome coverage (Additional file 1: Figure S1), and we propose ~2300 B. microti genome equivalents as a conservative minimum parasite load estimate required in vector samples to achieve a genome coverage suitable for population genomics applications .
We demonstrate that B. microti genomes reveal a signature of ancient evolutionary processes that place population-history divergence of the Northeast and Midwest B. microti populations at least back to the Pleistocene, while the northeastern clades diverged more recently. The association of human-derived samples to distinct, rarely intermixing clusters, raises the possibility that divergent infectious phenotypes affect humans, resulting in differential clinical outcomes and epidemiological patterns. Our work provides the foundation for more extensive sampling and genome wide association studies to explore the full implications of the observed genomic structure.
B. microti-infected ticks were sampled from a large spatial range reflecting the current geographical distribution of B. microti in the United States and hypothesized ancestral sites of its tick vector . Samples include 33 B. microti strains derived from host-seeking I. scapularis nymphs from 25 sites from the continental U.S. (New York, Connecticut, Massachusetts, New Hampshire and Wisconsin) and 11 strains from peripheral blood samples obtained from patients from 3 sites in the northeastern U.S.: the original babesiosis focus, Nantucket Island and other currently endemic sites in Connecticut and Maine. Field collections of I. scapularis, DNA extraction from individual ticks and human blood samples, and qPCR testing for B. microti infection and parasite load determination were perfomred using previously described protocols [39, 40].
Whole genome capture and sequencing
B. microti genomic DNA was enriched from mixed DNA templates of tick and human samples using a custom hybridization capture array following the NimbleGen SeqCap method (Madison, USA) . Hybrid capture techniques overcome the need of propagation of B. microti in rodents or whole genome amplification, methods used to enrich pathogens prior to genome sequencing and known to introduce biases in strain representation and propagation of sequencing errors, respectively [41, 42]. We used a dual-genome pathogen capture array targeting B. microti and B. burgdorferi genomes. Custom DNA probes were designed in silico to tile 99.9 % of the B. microti R1 reference genome (GenBank: GCF_000691945.1; genome size 6.4 Mb, 4 chromosomes and 1 mitochodrion; GenBank: LK028575 apicoplast complete genome, 28.7 Kb) [16–18], while B. burgdorferi, probes were design as previously reported . Illumina library preparation, hybridization capture and sequencing was conducted at the Yale Center for Genomic Analysis (YCGA). Briefly, library preparation for each sample was conducted using a modified Roche/Nimblegen SeqCap EZ Library Short Read protocol . Library concentration was determined using PicoGreen assay (Invitrogen) and size selection was performed on a Caliper LabChip GX instrument (PerkinElmer). Equimolar amounts of each indexed genomic library were pooled in 10-plex prior to capture for a total of 1 ug total genomic DNA per hybridization reaction. Samples were heat-denatured and mixed with the custom DNA probes (Roche/NimbleGen) and hybridization performed at 47 °C for 68 h. Samples were then washed with a series of stringent buffers to remove non-specifically bound DNA fragments. The captured fragments were PCR amplified and purified with AMPure XP beads. Sample concentrations were normalized to 2 nM and loaded onto two lanes of Illumina version 3 flow cells at a concentration that yields 170–200 million passing filter clusters per lane. Samples were sequenced using 75 bp paired end sequencing on an Illumina HiSeq 2500 according to Illumina protocols. Sample de-multiplexing was performed using Illumina’ s CASAVA 1.8.2 software.
We used univariate logistic regression to detect correlates of B. microti capture efficiency and genome coverage. We used a quasi-poisson model to account for overdispersion. Statistical analyses were implemented in R version 3.1.1 (R Core. 2014).
Read alignment and SNP detection
Illumina sequence reads for each sample were aligned to the newly annotated B. microti R1 reference genome and the apicoplast complete genome [18, 44] using BWA alignment tool  (v. 0.7.7, bwa mem, default parameters). PCR duplicates were identified and excluded from downstream analyses using Picard (v. 1.117) MarkDuplicates (http://picard.sourceforge.net/). Aligned reads were realigned around indels using GATK (v. 3.3) IndelRealigner [46, 47] and base quality scores of realigned reads were then recalibrated using GATK TableRecalibration. Genome coverage for each sample was calculated using GATK DepthOfCoverage (mmq > 20, mbp > 20). For downstream analyses we included samples for which >75 % of the B. microti R1 reference genome had at least 10X coverage, resulting in a total of 25 B. microti strains (13 tick- and 11 human-derived strains). Genetic variants (SNPs and INDELs) were identified using GATK UnifiedGenotyper and stringent filters were applied using GATK VariantFiltration to achieve a high confidence SNP set [(DP < 12) || (QUAL < 50) || (SB > −0.10) || (MQ0 > = 2 && (MQ0/(1.0 * DP)) > 0.1)]. SNPs identified in B. microti telomeres  were excluded to minimize potential false-positive variant calls due to low sequence complexity in these regions that precluded accurate short read mapping.
To determine co-infection pattern of human samples with B. burgdorferi, Illumina sequence reads for these samples were aligned to the B. burgodorferi B31 reference genome (910 Kb) and mapping statistics were calculated using the same methods and filtering parameters as described above.
Discriminant Analysis of Principal Components (DAPC), a non model-based method implemented in the adegent R package [48, 49] was used to describe genetic clusters using synthetic linear combination of variables (in this case SNP genotypes) that best differentiate between two or more groups of individuals. First, we ran sequential K-means clustering algorithm for K = 2 to K = 22 on the SNP data transformed by principal component analysis and subsequently clusters were identified using discriminant analysis (DA). We examined and compared the clustering solutions using Bayesian Information Criterion (BIC). Based on this analysis, K = 4 represented the optimal number of clusters to describe our B. microti SNP data set (Additional file 1: Figure S2). To minimize overfitting of the data we retained two principal components, representing 58.6 % of the total variance, following optimization procedure as recommended by the package developers  (Additional file 1: Figure S4).
We used ADMIXTURE  to estimate the optimum number of ancestral clusters in our dataset and to assess the ancestry of the 25 B. microti samples using all bi-allelic SNPs and a subset of SNPs thinned to remove those likely in LD . Considering only fully called sites, F ST was calculated in vcftools  using corrections for small sample size [52, 53].
Apicolplast diversity and demographic history
The number of segregating sites, total number of mutations, number of haplotypes, haplotype gene diversity and two measures of nucleotide diversity, and θ W, [54, 55] were calculated using DnaSP v5 . To establish neutrality and to examine demographic history of B. microti, Fu & Li’s D and F statistics  in addition to Tajima’s D  were calculated in DnaSP v5 . In a population at demographic equilibrium, the mismatch distribution is expected to multimodal; in populations that recently passed through a demographic or spatial expansion, the mismatch distribution is expected to be unimodal [59, 60].
Phylogenetic analysis and divergence time estimation
The 25 B. microti apicoplast genome sequences generated in this study were aligned using ClustalW  with default parameters, and the alignment was manually adjusted to maintain reading frame integrity in the protein coding genes. The resulting alignment contained 28,620 sites. The genealogical relationships among the apicoplast genomes were investigated using several approaches. First, we estimated phylogenetic networks using the statistical parsimony procedure implemented in the software TCS 1.2  using the 95 % limit for a parsimonious connection. Phylogenetic analyses were conducted using maximum likelihood (ML) inference as implemented in MEGA v5.2.2  and Bayesian coalescent analysis using the BEAST package v1.7.1 . The program jModelTest was used to determine the most appropriate substitution models for both ML and Bayesian analyses (TIM + G + I was selected using the Bayesian information criterion). We conducted BEAST coalescent analysis to estimate the genetic divergence of our apicoplast genome sequences and determine whether our data reflect changes in B. microti population sizes. We used the known substitution rate for the most closely related apicoplexan parasite available to calibrate our phylogeny and date internal nodes. The used substitution rate derives from a mitochondrial substitution rate for Plasmodium of 1.2 × 10−8 sub./site/year (1.2 % per million years;); a previous analysis of mitochondrial and apicoplast evolutionary rates suggested that they are broadly similar . The application of this rate may be speculative; however, there is currently a paucity of apicoplast substitution rate estimates. All Markov chains were run for 20 million generations, sampled every 2000 generations and the first 10 % of each chain was discarded as burn-in period. Analyses were run in duplicate to ensure Markov chain convergence. Strict and relaxed clocks (uncorrelated lognormal clock  were used, as were constant and Bayesian Skyline Plot  demographic models. Default priors were used for most model parameters; the exceptions were broad lognormal priors applied to population sizes and truncated exponential priors on the variance of the uncorrelated lognormal clock. Clock and demographic models were compared using Bayes factors. Marginal likelihoods for all combinations of clock and demographic models were estimated using path sampling, stepping stone and harmonic estimators [67, 68]. The combination of a relaxed clock and a constant population size had the highest marginal likelihood, and received “strong” support over analyses with strict clock models; there was equivocal support for a constant over variable population history. All apicoplast phylogenetic results will reference the analysis with the TIM + G + I, uncorrelated lognormal clock and constant size models. The program Tracer v1.5 (http://tree.bio.ed.ac.uk/software/tracer/) was used to inspect the BEAST output for convergence.
To investigate the demographic history of B. microti samples, we tested for a signal of spatial expansion using the mismatch distribution as implemented in ARLEQUIN . In a population at demographic equilibrium, the mismatch distribution is expected to multimodal; in populations that recently passed through a demographic or spatial expansion, the mismatch distribution is expected to be unimodal [59, 60].
Vannier E, Krause PJ. Human babesiosis. N Engl J Med. 2012;366(25):2397–407.
Diuk-Wasser MA, Vannier E, Krause PJ. Coinfection by ixodes tick-borne pathogens: ecological, epidemiological, and clinical consequences. Trends Parasitol. 2016;32(1):30–42.
White DJ, Talarico J, Chang HG, Birkhead GS, Heimberger T, Morse DL. Human babesiosis in New York State: review of 139 hospitalized cases and analysis of prognostic factors. Arch Intern Med. 1998;158(19):2149–54.
Krause PJ, Gewurz BE, Hill D, Marty FM, Vannier E, Foppa IM, Furman RR, Neuhaus E, Skowron G, Gupta S, et al. Persistent and relapsing babesiosis in immunocompromised patients. Clin Infect Dis. 2008;46(3):370–6.
Ngo V, Civen R. Babesiosis acquired through blood transfusion, California, USA. Emerg Infect Dis. 2009;15(5):785–7.
Young C, Krause PJ. The problem of transfusion-transmitted babesiosis. Transfusion. 2009;49(12):2548–50.
Herwaldt BL, Linden JV, Bosserman E, Young C, Olkowska D, Wilson M. Transfusion-associated babesiosis in the United States: a description of cases. Ann Intern Med. 2011;155(8):509–19.
Mather TN, Telford 3rd SR, Moore SI, Spielman A. Borrelia burgdorferi and Babesia microti: efficiency of transmission from reservoirs to vector ticks (Ixodes dammini). Exp Parasitol. 1990;70(1):55–61.
Matuschka FR, Spielman A. The emergence of Lyme disease in a changing environment in North America and central Europe. Exp Appl Acarol. 1986;2(4):337–53.
Spielman A, Wilson ML, Levine JF, Piesman J. Ecology of Ixodes dammini-borne human babesiosis and Lyme disease. Annu Rev Entomol. 1985;30:439–60.
Walter KS, Pepin KM, Webb CT, Gaff HD, Krause PJ, Pitzer VE, Diuk-Wasser MA. Invasion of two tick-borne diseases across New England: harnessing human surveillance data to capture underlying ecological invasion processes. Proc Biol Sci/Roy Soc. 2016;283(1832).
Goethert HK, Telford 3rd SR. What is Babesia microti? Parasitology. 2003;127(Pt 4):301–9.
Goethert HK, Lubelcyzk C, LaCombe E, Holman M, Rand P, Smith Jr RP, Telford 3rd SR. Enzootic Babesia microti in Maine. J Parasitol. 2003;89(5):1069–71.
Goethert HK, Telford 3rd SR. Not “out of Nantucket”: Babesia microti in southern New England comprises at least two major populations. Parasit Vectors. 2014;7:546.
Sintchenko V, Holmes EC. The role of pathogen genomics in assessing disease transmission. BMJ. 2015;350:h1314.
Cornillot E, Hadj-Kaddour K, Dassouli A, Noel B, Ranwez V, Vacherie B, Augagneur Y, Bres V, Duclos A, Randazzo S, et al. Sequencing of the smallest Apicomplexan genome from the human pathogen Babesia microti. Nucleic Acids Res. 2012;40(18):9102–14.
Cornillot E, Dassouli A, Garg A, Pachikara N, Randazzo S, Depoix D, Carcy B, Delbecq S, Frutos R, Silva JC, et al. Whole genome mapping and re-organization of the nuclear and mitochondrial genomes of Babesia microti isolates. PLoS One. 2013;8(9):e72657.
Garg A, Stein A, Zhao W, Dwivedi A, Frutos R, Cornillot E, Ben Mamoun C. Sequence and annotation of the apicoplast genome of the human pathogen Babesia microti. PLoS One. 2014;9(10):e107939.
Lemieux JE, Tran AD, Freimark L, Schaffner SF, Goethert H, Andersen KG, Bazner S, Li A, McGrath G, Sloan L, et al. A global map of genetic diversity in Babesia microti reveals strong population structure and identifies variants associated with clinical relapse. Nat Microbiol. 2016;1(7):16079.
Carpi G, Walter KS, Bent SJ, Hoen AG, Diuk-Wasser M, Caccone A. Whole genome capture of vector-borne pathogens from mixed DNA samples: a case study of Borrelia burgdorferi. BMC Genomics. 2015;16:434.
Hersh MH, Tibbetts M, Strauss M, Ostfeld RS, Keesing F. Reservoir competence of wildlife host species for Babesia microti. Emerg Infect Dis. 2012;18(12):1951–7.
Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009;19(9):1655–64.
Ricklefs RE, Outlaw DC. A molecular clock for malaria parasites. Science. 2010;329(5988):226–9.
Clement M, Posada D, Crandall KA. TCS: a computer program to estimate gene genealogies. Mol Ecol. 2000;9(10):1657–9.
Excoffier L, Laval G, Schneider S. Arlequin (version 3.0): An integrated software package for population genetics data analysis. Evol Bioinform. 2005;1:47–50.
Schneider S, Excoffier L. Estimation of past demographic parameters from the distribution of pairwise differences when the mutation rates very among sites: Application to human mitochondrial DNA. Genetics. 1999;152(3):1079–89.
Harpending HC. Signature of ancient population growth in a low-resolution mitochondrial DNA mismatch distribution. Hum Biol. 1994;66(4):591–600.
Volkman SK, Sabeti PC, DeCaprio D, Neafsey DE, Schaffner SF, Milner Jr DA, Daily JP, Sarr O, Ndiaye D, Ndir O, et al. A genome-wide map of diversity in Plasmodium falciparum. Nat Genet. 2007;39(1):113–9.
Sonenshine DE. Biology of ticks, vol. 2. New York: Oxford University Press; 1993.
Davis S, Bent SJ, Steere AC. Loop analysis for pathogens: niche partitioning in the transmission graph for pathogens of the North American tick Ixodes scapularis Lyme disease. J Theor Biol. 2011;269(1):96–103.
Brunner JL, Ostfeld RS. Multiple causes of variable tick burdens on small-mammal hosts. Ecology. 2008;89(8):2259–72.
Gilles HM, Warrell DA. Essential malariology. 3rd ed. 1993.
Steere AC, Malawista SE. Cases of Lyme disease in the United States: locations correlated with distribution of Ixodes dammini. Ann Intern Med. 1979;91(5):730–3.
Wood CL, Lafferty KD. Biodiversity and disease: a synthesis of ecological perspectives on Lyme disease transmission. Trends Ecol Evol. 2013;28(4):239–47.
Cronon W. Changes in the Land: Indians, Colonists, and the Ecology of New England. 1983.
Sigal LH, Curran AS. Lyme disease: a multifocal worldwide epidemic. Annu Rev Public Health. 1991;12:85–109.
Dunn JM, Krause PJ, Davis S, Vannier EG, Fitzpatrick MC, Rollend L, Belperron AA, States SL, Stacey A, Bockenstedt LK, et al. Borrelia burgdorferi promotes the establishment of Babesia microti in the northeastern United States. PLoS One. 2014;9(12):e115494.
Alex Buerkle C, Gompert Z. Population genomics based on low coverage sequencing: how low should we go? Mol Ecol. 2013;22(11):3028–35.
Diuk-Wasser MA, Liu Y, Steeves TK, Folsom-O'Keefe C, Dardick KR, Lepore T, Bent SJ, Usmani-Brown S, Telford 3rd SR, Fish D, et al. Monitoring human babesiosis emergence through vector surveillance New England, USA. Emerg Infect Dis. 2014;20(2):225–31.
Rollend L, Bent SJ, Krause PJ, Usmani-Brown S, Steeves TK, States SL, Lepore T, Ryan R, Dias F, Ben Mamoun C, et al. Quantitative PCR for detection of Babesia microti in Ixodes scapularis ticks and in human blood. Vector Borne Zoonotic Dis. 2013;13(11):784–90.
Lasken RS, Stockwell TB. Mechanism of chimera formation during the Multiple Displacement Amplification reaction. BMC Biotechnol. 2007;7:19.
Meyerhans A, Vartanian JP, Wain-Hobson S. DNA recombination during PCR. Nucleic Acids Res. 1990;18(7):1687–91.
Roche NimbleGen. SeqCap EZ library SR user’s guide version 4.2. Madison: Roche NimbleGen, Inc; 2013.
Silva JC, Cornillot E, McCracken C, Usmani-Brown S, Dwivedi A, Ifeonu OO, Crabtree J, Gotia HT, Virji AZ, Reynes C, Colinge J, Kumar V, Lawres L, Pazzi JE, Pablo JV, Hung C, Brancato J, Kumari P, Orvis J, Tretina K, Chibucos M, Ott S, Sadzewicz L, Sengamalay N, Shetty AC, Su Q, Tallon L, Fraser CM, Frutos R, Molina DM, Krause PJ, Ben Mamoun C.Genome-wide diversity and gene expression profiling of Babesia microti isolates identify polymorphic genes that mediate host-pathogen interactions. Sci Rep. 2016;6:35284. doi:10.1038/srep35284.
Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25(14):1754–60.
McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20(9):1297–303.
Van der Auwera GA, Carneiro MO, Hartl C, Poplin R, Del Angel G, Levy-Moonshine A, Jordan T, Shakir K, Roazen D, Thibault J. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr Protoc Bioinformatics. 2013;43:11.10.1-33. doi:10.1002/0471250953.bi1110s43.
Jombart T, Devillard S, Dufour AB, Pontier D. Revealing cryptic spatial patterns in genetic variability by a new multivariate method. Heredity (Edinb). 2008;101(1):92–103.
Jombart T. adegenet: a R package for the multivariate analysis of genetic markers. Bioinformatics. 2008;24(11):1403–5.
Jombart T, Devillard S, Balloux F. Discriminant analysis of principal components: a new method for the analysis of genetically structured populations. BMC Genet. 2010;11:94.
Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, et al. The variant call format and VCFtools. Bioinformatics. 2011;27(15):2156–8.
Weir BS, Hill WG. Estimating F-statistics. Annu Rev Genet. 2002;36:721–50.
Cockerham CC, Weir BS. Covariances of relatives stemming from a population undergoing mixed self and random mating. Biometrics. 1984;40(1):157–64.
Watterson GA. On the number of segregating sites in genetical models without recombination. Theor Popul Biol. 1975;7(2):256–76.
Nei M, Kumar S. Molecular evolution and phylogenetics. New York: Oxford University Press; 2000.
Librado P, Rozas J. DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics. 2009;25(11):1451–2.
Fu YX, Li WH. Maximum likelihood estimation of population parameters. Genetics. 1993;134(4):1261–70.
Tajima F. The effect of change in population size on DNA polymorphism. Genetics. 1989;123(3):597–601.
Slatkin M, Hudson RR. Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics. 1991;129(2):555–62.
Rogers AR, Harpending H. Population growth makes waves in the distribution of pairwise genetic differences. Mol Biol Evol. 1992;9(3):552–69.
Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994;22(22):4673–80.
Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S. MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol. 2011;28(10):2731–9.
Drummond AJ, Suchard MA, Xie D, Rambaut A. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol Biol Evol. 2012;29(8):1969–73.
Outlaw DC, Ricklefs RE. Comparative gene evolution in haemosporidian (apicomplexa) parasites of birds and mammals. Mol Biol Evol. 2010;27(3):537–42.
Drummond AJ, Ho SY, Phillips MJ, Rambaut A. Relaxed phylogenetics and dating with confidence. PLoS Biol. 2006;4(5):e88.
Drummond AJ, Rambaut A, Shapiro B, Pybus OG. Bayesian coalescent inference of past population dynamics from molecular sequences. Mol Biol Evol. 2005;22(5):1185–92.
Baele G, Lemey P, Bedford T, Rambaut A, Suchard MA, Alekseyenko AV. Improving the accuracy of demographic and molecular clock model comparison while accommodating phylogenetic uncertainty. Mol Biol Evol. 2012;29(9):2157–67.
Baele G, Li WL, Drummond AJ, Suchard MA, Lemey P. Accurate model selection of relaxed molecular clocks in bayesian phylogenetics. Mol Biol Evol. 2013;30(2):239–43.
Drummond AJ, Rambaut A. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol Biol. 2007;7:214.
The authors thank Janna Brancato for technical assistance with DNA extraction of human samples.
This project was funded by NIH/NIAID R21AI112938 and the Gordon and Llura Gund Foundation. GC was supported by the Gaylord Donnelley Postdoctoral Environmental Fellowship (the Yale Institute for Biospheric Studies).
Availability of data and materials
Metadata and sequence data of each sample in this study were submitted to NCBI Short Read Archive (SRA; http://www.ncbi.nlm.nih.gov/sra/), under SRA accession: SRP058536. Sequence alignment and phylogenetic tree used to generate Fig. 2 were submitted to TreeBase (Accession number is 20043 and Study Accession URL: http://purl.org/phylo/treebase/phylows/study/TB2:S20044).
GC, KSW, AC, MD, conceived and designed the study; GC, KSW, AK generated and analyzed the data; CBM, PJK, AD, EC, LT contributed reagents/materials/analysis tools; GC wrote the manuscript; All authors read, edited and approved the manuscript for publication.
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
The protocol used to collect human blood samples for this study was approved by the Yale HIC (IRB 0905005140).
About this article
Cite this article
Carpi, G., Walter, K.S., Mamoun, C.B. et al. Babesia microti from humans and ticks hold a genomic signature of strong population structure in the United States. BMC Genomics 17, 888 (2016). https://doi.org/10.1186/s12864-016-3225-x
- Tick-borne pathogen
- Hybrid capture
- Population genomics
- Coalescent analysis