Edinburgh Research Explorer Population genomic and evolutionary modelling analyses reveal a single major QTL for ivermectin drug resistance in the pathogenic nematode, Haemonchus contortus

and evolutionary a single major QTL for drug resistance the Abstract Background: Infections with helminths cause an enormous disease burden in billions of animals and plants worldwide. Large scale use of anthelmintics has driven the evolution of resistance in a number of species that infect livestock and companion animals, and there are growing concerns regarding the reduced efficacy in some human-infective helminths. Understanding the mechanisms by which resistance evolves is the focus of increasing interest; robust genetic analysis of helminths is challenging, and although many candidate genes have been proposed, the genetic basis of resistance remains poorly resolved. Results: Here, we present a genome-wide analysis of two genetic crosses between ivermectin resistant and sensitive isolates of the parasitic nematode Haemonchus contortus , an economically important gastrointestinal parasite of small ruminants and a model for anthelmintic research. Whole genome sequencing of parental populations, and key stages throughout the crosses, identified extensive genomic diversity that differentiates populations, but after backcrossing and selection, a single genomic quantitative trait locus (QTL) localised on chromosome V was revealed to be associated with ivermectin resistance. This QTL was common between the two geographically and genetically divergent resistant populations and did not include any leading candidate genes, suggestive of a previously uncharacterised mechanism and/or driver of resistance. Despite limited resolution due to low recombination in this region, population genetic analyses and novel evolutionary models supported strong selection at this QTL, driven by at least partial dominance of the resistant allele, and that large resistance-associated haplotype blocks were enriched in response to selection. Conclusions: We have described the genetic architecture and mode of ivermectin selection, revealing a major genomic locus associated with ivermectin resistance, the most conclusive evidence to date in any parasitic nematode. This study highlights a novel genome-wide approach to the analysis of a genetic cross in non-model organisms with extreme genetic diversity, and the importance of a high-quality reference genome in interpreting the signals of selection so identified.


Background
Parasitic worms, commonly termed helminths, are extremely diverse and frequently responsible for significant morbidity and mortality in their hosts. Control of helminths of human and veterinary importance is heavily dependent on the large-scale administration of anthelmintic drugs; for instance, the macrocyclic lactone ivermectin has been extremely successful in the control of a number of helminths in both humans and animals [1]. These successes are now threatened by the emergence of drug resistance. In many species of parasitic nematodes of livestock, anthelmintic drug resistance is already a major problem for global agricultural production and animal welfare and the rapid acquisition of resistance to both single and multiple drug classes has been widely documented [2,3]. Furthermore, there are growing concerns regarding the reduced efficacy of compounds used in mass drug administration (MDA) programs for some human-infective helminths, which may reflect the emergence of resistance [4][5][6][7]. In spite of the importance of this issue, remarkably little is known regarding the molecular mechanisms of resistance to most anthelmintic drug groups, with the notable exception of the benzimidazole class. This is, at least in part, due to a lack of genomic resources, tools and techniques with which to study these experimentally challenging organisms.
Haemonchus contortus is a gastrointestinal parasite of wild and domesticated ruminants that has a major impact on the health and economic productivity of sheep and goats globally. Resistance of H. contortus to almost all of the classes of anthelmintic drugs, including to multiple classes simultaneously, has been documented in many regions of the world [8][9][10][11][12], and can arise within a just a few years of introduction of a new drug class [13,14]. Partly for these reasons, H. contortus has emerged as a model parasitic nematode to characterise anthelmintic resistance, as well as drug and vaccine discovery research as alternate means of control [15][16][17]. Its utility as a model is largely due to a greater amenability to experimentation than most parasitic nematodes; it is possible to establish and maintain isolates in vivo in the natural host, perform genetic crosses in vivo, and undertake in vitro culture for part of its life cycle, allowing drug assays and genetic manipulation such as RNAi to be performed [18]. The ability to utilise these molecular approaches is complemented by extensive information about the structure of the genome and transcriptional differences between the major life stages [19][20][21].
Research into the genetic basis of ivermectin resistance has been dominated by the examination of a number of candidate genes [22]. Chosen based on their potential roles in the mechanism of action or efflux of drugs, many candidate genes have been proposed to be associated with ivermectin resistance in H. contortus (and other parasitic nematodes targeted with ivermectin) on the basis of studies comparing SNP or haplotype frequencies between small numbers of resistant and susceptible isolates [23][24][25][26][27], or pre-and post-ivermectin treatment [28,29]. However, considering the extremely high levels of genetic diversity of H. contortus populations, together with the limited number of well-characterised ivermectin resistant isolates, at best only circumstantial and inconsistent support is available for the involvement of any of the leading candidate genes. Further, careful validation of a number of these candidates in different field populations [30] or in controlled crosses [31] has failed to support previously defined associations between tested candidates and their associated resistance profile. The lack of consistency between studies had led to much discussion and debate regarding the complexity of the genetic basis of ivermectin resistance, both in terms of the number of loci involved and the extent of genetic variation between geographically distinct resistant isolates [22,32,33]. Moreover, the discordance among studies broadly reflects the difficulty in identifying real associations between genotype and phenotype simply by comparing resistant and susceptible populations, or individuals, due to the confounding effects of extremely high levels of genome-wide genetic variation [34].
Genome-wide and genetic mapping approaches are emerging for H. contortus due to progress in genetic crossing methodologies [21,[35][36][37][38][39][40] and an increasing complement of genomic resources [19][20][21]. Two serial backcrosses have previously been undertaken between the susceptible isolate MHco3(ISE) and two geographically independent ivermectin resistant isolates, MHco4(WRS) and MHco10(CAVR) [39] (Fig. 1a,b). Co-segregation of a single microsatellite marker, Hcms8a20, confirmed that these backcrosses had successfully introgressed ivermectin resistance loci from the two resistant populations into the susceptible MHco3(ISE) genomic background. Subsequent genetic studies showed that none of the leading candidate genes from the literature -Hco-avr-14, Hcoglc-5, Hco-lgc-37, Hco-pgp-9, Hco-pgp-2 and Hco-dyf-7showed evidence of introgression [31], and therefore, the genetic mediator(s) of resistance remain unresolved. However, the increasing accessibility to high throughput sequencing of non-model organisms, together with a high-quality reference genome for H. contortus, offers an opportunity to characterise precisely the genome-wide evidence of introgression and genetic architecture in this parasite.
Here we build upon these two previous genetic crosses, extending them for further generations of passage with ivermectin selection (Fig. 1c), and generating whole genome sequencing data throughout the experiment. We used a bulk segregant approach, together with a recently improved chromosome-scale genome assembly, to characterise the genetic architecture in the two genetically and geographically distinct ivermectin resistance populations of H. contortus. In a bulk segregant analysis, allele frequency differences are determined genome-wide between pools of individuals that differ in a defined phenotype. This approach has recently been used to map resistance-associated genes in a number of field and laboratory crosses of parasites [41][42][43], including to map variation associated with ivermectin efficacy in three different helminth species [5,44,45]. We analyse these data with traditional population genetic and novel statistical methods to identify and characterise a Fig. 1 Outline of the initial crosses, backcrosses, in vivo passage and selection experiments. a. Ivermectin resistant populations MHco10(CAVR) or MHco4(WRS) ("resistant" haplotypes are depicted as red lines) were crossed with an ivermectin sensitive population MHco3(ISE) ("susceptible" haplotypes as blue lines) to generate heterozygous F 1 progeny. F 1 eggs were cultured in vitro to L 3 , before maturation in vivo to L 4 /immature adults, from which females were used to initiate the backcross. b. The first round of backcross was performed by crossing heterozygous females from the initial cross with susceptible MHco3(ISE) males in vivo, resulting in F 2 progeny with reduced heterozygosity due to enrichment of MHco3(ISE) haplotypes. Resistance alleles were maintained in the backcross population by selection with ivermectin, before seeding the next round of the backcross with cross-derived heterozygous females and new susceptible MHco3(ISE) males. This process was repeated for four rounds of backcrossing, resulting in the backcrossed population becoming genetically similar to the MHco3(ISE) parental line in all regions of the genome not linked to ivermectin resistance. After four rounds of backcrossing, introgressed L 3 progeny were used to infect a new recipient sheep, resulting in segregation of susceptible and resistant alleles in both haplotypes among the progeny (mixed red/blue haplotypes). Eggs were cultured to L 3 before infecting two sheep, one exposed to ivermectin and one that did not receive drug. Post ivermectin treatment, L 4 /immature adults from both sheep were recovered on necropsy for sequencing. c. Post-backcross L 3 were further passaged into a worm-naive sheep and treated with ivermectin, after which eggs were recovered and cultured to infective L 3 for reinfection. This process was repeated for four generations of passage with selection (but without backcrossing). L 4 /immature adults were recovered after passage three and four for sequence analysis single genomic introgression region and major QTL associated with resistance. This work represents the most comprehensive analysis of the genome-wide impact of selection and genetics of anthelmintic resistance in a parasitic nematode to date. Further, the study demonstrates the power of using a genetic crossing approach, enhanced by the use of a highly contiguous genome assembly as a framework for genome-wide analyses, to eliminate false positive genetic signals in candidate genes previously associated with resistance.  (Fig. 2a), with the average diversity almost twice as high in the resistant isolates (MHco10[CAVR] mean π = 0.035 ± 0.008 standard deviations (SD); MHco4[WRS] mean π = 0.038 ± 0.008 SD) than the susceptible one (MHco3[ISE] mean π = 0.022 ± 0.008 SD) ( Fig. 2c; Additional file 1: Figure S1A). The two X chromosomal scaffolds were significantly less diverse (MHco3[ISE] mean π = 0.008 ± 0.008, MHco10[CAVR] mean π = 0.014 ± 0.012, MHco4[WRS] mean π = 0.017 ± 0.014) ( Fig. 2c; Additional file 1: Figure S1A), consistent with our recent finding that the X chromosome contains as little as 10% as much genetic diversity relative to the autosomes [21]. Each parental population contained local regions of high diversity that differentiated it from the others; of thẽ 7.6 million biallelic SNPs distributed in the genomes of the samples analysed,~514 thousand were private to MHco3(ISE) (Additional file 2: Figure S2A),~960 thousand to MHco10(CAVR) (Additional file 2: Figure S2B) and~685 thousand to MHco4(WRS) (Additional file 2: Figure S2C). In addition, high quality homozygous structural variants were evident between populations (Additional file 3: Figure S3A,C,E), with a large number of deletions, and substantially fewer duplications and inversions detected (Additional file 3: Figure S3B,D,F). Further, we examined short-range haplotype diversity in each population by measuring linkage disequilibrium (LD) between pairs of SNPs detected in paired reads (Additional file 1: Figure S1B). Although restricted by the Pool-seq design to pairwise SNP comparisons less than 500 bp apart, i.e., within a read pair, we nevertheless observed considerable decay in LD over this distance. Moreover, the rate of LD decay was correlated with nucleotide diversity: a greater loss of LD in the MHco4(WRS) and MHco10(CAVR) populations over the 500 bp was observed, relative to the less diverse, susceptible MHco3(ISE) population.

Results
Genetic diversity between each of the three parental populations was assessed by pairwise analysis (measured by F ST of single nucleotide polymorphisms [SNPs] in 10 kbp windows), which confirmed significant genome-wide differentiation (Fig. 2d). This is highlighted by multiple discrete peaks of differentiation within each chromosome, both between resistant and susceptible stains, but also between the two resistant populations. We must therefore conclude that most of this genetic differentiation reflects underlying genetic structure between populations unrelated to their drug resistance phenotype; this is perhaps not surprising, given they are derived from reproductively isolated populations due to their geographic distribution (Fig. 2b). In addition, MHco3(ISE) was subject to multiple rounds of inbreeding during is original derivation as a laboratory population [46], whereas the resistant populations have not been subjected to deliberate inbreeding and were isolated from outbred populations more recently. Collectively, these data emphasise the challenge of characterising genetic variation associated with phenotypes such as drug resistance by simply comparing genetic diversity between a susceptible and resistant population without accounting for the extensive background genetic variation present.
Genome-wide analysis of genetic diversity reveals the same single large introgressed region in both backcross lines To map genetic variation linked to ivermectin resistance in the two parental resistance populations, we sequenced preand post ivermectin treatment samples of the initial backcross between each of MHco10(CAVR) and MHco4(WRS) populations with the MHco3(ISE) susceptible genetic background [39], as well as after subsequent passage of the introgressed populations with further treatment. Consistent with the backcross design, pairwise comparisons were made between specific stages of the experiment and the MHco3(ISE) susceptible parent. We hypothesised that where backcross populations were not subjected to further ivermectin treatment ( Fig. 1 4 .noIVM), there would be relatively little genetic differentiation throughout the majority of the genome between the defined population and MHco3(ISE). By contrast, where populations were subjected to further ivermectin treatment, we hypothesised that genetic differentiation with MHco3(ISE) should increase close to any resistant allele, with high levels of differentiation in regions of the genome linked to an ivermectin resistance-conferring locus and low levels of differentiation elsewhere.
As hypothesised, little genetic differentiation between the post-backcross, no-selection population and the susceptible parental population was observed in either cross ( Fig. 3; MHco3/10.BC 4 .noIVM and MHco3/4.BC 4 .noIVM). After backcrossing, in the absence of selection, genetic material from the susceptible parent should comprise approximately 97% of the population due to the repeated backcrossing with MHco3(ISE) (resistant alleles comprise half of the initial cross population and reduce by a half with each backcross), making them largely indistinguishable from the susceptible parent. Selection increases the fraction of alleles derived from the resistant parent, but is limited by the backcross design; no more than 50% of the genetic material at any location may originate from the resistant parent. Upon further selection (  explicitly, we determined the correlation between change in F ST per each genomic window and progression through the cross; only the chromosome V region demonstrated progressive genetic differentiation over time (Additional file 4: Figure S4A,B), which was highly correlated between the two crosses (Additional file 4: Figure S4C). Collectively, these data suggest that there is a single introgression region on chromosome V representing a major QTL that is under selection upon drug exposure, and that this region is under selection in the two independent serial backcrosses performed. This conclusion was supported by an independent evolutionary analysis, which compared the observed frequencies of variants associated with the resistant parental population with the distribution of allele frequencies expected under the assumption of selective neutrality (Additional file 5: Figure S6). An over-representation of alleles (71 and 91% of the total number of such alleles for the MHco3/10 (Additional file 5: Figure S6A) and MHco3/4 (Additional file 5: Figure S6B) crosses, respectively) with atypical frequencies towards the right hand end of chromosome V were strongly inconsistent with the neutral expectation, suggesting that they had increased in frequency under the influence of linked selection (Fig. 3b).

Characterisation of the introgression region located on chromosome V
A comparison of genetic diversity on chromosome V between MHco3(ISE) and the end-point of both crosses (BC 4 .IVM.P 4 ) revealed a strikingly similar pattern between the crosses, with a major region of differentiation spanning 37-42 Mbp, and a second but lesser increase in genetic differentiation between 45 and 48 Mbp that is most prominent in the MHco3/10 population (Fig. 4a). Encouragingly, the Hcms8a20 microsatellite marker (located at position 36.16 Mbp along chromosome V) that was shown to be genetically linked to ivermectin resistance in the preliminary analysis of these backcrosses [39] lies adjacent to the QTL region, suggesting that while it is strongly linked to resistance, it is unlikely to be completely linked with a locus directly causing this phenotype. The F ST data are supported by a comparison of Tajima's D throughout this region ( Fig. 4b; top plots). Although variation in Tajima's D was observed throughout Fig. 3 Genome-wide analysis of ivermectin associated loci. a. Pairwise genetic diversity throughout the MHco3/10 (left plots) and MHco3/4 (right plots) backcrosses. In both crosses and at each point in the backcross, the experimental population was backcrossed against the MHco3(ISE) parental populationthe data are therefore presented to compare the genetic diversity (F ST measure in 10 kbp windows using popoolation2) between MHco3(ISE) and each sampled time point in the cross as per the cross scheme in Fig. 1. b. Output of the single-locus evolutionary model, describing sites which were inconsistent with a model of neutral evolution, measured using a likelihood threshold. Note that the X chromosome is not represented. Colours represent the chromosomal scaffolds as described in Fig. 2a the genome in both crosses (Additional file 6: Figure S7), the greatest variation between the susceptible parent and backcrossed lines was observed at approximately 39-40 Mbp ( Fig. 4b; bottom plots), which was associated with significantly negative Tajima's D in the backcrossed lines consistent with strong positive selection. In contrast to  4 .IVM.P 4 ; yellow) of the crosses (top panels). Tajima's D was calculated using npstat in 100 kbp windows spanning the genome. To emphasise the difference between the susceptible MHco3(ISE) and resistant passages 3 and 4 samples, we calculated the variance in the mean value of Tajima's D between samples, which is presented as red smoothed line (bottom panel). The null expectation is that variance between these samples will be low in regions of the genome not under selection. c. Inferred likelihoods from the multi-locus population genetic model. The mean likelihood of the best fitting single-locus model at each locus position is shown by the solid black line; the light blue interval around this line shows the 5-95% confidence interval for this statistic calculated by a bootstrapping process. The position of the maximum likelihood value is shown by the vertical black dotted line; a confidence interval for this position, calculated from the bootstrapping values, is shaded in grey. The mean likelihood of the best fitting constrained two-locus model is shown by the black dashed line; the value shown represents the best value of the model given that one of the selected loci is at the given position the F ST data, analysis using Tajima's D suggested a single peak, corresponding to a single site or region under positive selection.
A total of 74,424 variants (4.71% of the total variants found on chromosome V) were found to be segregating within the 37-42 Mbp region, of which 14,009 and 12,827 variants presented a signal of differentiation (greater than five standard deviations from the genome-wide mean P value from the Fisher's exact test) between the susceptible and CAVR and WRS lines, respectively, due to variable degrees of linked selection with the causative allele. As the genome assembly used in this study was not annotated, it was difficult to prioritise this variant list further in the context of their impact on coding sequences. However, many candidate genes have been proposed in the literature as being in association with ivermectin resistance; to determine their possible role in driving resistance here, we curated a list of genes that have been explored in H. contortus, and/or had been shown to confer ivermectin resistance when mutated in C. elegans. We determined the location of these genes in the current assembly, either via mapping the published gene models from the V1 annotation, or determining the closest H. contortus orthologous gene from C. elegans candidates using Wormbase Parasite [47] (Additional file 7: Table S2; Additional file 8: Figure S9). At least one candidate gene was located in each chromosome; in chromosome V, the location of six candidate genes were determined. However, none of these genes were found in the main introgression region defined by the F ST analysis ( Fig. 4a; vertical annotated lines). Three genes -Hco-lgc-55, Hco-avr-15, and Hco-pgp-1(9), the latter two of which have a strong association with ivermectin resistancelie to the telomere-side of the introgression region, however, they are located on the periphery of the F ST peaks rather than within them, suggesting that they are unlikely candidates to be under direct selection. Considering the absence of candidate resistance genes in the F ST peaks, we conclude that the driver(s) of ivermectin resistance in the MHco4(CAVR) and MHco4(WRS) populations are novel and have not been previously described in association with ivermectin resistance.
A multi-locus population genetic model provided clarity on the extent to which the location of the selected allele could be determined by our data. In contrast to F ST and Tajima's D, this model explicitly considered linkage between variant alleles arising from their common parental origin. Putative segregating sites in chromosome V between the resistant and susceptible parental populations were identified: A total of 474 such sites were identified in the MHco3/10 dataset, with 157 sites in the MHco3/4 dataset. Calculations were then performed upon data from these loci. The maximum likelihood of the allele under selection was in a broadly consistent location in each experiment, being inferred to exist at 40. 10 (Fig. 4c). These broad confidence intervals were explained by the population structures inferred by the model, which explicitly considered the location of crossover recombination events between the parental haplotypes throughout the experiment. Example structures generated using the maximum likelihood parameters for each population show how selection for resistance drives the accumulation of genetic material from the resistant parent through the course of the experiment (Fig. 5); these outputs may be contrasted with equivalent data generated under a model of selective neutrality (Additional file 9: Figure S10). Across 250 replicates, a mean of 177. An extended multi-locus model incorporating selection on alleles at two loci did not find evidence for selection at more than one allele; despite the second peak of differentiation found at approximately 45 Mbp in the F ST analysis, evidence for a second site was not identified. The identification of independent selected alleles requires them to be separated in the genome by recombination; on the basis of our knowledge of recombination in this system we searched for alleles located at least 2 Mbp apart in the genome. Under this constraint, the best fitting model identified variants under selection at the positions 38.6 Mbp and 45.7 Mbp in the MHco3/ 10 data (Additional file 10: Figure S8A), and at the positions 41.2 Mbp and 44.8 Mbp in the MHco3/4 data (Additional file 10: Figure S8B). However, these models fit the data less well than the single-driver model described above (Fig. 4c), favouring an explanation in which only one variant is under selection. We note that the single-driver model is as a special case of the two-driver model, for which both drivers are in precisely the same location and the selective effect of one of the drivers is set to zero; removing the constraint that the alleles be separated would result in a fit to data at least as good as the single-driver model. In conclusion, this analysis cannot exclude the presence of more than one drug-resistance variant in the identified region of chromosome V, but provides no evidence to support the existence of a second selected allele.

Strength of selection
We further used the multi-locus single driver model to infer the strength and manner of selection in favour of the drug resistance allele. Data in each case indicated strong selection for the resistant allele (Fig. 6), such that susceptible worms produce multiple times fewer offspring than the resistant worms under drug treatment. In each case the value of the dominance coefficient, h, was less than one, indicating either an additive effect for the MHco10(CAVR) case, whereby each copy of the  resistance allele contributes an equal amount of resistance, or weak dominance for the MHco4(WRS) data, whereby the first resistance allele contributes slightly more than the second. This is consistent with the phenotypic characterisation of the initial crosses, whereby the F 1 individuals that would have been heterozygous for resistant alleles were partially resistant to ivermectin [39]. As such, having two copies of the resistant allele had a greater phenotypic effect than one; these data therefore have implications for the mechanism of the drug resistance allele, and evolution of drug resistance in general terms, as discussed further below.

Discussion
Genetic mapping identifies a major ivermectin resistance QTL in two independent H. contortus populations from different continents We have analysed whole genome sequence data collected from the most recent phase of a multi-generational crossing experiment conducted in populations of the parasitic worm H. contortus. Our analysis unequivocally identified a single region of introgression within chromosome V that contains at least one major ivermectin resistance allele. This major ivermectin resistance QTL is common to two independent ivermectin resistant populations -MHco10(CAVR) and MHco4(WRS)that were originally isolated from Australia and Africa, respectively. This genomic region was clearly distinguished in the experimental data using different population genetic methods. Although this region in which we infer a selected allele to exist is relatively large (Conserva-tively~5 Mbp from 37 to 42 Mbp, comprising 1.73% of the genome), each of our analytical approaches consistently identified the same region under selection, with no other comparable regions of introgression elsewhere in the genome in either of the two backcross experiments. The annotation of the H. contortus genome is ongoing, and hence we can only make a best guess estimate of the number of putative genes in this region: we predict approximately 20,000 genes in the 279 Mbp genome, which would correspond to almost 360 genes in the introgression region (~72 genes/Mbp). Considering the high diversity between parental populations, and limited recombination in this window, it is difficult to reduce this number of genes further. However, our data suggests this introgression region is the most important ivermectin resistance QTL in the MHco10(CAVR) and MHco4(WRS) populations.
There has been much discussion and debate regarding the complexity of the genetic basis of ivermectin resistance, both in terms of the number of loci involved and the extent of geographical variation [22,32,33]. Our results, however, strongly suggest a single locus (or potentially multiple closely linked loci) is likely to be the major effector of ivermectin resistance in these two populations. We acknowledge that we cannot discount previously described candidate genes as mediators of resistance in other populations of H. contortus, or in other nematode species. Moreover, our data does not exclude a driver of transcriptional regulation within the introgression region that is under selection, which itself influences expression of other genes, including candidate genes, outside of the introgression region. However, most of the leading published candidate ivermectin resistance genes including Hco-glc-5, Hco-avr-14, Hco-lgc-37, Hco-pgp-2 and Hco-dyf-7 were not located in or near the QTL, which is consistent with a recent study using targeted sequencing of these genes in which none showed a signal of introgression in these two backcross experiments [31]. This suggests that none of these candidate genes show evidence of selection in response to ivermectin treatment in the MHco10(CAVR) and MHco4(WRS) populations. Although we identified three other previously described candidate genes -Hco-lgc-55, Hco-avr-15, and Hco-pgp-1(9)adjacent to the introgression region, none were found within the peak of the region suggesting that these also are unlikely to be major direct targets of ivermectin-mediated selection. This is in contrast to a recent genome-wide analysis of multi-drug resistance of a field population of Teladorsagia circumcincta, a gastrointestinal nematode of sheep and goats, in which a copy number variant of Tc-pgp-9 was expanded in an ivermectin resistant isolate. Additionally, Tc-lgc-55 was identified in a region of high differentiation between resistant and susceptible individuals [44]. Indeed, a key conclusion of this earlier work was that ivermectin resistance was likely to be highly multigenic. Given the fragmented nature of the T. circumcincta assembly, it is intriguing to speculate whether many of the signals observed in the earlier study, including one or both of these genes, are not direct mediators of resistance, but rather show evidence of selection due to being linked to nearby driver mutation, as is likely the case in H. contortus shown here.
What might account for the large number of candidate genes previously suggested to be associated with ivermectin resistance? The answer might lie in our direct comparison of genome-wide diversity between the parental populations (Fig. 2d). The extent of genetic divergence between these populations is striking, and makes it easier to understand how particular sequence polymorphisms might be naively attributed to being associated with resistance, when in fact they simply represent genetic variation that occurs as a result of the independent evolutionary histories and lack of interbreeding between the populations. Our results highlight the challenge of interpreting simple direct comparison of genetically distinct populations, even with genomic approaches, when trying to disentangle those genetic polymorphisms underlying resistance from a complex background of genetic variation independent from the resistance phenotype of individuals from a given population. The use of controlled genetic crosses as presented here and elsewhere [31,39,44], whereby population genetic structure can be explicitly accounted for in the analysis, provides a powerful way to mitigate the challenge of discriminating resistance-causing alleles from background genetic variation.

Population genetic and evolutionary modelling defined the boundaries of the ivermectin resistance QTL
Previous research has highlighted the value of novel methods in population genetics for analysing experimental cross populations [48]. In order to gain the maximum possible insight into the data collected, we extended previous statistical work [49,50] to account for stochasticity in the experiment. Specifically, the random location of recombination events, in the context of strong selection, and genetic drift imposed by population bottlenecks in the experimental design, leads to potential variation in the final outcome of the experiment. With prior knowledge of the recombination rate [21], we used evolutionary simulations reproducing the experimental design to explore the range of experimental outcomes arising under multiple scenarios of selection; this allowed for a direct inference of evolutionary parameters.
Inference of selection from the multi-locus model suggested the presence of stronger selection for the drug resistance allele in the MHco3/10 cross than in the MHco3/4 cross (Fig. 6). Further, although the data suggests that resistance is at least additive in both crosses, that is, heterozygotes are likely to confer some resistance and increased resistance is achieved by having two resistant alleles, there is some evidence for dominance in the MHco4(WRS) population, whereby the second resistant allele confers a lesser fitness advantage. A previous study reported resistance to be a dominant trait in the CAVR population [36] on the basis that no difference was observed in the resistance levels of heterozygote and homozygote resistance worms. Our finding of additivity arises from the evolution of the worm populations during passage following the backcross experiment (i.e., in BC 4 .IVM.P 3/4 samples versus the BC 4 .IVM data), and particularly the fixation of alleles from the resistant parent at putative segregating sites near the inferred position of resistance. In our model, alleles can reach fixation only where there is selection in favour of the homozygous resistant type over and above that for heterozygote resistant individuals.
Our ability to resolve the precise location of the variant under selection was limited by the number of recombination events in the worm population. Precise identification of the location of an allele under selection requires that the allele not be linked to other nearby alleles in at least some fraction of individuals in the population. While the inherent biological recombination rate here has a role, population bottlenecks in each round of the cross induced by the use of only a limited number of individuals -50-100 male and female worms per cross generationreduced our ability to resolve the region of introgression further toward a single causative gene or variant. These bottlenecks were due to the limited number of L 4 worms that it was possible to collect from each sheep and transplant in the successive generation of the cross. Genetic drift induced by successive population bottlenecks introduced considerable uncertainty in the outcome of the experiment, such that replicate sets of allele frequencies from our model showed considerable differences between each other (Additional file 11: Figure S5). This implies that the data from the experiment itself should be understood as the output of a stochastic process; beyond the clear large-scale patterns observable in the data and detailed above, more minor details of the output might not be seen again were the experiment to be repeated. The structure of the experiment thus imposes a limit on our ability to infer the location of a selected allele; an improved characterisation of selection would likely best be achieved by conducting further generations of cross as performed within this experimental framework to induce more recombination events, reducing the mean size of parental genomic blocks. However, such a course of action would be limited in scope were it not accompanied by the use of larger L 4 populations, for example by the simultaneous passage of the population through multiple animals. Clearly, a larger or longer experiment would have cost and welfare implications; the statistical framework we have developed would help to design and justify such an approach.
We note the importance of linkage between sites in the analysis of data from genetic cross populations. Whereas standard metrics such as F ST and Tajima's D do not explicitly account for linkage between sites, we have here implemented methods which explicitly account for the genomic structure arising from the history of the cross population, including the stochasticity that exists in that structure due to genetic drift and random recombination. While standard metrics can identify sites of maximum differentiation in a population with great precision, neglect of the inherent stochasticity in the outcome of an experiment can lead to an overconfidence in the extent to which they provide an accurate diagnosis of the causative variant of selection.

Importance of chromosome-level genome assemblies for genetic mapping and population genomics
The success of identifying a single region of introgression was dependent upon an improved chromosomal-scale reference genome assembly for the H. contortus isolate, MHco3(ISE).N1. This isolate was used in the original draft assembly of this species [19], which was derived via inbreeding of the same MHco3(ISE) population used in the backcrosses presented. Without a contiguous chromosomal-scale genome assembly, interpretation of these type of analyses can be extremely challenging; true genetic signal(s) can be obscured by technical artefacts associated with a fragmented assembly, such as short contigs lacking spatial orientation, multiple haplotypes present, collapsed paralogs, and poor resolution of repeat structures. As a consequence, both read mapping and variant calling in fragmented genomes can be suboptimal. These technical challenges are exacerbated in highly polymorphic species such as H. contortus and T. circumcincta [44], for which reference genomes were constructed from pools of individuals that, despite efforts to reduce genetic variation by inbreeding, were highly polymorphic. Perhaps the biggest problem with using a fragmented genome assembly for genome-wide analyses is the inability to determine linkage. Signals of selection will often be dispersed across multiple scaffolds of a fragmented draft genome assembly when in reality they are adjacent to each other in a single genomic region. This can potentially can give rise to the erroneous conclusion that multiple signals of selection are present, when in fact only a single selected locus exists. We illustrate these selection artefacts by presenting the sequence data we have generated here on different versions of the MHco3(ISE).N1 reference genome assembly (Fig. 7). In the fragmented draft genome assemblies, the signals of genetic differentiation between susceptible and resistant populations were numerous and dispersed across many assembly scaffolds, suggesting that many discrete regions of the genome may be under selection ( Fig. 7; top plot). However, only after significant improvement in chromosome contiguity does the introgression region on chromosome V become evident (Fig. 7; bottom plot). Genome contiguity (or lack of ) thus will be an increasingly important factor in the interpretation of analyses exploiting unfinished draft reference genomes of non-model species, particularly as whole genome sequencing becomes cheaper and more widely adopted.
The extensive genome-wide diversity between parental populations re-emphasises that H. contortus is highly genetically diverse both within and between populations. It is clear, however, that short read mapping-based analyses exploiting a single reference genome will underestimate this diversity. In this study, we have used only a subset of the total variation present, i.e., only single nucleotide polymorphisms, to characterise the introgression; however, understanding the functional consequences of population-specific diversity will rely on a more comprehensive description of all of the variation that defines a population. As an alternative to a linear reference genome, the use of population genome graphsa non-linear or branching reference that contains alternate paths representing known genetic variationmay be more suitable [51], and allow better characterisation of known variation that is too complex to be analysed using a linear reference [52]. This may be possible once a comprehensive analysis of population and perhaps global genetic diversity is made available [53]. Alternatively, de novo reference sequences from geographically diverse isolates may be required, ideally sequenced and assembled using long-read sequencing technology, to allow a more comprehensive description of genetic variation within a population while allowing large scale variation between reference populations to be characterised.

Conclusion
Our genome-wide analysis of two genetic crosses between ivermectin resistant and ivermectin sensitive isolates has identified a major ivermectin resistance QTL shared between two genetically and geographically distinct populations of Haemonchus contortus. Traditional population genetic analyses highlighted the extensive genetic variation both between and within the parental populations used to construct the cross, potentially explaining why candidate genes identified using approaches that did not control for this genetic diversity have not been validated, and by using novel population genetic methods, we could better account for the crossing procedure and its impact on the outcome of the experiment. This work represents the most comprehensive analysis of the genetics of anthelmintic resistance in a parasitic nematode to date, and demonstrates the power of genetic crossing and a contiguous genome assembly to eliminate false positive genetic signals typically linked to resistance. Importantly, these data show that many of the previously proposed candidate genes are not involved in ivermectin resistance in these isolates, and should focus future efforts on identifying the causal variant within the QTL we identify.

Background of H. contortus populations and the original backcross
The MHco3(ISE), MHco4(WRS) and MHco10(CAVR) are H. contortus populations maintained and stored at the Moredun Institute. MHco3(ISE) was originally derived by multiple rounds of inbreeding of the SE population [46]. The precise provenance of SE is not clearly recorded but is thought to be originally obtained from East Africa. MHco10(CAVR) is derived from the Chiswick Avermectin Resistant (CAVR) strain, an ivermectin resistant field population originally isolated in Australia [54]. MHco4(WRS) is derived from the ivermectin resistant White River Strain (WRS) originally isolated from South Africa [14].
The experiment described here extends two previouslydescribed backcross experiments. The construction, phenotypic validation, and initial microsatellite analysis of these backcrosses has been described previously [39], and is outlined in Fig. 1a and b. Briefly, two independent crosses were performed in parallel: (i) MHco3(ISE) x MHco10(CAVR), and (ii) MHco3(ISE) x MHco4(WRS). In the first generation of each cross, female worms of the ivermectin resistant parental populations [MHco10(-CAVR) or MHco4(WRS)] were crossed with male MHco3(ISE) worms, to generate lines designated MHco3/ 4 and MHco3/10, respectively (Fig. 1a). After the first cross, the F 1 generation females were mated to male MHco3(ISE), resulting in backcross generations designated MHco3/10.BC n and MHco3/4.BC n , where n is the number of backcross generations (BC). This involved the recovery of immature worms from the abomasum at day 14 post infection by necropsy of donor sheep; worms were washed in physiological saline (0.85% NaCl) after which their sex was determined [55], followed by surgical transfer of 45-100 male MHco3(ISE) and 50-100 female F 1 L 4 /immature adult worms into worm-free recipient lambs all within 2 h of the original collection. This process of backcrossing was repeated for a total of 4 generations, i.e., MHco3/10.BC 4 and MHco3/4.BC 4 (Fig. 1b). In vivo ivermectin selection was applied after mating and before collection of eggs to enrich for ivermectin resistant adults to be used in the subsequent round of the backcross. A controlled efficacy test was performed on the three parental-, MHco3/4.BC 4 , and MHco3/10.BC 4 populations to determine the initial levels of ivermectin efficacy, and the degree of resistance acquired in the introgressed lines. After four rounds of backcrossing, a selection experiment was performed. MHco3/10.BC 4 and MHco3/4.BC 4 populations were used to infect recipient sheep, after which eggs were collected and in vitro culture to L 3 . These infective larvae were used to infect two sheep per population, one that received 0.1 mg/kg ivermectin and one that remained drug naive. At 7 days post treatment, treated and naive L 4 /immature adults were recovered by necropsy and stored for molecular analysis.
The backcrosses described above were extended by performing a further four rounds of in vivo ivermectin selection, during which mating within the BC 4 population continued (Fig. 1c). For each cross, progeny of the BC 4 generation were cultured to L 3 and used to infect a donor sheep, which was subsequently treated with ivermectin (0.1 mg/kg in round 1, followed by 0.2 mg/kg in following rounds). Eggs from ivermectin-treated survivor adults were collected post-treatment, cultured to L 3 , and used to infect a new donor sheep. L 4 /immature adults were collected by necropsy after rounds three (i.e., BC 4 .IVM.P 3 ) and four (i.e., BC 4 .IVM.P 4 ) of selection and stored for molecular analysis.
The sheep used in this study were born and raised indoors at the Moredun Research Institute under worm free conditions. For those recipient sheep requiring surgery: anaesthesia was induced either by intravenous thiopentone injection, or using a halothane and oxygen mask. The sheep were then intubated and anaesthesia maintained using halothane and oxygen. Sheep were routinely injected with a non steriodal anti-inflammatory agent (1 mg/kg meloxicam; Metacam 20 mg/ml solution for injection; Boehringer Ingelheim) and antibiotic (7 mg/kg amoxicillin/1.75 mg/kg clavulanic acid, Synulox ready-to-use injection; Pfizer) and closely monitored on completion of the surgery. No adverse effects were noted. Sheep were humanely killed under schedule 1 to the Animals (Scientific Procedures) Act. . Genomic DNA from each pooled sample was prepared using a phenol chloroform extraction protocol as previously described. Sequencing libraries were prepared using a PCR-free protocol [56], and sequenced as described in Additional file 12: Table S1. We generated approximately 6.14 × 10 11 bp of sequence data, which equates to approximately 199.65× unmapped genome coverage per sample. Raw sequence data quality was analysed using FASTQC (http://www.bioinformatics. babraham.ac.uk/projects/fastqc/) and was assessed using MultiQC [57] prior to further processing.
Copy number variation between parental strains was determined using cnv-seq [63]. Best read mapping hits were determined per bam (samtools-1.3 view -F 4 bam | perl -lane 'print "$F [2]\t$F [3]"'> bam.hits), before read counts for pairwise comparisons determined in 10 kbp sliding windows using cnv-seq.pl. The R package cnv (v 0.2-8) was used to determine pairwise chromosome-wide normalised log 2 ratios. Characterisation of structural variation in the parental populations was performed using the speedseq sv pipeline [64] to identify putative duplications, deletions and inversions in each population. This approach exploits all reads mapped using BWA-MEM, including split and discordant read pairs, which are subsequently scored using LUMPY and SVTyper. Conservative filtering was applied to retain only homozygous variants (1/1) with a minimum quality score of 100.

Population genetic modelling
Bespoke single-locus and multi-locus models were used for population genetic inference from the data. Common to both methods, variants in the genome potentially associated with resistance were identified. At each locus in the genome, a nucleotide {A,C,G,T} was defined as existing in the susceptible parental population if it was observed at a frequency of at least 1% in that population. Loci in the resistant parental population were then identified for which exactly one nucleotide that did not exist in the susceptible population was observed at a frequency of at least 1%; without prejudice as to its phenotypic effect, this was denoted the 'resistant' allele at that locus, being associated with the resistant parental population.

Single-locus model
A single-locus population genetic model was then used to identify variants with frequencies that were inconsistent with selective neutrality. The neutral expectation was calculated using a Wright-Fisher evolutionary model to simulate the progress of a variant allele, modelling the probability distribution of the frequencies of homozygous resistant (q 1 ), heterozygous (q h ), and homozygous susceptible (q 0 ) individuals throughout the course of the experiment under the assumption of selective neutrality. The Wright-Fisher model uses a simple multinomial process for propagation; if the next generation of the population contains N individuals, the probability of obtaining n 1 , n h , and n 0 individuals with each diploid variant is given by: At the stages in the experiment where the population was comprised of eggs, we assumed N to be large, such that the processes of crossing and backcrossing could be described deterministically. Under these circumstances the genotype frequencies following a cross are given by: Where a prime denotes the frequency in the next generation. Similarly, the genotype frequencies following a backcross with the homozygous susceptible population are given by: Evaluating this process gives a probability distribution for the frequency of the resistant allele at any given point in the experiment, dependent upon the number of resistant alleles in the initial resistant population. This number of resistant alleles, which we denote n r , is unknown; sequence data from the resistant parental population were used to generate a prior distribution for this value.
We first suppose that the frequency of the resistant allele in the resistant parental population is equal to some value, p r . Given that the initial experimental population contains 50 resistant worms, with 100 alleles, collected from this larger population, the distribution of n r . is then given by: We now suppose that sequencing the resistant parental population gave a r resistant alleles at the locus in question out of a total read depth of A r . The initial frequency p r is unknown; however, using the data an estimate can be made for this statistic, expressed as a distribution of the allele frequency. Specifically, under the assumption of a uniform prior, the underlying probability p r can be said to be distributed as a beta distribution with parameters a r and A ra r + 1. We therefore obtain the following distribution for the statistic n r : for any value of j between 0 and 100. Values of this distribution were calculated using numerical integration. This process generated the neutral expectation of the resistant allele frequency conditional on the observation of this frequency in the resistant population. If at the sampling point t, a total of N t worms were collected, the distribution of the number of worms n t with the resistant allele at that point is given by: for all 0 ≤ k ≤ N t . The extent to which observed allele frequencies were consistent with the neutral model was calculated using a likelihood model: where at time t, a t resistant alleles from a read depth of A t were observed. A low likelihood L t indicates deviation from the neutral expectation; Bonferroni correction was used to identify significance at the 95% level. Data from each sample were analysed; four-fold non-neutral sites, being loci for which a significant likelihood was calculated from all four samples collected throughout each cross experiment, were identified.

Multi-locus model
A multi-locus model was developed to describe the manner in which allele frequencies would be expected to change over the course of the experiment; this model exploits information about the location of variant loci provided in the H. contortus reference genome, and the genome-wide map of recombination rate in the worms [21]; this map was inferred by characterising recombination breakpoints in F 1 L 3 progeny by whole genome sequencing, from which recombination rates were determined by comparing genetic distances between SNPs against their physical location in the genome. Where fixed genetic differences occur between the parental populations, the dynamics of adaptation in the resulting cross are relatively straightforward [48,50]; this has been exploited to identify quantitative trait loci in yeast and malaria cross populations [49,65]. Here a heuristic approach was used to identify fixed genetic differences between the parental populations before modelling evolution at these loci to identify the location of alleles conveying drug resistance.
Three filters were used to identify putatively fixed differences between parental populations. Firstly, we identified genomic sites for which the frequency of the resistant allele was 95% or greater in the resistant parent, requiring a read depth of at least 50x coverage for such sites to achieve a good level of statistical certainty. Secondly, we noted that, following the backcross performed at the start of the experiment, no locus can be homozygous for the resistant allele and as such, the resistant allele frequency can be no greater than 50% in the population; by way of reducing noise, we removed any locus with resistant alleles at a frequency of 60% or greater in the MHco3/ 10.BC 4 .noIVM sequence data. Thirdly, we noted that at homozygous separating sites in a population following a cross, the allele frequency will change smoothly over time; where there are N genomes in the population, the allele frequency, considered as a function of chromosome position, will change by 1/N wherever a crossover recombination event occurs within a genome. A diffusion model of allele frequency change, described in a previous publication [49], was used to identify allele frequencies across the genome that were consistent with this pattern. This analysis fits a posterior distribution to the allele frequencies across the genome, also inferring the extent of noise in the sequence data via a beta-binomial model intrinsic to the fitting process. Conservatively, sites no more than eight standard deviations from the mean of the posterior frequencies were included in the analysis. These approaches resulted in data from a total of 1368 loci in the MHco10(CAVR) dataset, and from 219 loci in the MHco4(WRS) dataset, were retained; the full set of allele frequencies are shown in Additional file 11: Figure S5.
Considering these data, we used an individual-based Wright-Fisher simulation to model the outcome of the experiment, taking into account the backcrossing, selection, and bottlenecking according to the experimental design. This approach accounted for the stochastic nature of allele frequency changes due to the effect of repeated population bottlenecking. We suppose that selection acts in favour of the resistance allele at a given locus when worms are in a sheep being treated with ivermectin, such that the fitness of individuals that are homozygous for the resistant allele (w 1 ), heterozygous for the resistant allele (w h ), or homozygous for susceptible alleles (w 0 ) are given by: where s is the selection coefficient (fitness advantage of the homozygous resistant compared to homozygous susceptible) and h is the dominance coefficient (proportion of fitness change conveyed by a single copy of the allele). We assumed that all worms have equal fitness in the absence of the drug.
Our model is defined in terms of the manner in which selection acts upon the population, by the location in the genome of the selected allele, and the extent to which that allele conveys a fitness advantage to worms in the presence of drug treatment, this fitness advantage being characterised in terms of s and h. The likelihood of any given model was calculated using the beta-binomial likelihood function obtained from the diffusion process used above. Accounting for the stochasticity of the system, a set of at least 250 simulations were generated for each set of parameters, calculating the mean likelihood fit to the data across the simulations. A bootstrapping process was applied to quantify the uncertainty in this likelihood. Parameters giving the maximum likelihood fit to the data were identified.
Confidence intervals for the maximum likelihood locations of the selected allele were calculated. At the maximum likelihood position, we took the 250 simulations giving the maximum mean likelihood, calculating the mean of their likelihoods and the standard deviation of these values. We defined a threshold likelihood as the mean minus one standard deviation. For other allele positions, we also identified the parameters giving the maximum mean likelihood, constrained by allele position; we calculated a test likelihood equal to the mean plus one standard deviation of the replicate likelihoods generated by these parameters. Confidence intervals were then generated as the sites nearest the maximum likelihood position for which the test likelihood was lower than the threshold likelihood. We believe this gives a conservative estimate of the uncertainty in the location of the selected allele, given the stochasticity inherent in the experiment.
In an extension to this model, a two-driver scenario was considered, in which the resistant variant at each of two loci was under selection; to allow the exploration of model space in this case within reasonable computational time the assumption was made of additive selection at each locus. To distinguish this from the single-driver model, a requirement was imposed that the selected alleles be separated by at least 2 Mbp in the genome; this distance reflects the extent to which the process of recombination during the experiment allows the selective effects of distinct alleles to be observed.