A major locus for ivermectin resistance in a parasitic nematode

Infections with helminths cause an enormous disease burden in humans, livestock, and companion animals. Helminth control is heavily dependent on anthelmintic drugs, but resistance to these drugs is widespread, and the genetic determinants of resistance remain unresolved. Here, we build on two previous genetic crosses between ivermectin resistant and sensitive isolates of the parasitic nematode Haemonchus contortus, an economically important gastrointestinal parasite of small ruminants and a model for anthelmintic research. By combining genetic mapping and genome resequencing with population genetic modelling, we identify the first single genomic locus conclusively associated with ivermectin resistance in a parasitic nematode. This locus is common between two geographically and genetically divergent strains and does not include the leading candidate genes. This work is the first example of a major anthelmintic resistance locus to be identified by genetic mapping and represents the most comprehensive genetic analysis of this trait in a parasitic helminth to date.


Introduction
Parasitic worms, commonly termed helminths, are extremely diverse and frequently responsible for significant morbidity and mortality in their hosts. Control of those of human and veterinary importance is heavily dependent on the large-scale administration of anthelmintic drugs; for instance, the macrocyclic lactone ivermectin has been extremely successful in the control of a number of helminths in both humans and animals (Omura and Crump, 2004) . The importance of the discovery and development of this drug as an anthelmintic was recently recognised by the award of the 2015 Nobel Prize for Medicine or Physiology. However, these successes are now threatened by the emergence of drug resistance. In many species of parasitic nematodes of livestock, anthelmintic drug resistance is already a major problem for global agricultural production and animal welfare and its rapid acquisition has been documented against single and multiple drug classes (Kaplan and Vidyashankar, 2012;Rose et al., 2015) . There are growing concerns regarding the reduced efficacy of compounds used in mass drug administration (MDA) programs for some human-infective helminths, which could drive the emergence of resistance (Crellen et al., 2016;Doyle et al., 2017;Geerts and Gryseels, 2000;Osei-Atweneboana et al., 2011) . Consequently, anthelmintic drug resistance is an emerging threat to the sustainable control of human parasites, particularly those targeted by MDA programs. However, in spite of the importance of this issue, remarkably little is known regarding the molecular mechanisms of resistance to most anthelmintic drug groups, with the notable exception of the benzimidazole class. This is, at least in part, due to a lack of genomic resources, tools and techniques with which to study these experimentally challenging organisms.
Haemonchus contortus is a globally distributed parasite of wild and domesticated ruminants that has a major impact on the health and economic productivity of sheep and goats. Resistance of this parasite to almost all of the classes of anthelmintic drugs has been documented in multiple regions of the world (Yadav et al. 1995;Jabbar et al. 2013;Echevarria and Trindade 1989;Van den Brom et al. 2015;Howell et al. 2008) . Isolates of H. contortus that are resistant to multiple classes simultaneously are common and resistance can occur within a just a few years of introduction of a new drug class (Williamson et al., 2011;van Wyk and Malan, 1988) .
Partly for these reasons, H. contortus has emerged as a model parasitic nematode for anthelmintic resistance as well as drug and vaccine discovery research (Geary, 2016;Gilleard, 2013;Nisbet et al., 2016) . Its utility as a model is largely due to a greater amenability to experimentation than most parasitic nematodes; it is possible to establish and maintain isolates in vivo in the natural host, perform genetic crosses in vivo , and undertake in vitro culture for at least part of its life cycle, allowing drug assays and genetic manipulation such as RNAi to be performed (Britton et al., 2016) .
The ability to utilise these molecular approaches is complemented by extensive information about the structure of the genome and transcriptional differentiation of the major life stages (Doyle et al., 2018;Laing et al., 2013;Schwarz et al., 2013) .
Genetic characterisation of drug resistance in parasitic helminths will enable sensitive and specific tools to be developed for research, surveillance and clinical diagnostics to support sustainable parasite management programs. Understanding the mechanisms of resistance to current drugs should also provide important information to help direct the development of new drugs (Hefnawy et al. 2017;Cowell et al. 2018) . However, the identification of genetic loci associated with drug resistance in these organisms remains a significant challenge. The only anthelmintic drug class to date for which the major resistance loci have been unequivocally identified is the benzimidazoles, where three mutations in the the isotype-1 ß-tubulin gene of H. contortus and several other trichostrongylid nematodes species are the major causal loci (Elard and Humbert, 1999;Kwa et al., 1994;Rufener et al., 2009;Silvestre and Cabaret, 2002) . In the case of other anthelmintic drug classes, the genetic mediators of resistance remain obscure; for example, mutagenesis experiments in the free-living nematode Caenorhabditis elegans have identified many genetic loci capable of conferring low levels of ivermectin resistance, however, very few mutations confer high levels of resistance (Hunt, 1995) . Concurrent mutation of three genes, Ce-avr-14 , Ce-avr-15 and Ce-glc-1, each encoding a glutamate-gated chloride channel (GluCl), confers a high level of resistance to ivermectin (Dent et al., 2000) . However, mutations in orthologous GluCl genes have not been associated with ivermectin resistance in parasitic nematodes. In the case of parasitic helminths, largely due to the lack of genomic resources, research into the genetic basis of ivermectin resistance has been dominated by the examination of small numbers of candidates genes (Kotze et al., 2014) . Based on their potential roles in the mechanism of action or efflux of drugs, candidate genes are generally examined for associations between sequence polymorphisms and resistance phenotypes. However, the extremely high levels of genetic diversity of H. contortus, together with the limited number of well characterised ivermectin resistant isolates, has provided a major challenge to such studies. The overall outcome of such candidate gene studies to date has provided at best circumstantial and inconsistent support for the involvement of any of the leading candidate genes in ivermectin resistance. It has also led to substantial debate regarding the extent to which ivermectin resistance is polygenic with the phenotype being primarily due to the additive effects of a large number of minor effect loci.
Genome-wide and genetic mapping approaches are emerging for H. contortus due to progress in genetic crossing methodologies (Sargison et al., 2018) and an increasing complement of genomic resources. Two serial backcrosses have previously been undertaken between the genome reference susceptible strain MHco3(ISE) and two geographically independent ivermectin resistant isolates, MHco4(WRS) and MHco10(CAVR) (Redman et al., 2012) ( Figure 1 A,B ). Using a panel of microsatellite markers, the marker Hcms8a20 confirmed that these backcrosses had successfully introgressed ivermectin resistance loci from the two resistant strains into the susceptible MHco3(ISE) genomic background. Subsequent genetic studies showed that none of the leading candidate genes from the literature -  evidence of introgression (Rezansoff et al., 2016) . The increasing accessibility to high throughput sequencing of non-model organisms, together with an increasingly high quality reference for H. contortus , offers an opportunity to characterise precisely the genome-wide evidence of introgression and selection surrounding the Hcms8a20 locus.

Hco
Here we build upon these two previous genetic crosses, extending them for further generations of passage with ivermectin selection ( Figure 1 C ), and generating whole genome sequencing data from four steps in the experiment. We analyse these data with traditional population genetic and novel statistical methods, and exploit a chromosome-scale genome assembly to map ivermectin resistance in the two genetically and geographically distinct strains of H. contortus . Previous research has highlighted the value of novel methods in population genetics for analysing experimental cross populations (Illingworth and Mustonen, 2013) . In order to gain the maximum possible insight into the data collected, we extended previous statistical work (Abkallo et al., 2017;Illingworth et al., 2012) to account for stochasticity in the experiment. Specifically, the random location of recombination events, in the context of strong selection, and genetic drift imposed by population bottlenecks in the experimental design, lead to potential variation in the final outcome of the experiment. With prior knowledge of the recombination rate (Doyle et al. 2018) , we used evolutionary simulations reproducing the experimental design to explore the range of experimental outcomes arising under multiple scenarios of selection; this allowed for a direct inference of evolutionary parameters. Together, our methods identify a single region of genomic introgression under selection for resistance. This work represents the most comprehensive analysis of the genetics of anthelmintic resistance in a parasitic nematode to date. Further, the study demonstrates the power of using a genetic crossing approach, in the context of a contiguous genome assembly, to eliminate false positive genetic signals typically linked to resistance.

Figure 1. Outline of the initial crosses, backcrosses, in vivo passage and selection experiments. A.
Ivermectin resistant strains MHco10(CAVR) or MHco4(WRS) ("resistant" haplotypes are depicted as red lines) were crossed with an ivermectin sensitive strain MHco3(ISE) ("susceptible" haplotypes as blue lines) to generate heterozygous F 1 progeny. F 1 eggs were cultured in vitro to L 3 , before maturation in vivo to L 4 /immature adults, from which females were used to initiate the backcross. B. The first round of backcross was performed by crossing heterozygous females from the initial cross with susceptible MHco3(ISE) males in vivo , resulting in F 2 progeny with reduced heterozygosity due to enrichment of MHco3(ISE) haplotypes. Resistance alleles were maintained in the backcross population by selection with ivermectin, before seeding the next round of the backcross with cross-derived heterozygous females and new susceptible MHco3(ISE) males. This process was repeated for four rounds of backcrossing, resulting in the backcrossed population becoming genetically similar to the MHco3(ISE) parental line in all regions of the genome not linked to ivermectin resistance. After four rounds of backcrossing, introgressed L 3 progeny were used to infect a new recipient sheep, resulting in segregation of susceptible and resistant alleles in both haplotypes among the progeny (mixed red/blue haplotypes). Eggs were cultured to L 3 before infecting two sheep, one exposed to ivermectin and one that did not receive drug. Post ivermectin treatment, L 4 /immature adults from both sheep were recovered on necropsy for sequencing. C. Post-backcross L 3 were further passaged into a worm-naive sheep and treated with ivermectin, after which eggs were recovered and cultured to infective L 3 for reinfection. This process was repeated for four generations of passage with selection (but without backcrossing). L 4 /immature adults were recovered after passage three and four for sequence analysis. Table S1. Sequencing datasets used in this study   Figure S1A ), consistent with our recent finding that the X chromosome contains as little as 10% as much genetic diversity relative to the autosomes (Doyle et al., 2018) . The difference between sensitive and resistant strains is perhaps not surprising given their history;

Genetic diversity within and between parental H. contortus strains
MHco3(ISE) was subject to multiple rounds for inbreeding during is original derivation as a laboratory strain (Roos et al. 2004) , whereas the resistant strains have not been subjected to experimental inbreeding and were isolated from outbred populations more recently.
We examined short-range haplotype diversity in each strain by measuring linkage disequilibrium (LD) between pairs of SNPs detected in paired reads ( Figure S1B ).
Although restricted by the Pool-seq design to pairwise SNP comparisons less than 500 bp apart, i.e., within a read pair, we nevertheless observed considerable decay in LD over this distance. Moreover, the rate of LD decay was correlated with nucleotide diversity, such that there was a greater loss of LD in the MHco4(WRS) and MHco10(CAVR) strains over the 500 bp, and a greater proportion of variants in LD in the less diverse, susceptible MHco3(ISE) strain.
To understand the extent of genetic diversity that is private to, and thus unlikely to be shared between, each population, we identified variant positions in the genome whereby the variant frequency was enriched in one population but not the other two populations. Of the ~7.6 million biallelic SNPs distributed in the genomes of the samples analysed, ~514 k were private to MHco3(ISE) ( Figure S2A ), ~960 k to MHco10(CAVR) ( Figure S2B ) and ~685 k to MHco4(WRS) ( Figure S2C ). Each strain contained local regions of high diversity that differentiated it from the others.
We also explored larger structural variation within each of the strains, including putative deletions, duplications and inversions detected by the presence of discordant and split read pairs among the perfectly paired reads ( Figure S3 ). We note that structural variant calling in pooled individual, short read data will be insensitive and subject to false positive calls due to mismapping. However, even a conservative approach examining only high quality homozygous variants revealed Genetic diversity between each of the three strains was assessed by pairwise analysis (measured by F ST of single nucleotide polymorphisms [SNPs] in 10 kb windows), which confirmed significant genome-wide strain differentiation ( Figure   2D ). Most of this genetic differentiation reflects underlying genetic structure between strains, due to a degree of reproductive isolation due to their geographic distribution ( Figure 2B ). Collectively, these data emphasise the challenge of characterising genetic variation associated with phenotypes such as drug resistance by simply comparing genetic diversity between a susceptible and resistant strain without accounting for this background genetic variation. Institute, UK -while MHco3(ISE) has been maintained for decades, it was originally isolated in East Africa. C. Within-strain nucleotide diversity for each of the parental strains, calculated as mean diversity per 100 kb windows throughout the genome using npstat . D. Between strain diversity, calculated as pairwise F ST in 10 kb windows throughout the genome using popoolation2 . Colours here and throughout represent chromosomes as described in A.

Genome-wide analysis of genetic diversity reveals the same single large introgressed region in both backcross lines
A backcross has previously been performed to introgress resistance-conferring alleles from the MHco10(CAVR) and MHco4(WRS) strains into the MHco3(ISE) susceptible genetic background (Redman et al., 2012) . In the present study, we used whole genome pairwise comparisons of the genetic diversity throughout the backcross to assess the degree of introgression and to detect evidence of selection associated with resistance. Each pairwise comparison was between particular generations of each backcross and the MHco3(ISE) susceptible parent. We  Figure S4 A,B ), which was highly correlated between the two crosses ( Figure   S4C ).

Figure 3. Genome-wide analysis of ivermectin associated loci. A.
Pairwise genetic diversity throughout the MHco3/10 (left plots) and MHco3/4 (right plots) backcrosses. In both crosses and at each point in the backcross, the experimental strain was backcrossed against the MHco3(ISE) parental strain -the data is therefore presented to compare the genetic diversity (F ST measure in 10 kb windows using popoolation2 ) between MHco3(ISE) and each sampled time point in the cross as per the cross schema in Figure 1. B . Output of the single-locus evolutionary model, describing sites which were inconsistent with a model of neutral evolution, measured using a likelihood threshold. Note that the X chromosome is not represented. Colours represent chromosomes as described in Figure 2A.    Collectively, these data suggest that there is a single introgression region on chromosome V that is under selection upon drug exposure, and that this region is under selection in the two independent serial backcrosses performed. This conclusion was supported by an independent evolutionary analysis, which compared the observed frequencies of variants associated with the resistant parental population with the distribution of allele frequencies expected under the assumption of selective neutrality ( Figure S6 ). An over-representation of alleles with atypical frequencies further suggests that a single region in the parasite genome was selected towards the right hand end of chromosome V. In this region, the allele frequencies were strongly inconsistent with the neutral expectation, suggesting that they had potentially evolved under the influence of selection ( Figure 3B

Detailed analysis of introgression region on chromosome 5
To more precisely characterise the introgression region, we focus on chromosome V ( Figure 4 ). A comparison of genetic diversity between MHco3(ISE) and the end-point of both crosses (BC 4 .IVM.P 4 ) revealed a strikingly similar pattern between the crosses, with a major region of differentiation between 38-42 Mb, and a second but lesser increase in genetic differentiation between 45-48 Mb that is most prominent in the MHco3/10 ( Figure 4A ). Encouragingly, the major region of introgression lies immediately adjacent to the Hcms8a20 microsatellite marker (located at position 36.16 Mb along chromosome V) that was shown to be genetically linked to ivermectin resistance in the preliminary analysis of these backcrosses (Redman et al., 2012) .
The F ST data are supported by a comparison of Tajima's D throughout this region  MHco3/4.BC 4 .IVM.P 4 ; yellow) of the crosses. Tajima's D was calculated using npstat in 100 kb windows spanning the genome. The variance in the mean value of Tajima's D among MHco3(ISE) and passages 3 and 4 samples is presented as smoothed line (red). The null expectation is that variance between these samples will be low in regions of the genome not under selection. C . Inferred likelihoods from the multi-locus population genetic model. The mean likelihood of the best fitting single-locus model at each locus position is shown by the solid black line; the light blue interval around this line shows the 5-95% confidence interval for this statistic calculated by a bootstrapping process. The position of the maximum likelihood value of the likelihood is shown by the vertical black dotted line; a confidence interval for this position, calculated from the bootstrapping values, is shaded in gray. The mean likelihood of the best fitting two-locus model at including each locus position is shown by the black dashed line.     In order to investigate whether one or more than one selected allele caused the observed evolutionary signal, a multi-locus population genetic model was applied. In    removing the constraint that selected alleles be separated would guarantee a likelihood at least equal to that of the single-driver model). In conclusion, due to the limited extent of recombination in the experiment, we could not rule out the presence of more than one drug-resistance variant in the identified region of chromosome V, but we did not find evidence in favour of a second selected allele.
Many candidate genes have been proposed in the literature as being in association with ivermectin resistance. We curated a list of genes that have been explored in H.
contortus , and/or had been shown to confer ivermectin resistance when mutated in C. elegans . The current genome assembly is not annotated, so we determined their location, either via mapping the published gene models from the V1 annotation, or determining the closest H. contortus orthologous gene from C. elegans candidates using Wormbase Parasite (Howe et al., 2017) ( Table S2; Figure S9 ). At least one candidate gene was located in each chromosome; in chromosome V, the location of six candidate genes were determined. However, none of these genes were found in the main introgression region defined by the F ST analysis ( Figure 4 A ; vertical annotated lines). Three genes -Hco-lgc-55, Hco-avr-15, and Hco-pgp-1(9) , the latter two which have a strong association with ivermectin resistance -lie to the telomere-side of the introgression region, however, they are located on the periphery of the F ST peaks rather than within them, suggesting that they are unlikely candidates to be under direct selection. Considering the absence of candidate resistance genes in the F ST peaks, we conclude that the driver(s) of ivermectin resistance in the MHco4(CAVR) and MHco4(WRS) strains are novel and have not been previously described in association with ivermectin resistance.

Strength of selection
We further used the single locus model to infer the strength and manner of selection in favour of the drug resistance allele. Data in each case indicated strong selection for the resistant allele ( Figure 6 ), such that susceptible worms produce multiple times fewer offspring than the resistant worms under drug treatment. In each case the value of the dominance coefficient, h, was less than one, indicating either an additive effect for the MHco10(CAVR) case, whereby each copy of the resistance allele contributes an equal amount of resistance, or of weak dominance for the MHco4(WRS) data, whereby the first resistance allele contributes slightly more than the second. This is consistent with the phenotypic characterisation of the initial crosses, whereby the F 1 individuals that would have been heterozygous for resistant alleles were partially resistant to ivermectin (Redman et al., 2012) . As such, having two copies of the resistant allele had a greater phenotypic effect than one; these data therefore have implications for the mechanism of the drug resistance allele, and evolution of drug resistance in general terms, as discussed further below. Although there has been much discussion regarding the likely multigenic basis of ivermectin resistance, these results strongly suggest a single locus (or potentially multiple closely linked loci) is likely to be the major effector of ivermectin resistance in these two strains. We acknowledge that we cannot discount previously described candidate genes as mediators of resistance in other isolates of H. contortus , or in other nematode species. Moreover, our data does not exclude a driver of transcriptional regulation within the introgression region that is under selection, which itself influences expression of other genes, including candidate genes, outside of the introgression region. However, most of the leading published candidate ivermectin resistance genes including Hco-glc-5 , Hco-avr-14 , Hco-pgp-9 , Hco-lgc-37 , Hco-pgp-2 and Hco-dyf-7 were not located in or near the region of introgression, which is consistent with a recent study using targeted sequencing of these genes in which none of them showed a signal of introgression in these two backcross experiments (Rezansoff 2016). This suggests that none of these candidate genes are a major ivermectin resistance-conferring locus in the MHco10(CAVR) and MHco4(WRS) strains. Although we identified three other previously described candidate genes -Hco -lgc-55 , Hco-avr-15 , and Hco-pgp-1 -adjacent to the introgression region, none were found within the peak of the region suggesting that these also are unlikely to be major direct targets of ivermectin-mediated selection.
What might account for the large number of candidate genes previously suggested to be associated with ivermectin resistance? The answer might lie in our direct comparison of genetic diversity between the parental strains ( Figure 2D ). This visualisation of the extent of genetic divergence between these strains makes it easier to understand how particular sequence polymorphisms might be naively attributed to being associated with resistance when in fact they simply represent one of many strain specific genetic variations that occur as a result of the independent evolutionary histories and lack of interbreeding between the strains. Our results highlight the challenge of interpreting simple direct comparison of strains, even with genomic approaches, when trying to disentangle those genetic polymorphisms underlying resistance from a complex background of genetic variation independent of a strains resistance status. The use of controlled genetic crosses as presented here and elsewhere (Choi et al., 2017;Redman et al., 2012;Rezansoff et al., 2016) , whereby population genetic structure can be explicitly accounted for in the analysis, provides a powerful way to mitigate the challenge of discriminating resistance-causing alleles from background genetic variation. As such, the locus we have identified on chromosome V represents the only unequivocal major ivermectin resistance locus identified in any parasitic nematode species to date.

Population genetic and evolutionary modelling defined the boundaries of the ivermectin resistance conferring locus
Inference of selection from the multi-locus model suggested the presence of stronger selection for the drug resistance allele in the MHco3/10 cross than in the MHco3/4 cross ( Figure 6 ). Additionally, although the data suggests that resistance is at least additive in both crosses, i.e., heterozygotes are likely to confer some resistance, but  Figure S5) . This implies that the data from the experiment itself should be understood as the output of a stochastic process; beyond the clear large-scale patterns observable in the data and detailed above, more minor details of the output might not be seen again were the experiment to be repeated.
The structure of the experiment thus imposes a limit on our ability to infer the location of a selected allele, and an improved characterisation of selection would likely best be achieved through the use of alternative experimental procedures.
Conducting further generations of cross as performed within this experimental framework would induce more recombination events, reducing the mean size of parental genomic blocks. However, such a course of action would be limited in scope were it not accompanied by the use of larger L 4 populations, for example by the simultaneous passage of the population through multiple animals in parallel. Clearly, a larger or longer experiment would have cost and welfare implications: the statistical framework we have developed would help to design and justify such an approach.

Importance of chromosome-level genome assemblies for genetic mapping and population genomics
The success of identifying a single region of introgression was dependent upon a greatly improved and highly contiguous, chromosomal-scale reference genome assembly for the H. contortus isolate, MHco3(ISE).N1. This isolate was used in the original draft assembly of this species (Laing et al., 2013) , which was derived via inbreeding of the same MHco3(ISE) strain used in the backcrosses presented. In the absence of a contiguous chromosomal scale genome assembly, interpretation of these analyses can be extremely challenging; true genetic signal(s) can be obscured by technical artifacts associated with a fragmented assembly that can include short contigs lacking spatial orientation, multiple haplotypes present, collapsed paralogs, and poor resolution of repeat structures. As a consequence, read mapping and variant calling in fragmented genomes can be suboptimal. These technical challenges are exacerbated in highly polymorphic species such as H. contortus and Teladorsagia circumcincta (Choi et al., 2017) , in which reference genomes were constructed from pools of individuals that, despite inbreeding, still contained significant genetic variation. Perhaps the biggest problem with using a fragmented genome assembly for genome-wide analyses is the inability to determine linkage.
Signals of selection will often be dispersed across multiple scaffolds of a fragmented draft genome assembly when in reality they are co-located in a single genomic region. This can potentially can give rise to the erroneous conclusion that multiple independent loci show a signal of selection when it may be that only a single selected locus exists. For example, a major effect single locus that has a physically extensive signature of selection due to a hard selective sweep, and the associated   Figure 2A.
Our comprehensive genome-wide analysis of parental strain genetic diversity re-emphasises that H. contortus is highly genetically diverse both within and between strains. It is clear, however, that short read mapping-based analyses that aim to exploit a single reference genome are almost certain to underestimate the extent of genetic variation that distinguishes a divergent strain. We have exploited linked selection of a subset of this variation, i.e., only single nucleotide polymorphisms, to characterise the introgression event in these analyses; however, understanding the functional consequences of this strain specific diversity will rely on a more comprehensive description of all of the variation that defines a strain. As an alternative to a linear reference genome, the use of population genome graphs -a non-linear or branching reference that contains alternate paths representing known genetic variation -may be more suitable (Paten et al., 2017) , and allow better characterisation of known variation that is too complex to be analysed using a linear reference (Maciuca et al., 2016) . This could be made possible once a comprehensive analysis of population and perhaps global genetic diversity reference set of genetic variation is made available (Sallé et al. , in preparation). Alternatively, such variation may require geographically diverse isolates to each have a de novo reference sequence, ideally sequenced and assembled using long-read sequencing technology, to allow a more comprehensive description of genetic variation within a population while allowing large scale variation between reference strains to be characterised.

Conclusion
Our genome-wide analysis of two genetic crosses between ivermectin resistant and ivermectin sensitive isolates has identified a major ivermectin resistance locus in two genetically and geographically distinct strains of Haemonchus contortus . This result depended on the development of novel population genetic methods to understand the genetics of the crossing procedure, and on the availability of a chromosome-scale genome assembly. More traditional population genetic analysis highlights the high level of variation both between and within the parental populations used to construct the cross, potentially explaining the difficulty in validating previous candidate genes that were identified using approaches that neither experimentally nor statistically controlled for this genetic diversity. This work represents the most comprehensive analysis of the genetics of anthelmintic resistance in a parasitic nematode to date and demonstrates the power of genetic crossing and contiguous genome assembly to eliminate false positive genetic signals typically linked to resistance. Our results eliminate the possibility that many of the previously proposed candidate genes are involved in ivermectin resistance in these isolates, and should focus future efforts on identifying the causal variant within the region we identify.

Code availability
Code used in this project is available from https://github.com/cjri/HCCross Sallé for constructive feedback on the manuscript.

Competing interests
The authors declare no competing interests.

Background of strains and original backcross
The MHco3(ISE), MHco4(WRS) and MHco10(CAVR) are H. contortus strains maintained and stored at the Moredun Institute. The MHco3(ISE) strain was originally derived by multiple rounds of inbreeding of the SE strain (Roos et al. 2004) .
The precise provenance of the SE strain is not clearly recorded but is thought to be originally obtained from East Africa. MHco10 ( The experiment described here extends two previously-described backcross experiments. The construction, phenotypic validation, and initial microsatellite analysis of these backcrosses has been described previously (Redman et al., 2012) , and is outlined in Figure 1A  (0.85% NaCl) after which worm sex was determined (Denham, 1969) , followed by surgical transfer of 45-100 male MHco3(ISE) and 50-100 female F 1 L 4 /immature adult worms into worm-free recipient lambs all within 2 h of the original collection.
This process of backcrossing was repeated for a total of 4 generations, i.e., MHco3/10.BC 4 and MHco3/4.BC 4 ( Figure 1B ). In vivo ivermectin selection was applied after mating and before collection of eggs to enrich for ivermectin resistant adults to be used in the subsequent round of the backcross. A controlled efficacy test was performed on the three adult strains and MHco3/4.BC 4 and MHco3/10.BC 4 strains to determine the initial levels of ivermectin efficacy, and the degree of resistance acquired in the introgressed lines. After four rounds of backcrossing, a selection experiment was performed. MHco3/10.BC 4 and MHco3/4.BC 4 strains were used to infect recipient sheep, after which eggs were collected and in vitro culture to L 3 . These infective larvae were used to infect two sheep per strain, one that received 0.1 mg/kg ivermectin and one that remained drug naive. At 7 days post treatment, treated and naive L 4 /immature adults were recovered by necropy and stored for molecular analysis.
The backcrosses described above were extended by performing a further four rounds of in vivo ivermectin selection, during which mating within the BC 4 population continued ( Figure 1C ). For each cross, progeny of the BC 4 generation were cultured to L 3 and used to infect a donor sheep, which was subsequently treated with ivermectin (0.1 mg/kg in round 1, followed by 0.2 mg/kg in following rounds). Eggs from ivermectin-treated survivor adults were collected post-treatment, cultured to L 3 , and used to infect a new donor sheep. L 4 /immature adults were collected by necropsy after rounds three (i.e., BC 4 .IVM.P 3 ) and four (i.e., BC 4 .IVM.P 4 ) of selection and stored for molecular analysis.

Library preparation and sequencing
Whole genome sequencing was performed on the three parental strains  Table S1 . We generated approximately 6.14×10 11 bp of sequence data, which equates to approximately 199.65× unmapped genome coverage per sample. Raw sequence data quality was analysed using FASTQC ( http://www.bioinformatics.babraham.ac.uk/projects/fastqc/ ) and was assessed using MultiQC (Ewels et al., 2016) prior to further processing.
The raw sequencing data generated during the current study are available in the European Nucleotide Archive repository (http://www.ebi.ac.uk/ena/) under the study accession number PRJEB2353 ( Table S1 ).
Genome-wide variants were determined for each sample using samtools-1.3 mpileup ( -F 0.25 -d 500 ). Nucleotide diversity and Tajima's D was determined using npstat (Ferretti et al., 2013) , which required pileup files (which we derived from the mpileup generated above) that were split per sample per chromosome as input. We determined short-range linkage disequilibrium between pairs of variants in single and paired reads using LDx (Feder et al., 2012) . We compared these data with estimates of LD decay over genetic distance as proposed by Hill and Weir (Feder et al., 2012;Hill and Weir, 1988) . P opoolation2 (Kofler et al., 2011)  Copy number variation between parental strains was determined using cnv-seq (Xie and Tammi, 2009) . Best read mapping hits were determined per bam ( samtools-1.3

view -F 4 bam | perl -lane 'print "$F[2]\t$F[3]"' > bam.hits ), before read counts for
pairwise comparisons determined in 10 kb sliding windows using cnv-seq.pl . The R package cnv (v 0.2-8) was used to determine pairwise chromosome-wide normalised log 2 ratios. Characterisation of structural variation in the parental strains was performed using the speedseq sv pipeline (Chiang et al., 2015) to identify putative duplications, deletions and inversions in each strain. This approach exploits all reads mapped using BWA-MEM, including split and discordant read pairs, which are subsequently scored using LUMPY and SVTyper. Conservative filtering was applied to retain only homozygous variants (1/1) with a minimum quality score of 100.

Population genetic modelling
Bespoke single-locus and multi-locus models were used for population genetic inference from the data. Common to both methods, variants in the genome potentially associated with resistance were identified. At each locus in the genome, a nucleotide {A,C,G,T} was defined as existing in the susceptible parental population if it was observed at a frequency of at least 1% in that population. Loci in the resistant parental population were then identified for which exactly one nucleotide that did not exist in the susceptible population was observed at a frequency of at least 1%; without prejudice as to its phenotypic effect, this was denoted the 'resistant' allele at that locus, being associated with the resistant parental population.

Single-locus model
At the stages in the experiment where the population was comprised of eggs, we assumed to be large, such that the processes of crossing and backcrossing could N be described deterministically. Under these circumstances the genotype frequencies following a cross, denoted , , and , are given by: while the genotype frequencies following a backcross with the homozygous susceptible strain are given by: Evaluating this process gives a probability distribution for the frequency of the resistant allele at any given point in the experiment, dependent upon the number of resistant alleles in the initial resistant population. This number of resistant alleles, which we denote , is unknown; sequence data from the resistant parental strain n r were used to generate a prior distribution for this value.
We first suppose that the frequency of the resistant allele in the resistant parental population is equal to some value, . Given that the initial experimental population p r contains 50 r esistant worms, with 100 alleles, collected from this larger population, the distribution of is then given by: We now suppose that sequencing the resistant parental population gave resistant a r alleles at the locus in question out of a total read depth of . The initial frequency A r is unknown; however using the data an estimate can be made for this statistic, p r expressed as a distribution of the allele frequency. Specifically, under the assumption of a uniform prior, the underlying probability can be said to be p r distributed as a beta distribution with parameters and . We therefore a r 1 A r − a r + obtain the following distribution for the statistic : The extent to which observed allele frequencies were consistent with the neutral model was calculated using a likelihood model: where at time , resistant alleles from a read depth of were observed. A low t a t A t likelihood indicates deviation from the neutral expectation; Bonferroni correction L t was used to identify significance at the 95% level. Data from each sample were analysed; four-fold non-neutral sites, being loci for which a significant likelihood was calculated from all four samples collected throughout each cross experiment, were identified.

Multi-locus model
A multi-locus model was developed to describe the manner in which allele frequencies would be expected to change over the course of the experiment; this model exploits information about the location of variant loci provided in the H. contortus reference genome, and the genome-wide map of recombination rate in the worms (Doyle et al., 2018) ; this map was inferred by characterising recombination breakpoints in F 1 L 3 progeny by whole genome sequencing, from which recombination rates were determined by comparing genetic distances between SNPs against their physical location in the genome. Where fixed genetic differences occur between parental strains, the dynamics of adaptation in the resulting cross are relatively straightforward (Illingworth and Mustonen, 2013;Illingworth et al., 2012) ; this has been exploited to identify quantitative trait loci in yeast and malaria cross populations (Abkallo et al., 2017;Parts et al., 2011) . Here a heuristic approach was used to identify fixed genetic differences between the parental strains before modelling evolution at these loci to identify the location of alleles conveying drug resistance.
Three filters were used to identify putatively fixed differences between parental strains. Firstly, we identified genomic sites for which the frequency of the resistant allele was 95% or greater in the resistant parent, requiring a read depth of at least 50x coverage for such sites to achieve a good level of statistical certainty. Secondly, we noted that, following the backcross performed at the start of the experiment, no locus can be homozygous for the resistant allele and as such, the resistant allele frequency can be no greater than 50% in the population; by way of reducing noise, we removed any locus with resistant alleles at a frequency of 60% or greater in the MHco3/10.BC 4 .noIVM sequence data. Thirdly, we noted that at homozygous separating sites in a population following a cross, the allele frequency will change smoothly over time; where there are genomes in the population, the allele N frequency, considered as a function of chromosome position, will change by /N 1 wherever a crossover recombination event occurs within a genome. A diffusion model of allele frequency change, described in a previous publication (Abkallo et al., 2017) , was used to identify allele frequencies across the genome that were consistent with this pattern. This analysis fits a posterior distribution to the allele frequencies across the genome, also inferring the extent of noise in the sequence data via a beta-binomial model intrinsic to the fitting process. Conservatively, sites no more than eight standard deviations from the mean of the posterior frequencies were included in the analysis. These approaches resulted in data from a total of 1 w 0 = where is the selection coefficient (fitness advantage of the homozygous resistant s compared to homozygous susceptible) and is the dominance coefficient h (proportion of fitness change conveyed by a single copy of the allele). We assumed that all worms have equal fitness in the absence of the drug. The likelihood of any given inference was calculated using the beta-binomial likelihood function obtained from the diffusion model above. Accounting for the stochasticity of the system, a set of at least 250 simulations were generated for each set of parameters, calculating the mean likelihood fit to the data across the simulations. A bootstrapping process was applied to quantify the uncertainty in this likelihood.
In an extension to this model, a two-driver was considered, in which the resistant variant at each of two loci was under selection; to allow the exploration of model space in this case within reasonable computational time the assumption was made of additive selection at each locus. To distinguish this from the previous model, a requirement was imposed that the selected alleles be separated by at least 2 Mb in the genome. -See accompanying xlsx document Figure S1. Characterisation of within-strain diversity. A. Within strain nucleotide diversity per chromosome, summarising genome-wide data presented in Figure 2C. Colours represent chromosomes as described in Figure 2A. B . Linkage disequilibrium between variants present in paired reads was estimated using LDx for each parental strain. Line represents expected LD decay over genetic distance (Feder et al., 2012;Hill and Weir, 1988) .     Figure S6. Location of significantly non-neutral loci identified using the single-locus population genetic model. Corresponding peaks in the location of significant sites can be seen in the MHco3/10 ( A ) and MHco3/4 datasets ( B ). A total of 70.6% of significant sites in the MHco3/10 dataset, and 90.6% of significant sites in the MHco3/4 dataset, were found in chromosome V. Data are binned in 1 Mb windows spanning the genome. Colours represent chromosomes as described in Figure 2A.    Table S2. Colours represent chromosomes as described in Figure 2A.