Mapping QTL influencing gastrointestinal nematode burden in Dutch Holstein-Friesian dairy cattle

Background Parasitic gastroenteritis caused by nematodes is only second to mastitis in terms of health costs to dairy farmers in developed countries. Sustainable control strategies complementing anthelmintics are desired, including selective breeding for enhanced resistance. Results and Conclusion To quantify and characterize the genetic contribution to variation in resistance to gastro-intestinal parasites, we measured the heritability of faecal egg and larval counts in the Dutch Holstein-Friesian dairy cattle population. The heritability of faecal egg counts ranged from 7 to 21% and was generally higher than for larval counts. We performed a whole genome scan in 12 paternal half-daughter groups for a total of 768 cows, corresponding to the ~10% most and least infected daughters within each family (selective genotyping). Two genome-wide significant QTL were identified in an across-family analysis, respectively on chromosomes 9 and 19, coinciding with previous findings in orthologous chromosomal regions in sheep. We identified six more suggestive QTL by within-family analysis. An additional 73 informative SNPs were genotyped on chromosome 19 and the ensuing high density map used in a variance component approach to simultaneously exploit linkage and linkage disequilibrium in an initial inconclusive attempt to refine the QTL map position.


Background
Parasitic gastroenteritis (PGE) caused by trichostrongylids and strongylids remains an important issue for cattle husbandry world-wide including in developed countries. Treatment and prophylaxis relies to a large extent on the use of broad spectrum anthelmintics combined with proper management practices. Although these measures are rather effective, nematode infestation is only second to mastitis in terms of health costs to dairy farmers, estimated at 90 million € annually for the Netherlands only. These costs result not only from the use of anthelmintics but also from the production losses due to sub-clinical infestation (e.g. [1]).
Despite their efficacy, the systematic and extensive use of anthelmintics causes concerns with regards to (i) the negative effect on the development of natural immunity, (ii) consumer concerns regarding drug residues in food products and the environment, and (iii) the increasing incidence of parasite resistance against available anthelmintics. Complementary control strategies are thus desirable and a number of approaches are being explored including the use of nematophagous fungi, tannins, immunonutrition, vaccination (e.g. [2,3]), as well as selective breeding for enhanced resistance or resilience (e.g. [4]).
Identifying QTL and gene variants influencing parasite burden paves the way towards marker assisted selection (MAS) for increased resistance. MAS may be particularly effective for this trait as it has relatively modest heritability and is tedious to measure. Parasite burden is typically "overdispersed" with most animals being virtually devoid of parasites and a few being heavily infected and responsible for most of the infestation pressure. Identifying such "shedders" (contributing most to pasture contamination) based on their inherited predisposition, might be an effective way to reduce overall parasite transmission.
We herein describe a whole genome scan to map QTL influencing gastro-intestinal nematode burden in a Holstein-Friesian "daughter design" [32] in which we exploit selective genotyping (e.g. [33]). We subsequently combine linkage and linkage disequilibrium mapping (e.g. [34][35][36]) in an attempt to refine the map position of one of the QTL identified in the genome scan.

Estimating the heritability of gastrointestinal nematode burden in the Dutch Holstein-Friesian dairy cattle population
To estimate the heritability of gastrointestinal nematode burden in Dutch dairy cattle, we collected faeces from 1,420 cows between June and August 2000. Selected animals were between two and six years of age, more than two months away from the last and next calving date to avoid variation associated with peri-parturient relaxation in immunity, and grazing as infestation is known to be pasture-borne [37,38]. They originated from 605 herds. Nematode eggs per gram of faeces (EPG) were counted as described [39]. To estimate the parasite burden by species, we performed coprocultures and counted the number of genus or species-specific (Bunostomum spp, Cooperia oncophora, Cooperia punctata, Haemonchus contortus, Oesophagostomum spp, Ostertagia ostertagi, Trichostrongylus spp) larvae per gram of faeces (LPG) as described [40]. Fig.  1 shows the frequency distribution of EPG and LPG in this data set.

A whole genome scan using selective genotyping identifies two QTL influencing nematode burden in dairy cattle
To map QTL influencing nematode burden in dairy cattle, we collected faeces and venous blood from 4,053 Dutch Holstein-Friesian dairy cows between July and September 2002. The cows were selected on age (2 nd eggs per gram of faeces (EPG) were determined for each animal using the rapid variant of a high-throughput isolation procedure described in Mes et al. [39], which is described in more detail in the Materials and Methods.
Resulting EPG were log-transformed (LEPG), and breeding values (BV LEPG ) computed for each cow using the same mixed model as in the previous analysis augmented with a regression on the number of "days in milk" (e.g [41]). Variance components were estimated using AI-REML [42]. LEPG decreased very significantly with increasing parity (parity 2: 0.12; 3: 0.04; 4: 0.00; p = 1.7 × 10 -11 ), and by 0.0006 per day in milk (p = 0.003). The heritability of LEPG in this data set was only 0.074 (standard error 0.027), thus considerably lower than the previous estimate. The herd effect accounted for 22.6% of the variance (standard error 0.013). Half-sib ANOVA analysis of LEPG using a sire model highlighted the extreme significance of the sire effect (p < 4 × 10 -14 ), but led to a comparably low h 2 estimate of 0.09 assuming that the correlation between half-sibs ("intraclass correlation") equals h 2 /4 (e.g. [41]). Fig. 2 shows the distribution of EPG and BV LEPG values across the entire data set, as well as the mean and range of BV LEPG values for each of the twelve sire families. As EPG were measured using distinct procedures in the 2000 and 2002 campaigns, we did not perform a joint analysis of both data sets.
Within each sire family, we selected the top and bottom 10% of the daughters on BV LEPG for a total of 768 (2 × 384) cows (Fig. 2C). Selected daughters and their sires were genotyped for a panel of 153 microsatellite markers spanning the bovine autosomes. The markers were chosen based on their chromosomal location and heterozygosity in the 12 sires. We constructed marker maps using CRIMAP [43]. Marker order was in perfect agreement with published maps (e.g. [44]). The maps spanned a combined 28.79 Morgan (Kosambi), in good agreement with the 31.59 Morgan of the Ihara et al. [44] male map length. The average distance between adjacent markers was thus 22.85 cM (Kosambi). Fig. 3A shows the information content [45] of the corresponding marker map. QTL analysis was performed using a previously described non parametric rank-based method for outbred half-sib pedigrees that is particularly suited for phenotypes that are not normally distributed including parasite counts [45]. Significance thresholds were determined by permutation [46]. QTL analysis was performed both for BV LEPG and for EPG. Fig. 3B shows the results obtained by acrossfamily analysis in this first pass genome scan. The suggestive threshold (log 10 (1/p) = 0.2), i.e. the threshold expected to occur on average once per genome scan under the null hypothesis [47], was exceeded on four chromosomes: BTA7, 9, 14 and 19.
For these chromosomes, we increased the microsatellite density from six to 17 (BTA7), seven to 26 (BTA9), six to 18 (BTA14) and seven to 29 (BTA19). Maps were re-constructed using CRIMAP and their information content shown to increase by 23% on average (Fig. 3A). We repeated the whole genome QTL scan including the genome-wide permutations test. While the statistical evidence supporting the QTL did not increase for chromosomes 7 and 14, on the contrary, the signal intensity increased quite dramatically for chromosomes 9 and 19, reaching genome-wide significance for BTA9 (p = 0.025) and near genome-wide significance for BTA19 (p = 0.08) (Fig. 3C). Note that the associated nominal p-values are respectively 6 × 10 -4 and 1.8 × 10 -3 . Fig. 4A and 4B show the detailed location scores obtained on BTA9 and BTA19 including the confidence intervals for the QTL location determined by bootstrapping [48,49]. Within family analysis indicates that at least two of the 12 sires are likely to be heterozygous for the BTA9 QTL and three for the BTA19 QTL ( Fig. 4A and 4B).
Further examination of the within-family results across the entire genome revealed an additional six chromosomal locations with genome-wide suggestive evidence for QTL influencing nematode burden ( Table 2). Note that the corresponding threshold does not account for the analysis of multiple (albeit correlated) phenotypes (two) and multiple families (12 families).

Attempting to refine the map position of the BTA19 QTL using combined linkage and linkage disequilibrium (LD) analysis
To refine the map position of the BTA19 QTL, two SNPlex R sets (Applied Biosystems, Foster City) were used to genotype the 12 sires and their selected daughters for an additional 96 SNPs. The corresponding BTA19 SNPs were selected from the available public domain SNPs http:// www.hgsc.bcm.tmc.edu/ on the basis of their predicted chromosomal location, detection in Holstein-Friesian sequence traces, and prior information about minor allele frequency (MAF). 73 of the 96 SNPs proved to be polymorphic in our pedigree material. The distribution of MAFs in the Dutch Holstein-Friesian population is shown in the additional file 1. The 73 polymorphic SNPs were included in the BTA19 map using CRIMAP [43]. Markers that could not be positioned by linkage analysis were placed according to their location on the bovine genomic sequence (Build 4.0). The obtained marker map is shown in Fig. 5B with additional details provided in additional file 2. It comprises a total of 101 markers with an average distance between adjacent markers of 1.7 cM. We measured the level of LD for all marker pairs using both D' and r 2 . The resulting LD map is shown in Fig. 5A. This higher density BTA19 map was used in a variance component analysis that simultaneously extracts information from linkage as well as LD (e.g. [34][35][36]). The phenotype considered was LEPG, corrected for herd effect, parity effect and effect of days in milk as estimated from the complete data set (4,053 cows). The corresponding mixed model included an individual genotype effect, an individual animal effect, and a random error. The obtained results are shown in Fig. 5B.
Exploiting only the linkage signal provided by the paternal chromosomes (i.e. considering that the 24 paternal chromosomes and 768 maternal chromosomes are unrelated, and letting the covariance between the genotype effects of half-sisters only depend on the sharing of paternal chromosomes) in essence recapitulates the results obtained with the rank-based QTL mapping using the lower density marker map (Fig. 4B). The most likely position of the QTL is at position 54.48 cM (z = 2.1) coinciding with the URB026-TGLMA51 interval. Extracting LD information from the 24 paternal chromosomes yielded a maximum lod score of 2.7 (i.e. an increase of 0.6 units) at map position 54.51 cM, corresponding to the rs29027286-rs29021779 interval. The corresponding nominal p-value is 0.0025. Marked drops in the lod score profiles on either side of the peak defined the rs29027283 -rs29013747 interval selected for examination of positional candidates (Additional file 3).
Including LD information from the maternal chromosomes resulted in a decrease of the lod scores in the area corresponding to the linkage peak. Thus the maternal chromosomes did not provide LD information that would independently corroborate the linkage signal, on the contrary (Fig. 5B). One possible explanation for this is that the QTL effect estimated from the segregation of the paternal chromosomes was likely inflated.
The same combined linkage and LD analysis was also applied to the available BTA9 data (i.e. the marker density was not increased), but did not improve either significance or mapping precision (data not shown).

Discussion and Conlusion
In this work we have first estimated the heritability of gastrointestinal nematode burden in the Dutch Holstein-Friesian dairy cattle population. Parasite load was quantified using faecal egg counts (EPG) in animals of at least two years of age. In a first data set, h 2 estimates for EPG obtained were relatively high, of the order of 0.20. Stratifying the parasites by species using faecal larval counts did not increase h 2 , with the noticeable exception for Haemonchus sp. Although Haemonchus sp. can infect cattle (e.g., [50]), this high h 2 is puzzling, because until now evidence for a genetic component of host resistance to this ovine parasite has not been found. In general, herd effects were of the same magnitude as corresponding h 2 , including for Haemonchus sp larval counts. The significant herd effect in case of Haemonchus sp can in part be explained by the fact that this species (H. contortus) only occurs in cattle if sheep are present on the farm. In a second data set, however, h 2 for EPG, estimated using both an animal model and a half-sib design (e.g. [41]), were less than 0.10. This drop may be due to the use of a faster but less sensitive egg counting procedure in this cohort.
We then conducted a whole genome scan to identify QTL affecting faecal egg count. The study was certainly amongst the largest conducted to date as it involved the phenotyping of over 4,000 animals and genotyping of more than 750 within-family extremes. It is worthwhile noting, however, that other QTL mapping studies targeting parasite resistance typically used experimentally challenged animals and/or relied on repeated phenotypic measurements of, for instance, faecal egg counts, which is expected to increase the heritability of the analyzed traits (e.g. [27][28][29]). The genome scan revealed two chromosome segments with strong evidence for QTL, respectively on BTA9 and BTA19. Although these findings certainly await independent confirmation in cattle, it is noteworthy that in sheep, Crawford et al. [29] detected suggestive QTL influencing faecal egg counts at the orthologous positions on OAR8 (BTA9) and OAR11 (BTA19), while Beh et al. [27] detected a suggestive QTL for faecal egg count at the (A) Pairwise LD between BTA19 markers measured as D' (lower left) and as r 2 (upper right) orthologous OAR11 position. These coincident findings considerably strengthen the support for the detected QTL.
In addition we report six loci with suggestive within-family evidence for QTL affecting faecal egg count. This information might be useful for comparison with results of future studies. Note that because our experimental design was primarily based on linkage analysis in paternal halfsib families, we have not included markers on the X chromosome in this study [51].
We have attempted to refine the map position of the BTA19 QTL using a denser map and by exploiting LD. This effort did not result in substantial gains in terms of signal strength and resolution. This is possibly due to the fact that the marker density was still insufficient to effectively capture an LD signal (Fig. 5A). In particular, this might be the case if the segregating QTL alleles are old and have therefore had ample opportunity to recombine with flanking markers. Alternatively the QTL may be characterized by multiple, individually rare causative alleles that would be more difficult to detect by association mapping.
Genome-wide SNP arrays with a marker density that is at least one order of magnitude higher than what was used in the present study (e.g. [52]) are becoming available in livestock including cattle. It may be worthwhile to genotype the pedigree material used in the present study and attempt to find additional QTL by performing genomewide association studies. One advantage of associationbased designs is that they extract information from the maternal chromosomes whereas the daughter design [32] only uses information from the paternal chromosomes albeit in a very robust manner. LD-based approaches thus have the potential to double the amount of usable information.
The strongest signal for the BTA19 QTL was obtained in an interval flanked by markers rs29027283 and rs29013747. The corresponding chromosome segment measures ~3.3 Mb in the bovine (Btau4.0). Ninety-five genes are annotated to the orthologous genome segment in human (Additional file 3). At least one of these is worth mentioning: ITGAE. This gene codes for an integrin alpha chain that is preferentially expressed on the surface of intestinal intraepithelial T lymphocytes, which are thought to be important for immune surveillance and immune responses to mucosal pathogens [53][54][55].
It is noteworthy that BTA19 harbours several QTL influencing a range of phenotypes of economic importance in dairy cattle, including milk yield and composition and fertility (Coppieters et al., unpublished observations). The impact of selecting for increased parasite resistance on other traits will thus have to be evaluated carefully prior to any MAS attempt.

Determining faecal egg and larval counts
Faeces were collected rectally. Egg flotation and precipitation based on dense salt-and sucrose solutions was used to determine EPG [39]. In brief, for the 2000 cohort 3 -6 g of faeces was suspended in 25 ml of 13% w/v NaCl solution, followed by filtration over a coarse sieve. After centrifugation, the supernatant was diluted once with tap water, mixed, and precipitated by centrifugation. Subsequently, a flotation and precipitation step in 25 ml of 17% w/v sucrose solution was used to improve clarity of the egg preparations. After removal of the supernatant, digital egg preparations were made as described [39]. For the 2002 cohort, only 2-3 g of faeces was used to be able to analyze larger numbers of faecal samples. Larval cultures to determine larval counts and to identify the involved genera and species were carried out according to [40].

Variance component analysis
Variance component analyses were conducted with AI-REML [42]. Asymptotic standard errors of the parameters were obtained from the inverse of the Average Information matrix.

Map construction and information content
Marker maps were constructed using CRIMAP [43]. Linkage information content of the maps was computed as previously described [45]. The information content quantifies the amount of information that is extracted from the marker genotypes as a fraction of the theoretical maximum, i.e. assuming that one could unambiguously determine which paternal homologue was transmitted to each one of the offspring at a given map position.

QTL mapping
QTL mapping was performed using the rank-based nonparametric option of the previously described HSQM software for QTL mapping in multiple outbred half-sib pedigrees [43]. Briefly, phenotypic values (in this case EPG or BV LEPG ) are converted to ranks within half-sib pedigrees. Evidence for the segregation of a QTL at a given map position in a given half-sib pedigree, is measured using a test statistic that is closely related to Wilcoxon's sum of rank statistic, yet accounts for the probabilities of inheritance of the grand-maternal versus grand-paternal homologues from the sire. The latter transmission probabilities are computed conditional on marker genotypes. Under the null hypothesis, this pedigree-specific test statistic has a standard normal distribution. Pedigree-specific test statistics are summed over sire families to yield an overall chi-squared statistic with degrees of freedom corre-sponding to the number of sires. The genome is systematically scanned for the presence using a typical interval mapping procedure, in which the position of the hypothetical QTL slides across the genome at one centimorgan intervals.
The genome-wide significance of this chi-squared statistics (i.e. accounting for the testing of multiple locations across the genome) is determined by comparison with the distribution of the largest genome-wide test statistics obtained from the analysis of 10,000 phenotype permutations [46]. The statistical significance of a QTL was expressed as log(1/p), where p is the proportion of phenotype permutations for which the QTL test statistic was exceeded anywhere across the genome. QTL were considered significant when log(1/p) was superior to 1.30 corresponding to a p value of 0.05. Following Lander and Kruglyak [47], QTL were considered suggestive if the test statistic exceeded a threshold reached on average once per phenotype permutation. Assuming that "thresholdexceeding-events" are Poisson distributed, the proportion of permutations for which the suggestive threshold is not exceeded anywhere in the genome is e -1 = 0.37. The p value corresponding to the suggestive threshold thus corresponds to the proportion of permutations for which the suggestive threshold is exceeded at least once across the genome which is 1-0.37 = 0.63, and the corresponding log(1/p) = 0.2. Confidence intervals for QTL locations were determined by bootstrapping [48]. Bootstrap samples were generated by randomly sampling daughters with replacement, yet keeping the number of daughters constant per family. QTL mapping was conducted on each bootstrap sample using HSQM, and the position with most significant signal stored for each sample. The QTL confidence interval was then defined as the shortest continuous chromosome segment encompassing 95% of these most significant positions.

Linkage disequilibrium
Pair-wise linkage disequilibrium was quantified using D' and r 2 which were computed from the maternal chromosomes as [56][57][58].
QTL fine-mapping QTL fine-mapping was performed using a previously described variance component approach that simultaneously exploits linkage and linkage disequilibrium [36]. Briefly, we first determined the most likely marker linkage phase for all genotyped animals (i.e. sires and daughters) using purpose-built software (Farnir, unpublished). For a given map position (queried for the location of a QTL), we then (i) computed transmission probabilities for alternate paternal homologues conditional on marker data using standard procedures (e.g. [45]), and (ii) computed identity-by-descent (IBD) probabilities between all pairs of "founder" chromosomes following Meuwissen and Goddard [35]. Founder chromosomes are defined as the 24 chromosomes of the 12 sires as well as the 768 maternal chromosomes of the daughters. Founder chromosomes were clustered on the basis of the pair-wise IBD probabilities using an agglomerative clustering method and the complete link criterion function implemented with CLUTO [36]. The phenotypes of the daughters (LEPG) were first precorrected for fixed effects (herd, parity, days in milk) estimated from the complete data set, and then modelled using a mixed model including a random "founder cluster" effect and a random individual animal effect [41]. Daughters were linked to the clusters of founder chromosomes using coefficients of one (or zero when extracting information from the paternal chromosomes only) for the cluster including the maternal chromosome, coefficients corresponding to the transmission probabilities (or zero when extracting information from the maternal chromosomes only) for the clusters including the respective paternal chromosomes, and coefficients of zero for all other clusters. Corresponding coefficients were summed when two or more parental founder chromosomes belonged to the same cluster. Variance components were estimated by REML [41]. Covariances between chromosome clusters were assumed to be zero, while covariances between individual animal effects were assumed to be proportionate to twice the coefficient of coancestry. The likelihood of the data under this "full" model was compared with the likelihood of the data under a reduced model without cluster effect. Evidence for a QTL at a given map position was expressed as the log 10 of the corresponding likelihood ratio (LR). The corresponding log 10 (LR) was computed in the middle of each marker interval to generate a location score profile.

Authors' contributions
WC coordinated microsatellite genotyping, constructed linkage maps, performed QTL and bioinformatic analyses, and participated in writing the manuscript. THMM designed the experiment, collected samples and phenotypes (faecal egg and larval counts) and participated in writing the manuscript. TD & FF estimated heritabilities and performed the combined linkage and LD QTL mapping. NT performed microsatellite genotyping. CS selected the animals to be sampled and compiled genealogical information. AWCAC designed and coordinated the experiment. MG designed and coordinated the experiment, analyzed data and wrote the manuscript. HWP designed the experiment, collected samples and phenotypes (faecal egg and larval counts) and participated in writing the manuscript. All authors read and approved the final manuscript.