Combining evidence of selection with association analysis increases power to detect regions influencing complex traits in dairy cattle
- Hermann Schwarzenbacher†1, 3,
- Marlies Dolezal†2,
- Krzysztof Flisikowski1,
- Franz Seefried1,
- Christine Wurmser1,
- Christian Schlötterer2 and
- Ruedi Fries1
© Schwarzenbacher et al; licensee BioMed Central Ltd. 2012
Received: 21 April 2011
Accepted: 30 January 2012
Published: 30 January 2012
Hitchhiking mapping and association studies are two popular approaches to map genotypes to phenotypes. In this study we combine both approaches to complement their specific strengths and weaknesses, resulting in a method with higher statistical power and fewer false positive signals. We applied our approach to dairy cattle because they have undergone extremely successful selection for milk production traits and because excellent phenotypic records are available. We performed whole genome association tests with a new mixed model approach to account for stratification, which we validated via Monte Carlo simulations. Selection signatures were inferred with the integrated haplotype score and with a locus specific, permutation based integrated haplotype score that works with a folded frequency spectrum and provides a formal test of significance to identify selection signatures.
About 1,600 out of 34,851 SNPs showed signatures of selection, and the locus specific permutation based integrated haplotype score agreed well overall with the whole genome association study. Each approach provides distinct information about the genomic regions that influence complex traits. Combining whole genome association with hitchhiking mapping yielded two significant loci for the trait protein yield. These regions agree well with previous results from other selection signature scans and whole genome association studies in cattle.
We show that the combination of whole genome association and selection signature mapping based on the same SNPs increases the power to detect loci influencing complex traits. The locus specific permutation based integrated haplotype score provides a formal test of significance in selection signature mapping. Importantly, it does not rely on knowledge of ancestral and derived allele states.
Keywords: selection signature, whole genome association, cattle, complex trait
Linking genotype to phenotype is one of the central questions in biological sciences. Current approaches to map intraspecific variation to causative sequence variation use either a quantitative genetics framework (association mapping) or rely on population genetic theory (hitchhiking mapping).
Population genetic theory predicts that a favorably selected allele is either lost or increases in frequency until fixation . With the spread of a beneficial allele, linked, non-selected sites also increase in frequency, a phenomenon that has been termed hitchhiking .
Based on this principle, genome scans have been performed in a large number of species such as human, maize, Drosophila, Arabidopsis thaliana and Plasmodium falciparum [2–10]. Selection signatures in cattle based on SNP data from single chromosomes were reported on Bos taurus (BTA) chromosomes 6 , 19  and 29 . Barendse et al. , Gibbs et al.  and Hayes et al.  published genome wide maps of diversifying selection between Bos taurus dairy and beef cattle, Flori et al.  between three different French dairy cattle breeds, and Gautier et al.  among several West African cattle breeds. Qanbari et al.  employed an extended haplotype homozygosity test and published a genome wide map of recent selection within the German Holstein dairy cattle population. Gautier et al.  also used this signature of selection within a recently admixed Caribbean cattle breed. Furthermore, these authors employed a modified version of the Rsb scores proposed by  to detect local excess or deficiency of a given ancestry relative to the average genome-wide admixture levels. Qanbari et al.  recently published a genome scan in several dairy and beef breeds, including German Brown Swiss cattle, based on integrated haplotype scores and, for contrasts between breeds, on FST statistics. However, disentangling selection from nuisance signals caused by the demographic history of a breed or species based on genome wide polymorphism data remains challenging.
Stringent artificial selection has resulted in an enormous improvement of production traits over the last few decades, especially for traits with moderate to high heritability. Combined with the availability of high density SNP arrays and high quality phenotypes, this intense selection makes the genome of dairy cattle an ideal model in which to look for signatures of recent positive selection.
While very powerful genomic tools are available for genetic model organisms, these species frequently lack the phenotypic records needed to link signatures of selection in the genome to actual phenotypic variation, unless a huge additional phenotyping effort is undertaken. This is the great advantage of livestock species, in which numerous production and fitness traits are routinely recorded and used in breeding value estimation.
The estimated breeding value (EBV) expresses the genetic merit of a breeding animal, estimated from its own performance and the performance of all available relatives. For dairy bulls this typically includes hundreds to thousands of daughters. Furthermore, EBVs are corrected for systematic environmental effects. Under Fisher's infinitesimal model , which assumes a very large (effectively infinite) number of loci each with a very small effect, the breeding value of an animal is the sum of the additive effects of its genes. Although only approximately correct, the application of this model in selection paved the way for efficient livestock breeding.
Since Sax's experiments with beans in 1923 , however, we have known that so-called quantitative trait loci (QTL) with larger than infinitesimal effects exist and that these loci can be mapped, e.g. via linkage analysis. Such QTL mapping studies, as a quantitative genetics approach, have been very successful in cattle; see [24–26] for a summary.
Rapid improvements in high throughput SNP genotyping technologies and commercially available high density SNP arrays for livestock species have recently allowed livestock geneticists to turn towards whole genome association (WGA) mapping approaches, e.g. [27–29]; see  for a review in livestock. Nevertheless, the number of individuals that must be genotyped to achieve reasonable power in a stand-alone WGA remains a limiting factor .
Population genetics provides information, independent of phenotypic records, on putative loci under strong directional artificial or natural selection. We show in this paper that combining a population genetic signal with association tests based on quantitative genetics in a composite statistic increases power and reduces the number of false positive signals when localizing the source of selection.
In a similar vein,  proposed a composite test statistic combining several selection signature signals to increase the power to detect selection. Barendse et al.  discussed the potential of combining genome wide scans for selection with whole genome association studies. However, because these authors were looking for signatures of diversifying selection based on FST values, the combination with association results is not straightforward. Akey et al.  followed up a region on dog (Canis familiaris) autosome 13 that showed evidence for selection in the Shar-Pei dog breed with association mapping and finally dissected the molecular basis of the typical skin wrinkling phenotype in this breed. Ayodo et al.  found that, in a case-control candidate gene approach in humans, the statistical power to detect disease variants can be increased by orders of magnitude by weighting candidates by their evidence of natural selection.
Our composite statistic combines a long-range haplotype statistic, based on genomic signatures of new, positively selected mutations that are not yet fixed in a single population, with the regression coefficient on allele-count indicator variables from a WGA as the quantitative genetic component. Both estimators rely on the underlying linkage disequilibrium (LD) between the causal variant and the genotyped SNP. We further propose a new mixed model approach to account for stratification in population based association studies, and we introduce a modified extended integrated haplotype score test statistic to detect selection. Using computer simulations and real data, we show that combining both tests increases the power to localize the target of selection relative to a single test and reduces the number of false positive signals.
Over the last decades, protein yield has received the highest selection pressure within the overall breeding goal for Brown Swiss cattle; we therefore chose it as the main trait of interest in this study to ensure high power for both mapping approaches.
The 140 highest- and 148 lowest-ranking bulls for protein yield EBV, each with a minimum EBV accuracy (r², coefficient of determination) of 0.9, were chosen out of 973 progeny-tested Brown Swiss bulls for selective genotyping . Up to two generations were present among the genotyped bulls. The bulls descend from 90 different sires and 121 maternal grandsires. Sire and maternal grandsire family sizes ranged from 1 to 20 and from 1 to 34 members, respectively.
Sire EBVs for protein yield (PY) were obtained from the genetic evaluation centre LfL Grub, Germany, from the August 2008 genetic evaluation. EBVs for protein yield are expressed in kilograms.
Genomic DNA was prepared from semen straws following standard protocols using proteinase K digestion and phenol-chloroform extraction. Across all samples the concentration was set to 50 ng/μl. Bulls were genotyped according to the manufacturer's instructions with the Illumina BovineSNP50 BeadChip® comprising 54,001 SNPs at the Institute of Human Genetics of the Helmholtz Zentrum München, Germany. Genotypes of one individual were omitted due to a call rate of < 90%. The average call rate of the remaining 287 bulls was 98.6%, corresponding to approximately 53,230 genotypes obtained per individual. PLINK, version 1.03 , was used to filter the raw genotype data. We retained SNPs with a known autosomal location, a minor allele frequency > 5% and missing genotypes in fewer than 10% of bulls. We then retained only SNPs for which the ancestral allele state was reported by . The final dataset contained 34,851 SNPs. Haplotypes were inferred with fastPHASE, version 1.2 . Parameters in fastPHASE were set to 10 random starts of the EM algorithm and 10 clusters. Haplotypes were inferred for whole chromosomes, ignoring pedigree information.
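The selection rule described above can be sketched in Python. This is a minimal illustration, not the authors' code: it assumes the accuracy criterion means r² ≥ 0.9, and the simulated EBVs, accuracies and the helper name `selective_genotyping` are hypothetical.

```python
import numpy as np


def selective_genotyping(ebv, accuracy, n_high=140, n_low=148, min_accuracy=0.9):
    """Return indices of the top-n_high and bottom-n_low bulls by EBV.

    Only bulls whose EBV accuracy (r^2) is at least `min_accuracy` are eligible.
    """
    ebv = np.asarray(ebv, dtype=float)
    accuracy = np.asarray(accuracy, dtype=float)
    eligible = np.flatnonzero(accuracy >= min_accuracy)
    if eligible.size < n_high + n_low:
        raise ValueError("not enough eligible bulls for non-overlapping tails")
    # Eligible bulls sorted from lowest to highest EBV.
    ranked = eligible[np.argsort(ebv[eligible])]
    return ranked[-n_high:], ranked[:n_low]


# Hypothetical data: 973 progeny-tested bulls with simulated EBVs (kg protein).
rng = np.random.default_rng(42)
ebv = rng.normal(0.0, 15.0, size=973)
accuracy = rng.uniform(0.85, 1.0, size=973)
high, low = selective_genotyping(ebv, accuracy)
```

Genotyping only the two tails enlarges the genotypic contrast per genotyped animal, which is why selective genotyping raises power for a fixed genotyping budget.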
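The sample and SNP filters can be mirrored in a few lines of NumPy. The sketch below is not PLINK: it assumes a hypothetical individuals × SNPs matrix of allele counts with NaN for missing calls, and it covers only the call-rate, MAF and missingness thresholds, not the autosomal-location or ancestral-state filters.

```python
import numpy as np


def qc_filter(genotypes, min_call_rate=0.90, min_maf=0.05, max_missing=0.10):
    """Return boolean masks of individuals and SNPs that pass the filters.

    genotypes: individuals x SNPs array of allele counts (0, 1, 2), NaN = missing.
    """
    G = np.asarray(genotypes, dtype=float)
    # Drop individuals with a call rate below 90%.
    keep_ind = (1.0 - np.isnan(G).mean(axis=1)) >= min_call_rate
    G = G[keep_ind]
    missing = np.isnan(G).mean(axis=0)
    p = np.nanmean(G, axis=0) / 2.0          # frequency of the counted allele
    maf = np.minimum(p, 1.0 - p)
    keep_snp = (maf > min_maf) & (missing < max_missing)
    return keep_ind, keep_snp


geno = np.array([
    [0, 1, 2, 0],
    [0, 1, 2, 1],
    [np.nan, np.nan, np.nan, 0],   # call rate 25%: individual removed
    [0, 2, 1, 1],
])
keep_ind, keep_snp = qc_filter(geno)
# keep_ind -> [True, True, False, True]; keep_snp -> [False, True, True, True]
# (the first SNP is monomorphic, MAF = 0)
```

Individuals are filtered before SNP statistics are computed, so a single badly genotyped sample cannot push otherwise good SNPs over the missingness threshold.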
Detection of Selection Signatures
Since standardisation is based on the frequency of the derived allele, this sets an upper limit to the age of the mutation. The test statistic answers the question of how unusual the length of a haplotype is relative to all other core SNPs in the genome with a similar derived allele frequency, assuming that alleles at the same frequency have the same age regardless of the selection coefficients acting on them. It therefore does not provide a formal test of significance. Furthermore, if different outgroups are used to define ancestral and derived states, this sets different age boundaries for the mutations, resulting in less precise standardisation.
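As a minimal sketch of the frequency-bin standardisation on which iHSVoight rests (iHS = (uiHS - mean) / SD, with mean and SD taken over all core SNPs in the same derived allele frequency bin), the function below z-scores uiHS within bins. The bin width of 0.1 follows the frequency bins tabulated in this paper; the function name and simulated inputs are hypothetical.

```python
import numpy as np


def standardize_by_frequency_bin(uihs, derived_freq, edges=np.linspace(0.0, 1.0, 11)):
    """iHS_Voight-style standardisation: z-score uiHS within derived-allele frequency bins."""
    uihs = np.asarray(uihs, dtype=float)
    bins = np.digitize(np.asarray(derived_freq, dtype=float), edges)
    ihs = np.full_like(uihs, np.nan)
    for b in np.unique(bins):
        in_bin = bins == b
        if in_bin.sum() < 2:
            continue  # SD undefined for a single SNP; leave as NaN
        values = uihs[in_bin]
        ihs[in_bin] = (values - values.mean()) / values.std(ddof=1)
    return ihs


# Hypothetical uiHS values whose mean and spread depend on allele frequency.
rng = np.random.default_rng(1)
freq = rng.uniform(0.1, 0.9, size=5000)
u = rng.normal(loc=3.0 * freq, scale=1.0 + freq)
ihs = standardize_by_frequency_bin(u, freq)
```

Because each SNP is compared only with other SNPs in its bin, the result is a genome-wide ranking rather than a locus-specific test, which is the limitation discussed above.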
A locus specific permutation-based iHS
When the rate of EHH decay is similar for the ancestral and derived allele, as expected for a neutral locus, uiHS is ~ 0 .
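To make uiHS concrete, the sketch below computes EHH, integrates it over physical distance (iHH) with the trapezoidal rule and returns ln(iHH_minor / iHH_major). Assumptions not taken from the paper: haplotypes are 0/1 arrays, integration on each side stops once EHH drops below 0.05, and the sign convention is chosen so that a positive value means the minor allele sits on longer haplotypes. The toy haplotypes are hypothetical.

```python
import numpy as np


def ehh(group, core, other):
    """Probability that two haplotypes of `group` are identical from core to other."""
    n = group.shape[0]
    if n < 2:
        return 0.0
    lo, hi = sorted((core, other))
    _, counts = np.unique(group[:, lo:hi + 1], axis=0, return_counts=True)
    return float(np.sum(counts * (counts - 1)) / (n * (n - 1)))


def ihh(group, positions, core, cutoff=0.05):
    """Trapezoidal integral of EHH over physical distance on both sides of the core."""
    total = 0.0
    for step in (-1, 1):
        prev_e, prev_pos, j = 1.0, positions[core], core + step
        while 0 <= j < group.shape[1]:
            e = ehh(group, core, j)
            total += 0.5 * (prev_e + e) * abs(positions[j] - prev_pos)
            if e < cutoff:
                break
            prev_e, prev_pos, j = e, positions[j], j + step
    return total


def uihs(haps, positions, core, cutoff=0.05):
    """Unstandardised iHS, ln(iHH_minor / iHH_major), at column `core`."""
    alleles = haps[:, core]
    minor = 1 if alleles.mean() < 0.5 else 0
    ihh_minor = ihh(haps[alleles == minor], positions, core, cutoff)
    ihh_major = ihh(haps[alleles != minor], positions, core, cutoff)
    return float(np.log(ihh_minor / ihh_major))


haps = np.array([
    [1, 1, 0, 1, 1], [0, 1, 0, 1, 0], [1, 0, 0, 0, 1], [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0], [0, 1, 0, 0, 1], [1, 0, 0, 1, 0],
    [0, 0, 1, 0, 0], [0, 0, 1, 0, 0], [0, 0, 1, 0, 0],  # minor allele on one long haplotype
])
positions = np.array([0, 1000, 2000, 3000, 4000])   # physical distances (bp)
print(uihs(haps, positions, core=2))  # ≈ 0.693 (ln 2): iHH_minor = 4000, iHH_major = 2000
```

For a neutral core SNP both alleles sit on haplotypes of similar length, the two iHH values are close and the log ratio is near zero.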
Voight et al.  showed via simulation that extremely positive and extremely negative iHS scores are both potentially interesting selection signals, and that polarisation with respect to the ancestral allele changes the sign, but not the magnitude, of the uiHS test statistic.
In the following we introduce a locus specific, permutation based approach that relies on minor and major alleles rather than on derived and ancestral allele states. Most importantly, this test statistic assesses the significance of deviations of uiHS from its neutral expectation.
Since the empirical mean of permuted iHS statistics is approximately 0 (see Additional file 1, Figure S1), our test is a formal test of significance, given the allele frequency of the core site and the LD structure in the surrounding region. This property is crucial for a test statistic, especially because we want to combine our results with an association test from a WGA study.
This final test statistic is approximately standard normally distributed.
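A sketch of the locus specific permutation idea follows. It assumes that the null distribution is generated by shuffling the core-SNP alleles among the phased haplotypes while leaving the flanking haplotypes untouched, so the core allele frequency and the surrounding LD background are preserved; the observed uiHS is then divided by the SD of the permuted values (expectation 0). The compact uiHS helpers assume a trapezoidal EHH integral with a 0.05 cutoff, and none of the names come from the paper.

```python
import numpy as np
from scipy.stats import norm


def _ehh(group, core, other):
    n = group.shape[0]
    if n < 2:
        return 0.0
    lo, hi = sorted((core, other))
    _, counts = np.unique(group[:, lo:hi + 1], axis=0, return_counts=True)
    return float(np.sum(counts * (counts - 1)) / (n * (n - 1)))


def _ihh(group, positions, core, cutoff):
    total = 0.0
    for step in (-1, 1):
        prev_e, prev_pos, j = 1.0, positions[core], core + step
        while 0 <= j < group.shape[1]:
            e = _ehh(group, core, j)
            total += 0.5 * (prev_e + e) * abs(positions[j] - prev_pos)
            if e < cutoff:
                break
            prev_e, prev_pos, j = e, positions[j], j + step
    return total


def _uihs(haps, positions, core, cutoff):
    alleles = haps[:, core]
    minor = 1 if alleles.mean() < 0.5 else 0
    return float(np.log(_ihh(haps[alleles == minor], positions, core, cutoff)
                        / _ihh(haps[alleles != minor], positions, core, cutoff)))


def permutation_ihs(haps, positions, core, n_perm=1000, cutoff=0.05, rng=None):
    """Observed uiHS divided by the SD of uiHS under shuffled core alleles."""
    rng = np.random.default_rng() if rng is None else rng
    observed = _uihs(haps, positions, core, cutoff)
    null = np.empty(n_perm)
    shuffled = haps.copy()
    for b in range(n_perm):
        # Shuffle only the core column: allele frequency and flanking LD are kept.
        shuffled[:, core] = rng.permutation(haps[:, core])
        null[b] = _uihs(shuffled, positions, core, cutoff)
    z = observed / null.std(ddof=1)        # null expectation taken as 0
    return z, 2.0 * norm.sf(abs(z))        # two-sided P-value from N(0, 1)


haps = np.array([
    [1, 1, 0, 1, 1], [0, 1, 0, 1, 0], [1, 0, 0, 0, 1], [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0], [0, 1, 0, 0, 1], [1, 0, 0, 1, 0],
    [0, 0, 1, 0, 0], [0, 0, 1, 0, 0], [0, 0, 1, 0, 0],
])
positions = np.array([0, 1000, 2000, 3000, 4000])
z, p = permutation_ihs(haps, positions, core=2, n_perm=200, rng=np.random.default_rng(0))
```

Because no outgroup is needed, every SNP that passes quality control can serve as a core site, not only those with a known ancestral state.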
Since no high resolution genetic map was available for the SNPs in this study, physical distances between SNPs were used for calculating all integrated haplotype scores.
Whole Genome Association Study
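with

$$\lambda = \frac{\operatorname{median}_{i=1,\dots,N}\left(\hat{\beta}_i^{2}/\operatorname{Var}(\hat{\beta}_i)\right)}{0.456},$$

where β_i is the estimated effect of the i-th SNP (i from 1 to N), Var(β_i) the variance of this estimate and 0.456 the median of the χ² distribution with one degree of freedom .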
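The inflation factor can be computed directly from per-SNP estimates. This minimal sketch takes λ as the median of the Wald statistics β̂_i²/Var(β̂_i) divided by 0.456; the exact median of a χ² distribution with one degree of freedom is about 0.4549. The function name and the simulated estimates are illustrative.

```python
import numpy as np


def inflation_factor(beta, var_beta, chi2_median=0.456):
    """Genomic inflation factor: median Wald chi-square divided by the chi2(1) median."""
    wald = np.asarray(beta, dtype=float) ** 2 / np.asarray(var_beta, dtype=float)
    return float(np.median(wald) / chi2_median)


# Under the null, standardised effects are N(0, 1), so lambda should be close to 1.
rng = np.random.default_rng(0)
beta = rng.standard_normal(200_000)
print(inflation_factor(beta, np.ones_like(beta)))
```

Using the median rather than the mean keeps λ robust to the few SNPs with true large effects, so it mainly reflects genome-wide inflation from stratification.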
Recently, linear mixed models were proposed to effectively account for different levels of relatedness by incorporating pairwise genetic relatedness into the model . This approach relies on the fact that the phenotypes of two genetically related animals are more similar than those of genetically distant individuals. Estimation of covariance between individuals is assisted by the availability of a marker based kinship matrix, which can be estimated more accurately using genotype data from the WGA experiment than from pedigree information.
We fitted the linear mixed model (method MIX)

$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{Z}\mathbf{a} + \mathbf{e} \qquad (1)$$

where y is a vector of sire EBVs for protein yield, X is the design matrix in which SNP genotypes were coded 0, 1 and 2, counting the number of minor alleles, and b the vector of regression coefficients on the recoded SNP genotypes. Z denotes the design matrix for random effects, with a ~ N(0, Gσa²) being the vector of polygenic effects, σa² the additive genetic variance and G the genetic covariance matrix, and e ~ N(0, Iσe²) a vector of residual effects. G was obtained from pairwise identical by descent (IBD) estimates using genome wide SNP data as implemented in PLINK , in which the IBD state is estimated by a hidden Markov model, given the observed identity by state (IBS) sharing and genome wide levels of relatedness between the pairs. Diagonal elements of G were calculated as 1 + F, with F being the inbreeding coefficient estimated from SNP data using PLINK .
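A minimal sketch of solving the mixed model y = Xb + Za + e by direct matrix inversion, written as generalised least squares for a single SNP. It assumes Z = I (one EBV per bull), an added intercept and known variance components σa² and σe²; estimating those (for example by REML) is outside the sketch, and this is not the authors' R code.

```python
import numpy as np


def gls_snp_effect(y, x, G, sigma_a2, sigma_e2):
    """Estimate the SNP effect in y = 1*mu + x*b + a + e, with Var(y) = G*sa2 + I*se2."""
    y = np.asarray(y, dtype=float)
    n = y.size
    X = np.column_stack([np.ones(n), np.asarray(x, dtype=float)])  # intercept + SNP code
    V_inv = np.linalg.inv(G * sigma_a2 + np.eye(n) * sigma_e2)     # direct matrix inversion
    C = np.linalg.inv(X.T @ V_inv @ X)    # sampling covariance of the fixed effects
    b_hat = C @ X.T @ V_inv @ y           # generalised least squares solution
    return b_hat[1], C[1, 1]              # SNP effect and its sampling variance


# Hypothetical example: six bulls, genotype codes 0/1/2, a simple relationship matrix.
x = np.array([0, 1, 2, 0, 1, 2], dtype=float)
y = 1.0 + 0.5 * x
G = 0.5 * np.ones((6, 6)) + 0.5 * np.eye(6)
beta, var_beta = gls_snp_effect(y, x, G, sigma_a2=1.0, sigma_e2=1.0)
```

Direct inversion scales as O(n³) in the number of animals, which is unproblematic for a few hundred bulls but would call for Henderson's mixed model equations or a precomputed eigendecomposition of G in larger samples.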
Mixed models were solved in R (http://www.cran.r-project.org) via direct matrix inversion. Empirical P-values were calculated by an adaptive permutation procedure, shuffling the vector of genotype codes among phenotypes. This does not destroy the relationship between IBD status and phenotypes, but breaks up any association between SNP genotypes and phenotypes. This leaves LD patterns unperturbed and hence does not control for stratification. For SNPs that indicated association, the number of permutations was sequentially increased up to 1 × 10⁶. The empirical P-value was calculated as the proportion of test statistics from permuted data sets that were greater than or equal to the observed test statistic.
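The adaptive permutation scheme can be sketched as below. The stopping rule (start with 1,000 permutations and multiply by ten while fewer than 10 permuted statistics reach the observed one, capped at 10⁶) is an assumption, as is the use of |Pearson r| as a cheap stand-in for the mixed-model test statistic. The P-value is the proportion of permuted statistics greater than or equal to the observed one.

```python
import numpy as np


def abs_corr(g, y):
    """|Pearson r| between genotype codes and phenotypes (simplified test statistic)."""
    return abs(np.corrcoef(g, y)[0, 1])


def adaptive_permutation_p(genotypes, phenotypes, rng, start=1_000,
                           max_perm=1_000_000, min_hits=10):
    """Empirical P-value with a sequentially increasing number of permutations."""
    g = np.asarray(genotypes, dtype=float)
    y = np.asarray(phenotypes, dtype=float)
    observed = abs_corr(g, y)
    hits = done = 0
    target = min(start, max_perm)
    while True:
        for _ in range(target - done):
            # Shuffle genotype codes among phenotypes.
            if abs_corr(rng.permutation(g), y) >= observed:
                hits += 1
        done = target
        # Stop once the SNP is clearly non-significant or the cap is reached.
        if hits >= min_hits or done >= max_perm:
            return hits / done, done
        target = min(done * 10, max_perm)


rng = np.random.default_rng(1)
g = rng.integers(0, 3, size=100)
y = g + rng.normal(0.0, 0.1, size=100)   # hypothetical strongly associated SNP
p, n_used = adaptive_permutation_p(g, y, rng, max_perm=10_000)
```

Most SNPs stop after the first 1,000 permutations, so the expensive deep permutation runs are spent only on the few candidates with small P-values; the attainable minimum P-value is bounded by the number of permutations actually run.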
All 34,851 SNPs were tested one after the other for association with the protein yield (PY) phenotype.
$$gt_{ijk} = \mathrm{sire}_i + \mathrm{MGS}_j + \varepsilon_{ijk}$$

where gt_ijk is the recoded genotype code (0, 1 or 2, counting the number of minor alleles) of bull k, sire_i is the fixed effect of sire i, MGS_j the fixed effect of maternal grandsire j and ε_ijk ~ N(0, Iσe²) the vector of random residual effects. Sire and maternal grandsire families with fewer than five members were merged into one group.
The residuals ε_ijk were used instead of the raw recoded genotypes (0, 1 and 2) in the design matrix X of equation (1); this approach is henceforth termed method "MIXStrat".
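A sketch of the MIXStrat pre-processing step: regress each SNP's genotype codes on sire and maternal grandsire fixed effects, merging families with fewer than five members into one group, and keep the residuals as the new genotype covariate. The least squares fit via `numpy.linalg.lstsq`, the helper names and the label "merged" are illustrative choices, not the authors' implementation.

```python
import numpy as np


def merge_small_families(labels, min_size=5, other="merged"):
    """Pool all families with fewer than `min_size` members into one group."""
    labels = np.asarray(labels).astype(str)
    levels, counts = np.unique(labels, return_counts=True)
    small = set(levels[counts < min_size])
    return np.array([other if lab in small else lab for lab in labels])


def one_hot(labels):
    """Indicator (dummy) matrix with one column per distinct label."""
    levels, index = np.unique(labels, return_inverse=True)
    design = np.zeros((len(labels), len(levels)))
    design[np.arange(len(labels)), index] = 1.0
    return design


def residualize_genotypes(genotypes, sire, mgs, min_size=5):
    """Residuals of genotype codes after fitting sire and MGS fixed effects.

    `genotypes` may be a vector (one SNP) or a bulls x SNPs matrix.
    """
    g = np.asarray(genotypes, dtype=float)
    X = np.hstack([one_hot(merge_small_families(sire, min_size)),
                   one_hot(merge_small_families(mgs, min_size))])
    # X is rank deficient (both factor blocks span the intercept); lstsq still
    # returns a least squares fit, and the residuals are unique.
    coef, *_ = np.linalg.lstsq(X, g, rcond=None)
    return g - X @ coef


rng = np.random.default_rng(5)
sire = rng.choice(["s1", "s2", "s3", "s4"], size=60)
mgs = rng.choice(["m1", "m2", "m3", "m4", "m5", "m6", "m7"], size=60)
g = rng.integers(0, 3, size=60)
r = residualize_genotypes(g, sire, mgs)
```

The residuals sum to zero within every (merged) sire and maternal grandsire group, so genotype differences that merely track family membership no longer enter the association test.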
Evaluation of WGA via Monte Carlo Simulation
The proposed method to account for stratification is specific to situations typically observed in intensively selected livestock species and populations. We evaluated the effectiveness of MIXStrat by Monte Carlo simulations. Phenotypes, sire- and maternal grandsire family structure were taken from the population under consideration. Genotypes for 287 bulls and 10,000 diallelic sites were sampled based on the following procedure:
First, the frequency p of the first allele at a SNP was drawn from a uniform distribution; the frequency of the second allele is then q = 1 - p. Two alleles were sampled for each sire and maternal grandsire according to these frequencies. Bulls inherited sire and maternal grandsire alleles following Mendelian rules. Alleles inherited via the dam were sampled according to the population allele frequencies. This simulates the null model (i.e. no effect of the locus on the phenotype) while taking the observed population structure into account. Association was tested using the models MIX and MIXStrat.
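The null simulation can be sketched as follows. Beyond what the text states, it assumes that every bull has a different dam, that the dam passes on either the maternal grandsire's allele or an allele drawn from the population frequency with probability 1/2 each, and that genotypes are counts of the first allele rather than of the minor allele. Pedigree links among the sampled bulls themselves are ignored, and the function name is hypothetical.

```python
import numpy as np


def simulate_null_genotypes(sire_idx, mgs_idx, n_sites, rng):
    """Genotypes (0/1/2 copies of allele '1') under H0 for bulls with known sire and MGS."""
    sire_idx = np.asarray(sire_idx)
    mgs_idx = np.asarray(mgs_idx)
    n = sire_idx.size
    p = rng.uniform(0.0, 1.0, size=n_sites)       # frequency of allele '1'; q = 1 - p
    # Two alleles per sire and per MGS, drawn from the population frequencies.
    sire_alleles = (rng.random((sire_idx.max() + 1, 2, n_sites)) < p).astype(int)
    mgs_alleles = (rng.random((mgs_idx.max() + 1, 2, n_sites)) < p).astype(int)

    # Paternal allele: one of the sire's two alleles, chosen at random (Mendelian).
    pick = rng.integers(0, 2, size=(n, 1, n_sites))
    paternal = np.take_along_axis(sire_alleles[sire_idx], pick, axis=1)[:, 0, :]

    # Maternal allele (assumption): with prob. 1/2 the dam transmits the allele she
    # received from the MGS, otherwise the allele from her own dam, drawn from p.
    pick = rng.integers(0, 2, size=(n, 1, n_sites))
    from_mgs = np.take_along_axis(mgs_alleles[mgs_idx], pick, axis=1)[:, 0, :]
    from_population = (rng.random((n, n_sites)) < p).astype(int)
    maternal = np.where(rng.random((n, n_sites)) < 0.5, from_mgs, from_population)
    return paternal + maternal


# Hypothetical family structure: 287 bulls, 90 sires, 121 maternal grandsires.
rng = np.random.default_rng(3)
sire_idx = rng.integers(0, 90, size=287)
mgs_idx = rng.integers(0, 121, size=287)
geno = simulate_null_genotypes(sire_idx, mgs_idx, 1_000, rng)
```

Half-sibs share their paternal allele with probability 1/2, so genotypes cluster by family even though no locus affects the phenotype; this is exactly the structure that can create spurious associations.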
Rate of False Positives
with α_Bonf being the 5% Bonferroni-corrected type I error threshold of 2.5 × 10⁻⁵ and m being the number of random Monte Carlo repetitions.
Composite Test: Combined Significance Test and False Discovery Rate
We used Stouffer's method  to combine P-values from the association study with those from the selection signature analysis (P_COMB).
$$Z = \frac{\sum_{i=1}^{k} z(P_i)}{\sqrt{k}}$$

where Z is the standard normal variable under H0, z(P_i) is the P-value from test i transformed to a standard normal deviate and k is the number of tests combined in the test statistic. The P_i were transformed to z(P_i) with the quantile function of the standard normal distribution, and the combined P-values P_COMB were obtained from Z via its cumulative distribution function. The tail area based false discovery rate (FDR) was calculated from P_COMB values using the R package fdrtool, v1.2.5 . Significance was declared if the q value (FDR-corrected P-value) was < 0.10.
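Stouffer's combination is short to implement with SciPy's standard normal functions. This sketch assumes one-sided P-values whose small values point in the same direction for both tests; `scipy.stats.combine_pvalues(..., method="stouffer")` offers the same unweighted combination. The tail-area FDR step, done with fdrtool in R, is not reproduced here.

```python
import numpy as np
from scipy.stats import norm


def stouffer(pvalues):
    """Stouffer's Z = sum_i z(P_i) / sqrt(k), with z(P) = Phi^{-1}(1 - P); returns P_COMB.

    Accepts a vector of k P-values or an (SNPs x k) array.
    """
    p = np.asarray(pvalues, dtype=float)
    z = norm.isf(p)                               # upper-tail quantile of N(0, 1)
    Z = z.sum(axis=-1) / np.sqrt(p.shape[-1])
    return norm.sf(Z)                             # P_COMB from the N(0, 1) upper tail


# Two moderately significant tests reinforce each other.
print(stouffer([0.05, 0.05]))   # about 0.01
```

Two tests that are each only nominally significant (P = 0.05) combine to P_COMB ≈ 0.01, which is the mechanism by which concordant selection and association evidence gains power.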
Evaluating the locus specific permutation of the iHS test statistic to detect signatures of selection and comparison to iHSVoight
We mapped selection signatures with iHSVoight and our newly proposed iHS to detect sites under selection.
Means and standard deviations (SD) of uncorrected integrated haplotype score (uiHS) test statistics in defined frequency bins, used to calculate iHSVoight. Frequency bins of the derived allele: 0.1-0.2, 0.2-0.3, 0.3-0.4, 0.4-0.5, 0.5-0.6, 0.6-0.7, 0.7-0.8, 0.8-0.9.
For SNPs with low minor allele frequencies we found a relatively higher proportion of extreme unscaled iHS statistics. We postulate that this is due to increased rates of false positives, since power simulations by  and  show that iHSVoight is powerful for loci with intermediate allele frequencies and that the power of the test drops substantially when the selective sweep is close to fixation, in other words for SNPs with low MAF.
Our permutation based standardization allows a formal test against the null hypothesis of neutrality at a core SNP (expectation zero). Our standardization is against 1000 permuted test statistics at the same locus in the same LD background. We therefore do not need to define the state of ancestral and derived allele.
Additional file 1, Figure S2 shows a histogram of derived allele frequencies and Additional file 1, Figure S3 a histogram of minor allele frequencies of the 34,851 SNPs used in this study. Additional file 1, Figures S4 and S5 show histograms of P-values for iHSVoight and iHS, respectively.
Detection of Selection Signatures in the Brown Swiss dairy cattle population
Manhattan plots for iHSVoight and iHS for each autosome except BTA 6 are shown in Additional file 2, Figures S6-S33, plots A and B.
Among the 34,851 SNPs tested genome wide, 1,710 and 1,621 SNPs had an absolute test statistic > 1.96 with method iHSVoight and iHS, respectively.
The distribution among chromosomes is remarkably uneven: applying iHS, BTA 5, 6, 12 and 19 harbor 148, 124, 98 and 89 significant sites, respectively, which corresponds to 8-11% of all investigated SNPs on these chromosomes. On other chromosomes, namely BTA 28 and 17, only about 1% of investigated SNPs exhibit significant selection signatures.
The same holds for iHSVoight: BTA 5, 6, 12, 16 and 19 have 171, 131, 148, 136 and 112 SNPs, respectively, with an absolute iHSVoight test statistic > 1.96, corresponding to 8-14% of all SNPs on these chromosomes, whereas only about 1% of sites on BTA 7, 25 and 27 show extreme iHSVoight test statistics.
However, there is growing evidence for additional polymorphisms in the DGAT1 gene and its neighborhood that cause phenotypic variation in milk production traits, e.g. [52, 53]. Of particular interest is a QTL mapping study in the German-Austrian-Italian BS population  that reported significant QTL for milk yield and protein percentage in the DGAT1 region, although all bulls in this study were shown to be homozygous for the p.K232A polymorphism . This finding is supported by the large SNP effects estimated for fat and protein percentage in the US BS population (http://aipl.arsusda.gov/Report_Data/Marker_Effects/marker_effects.cfm?Breed=BS) despite the near fixation of allele A in this breed. It is therefore likely that the selection signal picked up by iHS is not due solely to the DGAT1 p.K232A polymorphism but to the proximal region of BTA14 as a whole, including the VNTR polymorphism in the promoter region of DGAT1 reported by .
Association Study on PY
We therefore developed a new strategy to reduce the number of erroneous association signals in our data (method MIXStrat). Both the quantile-quantile plot (Figure 9) and an inflation factor λ of 1.02 confirmed that the MIXStrat model successfully controlled for spurious results caused by stratification of our sample. As expected, however, this came at the cost of reduced power. The smallest q value (tail-area based false discovery rate (FDR), calculated with the R package fdrtool, v1.2.5 ) under method MIXStrat was 0.564, corresponding to a nominal P-value of 4.78 × 10⁻⁴. Note that the flattening of the P-value curve for method MIX is a consequence of the adaptive permutation procedure.
Evaluation of WGA via Monte Carlo Simulation
Computer simulations showed that, with MIXStrat, the sample size in this study is sufficient to detect only strong effects explaining at least 10% of the phenotypic variation. The Monte Carlo simulation did not account for LD, because conservative Bonferroni-corrected significance thresholds were used. Nevertheless, it assesses the influence of population substructure in single SNP regression whole genome association studies. Our simulations clearly show that the sire, paternal grandsire and maternal grandsire structure of dairy cattle populations alone can create significant results without any association between genotype and phenotype.
Additional file 3, Figure S34 shows a histogram of allele substitution effects across all 34,851 SNPs tested.
Rate of False Positives
Results from power calculations of the Monte Carlo simulation; the underlying models of MIX and MIXStrat are described in the Methods section. Columns: QTL size (proportion of EBV variance) and power of MIXStrat.
As further shown in Table 2 our dataset has sufficient statistical power to detect QTL explaining > 10% of the variance in EBV. Effect sizes of that magnitude are expected to be rare in livestock species . MIXStrat without integration of information on selection signatures has insufficient power to detect loci explaining only 1% of the variance.
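As a rough cross-check of these power statements, and not a replacement for the Monte Carlo simulation, the standard noncentral χ² approximation for a single-SNP regression can be used: with n bulls and a QTL explaining a fraction q of the EBV variance, the noncentrality parameter is n·q/(1 - q). The sketch assumes n = 287 unrelated bulls and the α_Bonf threshold of 2.5 × 10⁻⁵, so it ignores relatedness and the MIXStrat correction; `approx_power` is a hypothetical helper.

```python
from scipy.stats import chi2, ncx2


def approx_power(q, n=287, alpha=2.5e-5):
    """Approximate power of a 1-df test for a QTL explaining a fraction q of the variance."""
    crit = chi2.isf(alpha, df=1)        # significance threshold on the chi-square scale
    ncp = n * q / (1.0 - q)             # noncentrality for a regression with R^2 = q
    return float(ncx2.sf(crit, df=1, nc=ncp))


for q in (0.01, 0.05, 0.10):
    print(q, round(approx_power(q), 3))
```

Under these idealised assumptions a QTL explaining 10% of the variance is detected most of the time, while one explaining 1% almost never reaches the Bonferroni threshold, in line with the conclusion drawn from the simulation.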
Consensus of Selection Signature Signals and Association Signals
A positive iHS value indicates that the minor SNP allele, relative to the major allele, is associated with the larger integrated EHH statistic and was possibly selected for. Likewise, the estimated regression coefficient in the association analysis (βMIXStrat) represents the estimated increase in trait value per additional copy of the minor allele. Thus, like signs of the iHS test statistic and βMIXStrat indicate that the SNP is itself causative or is in LD with a causative site that is under positive selection. Opposite signs of iHS and βMIXStrat may be observed when sites have pleiotropic effects and were selected on a different, possibly unobserved, trait. Generally, for traits of major economic importance in the selection history of a breed, one would expect a higher proportion of like signs than opposite signs and a positive correlation coefficient.
Pearson correlation coefficients (95% confidence intervals) of different iHS statistics with regression coefficients from the association study for protein yield. SNP subsets: SNPs with MAF < 10%, all (N = 4,387) and top 1% |iHS| (N = 42); all SNPs (N = 34,851), top 1% |iHS| (N = 349) and top 0.1% |iHS| (N = 35).
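A sketch of how correlations of this kind can be computed: rank SNPs by |iHS|, keep the top fraction, and compute Pearson's r with βMIXStrat together with the share of like signs. The 95% interval via Fisher's z transformation is an assumption, since the method used for the reported intervals is not stated; the simulated inputs are hypothetical. With `round`, 1% and 0.1% of 34,851 SNPs give 349 and 35 SNPs, matching the N reported for those subsets.

```python
import numpy as np
from scipy.stats import norm


def top_fraction_correlation(ihs, beta, fraction=0.01, level=0.95):
    """Pearson r (with Fisher-z CI) and share of like signs among the top |iHS| SNPs."""
    ihs = np.asarray(ihs, dtype=float)
    beta = np.asarray(beta, dtype=float)
    k = max(int(round(fraction * ihs.size)), 4)   # need k > 3 for the Fisher-z SE
    top = np.argsort(-np.abs(ihs))[:k]
    r = float(np.corrcoef(ihs[top], beta[top])[0, 1])
    half_width = norm.ppf(0.5 + level / 2) / np.sqrt(k - 3)
    z = np.arctanh(r)
    ci = (float(np.tanh(z - half_width)), float(np.tanh(z + half_width)))
    like_signs = float(np.mean(np.sign(ihs[top]) == np.sign(beta[top])))
    return r, ci, like_signs, k


rng = np.random.default_rng(7)
ihs = rng.standard_normal(1000)
beta = ihs + 0.3 * rng.standard_normal(1000)   # hypothetical concordant signals
r, ci, like, k = top_fraction_correlation(ihs, beta, fraction=0.1)
```

Restricting the correlation to the extreme tail focuses on candidate sweeps, where concordant signs are most informative; the price is a small N and correspondingly wide intervals.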
Combining Signatures of Selection with Association Tests
Additional file 2, Figures S6-S33 show Manhattan plots for each of the bovine autosomes, combining model MIXStrat with iHSVoight (plot C) and MIXStrat with iHS (plot D). All Manhattan plots are annotated with selection signature signals among the top 5% found by  applying iHSVoight in windows of 500 kb in BS cattle (symbol o) and in any of the other breeds investigated (symbol x). All plots are further annotated with QTL results reported from whole genome association studies in the cattle QTL database "Cattle QTLdb" . We downloaded the gff3 file for btau4 from http://www.animalgenome.org/cgi-bin/QTLdb/BT/download?file=gbpBTAU [57, 58]. QTL positions are annotated at the midpoints between the start and end positions of the reported QTL. QTL annotated outside the assembled bovine autosomes and QTL in reverse direction (end position preceding the start position) were filtered out. Capital letters summarize QTL trait ontology classes as classified at animalgenome.org: B for meat (beef) traits, E for exterior traits, H for health traits, M for milk traits, P for production traits and R for reproduction traits.
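The QTL annotation step can be sketched in plain Python over the lines of a GFF3 file (nine tab-separated columns, with start and end in columns 4 and 5). The autosome names `Chr1` to `Chr29` are a hypothetical naming scheme, and treating "reverse direction" as an end coordinate smaller than the start coordinate is our reading of the text.

```python
def qtl_midpoints(gff_lines, autosomes):
    """Return (seqid, midpoint, attributes) for QTL records on the given autosomes.

    Records whose end lies before their start (reverse direction) are dropped.
    """
    kept = []
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF3 directives/comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete 9-column GFF3 feature line
        seqid, start, end = fields[0], int(fields[3]), int(fields[4])
        if seqid not in autosomes or end < start:
            continue
        kept.append((seqid, (start + end) / 2, fields[8]))
    return kept


autosomes = {f"Chr{i}" for i in range(1, 30)}   # hypothetical seqid naming
lines = [
    "##gff-version 3",
    "Chr6\tQTLdb\tQTL\t1000\t3000\t.\t.\t.\tName=Protein yield",
    "ChrUn\tQTLdb\tQTL\t10\t20\t.\t.\t.\tName=unplaced",
    "Chr14\tQTLdb\tQTL\t5000\t4000\t.\t.\t.\tName=reversed",
]
annotated = qtl_midpoints(lines, autosomes)
```

Placing each QTL at its midpoint gives one plotting position per record; WGA-derived QTL have narrow intervals, so the midpoint is a reasonable summary.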
Only QTL annotated from WGA studies were considered, because of the large confidence intervals of QTL positions from linkage studies.
Additional file 3, Figure S35 shows a histogram of Stouffer's P-values combining whole genome association results with model MIXStrat and iHSVoight, and Additional file 3, Figure S36 shows the corresponding histogram for MIXStrat and iHS.
Because Hayes et al.  do not provide a supplemental table of iHS test statistics, we could not annotate our Manhattan plots with their results. Nevertheless, the topology of their Manhattan plot for BTA 6 is strikingly similar to our results and to the results reported by . A closer comparison of plots C and D shows that combining iHSVoight with WGA results yields poorer agreement between the selection signature and association signals than combining iHS with WGA results. This is supported by the lower correlation between the top 1% iHSVoight test statistics and the regression coefficients from WGA (Table 3).
Mixed model and method to control for stratification
The pairwise IBD matrix obtained by PLINK  based on genome wide SNP data most likely underestimated the relatedness among bulls, because the underlying algorithm estimates population allele frequencies from a presumably unrelated sample. This is supported by the observation that the average IBD estimate between 795 paternal half-sib pairs was 0.254, close to the half-sib expectation of 0.25, and not, as would be expected, elevated due to underlying distant relatedness. Stich et al.  used SPAGEDI software  to estimate the IBD matrix and noted a similar problem. SPAGEDI also assumes that random pairs of individuals are unrelated and assigns them a kinship coefficient of zero.
The "Q+K" method, proposed by , is a mixed model with Q, a matrix describing population substructure used to estimate v, the vector of population effects, and the kinship matrix K, which allows estimation of polygenic background effects based on familial relatedness from recent coancestry. The authors reported improved control of type I and type II error rates over other methods.
Applying method MIX instead of a least squares allelic regression substantially reduced the inflation factor λ for PY from 2.02 to 1.34. When we extended method MIX by Q, the matrix of population substructure based on clusters estimated using the "pairwise population concordance" criterion , λ was further reduced to 1.16 (data not shown), but stratification was still not fully controlled. The method proposed here, MIXStrat, removed the stratification (λ = 1.02) and thus showed an advantage over the "Q+K" method.
The Monte Carlo simulation confirms that the proposed MIXStrat approach correctly accounts for the stratification in the data: under the simulated H0, the observed -log P-value distribution follows its expectation even for a dataset as highly substructured as dairy cattle. Had our two-step approach resulted in overcorrection, we would expect to see deflation in the quantile-quantile plot.
Detection of Selective Sweeps
Alleles under positive selection increase in frequency in a population and leave distinct signatures in the DNA sequence. One of these population-genetics based signatures is the increased length of the haplotype carrying the advantageous allele  which is caused by a rapid rise in frequency of the mutated allele. This creates temporary LD with nearby loci. Extended haplotype homozygosity statistics  contrast this signature between the ancestral and the derived allele at each locus.
The challenge is to determine whether a signature is due to selection or to confounding effects of population demographic history, such as bottlenecks, population expansions and population subdivision, or simply due to drift in a finite population. Two striking bottlenecks were estimated by  in data from 14 European and African Bos taurus and Bos indicus cattle populations. The first and most prominent bottleneck occurred roughly 1,500 generations ago, which corresponds well with the time of domestication in cattle. The second, less pronounced bottleneck, which occurred approximately 50 - 100 generations ago, is most likely caused by breed formation. We therefore expect substantial demographic noise in our set of selection signature test statistics. Furthermore, systematic assortative mating is expected to leave signatures in the genome that can easily be mistaken for signatures of selection.
We mapped selection signatures with iHSVoight. Large negative values indicate regions in which newly derived alleles are increasing in frequency in the population. For iHSVoight, large positive test statistics point to so-called soft sweeps, i.e. sweeps from standing variation in which the ancestral allele is increasing in frequency. As changes in the selection regime of dairy cattle are well documented and make sweeps from standing genetic variation likely, we believe that it is important to consider both extreme positive and extreme negative iHS test statistics as potentially interesting regions in the cattle genome. We developed a permutation-based extension to the iHS statistic proposed by  which contrasts the minor and major allele and therefore does not require the ancestral and derived states of the alleles to be determined. Our method obtains locus specific standard deviations of iHS by simulating the null hypothesis via permutation and tests against an expectation of zero. Compared to iHSVoight , our method is more conservative for loci with low minor allele frequencies. The higher correlation coefficient between our iHS and βMIXStrat indicates that this is a consequence of a decreased rate of false positive detections rather than of reduced power. Despite successful selection signature scans in cattle, we note that protein yield is a typical quantitative trait for which selection is essentially multigenic, so that many loci are likely to undergo simultaneous selective sweeps. Chevin and Hospital  showed that for quantitative traits, selection at specific quantitative trait loci may vary strongly in time and depend on the genetic background of the trait. This can blur the signature of selection, so that the corresponding region goes undetected in a genome scan . Given the long generation intervals in cattle, the number of generations of intense artificial selection is still small, which could result in weak selection signals for alleles with small effects.
Selection signature mapping applied to livestock with similarly strong selection but shorter generation intervals could be even more powerful.
Method to combine selection signatures with association signals
We propose a novel approach to increase the power to detect association signals. In this study, the statistical power to detect an association signal was quite limited. We combined two independent sources of information for QTL detection in genome-wide studies, association and signatures of selection, and this increased power and reduced the false positive rate. Loci that explain variation in economically important traits are likely to be under selection and will often show incomplete selective sweeps. There is thus a good chance of observing extreme iHS values among loci that show association. The positive correlation of 0.446 between βMIXStrat and iHS for loci among the top 1% of iHS test statistics supports this. Many of the associations identified by our method are not yet confirmed. However, their concordance with prior results from WGA studies indicates that we were successful in detecting interesting loci. Fine mapping of QTL involves genotyping many more SNPs in the associated region, possibly supported by resequencing a subset of extreme individuals, and is often tedious and costly. It is therefore highly desirable to eliminate false positive associations before further investigation.
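One standard way to merge two independent lines of evidence at the same SNP is Stouffer's Z method. Here the two lines of evidence would be a WGA P-value and a selection-signature P-value; the reference list cites Stouffer et al. This is a minimal sketch under that assumption. It is not necessarily the exact weighting or procedure used in the study. The input P-values must lie strictly between 0 and 1 and be uniform under the null hypothesis, and the two tests must be independent.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def stouffer_combine(p_values, weights=None):
    """Combine independent P-values with Stouffer's (optionally weighted) Z method.

    Each P-value must lie strictly between 0 and 1 and be uniform under the null.
    Returns the combined z score and its one-sided P-value.
    """
    if weights is None:
        weights = [1.0] * len(p_values)
    # Map each P-value to a standard normal deviate (small P -> large z).
    z_scores = [_STD_NORMAL.inv_cdf(1.0 - p) for p in p_values]
    # Weighted sum over root of summed squared weights is N(0, 1) under the null.
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    z_comb = numerator / sum(w * w for w in weights) ** 0.5
    return z_comb, 1.0 - _STD_NORMAL.cdf(z_comb)


if __name__ == "__main__":
    # Hypothetical P-values for one SNP: association test and selection signature.
    p_wga, p_ihs = 0.01, 0.01
    print(stouffer_combine([p_wga, p_ihs]))
```

With equal weights, two P-values of 0.01 combine to a z of about 3.29, which corresponds to a P-value of roughly 5 × 10^-4. Two moderate signals can therefore reinforce each other.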
Our combined approach has the highest power at intermediate allele frequencies, because both independent sources of information (selection signature mapping and WGA) are most powerful at intermediate allele frequencies. Alleles that are prevented from going to fixation are likely either under balancing selection (heterozygote advantage) or pleiotropic, with positive effects on some traits under selection and negative effects on others. Such loci are not expected to show a signature of recent positive selection. Given the same effect size, however, WGA will have equal power to identify such loci and loci under positive selection.
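The WGA half of this argument follows from standard quantitative genetics. Under Hardy-Weinberg proportions, an additive SNP with allele frequency p and allele substitution effect a contributes 2p(1 - p)a² to the genetic variance. This contribution is largest at p = 0.5. The sketch below turns this into a normal-approximation power curve for a single-SNP regression test. The sample size, effect and residual variance are hypothetical, and the selection-signature half of the argument is not modelled.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def snp_variance(p, a):
    """Additive genetic variance of a biallelic SNP under Hardy-Weinberg proportions."""
    return 2.0 * p * (1.0 - p) * a * a


def approx_power(p, a, n, resid_var, alpha=0.05):
    """Normal-approximation power of a two-sided single-SNP regression test.

    p, a, n and resid_var are hypothetical inputs: allele frequency, allele
    substitution effect, number of animals and residual variance.
    """
    # Expected z statistic grows with sqrt(n * 2p(1-p)a^2 / residual variance).
    ncp = (n * snp_variance(p, a) / resid_var) ** 0.5
    crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    # Probability that |Z + ncp| exceeds the critical value.
    return _STD_NORMAL.cdf(ncp - crit) + _STD_NORMAL.cdf(-ncp - crit)


if __name__ == "__main__":
    for freq in (0.05, 0.1, 0.2, 0.3, 0.5):
        print(freq, round(approx_power(freq, a=0.1, n=1000, resid_var=1.0), 3))
```

The curve is symmetric in p and 1 - p, so only the minor allele frequency matters. This is consistent with working on a folded frequency spectrum.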
Combining WGA with hitchhiking mapping to identify a bona fide set of SNPs for candidate gene studies is very promising. We argue that our method improves the power of QTL detection and reduces the type I error rate by combining two independent sources of information. Our approach can of course be extended to all routinely recorded phenotypes. As a proof of principle, however, we restricted our analyses to PY. This trait has been under the most stringent selection over the last few decades, and the bulls were selectively genotyped for PY to increase the power of the whole genome association study.
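Selective genotyping (Darvasi and Soller, in the reference list) genotypes only animals from the two extremes of the phenotype distribution, where most of the information about a QTL lies. A minimal sketch of picking the tails from estimated breeding values (EBVs) is shown below. The helper name `select_tails`, the tail fraction and the EBV values are hypothetical, since the section does not state how the bulls were chosen.

```python
def select_tails(ebvs, fraction):
    """Return (low, high) index lists for the extreme tails of an EBV list.

    `fraction` is the hypothetical share of animals taken from each tail.
    """
    if not 0.0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5]")
    k = max(1, int(len(ebvs) * fraction))
    # Stable sort of animal indices by EBV, lowest first.
    order = sorted(range(len(ebvs)), key=lambda i: ebvs[i])
    return order[:k], order[-k:]


if __name__ == "__main__":
    # Hypothetical protein-yield EBVs for ten bulls.
    ebvs = [5.0, -3.0, 0.0, 8.0, -1.0, 2.0, 7.0, -6.0, 1.0, 4.0]
    low, high = select_tails(ebvs, 0.2)
    print("genotype low tail:", low, "high tail:", high)
```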
Stratification is a substantial problem in WGA studies, particularly in livestock populations. Our MIXStrat approach controls the type I error rate, albeit at the cost of reduced power.
We carried out a whole genome hitchhiking mapping study and identified roughly 1,600 SNPs displaying selection signatures, which generally agree well with the effects estimated in the WGA study. Our extension to the iHS test statistic proposed by Voight et al. reduced the false positive rate in the MAF class < 10%. However, it provides reliable P-values only after extensive Monte Carlo simulations.
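The cost of relying on Monte Carlo P-values can be made concrete. With B null simulations, the smallest attainable empirical P-value is 1/(B + 1). Assume, purely for illustration, a Bonferroni threshold of 0.05 across the 34,851 SNPs; this is not necessarily the multiple-testing procedure used in the study. Each locus would then need at least 697,019 simulations before it could reach that threshold at all:

```python
import math
from fractions import Fraction


def min_null_simulations(n_tests, alpha):
    """Smallest B such that the minimum empirical P-value 1 / (B + 1)
    reaches a Bonferroni threshold alpha / n_tests (illustrative assumption)."""
    threshold = Fraction(alpha) / n_tests
    # Need 1 / (B + 1) <= threshold, i.e. B + 1 >= 1 / threshold (exact arithmetic).
    return math.ceil(1 / threshold) - 1


if __name__ == "__main__":
    print(min_null_simulations(34_851, Fraction(1, 20)))
```

Exact `Fraction` arithmetic avoids an off-by-one caused by floating-point rounding of 0.05.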
Given the substantial increase in power and the reduction in false positive signals, we recommend using our combined strategy rather than stand-alone WGA. This is especially important in small populations, where it is not possible to genotype additional animals.
These studies were internally funded by the Technische Universität München. We thank the following artificial insemination stations for providing us with semen samples: Besamungstation J. Bauer GmbH & Co. KG, Besamungsverein Neustadt a. d. Aisch e. V., Meggle Besamungsstation Rottmoos GmbH, Niederbayerische Besamungsgenossenschaft Landshut-Pocking e. G., Prüf- und Besamungsstation München-Grub e. V., Rinderbesamungsgenossenschaft Memmingen e. G., Besamungsstation Birkenberg, Besamungsanstalt Gleisdorf, Oberösterreichische Besamungsstation, NÖ-Genetik Wieselburg. We thank T. Meitinger and P. Lichtner from the Institute of Human Genetics, Helmholtz Zentrum München, for generating and validating genotypes. MD was supported by the Austrian Science Fund (FWF): project number L403-B11 to CS. We thank three anonymous reviewers for helpful comments and criticisms on earlier versions of this manuscript. We thank R. Emmerling from the Bavarian State Research Center for Agriculture (LfL) in Poing-Grub, Germany, for the provision of EBVs.
- Maynard Smith J, Haigh J: The hitch-hiking effect of a favourable gene. Genet Res. 1974, 23: 23-35. 10.1017/S0016672300014634.
- Akey JM, Zhang G, Zhang K, Jin L, Shriver MD: Interrogating a high-density SNP map for signatures of natural selection. Genome Res. 2002, 12: 1805-14.
- Kauer MO, Dieringer D, Schloetterer C: A Microsatellite Variability Screen for Positive Selection Associated With the "Out of Africa" Habitat Expansion of Drosophila melanogaster. Genetics. 2003, 165: 1-11.
- Kimura R, Fujimoto A, Tokunaga K, Ohashi J: A practical genome scan for population-specific strong selective sweeps that have reached fixation. PLoS ONE. 2007, 2: e286. 10.1371/journal.pone.0000286.
- Payseur BA, Cutter AD, Nachman MW: Searching for Evidence of Positive Selection in the Human Genome Using Patterns of Microsatellite Variability. Mol Biol Evol. 2002, 19: 1-7. 10.1093/oxfordjournals.molbev.a003973.
- Sabeti PC, Reich DE, Higgins JM, Levine HZP, Richter DJ, Schaffner SF, Gabriel SB, Platko JV, Patterson NJ, McDonald GJ: Detecting recent positive selection in the human genome from haplotype structure. Nature. 2002, 419: 832-837. 10.1038/nature01140.
- Sabeti PC, Varilly P, Fry B, Lohmueller J, Hostetter E, Cotsapas C, Xie XH, Byrne EH, McCarroll SA, Gaudet R: Genome-wide detection and characterization of positive selection in human populations. Nature. 2007, 449: 913-918. 10.1038/nature06250.
- Schofl G, Schloetterer C: Patterns of Microsatellite Variability Among X Chromosomes and Autosomes Indicate a High Frequency of Beneficial Mutations in Non-African D. simulans. Mol Biol Evol. 2004, 21: 1-7.
- Voight BF, Kudaravalli S, Wen XQ, Pritchard JK: A map of recent positive selection in the human genome. PLoS Biol. 2006, 4: e72. 10.1371/journal.pbio.0040072.
- Wright SI, Bi IV, Schroeder SG, Yamasaki M, Doebley JF, McMullen MD, Gaut BS: The Effects of Artificial Selection on the Maize Genome. Science. 2005, 308: 1310-1314. 10.1126/science.1107891.
- Hayes BJ, Lien S, Nilsen H, Olsen HG, Berg P, Maceachern S, Potter S, Meuwissen TH: The origin of selection signatures on bovine chromosome 6. Anim Genet. 2008, 39: 105-111. 10.1111/j.1365-2052.2007.01683.x.
- Prasad A, Schnabel RD, McKay SD, Murdoch B, Stothard P, Kolbehdari D, Wang Z, Taylor JF, Moore SS: Linkage disequilibrium and signatures of selection on chromosomes 19 and 29 in beef and dairy cattle. Anim Genet. 2008, 39: 597-605. 10.1111/j.1365-2052.2008.01772.x.
- Barendse W, Harrison B, Bunch R, Thomas M, Turner L: Genome wide signatures of positive selection: The comparison of independent samples and the identification of regions associated to traits. BMC Genomics. 2009, 10: 178. 10.1186/1471-2164-10-178.
- Gibbs RA, Taylor JF, Van Tassell CP, Barendse W, Eversole KA, Gill CA, Green RD, Hamernick DL, Kappes SM, Lien S: Genome wide survey of SNP variation uncovers the genetic structure of cattle breeds. Science. 2009, 324: 528-532.
- Hayes BJ, Chamberlain AJ, Maceachern S, Savin K, McPartlan H, MacLeod I, Sethuraman L, Goddard ME: A genome map of divergent artificial selection between Bos taurus dairy cattle and Bos taurus beef cattle. Anim Genet. 2009, 40: 176-84. 10.1111/j.1365-2052.2008.01815.x.
- Flori L, Fritz S, Jaffrezic F, Boussaha M, Gut I, Heath S, Foulley JL, Gautier M: The Genome Response to Artificial Selection: A Case Study in Dairy Cattle. PLoS ONE. 2009, 4: e6595. 10.1371/journal.pone.0006595.
- Gautier M, Flori L, Riebler A, Jaffrezic F, Laloe D, Gut I, Moazami-Goudarzi K, Foulley JL: A whole genome Bayesian scan for adaptive genetic divergence in West African cattle. BMC Genomics. 2009, 10: 550. 10.1186/1471-2164-10-550.
- Qanbari S, Pimentel ECG, Tetens J, Thaller G, Lichtner P, Sharifi AR, Simianer H: A genome-wide scan for signatures of recent selection in Holstein cattle. Anim Genet. 2010, 41: 377-389.
- Gautier M, Naves M: Footprints of selection in the ancestral admixture of a New World Creole cattle breed. Mol Ecol. 2011, 20: 3128-3143. 10.1111/j.1365-294X.2011.05163.x.
- Tang K, Thornton KR, Stoneking M: A New Approach for Using Genome Scans to Detect Recent Positive Selection in the Human Genome. PLoS Biol. 2007, 5: e171. 10.1371/journal.pbio.0050171.
- Qanbari S, Gianola D, Hayes B, Schenkel F, Miller S, Moore S, Thaller G, Simianer H: Application of site and haplotype-frequency based approaches for detecting selection signatures in cattle. BMC Genomics. 2011, 12: 318. 10.1186/1471-2164-12-318.
- Fisher RA: The correlation between relatives on the supposition of Mendelian inheritance. Trans Roy Soc Edin. 1918, 52: 399-433.
- Sax K: The Association of Size Differences with Seed-Coat Pattern and Pigmentation in Phaseolus vulgaris. Genetics. 1923, 8: 552-560.
- Hu ZL, Reecy JM: Animal QTLdb: beyond a repository. A public platform for QTL comparisons and integration with diverse types of structural genomic information. Mamm Genome. 2007, 18: 1-4. 10.1007/s00335-006-0105-8.
- Khatkar MS, Thomson PC, Tammen I, Raadsma HW: Quantitative trait loci mapping in dairy cattle: review and meta-analysis. Genet Sel Evol. 2004, 36: 163-190. 10.1186/1297-9686-36-2-163.
- Polineni P, Aragonda P, Xavier SR, Furuta R, Adelson DL: The Bovine QTL Viewer: A Web Accessible Database Of Bovine Quantitative Trait Loci. BMC Bioinformatics. 2006, 7: 283. 10.1186/1471-2105-7-283.
- Daetwyler HD, Schenkel FS, Sargolzaei M, Robinson JAB: A Genome Scan to Detect Quantitative Trait Loci for Economically Important Traits in Holstein Cattle Using Two Methods and a Dense Single Nucleotide Polymorphism Map. J Dairy Sci. 2008, 91: 3225-3236. 10.3168/jds.2007-0333.
- Pausch H, Flisikowski K, Jung S, Emmerling R, Edel C, Gotz KU, Fries R: Genomewide Association Study Identifies Two Major Loci Affecting Calving Ease and Growth Related Traits in Cattle. Genetics. 2010, 187: 289-97.
- Pryce JE, Bolormaa S, Chamberlain AJ, Bowman PJ, Savin K, Goddard ME, Hayes BJ: A validated genome-wide association study in 2 dairy cattle breeds for milk production and fertility traits using variable length haplotypes. J Dairy Sci. 2010, 93: 3331-3345. 10.3168/jds.2009-2893.
- Goddard ME, Hayes BJ: Mapping genes for complex traits in domestic animals and their use in breeding programmes. Nat Rev Genet. 2009, 10: 381-391. 10.1038/nrg2575.
- MacLeod IM, Hayes BJ, Savin KW, Chamberlain AJ, McPartlan HC, Goddard ME: Power of a genome scan to detect and locate quantitative trait loci in cattle using dense single nucleotide polymorphisms. J Anim Breed Genet. 2010, 127: 133-142. 10.1111/j.1439-0388.2009.00831.x.
- Grossman SR, Shylakhter I, Karlsson EK, Byrne EH, Morales S, Frieden G, Hostetter E, Angelino E, Garber M, Zuk O, et al: A Composite of Multiple Signals Distinguishes Causal Variants in Regions of Positive Selection. Science. 2010, 327: 883-886. 10.1126/science.1183863.
- Akey JM, Ruhe AL, Akey DT, Wong AK, Connelly CF, Madeoy J, Nicholas TJ, Neff MW: Tracking footprints of artificial selection in the dog genome. PNAS. 2010, 107: 1160-5. 10.1073/pnas.0909918107.
- Ayodo G, Price AL, Keinan A, Ajwang A, Otieno MF, Orago ASS, Patterson N, Reich D: Combining Evidence of Natural Selection with Association Analysis Increases Power to Detect Malaria-Resistance Variants. Am J Hum Genet. 2007, 81: 234-242. 10.1086/519221.
- Darvasi A, Soller M: Selective genotyping for determination of linkage between a marker locus and a quantitative trait locus. Theor Appl Genet. 1992, 85: 353-359.
- Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de Bakker PI, Daly MJ, et al: PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007, 81: 559-575. 10.1086/519795.
- Matukumalli LK, Lawley CT, Schnabel RD, Taylor JF, Allan MF, Heaton MP, O'Connell J, Moore SS, Smith TPL, Sonstegard TS, et al: Development and Characterization of a High Density SNP Genotyping Assay for Cattle. PLoS ONE. 2009, 4: e5350. 10.1371/journal.pone.0005350.
- Scheet P, Stephens M: A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase. Am J Hum Genet. 2006, 78: 629-644. 10.1086/502802.
- Aulchenko YS, de Koning D-J, Haley C: Genomewide Rapid Association Using Mixed Model and Regression: A Fast and Simple Method for Genomewide Pedigree-Based Quantitative Trait Loci Association Analysis. Genetics. 2007, 177: 1-9.
- Yu J, Pressoir G, Briggs WH, Vroh B, Yamasaki M, Doebley JF, McMullen MD, Gaut BS, Nielsen DM, Holland JB, et al: A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat Genet. 2006, 38: 203-208. 10.1038/ng1702.
- Amin N, van Duijn CM, Aulchenko YS: A genomic background based method for association analysis in related individuals. PLoS ONE. 2007, 2: e1274. 10.1371/journal.pone.0001274.
- Lynch M, Walsh B: Genetics and Analysis of Quantitative Traits. 1997, Sinauer Assoc., Sunderland
- Stouffer SA, Suchman EA, DeVinney LC, Star SA, Williams RM: The American Soldier, Vol.1: Adjustment during Army Life. 1949, Princeton (NJ): Princeton University Press
- Strimmer K: fdrtool: a versatile R package for estimating local and tail area-based false discovery rates. Bioinformatics. 2008, 24: 1461-1462. 10.1093/bioinformatics/btn209.
- Huff CD, Harpending HC, Rogers AR: Detecting positive selection from genome scans of linkage disequilibrium. BMC Genomics. 2010, 11: 8. 10.1186/1471-2164-11-8.
- Banos G, Woolliams JA, Woodward BW, Forbes AB, Coffey MP: Impact of Single Nucleotide Polymorphisms in Leptin, Leptin Receptor, Growth Hormone Receptor, and Diacylglycerol Acyltransferase (DGAT1) Gene Loci on Milk Production, Feed, and Body Energy Traits of UK Dairy Cows. J Dairy Sci. 2008, 91: 3190-3200. 10.3168/jds.2007-0930.
- Grisart B, Coppieters W, Farnir F, Karim L, Ford C, Berzi P, Cambisano N, Mni M, Reid S, Simon P, et al: Positional candidate cloning of a QTL in dairy cattle: identification of a missense mutation in the bovine DGAT1 gene with major effect on milk yield and composition. Genome Res. 2002, 12: 222-231. 10.1101/gr.224202.
- Kaupe B, Winter A, Fries R, Erhardt G: DGAT1 polymorphism in Bos indicus and Bos taurus cattle breeds. J Dairy Res. 2004, 71: 182-187. 10.1017/S0022029904000032.
- Naslund J, Fikse WF, Pielberg GR, Lunden A: Frequency and Effect of the Bovine Acyl-CoA:Diacylglycerol Acyltransferase 1 (DGAT1) K232A Polymorphism in Swedish Dairy Cattle. J Dairy Sci. 2008, 91: 2127-2134. 10.3168/jds.2007-0330.
- Winter A, Kramer W, Werner FA, Kollers S, Kata S, Durstewitz G, Buitkamp J, Womack JE, Thaller G, Fries R: Association of a lysine-232/alanine polymorphism in a bovine gene encoding acyl-CoA:diacylglycerol acyltransferase (DGAT1) with variation at a quantitative trait locus for milk fat content. PNAS. 2002, 99: 9300-9305. 10.1073/pnas.142293799.
- Scotti E, Fontanesi L, Schiavini F, La Mattina V, Bagnato A, Russo V: DGAT1 p.K232A polymorphism in dairy and dual purpose Italian cattle breeds. Ital J Anim Sci. 2010, DOI: 10.4081/ijas.2010.e16
- Fontanesi L, Scotti E, Pecorari D, Zambonelli P, Bigi D, Dall'Olio S, Davoli R, Lipkin E, Soller M, Russo V: The BovMAS Consortium: investigation of bovine chromosome 14 for quantitative trait loci affecting milk production and quality traits in the Italian Holstein Friesian breed. Ital J Anim Sci. 2010, DOI: 10.4081/ijas.2005.2s.16
- Kuhn C, Thaller G, Winter A, Bininda-Emonds OR, Kaupe B, Erhardt G, Bennewitz J, Schwerin M, Fries R: Evidence for multiple alleles at the DGAT1 locus better explains a quantitative trait locus with major effect on milk fat content in cattle. Genetics. 2004, 167: 1873-1881. 10.1534/genetics.103.022749.
- Bagnato A, Schiavini F, Rossoni A, Maltecca C, Dolezal M, Medugorac I, Soelkner J, Russo V, Fontanesi L, Friedmann A, et al: Quantitative trait loci affecting milk yield and protein percentage in a three-country Brown Swiss population. J Dairy Sci. 2008, 91: 767-783. 10.3168/jds.2007-0507.
- Bagnato A, Schiavini F, Dolezal M, Dubini S, Rossoni A, Maltecca C, Santus E, Medugorac I, Soelkner J, Fontanesi L, et al: The BovMAS Consortium: identification of QTL for milk yield and milk protein percent on chromosome 14 in the Brown Swiss breed. Ital J Anim Sci. 2010, DOI: 10.4081/ijas.2005.2s.13
- Hayes BJ, Goddard ME: The distribution of the effects of genes affecting quantitative traits in livestock. Genet Sel Evol. 2001, 33: 209-229. 10.1186/1297-9686-33-3-209.
- Hu ZL, Park CA, Fritz ER, Reecy JM: QTLdb: A comprehensible database tool building bridges between genotypes and phenotypes. Proceedings of the 9th World Congress on Genetics Applied to Livestock Production. Leipzig, Germany 2010. Edited by: German Society for Animal Science. 2010, [http://www.kongressband.de/wcgalp2010/assets/html/0017.htm]
- Cattle QTLdb. 2011, [http://www.animalgenome.org/cgi-bin/QTLdb/BT/index]
- Sodeland M, Grove H, Kent M, Taylor S, Svendsen M, Hayes BJ, Lien S: Molecular characterization of a long range haplotype affecting protein yield and mastitis susceptibility in Norwegian Red cattle. BMC Genet. 2011, 12: 70.
- Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ, et al: PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses. Am J Hum Genet. 2007, 81: 559-575. 10.1086/519795.
- Stich B, Mohring J, Piepho H-P, Heckenberger M, Buckler ES, Melchinger AE: Comparison of Mixed-Model Approaches for Association Mapping. Genetics. 2008, 178: 1745-1754. 10.1534/genetics.107.079707.
- Hardy OJ, Vekemans X: SPAGeDi: a versatile computer program to analyse spatial genetic structure at the individual or population levels. Mol Ecol Notes. 2002, 2: 618-620. 10.1046/j.1471-8286.2002.00305.x.
- Gautier M, Faraut T, Moazami-Goudarzi K, Navratil V, Foglio M, Grohs C, Boland A, Garnier J-G, Boichard D, Lathrop GM, et al: Genetic and Haplotypic Structure in 14 European and African Cattle Breeds. Genetics. 2007, 177: 1059-1070. 10.1534/genetics.107.075804.
- Chevin LM, Hospital F: Selective sweep at a quantitative trait locus in the presence of background genetic variation. Genetics. 2008, 180: 1645-1660. 10.1534/genetics.108.093351.
- Donnelly P: Progress and challenges in genome-wide association studies in humans. Nature. 2008, 456: 728-731. 10.1038/nature07631.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.