Selection for complex traits leaves little or no classic signatures of selection
© Kemper et al.; licensee BioMed Central Ltd. 2014
Received: 3 January 2014
Accepted: 20 March 2014
Published: 28 March 2014
Skip to main content
© Kemper et al.; licensee BioMed Central Ltd. 2014
Received: 3 January 2014
Accepted: 20 March 2014
Published: 28 March 2014
Selection signatures aim to identify genomic regions underlying recent adaptations in populations. However, the effects of selection in the genome are difficult to distinguish from random processes, such as genetic drift. Often associations between selection signatures and selected variants for complex traits is assumed even though this is rarely (if ever) tested. In this paper, we use 8 breeds of domestic cattle under strong artificial selection to investigate if selection signatures are co-located in genomic regions which are likely to be under selection.
Our approaches to identify selection signatures (haplotype heterozygosity, integrated haplotype score and F ST ) identified strong and recent selection near many loci with mutations affecting simple traits under strong selection, such as coat colour. However, there was little evidence for a genome-wide association between strong selection signatures and regions affecting complex traits under selection, such as milk yield in dairy cattle. Even identifying selection signatures near some major loci was hindered by factors including allelic heterogeneity, selection for ancestral alleles and interactions with nearby selected loci.
Selection signatures detect loci with large effects under strong selection. However, the methodology is often assumed to also detect loci affecting complex traits where the selection pressure at an individual locus is weak. We present empirical evidence to suggests little discernible ‘selection signature’ for complex traits in the genome of dairy cattle despite very strong and recent artificial selection.
Evolutionary change in a population, in response to a change in environment, consists of an increase in the frequency of favourable mutations. If the mutation was recent and the selection is strong, all alleles on the same chromosome segment as the mutant allele will increase in frequency by hitchhiking, generating a characteristic selection sweep or selection signature . On the other hand, if selection at individual loci is weak or if the mutation is old, and therefore part of the standing variation when selection commences, little evidence of the selection may be left in the genome e.g. . Many statistics have been proposed to detect signatures of selection but they all suffer from a severe problem – the distribution of the statistic under the null hypothesis of no selection is usually unknown. This is because the distribution depends on the demography of the population, including changes in effective population size and migration, which are difficult to define. Consequently, no formal test that a statistic comes from the null distribution is possible. Generally, the most extreme values of the statistic are simply assumed to be due to selection and there have been many papers claiming to find evidence for signatures of selection. The evidence for selection sweeps at a small number of loci, such as for lactase persistence in humans  and skin wrinkling in Shar-Pei dogs , is well documented and convincing, but in other cases it is hard to evaluate the strength of evidence. Certainly the evidence and persuasiveness of authors advocating adaptation via standing polymorphisms is increasing [5–7] and the influential paradigm of ‘hard sweep’ selection signatures is beginning to lose favour as the primary mechanism of adaptation .
In this study we have taken a different approach – we study sites in the genome at which we know selection has occurred to see if a signature of selection has been left behind. By studying a variety of selected loci, we are able to describe when a selection signature is generated and when it is not. Domestic cattle have been under quite strong, recent and well documented selection for several traits and hence their genomes should contain evidence of this selection. We use 8 domestic Bos taurus cattle breeds and three types of loci which have been under selection: type 1 loci are genes that are part of the definition of a breed, such as absence of horns and coat colour; type 2 loci have a large effect on quantitative traits, such as stature and milk yield, and type 3 loci are quantitative trait loci (QTL) for milk production traits in dairy cattle. We consider two statistics that indicate selection signatures within a breed and F ST (which indicates a difference between breeds in a segment of the genome that could be caused by different selection histories between the breeds). Our results show clear signatures of selection when intense selection has been applied to a single locus because it causes a trait defining the breed such as coat colour. However, we find weak evidence for selection signatures at regions of the genome associated with complex traits under selection. This paper calls into question the reliability of selection signatures to identify mutations affecting complex traits under selection and provides empirical evidence for the ability to generate substantial genetic change between populations in complex traits without clear evidence for classic selection signatures.
The dataset consists of 23,641 domestic cattle with > 610,123 (real or imputed) genome-wide autosomal SNP from 8 B. taurus breeds. Breeds were of European origin and have had previous, recent selection for milk (Holstein, Jersey) or meat (Angus, Charolais, Hereford, Limousin, Murray Grey, Shorthorn) production. There were between of 61 (Limousin) and 13,501 (Holstein) animals genotyped per breed.
Three statistics were calculated to test for evidence of selection: a modified version of Depaulis-Veuille’s H-test (referred to as haplotype homozygosity, HAPH) , the integrated haplotype score (|iHS|) , and Wright’s measure of population differentiation (F ST ). The measure of haplotype homozygosity (HAPH) measures selection within breed and is defined as the variance of haplotypes frequencies at a particular position in the genome, i.e. where p i is the (within breed) frequency of the i th haplotype and N is the total number of haplotypes at the position. The haplotypes consist of 30 or 31 consecutive SNPs. This statistic is high if one or more haplotypes are at high frequency while most haplotypes exist at low frequency. Similarly, |iHS| identifies within breed selection and SNP where one allele is found on one or few long haplotypes whereas the other allele is associated with many haplotypes. Both HAPH and |iHS| are efficient for identification of sweeps which have not yet reached fixation, an essential feature for an association with type 3 loci (i.e. genomic regions with segregating mutations for complex traits under selection). In contrast, the F ST measurement is most efficient when there are large allele frequency differences between pairs of breeds. Selection is indicated by high values of F ST near the selected mutations because, for example, a population in which selection has taken place is expected to differ from other populations (that have not undergone the same selection) in the allele frequency for markers near the mutation.
The 3 measures of selection were calculated in 250 kb windows across the genome, where the value for each window was the mean HAPH, the maximum observed |iHS| or the average between breed F ST . To correct for average differences within and between breeds for HAPH and F ST , the values are standardised by dividing the window value by the mean value for all windows. Consequently the standardised estimates of selection have a mean of 1. |iHS| was calculated following , and is thus standardised such that |iHS| can be interpreted as standard deviations from the mean. The estimates (per window) of HAPH, |iHS| and breed comparisons for F ST are given in Additional file 1 (where Additional file 2 provides definitions of the columns for Additional file 1). We examined the 5% of the genome with the strongest evidence for selection.
Breed-defining loci (type 1) and large effect QTL (type 2) were identified from the literature and the Online Inheritance in Animals database . For type 3 loci, we used the Holstein and Jersey breeds to identify QTL regions in the genome for milk production traits using the ‘genomic selection’ methodology . These two breeds have been under strong selection for milk production for at least the last 100 years  and especially since the 1970’s (Additional file 3: Figure S1-S3). In genomic selection, the prediction of genetic merit is a linear regression in which each SNP genotype is multiplied by the estimated effect of a SNP and summed to yield an estimated breeding value (EBV) for the animal. In our case, we want to attach variation in the trait to each chromosome segment. Thus we estimated the effect of each SNP using the genomic selection methodology and then calculated the variance across animals for a local 250 kb EBV e.g. . The 5% of windows with the highest variance were considered to have QTL and defined as type 3 loci.
Description of (type 1) loci with a large effects on breed-defining traits, such as coat colour, in in domestic cattle and likely to be segregating in our populations
Determines the presence and absence of horns. Two identified alleles: P C (Celtic-origin) a 212 bp insertion-deletion at 1.706 Mbp; and P F (Holstein Friesian-origin) which segregates as a 260 kb haplotype (from 1.649 – 1.989 Mbp) in Holstein and Jersey [18, 19]. No known associated gene. Most domestic cattle are horned but Angus and Murray Grey breeds are exclusively polled and the POLLED locus segregates in other breeds.
BTA18 14.75 Mbp
The main determinant of coat colour in cattle . Two identified alleles: E D (p.L99P) which produces a black coat; and e (inducing a premature stop codon) which is recessive produces a red coat when homozygous .
BTA5 57.67 Mbp
BTA6 71.85 Mbp
Locus associated with piebald colour in Hereford  and degree of white-spotting in Holstein . No known causative mutations but the different coat colour patterns in these breeds, suggests different KIT mutations.
BTA5 18.34 Mbp
Evidence for within and between breed selection at breed-defining (type 1) loci
Evidence for selection*
Differentiation between breeds3
1. Holstein with Angus, Murray Grey and Limousin
1. Breeds with black (E D ) allele (Holstein, Angus, Murray Grey) with breeds with recessive red (e) allele (Charolais, Limousin, Shorthorn, Hereford)
Additional file 3: Figure S4
2. Jersey (E+ allele) with all other breeds, except Hereford
1. Charolais with all other breeds
Additional file 3: Figure S5
2. Murray Grey with all breeds, excluding Jersey
3. Shorthorn and Jersey
1. Hereford with all other breeds.
Additional file 3: Figure S6
2. Holstein with all breeds, except Jersey
3. Shorthorn with all breeds, except Jersey
4. Jersey with Angus, Charolais and Limousin
1. Hereford will all other breeds, except Murray Grey
Additional file 3: Figure S7
2. Murray Grey and Charolais with each other, and with Holstein, Angus and Limousin
3. Shorthorn with Augus
The observed frequency of the selected haplotype played an important role in determining the ability of the three test statistic to indicate selection. At POLLED, for example, neither HAPH nor |iHS| indicated evidence of within breed selection in Murray Grey despite all animals of this breed being polled. This is because this region is homozygous in Murray Grey and neither of these statistics indicates selection in homozygous regions, being either undefined (|iHS|) or with values close to zero (HAPH). Further at PMEL, long selected haplotypes were indicated by HAPH and F ST in Murray Grey but there was no |iHS| selection signature near the locus. The results show that F ST is most efficient when the region is near fixation (homozygous) in alternate breeds, |iHS| is most efficient for intermediate frequency (or segregating) variants  and HAPH is midway between the two measures.
The mode of action and favoured phenotype also determined if loci indicated selection. In Shorthorn, for example, there was no within breed selection signature near KITLG despite a roan coat (where white hairs are intermingled with coloured hairs) being a characteristic of this breed . This can be explained by balancing selection, where heterozygotes express the roan phenotype and homozygotes have either a solid coloured or white coat, which would not be efficiently detected by any method. There was also evidence for a within breed selection near KITLG in Hereford. Herefords do not have a roan phenotype and, considering results in Fleckvieh cattle , this may indicate that a KITLG mutation contributes to the characteristic white spotting pattern seen in Hereford and Fleckvieh.
Description of (type 2) loci with large effects on complex traits under selection in domestic cattle, such as milk and meat yield, and likely to be segregating in our populations
BTA14 25.00 Mbp
Region affecting many traits, including stature  and fertility . Originally identified in Jersey-Holstein cross, Jersey are thought to be near fixation for the ancestral allele while Holstein and other breeds are near fixation for the alternate allele [29, 32].
BTA14 1.80 Mbp
Dinucleotide substitution causing a lysine to alanine substitution (p.K232A) , where the mutant A allele decreases fat yield, and increases protein yield and milk volume [34, 35]. The mutant DGAT A allele is at high frequency or fixed in Hereford, Angus and Charolais; and at lower frequencies in Holstein and Jersey .
BTA20 32.05 Mbp
A SNP mutation causing a missense phenylalanine to tyrosine substitution (p.F279Y). Effects on milk volume and composition .
BTA6 37.97 Mbp
A SNP mutation causes a missense tyrosine to serine (p.Y581S) mutation which increases milk yield and decreases milk solids . Identified in Israeli Holsteins where the frequency of the ABCG2 C allele had increased in response to selection for milk yield and then decreased when selection changed to focus on increased milk solids . The ABCG2 C allele is at low frequencies (< 10%) in US and German Holsteins, Angus, British Frisian, Charolais and Hereford .
BTA2 6.22 Mbp
A negative regulator of muscle development, multiple mutations have been described that cause ‘double muscling’ or extreme muscular hypertrophy [32, 39, 40]. In Limousin, a mutation associated with a mild increase in muscling, F94L, has been identified .
Evidence for selection and quantitative trait loci (QTL) at major loci affecting complex traits (type 2 loci)
Evidence for selection*
Evidence for dairy QTL**
Differentiation between breeds3
1. Jersey with all other breeds
2. Limousin with all breeds, except Hereford
3. Hereford with all breeds, except Limousin and Angus
4. Murray Grey with all breeds, except Shorthorn and Holstein
1. Holstein or Jersey with Charolais, Limousin, Hereford and Shorthorn
Holstein and Jersey: Milk yield, fat yield, protein yield, FPC and PPC.
Additional file 3: Figure S8
2. Murray Grey with Hereford
1. Holstein with Jersey, Charolais & Limousin
Holstein: Milk yield, fat yield, protein yield, FPC and PPC.
Additional file 3: Figure S9
2. Angus with Jersey, Charolais & Murray Grey
Jersey: Milk yield, FPC and PPC.
3. Jersey with Holstein, Angus & Shorthorn
1. All contrasts between Jersey, Hereford and Charolais
Holstein: Fat yield, protein yield and PPC.
Additional file 3: Figure S10
1. Limousin with all other breeds
Additional file 3: Figure S11
Aligning selection signatures and QTL in dairy cattle was also not always straight forward. Sometimes this was because alleles did not segregate within the dairy breeds and sometimes because recent selection was for the ancestral (rather than the derived) allele. For example, there was no stature QTL for Holstein or Jersey near PLAG1 because Jerseys have a high frequency of the ancestral allele and Holstein have a high frequency of the (proposed) mutant allele . Further, our QTL results confirm the segregation of the DGAT1 mutation in both dairy breeds (Jersey and Holstein) but DGAT1 showed within breed selection signatures only in the beef breeds. It is possible that selection some time ago was for the mutant allele (in both dairy and beef cattle) because it increased milk volume but more recent selection in Jersey and Holstein has been for the ancestral allele because it increases milk fat. Thus the recent selection in dairy breeds is not detected within either Jerseys or Holsteins because selection has been for the ancestral allele which is likely to be carried on a variety of haplotype backgrounds and so is unlikely to show a discernible selection signature.
Type 3 loci are regions of the genome which show genetic variation in Holstein and Jersey cattle for 7 different production traits (fat, milk and protein yield; stature; fertility; and percentage of fat and protein in milk). Most of these traits have been under strong recent selection (Additional file 3: Figure S1-S3). We used a chi-squared test to investigate if there was greater overlap, than expected by chance, between the windows identified as containing QTL (i.e. type 3 loci, top 5% of windows with the highest variance) and windows identified with selection signatures (i.e. top 5% of HAPH, |iHS| or F ST values). The within breed measures of selection (HAPH, |iHS|) assess haplotype frequencies and should be efficient at detecting on-going recent selection while, in contrast, high F ST between dairy by beef breeds will identify areas of the genome where there is differentiation between dairy and beef breeds, but not within either group.
Association between measures of selection and genome-wide quantitative trait loci (i.e. type 3 loci) in Holstein and Jersey cattle
(a) QTL Holstein
(b) QTL Jersey
(c) QTL Holstein
(d) QTL Jersey
(e) QTL Holstein or Jersey
F ST Dairy vs. Beef
(f) QTL Holstein
Windows with high F ST values between beef and dairy breeds were not enriched for QTL affecting production traits (Table 5) even when the proportion of the genome considered was increased to 20%. Thus despite many generations of selection for increased milk production in dairy cattle, we do not find big differences in allele frequency between beef and dairy breeds near QTL for milk production. This may indicate that genetic drift between beef and dairy breeds is greater than the effects of selection. Our finding are in contrast to other studies , which used fewer SNP and fewer breeds than in the current analysis. However, windows containing QTL in Holstein were significantly over-represented (by 1.8 - 2.1 times) in the windows with QTL for the same trait in Jersey (Bonferroni corrected; P < 0.05), for all traits except fertility. Thus at least some QTL appear to segregate in both breeds. If the same alleles segregate in both breeds, this implies that either the polymorphisms existed since before the breeds diverged or it may be the result of admixture among our dairy cattle populations. Given that some QTL segregate across breeds, it is perhaps surprising that selection has not caused both dairy breeds to differ from the beef breeds as measured by F ST .
Genomic regions with evidence of recent selection using haplotype homozygosity
Sweep location & size (Mbp)
Type 1 & 2 loci
Six of the 25 long regions of high HAPH could be ascribed to the type 1 and type 2 loci. The strong selection sweep on BTA13 in Shorthorn contains the agouti (ASIP) locus (Table 6), which is known to affect coat colour in several species . However, phenotypic expression of ASIP requires an agouti-susceptible allele at MC1-R, such as the wild-type E + allele found in Jerseys . Thus most of our other breeds will not show a coat colour phenotype from ASIP mutations. There seems to be a selected mutation specific to British breeds (i.e. Shorthorn, Angus, Murray Grey and Hereford; Additional file 3: Figure S13) and, although ASIP mutations are unlikely to affect coat colour in these cattle, the locus may have affected coat colour in ancestors without the MC1-R mutation or the mutation may affected other traits such as fatness and homeostasis .
Other strong selection sweeps for several breeds were located on BTA 16 (41 – 47 Mbp) and BTA 7 (42 – 47 Mbp) (Table 6). However, unlike the ASIP region, F ST in these two regions did not indicate clear differentiation patterns between the breeds and breeds within the selected group frequently differed from each other. The selected region on BTA7 was particularly gene dense and includes, among others, 23 olfactory receptor loci. Interestingly, this region was also identified in an independent study of Fleckvieh cattle . The large sweep identified in Shorthorn on BTA3 (69.75 – 88.4 Mbp) contains LEPR (leptin receptor, 80.1 Mbp) which has been reported to be associated with multiple growth and fatness traits in beef cattle . The longest identified selected region in Holstein, where we had the largest number of genotyped animals (n = 13,501), was on BTA26. In a region also supported by a high |iHS| value, a promising candidate is FGF8 (fibroblast growth factor 8 (androgen-induced)) (Additional file 3: Figure S14). There is functional evidence for the involvement of FGF8 in lactation, as it has been found to be highly expressed in lactating (human) breast tissue and milk . The selection signature on BTA3 was also identified by Stella et al.. The region contains SLC35A3 (solute carrier family 35 (UDP-N-acetylglucosamine (UDP-GlcNAc) transporter), member A3; at 43.4 Mbp) which is the gene at which a recessive lethal mutation causes complex vertebral malformations (CVM) in Holstein cattle . A lethal recessive mutation would not cause the type of selection signature detected here but selection at a nearby linked locus could explain why the mutation in SLC35A3 has drifted to high frequency.
Some of the long selection sweeps reported in Table 6 could be the result of random processes, such as genetic drift or demographic changes, rather than selection. However, we find that strong selection (or ‘hard’) sweeps are relatively rare in our 8 breeds of cattle. This is despite strong, recent selection for numerous traits and particularly for milk production traits in our dairy breeds. Thus one can conclude the substantial genetic improvement in milk yield in dairy cattle has not generated many clear signatures of selection.
We searched for selection signatures at locations in the genome which were likely to be under selection using dense SNP genotypes in the genomes of 8 domestic B. taurus cattle breeds. The evidence is consistent with one or more mutant alleles having been selected to high frequency in some of the eight breeds for some of loci we investigated. Consistent with a ‘hard sweep’ model of selection, the breeds carrying the mutant allele show a common long haplotype (indicated by high values of HAPH) and a large genetic distance (F ST ) from the breeds carrying the ancestral allele or a different mutant allele in the region. We clearly observed this type of selection pattern at PMEL and MSTN. However, selection signatures at loci with a large effect on complex traits under selection (type 2 loci) were weaker, and almost absent for most QTL for traits under selection (type 3 loci). How can these results be explained?
A classic ‘hard sweep’ is expected when the environment changes such that a mutation that would previously been detrimental becomes favourable. Typically there is a lag and then the frequency of the favoured allele increases slowly until it reaches a modest frequency after which it is swept quickly to fixation. This is the pattern seen, for instance, in insecticide resistance . Our data on POLLED, MC1-R, KIT, KITLG, PMEL, PLAG1 and MSTN are consistent with this explanation although here the changed ‘environment’ is one in which cattle owners control which animals will be allowed to breed. The selected mutations were probably deleterious in the wild and this natural selection may still operate in domestic cattle along with the artificial selection applied by cattle owners. Therefore to drive a mutation rapidly to high frequency, artificial selection must be strong and natural selection weak. This combination is likely for some coat colour mutations – if a breed is defined to be red, then selection for a red mutation will be very strong while natural selection against the mutation may be weak, particularly if natural selection was related to environmental factors that have been reduced through the process of domestication (i.e. camouflaged from predators).
On the other hand, mutations with a large effect on growth, reproduction or milk production are likely to have detrimental side effects even under domestication. Pleiotropy is commonly observed for large-effect mutations, such as PLAG1 affecting fertility and stature  or DGAT1 affecting both milk volume and solids (fat and protein) , and it is unlikely that the overall effect of a particular mutation would always be favourable. Consequently, few mutations affecting these types of traits will be driven rapidly to high frequency and leave a clear selection signature. Occasionally large-effect mutations with small or inconspicuous pleiotropic effects are observed as under strong selection. We observed strong selection in Limousin at MSTN and there is strong, recent selection near the PLAG1 region in Brahman cattle despite its negative effects on fertility .
Thus the results for type 1, 2 and 3 loci are best reconciled by considering the selection on each locus. Selection for simple (monogenic) traits applies strong selection pressure to a mutation and the results are consistent with a ‘hard sweep’ model of selection. However, complex traits in our data were not associated with classic selection signatures and ‘hard sweeps’ are relatively rare despite the recent selection for milk traits in our dairy cattle. This suggests the selection response is caused by weak selection at many sites across the genome, probably for previously segregating variants. Weak selection is expected since each QTL has a small effect the on phenotype e.g. [51, 52]. Since there are many loci, each with small effect, selection will not change the allele frequency rapidly and there will be little evidence of a selection sweep. Small changes to allele frequencies at many loci can combine to make large changes to a phenotype, consistent with the large selection response observed for the complex traits in our data. The ability to detect selection sweeps would be further hampered if selection was conducted on genetic variants already segregating in the population. Innan & Kim , for example, find the initial frequency of the selected alleles to be one of the primary determinants for the ability to detect a selection event using classic selection signatures.
The explanation of weak selection on old genetic variation for complex traits, although speculative, is supported by other evidence. One key and consistent observation in support of selection on standing variants is the rapid and immediate response to selection observed for most (if not all) heritable characters in domestic and experimental populations . This supports frequency changes to mutations already segregating in the population because, given the rapid response, there is insufficient time for accumulation of new favourable mutations. The selection response does not usually show an acceleration, as seen with insecticide resistance, but is approximately linear and can be predicted from estimates of the genetic variance prior to selection. Nor does the selection response diminish and reach a plateau e.g. , except in small populations, indicating that few genes of large effect have reached fixation. Historically, debate on the mutations underlying the response to selection was divided by strong selection at a few loci or relatively weak selection at many loci. However in Holstein, for example, there has been large increases in milk production with very few ‘hard sweeps’ observed in the genome and few observations of large-effect QTL.
Although we show that most selection for complex traits does not leave a classic signature of selection, we do not imply that selection does not change the allele frequency at sites causing variation in complex traits. Turchin et al. show that mutations affecting human height have been subject to selection because, at many loci, the alleles for increased height have higher frequency in northern than in southern Europe. However, Turchin et al. present no evidence that a selection signature could be discerned if the sites associated with variation in height were not already known. In human height and in cattle milk yield, selection has no doubt changed allele frequencies at causal loci but not enough to leave a selection signature that is recognisable in the absence of prior knowledge of loci associated with height or milk yield or indeed most complex traits. An implication of this conclusion is that searching for classic selection signatures is not a powerful method to map genes for complex traits even if the traits have been under selection.
Identification of genomic regions under selection for complex traits requires approaches more sensitive to detect subtle changes in allele frequencies over time and with greater flexibility to detect selection on segregating variants. At least in domestic animals, the explicit use of the pedigree structure in may be more appropriate to detect genomic regions responsible for recent selection e.g. [57, 58]. We did find a weak association between selection signatures (|iHS|) and QTL for milk production traits by considering 20% of the genome. However, finding such a weak association over such a large part of the genome is not very useful in practice. This weak association occurred despite the advantages of using genomic selection methodologies to identify QTL . For example, compared to single SNP regressions, our approach to identify QTL can capture a higher proportion of the genetic variance  and has an improved ability to account for population stratification .
The detection of clear selection signatures is compromised by a number of other factors that are illustrated by the individual loci that we examined. There are many traits subject to natural and artificial selection and many genes affect each trait. Therefore the genome contains many possible sites of selection and this complicates the interpretation of the data. For instance, we examined the region surrounding ABCG2 but may well have detected selection at NCAPG-LCORL. The large number of loci segregating for many traits possibly also leads to complex results on BTA20 where there are > 1 QTL for milk production . Also multiple alleles at a locus under selection seems to be common and could cloud the interpretation. We found or confirmed multiple alleles at POLLED, MC1-R, KIT, KITLG and PMEL. Migration or introgression of a selected mutation from one breed to another leaves an unusual selection signature as shown by PLAG1 in Limousin where F ST between Limousin and other breeds is high except at the position of the selected mutation. This pattern is expected if the common ancestor of all PLAG1 mutant alleles in Limousin is a Limousin haplotype that differs except at the PLAG1 mutation from haplotypes in other breeds carrying the same mutation. In the case of DGAT1 there has been recent selection for the ancestral allele after possible earlier selection for the mutant. Thus many of the small sample of genes studied display properties that complicate the interpretation of the data and decrease our ability to find clear evidence of classic selection signatures.
We conclude that the conditions that give rise to a clear selection signatures (i.e. strong selection for a mutation that would previously have been detrimental) are rare. More usually the response to selection is based on small frequency changes at many loci that were already polymorphic in the population before selection began. Consequently, many of the claims for identifying loci affecting complex traits using selection signatures must be treated with caution.
We obtained real and imputed Illumina Bovine high-density genotypes from 8 cattle selected primarily for dairy or beef production (dairy breeds: Holstein, Jersey; Beef breeds: Angus, Charolais, Limousin, Hereford, Murray Grey, Shorthorn). Sliding windows of 250 kb were constructed across the genome, where each 250 kb length was separated by 50 kb. A window size of 250 kb was chosen because its approximate time to coalescence is 2,000 years (i.e. 1/0.0025 Morgan = 400 generations or 2,000 years assuming 5 years per generation; following ), which should represent chromosome segments segregating in domesticated cattle prior to breed formation. For each window, we calculated statistics which would identify within breed selection (i.e. HAPH and |iHS| defined below), computed the divergence between the breeds using Wright’s F ST and calculated the variance in genomic estimated breeding values (GEBV) for Jersey and Holstein breeds for dairy traits (milk, fat and protein yield; fat and protein concentration; stature and fertility). We tested for over-representation of the top 5% of windows with selection signatures (within either Holstein or Jersey, and across dairy and beef breeds) that were also in the top 5% of windows for genetic variance in dairy traits. The significance of this over-representation was assessed by a chi-squared test on a 2x2 contingency table. The 3 selection statistics and annotated genomic features for each 250 kb window are contained in Additional file 1.
Datasets from dairy and beef cattle were available for analysis. We analysed only autosomal SNP. The dairy dataset consisted of 616,350 SNP for 13,501 Holstein and 5240 Jersey animals. The beef dataset consisted of 692,527 SNP for 2510 Angus, 463 Charolais, 744 Hereford, 61 Limousin, 254 Murray Grey and 868 Shorthorn cattle. Genotype quality control and imputation methods for the dairy data are described by Erbe et al. and Bolormaa et al. describes the beef data.
Haplotype segments were constructed for dairy and beef datasets using phased data from Beagle  and non-overlapping segments of 30 or 31 SNP. For each chromosome segment we calculated a modified version of Depaulis-Veuille’s H-test , referred to as HAPH, where HAPH = , where p i is the (within breed) frequency of the i th haplotype and N is the total number of haplotypes observed for the breed at the position. Chromosome segments were allocated to 250 kb windows in which their mid-point fell and the average calculated for each 250 kb window. HAPH was then standardized by dividing this value by the breed average over all windows 'Hard sweeps' (i.e. Table 6) were identified by windows in the top 5% of HAPH values and separated by less than 1 Mb.
|iHS| was calculated within breed for each SNP in dairy and beef datasets following Voight et al.. iHS is a measure of haplotype homozygosity surrounding the derived allele at a SNP compared to the haplotype homozygosity surrounding the ancestral allele at the SNP. To determine the ancestral allele, genotypes for 750,948 SNP from the Bovine HD chip were obtained for 2 Banteng, 7 Bison and 8 Buffalo animals. All genotype calls were used and the ancestral allele was taken as the most frequent allele observed in these out-group animals. Only one allele was observed for most (85%) SNP. Next, the integrated extended haplotype homozygosity (iEHH) was calculated within breed for the ancestral and derived SNP allele using the ‘rehh’ package in R [64, 65]. The homozygosity decay threshold for iEHH was 0.5 and all SNP had a minor allele frequency > 0.001. Finally, the log10 ratio of iEHH for the ancestral compared to the derived allele was standardised to a mean of zero and standard deviation of 1 in 20 bins, where bins were determined by frequency of the ancestral allele [i.e. (log 10 x – μ)/σ, when x is the iEHH of the derived allele divided by the ancestral allele, and μ and σ are the mean and standard deviation of log10 iEHH ratios for each bin]. The final statistic, the integrated haplotype score (iHS), therefore measured the haplotype homozygosity surrounding a derived SNP allele compared to that surrounding the ancestral SNP allele. Although a negative iHS indicates greater homozygosity surrounding the ancestral allele and a positive iHS indicates greater homozygosity surrounding the derived allele, we analysed the absolute value of iHS so that the measure was independent of the allele classification. This is because either SNP allele might be on the same chromosome segment as the causative mutation. The maximum value of |iHS| was used for each 250 kb window.
where j is each SNP in the 250 kb window, p ij is the allele frequency for breed i at SNP j, and is the mean allele frequency of the breeds at SNP j. On average there were 60 SNP per window (range: 1 to 173 SNP; SD: 22 SNP).
To find windows where dairy breeds differed most from beef breeds the F ST values between pairs of breeds where one was a dairy breed and one was a beef breed (e.g. Holstein with Angus) were compared to F ST values between breeds where both were either dairy (Holstein with Jersey) or type 1 and type 2 loci beef breeds (e.g. Angus with Charolais). F ST values for a window were divided by the mean F ST over all windows for that pair of breeds and then compared using a one-sided non-parametric Mann–Whitney U test.
Phenotypes and genotypes were obtained from the Australian Dairy Herd Improvement Scheme (ADHIS) for 3,391 Holstein and 1,014 Jersey bulls. Bull genotypes were a subset of animals used to detect the selection signatures. The effect of each SNP was estimated using BayesR, using the same process as Erbe et al., which simultaneously estimates the mean, a polygenic effect and the effects of all SNP. Separate analysis were conducted for each trait by breed combination, where each analysis used 50,000 iterations (30,000 discarded as burn in) and SNP effects were the mean of 5 replicate chains. For each trait we estimated the genetic value of each 250 kb window in each animal (its local GEBV) by (i.e. X is a matrix of genotypes, and is the estimated SNP effect from BayesR). The variance across animals of GEBVs at a window indicates the windows contribution to genetic variance for that trait. The windows with the top 5% of values for this variance for each breed by trait combination were assumed to contain putative QTL.
The locations of genomic features were downloaded using BioMart  on 15th March 2013. Genes were mapped to each 250-kb window using their gene start and stop positions using their Ensemble ID and associated gene name (when available). All map positions of SNP and genomic features used UMD3. The loci used as type 1 and type 2 loci were a selection of loci available from the literature, including some identified from the Online Inheritance in Animals  database.
The top 5% of windows for HAPH, |iHS| and the dairy by beef F ST test were deemed to indicate evidence of selection. A chi-squared test with 1 df was used to determine if the number of windows which ranked in the top 5% for the indicator of selection and the top 5% for the variance in GEBV for the production trait was more than expected by chance. The chi-squared test used the average of 5 non-overlapping sets of windows by dividing the actual number of overlapping windows by 5 (i.e. the number of times each segment of the genome was counted in a window). For the dairy by beef breed comparison, windows were counted if they were in the top 5% of windows for GEBV variance in either Holstein or Jersey.
No animal experiments were performed specifically for this manuscript. Where data were obtained from existing sources, references for these experiments are provided.
Authors thank the Beef Cooperative Research Centre for Beef Genetic Technologies and Dairy Futures Co-operative Research Centre for funding and the provision of data to conduct this research. We thank Gert Nieuwhof from the Australian Dairy Herd Improvement Scheme (ADHIS) for providing the data to create genetic trends in dairy cattle (Additional file 3: Figures S1-S3). This research was supported under Australian Research Council’s Discovery Projects funding scheme (project DP1093502). The views expressed herein are those of the authors and are not necessarily those of the Australian Research Council.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.