- Research article
- Open Access
An examination of positive selection and changing effective population size in Angus and Holstein cattle populations (Bos taurus) using a high density SNP genotyping platform and the contribution of ancient polymorphism to genomic diversity in Domestic cattle
© MacEachern et al; licensee BioMed Central Ltd. 2009
- Received: 29 January 2008
- Accepted: 24 April 2009
- Published: 24 April 2009
Identifying recent positive selection signatures in domesticated animals could provide information on the genome's response to strong directional selection from domestication and artificial selection. With the completion of the cattle genome, private companies are now providing large numbers of polymorphic markers for probing variation in domestic cattle (Bos taurus). We analysed over 7,500 single nucleotide polymorphisms (SNP) in beef (Angus) and dairy (Holstein) cattle and in the outgroup species Bison, Yak and Banteng as an indirect test of inbreeding and positive selection in domestic cattle.
The outgroup species (Bison, Yak and Banteng) were genotyped with a high success rate (90%) and used to determine the ancestral and derived allele states in domestic cattle. The frequency spectra of the derived alleles in Angus and Holstein were examined using Fay and Wu's H test, and significant departures from the spectra expected under neutrality were identified. These departures appeared to result from the combined influences of positive selection, inbreeding and ascertainment bias for moderately frequent SNP. Approximately 10% of all polymorphisms identified as segregating in B. taurus were also segregating in Bison, Yak or Banteng, indicating that a large number of polymorphisms are ancient in origin.
These results suggest that a large effective population size (Ne) of approximately 90,000 or more existed in B. taurus since it shared a common ancestor with Bison, Yak and Banteng ~1–2 million years ago (MYA). More recently, Ne decreased sharply, probably in association with domestication. This may partially explain the paradox of high levels of polymorphism in domestic cattle despite the relatively small recent Ne of this species. The period of inbreeding caused Fay and Wu's H statistic to depart from its expectation under neutrality, mimicking the effect of selection. However, there was also evidence for selection, because high frequency derived alleles tended to cluster near each other in the genome.
- Effective Population Size
- Single Nucleotide Polymorphism
- Wild Relative
- Ascertainment Bias
- Simulated Population
Identifying positive genomic selection in domestic animals is a major challenge in contemporary agricultural research. To date, only a small number of studies have successfully identified genomic regions subject to positive selection in domestic animals [1–10]. A better understanding of positive selection and of how it shapes genetic variation in domestic animals has the potential to provide powerful insights into the mechanisms involved in evolution, to help target loci for selection and possibly to highlight the genetic basis of phenotypic diversity for complex traits [5, 11]. Domestic animals provide a unique opportunity to detect positive selection because of their extensive diversity among breeds, the increasing availability of sequence data and the large databases of polymorphisms that are accruing in domestic species such as Bos taurus.
Data on polymorphisms can provide evidence of selection if the patterns in the data are incompatible with a neutral model. For instance, the neutral model with constant effective population size predicts that most polymorphisms will have one common allele and one rare allele. More specifically, if p is the frequency of one of the two alleles chosen at random and f(p) is the distribution or spectrum of all polymorphisms where one allele has frequency p, then f(p) = k/[p(1-p)], where k is a constant. Tajima's D statistic measures the extent to which real data differ from this theoretical expectation. Tajima suggests that changes in the frequency spectrum of neutral polymorphic alleles can be used to detect a hitchhiking effect due to the spread of linked advantageous mutations. High values of D therefore indicate that common polymorphisms (both alleles at intermediate frequency) are more frequent than expected under the neutral theory, which may be a result of genetic hitchhiking. However, polymorphisms are discovered by methods that tend to find common variants, and this ascertainment bias can also generate an excess of polymorphisms with intermediate allele frequency.
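As a concrete reference point, Tajima's D can be computed from the number of sampled chromosomes carrying one allele at each segregating site. The sketch below uses the standard constants from Tajima's derivation; it is an illustration, not the pipeline used in this study, and it assumes sequence-style sampling of n chromosomes rather than SNP-chip ascertainment, which (as noted above) inflates intermediate-frequency classes and hence D.

```python
import math


def tajimas_d(allele_counts, n):
    """Tajima's D from per-site allele counts.

    allele_counts: for each site, the number i of the n sampled chromosomes
    carrying one (arbitrary) allele; sites with i == 0 or i == n are ignored.
    """
    seg = [i for i in allele_counts if 0 < i < n]
    s = len(seg)
    if s == 0:
        raise ValueError("no segregating sites")
    a1 = sum(1.0 / k for k in range(1, n))
    a2 = sum(1.0 / k ** 2 for k in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Mean number of pairwise differences (theta_pi), summed over sites
    pi = sum(2.0 * i * (n - i) / (n * (n - 1)) for i in seg)
    # D contrasts theta_pi with Watterson's estimator S / a1
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Rare variants (singletons) push D negative; intermediate-frequency variants push it positive.
print(tajimas_d([1] * 20, n=10))
print(tajimas_d([5] * 20, n=10))
```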
The test for departure from expectation can be made more powerful if it is possible to distinguish the ancestral allele from the derived (mutant) allele at each locus. If p is the frequency of the derived allele, then the distribution of all derived alleles is f(p) = k/p. Fay and Wu measure departure from this expectation with their H statistic. If derived alleles are found at high frequency more often than expected, H departs from zero; under Fay and Wu's definition, $H = \theta_\pi - \theta_H$, such an excess makes H negative. They suggest that selection causes this departure, because selection sometimes drives the derived allele to high frequency. This can occur if the observed polymorphisms are themselves subject to selection, but it can also occur at neutral loci as a result of hitchhiking caused by selection acting on linked loci. This makes H a very useful test for selection, because most polymorphisms are discovered randomly and few of them are likely to be directly subject to selection.
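To make the sign convention concrete, the sketch below computes the unnormalised form H = θ_π − θ_H from derived-allele counts in a sample of n chromosomes, following Fay and Wu's definition. The normalised variant and the coalescent simulations normally used for significance testing are omitted; this is an illustration, not the procedure behind the H values reported here.

```python
def fay_wu_h(derived_counts, n):
    """Unnormalised Fay and Wu's H = theta_pi - theta_H.

    derived_counts: number of the n sampled chromosomes carrying the derived
    allele at each site (requires the ancestral state to be known).
    """
    seg = [i for i in derived_counts if 0 < i < n]
    denom = n * (n - 1)
    theta_pi = sum(2.0 * i * (n - i) / denom for i in seg)
    theta_h = sum(2.0 * i * i / denom for i in seg)  # up-weights high-frequency derived alleles
    return theta_pi - theta_h


print(fay_wu_h([9] * 10, n=10))  # derived allele nearly fixed at every site: negative
print(fay_wu_h([1] * 10, n=10))  # derived allele rare at every site: positive
```

With a neutral-shaped sample such as `[1] * 6 + [2] * 3 + [3] * 2` and n = 4 (site counts proportional to 1/i, matching f(p) = k/p), the two estimators agree and H is zero.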
Unfortunately, both D and H can depart from expectation for reasons other than selection [13, 15–17]. The way in which polymorphisms are discovered usually means that low frequency polymorphisms are less likely to be discovered than those with alleles at intermediate frequency. D and H are also affected by changes in effective population size (Ne). If Ne declines, polymorphisms with one rare allele become less frequent and the frequency spectrum becomes flatter. In this way a decline in Ne (i.e. inbreeding) can mimic selection [16–18]. Detecting unambiguous examples of positive selection has therefore been difficult, because many methods cannot differentiate between positive selection and demographic history. This is of particular concern in domestic species, where SNP discovery typically involves some ascertainment bias, and where demographic fluctuations coupled with strong directional (artificial) selection have played important roles in the formation of breeds.
Ascertainment bias results in an observed allele frequency spectrum that is flatter than that predicted by theory. However, it is possible to construct a test that is not affected by this ascertainment bias if derived and ancestral alleles can be distinguished. Under neutrality, polymorphisms at which the derived allele has frequency p occur with density f(p) = k/p, and polymorphisms at which the ancestral allele has frequency p (so that the derived allele has frequency 1-p) occur with density f(1-p) = k/(1-p). The spectrum of all polymorphisms with an allele at frequency p is then f(p) + f(1-p) = k/[p(1-p)], the neutral spectrum given above. Neutrality therefore predicts that the proportion of these polymorphisms at which the ancestral allele has frequency p is f(1-p)/[f(p)+f(1-p)], which is equal to p. Provided that the polymorphism discovery method cannot distinguish ancestral from derived alleles, this expectation for different p intervals is not affected by ascertainment bias. We tested it only for p from 0 to 0.5, because the value of f(1-p)/[f(p)+f(1-p)] at 1-p is one minus its value at p. In addition, because selection does not typically affect all parts of the genome equally, the effects of selection and demography can be distinguished. For instance, a selected allele can drag closely linked derived alleles to high frequencies by hitchhiking, so selection should cause an autocorrelation of high frequency derived alleles between one locus and the next on the chromosome. To test whether the observed autocorrelation could be due to inbreeding, we used a simulation study to demonstrate the effect of inbreeding in the absence of selection and compared the results with those found in the real data.
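A minimal sketch of how the ratio f(1-p)/[f(p)+f(1-p)] might be estimated from data: each SNP is folded to the frequency p ≤ 0.5 of its rarer allele and binned, and within each bin we report the fraction of SNPs at which that rarer allele is the ancestral one (i.e. the derived allele has frequency 1-p). Under the neutral spectrum f(p) = k/p this fraction should be close to p. The binning scheme and the half weight given to SNPs at exactly 0.5 are our own illustrative choices, not details stated in the text.

```python
def ancestral_minor_proportion(derived_freqs, bin_edges):
    """Per-bin estimate of f(1-p)/[f(p)+f(1-p)].

    Each polymorphic site is folded to p = min(d, 1 - d), where d is its derived
    allele frequency, and assigned to the bin lo < p <= hi. Within a bin we return
    the fraction of sites whose ancestral allele is the one at frequency p
    (i.e. d > 0.5); sites with d == 0.5 count as half. Empty bins give None.
    """
    n_bins = len(bin_edges) - 1
    hits = [0.0] * n_bins
    totals = [0] * n_bins
    for d in derived_freqs:
        if d <= 0.0 or d >= 1.0:
            continue  # fixed or lost: not a polymorphism
        p = min(d, 1.0 - d)
        for b in range(n_bins):
            if bin_edges[b] < p <= bin_edges[b + 1]:
                totals[b] += 1
                if d > 0.5:
                    hits[b] += 1.0
                elif d == 0.5:
                    hits[b] += 0.5
                break
    return [h / t if t else None for h, t in zip(hits, totals)]


# Toy data: two SNPs with a rare derived allele, one with a rare ancestral allele.
print(ancestral_minor_proportion([0.1, 0.1, 0.9], [0.0, 0.5]))  # [0.3333333333333333]
```

Because only the labelling of alleles as ancestral or derived enters the ratio, a discovery process that favours intermediate frequencies changes how many SNPs fall in each bin but, as argued above, not the expected fraction within a bin.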
Recently it has become possible to assay large numbers of polymorphisms in cattle, and this offers a new source of data with which to detect evidence of selection. In this paper we use data from two breeds of cattle (Angus and Holstein), each genotyped for over 7,500 SNPs using the Parallele/Affymetrix platform. By also genotyping these SNPs in three species related to Bos taurus (Bison, Yak and Banteng), we were able to distinguish the derived and ancestral allele at each locus and use this information to test for deviations from neutrality.
The comparison between the allele frequencies in the Angus and Holstein breeds might also contain evidence of selection, since the two breeds have been selected for different traits. However, their allele frequencies also differ because of genetic drift caused by finite population size or inbreeding. The difference in allele frequencies can be quantified by the statistic Fst. Inbreeding should affect all loci equally, and genetic drift should affect loci randomly without any association between adjacent loci; in contrast, we hypothesise that selection will drive linked derived alleles to high frequency in one breed but not the other. Selection should therefore cause higher values of Fst among loci where the derived allele is common than among loci where the ancestral allele is common. We examine how Fst between Angus and Holstein changes with allele frequency and compare the result with that obtained from the simulated data.
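The text does not state which Fst estimator was used. As an assumed, illustrative choice, the sketch below computes a per-locus Fst for a biallelic SNP from the allele frequencies in the two breeds as (H_T − H_S)/H_T, using expected heterozygosities and no correction for sample size.

```python
def fst_two_pops(p1, p2):
    """Per-locus Fst for a biallelic SNP from allele frequencies p1, p2 in two breeds.

    Assumed estimator: (H_T - H_S) / H_T, with H_T = 2 p_bar (1 - p_bar) and
    H_S the mean of 2 p_i (1 - p_i); no sample-size correction is applied.
    """
    p_bar = (p1 + p2) / 2.0
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    if h_t == 0.0:
        return 0.0  # monomorphic in both breeds: no differentiation to measure
    return (h_t - h_s) / h_t


print(fst_two_pops(0.3, 0.3))  # identical frequencies: 0
print(fst_two_pops(0.2, 0.8))  # strongly differentiated
```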
Amplification of B. taurus designed markers in wild relatives
Summary of successful genotypes for loci amplified in Angus, Holstein and at least one wild species
Proportion and numbers of alleles showing successful genotypes in individual wild relatives and overall total (≥ 1 species), where at least one animal was successfully genotyped, from the 9,323 SNP genotyped in Angus
Fay and Wu's H test
Summary of H statistics from a survey of over 7,746 SNP in Holstein and Angus cattle (table values not recoverable; reported significance levels: p < 0.001, p < 0.001)
Frequency spectrum of derived and ancient alleles
Genomic distribution of high frequency derived polymorphisms
The frequency spectrum of derived alleles appears to follow a pattern contrary to that expected under neutrality, as there are too many derived mutations with relatively high frequency. If these mutations are clustered in certain areas of the genome, this may be evidence of positive selection driving these alleles to high frequencies. Alternatively, if they are distributed randomly throughout the genome, this may be a result of inbreeding randomly increasing the frequency of some derived (new) mutations.
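One simple way to measure whether high frequency derived alleles cluster is a lag-1 autocorrelation: the Pearson correlation between the derived allele frequency at each locus and at the next locus along the same chromosome. The sketch assumes the loci are already sorted by genomic position within a single chromosome; the exact form of the autocorrelation statistic behind the table below is not specified, so this is an illustration rather than a reproduction.

```python
import math


def lag1_autocorrelation(freqs):
    """Pearson correlation between consecutive values of a position-sorted list."""
    if len(freqs) < 3:
        raise ValueError("need at least three loci")
    x, y = freqs[:-1], freqs[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0.0 or vy == 0.0:
        raise ValueError("frequencies do not vary along the chromosome")
    return cov / math.sqrt(vx * vy)


clustered = [0.1, 0.1, 0.1, 0.9, 0.9, 0.9]    # high frequency derived alleles adjacent
alternating = [0.1, 0.9, 0.1, 0.9, 0.1, 0.9]  # high and low frequencies interleaved
print(lag1_autocorrelation(clustered))    # positive
print(lag1_autocorrelation(alternating))  # negative
```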
Autocorrelation for derived allele frequency and the frequency of adjacent derived polymorphism for B. taurus and simulated populations (table values not recoverable; reported significance levels: p < 0.001, p < 0.001, p > 0.05, p > 0.05)
To summarise the data presented in table 4 visually, we plotted the derived allele frequency against genomic position over the entire genome and contrasted this with Fst at the same positions. Due to the volume of information, these plots have been included as supplementary figures 1–30 A and B for all Holstein and Angus chromosomes (see Additional files 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29 and 30). In general, when only the derived allele frequency is examined, some grouping of derived alleles at high frequencies can be found throughout the genome. The largest autocorrelations were found on chromosomes 2 and 13 in both Holstein and Angus (Table 4, Additional files 2 and 13). This could be due to positive selection independently increasing the frequency of derived polymorphisms in both breeds or, more likely, to selection acting in the common ancestor of Holstein and Angus. For chromosomes 25, 26 and 29, high autocorrelations were identified in one breed but not the other, which may be evidence of breed-specific selection (Table 4, Additional files 25, 26 and 29). Finally, the lack of any significant correlation in the simulated populations (Table 4) suggests that inbreeding alone cannot create this clustering of high frequency derived alleles.
Phylogenetic analysis and the frequency of derived alleles in Holstein and Angus
We used a neutral phylogenetic tree to overlay the average frequency of derived alleles in Angus and Holstein cattle (data not shown). At the 7,611 polymorphic sites analysed, the Holstein breed had a slightly higher average derived allele frequency (0.362) than Angus (0.359), consistent with the pattern in Table 4. However, the difference between breeds was not significant.
Average Fst and derived allele distributions
Frequency spectrum of ancient polymorphisms in B. taurus
In total, the 8,677 SNP markers in Angus that were successfully genotyped in at least one of the wild species were examined for evidence of ancestral polymorphism. Of these, we identified 931 (10.7%) that appear to be ancestral in origin, in that the three outgroup species were not fixed for a single allele (see Methods).
Ancestral polymorphism and effective population size
The 931 ancestral polymorphisms found in B. taurus suggest that, despite any recent bottlenecks during the domestication process, very large populations must have been maintained in the ancestral aurochs (B. taurus primigenius) population before its domestication some 10,000 years ago [23–25]. That is, 931/8,677 = 0.11 of SNP in B. taurus have remained polymorphic for approximately 2 million years.
To estimate the effective population size of the ancestral species, we assumed that the number of polymorphisms was similar in the ancestral and contemporary populations. The divergence times estimated by MacEachern et al. were used to calculate the average divergence time for Bison, Yak and Banteng, which was approximately 2.1 MYA. If the average generation time for these animals is 5 years, t should be roughly $4.2 \times 10^5$ generations. Thus, setting $e^{-t/(2N_e)} = 0.11$ in equation 2 gives an effective population size of ~90,000 animals, and the actual number of animals would of course have been somewhat larger than this. This estimate depends on the assumed generation time, which may have been longer than 5 years. However, the rate of fixation in domestic cattle most likely increased with decreasing population size, and therefore our estimate should be a good approximation.
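The arithmetic above can be reproduced directly. The sketch assumes, as the text does, that the proportion of polymorphisms still segregating after t generations is e^{-t/(2Ne)}, and solves that expression for Ne; the function name and arguments are our own.

```python
import math


def ancestral_ne(fraction_retained, divergence_years, generation_years):
    """Solve exp(-t / (2 * Ne)) = fraction_retained for Ne, with t in generations."""
    t = divergence_years / generation_years
    return t / (2.0 * -math.log(fraction_retained))


print(round(ancestral_ne(931 / 8677, 2.1e6, 5)))  # unrounded proportion of shared SNP
print(round(ancestral_ne(0.11, 2.1e6, 5)))        # proportion rounded as in the text
```

Both calls give roughly 94,000 to 95,000, in line with the ~90,000 quoted above. Because Ne is proportional to t, a longer generation time lowers the estimate proportionally (for example, 6-year generations reduce it by about one sixth).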
Ancestral alleles and phylogenetic relationships in closely related bovids
Examination of Bison, Yak and Banteng where both B. taurus alleles are detected (table values not recoverable; rows: Bison, Yak and Banteng anomaly grand totals)
Number of heterozygous markers, the proportion they contribute to aberrant SNPs and the average heterozygosity calculated from the total of successfully amplified markers (table values not recoverable; columns include the proportion of aberrant SNPs)
Genotyping and wild specimens
We found that a high proportion (~0.9) of SNP markers designed from the B. taurus genome were successfully assayed in several wild relatives from the Bovina subtribe, which diverged from B. taurus 1–2 MYA, whereas no successful genotypes were recorded from the Bubalina subtribe, which diverged < 9 MYA. It is unclear whether the lack of amplification in the Bubalina was the result of low selective constraint at the SNP sites or of a problem with the DNA. However, a recent study by Gautier et al. identified a high rate of successful SNP assays in goat (Capra hircus), which diverged from cattle at least 15 MYA. Thus, members of the Bovina subtribe should be suitable outgroups for SNP genotyping studies in B. taurus.
Genome wide polymorphism scans reveal a deviation from neutrality in two breeds of B. taurus
Researchers using the bovine genome have uncovered large numbers of SNPs, providing the opportunity to examine patterns of DNA sequence variation for indirect evidence of positive selection by cataloguing levels of genetic hitchhiking. To this end, we analysed over 7,500 SNP spaced throughout the genomes of Holstein and Angus cattle (B. taurus). Using the H statistic developed by Fay and Wu, we identified significantly large deviations from neutrality in each B. taurus population, which may result from recent positive selection, ascertainment bias, inbreeding or a combination of these.
Scans examining the frequency distribution of polymorphisms for deviations from neutral expectations often have difficulty differentiating between the effects of positive selection and demography. This is because the null hypotheses used to test for significance unrealistically assume that the sampled population has a history of random mating with constant Ne [16, 26]. Ascertainment bias for common alleles can also affect scans for positive selection from polymorphism data, and may mimic either the results expected under positive selection or those produced by demographic processes such as inbreeding. The ~10,000 SNPs provided as part of the genotyping platform by Parallele Bio Sciences were discovered through the cattle genome project, based on sequences from only one or a few animals (information regarding their discovery is at ftp://ftp.hgsc.bcm.tmc.edu/pub/data/Btaurus/snp/Btau20040927/bovine-snp.txt). The small number of animals sampled during SNP discovery suggests that there is some ascertainment bias. Hence, inferences of positive selection from tests based directly on allele frequency distributions should be made with caution.
By examining several thousand loci it was possible to distinguish between the effects of population structure, positive selection and ascertainment bias on the frequency spectrum of ancestral and derived alleles. As indicated by the significant H test, the results revealed an excess of high frequency derived alleles in the Holstein and Angus populations. The frequency spectra in figure 1 are generally too flat compared with the distributions expected under neutrality. The frequency distribution shows a number of discrepancies at medium frequencies (0.4–0.6) for Angus and Holstein, which would be expected if there was ascertainment bias towards common polymorphisms. However, a flat distribution was also found in figure 2, and this plot removes any effect of ascertainment bias: it examines the frequency of the derived allele relative to the ancestral allele using the ratio f(1-p)/[f(p)+f(1-p)], and whether an allele is derived or ancestral is irrelevant during SNP discovery. Hence, the flat distribution in figure 2 indicates that derived alleles occur at high frequency more often than expected and that this was not due to ascertainment bias. This new metric may be of use to researchers trying to identify selective sweeps in datasets that are influenced by ascertainment bias.
Reduced Ne or inbreeding could, however, produce the distribution observed in figure 2, because it leads to a random dispersal of allele frequencies. We used a transition matrix method to calculate the amount of inbreeding necessary to generate an allele frequency spectrum that matched the observed one. Starting from the allele frequency spectrum expected under the neutral model (i.e. f(p) = k/p), we used a transition matrix to calculate the spectrum one generation later, assuming a population of effective size Ne and no mutation. The matrix multiplication was repeated until the spectrum matched the observed spectrum. We found that replicating a frequency distribution similar to that displayed in figure 1 would require enough generations to reach an inbreeding coefficient of 0.5.
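A minimal sketch of such a transition-matrix calculation, assuming Wright-Fisher binomial sampling of 2Ne gene copies per generation, no mutation, and the standard drift relation F_t = 1 − (1 − 1/(2Ne))^t to decide how many generations to run. The small Ne = 20 in the example is hypothetical and chosen only for speed; the text does not give the Ne or matrix details used in the actual analysis, and the fit to the observed spectrum is not shown.

```python
import math


def wf_transition(two_n):
    """Wright-Fisher matrix: P[i][j] = Pr(j derived copies next generation | i now)."""
    matrix = []
    for i in range(two_n + 1):
        p = i / two_n
        matrix.append([math.comb(two_n, j) * p ** j * (1 - p) ** (two_n - j)
                       for j in range(two_n + 1)])
    return matrix


def neutral_spectrum(two_n, k=1.0):
    """f(p) = k/p over derived-allele counts 1..2N-1; lost/fixed classes start empty."""
    return [0.0] + [k / (i / two_n) for i in range(1, two_n)] + [0.0]


def step(spectrum, matrix):
    """Spectrum one generation later (row vector times transition matrix)."""
    size = len(spectrum)
    return [sum(spectrum[i] * matrix[i][j] for i in range(size)) for j in range(size)]


def generations_to_f(ne, target_f):
    """Smallest t with F_t = 1 - (1 - 1/(2 Ne))^t >= target_f."""
    t, f = 0, 0.0
    while f < target_f:
        t += 1
        f = 1 - (1 - 1 / (2 * ne)) ** t
    return t


def drift_spectrum(ne, target_f):
    """Apply the transition matrix until the inbreeding coefficient reaches target_f."""
    two_n = 2 * ne
    matrix = wf_transition(two_n)
    spectrum = neutral_spectrum(two_n)
    t = generations_to_f(ne, target_f)
    for _ in range(t):
        spectrum = step(spectrum, matrix)
    return spectrum, t


spectrum, t = drift_spectrum(ne=20, target_f=0.5)
print(t)  # 28 generations to reach F = 0.5 when Ne = 20
```

Each row of the matrix sums to one, so the total number of polymorphisms is conserved; drift only moves mass out of the segregating classes into the lost (0) and fixed (2Ne) classes.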
An example of the effect of genetic drift is the frequency spectrum of ancient polymorphisms presented in figure 2, which shows a very flat distribution, as expected for very old polymorphisms. It is unlikely that these polymorphisms have all been maintained by overdominance, so we assume that they were maintained in the historically large populations that once existed in bovids. Slight peaks are observed at the extremes of the distribution (< 0.02 and > 0.98). Because all fixed alleles (frequencies of 0 and 1.0) were removed from the analysis, these peaks may represent typing errors or be evidence of alleles that have been positively selected towards fixation.
Genomic distribution of high frequency derived alleles
The large number of high frequency derived alleles found in the Angus and Holstein populations is unexpected under neutrality with constant Ne. To distinguish between the effects of positive selection and inbreeding on the frequency spectrum of derived mutations in B. taurus, we examined the tendency of derived alleles to cluster in the genomes of Angus and Holstein using the autocorrelation between frequencies of derived alleles. A positive autocorrelation of derived allele frequencies between neighbouring loci indicated an association between high frequency derived alleles in the genomes of both breeds. This is consistent with positive selection rather than with changes in population size or ascertainment bias. A simulated population that had been inbred to levels similar to those of contemporary populations, without positive selection, failed to show a similar autocorrelation between high frequency derived alleles. This suggests that hitchhiking events are common throughout the genomes of both breeds of B. taurus, consistent with positive selection at some loci. If so, selection appears to have influenced the Holstein genome more than the Angus genome. Our findings could be the result of sampling error; however, as this study is based on fairly large samples in both breeds (n > 300), we believe they indicate stronger artificial selection in the Holstein breed.
Fst distribution and inbreeding and selection in B. taurus
If different selection pressures operated in Holstein and Angus, then different derived alleles might be driven to high frequency in the two breeds. This would cause Fst between the breeds to be higher when the frequency of the derived allele was high. We estimated Fst per locus and plotted the average values against the average derived allele frequency in B. taurus and in the simulated populations (Figure 3). The distribution of Fst between B. taurus breeds did not reveal any overwhelming signature of positive selection, as it was fairly symmetrical with respect to allele frequency. Nor did a comparison of the Fst plots for the simulated and observed populations identify any convincing differences. However, plots of derived allele frequency overlaid with Fst identified regions on chromosomes 8, 20 and 24 in Angus where derived alleles across large stretches have been driven to near fixation, generating higher Fst values for these regions than the overall average (Figure 4A–F). A preliminary examination of the genes underlying these regions has not identified any remarkable candidates for positive selection, with the possible exception of FGF1 in beef cattle, but these regions are associated with QTL identified for body composition and carcass yield [21, 22]. Hence, our results may be of future interest for identifying signatures of recent positive artificial selection between the two breeds, or as additional evidence for polymorphisms that show associations with beef or milk traits.
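The averaging step behind a plot of Fst against derived allele frequency can be sketched as follows. It assumes per-locus Fst values have already been computed and that each locus is placed by its mean derived allele frequency across the two breeds; the equal-width binning and the hypothetical input values are illustrative choices, not details stated in the text.

```python
def mean_fst_by_derived_freq(derived_freqs, fst_values, n_bins=10):
    """Average per-locus Fst within equal-width bins of derived allele frequency.

    derived_freqs: e.g. the mean derived allele frequency across the two breeds.
    Returns one mean per bin, or None for empty bins.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for d, fst in zip(derived_freqs, fst_values):
        b = min(int(d * n_bins), n_bins - 1)  # d == 1.0 falls in the last bin
        sums[b] += fst
        counts[b] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# Hypothetical values: under the selection hypothesis the top bins would show higher Fst.
means = mean_fst_by_derived_freq([0.05, 0.15, 0.95, 1.0], [0.1, 0.2, 0.3, 0.5])
print(means)
```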
A recurring theme in studies of polymorphism in B. taurus is the high proportion of polymorphisms that appear to be segregating in wild species. In the phylogenetic analysis presented by MacEachern et al., a surprisingly large proportion (8.7%) of nucleotide substitutions between species did not follow a simple tree, implying that these sites were polymorphic in an ancestral species that subsequently underwent lineage sorting in the extant bovids. These ancestral polymorphisms were found by sequencing a number of B. taurus breeds and a subset of wild relatives from the Bovinae subfamily, without any knowledge of whether they were still segregating in B. taurus. In contrast, the current study focuses on nucleotides that have been identified as segregating in B. taurus. Nevertheless, we have also found a large proportion of B. taurus polymorphisms (10.7%) to be segregating in the wild relatives. Despite the difference in how they were detected, the possible explanations are similar: these polymorphisms must have arisen through a double mutation or introgressive hybridisation, or they are ancestral polymorphisms that are still segregating in B. taurus. These loci may in fact still be segregating in the wild relatives, even if they are not heterozygous in the wild individuals sampled. Alternatively, they have undergone lineage sorting and hence appear as alleles with 'abnormal' inheritance, as described in MacEachern et al. [20, 27]. Therefore, approximately 10% of all polymorphisms in the Bovinae are likely due to ancestral polymorphisms in the common ancestor of cattle and Bison/Yak/Banteng.
where dI is the polymorphism rate at noncoding sites estimated from MacEachern et al. and n is the number of chromosomes sampled. This yields an estimate of heterozygosity in Holsteins of 0.0011, which is approximately one third of the value expected from Ne = 90,000 (He = 0.0035). This difference could largely be the result of inbreeding in contemporary populations of B. taurus, or of error in the estimate of μ.
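The size of this gap can be expressed as an equivalent inbreeding coefficient if one assumes that heterozygosity falls by a factor (1 − F) relative to its equilibrium value. This back-of-the-envelope sketch uses only the two heterozygosity values quoted above; it attributes the whole gap to inbreeding, so any error in μ is absorbed into F.

```python
def implied_inbreeding(he_observed, he_expected):
    """F such that he_observed = (1 - F) * he_expected (drift-only assumption)."""
    return 1.0 - he_observed / he_expected


ratio = 0.0035 / 0.0011
f = implied_inbreeding(0.0011, 0.0035)
print(round(ratio, 1), round(f, 2))  # 3.2 0.69
```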
Wild species heterozygosity and overall similarity to B. taurus
A number of the ancestral polymorphisms were examined for the proportion of loci that were heterozygous in the outgroup species and for their similarity to B. taurus. MacEachern et al. identified a large number of genetic similarities between Yak and B. taurus. We found that Yak shares slightly more alleles with B. taurus than Bison does, but the difference was not significant; thus, there is only weak evidence that Yak is more closely related to B. taurus than Bison is. We also found that Banteng shared more alleles with B. taurus than Yak or Bison did, most likely because the Banteng animal sampled has questionable ancestry, which may cause a slight overestimation of the number of ancestral polymorphisms and hence of Ne. If the Banteng sample contained Bos taurus genes, it would inflate the number of cases where Banteng was heterozygous for the Yak/Bison allele and the B. taurus allele. Table 5 shows that this occurred in only 88 of the 931 SNPs. Therefore, even if such hybridisation had occurred, it would not greatly affect the conclusions.
The proportions of heterozygous loci in table 5 show that Yak appears to be more heterozygous than Bison, which may reflect past population bottlenecks in Bison [29, 30]. Banteng was the most heterozygous of all the animals, as might be expected given its questionable background. Although we believe the Yak and Bison samples used are genetically pure, we cannot be completely certain of the ancestry of our samples, so interpretations that rely on this aspect should be treated with some caution.
We have examined the frequency distribution of polymorphisms in dairy and beef breeds of B. taurus, using Fay and Wu's H as a test to identify genomic positive selection. Significant deviations from neutral expectations were identified, which appear to reflect a combined effect of positive selection, inbreeding and ascertainment bias for common polymorphisms. By distinguishing derived from ancestral alleles, we were able to eliminate the effect of ascertainment bias from our test for selection using a new metric, f(1-p)/[f(p)+f(1-p)], which overcomes many of the problems associated with ascertainment bias when the ancestral state is known. This metric could be useful in a number of studies that rely on information from allele frequency distributions. The high frequency of derived alleles identified here could be caused either by selection or by reduced Ne. A reduction in Ne appears to have occurred, because the ancestral Ne predicts a higher heterozygosity than is observed. However, the tendency of high frequency derived alleles to cluster in certain parts of the genome is evidence for positive selection, because inbreeding alone does not cause this autocorrelation.
By including a number of wild relatives in the analysis, the ancestral alleles were inferred. Surprisingly, a high proportion of ancestral polymorphisms was identified, suggesting that approximately 10% of all the polymorphisms segregating in contemporary populations of B. taurus are ancient in origin and must predate the divergence of Bison, Yak, Banteng and domestic cattle. These ancestral polymorphisms were therefore used to estimate the ancestral effective population size of domestic cattle over the last 2 million years, which must have been at least 90,000. This estimate is roughly 9 times greater than the effective population size of humans, which has been estimated as 10,000.
Two breeds of Bos taurus and a number of wild relatives were genotyped using a high-throughput, high-density SNP genotyping platform. This platform is commercially available from Parallele Biosciences, which was acquired by Affymetrix https://www.affymetrix.com/corporate/parallele.affx. The original progenitors of the Angus and Holstein breeds are thought to have existed for over two thousand years in Scotland and Germany/North Holland, respectively. However, breed development did not occur until the early to mid 1800s. The breed histories are very similar in that, during the past 50 years, Angus and Holstein have experienced dramatic increases in selection pressure for beef and milk production, respectively, and decreases in effective population size to approximately 100 individuals each.
Angus animals were selected from the Trangie Agricultural Research Centre in NSW, Australia. All animals had sire and dam pedigree records, were born from 1993 to 2000 and had been selected for high or low post-weaning feed efficiency (FE), or were part of a control herd. Holstein animals were selected from a research project based at Genetics Australia in Victoria, Australia. All were bulls selected as semen donors for artificial insemination, had pedigree information and had been selected for high or low estimated breeding values (EBVs) as determined by the Australian Selection Index (ASI), an economic index of the milk, fat and protein yield of bulls' daughters obtained via progeny testing. Approximately equal numbers of the animals with the highest and lowest FE and ASI values were selected for SNP genotyping. A single Yak (Bos grunniens), North American Bison (Bison bison), Banteng (Bos javanicus) and Water buffalo (Bubalus bubalis) were chosen as outgroup species and genotyped to infer derived and ancestral alleles in both populations of B. taurus. Thus, 762 B. taurus animals and 4 wild relatives were genotyped for a number of SNP markers using the Parallele™ technology.
For each of the 766 animals, DNA was extracted from blood or semen and diluted to 30 ng/μl. In Angus and the four wild species, 9,323 SNPs distributed across the bovine genome were genotyped at Parallele Bio Science Inc. Because Parallele Bio Sciences was taken over by Affymetrix Inc., there were slight differences between SNP platforms, and a total of 9,919 SNP were genotyped in Holstein. Only the polymorphisms genotyped in both breeds were compared.
We used the Python programming language to parse data files and extract genotypes for all animals at each locus and calculate frequencies of derived and ancient alleles in Angus and Holstein populations.
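The parsing code itself is not given. A minimal sketch of the frequency step, assuming a hypothetical encoding in which each animal's genotype at a locus is a two-character string such as 'AG' and missing calls are '--' or None (the actual Parallele file format is not described here):

```python
def derived_allele_frequency(genotypes, ancestral):
    """Frequency of non-ancestral alleles among called genotypes.

    genotypes: iterable of two-character strings such as 'AG' (hypothetical
    encoding); missing calls ('--' or None) are skipped.
    Returns None if no animal was called at this locus.
    """
    derived = total = 0
    for g in genotypes:
        if not g or "-" in g:
            continue
        for allele in g:
            total += 1
            if allele != ancestral:
                derived += 1
    return derived / total if total else None


print(derived_allele_frequency(["AA", "AG", "GG", "--"], ancestral="A"))  # 0.5
```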
Ancestral and derived alleles
Ancestral alleles were determined using the outgroup species. For loci where only one allele was represented in the wild relatives, that allele was designated ancestral. Loci where both alleles were represented among the outgroup species were considered ancient polymorphisms, which must have arisen at least 2 MYA, before the separation of Bison, Yak, Banteng and B. taurus.
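The classification rule above can be written as a small function. This is a sketch under stated assumptions: the function name is hypothetical, outgroup alleles that do not segregate in B. taurus are ignored, and loci with no informative outgroup call are labelled "unknown", which the text does not specify.

```python
def classify_locus(outgroup_calls, taurus_alleles):
    """Infer the ancestral state of a SNP segregating in B. taurus.

    outgroup_calls: genotype strings for the outgroup animals; calls
    containing '-' or 'N' are treated as missing (assumed convention).
    taurus_alleles: the two alleles segregating in B. taurus.
    Returns (status, ancestral_allele, derived_allele).
    """
    seen = set()
    for call in outgroup_calls:
        if "-" in call or "N" in call:
            continue  # missing outgroup genotype
        seen.update(call)
    seen &= set(taurus_alleles)  # ignore alleles not segregating in B. taurus
    if len(seen) == 2:
        return "ancient", None, None  # both alleles present in the outgroups
    if len(seen) == 1:
        ancestral = seen.pop()
        derived = next(a for a in taurus_alleles if a != ancestral)
        return "ancestral", ancestral, derived
    return "unknown", None, None

print(classify_locus(["AA", "AA", "--"], ("A", "G")))  # ('ancestral', 'A', 'G')
print(classify_locus(["AG", "GG", "AA"], ("A", "G")))  # ('ancient', None, None)
```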
Genomic position of Parallele polymorphisms
The genomic position of each Parallele SNP was determined by aligning its flanking sequence to the bovine genome (Btau_3.1) scaffolds using the BLAT algorithm. Results are presented in the genome browser of the Interactive Bovine In Silico SNP (IBISS) database.
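BLAT can write its alignments in the 21-column PSL format. A minimal sketch for keeping one placement per SNP might look like the following; the scoring rule (matches minus mismatches) and the function name are assumptions for illustration, not the procedure used to build IBISS.

```python
def best_hits(psl_lines):
    """Return the best BLAT hit per query from PSL lines.

    Uses the standard 21-column PSL layout; header and separator lines are
    skipped. The score (matches - misMatches) is an assumed criterion.
    Target coordinates are 0-based, as in PSL.
    """
    best = {}
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 21 or not fields[0].isdigit():
            continue  # header, separator or blank line
        matches, mismatches = int(fields[0]), int(fields[1])
        strand, q_name = fields[8], fields[9]
        t_name, t_start, t_end = fields[13], int(fields[15]), int(fields[16])
        score = matches - mismatches
        if q_name not in best or score > best[q_name][0]:
            best[q_name] = (score, t_name, t_start, t_end, strand)
    return {q: hit[1:] for q, hit in best.items()}
```

In practice the SNP coordinate would then be derived from the hit and the SNP's offset within its flanking sequence.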
A computer simulation was developed to determine the probability that the observed differences in allele frequencies between breeds were due to finite Ne without selection. A diploid population of Ne = 50,000 was simulated with mutation and recombination until equilibrium was reached. Ne was then reduced to 1,000 and the population was simulated for a further 1,000 generations. In reality, the estimated Ne for early domestic B. taurus some 2,000 generations ago was closer to 1,500; however, N = 1,000 and 1,000 generations were chosen for computational ease. Each individual in the population carried 29 pairs of chromosomes and was either male or female (probability 0.5). Each chromosome was 100 cM long and carried 301 marker loci, which produced a number of polymorphisms similar to that in the real dataset. To create each offspring, a pair of parents of different sex was randomly chosen from the population. For each parent, a gamete was formed from its chromosome pairs by sampling the number of crossovers on each chromosome pair from a Poisson distribution with a mean of 1.0 and placing the crossover points randomly along the chromosome. The haploid gametes were mutated at a rate of 5 × 10⁻⁹ per locus per gamete per generation; when a locus mutated, a new allele was added.
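The simulation design can be illustrated with a toy-scale sketch. It is not the authors' program: the population sizes, number of chromosomes and loci, the mutation rate, and the random 0/1 founder genotypes (used here in place of an equilibrium burn-in at Ne = 50,000) are all assumptions chosen so that it runs in seconds, and the function names are hypothetical. It keeps the structure described above: Poisson crossovers (mean 1.0 per chromosome), new alleles on mutation, random mating of one male and one female per offspring, and a split into two isolated populations.

```python
import math
import random

def poisson(rng, mean):
    """Poisson draw by Knuth's multiplication method (fine for small means)."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def make_gamete(parent, n_loci, rng, state, mu, crossover_mean=1.0):
    """One haploid gamete: Poisson crossovers per chromosome, then mutation."""
    gamete = []
    for pair in parent["chrom"]:
        n_cross = poisson(rng, crossover_mean)
        points = sorted(rng.randrange(1, n_loci) for _ in range(n_cross))
        source, start, hap = rng.randrange(2), 0, []
        for point in points + [n_loci]:
            hap.extend(pair[source][start:point])  # copy segment from current strand
            source, start = 1 - source, point      # switch strand at the crossover
        # Infinite-alleles mutation: each mutated locus receives a brand-new allele.
        for _ in range(poisson(rng, mu * n_loci)):
            hap[rng.randrange(n_loci)] = state["next_allele"]
            state["next_allele"] += 1
        gamete.append(hap)
    return gamete

def next_generation(pop, size, n_loci, rng, state, mu):
    """Random mating: each offspring gets one random sire and one random dam."""
    males = [ind for ind in pop if ind["sex"] == 0]
    females = [ind for ind in pop if ind["sex"] == 1]
    if not males or not females:
        raise RuntimeError("population has lost one sex")
    offspring = []
    for _ in range(size):
        from_sire = make_gamete(rng.choice(males), n_loci, rng, state, mu)
        from_dam = make_gamete(rng.choice(females), n_loci, rng, state, mu)
        offspring.append({"sex": rng.randrange(2),
                          "chrom": [[a, b] for a, b in zip(from_sire, from_dam)]})
    return offspring

def allele_freq(pop, chrom, locus, allele):
    """Frequency of `allele` at one locus across all haplotypes in `pop`."""
    pairs = [ind["chrom"][chrom] for ind in pop]
    hits = sum((a[locus] == allele) + (b[locus] == allele) for a, b in pairs)
    return hits / (2 * len(pop))

def run(n_chrom=3, n_loci=30, n_base=100, n_breed=20, g_base=20, g_split=20,
        mu=1e-4, seed=1):
    """Toy bottleneck-then-split run; returns per-marker |freq difference|."""
    rng, state = random.Random(seed), {"next_allele": 2}
    # Assumption: founders carry random 0/1 alleles instead of an equilibrium burn-in.
    pop = [{"sex": rng.randrange(2),
            "chrom": [[[rng.randrange(2) for _ in range(n_loci)] for _ in range(2)]
                      for _ in range(n_chrom)]} for _ in range(n_base)]
    for _ in range(g_base):
        pop = next_generation(pop, n_base, n_loci, rng, state, mu)
    breed_a = breed_b = pop  # each daughter population is founded from the base
    for _ in range(g_split):
        breed_a = next_generation(breed_a, n_breed, n_loci, rng, state, mu)
        breed_b = next_generation(breed_b, n_breed, n_loci, rng, state, mu)
    return [abs(allele_freq(breed_a, c, l, 1) - allele_freq(breed_b, c, l, 1))
            for c in range(n_chrom) for l in range(n_loci)]

diffs = run()
```

The distribution of `diffs` under drift alone is the null against which the observed between-breed differences would be compared.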
To model contemporary B. taurus breeds, the simulated population was subdivided into two at generation 900, each with Ne = 200. These populations were simulated without interbreeding for a further 100 generations, giving an inbreeding coefficient relative to generation 900 of $F = 1 - \left(1 - \frac{1}{2N_e}\right)^{g} = 0.22$, where g is the number of generations. In generation 1,000, the difference in allele frequency between the two populations was calculated for each marker. The X chromosome was not included in the simulation because of difficulties associated with its different Ne.
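The value F = 0.22 follows directly from the formula with Ne = 200 and g = 100; the function name below is illustrative only.

```python
def inbreeding_coefficient(ne, generations):
    """Expected F = 1 - (1 - 1/(2 Ne))**g after g generations at size Ne."""
    return 1.0 - (1.0 - 1.0 / (2.0 * ne)) ** generations

# Two breeds of Ne = 200 isolated for 100 generations after the split.
print(round(inbreeding_coefficient(200, 100), 2))  # 0.22
```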
The frequency distribution of B. taurus SNPs was tested against that predicted under the neutral model using a paired t-test of whether the mean value of θH was significantly larger than that of θπ. Traditionally, such significance tests use null distributions generated by computer simulation. Given the large number of loci used, however, the central limit theorem predicts that the test statistic will be close to a t-distribution even if the allele frequencies are not normally distributed.
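A sketch of this test, assuming the standard per-site estimators of Fay and Wu (2000) for a SNP whose derived allele is carried by i of n sampled chromosomes, θπ = 2i(n − i)/[n(n − 1)] and θH = 2i²/[n(n − 1)], with a constant n across loci. The equations themselves are not reproduced in this section, and the one-sided p-value uses a normal approximation to the t-distribution, which is reasonable with thousands of loci; function names are hypothetical.

```python
import math
import statistics

def fay_wu_components(derived_counts, n):
    """Per-SNP contributions to theta_pi and theta_H (Fay and Wu 2000)."""
    denom = n * (n - 1)
    theta_pi = [2 * i * (n - i) / denom for i in derived_counts]
    theta_h = [2 * i * i / denom for i in derived_counts]
    return theta_pi, theta_h

def paired_t(x, y):
    """Paired t statistic for mean(y - x) > 0 and a one-sided p-value from
    the normal approximation (assumed adequate for many loci)."""
    diffs = [b - a for a, b in zip(x, y)]
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))
    p_one_sided = 1.0 - statistics.NormalDist().cdf(t)
    return t, p_one_sided

# Hypothetical derived-allele counts in a sample of 20 chromosomes.
pi, h = fay_wu_components([2, 18, 19, 5, 17, 3], n=20)
t, p = paired_t(pi, h)  # is theta_H larger than theta_pi on average?
```

Since H = θπ − θH, a significantly positive t here corresponds to a negative H, i.e. an excess of high-frequency derived alleles.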
where u is the mutation rate per year and t is the time since divergence in years. Therefore, if u is 2.2 × 10⁻⁹, the estimate of mammalian mutation rates by Subramanian and Kumar, and t is the average divergence time of Bison, Yak and Banteng, roughly 2.1 MY, then the probability of a double mutation between any one of these species and B. taurus is 0.005. Thus, for 7,500 bases we would expect 37 such mutations. Hence, the inferred ancestral allele should be correct over 97% of the time, and the inferred states were therefore assumed to be correct.
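The arithmetic can be checked numerically. The equation preceding this paragraph is not reproduced in this section, so the sketch assumes the probability is approximated as u × t, which gives 0.00462 and reproduces the quoted 0.005 and 37 expected errors after rounding.

```python
u = 2.2e-9      # mutation rate per site per year (Subramanian and Kumar)
t = 2.1e6       # approximate divergence time in years
n_loci = 7500

p_double = u * t                     # assumed form of the missing equation
p_rounded = round(p_double, 3)       # the text quotes 0.005
print(f"P(double mutation) = {p_double:.5f}")                  # 0.00462
print(f"expected errors    = {n_loci * p_rounded:.1f}")        # 37.5
print(f"fraction correct   = {1 - p_rounded:.3f}")             # 0.995
```

On this reading the error rate is about 0.5%, comfortably within the "over 97%" accuracy claimed.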
The H statistics estimated for Angus and Holstein cattle were compared using a t-test to detect differences in the frequency spectrum of derived alleles between the two populations. Differences in H between breeds may be due to increased selective pressure in one breed or may indicate differences in population substructure.
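Because the polymorphic loci differ between breeds, an unpaired comparison is natural. The sketch below uses Welch's two-sample t-test on hypothetical per-locus H values; the text only says "a t-test", so the choice of Welch's unequal-variance form is an assumption.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for
    the difference in means between two independent samples."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical per-locus H values (theta_pi - theta_H) for each breed.
h_angus = [-0.4, 0.1, -0.2, 0.05, -0.3]
h_holstein = [-0.1, 0.2, 0.0, 0.15, -0.05]
t, df = welch_t(h_angus, h_holstein)
```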
Frequency spectrum, the genomic distribution of derived alleles and distinguishing between the effects of positive selection, inbreeding and ascertainment bias
Fay and Wu's H statistic relies on the frequency distribution of derived alleles expected under neutrality, and this distribution can be altered by population substructure, ascertainment bias or positive selection. By comparing the observed frequency spectrum of derived alleles with that expected, inferences can be made about deviations from neutrality caused by positive selection or population substructure. Therefore, plots of the derived-allele spectrum f(p) against allele frequency p were compared with the theoretical neutral spectrum e(p) = k/p, modified from equation 4 in Fay and Wu, where k was chosen to match the observed values of f(p) as closely as possible.
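One way to build f(p) and choose k is sketched below. The equal-width binning and the least-squares criterion (minimising Σ[f(p) − k/p]², which gives k = Σ f(p)/p ÷ Σ 1/p²) are assumptions; the text says only that k was chosen to match f(p) as closely as possible.

```python
def derived_spectrum(freqs, n_bins=10):
    """Bin derived-allele frequencies (fixed sites excluded) into equal-width
    bins; returns bin midpoints p and counts f(p)."""
    counts = [0] * n_bins
    for p in freqs:
        if 0 < p < 1:
            counts[min(int(p * n_bins), n_bins - 1)] += 1
    mids = [(i + 0.5) / n_bins for i in range(n_bins)]
    return mids, counts

def fit_k(mids, counts):
    """Least-squares k for the neutral spectrum e(p) = k / p (assumed criterion)."""
    num = sum(c / p for p, c in zip(mids, counts))
    den = sum(1 / p ** 2 for p in mids)
    return num / den

mids, f = derived_spectrum([0.05, 0.12, 0.15, 0.31, 0.48, 0.93])
k = fit_k(mids, f)
e = [k / p for p in mids]  # expected spectrum to plot alongside f
```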
One drawback of using H is that ascertainment bias can distort the frequency spectrum if SNP discovery favoured common alleles. However, because derived and ancestral alleles are known, it should be possible to examine the frequency spectrum of derived alleles free of any effects of ascertainment bias. The ratio f(1−p)/[f(p) + f(1−p)] should not be affected by ascertainment bias, because it depends on knowing which allele is derived and which is ancestral, information that is irrelevant during SNP discovery. Therefore, f(1−p)/[f(p) + f(1−p)] was compared with the theoretical value l(p), the proportion of SNPs at which the derived allele is the more common allele, which under neutrality equals p (i.e. the expectation is a plot of p against p). This follows because, if f(p) = k/p, then $\frac{f(1-p)}{f(p) + f(1-p)} = \frac{k/(1-p)}{k/p + k/(1-p)} = p$.
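The ratio can be estimated by binning SNPs on minor allele frequency and counting how often the derived allele is the major allele. This is a sketch: the bin count and the exclusion of sites with a derived frequency of exactly 0.5 are assumptions, and the helper `neutral_ratio` simply checks the algebra above.

```python
def derived_major_ratio(derived_freqs, n_bins=5):
    """Per minor-allele-frequency bin, the proportion of SNPs whose derived
    allele is the more common one: f(1-p) / [f(p) + f(1-p)]."""
    major = [0] * n_bins
    total = [0] * n_bins
    for p in derived_freqs:
        if not 0 < p < 1 or p == 0.5:
            continue  # skip fixed sites and ties (assumption)
        q = min(p, 1 - p)  # minor allele frequency, in (0, 0.5)
        b = min(int(q * 2 * n_bins), n_bins - 1)
        total[b] += 1
        major[b] += p > 0.5  # derived allele is the major allele
    mids = [(i + 0.5) / (2 * n_bins) for i in range(n_bins)]
    return [(m, major[i] / total[i] if total[i] else None)
            for i, m in enumerate(mids)]

def neutral_ratio(p, k=1.0):
    """Expected ratio when f(p) = k/p; algebraically equal to p."""
    return (k / (1 - p)) / (k / p + k / (1 - p))

print(round(neutral_ratio(0.2), 10))  # 0.2
```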
Because demography influences Fay and Wu's H statistic and the frequency spectrum of derived alleles, we wanted to detect whether positive selection had increased the frequency of derived alleles at specific regions of the genome in either breed of B. taurus. Therefore, plots of derived allele frequency against genomic position were examined for clusters of high-frequency derived alleles. The autocorrelation of derived allele frequencies between each locus and the next on the same chromosome was also calculated. A positive autocorrelation would indicate that high-frequency derived alleles are clustered on the genome, which may be evidence of genetic hitchhiking. To test whether the observed autocorrelation could be due to inbreeding, we examined the autocorrelation in Angus and Holstein and in the simulated populations, which model the effect of inbreeding in the absence of selection.
Ancestral polymorphisms and neutral evolution
The frequency distribution of ancestral polymorphisms was examined. Ancestral polymorphisms are sites that are polymorphic in Domestic cattle and also vary among the wild relatives (Bison, Yak and Banteng), so that no ancestral allele could be determined. The allele frequencies at these ancestral polymorphisms should follow a neutral model with a relatively flat distribution, as they would have been segregating for ~2 million years, since all of these species last shared a common ancestor. Thus, the J-shaped curve expected for derived mutations should not be seen; instead, a flat distribution is predicted under neutrality, because these polymorphisms are ancient and receive no influx of new mutations. Therefore, plots of the frequency spectrum of ancestral alleles f(p) against p were compared with the theoretical value a(p), which was simply the mean value of f(p).
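A sketch of the flat expectation: bin the allele frequencies at ancient polymorphisms and set a(p) to the mean bin count. Because no ancestral state is known, the frequency of the alphabetically first allele is used; that convention, the {allele: count} input layout and the function name are assumptions.

```python
def ancient_spectrum(allele_counts, n_bins=10):
    """Observed counts f(p) at ancient polymorphisms and the flat
    expectation a(p) = mean f(p).

    allele_counts: one {allele: count} dict per site. The alphabetically
    first allele is tracked (arbitrary, assumed convention).
    """
    counts = [0] * n_bins
    for site in allele_counts:
        first = min(site)
        p = site[first] / sum(site.values())
        if 0 < p < 1:
            counts[min(int(p * n_bins), n_bins - 1)] += 1
    a_p = sum(counts) / n_bins  # flat neutral expectation
    return counts, [a_p] * n_bins
```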
Differences between breeds in allele frequency
where Hs is the expected heterozygosity averaged across all populations and Ht is the expected heterozygosity of the total population. Fst therefore ranges from 0, when all populations have identical allele frequencies, to 1, when different alleles are fixed in different populations. If derived alleles have been fixed in one population but not in the other as a result of positive selection, reasonably high values of Fst might be expected for alleles with extreme frequencies. Alternatively, if these alleles have drifted to fixation, the distribution of Fst should be independent of allele frequency.
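The Fst equation that precedes this paragraph is not reproduced in this section; the sketch assumes Nei's definition Fst = (Ht − Hs)/Ht, consistent with the Hs and Ht described above, for one biallelic locus in two equally weighted populations. The function name is hypothetical.

```python
def fst_two_pops(p1, p2):
    """Fst = (Ht - Hs) / Ht for one biallelic locus in two equally weighted
    populations (Nei's definition, assumed here)."""
    hs = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population H
    p_bar = (p1 + p2) / 2
    ht = 2 * p_bar * (1 - p_bar)                      # total-population H
    return 0.0 if ht == 0 else (ht - hs) / ht

print(fst_two_pops(1.0, 0.0))  # 1.0: different alleles fixed in each breed
```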
We thank staff from the Toronga open plains zoo, Claire Gill and Jeannette Muntwyler for providing the outgroup samples. Thanks to Amanda Chamberlain, Lakshmi Sethuraman and Jodie Ryan for extracting and preparing DNA for SNP analysis by Parallele. We also thank Keith Savin for work identifying genomic positions of SNP using existing human and cattle web resources, and Gemma Payne plus two anonymous reviewers for their contributions to editing the manuscript. The authors acknowledge the early access under the Fort Lauderdale conventions to the draft bovine genome sequence provided by the bovine genome sequencing project consortia and in particular Richard Gibbs and George Weinstock and the rest of the Baylor Human Genome Sequencing Center.
- Ward TJ, Honeycutt RL, Derr JN: Nucleotide sequence evolution at the kappa-casein locus: evidence for positive selection within the family Bovidae. Genetics. 1997, 147 (4): 1863-72.
- Beja-Pereira A, Luikart G, England PR, Bradley DG, Jann OC, Bertorelle G, Chamberlain AT, Nunes TP, Metodiev S, Ferrand N, Erhardt G: Gene-culture coevolution between cattle milk protein genes and human lactase genes. Nat Genet. 2003, 35 (4): 311-3. 10.1038/ng1263.
- Van Laere AS, Nguyen M, Braunschweig M, Nezer C, Collette C, Moreau L, Archibald AL, Haley CS, Buys N, Tally M, Andersson G, Georges M, Andersson L: A regulatory mutation in IGF2 causes a major QTL effect on muscle growth in the pig. Nature. 2003, 425 (6960): 832-6. 10.1038/nature02064.
- Wiener P, Burton D, Ajmone-Marsan P, Dunner S, Mommens G, Nijman IJ, Rodellar C, Valentini A, Williams JL: Signatures of selection? Patterns of microsatellite diversity on a chromosome containing a selected locus. Heredity. 2003, 90 (5): 350-8. 10.1038/sj.hdy.6800257.
- Andersson L, Georges M: Domestic-animal genomics: deciphering the genetics of complex traits. Nat Rev Genet. 2004, 5 (3): 202-12. 10.1038/nrg1294.
- Cohen-Zinder M, Seroussi E, Larkin DM, Loor JJ, Everts-van der Wind A, Lee JH, Drackley JK, Band MR, Hernandez AG, Shani M, Lewin HA, Weller JL, Ron M: Identification of a missense mutation in the bovine ABCG2 gene with a major effect on the QTL on chromosome 6 affecting milk yield and composition in Holstein cattle. Genome Res. 2005, 15 (7): 936-44. 10.1101/gr.3806705.
- Lynn DJ, Freeman AR, Murray C, Bradley DG: A genomics approach to the detection of positive selection in cattle: adaptive evolution of the T-cell and natural killer cell-surface protein CD2. Genetics. 2005, 170 (3): 1189-96. 10.1534/genetics.104.039040.
- Pollinger JP, Bustamante CD, Fledel-Alon A, Schmutz S, Gray MM, Wayne RK: Selective sweep mapping of genes with large phenotypic effects. Genome Res. 2005, 15 (12): 1809-19. 10.1101/gr.4374505.
- Li MH, Adamowicz T, Switonski M, Ammosov I, Ivanova Z, Kiselyova T, Popov R, Kantanen J: Analysis of population differentiation in North Eurasian cattle (Bos taurus) using single nucleotide polymorphisms in three genes associated with production traits. Anim Genet. 2006, 37 (4): 390-2. 10.1111/j.1365-2052.2006.01479.x.
- Hayes BJ, Lien S, Nilsen H, Gro Olsen H, Berg P, MacEachern S, Potter S, Meuwissen THE: The origin of selection signatures on bovine chromosome six. Anim Genet. 2008, 39 (2): 105-11. 10.1111/j.1365-2052.2007.01683.x.
- Akey JM, Zhang G, Zhang K, Jin L, Shriver MD: Interrogating a high-density SNP map for signatures of natural selection. Genome Res. 2002, 12 (12): 1805-14. 10.1101/gr.631202.
- Kimura M: The neutral theory of molecular evolution. 1983, Cambridge: Cambridge University Press.
- Tajima F: Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics. 1989, 123 (3): 585-95.
- Fay JC, Wu CI: Hitchhiking under positive Darwinian selection. Genetics. 2000, 155 (3): 1405-13.
- Akashi H: Within- and between-species DNA sequence variation and the 'footprint' of natural selection. Gene. 1999, 238 (1): 39-51. 10.1016/S0378-1119(99)00294-2.
- Przeworski M: The signature of positive selection at randomly chosen loci. Genetics. 2002, 160 (3): 1179-89.
- Sabeti PC, Schaffner SF, Fry B, Lohmueller J, Varilly P, Shamovsky O, Palma A, Mikkelsen TS, Altshuler D, Lander ES: Positive natural selection in the human lineage. Science. 2006, 312 (5780): 1614-20. 10.1126/science.1124309.
- Tajima F: Evolutionary relationship of DNA sequences in finite populations. Genetics. 1983, 105 (2): 437-60.
- Gautier M, Faraut T, Moazami-Goudarzi K, Navratil V, Foglio M, Grohs C, Boland A, Garnier J, Boichard D, Lathrop GM, Gut IG, Eggen A: Genetic and haplotypic structure in 14 European and African cattle breeds. Genetics. 2007, 177 (2): 1059-1070. 10.1534/genetics.107.075804.
- MacEachern S, McEwan J, Goddard M: Phylogenetic reconstruction and the identification of ancient polymorphism in the Bovini tribe (Bovidae, Bovinae). BMC Genomics. 2009, 10: 177. 10.1186/1471-2164-10-177.
- McClure MC, Schnabel RD, Morsci NS, Kim JW, Sellner EM, Yao P, Taylor JF: Whole Genome Mapping For Marbling QTL In A Commercial Angus Cattle Population. Plant & Animal Genome XV Conference. 2007, Town & Country Hotel, San Diego, CA, P524.
- Website, Bovine QTL Viewer. [http://genomes.sapac.edu.au/bovineqtl/index.html]
- Bradley DG, Loftus RT, Cunningham P, Machugh DE: Genetics and Domestic Cattle Origins. Evolutionary Anthropology. 1998, 6 (3): 79-86. 10.1002/(SICI)1520-6505(1998)6:3<79::AID-EVAN2>3.0.CO;2-R.
- Diamond J: Evolution, consequences and future of plant and animal domestication. Nature. 2002, 418 (6898): 700-7. 10.1038/nature01019.
- Bruford MW, Bradley DG, Luikart G: DNA markers reveal the complexity of livestock domestication. Nat Rev Genet. 2003, 4: 900-910. 10.1038/nrg1203.
- Akey JM, Eberle MA, Rieder MJ, Carlson CS, Shriver MD, Nickerson DA, Kruglyak L: Population History and Natural Selection Shape Patterns of Genetic Variation in 132 Genes. PLOS Biol. 2004, 2 (10): 1591-1599. 10.1371/journal.pbio.0020286.
- MacEachern S, McEwan J, McCulloch A, Mather A, Savin K, Goddard M: Molecular evolution of the Bovini tribe (Bovidae, Bovinae): Is there evidence of rapid evolution or reduced selective constraint in Domestic cattle?. BMC Genomics. 2009, 10: 179. 10.1186/1471-2164-10-179.
- Kumar S, Subramanian S: Mutation rates in mammalian genomes. Proc Natl Acad Sci USA. 2002, 99 (2): 803-8. 10.1073/pnas.022629899.
- Ward TJ, Skow LC, Gallagher DS, Schnabel RD, Nall CA, Kolenda CE, Davis SK, Taylor JF, Derr JN: Differential introgression of uniparentally inherited markers in bison populations with hybrid ancestries. Anim Genet. 2001, 32 (2): 89-91. 10.1046/j.1365-2052.2001.00736.x.
- Halbert ND, Ward TJ, Schnabel RD, Taylor JF, Derr JN: Conservation genomics: disequilibrium mapping of domestic cattle chromosomal segments in North American bison populations. Mol Ecol. 2005, 14 (8): 2343-62. 10.1111/j.1365-294x.2005.02591.x.
- Takahata N, Satta Y, Klein J: Divergence time and population size in the lineage leading to modern humans. Theor Popul Biol. 1995, 48 (2): 198-221. 10.1006/tpbi.1995.1026.
- Friend JB, Bishop D: Cattle of the World in Colour. 1978, Poole: Blandford Press.
- de Roos APW, Hayes BJ, Spelman RJ, Goddard ME: Linkage disequilibrium and persistence of phase in Holstein Friesian, Jersey and Angus cattle. Genetics. 2008, 179: 1503-1512. 10.1534/genetics.107.084301.
- Kent WJ: BLAT – the BLAST-like alignment tool. Genome Res. 2002, 12 (4): 656-64.
- Goddard M, Hayes BJ, McPartlan HC, Chamberlain AT: Can the same genetic markers be used in multiple breeds. Proceedings of the 8th World Congress on Genetics Applied to Livestock Production. 2006, Belo Horizonte, Brazil, 14-22.
- Subramanian S, Kumar S: Neutral substitutions occur at a faster rate in exons than in noncoding DNA in primate genomes. Genome Res. 2003, 13 (5): 838-44. 10.1101/gr.1152803.
- Frankham R, Ballou JD, Briscoe DA: Introduction to Conservation Genetics. 2002, Cambridge: Cambridge University Press.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.