- Research article
- Open Access
A novel analytical method, Birth Date Selection Mapping, detects response of the Angus (Bos taurus) genome to selection on complex traits
BMC Genomicsvolume 13, Article number: 606 (2012)
Several methods have recently been developed to identify regions of the genome that have been exposed to strong selection. However, recent theoretical and empirical work suggests that polygenic models are required to identify the genomic regions that are more moderately responding to ongoing selection on complex traits. We examine the effects of multi-trait selection on the genome of a population of US registered Angus beef cattle born over a 50-year period representing approximately 10 generations of selection. We present results from the application of a quantitative genetic model, called Birth Date Selection Mapping, to identify signatures of recent ongoing selection.
We show that US Angus cattle have been systematically selected to alter their mean additive genetic merit for most of the 16 production traits routinely recorded by breeders. Using Birth Date Selection Mapping, we estimate the time-dependency of allele frequency for 44,817 SNP loci using genomic best linear unbiased prediction, generalized least squares, and BayesCπ analyses. Finally, we reconstruct the primary phenotypes that have historically been exposed to selection from a genome-wide analysis of the 16 production traits and gene ontology enrichment analysis.
We demonstrate that Birth Date Selection Mapping utilizing mixed models corrects for time-dependent pedigree sampling effects that lead to spurious SNP associations and reveals genomic signatures of ongoing selection on complex traits. Because multiple traits have historically been selected in concert and most quantitative trait loci have small effects, selection has incrementally altered allele frequencies throughout the genome. Two quantitative trait loci of large effect were not the most strongly selected of the loci due to their antagonistic pleiotropic effects on strongly selected phenotypes. Birth Date Selection Mapping may readily be extended to temporally-stratified human or model organism populations.
Several statistical tests have been developed to identify the genomic regions that have been subjected to strong recurrent selection. Most have been based on extreme population differentiation[1–3], the enrichment of rare mutations in the site frequency spectrum[4, 5], or patterns of extended haplotype homozygosity[6–8] (See[9, 10] for further review). These tests have now been used to detect molecular signatures of selection in cattle[11–16]. However, recently, there has been a call to employ polygenic models to simultaneously identify loci responding to selection but that do not fit the typical “hard sweep” paradigm[17, 18].
Concurrent with the development of new approaches for the detection of selective sweeps, the statistical models employed for genome-wide association studies have been improved. Some of the refinements deal with the effects of population structure and kinship between sampled individuals, since not accounting for these effects can significantly increase the number of false positive associations (See for review). It has been shown that fitting a genomic relationship matrix (also known as a kinship matrix) effectively prevents false positives due to population structure and kinship[19, 20]. Furthermore, there has also been a shift toward the application of polygenic models for the identification of genetic risk factors and variants associated with complex phenotypes[21, 22].
In this study, we merge the search for loci responding to selection with advanced genome-wide association models to quantify the genome-wide response to selection in US registered Angus cattle. We introduce a novel method, Birth Date Selection Mapping, for identifying loci that are responding to ongoing selection.
Selection induces changes in allele frequency for the selected mutation, as well as for neighbouring loci that hitchhike along with the selected locus due to the presence of linkage disequilibrium between the loci. Accordingly, individual allele counts (0, 1, or 2 for AA, AB, and BB genotypes) could be regressed on birth date using a Poisson model to identify loci that have rapidly changed in frequency over time. However, in the presence of any sampling bias (non-random ascertainment of family members in time, population structure, or kinship), this approach suffers from a very high false-positive rate of detection of loci subject to selection (Additional file1: Figures S1 and S2). The bias results from a pedigree-based stratification in the depth of sampling of DNA on individuals within different families and differences in allele frequencies between families such that the differences in allele frequencies between families are partially confounded with differences in allele frequencies in time. In other words, this approach is confounded by pedigree relationships and the non-random sampling of individuals from families at different time points. The use of a mixed linear model with allele counts or frequencies fit as the dependent variable and a random animal term fit using a numerator or genomic relationship matrix does not solve the problem because any time-dependent trend in allele frequency is now incorporated into the solutions for animal effects (data not shown).
In our approach, rather than regressing allele frequencies (dependent variable) on birth date (explanatory variable), we invert the relationship and fit birth date as the dependent variable and identify SNPs that are strongly associated or predictive of birth date. If a neutral DNA variant is drifting through a population, changes in allele frequency will be stochastic and small provided the effective population size (N e ) is large. For these variants, the probability of a specific genotype will be approximately constant in time and knowledge of an individual’s genotype will not be strongly predictive of that individual’s birth date. On the other hand, if a variant is under strong directional selection, changes in allele frequency will be consistent in direction and may become large over several generations. For these variants, genotype will be predictive of birth date. For example, if the A allele is increasing in frequency in time, AA individuals are much more likely to be born recently than in the distant past. We utilize mixed model methods to account for the pedigree-based structure in our sample through the use of the genomic relationship matrix[19–21]. With the analysis framed from this perspective, we identify the SNPs that are changing in frequency due to selection while accounting for kinship within the sample. By so doing, we are the first to apply polygenic models for the detection of genomic imprints of selection.
Evidence of selection
Deregressed estimated breeding values (EBVs) for 16 production traits (see Supplementary Information for definitions and acronyms) were regressed on birth date (measured as a continuous variable with month and day converted to a decimal fraction of a year) for 3,570 registered Angus animals (Additional file1: Table S1, Figure1 and Additional file1: Figures S3-S16). For traits that can easily be appraised and for which expected progeny differences (EPDs, an EPD is one half of the EBV) were implemented earlier in the development of the breed (e.g., growth and stature), selection has significantly changed the breed additive genetic mean over time. For traits for which increased production has consistently been sought by producers, such as weaning weight, yearling weight, and milk, additive genetic means have increased linearly (Additional file1: Table S1; Figure1b, Additional file1: Figure S4, and S10). However, additive genetic means for birth weight, yearling height, mature weight, and mature height increased until the mid-1980s when breeders recognized the detrimental effects of large birth weights on calving difficulty and large mature size on cow maintenance requirements and fertility, and these traits were subsequently selected to decrease (Figures1a, Additional file1: Figures S5, S11, and S12). For these traits, the quadratic regression models have a much smaller Akaike Information Criterion (AIC), larger adjusted R2 values, and smaller p-values (Additional file1: Table S1). Traits with recently developed EPDs, such as docility and heifer pregnancy rate, show little change in additive genetic mean over time (Additional file1: Figures S7 and S8). Docility and heifer pregnancy rate had among the smallest R2 values of all the fitted linear and quadratic regression models. Additive genetic means for growth traits (WW, YW, and CW) and for the incidence of unassisted births (CED and CEM) have increased annually. Weaning weight has increased, on average, by 2.81 pounds per year and the rate of unassisted births (CED) has increased by 0.56% per year—remarkable achievements by Angus breeders considering the 50-year span of these data.
Birth Date Selection Mapping
We applied our Birth Date Selection Mapping method to this data set using three mixed models. We first estimated allele substitution effects (ASEs) for birth date for 45,073 SNPs using genomic best linear unbiased prediction (GBLUP)[21, 24, 25] applied to 3,570 registered Angus cattle, but we do not report results for the 256 SNPs that map to unassigned contigs in the UMD3.1 reference assembly. The GBLUP analysis simultaneously fits all SNPs as random effects and does not estimate p-values for tests of significance of individual SNPs. Rather, ASEs were converted to estimates of additive genetic variance associated with each SNP and plotted (Figure2).
We next used EMMAX to individually estimate SNP ASEs as fixed effects and q-values representing the expected proportion of false positives among all SNP effects as extreme as observed for the current SNP. Compared to the Poisson regression of allele counts on birth date (Additional file1: Figure S2), the significance values were not inflated for this analysis (Additional file1: Figure S17), demonstrating this analysis appropriately models kinship relationships. Additional file1: Figure S17 further demonstrates that we have sufficient power to identify significant associations. This approach is quite conservative and indicates that strong selection has caused large changes in allele frequency at only a small number of loci on chromosomes 1, 2, 3, 6, 20, 21, 22, 23, 24, and 29 (Figure3). The two peaks on chromosome 23 contain the major histocompatibility complex (MHC) and numerous olfactory receptors.
Finally, we used GenSel to fit a non-linear BayesCπ model in which the parameter π estimates the proportion of SNPs that are not associated with the dependent variable. We estimated π to be 0.979, and thus 2.11% (948) of the SNPs were estimated to be predictive of birth date and therefore putatively exposed to strong selection. BayesCπ employs a MCMC approach in which 1-π of the SNPs are sampled for inclusion in the model in each chain and the jointly estimated SNP ASEs are finally shrunk according to the proportion of times each SNP is retained in the selected model. Thus, SNPs that are rarely retained in the model have their ASEs strongly shrunk towards zero. This analysis found strongly selected loci on all chromosomes (Figure4).
The SNP ASEs estimated with GBLUP and EMMAX had a Spearman correlation of 0.92 and a Pearson correlation of 0.78. The difference in ASE magnitude identified by the Pearson correlation essentially reflects the difference in estimates achieved due to fitting SNPs as uncorrelated random or fixed effects. The SNP ASEs estimated by GBLUP and BayesCπ had a Spearman correlation of 0.85, but had a Pearson correlation of only 0.56 due to the strong shrinkage of large effect SNPs in GBLUP (due to the assumption that all SNPs are drawn from a distribution with a common variance) and the strong shrinkage of small effect SNPs in BayesCπ. The EMMAX and BayesCπ ASEs had a Spearman correlation of 0.83 and Pearson correlation of 0.46. In the GBLUP and EMMAX analyses, the multilocus genotypes explained 0.534 and 0.531 of the variance in birth date (i.e., the “heritability” of birth date), respectively, as estimated using restricted maximum likelihood (REML) estimation of variance components. In the BayesCπ analysis, genotypes explained 0.717 of the variance in birth date.
Effective population size and drift
To ascertain whether drift is a significant force influencing allele frequency changes within the artificially selected US Angus breed, we estimated the inbreeding effective population size, under the neutral model, and modelled the effects of drift on neutral loci. Using a pedigree of up to 63 generations and which comprised 91,001 Angus animals including the 3,570 genotyped animals and all known ancestors, we estimated the generation interval for US Angus cattle to be 4.99 years, which was the average age of animals born between 1941 and 1990 at the birth of their male and female registered progeny. From this pedigree, we also estimated inbreeding coefficients (denoted as F) for all animals from which we estimated effective population size from the regression of F on generation number. From a principal component analysis of the SNP genotypes, we identified two distinct subgroups within our sample. In Additional file1: Figure S18, we identified the Wye Angus herd which was formed from an importation of bulls from the British Isles and then closed to new germplasm in 1958 as a group that was distinct from the remaining US registered Angus cattle. The inbreeding effective population size for the Wye herd was estimated to be 36.41 ± 0.03, whereas the estimate for the remaining North American Angus was 267.59 ± 0.02 using animals born after 1930 and 116.15 ± 0.04 using animals born after 1980 (Figure5a and Table1). For each of the 44,817 SNPs, we constructed a test (see Methods) to determine whether the observed change in allele frequency could be explained by drift or alternatively must be due to selection. From this analysis, we found that the observed allele frequency changes exceeded the likely changes due to drift for 84.60% of the 44,817 SNPs.
We also compared genomic with pedigree estimates of F. The realized genomic F have a larger variance (s2 = 0.0023) than do the pedigree F (s2 = 0.0014), and the two measures of F had a Pearson correlation of 0.65 (Figure5b), which is consistent with the underestimation of pedigree F due to the assumption that F is zero for all individuals in the base generation, pedigree errors, and incomplete pedigree information. We regressed pedigree F on genomic F, and found the slope of the regression to be 0.49 ± 0.01. Separate regressions for the Wye and North American Angus revealed pedigree and genomic F coefficients to be more similar for the Wye herd than for the remaining North American Angus cattle (see Figure5b and Table2). When genomic F was regressed on pedigree F, the slope of the regression was 0.85 ± 0.02 with an adjusted R2 of 0.42.
Because estimates of genomic F coefficients are based on fewer assumptions than are pedigree estimates (which require neutrality), we also estimated N e using the genomic F coefficients for North American Angus animals born after 1980. This resulted in an N e of 94.18 ± 0.10 (Table1). Using this N e in our drift test, we estimated that allele frequency changes exceeded those likely due to drift for 82.41% of the 44,817 SNPs.
Applying Birth Date Selection Mapping to smaller sample sizes
We appreciate that not all species will have the large samples that are available for agriculturally important species for Birth Date Selection Mapping. To assess the effects of sample size and birth date range, we created four subsamples from our data set that were analysed using EMMAX. The first subsample contained 1,237 animals from pedigree generations 58, 59, and 60 (mean birth date 2004.84 ± 3.03; range 1993.16 to 2008.66). The second contained 60 animals, consisting of 20 animals randomly sampled from each of pedigree generations 58, 59, and 60 (mean birth date 2005.16 ± 2.20; range 1999.14 to 2008.01). The third consisted of 1,237 animals randomly sampled from the entire data set (mean birth date 1999.10 ± 9.00; range 1955.99 to 2008.64). The final sample included 60 animals randomly sampled from the entire data set (mean birth date 1998.77 ± 8.56; range 1974.34 to 2008.10). Additional file1: Figures S19 and S20 show that only the third data set had sufficient power to identify loci under selection with genome-wide significance. Comparing Additional file1: Figures S20A and S20D suggests that increasing the time period over which individuals are sampled improves power more than does increasing sample sizes using contemporary individuals. Selection would likely have to be extremely strong and focussed on monogenic traits to identify selected loci using small samples of contemporaries (Additional file1: Figures S19B and S20B).
Connecting selected phenotype to selected genotype
We analysed deregressed EBVs in a weighted analysis for 16 production traits (Supplementary Information) using data provided by the American Angus Association (AAA) under an animal model that incorporated a genomic relationship matrix and from which we estimated the proportion of additive genetic variance explained by the SNP markers (Table3). With the exception of two QTLs on chromosomes 7 and 20, most genes influencing variation in growth traits in Angus cattle are of small effect (Figures6, Additional file1: Figures S21-S35). The most likely location of the pleiotropic QTL on chromosome 7 was found to be at 93.22 Mbp in the GBLUP analyses, and the SNP at this location, rs110059753, was among the most strongly associated of all SNPs with CED, BW, WW, YW, HP, CEM, MILK, MW, MH, CW, MARB, RE, and FAT (Figures6, Additional file1: Figures S21, S22, S23, S27, S28, S29, S30, S31, S32, S33, S34, S35). However, the strongest selection signal found on this chromosome was found at 99.02 Mbp (Figure2) by GBLUP, at 100.02 Mbp by BayesCπ (Figure4), and a small, but not significant, birth date selection signal was found at 99.02 Mbp in the EMMAX results (Figure3). The birth date effect for rs110059753 was ranked 11,224 out of the 44,817 SNP effects (75th ASE percentile). The most likely location of the pleiotropic QTL on chromosome 20 was estimated to be at the position of SNP rs43711332 at 4.62 Mbp (affecting CED, BW, WW, YW, YH, CEM, MW, MH, and CW; Figures6, Additional file1: Figures S21, S22, S23, S24, S28, S30, S31, and S32). Selection signals were detected at 5.1 Mbp in the GBLUP and at 5.9 Mbp in the EMMAX analyses of birth date. The birth date effect for SNP rs43711332 was ranked 5,168 out of the 44,817 SNP effects (88th ASE percentile).
To assess the identity of the trait or combination of traits that have historically been under selection in Angus cattle and that produced the molecular signals of selection, we individually regressed the SNP ASEs for birth date on standardized SNP ASEs (see Methods) for all 16 production traits for which the AAA routinely produces EPDs (Additional file1: Table S2) using generalized least squares accounting for the pair-wise LD among SNPs. We fit this model for the 948 SNPs (935 SNPs after LD pruning) with the largest birth date variances corresponding to the 1-= 0.0211 proportion of SNPs detected to be under strong selection in the BayesCπ analysis of birth date. Growth traits (WW, YW, and CWT), milk, marbling and calving ease (CED and CEM) had the largest adjusted coefficients of determination and relative selection intensities (Additional file1: Table S2). However, the models for birth weight, docility, and yearling height were not significant (Bonferroni corrected α = 0.003125 = 0.05/16). A multiple regression with all 16 traits fit jointly produced an adjusted R2 of 0.6625, which is only slightly (0.0457) larger than the R2 for the weaning weight model. Individual terms from the multiple regression are not reported due to multicollinearity between several traits.
Finally, to elucidate the biological processes associated with the genes located in the genomic regions detected to be under selection, we analysed the gene ontology term enrichment for the annotated genes within these regions (See Additional file2 and Additional file3). From the GBLUP results, we queried 2,074 genes within 100 Kbp of the top ranked 948 SNPs for their birth date ASEs, and from the BayesCπ results, we queried 1,996 genes within 100 Kbp of the top 948 SNPs. There were 1,059 genes shared between the two lists, and this list of genes was also queried using the DAVID Bioinformatics tools. Various biological processes appear to be under selection based on the intersection of the GBLUP and GenSel results—notably cellular metabolic process, biosynthetic process, translation, protein folding, regionalization, ectoderm development, leukocyte mediated immunity, and striated muscle cell proliferation (Additional file4 contains the complete list). The intersection of the GBLUP and GenSel results also found olfactory transduction, antigen processing and presentation, and the adipocytokine signalling pathway to be enriched KEGG pathways. In addition to these terms, cell proliferation, spermatogenesis, and organ growth were enriched gene ontology terms from the GBLUP results (Additional file2 contains the complete list). Developmental process, anatomical structure development, cellular response to stress, response to oxidative stress, positive regulation of lymphocyte activation, and limb morphogenesis were additional enriched gene ontology terms from the GenSel results (Additional file3 contains the complete list).
Artificial selection has increased the weights at which cattle are marketed either at weaning or yearling ages (Figures1, Additional file1: Figures S4, and S13) while simultaneously decreasing the incidence of assisted births (Additional file1: Figures S3 and S9), and the trends observed in our data set are very similar to those reported for the entire Angus breed. Larger birth weights and yearling heights are both strongly associated with increased calving difficulty and genetic trend increased both traits until about the mid-1980s, after which both began to decrease (Figures1 and Additional file1: Figures S5). Breeders did not directly select to increase birth weight, but it increased as a correlated response to selection for increased weaning and yearling weights. Some breeders selected for increased yearling height to produce Angus cattle more comparable in frame size to the Continental European breeds, which were imported into the US during the 1970s. However, once breeders appreciated the undesirable correlated response in calving ease, selection was practised to increase weaning and yearling weights while maintaining birth weight and yearling height constant.
Using EMMAX, only eleven loci were found to be significantly associated with birth date and thus under strong selection, but all loci simultaneously explained 53% of the variation in birth date. On the other hand, BayesCπ estimated that 2.11% of the SNPs were strongly associated with birth date, but all SNPs explained 72% of the variance in birth date. The difference in the heritability estimates between the GBLUP or EMMAX analyses compared to the BayesCπ analysis reflects the different model assumptions underlying these analyses. Whereas GBLUP and EMMAX assume the infinitesimal model under which all SNP ASEs are drawn from a distribution with constant variance[33, 34], BayesCπ begins with a distribution with constant variance but shrinks the variance for small effect SNPs that are rarely fit in the model. As a consequence, GBLUP and EMMAX regress all SNP ASEs equally towards the mean of zero, while BayesCπ more aggressively regresses small effect SNPs and less aggressively regresses large effect SNPs, leading to a better model fit—as was found here—when there are loci under very strong selection. In the absence of selection, genotype frequency should be independent of time provided that the effects of drift are negligible and the heritability estimates should be close to zero. However, this was not the case for US registered Angus cattle and we conclude that a significant number of loci are rapidly responding to selection (our results suggest 2.11%) and that most of the genome (82.4% from the drift analysis) is responding more slowly to selection. Furthermore, in our nonlinear BayesCπ model, 72% of the variation in birth date could be explained by simultaneously using all SNP genotypes, suggesting that there are loci under very strong selection (large effect loci) that are not appropriately fit by the infinitesimal model.
The SNP ASEs for the 16 analysed traits indicate that, with the exception of the two large effect QTLs on BTA 7 and 20, the vast majority of QTLs underlying quantitative traits in beef cattle are of small effect. Of considerable interest, neither of these QTLs was found to be under very strong selection and this seems to be because of their large antagonistic pleiotropic effects on growth and calving difficulty. We postulate that when multiple traits are simultaneously selected, the genetic architecture of the population defined by pleiotropy and the chromosomal organization of QTL alleles (phase effects) constrains both the phenotypic and genotypic response to selection.
For selection to be effective, the selection intensity and effective population size must be sufficiently large to overcome the effects of genetic drift. We demonstrate that US registered Angus cattle have a sufficiently large effective population size to enable successful artificial selection, but more importantly, that large intergenerational changes in allele frequency are unlikely to occur due to drift alone. Furthermore, we found a considerable disparity between pedigree and genomic estimates of inbreeding coefficients. While others have argued that genomic relationship matrices should be adjusted to more closely resemble pedigree relationship matrices, we assert that genomic relationship matrices provide a more accurate representation of the realized relationships among individuals that result from the Mendelian sampling of parental gametes and selection. The use of genomic relationship matrices in place of pedigree relationship matrices avoids the assumption of neutrality of loci both in the estimation of inbreeding coefficients and for the mean value of gametes inherited by progeny—both of which are assumed for the computation of the numerator relationship matrix. The disagreement between genomic and pedigree estimates of F coefficients is likely to be due to the assumption that base animals are not inbred, errors in the pedigrees, and missing pedigree information likely due to the large-scale importation of Canadian Angus cattle in the 1940s and 1950s that were not carriers of dwarfism alleles, which had been driven to high frequency due to selection at the time. This is supported by the closer agreement between pedigree and genomic F coefficients for the Wye herd that was largely derived from British stock with more complete pedigree records than the remaining US registered Angus cattle (Table2 and Figure5b).
We attempted to identify the relative selection intensities placed on each selected trait via the imprints that multi-trait selection had left on the Angus genome. Although this analysis assumed no change in relative selection intensities in time, an assumption that is clearly violated in view of the genetic trends in birth weight and yearling height, we were able to confirm that growth traits have historically been the most strongly selected in US registered Angus cattle. Because Angus is considered to be a maternal breed (i.e., motherly, used as dams in commercial beef production), it is logical that loci that influence calving ease, growth to weaning, and milking ability should have been found to be under selection. Angus breeders have successfully selected to increase calving ease and body weight by selecting for body shapes that allow a calf’s easy passage through the dam’s pelvis. This is supported by the finding of an enrichment of gene ontology terms related to limb morphogenesis and anatomical structure development within regions of the genome detected as responding to selection. It has previously been shown that calving ease is negatively correlated with several body measures, such as head circumference, head width, hip width, hip height, chest girth, and cannon bone circumference[38–40]. Likely due to the Certified Angus Beef’s emphasis on quality grade, Angus breeders have more recently selected to increase marbling. Conversely, traits such as fat thickness, docility, and heifer pregnancy rate have not been as intensely selected as growth traits, due to the differing breeding objectives of beef producers, genetic antagonisms constraining selection response, and the historic difficulty in collecting field data to allow the development of EPDs for these traits.
There is also evidence that natural selection has occurred in this population. The gene ontology enrichment results indicate that genes affecting immune response, such as the MHC, NOD2, C3, and DBH, have strongly responded to selection (Additional file2, Additional file3, and Additional file4), presumably due to the exposure of Angus cattle to novel pathogens following their introduction into the US in 1873 and the continuous co-evolutionary “arms race” between bovine and pathogen genomes[43, 44]. The Bovine HapMap Consortium found that the MHC had some of the lowest Fst values in the entire genome when compared between breeds. Our analyses have identified the MHC as being under strong selection. Taken together, these results suggest that the MHC or certain of the numerous olfactory receptors which occupy the same region on chromosome 23 are under strong convergent selection.
Furthermore, natural selection may also be attempting to buffer the cellular environment against the deleterious effects of inbreeding. We found that spermatogenesis was an enriched ontology term describing the function of genes within the strongly selected regions of the genome (Additional file3). Seminal plasma proteins have been associated with bull fertility, and selection may be increasing fertility to counter act the inbreeding depression of reproduction (alternatively, the use of AI may be selecting for increased fertility). Genes involved in response to oxidative stress were also identified; response to oxidative stress has been tied to mitigating inbreeding depression. We also inferred that at least 6 heat shock proteins are under selection in Angus and that protein folding was an enriched biological process (Additional file4). It has been hypothesized that heat shock proteins assist the organism to cope with protein instability and misfolding caused by homozygous nonsynonymous mutations that are elevated in frequency by inbreeding[46–50].
One of the greatest difficulties encountered in identifying genomic signatures of selection is in distinguishing changes that have occurred due to demographic as opposed to selective forces. Our Birth Date Selection Mapping approach utilizing mixed models specifically accounts for pedigree relationships and explicitly deconvolutes their confounding effects on time-dependent allele frequency changes, which are due to the fact that not all pedigrees are sampled equally deeply in terms of the numbers of genotyped individuals. Rather than fit generations as the dependent variable, which are poorly estimated when pedigrees are incomplete, fitting birth date allows unknown or complex pedigrees with overlapping generations to be analysed. Furthermore, the genomic relationship matrix accounts for pedigree relationships between samples, allowing closely related samples to be analysed. However, one of the limitations of Birth Date Selection Mapping is the requirement of a temporally stratified sample of genotyped individuals. The results for the analyses of the reduced data subsets suggest that sampling over extended time periods or large sample sizes—but not necessarily both—will be necessary to identify strongly selected loci. This will currently limit the utility of the approach in human populations due to a lack of preserved DNA samples across multiple generations. However, this limitation may be alleviated as it becomes more practical to extract quality DNA from formalin-fixed, paraffin-embedded tissue section samples and ancient remains. Nevertheless, Birth Date Selection Mapping is clearly most easily applied to organisms with temporally stratified DNA samples and genome-wide genotypes.
Using the estimated birth date ASEs as informative priors in the development of genomic selection programs is another interesting application of our method. Loci with small birth date ASEs are either of small effect on the selection objective or represent genes of large effect that have undesirable pleiotropic effects (or closely linked loci with antagonistic phase relationships). Loci that have large birth date ASEs must have large effects on the selection objective that are less constrained by antagonistic pleiotropic effects allowing them to more rapidly respond to selection.
When temporally stratified DNA samples are available, Birth Date Selection Mapping is an effective method for the identification of strongly selected loci. We demonstrate that selection on polygenic traits leaves detectable signatures of selection throughout the genome at which small changes in allele frequencies per generation have occurred. If genes with large antagonistic pleiotropic effects exist, they respond to multi-trait selection as if they were of small effect on the breeding objective as predicted by quantitative genetic theory. By relating the detected signatures of selection to phenotype, we infer that artificial selection in US registered Angus cattle has historically focussed primarily on growth and maternal traits including calving ease, weaning weight, and milking ability. This result was directly confirmed by the response to selection that has occurred in these traits that we directly estimated from EPDs provided by the AAA. Finally, our results suggest that natural selection has also acted in this domesticated population to increase immunity and possibly to buffer the organism against the effects of inbreeding depression.
DNA extraction and SNP genotyping
Cryopreserved semen was obtained from semen distributors, the National Animal Germplasm Program, and individual Angus breeders including the University of Maryland Foundation which owns the Wye herd. DNA was extracted using a proteinase K digestion, Phenol: Chloroform alcohol extraction, and ethanol precipitation. Single nucleotide polymorphisms were assayed using the Illumina BovineSNP50 BeadChip and genotyped using the Illumina GenomeStudio software. Genotypes were filtered using a SNP call rate threshold of 90%, animal call rate threshold of 95%, and minor allele frequency threshold of 0.01. Autosomal and pseudoautosomal SNPs that had a Hardy-Weinberg Chi-square statistic with 1 degree of freedom greater than 300 were also filtered—primarily to remove polymorphisms detected in copy number variants rather than remove loci that were under selection. Filtered data were processed through FastPHASE version 1.4.0 to impute the 0.49% of missing genotypes. Parameter values were set at T=10, K=20, with -eo flags set. The resulting dataset consisted of genotypes for 45,073 SNPs scored in 3,570 animals with no missing values.
Response to selection
Expected progeny differences for 16 production traits along with their accuracies were provided by the AAA for 103,816 animals including the 3,570 genotyped animals and all identified ancestors in their pedigrees. These values were doubled to obtain estimated breeding values that were deregressed for the 3,570 animals as previously described. The deregression of estimated breeding values removes parent average information and converts the information available on the individual back to the scale of the underlying phenotype—that is, it removes the “shrinkage” that was applied to convert phenotypes into estimated breeding values. In the statistical package R, trait breeding values were plotted against birth date. Linear and quadratic regressions were fit for each trait.
Principal component analysis of Angus genotypes
We used the smartpca program, part of EIGENSOFT, for principal component analysis of the Angus genotypes. We plotted principal component 1 by principal component 2 to visualize the largest elements of population substructure. Figure S18 revealed that the primary substructure detected in the population was the largest families—the linearly related members of the Wye herd and the ancestors and sons of N Bar Emulation EXT, a popular bull within the breed that generated numerous sons also used in artificial insemination.
Estimation of effective population size
Pedigree inbreeding coefficients (F) were estimated as a by-product of using the rapid algorithm for producing the inverse of a numerator relationship matrix. Genomic F were estimated by subtracting 1 from the diagonals of the genomic relationship matrix, which was estimated according to with base generation allele frequencies at each SNP estimated using the 59 animals born between 1955 and 1974, excluding all animals from the Wye herd.
The inbreeding effective population size N e was estimated from the regression of inbreeding coefficients on pedigree generation number using individual animal data. This requires inverting the relationship ΔF = 1/2N e , in which ΔF is the increase in mean inbreeding coefficient between adjacent generations and is estimated as the slope of the regression across all generations if N e is assumed constant in time. A Taylor series expansion leads to an estimate of the standard error of N e as SE(N e ) ≈ 2N e SE(ΔF), in which SE(ΔF) is the standard error of the estimated slope of the regression. Because the depth of available pedigree information varied substantially for the 3,570 sampled Angus animals (animals within the pedigree that were assigned to generation 0 varied in birth year from 1838 to 1954) we considered the estimates of pedigree generation to be unreliable from the perspective of estimating N e . Accordingly, we estimated generation number for each of the 3,570 genotyped animals by subtracting 1950 from their birth year and dividing by the generation interval of 5 years. Because of the closed nature of the Wye herd and complete pedigree information back to foundation animals, we fit separate models for the Wye and remaining North American Angus animals. For the North American Angus subset, we fit two models using generation number estimated from birth year for animals born after 1930 and for animals born after 1980 where there appeared to be an inflection in the rate of increase in inbreeding. This corresponds to the point in time when the increased use of artificial insemination became significant within the breed.
For each of the 44,817 SNPs, we directly estimated the change in allele frequency that occurred between the 460 individuals assigned to pedigree generation 58 and the 450 individuals assigned to pedigree generation 59 using PLINK[59, 60]. These pedigree generations were chosen because they had the largest sample sizes, represent the individuals with the deepest pedigrees for which their generation assignment would not be significantly influenced by missing pedigree information, and were among the most recent of the generations suggesting they were likely to represent all of the families present within the sample. Furthermore, the sample sizes for these generations were sufficiently large to obtain precise estimates of allele frequencies. We compared the observed allele frequency changes between generations 58 and 59 to the boundaries of the 99.999999% (−log10(p-value) = 8) confidence interval for the change in allele frequency due to drift (estimated under the assumption of normality assuming a mean of 0 and the drift variance for the ith SNP to be p i (1-p i )/2N e , for p i the frequency of the A i allele in generation 58 and N e = 116.15). For SNPs on the X chromosome, the drift variance for the ith SNP was p i (1-p i )/1.5N e . Loci for which the allele frequency change exceeded the boundaries of the confidence interval were concluded to be changing in frequency due to selection rather than drift.
GBLUP of phenotypic traits
In a weighted analysis using deregressed EBVs as previously described, GBLUP was used to estimate ASEs for 16 different traits using 45,073 SNPs genotyped in 3,570 animals. Allele substitution effects were converted to additive genetic variances by squaring the ASE and multiplying by 2p i q i , in which p i and q i = 1 - p i are the base generation allele frequencies at the ith SNP. Base generation allele frequencies at each SNP were estimated using the 59 animals born between 1955 and 1974, excluding all animals from the Wye herd. Results are presented only for the 44,817 SNPs that mapped to autosomes or the X chromosome in the UMD3.1 bovine assembly.
Signatures of selection analysis
SNPs with the greatest changes in allele frequency over time will have the largest ASEs for birth date. The ASE reflects the amount of response to selection realized by the genomic region tagged by a SNP. Genome-wide associations with birth date were analysed by GBLUP using custom developed software described in, by EMMAX, and by BayesCπ as implemented in GenSel. A genomic relationship matrix, computed as previously described, was incorporated in the GBLUP analysis and a Balding-Nichols matrix was used as the kinship matrix in the EMMAX analysis. Both analyses estimated the amount of variation in birth date explained by multilocus genotypes using REML. Test statistic p-values for each SNP produced by EMMAX were adjusted to q-values using the method of Benjamini and Hochberg as implemented by the GenABEL package in R. The additive genetic and residual variance components estimated in the GBLUP analysis were used as starting values for variance components in the BayesCπ analysis. The starting value for π was set to 0.9 and GenSel was run for 160,000 iterations, with 1,000 iterations used as burn-in. Manhattan plots were created in R, with R source code from which was altered to allow 30 chromosomes on the X-axis and for q-values or variances to be plotted on the Y-axis.
From Falconer and MacKay, the change in allele frequency resulting from selection is Δq = −ipqa/σ p , where i is the selection intensity, a is one half the phenotypic difference between homozygote mean phenotypes, σ p is the trait variance, and p and q are allele frequencies. Assuming the dominance deviation is zero, the ASE α is equal to the genotypic value a. Thus, we use the ASE as a proxy for a which we then scaled as pq ASE/σ ASE to form the independent variables for each of the 16 production traits which were individually regressed on the birth date ASEs to provide estimates of the relative selection intensity i for each trait (the sign is included in the realized estimate). For each trait, σ ASE represents the ASE standard deviation in the equation above. Regressions were performed using generalized least squares, with e ~ (0, V σ) where V was the matrix of correlations between alleles at pairs of SNPs estimated using PLINK. When multiple, contiguous SNPs had r = ±1 (i.e., r2 = 1), only the first SNP was retained, resulting in the removal of 13 SNPs. The model was fit to the 935 SNPs with the largest birth date ASEs.
Due to the extent of LD within the bovine genome[12, 65], we identified all genes within 100 Kbp of the 948 most strongly selected SNPs (top 2.11% of 44,817 SNPs, estimated by BayesCπ) identified by the GBLUP and BayesCπ analyses. We used the DAVID bioinformatics resources[66, 67] to identify enriched GO terms in the lists of 2,074 (GBLUP) and 1,996 (BayesCπ) genes, and the 1,059 genes in common between the lists. We used annotations from Bos taurus, Homo sapiens, Mus musculus, Rattus norvegicus, Canis lupus, Pan troglodytes, Macaca mulatta, Equus caballus, Pongo abelii, Sus scrofa, Ovis aries, and Oryctolagus cuniculus for GO enrichment analysis.
Genotypes are available to scientists interested in non-commercial research upon signing a Materials Transfer Agreement (MTA).
Akey JM, Zhang G, Zhang K, Jin L, Shriver MD: Interrogating a high-density SNP map for signatures of natural selection. Genome Res. 2002, 12: 1805-1814.
Shriver MD, Kennedy GC, Parra EJ, Lawson HA, Sonpar V, Huang J, Akey JM, Jones KW: The genomic distribution of population substructure in four populations using 8,525 autosomal SNPs. Hum Genomics. 2004, 1: 274-286.
Weir BS, Cardon LR, Anderson AD, Nielsen DM, Hill WG: Measures of human population structure show heterogeneity among genomic regions. Genome Res. 2005, 15: 1468-1476.
Carlson CS, Thomas DJ, Eberle MA, Swanson JE, Livingston RJ, Rieder MJ, Nickerson DA: Genomic regions exhibiting positive selection identified from dense genotype data. Genome Res. 2005, 15: 1553-1565.
Kelley JL, Madeoy J, Calhoun JC, Swanson W, Akey JM: Genomic signatures of positive selection in humans and the limits of outlier approaches. Genome Res. 2006, 16: 980-989.
Sabeti PC, Varilly P, Fry B, Lohmueller J, Hostetter E, Cotsapas C, Xie X, Byrne EH, McCarroll SA, Gaudet R, et al: Genome-wide detection and characterization of positive selection in human populations. Nature. 2007, 449: 913-918.
Voight BF, Kudaravalli S, Wen X, Pritchard JK: A map of recent positive selection in the human genome. PLoS Biol. 2006, 4: e72-
Wang ET, Kodama G, Baldi P, Moyzis RK: Global landscape of recent inferred Darwinian selection for Homo sapiens. Proc Natl Acad Sci U S A. 2006, 103: 135-140.
Akey JM: Constructing genomic maps of positive selection in humans: where do we go from here?. Genome Res. 2009, 19: 711-722.
Hancock AM, Alkorta-Aranburu G, Witonsky DB, Di Rienzo A: Adaptations to new environments in humans: the role of subtle allele frequency shifts. Philos Trans R Soc Lond B Biol Sci. 2010, 365: 2459-2468.
Gautier M, Naves M: Footprints of selection in the ancestral admixture of a New World Creole cattle breed. Mol Ecol. 2011, 20: 3128-3143.
Bovine HapMap C, Gibbs RA, Taylor JF, Van Tassell CP, Barendse W, Eversole KA, Gill CA, Green RD, Hamernik DL, Kappes SM, et al: Genome-wide survey of SNP variation uncovers the genetic structure of cattle breeds. Science. 2009, 324: 528-532.
Hayes BJ, Chamberlain AJ, MacEachern S, Savin K, McPartlan H, MacLeod I, Sethuraman L, Goddard ME: A genome map of divergent artificial selection between Bos taurus dairy cattle and Bos taurus beef cattle. Anim Genet. 2009, 40: 176-184.
Hayes BJ, Lien S, Nilsen H, Olsen HG, Berg P, MacEachern S, Potter S, Meuwissen TH: The origin of selection signatures on bovine chromosome 6. Anim Genet. 2008, 39: 105-111.
Qanbari S, Gianola D, Hayes B, Schenkel F, Miller S, Moore S, Thaller G, Simianer H: Application of site and haplotype-frequency based approaches for detecting selection signatures in cattle. BMC Genomics. 2011, 12: 318-
Qanbari S, Pimentel EC, Tetens J, Thaller G, Lichtner P, Sharifi AR, Simianer H: A genome-wide scan for signatures of recent selection in Holstein cattle. Anim Genet. 2010, 41: 377-389.
Pritchard JK, Di Rienzo A: Adaptation - not by sweeps alone. Nat Rev Genet. 2010, 11: 665-667.
Pritchard JK, Pickrell JK, Coop G: The genetics of human adaptation: hard sweeps, soft sweeps, and polygenic adaptation. Curr Biol. 2010, 20: R208-R215.
Price AL, Zaitlen NA, Reich D, Patterson N: New approaches to population stratification in genome-wide association studies. Nat Rev Genet. 2010, 11: 459-463.
Kang HM, Sul JH, Service SK, Zaitlen NA, Kong SY, Freimer NB, Sabatti C, Eskin E: Variance component model to account for sample structure in genome-wide association studies. Nat Genet. 2010, 42: 348-354.
McClure MC, Ramey HR, Rolf MM, McKay SD, Decker JE, Chapple RH, Kim JW, Taxis TM, Weaber RL, Schnabel RD, Taylor JF: Genome wide association analysis for quantitative trait loci influencing Warner Bratzler shear force in five taurine cattle breeds. Anim Genet. 2012, 10.1111/j.1365-2052.2012.02323.x.
Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, Nyholt DR, Madden PA, Heath AC, Martin NG, Montgomery GW, Goddard ME, Visscher PM: Common SNPs explain a large proportion of the heritability for human height. Nat Genet. 2010, 42: 565-569.
Garrick DJ, Taylor JF, Fernando RL: Deregressing estimated breeding values and weighting information for genomic regression analyses. Genet Sel Evol. 2009, 41: 55-
VanRaden PM: Efficient methods to compute genomic predictions. J Dairy Sci. 2008, 91: 4414-4423.
Saatchi M, McClure MC, McKay SD, Rolf MM, Kim JW, Decker JE, Taxis TM, Chapple RH, Ramey HR, Northcutt SL, Bauck S, Woodward B, Dekkers JCM, Fernando RL, Schnabel RD, Garrick DJ, Taylor JF: Accuracies of genomic breeding values in American Angus beef cattle using K-means clustering for cross-validation. Genet Sel Evol. 2011, 43: 40-
Zimin AV, Delcher AL, Florea L, Kelley DR, Schatz MC, Puiu D, Hanrahan F, Pertea G, Van Tassell CP, Sonstegard TS, et al: A whole-genome assembly of the domestic cow. Bos taurus. Genome Biol. 2009, 10: R42-
Storey JD, Tibshirani R: Statistical significance for genomewide studies. Proc Natl Acad Sci U S A. 2003, 100: 9440-9445.
Fernando RL, Garrick DJ: GenSel - User manual for a portfolio of genomic selection related analyses. 2012, http://taurus.ansci.iastate.edu Accessed Jan 31
Habier D, Fernando RL, Kizilkaya K, Garrick DJ: Extension of the Bayesian alphabet for genomic selection. BMC Bioinforma. 2011, 12: 186-
Wye Angus History.http://wyeangus.umd.edu/History.cfm,
American Angus Association: Genetic Trends.http://www.angus.org/Nce/GeneticTrends.aspx,
Hough B: A History of Cattle Selection. Western Livestock Journal. 2012,http://npaper-wehaa.com/western-livestock-journal/2012/01/11/#?article=1493763&z=74,
Clark SA, Hickey JM, van der Werf JHJ: Different models of genetic variation and their effect on genomic evaluation. Genet Sel Evol. 2011, 43: 18-
Taylor JF, McKay SD, Rolf MM, Ramey HR, Decker JE, Schnabel RD: Genomic Selection in Beef Cattle. Bovine Genomics. Edited by: Womack JE. 2012, John Wiley & Sons, Inc, 218-
Powell JE, Visscher PM, Goddard ME: Reconciling the analysis of IBD and IBS in complex trait studies. Nat Rev Genet. 2010, 11: 800-805.
Quaas RL, Anderson RD, Gilmour AR: Use of mixed models for prediction and for estimation of (co)variance components. BLUP school handbook. Animal Genetics and Breeding Unit. 1984, NSW, Australia: Univ. of New England
Baker ML, Blunn CT, Plum M: “Dwarfism” in Aberdeen-Angus cattle. J Hered. 1951, 42: 141-144.
Bureš D, Bartoň L, Zahrádková R, Teslík V, Fiedlerová M: Calving difficulty as related to body weights and measurements of cows and calves in a herd of Gascon breed. Czech J Anim Sci. 2008, 53: 187-194.
Nugent RA, Notter DR, Beal WE: Body measurements of newborn calves and relationship of calf shape to sire breeding values for birth weight and calving ease. J Anim Sci. 1991, 69: 2413-2421.
Wall E, Brotherstone S, Kearney JF, Woolliams JA, Coffey MP: Impact of nonadditive genetic effects in the estimation of breeding values for fertility and correlated traits. J Dairy Sci. 2005, 88: 376-385.
10 Quality Specifications.http://www.certifiedangusbeef.com/brand/specs.php,
Stavrinides J, McCann HC, Guttman DS: Host–pathogen interplay and the evolution of bacterial effectors. Cell Microbiol. 2008, 10: 285-292.
Walker AM, Roberts RM: Characterization of the bovine type I IFN locus: rearrangements, expansions, and novel subfamilies. BMC Genomics. 2009, 10: 187-
Killian GJ, Chapman DA, Rogowski LA: Fertility-associated proteins in Holstein bull seminal plasma. Biol Reprod. 1993, 49: 1202-1207.
Kristensen TN, Pedersen KS, Vermeulen CJ, Loeschcke V: Research on inbreeding in the 'omic' era. Trends Ecol Evol. 2010, 25: 44-52.
Ayroles JF, Hughes KA, Rowe KC, Reedy MM, Rodriguez-Zas SL, Drnevich JM, Caceres CE, Paige KN: A genomewide assessment of inbreeding depression: gene number, function, and mode of action. Conserv Biol. 2009, 23: 920-930.
Pedersen KS, Kristensen TN, Loeschcke V: Effects of inbreeding and rate of inbreeding in Drosophila melanogaster- Hsp70 expression and fitness. J Evol Biol. 2005, 18: 756-762.
Sorensen JG, Kristensen TN, Loeschcke V: The evolutionary and ecological role of heat shock proteins. Ecol Lett. 2003, 6: 1025-1037.
Cheng P, Liu X, Zhang G, Deng Y: Heat-shock protein70 gene expression in four hatchery Pacific Abalone Haliotis discus hannai Ino populations using for marker-assisted selection. Aquacul Res. 2006, 37: 1290-1296.
Johansson AM, Pettersson ME, Siegel PB, Carlborg Ö: Genome-Wide Effects of Long-Term Divergent Selection. PLoS Genet. 2010, 6: e1001188-
Eggen A: The development and application of genomic selection as a new breeding paradigm. Anim Front. 2012, 2: 10-15.
Sambrook J, Fritsch EF, Maniatis T: Molecular cloning: a laboratory manual. 1989, Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory Press
Matukumalli LK, Lawley CT, Schnabel RD, Taylor JF, Allan MF, Heaton MP, O’Connell J, Moore SS, Smith TPL, Sonstegard TS, Van Tassell CP: Development and characterization of a high density SNP genotyping assay for cattle. PLoS One. 2009, 4: e5350-
Scheet P, Stephens M: A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase. Am J Hum Genet. 2006, 78: 629-644.
R Development Core Team: R: A language and environment for statistical computing. 2011, Vienna: R Foundation for Statistical Computing,http://www.R-project.org,
Patterson N, Price AL, Reich D: Population structure and eigenanalysis. PLoS Genet. 2006, 2: e190-
Falconer DS, Mackay TFC: Introduction to Quantitative Genetics. 1996, Essex, UK: Longmans Green, Harlow, 4
Purcell S: PLINK version 1.07.http://pngu.mgh.harvard.edu/purcell/plink,
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ, Sham PC: PLINK: a toolset for whole-genome association and population-based linkage analysis. Am J Hum Genet. 2007, 81: 559-575.
Balding DJ, Nichols RA: A method for quantifying differentiation between populations at multi-allelic loci and its implications for investigating identity and paternity. Genetica. 1995, 96: 3-12.
Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Statist Soc Ser B. 1995, 57: 289-300.
Aulchenko YS, Ripke S, Isaacs A, van Duijn CM: GenABEL: an R library for genome-wide association analysis. Bioinformatics. 2007, 23: 1294-1296.
Turner S: Annotated Manhattan plots and QQ plots for GWAS using R, Revisited. 2011, Available: http://gettinggeneticsdone.blogspot.com/2011/04/annotated-manhattan-plots-and-qq-plots.html
McKay SD, Schnabel RD, Murdoch BM, Matukumalli LK, Aerts J, Coppieters W, Crews D, Dias Neto E, Gill CA, Gao C, Mannen H, Stothard P, Wang Z, Van Tassell CP, Williams JL, Taylor JF, Moore SS: Whole genome linkage disequilibrium maps in cattle. BMC Genet. 2007, 8: 74-
da Huang W, Sherman BT, Lempicki RA: Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc. 2009, 4: 44-57.
da Huang W, Sherman BT, Lempicki RA: Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 2009, 37: 1-13.
We gratefully acknowledge the provision of semen samples from Angus breeders and semen distributors. This project was supported by the University of Missouri, National Research Initiative grants number 2008-35205-04687 and 2008-35205-18864 from the USDA Cooperative State Research, Education and Extension Service and National Research Initiative grant number 2009-65205-05635 from the USDA National Institute of Food and Agriculture. Mention of trade names or commercial products in this article is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the U.S. Department of Agriculture.
The authors declare that they have no competing interests.
JFT and JED created the methodology and designed the experiment. DAV, JED and JFT analyzed data. SDM, MCM, MMR, JWK, and RDS extracted DNA. MMR and SDM prepared samples for genotyping, SDM ran the Illumina assay, and RDS genotyped samples and managed the genotype database. SLN provided pedigree and estimated genetic merit data and SB and BWW provided genotypes on about 900 Angus animals. JED and JFT wrote the manuscript and other authors provided feedback. All authors read and approved the final manuscript.