Strategies and utility of imputed SNP genotypes for genomic analysis in dairy cattle
© Khatkar et al.; licensee BioMed Central Ltd. 2012
Received: 5 April 2012
Accepted: 6 October 2012
Published: 8 October 2012
We investigated strategies and factors affecting accuracy of imputing genotypes from lower-density SNP panels (Illumina 3K, 7K, Affymetrix 15K and 25K, and evenly spaced subsets) up to one medium (Illumina 50K) and one high-density (Illumina 800K) SNP panel. We also evaluated the utility of imputed genotypes on the accuracy of genomic selection using Australian Holstein-Friesian cattle data from 2727 and 845 animals genotyped with 50K and 800K SNP chip, respectively. Animals were divided into reference and test sets (genotyped with higher and lower density SNP panels, respectively) for evaluating the accuracies of imputation. For the accuracy of genomic selection, a comparison of direct genetic values (DGV) was made by dividing the data into training and validation sets under a range of imputation scenarios.
Of the three methods compared for imputation, IMPUTE2 outperformed Beagle and fastPhase for almost all scenarios. Higher SNP densities in the test animals, larger reference sets and higher relatedness between test and reference animals increased the accuracy of imputation. 50K specific genotypes were imputed with moderate allelic error rates from 15K (2.85%) and 25K (2.75%) genotypes. Using IMPUTE2, SNP genotypes up to 800K were imputed with low allelic error rate (0.79% genome-wide) from 50K genotypes, and with moderate error rate from 3K (4.78%) and 7K (2.00%) genotypes. The error rate of imputing up to 800K from 3K or 7K was further reduced when an additional middle tier of 50K genotypes was incorporated in a 3-tiered framework. Accuracies of DGV for five production traits using imputed 50K genotypes were close to those obtained with the actual 50K genotypes and higher compared to using 3K or 7K genotypes. The loss in accuracy of DGV was small when most of the training animals also had imputed (50K) genotypes. Additional gains in DGV accuracies were small when SNP densities increased from 50K to imputed 800K.
Population-based genotype imputation can be used to predict and combine genotypes from different low, medium and high-density SNP chips with a high level of accuracy. Imputing genotypes from low-density SNP panels to at least 50K SNP density increases the accuracy of genomic selection.
Keywords: Imputation, 800K, High-density SNP, Dairy cattle, Genomic selection
Innovations in genomic technologies provide new tools for enhancing productivity and wellbeing of domestic animals. Genomic selection, where genetic merit is predicted from genome-wide single nucleotide polymorphism (SNP) genotypes[1, 2], is used in the dairy industries in a number of countries[3, 4]. The rapid uptake of this technology has been driven by both the availability of commercial high-density SNP chips, and increased genetic gain over traditional progeny testing largely as a consequence of reduced generation interval and increased accuracy of selection at a younger age[5–7].
A number of SNP chips from Illumina (http://www.illumina.com) and Affymetrix (http://www.affymetrix.com) are available for cattle. These include 3K, 7K, 15K, 25K, 50K and more recently 800K from Illumina, and 650K and 3 million SNP panels from Affymetrix. In addition, next generation sequencing technologies for low-cost sequencing of whole genomes are now available. Use of genotypic data from high-density SNPs can potentially increase the accuracy of genomic selection, but it also increases the total cost of genotyping or sequencing. As new higher density chips are developed, re-genotyping previously genotyped samples, or genotyping new samples with the new chips or by whole genome sequencing, is expensive. For some applications, such as selection of heifers to be retained in the dairy herd or selection in beef production systems, low-density SNP panels (e.g. 3-7K) may be the only cost effective option. If low-cost genotyping proves useful, very large numbers of animals could be genotyped on a routine basis.
The accuracy of genomic predictions based on different subsets of low-density SNP panels up to 50K has been compared in a number of studies[15–18]. A common finding is that the accuracy of genomic prediction for young animals increases as the number of markers increases from a few hundred up to all SNPs on the 50K SNP chip. There are several possible strategies for selecting loci for low-density panels. However, instead of using lower density SNPs directly in genomic prediction, a promising approach is to genotype a small proportion of the population with a high-density SNP panel and then employ genotype imputation methods to predict high-density genotypes for the rest of the population genotyped with a lower density SNP panel (e.g. [8, 9]). Genotype imputation is defined as the prediction of genotypes at SNP locations in a sample of individuals for which assays are not directly available. These in silico genotypes obtained by imputation, albeit with some uncertainty, can then be used in genome-wide association and genomic selection analyses (e.g. [19, 20]). Such strategies are likely to result in more accurate predictions of genomic breeding values, improved ability to resolve or fine-map QTL or QTN, and integration and meta-analysis across large datasets with heterogeneous SNP information.
A number of imputation software programs (fastPHASE, MACH, IMPUTE, Beagle, PLINK, DualPhase) have been used to infer missing or untyped genotypes based on known information derived from flanking markers. A number of studies on imputing genotypes in dairy cattle have been published, using 50K data[21, 28–33] and more recently high-density SNP panels[34–36]. These report accuracies of imputation from lower-density SNP panels to 50K and up to high-density SNP panels and examine different methods of imputation, often using a limited number of scenarios and strategies for constructing test and reference panels. Direct comparisons across such studies are thus often difficult, and the various factors affecting the accuracy of imputation require further systematic investigation. The accuracy of imputation can be improved by increasing the size of the reference population. For some resource populations, animals genotyped with different SNP panels are available. Such genotype resources can be better utilised by imputing in a tiered framework, utilising multiple reference panels, which might result in improved accuracy of imputation in the study samples.
The objectives of this study were to evaluate the accuracy of imputation using three different population-based methods of imputation, different sizes of reference and test panels, different imputation strategies and different SNP array platforms, to assess the effect of the relationship between reference and test animals and, finally, to examine the effect of using imputed genotypes on the accuracy of genomic selection.
Description of different SNP chips and SNP subset panels
Label used for SNP panel in this study
Number of SNPs on chip
Filtered SNPs used in this study
205 SNPs from BTA20
328 SNPs from BTA20
Illumina BovineSNP50 BeadChip
Illumina BovineSNP50 BeadChip
Evenly spaced Subset of 50K
Illumina BovineSNP50 BeadChip
Evenly spaced Subset of 50K
Illumina BovineSNP50 BeadChip
Evenly spaced Subset of 50K
Illumina BovineSNP50 BeadChip
Evenly spaced Subset of 50K
Illumina BovineSNP50 BeadChip
Evenly spaced Subset of 50K
Illumina BovineLD BeadChip
Illumina Bovine3K BeadChip
Illumina 800K BovineHD beadChip
Illumina 800K BovineHD beadChip
Imputed best guess genotypes
Illumina 800K BovineHD beadChip
Imputed dosage for B-allele
Illumina BovineSNP50 BeadChip
Common SNP between 800K and 50K chip
Of the 2,205 bulls with 50K genotypic information, 1,419 were previously genotyped for 15K, and 431 for 25K (http://www.affymetrix.com). These datasets were used to test the accuracies of imputing SNP genotypes between different chips. The animals in all these datasets are related in a complex pedigree structure. The distributions of relatedness in the form of boxplots of pedigree kinship among animals in different datasets are given in Additional file 1.
Composition of reference and test sets for evaluating imputation accuracy up to 50K
training set bulls
test set young bulls
To examine the effect of pedigree relatedness between test and reference animals on the accuracy of imputation, test animals with and without their sire in the reference set were compared. In addition, the highest value of pedigree kinship between each test animal and the reference animals was computed. The test animals were classified into four interval categories with respect to their highest pedigree kinship, viz. 0.0-0.01, 0.01-0.1, 0.1-0.2 and 0.2-0.4. The accuracy of imputation of the test animals in these four categories was compared using IMPUTE2.
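The kinship classification described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the dictionary inputs and the choice to treat each interval as open on the left and closed on the right are all assumptions.

```python
from collections import defaultdict

# Interval edges are taken from the text; treating each bin as (low, high]
# is an assumption made for this sketch.
KINSHIP_BINS = [(0.0, 0.01), (0.01, 0.1), (0.1, 0.2), (0.2, 0.4)]


def kinship_category(max_kinship):
    """Return the label of the interval containing max_kinship, or None."""
    for i, (low, high) in enumerate(KINSHIP_BINS):
        # The first bin is also closed on the left so a kinship of 0 is included.
        if (low < max_kinship <= high) or (i == 0 and max_kinship == low):
            return f"{low}-{high}"
    return None


def mean_error_by_category(max_kinship, error_rate):
    """Average allelic error rate (%) per kinship category.

    max_kinship: dict mapping test-animal ID -> highest kinship with any
                 reference animal (hypothetical input)
    error_rate:  dict mapping test-animal ID -> allelic error rate in %
    """
    grouped = defaultdict(list)
    for animal, kinship in max_kinship.items():
        grouped[kinship_category(kinship)].append(error_rate[animal])
    return {cat: sum(vals) / len(vals) for cat, vals in grouped.items()}
```

Calling `mean_error_by_category` with per-animal kinship and error-rate dictionaries returns one mean error rate per interval, which is the quantity compared across the four categories.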
Generating low-density SNP panels
To mimic various low-density SNP panels, different subsets of the 50K SNPs were selected for the test sets. SNP densities equivalent to 3000, 5000, 10000, 20000 and 35000 evenly spaced autosomal SNPs were generated by iteratively thinning the SNPs based on spacing and MAF (Table 1). In each iteration, the SNP pair with the smallest interval was identified and the SNP with the lower MAF was removed from the pair. A total of 1,324 SNPs on chromosome 20 from the 50K panel were used for the initial analyses to compare the imputation programs for different scenarios. The best method of imputation identified was then used for analysing all the autosomal SNPs from the Illumina Bovine3K and Illumina BovineLD 7K BeadChip (Illumina Inc., San Diego, CA) to assess the comparative utility of imputed genotypes from these commercial panels up to 50K for genomic prediction.
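The thinning procedure can be sketched as below for a single chromosome. This is an illustrative sketch under stated assumptions rather than the code used in the study: the function name is hypothetical, positions are assumed to be sorted, and the tie-breaking rule (dropping the right-hand SNP when MAFs are equal) is an arbitrary choice not specified in the text.

```python
def thin_snps(positions, mafs, target):
    """Thin SNPs on one chromosome down to `target` markers.

    positions: base-pair positions, sorted in ascending order
    mafs:      minor allele frequency of each SNP, in the same order
    Returns the indices (into the input lists) of the retained SNPs.
    """
    keep = list(range(len(positions)))
    while len(keep) > target and len(keep) > 1:
        # Locate the adjacent pair of retained SNPs with the smallest interval.
        j = min(
            range(len(keep) - 1),
            key=lambda i: positions[keep[i + 1]] - positions[keep[i]],
        )
        left, right = keep[j], keep[j + 1]
        # Remove the member of the pair with the lower MAF
        # (on a tie the right-hand SNP is removed: an assumption).
        keep.remove(left if mafs[left] < mafs[right] else right)
    return keep
```

This simple version rescans all intervals on every iteration, which is quadratic in the number of SNPs; that is adequate for a chromosome of the 50K panel but a priority queue would be preferable for denser panels.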
Most of the SNPs on the 50K chip are present on the 800K chip. For the scenarios using the 800K panel, the lower density SNP panels for the test set consisted of common SNPs between 800K and 50K as well as between 800K and Illumina Bovine3K and Illumina BovineLD 7K, respectively (Table 1).
Population based imputation methods rely on linkage disequilibrium relationship between SNPs, and essentially consist of two steps viz. inference of haplotypes and imputing untyped genotypes in the test set using information from the best fit haplotypes derived from the reference panel. We compared three commonly used population-based programs for imputing missing genotypes which don’t rely on pedigree information viz. IMPUTE2, fastPhase and Beagle.
We used IMPUTE2 version 2.1.2 in this study, which implements a Hidden Markov Model (HMM). The algorithm, described in detail in the original IMPUTE2 publication, involves estimating haplotypes using all the SNPs in the reference set and then imputing the alleles at untyped SNPs in the test set based on the best fit haplotypes estimated from the reference. IMPUTE2 requires the effective population size to be specified as an input parameter. This was set to 100, which is within the range of the effective population size reported for Holstein-Friesian dairy cattle[39, 40].
We used fastPHASE version 1.2.3. fastPhase uses a haplotype clustering algorithm which is based on the observation that haplotypes in a population tend to cluster into groups of closely related or similar haplotypes over a short region. fastPhase requires the number of clusters K as input, which was set to 20 in this study.
Beagle version 3.3 is also based on a local haplotype-clustering model, similar to fastPHASE, but allows for a variable number of clusters across a region. Beagle uses a localized haplotype cluster model to cluster haplotypes at each marker and then defines an HMM to find the most likely haplotype pairs based on the individual’s known genotypes. The most likely genotype at untyped loci is generated from the defined haplotype pairs. We used the option where reference and test panels are defined separately. Imputation was performed for each chromosome separately for all three methods. Apart from the parameters mentioned above, the programs were run with default settings.
Accuracy of imputation
All three imputation methods provide the probability of the three possible genotypes at each missing genotype. We used the most likely genotype as the predicted genotype. For an incorrectly imputed genotype, either one or both alleles may be imputed incorrectly. To distinguish between these two cases, we computed the accuracy of imputing alleles as the percentage of correctly predicted alleles, and the allelic error rate of imputation as the percentage of incorrectly predicted alleles, i.e. mean allelic error rate (%) = (number of incorrectly predicted alleles / total number of alleles imputed in the test set) × 100. In general, allelic error rates are just slightly more than half of genotypic error rates. Accuracy of imputation was also computed as the percentage of correctly predicted genotypes for the masked genotypes.
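The two accuracy measures can be illustrated with a short sketch. It assumes that genotypes are coded as the number of copies of the B allele (0, 1 or 2), so that the absolute difference between the true and imputed code counts the incorrectly predicted alleles; the function name and this coding are assumptions made for illustration.

```python
def imputation_accuracy(true_genotypes, imputed_genotypes):
    """Return (allelic error rate %, genotype accuracy %) for masked genotypes.

    Both inputs are equal-length sequences of genotypes coded 0, 1 or 2
    (copies of the B allele), an assumed coding for this sketch.
    """
    n = len(true_genotypes)
    if n == 0 or n != len(imputed_genotypes):
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(true_genotypes, imputed_genotypes))
    # |true - imputed| is 0, 1 or 2: the number of wrongly imputed alleles.
    wrong_alleles = sum(abs(t - g) for t, g in pairs)
    correct_genotypes = sum(1 for t, g in pairs if t == g)
    allelic_error = 100.0 * wrong_alleles / (2 * n)  # two alleles per genotype
    genotype_accuracy = 100.0 * correct_genotypes / n
    return allelic_error, genotype_accuracy
```

When every wrong genotype has exactly one wrong allele, the allelic error rate is exactly half the genotypic error rate; genotypes with both alleles wrong push it slightly above half, in line with the observation above.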
800K imputed dataset for genomic prediction
The data on 2,205 bulls genotyped with 50K were imputed, with IMPUTE2, up to 800K using 845 heifers genotyped with 800K as reference and using the most likely genotype as the predicted genotype (‘800K-imputed’, Table 1). In addition, the dosage (copies) of the B allele for each genotype was computed as p_AB + 2 × p_BB, where p_AB and p_BB are the imputed probabilities of the AB and BB genotypes, respectively. This measure takes into account the uncertainty of imputation and is an appropriate measure when using an additive model in genomic prediction and genome-wide association studies. These two datasets of 2,205 bulls with imputed genotypes (‘800K-imputed’) and imputed dosage (‘800K-dosage’) for 610,879 autosomal SNPs were used to compute genomic prediction.
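The difference between the best guess genotype and the B-allele dosage can be shown with a minimal sketch. The function names are hypothetical, and the three probabilities are assumed to be supplied in the order AA, AB, BB.

```python
def b_allele_dosage(p_aa, p_ab, p_bb):
    """Expected number of B alleles: p_AB + 2 * p_BB (p_aa is not needed)."""
    return p_ab + 2.0 * p_bb


def best_guess_genotype(p_aa, p_ab, p_bb):
    """Most likely genotype, coded as the number of B alleles (0, 1 or 2)."""
    probs = (p_aa, p_ab, p_bb)
    return probs.index(max(probs))
```

For imputed probabilities (0.0, 0.25, 0.75) the best guess genotype is 2, whereas the dosage is 1.75, so the dosage retains the residual uncertainty that the best guess discards.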
Accuracy of genomic prediction
where y is a vector of twice the daughter trait deviations (DTD) of bulls, 1 is a column vector of ones of size N_Anim, μ is the general mean, ĝ is a vector of the estimated SNP effects, X is an N_Anim × N_SNP matrix of SNP genotypes coded as 0 (homozygote), 1 (heterozygote) or 2 (other homozygote), or as SNP allele dosage, I is an identity matrix of size N_SNP × N_SNP, and λ is a shrinkage parameter derived by cross-validation. R is a diagonal matrix with elements R_ii = (1/rel_i) − 1, where rel_i is the reliability of the DTD of the ith bull. DGV were calculated as.
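The displayed equations for this model are not present in the text above. One form consistent with the symbol definitions given, assuming a standard weighted ridge regression estimator (an assumption, not confirmed by the source), would be:

```latex
\hat{\mathbf{g}} = \left(\mathbf{X}'\mathbf{R}^{-1}\mathbf{X} + \lambda\mathbf{I}\right)^{-1}
                   \mathbf{X}'\mathbf{R}^{-1}\left(\mathbf{y} - \mathbf{1}\mu\right),
\qquad
\mathrm{DGV} = \mathbf{X}\hat{\mathbf{g}}
```

Under this reading, bulls with more reliable DTD (smaller R_ii) receive more weight, and λ shrinks all SNP effects towards zero. Whether the general mean is added to the DGV is likewise not recoverable from the text.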
Five traits were analysed, viz. milk yield, fat yield, protein yield, survival and daughter fertility, which reflect a range of heritabilities (i.e. 0.25, 0.25, 0.25, 0.04 and 0.04, respectively). Phenotype information was provided by the Australian Dairy Herd Improvement Scheme (ADHIS, http://www.adhis.com.au). The phenotypes used were daughter trait deviations (DTD) for the bulls. The accuracies of DGV prediction using subsets of SNP genotypes and imputed SNP genotypes were compared to the accuracy of DGV prediction obtained with all the 50K SNP genotypes. The accuracy of DGV prediction was computed as Pearson’s correlation coefficient between the DGV and DTD of the young bulls in the test data.
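The accuracy measure is a plain Pearson correlation, which can be computed as in the following sketch. The function name and the list inputs are illustrative assumptions; any statistics package would give the same value.

```python
from math import sqrt


def pearson_r(dgv, dtd):
    """Pearson correlation between predicted DGV and observed DTD."""
    n = len(dgv)
    if n < 2 or n != len(dtd):
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(dgv) / n
    mean_y = sum(dtd) / n
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dgv, dtd))
    sxx = sum((x - mean_x) ** 2 for x in dgv)
    syy = sum((y - mean_y) ** 2 for y in dtd)
    return sxy / sqrt(sxx * syy)
```

The same function applies unchanged whether the DGV were derived from actual genotypes, best guess imputed genotypes or allele dosages, which is what makes the accuracies of the different scenarios directly comparable.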
Imputation up to 50K
Comparison of imputation methods
Effect of SNP density
The accuracy of imputation increases with the number of SNPs in the test set (Figure 2, Additional file 2) for all the scenarios and methods examined here. The mean allelic error rate decreases from 2.80% for the evenly spaced 3K SNP panel to 0.76% for the 35K panel in the scenario where 50% of the animals are in the reference set (Figure 2a). The mean allelic error rate of imputation is lower for the evenly spaced 3K SNP panel (2.80%) than for the Bovine3K panel (3.34%). There is a large reduction in the mean allelic error rate of imputation when using the 5K evenly spaced SNP panel (1.97%) in the test set (Additional file 2). Further reductions in the error rate of imputation from increasing the SNP density in the test set to 10K (1.36%), 20K (1.00%) and 35K (0.76%) are relatively smaller (Figure 2a).
Effect of size of reference panel
The mean allelic error rate increases as the number of animals in the reference set decreases (Figure 2, Additional file 2). The lowest allelic error rate is obtained when 1,363 (50%) animals are in the reference set and the rest in the test set. The mean allelic error rate ranges from 0.76% for the 35K SNP panel to 2.80% for the evenly spaced 3K SNP panel using IMPUTE2. The mean imputation error rate for the cows using the bulls as reference ranges from 1.21 to 4.65%, and for the bulls using the cows as reference from 0.73 to 3.47%, for different SNP densities using IMPUTE2 (Additional file 2).
Effect of relatedness between test and reference animals
The mean allelic error rates for the test animals with and without their sire in the reference set for all 42 scenarios using IMPUTE2 are given in Additional file 2. In general, test animals with their sire in the reference set have a slightly lower allelic error rate of imputation (2.61% with sire vs. 3.34% without sire, averaged across all the scenarios). We further compared the error rate with kinship estimates of the test animals with the reference animals. The results for the 42 scenarios presented in Additional file 3 show that, in general, the mean allelic error rate decreases as the highest kinship of the test animals with the reference animals increases. This is more pronounced when the SNP panels in the test set are small and also when the reference set is small.
Imputation between SNP chips
Mean allelic error rate of imputing SNP genotypes between different SNP chips obtained with IMPUTE2
Animals masked (%)
N animals total
N animal reference
N animals test
Mean allelic error rate (%)
15K by 50K
50K by 15K
25K by 50K
50K by 25K
Similarly, the mean allelic error rates of imputing 25K-specific (328 SNPs) genotypes are 1.50%, 1.85% and 2.75% when 25%, 50% and 75% of the animals, respectively, are in the test set. The respective mean allelic error rates of imputing 50K-specific (1324 SNPs) genotypes are 2.75%, 2.75% and 4.55%. The error rates in these scenarios are slightly higher than in the above mentioned corresponding scenarios including 15K, possibly due to a lower number of animals in the reference and test sets. Overall, the results indicate that a reasonable accuracy of imputation for untyped SNP genotypes can be achieved when combining datasets genotyped with these SNP chips.
Comparison of methods for imputation up to 800K
Comparison of 2-tiered and 3-tiered approaches for imputation up to 800K
We further tested the accuracy of imputation using a smaller number of animals in the top tier. The mean allelic error rates for all scenarios are much higher when a small number of animals (41 animals, 5% of 825 cows) is included in the top tier (Figure 4b). The mean allelic error rates for the 2-tiered approach range from 5.55% using 49K to 14.43% using the Bovine3K panel in the test set. However, there is a larger decrease in the error rates of imputation using the Bovine3K (14.43% to 9.58%), BovineLD 7K (10.01% to 6.03%) and 49K (5.55% to 3.41%) panels when a middle tier of 2205 bulls with 50K genotypes is included and the top reference tier is small.
To further test the potential of using 800K for imputing even higher density genotypes (e.g. up to 3 million SNPs or whole genome sequence), we tested the accuracy of imputing every 10th SNP and every 100th SNP by masking these SNP genotypes in 50% of the 825 cows genotyped with 800K, using BTA20 as an example. The imputation accuracies for masked genotypes were 99.78% and 99.80% for every 10th and 100th SNP, respectively. However, such a large number of animals genotyped with very high-density SNP arrays or whole genome sequence may not be available in the immediate future. We also tested a scenario in which a smaller reference set (41 animals) was used, and the accuracies of imputed genotypes were 98.00% and 98.44% for imputing every 10th and 100th SNP, respectively, suggesting that ultra high-density and whole genome sequence genotypes may also be imputed with a very high level of accuracy from a commercial high-density SNP array.
Accuracy of DGV prediction based on actual and imputed genotypes using 50K dataset
Accuracy of prediction of direct genomic value (DGV) for 5 dairy traits based on Bovine3K, BovineLD 7K, 50K, imputed up to 50K, imputed up to 800K and imputed 800K-dosage
Mean allelic error rate (%) of imputation
Subset Bovine LD 7K
50K-imputed (Test imputedAA using Bovine3K)
50K-imputed (Test imputedA with BovineLD)
50K-imputed (Train & Test imputedB using Bovine3K)
50K-imputed (Train & Test imputedB using BovineLD)
Accuracy of DGV prediction based on 800K imputed data
Table 4 further presents the results on accuracies of DGV prediction using imputed genotypes up to 800K. The accuracies of DGV prediction using the most likely genotype (800K-imputed) and allele dosage (800K-dosage) are quite similar, viz. 0.558 and 0.554 for milk yield, 0.530 and 0.525 for protein yield, 0.526 and 0.520 for fat yield, 0.232 and 0.229 for survival and 0.256 and 0.253 for daughter fertility, respectively. Overall there is only a small improvement in DGV prediction using the imputed 800K genotypes over the actual 50K genotypes.
With the rapid development of higher density SNP chips for cattle, it is now common to have population samples genotyped with different SNP chips. We have presented different strategies for utilising such heterogeneous SNP datasets efficiently. We compared accuracies of imputation within and across SNP chips and the accuracy of genomic prediction using imputed genotypes.
IMPUTE2 gave higher accuracies of imputation than Beagle and fastPhase. fastPhase may provide comparable accuracy when the reference panel is small and the SNP density in the test set is high. However, fastPhase required more computing time than Beagle and IMPUTE2. For example, for scenario 1 (Additional file 2), using a Linux machine with an AMD Opteron Processor 6136, IMPUTE2, Beagle and fastPhase took 2.36, 6.19 and 20.7 hours of computing time and used 100 MB, 807 MB and 112 MB of RAM, respectively. Computation time on a multiprocessor machine can be reduced by dividing the chromosome into smaller segments. However, using IMPUTE2, we observed that accuracy was slightly higher when the whole chromosome was imputed in a single run (not shown). This may be due to the extended linkage disequilibrium present in the bovine genome, which allows for better definition of long-range haplotypes when the whole chromosome is used.
Our estimate of the mean allelic error of imputing up to 50K from the evenly spaced 3K panel (2.8%) was lower than that for Bovine3K (3.3%), which may be because of the higher number of SNPs with higher MAF in the evenly spaced 3K SNP panel. These estimates are comparable to the range of 2.1 to 5.5% reported by Dassonneville et al. for Bovine3K and 3 to 4% obtained by Zhang et al. for 3000 evenly spaced SNPs using DAGPHASE. We found an increase in the accuracy of imputation with an increase in the number of animals in the reference set. However, we tested only up to 1,363 animals in the reference set. Larger reference sets might further improve the accuracy of imputation.
We showed that 800K genotypes could be imputed with low allelic error using 50K genotypes (0.79% for all autosomes). Most of the SNPs had a low error rate. However, we noted a very small proportion of SNPs with higher imputation error than expected. For example, we found 12 SNPs on BTA20 which had an allelic error rate greater than 5%. We suspect that these SNPs may have incorrect positions on the UMD3.1 assembly or contain errors in the genotype calls themselves. The mean error rates reported throughout this study include all such SNPs. Whether wrong map assignment and genotyping errors of SNPs have a significant effect on the accuracy of the imputation process is not known, but this should be considered in future studies.
We showed that using an additional reference panel genotyped with a medium-density SNP chip in a 3-tiered framework increased the accuracy of imputation, especially when the main reference panel was small. The additional gain in the accuracy of imputation in the 3-tiered approach may be due to better definition of haplotypes with the availability of a large number of samples in the combined reference. Our results suggest that increasing the size of the reference panel by including animals genotyped with different SNP chips in a tiered framework can improve the accuracy of imputation. We used population-based methods for imputation and showed that these use relationship information indirectly. The degree of kinship between animals in the test and reference sets has a significant effect on the accuracy of imputation and as such can be strategically optimised in selecting animals to be genotyped if pedigree information is available. A number of other programs have been used for imputation[43–45] which use pedigree information directly along with haplotype data, and these can be more efficient when the required family information is available. Johnston et al. suggested a blending approach that combines the strengths of the various programs available. Development of multi-tiered imputation strategies that utilise pedigree information seems promising when animals genotyped with heterogeneous SNP panels and up to whole genome sequences are available.
Using imputed genotypes up to 50K increased the accuracy of genomic selection compared to just using the smaller SNP subsets used for imputation. Similar observations were made by Johnston et al. and Weigel et al. Therefore, using genotype imputation would increase the return on investment when a larger proportion of the population is genotyped with lower density SNP panels.
By testing the utility of imputed 800K genotypes, i.e. best guess genotypes and dosages of the B-allele, we showed that the accuracy of genomic prediction from imputed 800K genotypes was only marginally better than that from 50K genotypes. Although we cannot compare these accuracies with those from actual 800K genotypes in this study, the mean allelic error rate of imputation up to 800K using 50K in the test samples was very small (0.79%). These error rates were obtained by using 425 cows in the reference set. The results of imputing up to 50K (Figure 2) show that using a larger reference set can improve the accuracy of imputation even further. Moreover, additional analyses within the 50K dataset indicate that small error rates in the imputed genotypes have no notable effect on the accuracy of genomic selection. Hence we believe that the presented accuracies of genomic prediction with imputed 800K genotypes are comparable to those expected from actual 800K genotypes. However, we have only used one method for genomic prediction and it is possible that other methods may utilise higher density genotypes more efficiently. High-density SNP genotypes are likely to be useful for genome-wide association studies and across-study meta-analysis of SNP-trait relationships. Further studies are required to assess the utility of imputed genotypes for discovering and mapping the causal mutations affecting phenotypes in dairy cattle.
IMPUTE2 had the highest accuracy of the three imputation methods examined. Accuracy of imputation increases with the number of SNPs in the test set, the number of samples in the reference set and the presence of closely related animals in the reference set. 800K SNP genotypes can be imputed with very high accuracy from 50K SNP genotypes and with slightly lower accuracy from lower density SNP panels (e.g. 3K, 7K). The accuracy of imputation is improved using a 3-tiered approach, which uses an additional middle tier of 50K, compared to a 2-tiered approach, especially when the top panel of animals genotyped with 800K SNPs is small. There is no appreciable loss in accuracy of genomic prediction using imputed 50K SNP genotypes derived from the commercial 3K or 7K panels compared to using the actual 50K SNP genotypes, and both perform substantially better than the 3K or 7K genotypes. Our results show that imputation from lower density SNP panels is a cost effective strategy for genomic selection. There is only a small gain in the accuracy of genomic prediction when using imputed 800K genotypes compared to actual 50K genotypes.
DGV: Direct genomic values
SNP: Single nucleotide polymorphism
HMM: Hidden Markov Model
MAF: Minor allele frequency
DTD: Daughter trait deviations.
The authors wish to thank Genetics Australia for semen samples and the Australian Dairy Herd Improvement Scheme (ADHIS) for providing phenotype and pedigree data. The study was supported by the Dairy Futures Cooperative Research Centre (CRC). The authors are grateful to Professors Chris Moran and Frank Nicholas for editorial suggestions in review of the manuscript.
- Nejati-Javaremi A, Smith C, Gibson JP: Effect of total allelic relationship on accuracy of evaluation and response to selection. J Anim Sci. 1997, 75 (7): 1738-1745.PubMedGoogle Scholar
- Meuwissen TH, Hayes BJ, Goddard ME: Prediction of total genetic value using genome-wide dense marker maps. Genetics. 2001, 157 (4): 1819-1829.PubMed CentralPubMedGoogle Scholar
- Hayes BJ, Bowman PJ, Chamberlain AJ, Goddard ME: Invited review: Genomic selection in dairy cattle: progress and challenges. J Dairy Sci. 2009, 92 (2): 433-443. 10.3168/jds.2008-1646.View ArticlePubMedGoogle Scholar
- Wiggans GR, Vanraden PM, Cooper TA: The genomic evaluation system in the United States: Past, present, future. J Dairy Sci. 2011, 94 (6): 3202-3211. 10.3168/jds.2010-3866.View ArticlePubMedGoogle Scholar
- Schaeffer LR: Strategy for applying genome-wide selection in dairy cattle. J Anim Breed Genet. 2006, 123 (4): 218-223. 10.1111/j.1439-0388.2006.00595.x.View ArticlePubMedGoogle Scholar
- Pryce JE, Goddard ME, Raadsma HW, Hayes BJ: Deterministic models of breeding scheme designs that incorporate genomic selection. J Dairy Sci. 2010, 93 (11): 5455-5466. 10.3168/jds.2010-3256.View ArticlePubMedGoogle Scholar
- Konig S, Simianer H, Willam A: Economic evaluation of genomic breeding programs. J Dairy Sci. 2009, 92 (1): 382-391. 10.3168/jds.2008-1310.View ArticlePubMedGoogle Scholar
- Wiggans GR, Cooper TA, Vanraden PM, Olson KM, Tooker ME: Use of the Illumina Bovine3K BeadChip in dairy genomic evaluation. J Dairy Sci. 2012, 95 (3): 1552-1558. 10.3168/jds.2011-4985.View ArticlePubMedGoogle Scholar
- Boichard D, Chung H, Dassonneville R, David X, Eggen A, Fritz S, GK S, Hayes BJ, Lawley CT, Sonstegard TS, et al: Design of a Bovine Low-Density SNP Array Optimized for Imputation. PLoS One. 2012, 7 (3): e34130-10.1371/journal.pone.0034130.PubMed CentralView ArticlePubMedGoogle Scholar
- Khatkar MS, Zenger KR, Hobbs M, Hawken RJ, Cavanagh JA, Barris W, McClintock AE, McClintock S, Thomson PC, Tier B, et al: A primary assembly of a bovine haplotype block map based on a 15,036-single-nucleotide polymorphism panel genotyped in Holstein-Friesian cattle. Genetics. 2007, 176 (2): 763-772.PubMed CentralView ArticlePubMedGoogle Scholar
- Raadsma HW, Khatkar MS, Moser G, Hobbs M, Crump RE, Cavanagh JA, Tier B: Genome wide association studies in dairy cattle using high density SNP scans. Proc Assoc Advmt Anim Breed Genet. 2009, 18: 151-154.Google Scholar
- Matukumalli LK, Lawley CT, Schnabel RD, Taylor JF, Allan MF, Heaton MP, O'Connell J, Moore SS, Smith TP, Sonstegard TS, et al: Development and characterization of a high density SNP genotyping assay for cattle. PLoS One. 2009, 4 (4): e5350-10.1371/journal.pone.0005350.PubMed CentralView ArticlePubMedGoogle Scholar
- Metzker ML: Sequencing technologies- the next generation. Nat Rev Genet. 2010, 11 (1): 31-46. 10.1038/nrg2626.View ArticlePubMedGoogle Scholar
- Pryce J, Hayes B: A review of how dairy farmers can use and profit from genomic technologies. Animal Production Science. 2012, 52: 180-184. 10.1071/AN11172.View ArticleGoogle Scholar
- Weigel KA, de los Campos G, Gonzalez-Recio O, Naya H, Wu XL, Long N, Rosa GJ, Gianola D: Predictive ability of direct genomic values for lifetime net merit of Holstein sires using selected subsets of single nucleotide polymorphism markers. J Dairy Sci. 2009, 92 (10): 5248-5257. 10.3168/jds.2009-2092.View ArticlePubMedGoogle Scholar
- Zukowski K, Suchocki T, Gontarek A, Szyda J: The impact of single nucleotide polymorphism selection on prediction of genomewide breeding values. BMC Proc. 2009, 3 Suppl 1: S13.
- Moser G, Khatkar M, Hayes B, Raadsma H: Accuracy of direct genomic values in Holstein bulls and cows using subsets of SNP markers. Genet Sel Evol. 2010, 42 (1): 37. 10.1186/1297-9686-42-37.
- Vazquez AI, Rosa GJ, Weigel KA, de los Campos G, Gianola D, Allison DB: Predictive ability of subsets of single nucleotide polymorphisms with and without parent average in US Holsteins. J Dairy Sci. 2010, 93 (12): 5942-5949. 10.3168/jds.2010-3335.
- Browning BL, Browning SR: Efficient multilocus association testing for whole genome association studies using localized haplotype clustering. Genet Epidemiol. 2007, 31 (5): 365-375. 10.1002/gepi.20216.
- Goddard ME, Hayes BJ: Mapping genes for complex traits in domestic animals and their use in breeding programmes. Nat Rev Genet. 2009, 10 (6): 381-391. 10.1038/nrg2575.
- Zhang Z, Ding X, Liu J, Zhang Q, de Koning DJ: Accuracy of genomic prediction using low-density marker panels. J Dairy Sci. 2011, 94 (7): 3642-3650. 10.3168/jds.2010-3917.
- Holm H, Gudbjartsson DF, Sulem P, Masson G, Helgadottir HT, Zanon C, Magnusson OT, Helgason A, Saemundsdottir J, Gylfason A, et al: A rare variant in MYH6 is associated with high risk of sick sinus syndrome. Nat Genet. 2011, 43 (4): 316-320. 10.1038/ng.781.
- Scheet P, Stephens M: A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase. Am J Hum Genet. 2006, 78 (4): 629-644. 10.1086/502802.
- Willer CJ, Sanna S, Jackson AU, Scuteri A, Bonnycastle LL, Clarke R, Heath SC, Timpson NJ, Najjar SS, Stringham HM, et al: Newly identified loci that influence lipid concentrations and risk of coronary artery disease. Nat Genet. 2008, 40 (2): 161-169. 10.1038/ng.76.
- Howie BN, Donnelly P, Marchini J: A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet. 2009, 5 (6): e1000529. 10.1371/journal.pgen.1000529.
- Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de Bakker PI, Daly MJ, et al: PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007, 81 (3): 559-575. 10.1086/519795.
- Druet T, Georges M: A Hidden Markov Model Combining Linkage and Linkage Disequilibrium Information for Haplotype Reconstruction and QTL Fine Mapping. Genetics. 2010, 184 (3): 789-798. 10.1534/genetics.109.108431.
- Weigel KA, Van Tassell CP, O'Connell JR, VanRaden PM, Wiggans GR: Prediction of unobserved single nucleotide polymorphism genotypes of Jersey cattle using reference panels and population-based imputation algorithms. J Dairy Sci. 2010, 93 (5): 2229-2238. 10.3168/jds.2009-2849.
- Druet T, Schrooten C, de Roos AP: Imputation of genotypes from different single nucleotide polymorphism panels in dairy cattle. J Dairy Sci. 2010, 93 (11): 5443-5454. 10.3168/jds.2010-3255.
- Zhang Z, Druet T: Marker imputation with low-density marker panels in Dutch Holstein cattle. J Dairy Sci. 2010, 93 (11): 5487-5494. 10.3168/jds.2010-3501.
- Vanraden PM, O'Connell JR, Wiggans GR, Weigel KA: Genomic evaluations with many more genotypes. Genet Sel Evol. 2011, 43: 10. 10.1186/1297-9686-43-10.
- Dassonneville R, Brondum RF, Druet T, Fritz S, Guillaume F, Guldbrandtsen B, Lund MS, Ducrocq V, Su G: Effect of imputing markers from a low-density chip on the reliability of genomic breeding values in Holstein populations. J Dairy Sci. 2011, 94 (7): 3679-3686. 10.3168/jds.2011-4299.
- Daetwyler HD, Wiggans GR, Hayes BJ, Woolliams JA, Goddard ME: Imputation of Missing Genotypes from Sparse to High Density Using Long-Range Phasing. Genetics. 2011, 189 (1): 317-327. 10.1534/genetics.111.128082.
- Harris BL, Creagh FE, Winkelman AM, Johnson DL: Experiences with the Illumina High Density Bovine BeadChip. Interbull Bulletin. 2011, 44: 3-7.
- Solberg TR, Heringstad B, Svendsen M, Grove H, Meuwissen THE: Genomic Predictions for Production- and Functional Traits in Norwegian Red from BLUP Analyses of Imputed 54K and 777K SNP Data. Interbull Bulletin. 2011, 44: 240-243.
- Su G, Brøndum RF, Ma P, Guldbrandtsen B, Aamand GP, Lund MS: Genomic prediction using high-density SNP markers in Nordic Holstein and Red. Interbull Bulletin. 2011, 44: 157-161.
- Browning SR, Browning BL: Haplotype phasing: existing methods and new developments. Nat Rev Genet. 2011, 12 (10): 703-714. 10.1038/nrg3054.
- Marchini J, Howie B: Genotype imputation for genome-wide association studies. Nat Rev Genet. 2010, 11 (7): 499-511. 10.1038/nrg2796.
- Hayes BJ, Visscher PM, McPartlan HC, Goddard ME: Novel multilocus measure of linkage disequilibrium to estimate past effective population size. Genome Res. 2003, 13 (4): 635-643. 10.1101/gr.387103.
- Zenger KR, Khatkar MS, Cavanagh JA, Hawken RJ, Raadsma HW: Genome-wide genetic diversity of Holstein Friesian cattle reveals new insights into Australian and global population variability, including impact of selection. Anim Genet. 2007, 38 (1): 7-14. 10.1111/j.1365-2052.2006.01543.x.
- Xu S: Estimating polygenic effects using markers of the entire genome. Genetics. 2003, 163 (2): 789-801.
- Khatkar MS, Nicholas FW, Collins AR, Zenger KR, Cavanagh JA, Barris W, Schnabel RD, Taylor JF, Raadsma HW: Extent of genome-wide linkage disequilibrium in Australian Holstein-Friesian cattle based on a high-density SNP panel. BMC Genomics. 2008, 9 (1): 187. 10.1186/1471-2164-9-187.
- Druet T, Georges M: A hidden markov model combining linkage and linkage disequilibrium information for haplotype reconstruction and quantitative trait locus fine mapping. Genetics. 2010, 184 (3): 789-798. 10.1534/genetics.109.108431.
- Johnston J, Kistemaker G, Sullivan PG: Comparison of Different Imputation Methods. Interbull Bulletin. 2011, 44: 25-33.
- Hickey JM, Kinghorn BP, Tier B, van der Werf JH, Cleveland MA: A phasing and imputation method for pedigreed populations that results in a single-stage genomic evaluation. Genet Sel Evol. 2012, 44 (1): 9. 10.1186/1297-9686-44-9.
- Weigel KA, de Los Campos G, Vazquez AI, Rosa GJ, Gianola D, Van Tassell CP: Accuracy of direct genomic values derived from imputed single nucleotide polymorphism genotypes in Jersey cattle. J Dairy Sci. 2010, 93 (11): 5423-5435. 10.3168/jds.2010-3149.
- Meuwissen T, Goddard M: Accurate prediction of genetic values for complex traits by whole-genome resequencing. Genetics. 2010, 185 (2): 623-631. 10.1534/genetics.110.116590.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.