- Research article
- Open Access
Estimation of linkage disequilibrium in four US pig breeds
BMC Genomics volume 13, Article number: 24 (2012)
The success of marker assisted selection depends on the amount of linkage disequilibrium (LD) across the genome. To implement marker assisted selection in the swine breeding industry, information about the extent and degree of LD is essential. The objective of this study is to estimate LD in four US breeds of pigs (Duroc, Hampshire, Landrace, and Yorkshire) using a 60 k SNP panel and subsequently to calculate persistence of phase among them. In addition, we report LD when using only a fraction of the available markers, to estimate how LD persists over increasing marker distance.
Average r2 between adjacent SNP across all chromosomes was 0.36 for Landrace, 0.39 for Yorkshire, 0.44 for Hampshire and 0.46 for Duroc. For markers 1 Mb apart, r2 ranged from 0.15 for Landrace to 0.20 for Hampshire. When the marker panel was reduced to 10% of its original density, average r2 ranged from 0.20 for Landrace to 0.25 for Duroc. We also estimated persistence of phase as a measure of how reliably markers in one breed predict those in another, and found that markers less than 10 kb apart could be predicted with a maximal accuracy of 0.92 for Landrace with Yorkshire.
Our estimates of LD, although in good agreement with previous reports, are more comprehensive and based on a larger panel of markers. Our estimates also confirmed earlier findings that reported higher LD in pigs than in American Holstein cattle, especially at increasing marker distances (> 1 Mb). The high average LD (r2 > 0.4) between adjacent SNP found in this study is an important prerequisite for the implementation of marker assisted selection within a livestock species.
Results of this study are relevant to the US purebred pig industry and critical for the design of programs for whole-genome marker assisted evaluation and selection. In addition, the results indicate that a more cost-efficient implementation of marker assisted selection, using low density panels with genotype imputation, would be feasible for these breeds.
Linkage disequilibrium (LD), the non-random association of alleles at different loci, has become the focus of many recent studies in both humans and animals [1–4]. Knowledge of the distribution of LD in livestock populations is important for the genetic mapping of economically important traits such as disease resistance , and it can reveal population history and breed development [6, 7]. Moreover, genome-wide association (GWA) studies as well as genomic selection in livestock rely on the existence of LD between causative variants and genetic markers [8, 9]. Recent advances in genotyping technology allow high density genotyping of single nucleotide polymorphisms (SNP) in several livestock species, such as cattle , chickens  and pigs . High density genotypes from a sample of individuals allow genome-wide LD and persistence of phase among breeds to be estimated .
Previous studies have shown that the extent and persistence of LD in livestock [14–16] are much larger than in human populations , owing to selection and the smaller effective population size of livestock species [1, 17]. Using dense markers to cover the genome increases the likelihood that SNP markers are in high LD with causative genes affecting important production phenotypes . Meuwissen et al.  proposed that the merit of these markers in livestock would lie in the parallel use of all markers to derive genomic estimated breeding values (GEBV) as a composite score of all individual SNP effects, rather than in improved mapping of quantitative trait loci (QTL).
The implementation of genomic selection using GEBV has been successful in dairy cattle [8, 20, 21] and is currently being tested in laying chickens  and pigs . The reliability of GEBV prediction depends on the level of LD between markers and QTL, the origin of that LD (within family or population-wide), the number of animals in the training population and the heritability of the trait . Our objective in this study is to estimate and describe genome-wide levels of LD in four pig breeds using high density genotypes. We also estimate population-wide LD for a variety of panels with lower marker density, in order to estimate the number of markers needed to reach a given level of LD. Finally, we estimate persistence of phase between the four breeds as a measure of the relationship between these populations.
Estimation of Linkage Disequilibrium
To estimate LD, we genotyped 351 animals in 117 sire/dam/offspring trios across four breeds of pigs (Duroc, Hampshire, Landrace and Yorkshire) using the Illumina PorcineSNP60 BeadChip . We used BEAGLE  to build haplotypes and estimated pairwise r2 for all SNP on the same chromosome using equation (1). Average r2 between adjacent markers within breed was estimated using equation (2). Average r2 at various distances was computed by grouping all SNP pairs into 100 kb classes of pairwise distance, from 0 to 10 Mb. Figure 1 displays an overview of the decline of r2 over distance in each breed, and Table 1 displays average r2 for adjacent markers and at 0.5, 1 and 5 Mb.

The average r2 between adjacent SNP was largest in Duroc (r2 = 0.46), followed by Hampshire (r2 = 0.44), whereas Yorkshire and Landrace exhibited the smallest average r2 (0.39 and 0.36, respectively; Table 1). Marker pairs with an average distance of 1 Mb had an average r2 of 0.20 for Hampshire, 0.19 for Duroc, 0.16 for Yorkshire and 0.15 for Landrace. For all breeds, at least 54% of adjacent SNP had r2 ≥ 0.2 and 44% had r2 ≥ 0.3. For most chromosomes, average r2 between adjacent SNP was larger in Duroc and Hampshire than in Landrace or Yorkshire.

In addition to estimating average r2 within distance classes, we computed average r2 between adjacent markers for different marker densities. To obtain marker sets of decreasing SNP density, we thinned the current map by retaining only every second, fourth, 10th, 50th, 100th or 200th marker (Table 2). When only 50% of the markers were used, average r2 decreased by between 6% (Yorkshire) and 15% (Hampshire); average r2 remained highest for Duroc (r2 = 0.40), followed by Hampshire (r2 = 0.37) and Yorkshire (r2 = 0.34), and was lowest for Landrace (r2 = 0.30). Using only every 10th marker, average r2 decreased to around 50% of its original value (r2 = 0.20-0.25), and using every 100th marker, average r2 ranged from 0.05 to 0.07 at an average marker distance of 6.5 Mb, comparable to the average r2 found at 5 Mb.
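The thinning procedure described above (keep every k-th marker by map position, then recompute average r2 between the markers that become adjacent) can be sketched in Python. This is a minimal illustration, not the authors' R code: the function names are hypothetical, and `toy_r2` is an invented exponential LD-decay model used only so the example runs without genotype data. In practice r2 would be computed from phased haplotypes with equation (1).

```python
from __future__ import annotations

import math
from typing import Callable, Sequence


def thin_markers(positions: Sequence[int], k: int) -> list[int]:
    """Keep every k-th marker in map order (k=2 keeps 50%, k=10 keeps 10%)."""
    return list(positions[::k])


def mean_adjacent_r2(positions: Sequence[int],
                     r2: Callable[[int, int], float]) -> float:
    """Average r2 over all pairs of markers that are adjacent after thinning."""
    pairs = list(zip(positions[:-1], positions[1:]))
    return sum(r2(a, b) for a, b in pairs) / len(pairs)


def mean_spacing(positions: Sequence[int]) -> float:
    """Average distance in bp between adjacent markers."""
    return (positions[-1] - positions[0]) / (len(positions) - 1)


def toy_r2(pos_a: int, pos_b: int) -> float:
    """Hypothetical LD model for illustration only: r2 decays exponentially with distance."""
    return 0.5 * math.exp(-abs(pos_b - pos_a) / 500_000)


if __name__ == "__main__":
    chrom = list(range(0, 60_000 * 2_000, 60_000))  # 2,000 SNP spaced 60 kb apart
    for k in (1, 2, 4, 10, 50, 100, 200):
        kept = thin_markers(chrom, k)
        print(f"k={k:>3}: {len(kept):>4} SNP, spacing {mean_spacing(kept) / 1e6:.2f} Mb, "
              f"mean adjacent r2 {mean_adjacent_r2(kept, toy_r2):.3f}")
```

Because markers are dropped purely by map position, the average spacing grows linearly with k, and how quickly average adjacent r2 falls is then governed entirely by the LD decay curve. This is why the thinned-panel averages track the distance-class averages.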
Persistence of Phase
Persistence of phase measures the agreement in LD phase for pairs of SNP between two populations. To estimate it, we calculated $r_{ij}$ for all pairs of SNP i and j as the square root of $r_{ij}^2$ from equation (1), taking the sign of the non-squared numerator. If r2 between two markers is equal in two populations but the corresponding r values have opposite signs, the gametic phase is reversed . Persistence of phase over a given genomic distance interval can be estimated as the Pearson correlation coefficient ($R_{k,k'}$) of inter-marker $r_{ij}$ between two populations k and k' (equation 3). For all pairwise comparisons of breeds we estimated $R_{k,k'}$ and the percentage of SNP pairs with reversed sign of r. As for average r2, we grouped SNP pairs into 100 kb classes of inter-marker distance and computed persistence of phase within each class from 0 up to 10 Mb (Figure 2). The Pearson correlation coefficient ranges from -1 to 1: large negative values arise when LD (r2) is high in both breeds but phase is reversed between them, and large positive values arise when r2 is high and phase is the same in both breeds . Correlation of phase between SNP less than 100 kb apart ranged from 0.73 for Duroc with Hampshire and with Yorkshire to 0.82 for Landrace with Yorkshire. For SNP pairs with an average distance of 0.9 to 1 Mb, correlation of phase decreased to 0.41 for Duroc with Hampshire and to 0.57 for Yorkshire with Landrace (Table 3). Persistence of phase decreased with increasing marker distance at a rate comparable to the decrease in average r2 with increasing marker spacing, and the decline was less steep for Landrace-Yorkshire than for the other breed comparisons. 
Applying a z-test with Fisher's z-transformation  to the correlation of phase at < 10 kb, the correlation of phase between Landrace and Yorkshire was significantly larger (p < 0.001, n = 1,520) than for all other breed combinations. Correlations of phase did not differ significantly (p > 0.05, n = 1,520) among the Duroc-Hampshire, Duroc-Landrace, Duroc-Yorkshire, Hampshire-Yorkshire and Hampshire-Landrace pairings (Table 3). For these five comparisons, the average proportion of SNP pairs with r of opposite sign ranged from 9-11% for SNP within 10 kb of each other up to 45-49% for SNP between 4.9 and 5 Mb apart (Table 3). For Landrace-Yorkshire, the proportion of SNP pairs with reversed sign of r was generally lower, ranging from 9% to 45%. These results suggest a closer relationship between the Landrace and Yorkshire populations than among any other pair of populations.
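The Fisher z-test used above to compare correlations of phase can be sketched as follows. This sketch assumes the two correlations come from independent samples of n1 and n2 SNP pairs, which is a simplification (the same SNP pairs enter every breed comparison); the function name is hypothetical, and 0.87 is used only as an illustrative second correlation (the Duroc-Yorkshire value at < 10 kb reported in the Discussion).

```python
from __future__ import annotations

import math


def fisher_z_test(r1: float, r2: float, n1: int, n2: int) -> tuple[float, float]:
    """Two-sided z-test for the difference between two independent Pearson correlations.

    Each correlation is mapped to z = atanh(r), which is approximately normal with
    variance 1 / (n - 3); the difference of the two z values is then standardized.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value from the standard normal
    return z, p


if __name__ == "__main__":
    z, p = fisher_z_test(0.92, 0.87, 1520, 1520)
    print(f"z = {z:.2f}, two-sided p = {p:.2e}")
```

With about 1,520 SNP pairs per class, even a difference of 0.05 between correlations near 0.9 is highly significant, because the atanh transformation stretches differences between values close to 1.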
Extent of Linkage Disequilibrium
The current effective population size of the breeds used in this study was previously estimated from pedigree information to be between 74 (Landrace) and 113 (Duroc, Yorkshire) . Consistent with Duroc and Yorkshire having the largest current effective population size, long-range r2 (10 Mb, Figure 1) estimated from our data was smallest for these two breeds (0.035 and 0.03, respectively). In Hampshire, a smaller effective population size of 109 corresponded to higher r2 at 10 Mb (0.046). Given its similar long-range r2 (0.035) at 10 Mb, we would have expected the Landrace population to have an effective population size comparable to that of Duroc and Yorkshire. However, using pedigree data, Welsh et al.  estimated the current effective population size of Landrace to be 74. The reason for this discrepancy remains unknown.

Several previous studies investigated LD in commercial pig populations using smaller numbers of microsatellite markers and fewer animals [17, 27]. Nsengimana et al.  found relatively large estimates of LD (D') from 0.29 to 0.41 using 15 microsatellite markers. In contrast, using r2 instead of D' and thereby correcting for minor allele frequency, Harmegnies et al.  found r2 ranging from 0.15 to 0.19 for marker distances < 1 cM and 0.10-0.12 for markers spaced between 1 cM and 5 cM, using 29 microsatellite markers on SSC15, comparable to our results of r2 between 0.16-0.22 for markers spaced between 1 and 5 Mb. Du et al.  estimated r2 from 4,500 SNP markers in six commercial lines of pigs and found average r2 = 0.51 for markers less than 0.1 cM apart, and 0.21 and 0.07 at marker distances of 1 cM and 5 cM, respectively. Similarly, our populations had average r2 of 0.15 to 0.20 and 0.05 to 0.08 at marker distances of 1 Mb and 5 Mb, respectively. A recent study by Uimari & Tapio  used the same genotyping platform as our study to estimate r2 and effective population size in Finnish Landrace and Yorkshire populations. 
Uimari & Tapio  found average r2 of 0.43 and 0.46 for adjacent markers in the Finnish Landrace and Yorkshire populations, respectively, which is higher than our results of 0.36 for Landrace and 0.39 for Yorkshire. In addition, Uimari & Tapio reported that r2 for markers spaced at 5 Mb decreased to 0.09 and 0.12 in the Finnish Landrace and Yorkshire breeds, respectively, whereas in the present study r2 declined further, to 0.05-0.06, at 5 Mb marker spacing for Landrace and Yorkshire (Table 1). The higher average r2 for distant markers (r2 > 0.2 at 1 Mb) in the Finnish populations could be explained by their smaller effective population size. This is partially confirmed by comparing the estimated effective population sizes of the Finnish populations ($N_e$ = 91 and 61 for Landrace and Yorkshire, respectively)  with those of the populations used in the current study, as reported by Welsh et al.  ($N_e$ = 74 and 113 for Landrace and Yorkshire, respectively): the current effective population size of Finnish Yorkshire is approximately half that of our Yorkshire population. Compared with recent estimates from Canadian populations, our estimates of average r2 for markers less than 100 kb apart were consistent for Landrace (US: 0.34, Canadian: 0.31) and Yorkshire (US: 0.37, Canadian: 0.32) . In Duroc, however, average r2 for markers less than 100 kb apart was considerably higher in the US population (0.42) than in the Canadian population (0.31) .
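The reasoning in this section, that breeds with a larger effective population size should show lower long-range r2, follows the classical approximation of Sved (1971), $E(r^2) \approx 1/(1 + 4N_e c)$, which is commonly interpreted as reflecting $N_e$ roughly $1/(2c)$ generations in the past. This approximation was not used in the study; the sketch below only illustrates the direction of the effect. The conversion of 10 Mb to c = 0.1 Morgan assumes roughly 1 cM per Mb, and sampling error, which inflates r2 in finite samples, is ignored. The function name is hypothetical.

```python
def sved_expected_r2(ne: float, c_morgans: float) -> float:
    """Sved (1971) approximation of expected r2 at recombination distance c (Morgans)."""
    return 1.0 / (1.0 + 4.0 * ne * c_morgans)


if __name__ == "__main__":
    c_10mb = 0.10  # assumption: about 1 cM per Mb, so 10 Mb is roughly 0.1 Morgan
    # Pedigree-based current Ne values quoted earlier in this section.
    for breed, ne in (("Landrace", 74), ("Hampshire", 109), ("Duroc/Yorkshire", 113)):
        print(f"{breed:>16}: Ne = {ne:>3}, expected r2 at 10 Mb = "
              f"{sved_expected_r2(ne, c_10mb):.3f}")
```

Under this approximation Landrace, with the smallest pedigree-based $N_e$, would be expected to show the highest r2 at 10 Mb, which is the discrepancy noted earlier in this section.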
Persistence of Phase
Persistence of phase can be used to make inferences about the history of a species and the relatedness of breeds within that species, as well as about the reliability of across-population GWA and GEBV prediction . Persistence of phase was previously reported for three Canadian swine breeds (Duroc, Landrace, Yorkshire) . For SNP with pairwise distance below 50 kb, we estimated persistence of phase to be 0.88 between Landrace and Yorkshire and 0.82 for both Landrace and Yorkshire with Duroc. In the Canadian breeds, persistence of phase also indicates a closer relationship between Landrace and Yorkshire (0.82) and a more distant relationship between Landrace/Yorkshire and Duroc . Whereas we found a correlation of phase of 0.82 for Landrace and Yorkshire with Duroc, the Canadian breeds had 0.66 and 0.65, indicating less agreement of phase even at short pairwise distance . For markers with pairwise distance < 10 kb, correlation of phase for the pig breeds in this study ranged from 0.87 for Duroc-Yorkshire to 0.92 for Landrace-Yorkshire. Previous research in Australian cattle breeds  showed correlations of phase ranging from 0.68 for Australian Angus-New Zealand Jersey to 0.97 for Dutch Holstein-Black and White. With increasing marker distance, correlation of phase for the pig breeds in this study decreased (range: 0.41 to 0.57 at an average pairwise marker distance of 1 Mb). This decrease, however, was smaller than the decrease de Roos et al.  observed in all but two of the cattle breeds they considered (< 0.4 for markers spaced 1 Mb apart). While correlation of phase was similar between these pig breeds and dairy cattle at short range (< 10 kb), the pig breeds generally showed larger correlation of phase than the dairy cattle  at increasing marker distances.
If two populations diverged from a common ancestral population, their correlation of phase can be expressed as $r_0^2(1-c)^{2T}$, where $r_0^2$ is a measure of LD in the common ancestral population, c is the recombination distance between markers and T is the time since breed divergence in generations . For markers as close as 10 kb, c is almost 0, so correlation of phase at such short distances can serve as an estimate of $r_0^2$ in the common ancestral population. Because correlation of phase for markers less than 10 kb apart was comparable in the pig populations (0.87-0.92) to that reported in Australian cattle (0.80-0.97) , LD in the common ancestral pig population is likely to be similar to that in the common ancestral population of the Australian cattle breeds. The larger correlation of phase at increasing marker distance (1 Mb) in the pig populations used in this study (0.41-0.57) compared with Australian cattle breeds (< 0.40) suggests that T is smaller for our pig breeds than for the cattle breeds.

The expected correlation of r between two breeds can be expressed as $e^{-2cT}$ . To estimate the time since breed divergence for the pig breeds in this study, we used SNP with pairwise distance between 10 kb and 300 kb and estimated correlation of phase for each 2.5 kb interval. We then regressed the natural logarithm of the estimated correlation of phase on the average pairwise distance c. The slope of this regression is an estimate of -2T; consequently, the slope divided by -2 is the number of generations (T) since the two breeds diverged . Results suggest that the pig breeds in this study diverged approximately 40-66 generations ago. Assuming T = 40 and $r_0^2$ = 0.92, the expected correlation of phase in the Yorkshire-Landrace comparison would decrease to 0.41 and 0.02 at 1 cM and 5 cM distance, respectively. We observed correlations of phase of 0.57 and 0.30 at 1 Mb and 5 Mb, respectively, between these two breeds, indicating that a T of 40 may overestimate the actual time since breed divergence. One possible cause is admixture between these two breeds, which would produce more shared LD than expected for fully diverged breeds . Based on the date of herd book closure for each breed and assuming a generation interval of approximately 2 years , Duroc, Hampshire, Landrace and Yorkshire have existed as distinct breeds for at least 38.5, 44.5, 31.5 and 30.5 generations, respectively. The time of herd book closure does not directly indicate the time since breed divergence, since distinguishable breeds must have existed before herd book closure. Nevertheless, the time of herd book closure further supports our observation that Landrace and Yorkshire developed as separate breeds later than Duroc and Hampshire.
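The divergence-time estimation described above can be sketched in Python: fit a least-squares line to ln(R) against recombination distance c, so that the slope estimates -2T and, under the combined form $r_0^2 e^{-2cT}$ used for the expected values above, the intercept estimates $\ln r_0^2$. Assumptions not stated in the source: physical distance is converted to Morgans at 1 cM per Mb, the example data are synthetic and noise-free (generated with T = 40 and $r_0^2$ = 0.92), and the function names are hypothetical.

```python
from __future__ import annotations

import math


def fit_divergence_time(c_morgans: list[float], phase_corr: list[float]) -> tuple[float, float]:
    """Regress ln(R) on c; return (T, r0_sq) with T = slope / -2 and r0_sq = exp(intercept).

    All correlations must be positive for the logarithm to be defined.
    """
    y = [math.log(r) for r in phase_corr]
    n = len(c_morgans)
    mean_c, mean_y = sum(c_morgans) / n, sum(y) / n
    sxy = sum((c - mean_c) * (v - mean_y) for c, v in zip(c_morgans, y))
    sxx = sum((c - mean_c) ** 2 for c in c_morgans)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_c
    return slope / -2.0, math.exp(intercept)


def expected_phase_corr(r0_sq: float, c_morgans: float, t_generations: float) -> float:
    """Expected correlation of phase r0^2 * exp(-2 c T)."""
    return r0_sq * math.exp(-2.0 * c_morgans * t_generations)


if __name__ == "__main__":
    # 2.5 kb classes from 10 kb to 300 kb; assumed 1 cM per Mb, i.e. 1 kb = 1e-5 Morgan.
    c = [(10.0 + 2.5 * i) * 1e-5 for i in range(117)]
    corr = [expected_phase_corr(0.92, ci, 40.0) for ci in c]  # synthetic, noise-free
    t_hat, r0_hat = fit_divergence_time(c, corr)
    print(f"T = {t_hat:.1f} generations, r0^2 = {r0_hat:.2f}")
    print(f"expected R at 1 cM: {expected_phase_corr(0.92, 0.01, 40):.2f}")  # 0.41
    print(f"expected R at 5 cM: {expected_phase_corr(0.92, 0.05, 40):.2f}")  # 0.02
```

With real estimates, sampling noise in R at long distances (where R approaches zero) makes the log transform unstable, which is one plausible reason to restrict the fit to short distances such as the 10-300 kb window used here.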
Implications of estimated levels of LD for GEBV implementation
Our results have several important implications for the future implementation of genomic selection in swine. The accuracy of genome-wide marker assisted selection is directly affected by the chosen marker density (which determines the average r2 between markers and QTL) and by the size of the training population .
The currently used marker panel, containing approximately 40,000 usable markers, had an average r2 of approximately 0.4 between adjacent markers for all four breeds. This exceeds the r2 of 0.2 that Meuwissen et al.  simulated to reach a GEBV prediction accuracy of around 0.85. Furthermore, our results indicated that reducing the original marker panel to 10% of the markers (3,000-4,000 SNP) still resulted in an average r2 for adjacent markers exceeding 0.2 in all four breeds. On the other hand, recent research in Australian Holstein Friesian cattle has shown  that subsets of 3,000-5,000 SNP used to estimate direct genomic breeding values (DGV) reached only 80% of the prediction accuracy previously obtained with approximately 42,000 SNP. Such a reduction in prediction accuracy would be unacceptable for most practical implementations. However, the accuracy of GEBV predicted from low density panels can be increased through genotype imputation , in which high density genotypes are imputed from low density SNP genotypes using a high density reference panel of haplotypes . Weigel et al.  used approximately 10% of 2,693 SNP on Bos taurus chromosome 1 to impute the full SNP set in a Jersey population. Using a high density reference genotype panel (n = 2,542 animals), they found that the imputation accuracy of the non-typed markers was between 0.86 and 0.94. Average r2 in our populations ranged from 0.36 to 0.48 for markers less than 100 kb apart, comparable to the average r2 = 0.38 for markers spaced < 100 kb apart in the Jersey population . Assuming a comparable decline of LD with increasing marker distance in the Jersey population and our pig populations, we would expect to impute approximately 90% of the high density genotypes accurately using a low density panel containing 10% of the markers. More recent results reported an even higher average imputation accuracy (approximately 95%) when imputing the 42,000 SNP of the Bovine 50 K panel from the 3 K subset in Holstein cattle . To assess the accuracy of GEBV estimated from imputed genotypes, Weigel et al.  used the same Jersey population as in their previous study  and found that the accuracy of GEBV based on imputed markers was 95% of the accuracy of GEBV estimated from observed genotypes . As noted above, average r2 is similar in the American Jersey population and our pig populations, suggesting that future research on genomic selection in swine should explore the use of imputed low density genotypes to increase cost efficiency.

Previous research in humans  and European Holstein cattle  indicated that combining haplotypes from closely related populations can increase the accuracy of genotype imputation, while research in sheep suggests that breed-specific reference haplotypes would yield better accuracy . The success of combined haplotypes for genotype imputation depends on the relatedness between the populations. Further research is necessary to determine whether persistence of phase is large enough in our pig populations to increase imputation accuracy when reference haplotypes are combined across breeds. As noted by Goddard , the accuracy of GEBV prediction can be expressed as a function of the LD between marker and QTL and the accuracy of the estimated SNP effects. The loss in accuracy of GEBV prediction caused by imputing rather than observing genotypes could therefore be compensated by increasing the number of animals used to estimate SNP effects. If not enough animals are available to estimate SNP effects, animals from different but closely related populations could be combined to estimate SNP effects for GEBV prediction in both populations [13, 39].

The squared short-range (< 10 kb) correlation of phase can also serve as the accuracy with which a marker-QTL association in one population can be predicted from known marker-QTL associations in another population. For the pig breeds reported in this study, the squared correlation of phase for close markers (0-100 kb) ranged from 0.53 to 0.67. To evaluate whether these accuracies would warrant the use of a combined training population to estimate SNP effects accurately for both populations, we refer to a simulation study by de Roos et al.  that estimated the accuracy of GEBV prediction for combined training populations of highly, moderately and lowly related populations. For populations that diverged approximately T = 30 generations ago, correlation of phase was reported to be below 0.80 for markers with pairwise distance below 0.055 cM . We found a correlation of phase between Landrace and Yorkshire of around 0.80 at a corresponding marker distance. De Roos et al.  concluded that the reliability of GEBV prediction could be increased by 0.05-0.10 in two populations when approximately 40,000 marker genotypes are available, heritability is h2 = 0.3 or higher, 1,000 animals from each population are used to estimate SNP effects, and QTL effects are assumed to be the same in both populations . In addition, they found that for genetically distant populations, at least 1,000 animals with genotypes and phenotypes in each population were needed to avoid a decrease in the reliability of prediction . When SNP effects estimated in one population were used to calculate GEBV for another population that diverged approximately T = 30 generations ago, the reliability of the predicted GEBV was 0.65, assuming both high marker density (M = 40,000) and heritability h2 = 1 . Consequently, combining animals into a multi-breed panel to estimate SNP effects is likely to be only marginally beneficial for the pig breeds in this study, given the estimated correlation of phase and the large number of animals and markers required .
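The argument above, that a loss of accuracy from imputation could be offset by a larger training population, can be illustrated with the deterministic approximation of Daetwyler et al. (2008), $r = \sqrt{N h^2 / (N h^2 + M_e)}$, where N is the number of training animals, $h^2$ the heritability and $M_e$ the number of independent chromosome segments. This formula is swapped in for illustration only; it is not the expression of Goddard cited in the text, and $M_e$ = 1,000 below is a hypothetical placeholder, not an estimate for these pig breeds. Function names are hypothetical.

```python
import math


def daetwyler_accuracy(n_train: float, h2: float, m_e: float) -> float:
    """Approximate GEBV accuracy r = sqrt(N h2 / (N h2 + Me))."""
    nh2 = n_train * h2
    return math.sqrt(nh2 / (nh2 + m_e))


def training_size_for(target_r: float, h2: float, m_e: float) -> float:
    """Invert the approximation: N = r^2 Me / (h2 (1 - r^2))."""
    r_sq = target_r ** 2
    return r_sq * m_e / (h2 * (1.0 - r_sq))


if __name__ == "__main__":
    h2, m_e = 0.3, 1_000  # h2 as in the simulation quoted above; Me is hypothetical
    for n in (500, 1_000, 2_000, 4_000):
        print(f"N = {n:>5}: accuracy = {daetwyler_accuracy(n, h2, m_e):.3f}")
    print(f"N needed for accuracy 0.8: {training_size_for(0.8, h2, m_e):.0f}")
```

Because the required N grows with $r^2/(1-r^2)$, each further increment in accuracy demands a disproportionately larger training population under this approximation.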
We used the PorcineSNP60 chip  to obtain high density genotypes (34,000-40,000 SNP) from pig trios in four breeds. From these data we estimated r2 as a measure of LD across the genome, as well as the correlation of r, which measures phase agreement between breeds. We found r2 of approximately 0.4 for markers less than 100 kb apart, which is higher than comparable estimates reported for North American Holstein cattle  and various Australian cattle breeds . The same was true for average r2 between markers more than 1 Mb apart, indicating a smaller past effective population size of these pig breeds. We also report a relatively slow decay of LD over distance, with r2 of around 0.2 at 1 Mb. This comparatively high long-range LD indicates that good accuracy can be expected for future implementations of GEBV in pigs that use 10% (3,000-4,000) or fewer of the SNP in the current assay together with genotype imputation. We encourage future research on genomic selection in swine to focus especially on the possible benefits of combining reduced marker panels with genotype imputation. To promote the use of genomic selection in swine successfully, it will be necessary to increase cost efficiency while maintaining high accuracy of prediction. Currently no low density SNP genotyping panels are publicly available for swine, but the results presented here can aid in the development of efficient SNP platforms. The relatively low persistence of phase reported here implies that the use of multi-breed panels to estimate SNP effects for genomic selection will likely be of limited benefit, especially with low density genotypes, but the merit of combining reference haplotypes for genotype imputation should be further investigated.
For this study, sire/dam/offspring trios of the Duroc, Hampshire, Landrace and Yorkshire breeds were selected from the National Swine Registry (NSR) pedigree. Selected parents were unrelated for at least two generations. All animals were genotyped with the Illumina PorcineSNP60 Genotyping BeadChip (Illumina Inc.; number of markers M = 62,163)  at a commercial laboratory (GeneSeek, a Neogen Company, Lincoln, NE). SNP genotypes showing Mendelian inconsistencies within a trio were set to missing in that trio. If one or more animals within a trio had missing genotypes for more than 10% of the SNP, that trio was eliminated from further analysis. Similarly, SNP were removed if genotypes were not available for at least 90% of the samples across all breeds ($M_{\text{CallRate}<0.9}$ = 5,080). Only autosomal SNP were considered, so all SNP with an uncertain map position on build 10 of the pig genome sequence, as well as SNP on the sex chromosomes, were excluded ($M_{\text{non-autosomal}}$ = 9,308). To exclude non-segregating SNP, we removed markers with minor allele frequency (MAF) below 5% within each breed separately. The number of such SNP varied substantially between breeds: we excluded $M_{\text{MAF}<5\%}$ = 13,646 SNP in Duroc, 15,405 in Hampshire, 7,631 in Landrace and 8,665 in Yorkshire. Additionally, SNP deviating from Hardy-Weinberg equilibrium (p < 0.001) within breed were excluded, discarding $M_{\text{HWE}<0.001}$ = 117, 85, 146 and 176 SNP in Duroc, Hampshire, Landrace and Yorkshire, respectively. After these filters, a total of 30, 26, 29 and 32 trios and 34,129, 32,370, 40,144 and 39,110 SNP, spaced at average distances of 70, 74, 60 and 61 kb, satisfied the selection criteria for Duroc, Hampshire, Landrace and Yorkshire, respectively.
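The marker and genotype filters described above can be sketched in Python. This is an illustration only, not the pipeline used in the study: genotypes are assumed to be coded as 0/1/2 copies of the reference allele with None for missing, the Hardy-Weinberg test is a 1-df chi-square goodness-of-fit test (the source does not state which HWE test was applied), and all names are hypothetical. In the study the call-rate filter was applied across all breeds, whereas the MAF and HWE filters were applied within each breed; `passes_snp_filters` simply combines them on one sample for brevity.

```python
from __future__ import annotations

import math
from typing import Optional, Sequence

Genotype = Optional[int]  # 0, 1 or 2 copies of the reference allele; None = missing
_ALLELES = {0: (0, 0), 1: (0, 1), 2: (1, 1)}


def call_rate(genos: Sequence[Genotype]) -> float:
    return sum(g is not None for g in genos) / len(genos)


def minor_allele_freq(genos: Sequence[Genotype]) -> float:
    called = [g for g in genos if g is not None]
    p = sum(called) / (2 * len(called))
    return min(p, 1.0 - p)


def hwe_chi2_pvalue(genos: Sequence[Genotype]) -> float:
    """P-value of a 1-df chi-square test of Hardy-Weinberg proportions."""
    called = [g for g in genos if g is not None]
    n = len(called)
    obs = [called.count(0), called.count(1), called.count(2)]
    p = (obs[1] + 2 * obs[2]) / (2 * n)  # frequency of the counted allele
    q = 1.0 - p
    expected = [n * q * q, 2 * n * p * q, n * p * p]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(obs, expected) if e > 0)
    return math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df


def mendelian_consistent(sire: Genotype, dam: Genotype, offspring: Genotype) -> bool:
    """True if the offspring genotype can be formed from one allele of each parent."""
    if sire is None or dam is None or offspring is None:
        return True  # nothing to check
    possible = {a + b for a in _ALLELES[sire] for b in _ALLELES[dam]}
    return offspring in possible


def passes_snp_filters(genos: Sequence[Genotype], min_call: float = 0.9,
                       min_maf: float = 0.05, hwe_alpha: float = 0.001) -> bool:
    """Call rate >= 90%, MAF >= 5% and no HWE deviation at p < 0.001."""
    return (call_rate(genos) >= min_call
            and minor_allele_freq(genos) >= min_maf
            and hwe_chi2_pvalue(genos) >= hwe_alpha)


if __name__ == "__main__":
    snp = [0] * 25 + [1] * 50 + [2] * 24 + [None]
    print(call_rate(snp), round(minor_allele_freq(snp), 3), passes_snp_filters(snp))
    print(mendelian_consistent(0, 0, 2))  # two 0/0 parents cannot produce a 2-copy offspring
```

Evaluating the call-rate condition first means a SNP with no called genotypes is rejected before the MAF computation would divide by zero.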
Estimation of average LD and persistence of phase
Haplotypes were obtained for the founder animals using the trio option of BEAGLE , phasing the genotypes by chromosome. Sampling animals in trios was shown to yield improved accuracy of estimated haplotypes . To further increase haplotype accuracy, BEAGLE was set to run 100 iterations of the phasing algorithm and sample 100 haplotype pairs for each individual per iteration. Additionally, a short simulation experiment was conducted showing that for MAF above 5% average r2 can be reliably estimated from the current sample size (results not shown). Alleles for each SNP were re-coded as 0/1, keeping the reference allele constant across all four populations, allowing for later determination of phase agreement. Haplotypes and code needed to reproduce these results are publicly available at https://www.msu.edu/~steibelj/JP_files/LD_estimate.html.
For all pairs of SNP, r2 was estimated from the allele frequencies of the founder animals according to the following equation:

$$r_{ij}^2 = \frac{D_{ij}^2}{p_i(1-p_i)\,p_j(1-p_j)}, \qquad D_{ij} = p_{ij} - p_i p_j \qquad (1)$$

where $p_i$ and $p_j$ are the marginal allele frequencies at the ith and jth SNP, respectively, and $p_{ij}$ is the frequency of the two-marker haplotype ; computations used the freely available software R . Marker pairs were grouped by their pairwise physical distance into intervals of 100 kb, from 0 up to 10 Mb. Average r2 for SNP pairs in each interval was estimated as the arithmetic mean of all $r_{ij}^2$ (equation 1) for which the pairwise distance $d_{ij}$ between the ith and jth SNP falls within the currently considered interval. Average r2 between adjacent markers was estimated as:

$$\bar{r}^2_{\text{adj}} = \frac{1}{\sum_{l=1}^{18}(M_l - 1)} \sum_{l=1}^{18} \sum_{i=1}^{M_l - 1} r_{i,i+1}^2 \qquad (2)$$

where $\bar{r}^2_{\text{adj}}$ is the average r2 of all adjacent SNP pairs across the 18 autosomes (l), with $M_l$ SNP on chromosome l. To estimate average r2 between adjacent markers at different marker densities, a given percentage of markers (50%, 75%, 90%, 95%, 99% and 99.5%) was removed before average r2 was estimated with equation (2). Markers were removed solely on the basis of their map position: to reduce a panel to 50%, every second marker was kept; to reduce it to 25%, every fourth marker was kept; and so on.
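The r2 computation, the average over adjacent SNP pairs, and the 100 kb distance classes can be sketched in Python for phased haplotypes coded 0/1 (as produced after re-coding alleles relative to a constant reference allele). The sketch uses the standard definition $r^2 = D^2/(p_i q_i p_j q_j)$ with $D = p_{ij} - p_i p_j$, pools adjacent pairs across chromosomes, and represents each SNP as a vector of alleles over the same set of founder haplotypes. It is a minimal sketch, not the authors' R code, and the function names are hypothetical. The signed r returned by `ld_r` is the quantity used for persistence of phase.

```python
from __future__ import annotations

import math
from typing import Sequence

Haplotypes = Sequence[int]  # alleles (0/1) of one SNP across all sampled haplotypes


def ld_r(hap_i: Haplotypes, hap_j: Haplotypes) -> float:
    """Signed r = D / sqrt(p_i q_i p_j q_j); squaring it gives r2.

    Raises ZeroDivisionError for monomorphic SNP, which the MAF filter removes.
    """
    n = len(hap_i)
    p_i = sum(hap_i) / n
    p_j = sum(hap_j) / n
    p_ij = sum(a * b for a, b in zip(hap_i, hap_j)) / n  # frequency of the 1-1 haplotype
    d = p_ij - p_i * p_j
    return d / math.sqrt(p_i * (1 - p_i) * p_j * (1 - p_j))


def average_adjacent_r2(chromosomes: Sequence[Sequence[Haplotypes]]) -> float:
    """Mean r2 over all adjacent SNP pairs, pooled across chromosomes."""
    total, count = 0.0, 0
    for snps in chromosomes:  # SNP in map order
        for a, b in zip(snps[:-1], snps[1:]):
            total += ld_r(a, b) ** 2
            count += 1
    return total / count


def mean_r2_by_distance(positions: Sequence[int], snps: Sequence[Haplotypes],
                        bin_bp: int = 100_000, max_bp: int = 10_000_000) -> dict[int, float]:
    """Mean r2 of all SNP pairs per distance class [b, b + bin_bp), up to max_bp."""
    sums: dict[int, float] = {}
    counts: dict[int, int] = {}
    for i in range(len(snps)):
        for j in range(i + 1, len(snps)):
            dist = positions[j] - positions[i]
            if dist >= max_bp:
                break  # positions are sorted, so all later j are farther away
            b = (dist // bin_bp) * bin_bp
            sums[b] = sums.get(b, 0.0) + ld_r(snps[i], snps[j]) ** 2
            counts[b] = counts.get(b, 0) + 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}


if __name__ == "__main__":
    snp_a = [0, 1, 0, 1, 1, 0]
    snp_b = [0, 1, 0, 1, 1, 0]  # same pattern: perfect LD, r = 1
    snp_c = [1, 0, 1, 0, 0, 1]  # opposite pattern: perfect LD with reversed phase, r = -1
    print(ld_r(snp_a, snp_b), ld_r(snp_a, snp_c))  # 1.0 -1.0
```

The last two SNP show why the sign matters: both pairs have r2 = 1, but only the signed r distinguishes identical from reversed phase.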
To estimate persistence of phase, only markers with a MAF of at least 5% in all breeds were included, resulting in 22,340 SNP common to all breeds. Correlation of phase was estimated for intervals of 100 kb (from 0 to 10 Mb). We excluded marker pairs more than 10 Mb apart to reduce the computational load; average r2 at larger distances is close to zero, which would cause correlation of phase to be close to zero as well. Persistence of phase was then estimated as:

$$R_{k,k'} = \frac{\sum_{(i,j) \in p} \left(r_{ij(k)} - \bar{r}_{(k)}\right)\left(r_{ij(k')} - \bar{r}_{(k')}\right)}{(n_p - 1)\, s_{(k)}\, s_{(k')}} \qquad (3)$$

where $R_{k,k'}$ is the correlation of phase between $r_{ij(k)}$ in population k and $r_{ij(k')}$ in population k', $s_{(k)}$ and $s_{(k')}$ are the standard deviations of $r_{ij(k)}$ and $r_{ij(k')}$, respectively, $\bar{r}_{(k)}$ and $\bar{r}_{(k')}$ are the average $r_{ij}$ across all SNP pairs i and j within interval p for populations k and k', respectively, and $n_p$ is the number of SNP pairs in interval p.
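Persistence of phase, computed as the Pearson correlation of signed r between two breeds within each 100 kb class, together with the proportion of SNP pairs whose r has opposite sign in the two breeds, can be sketched as below. The input format (tuples of distance and signed r in each breed for the same SNP pair, restricted to SNP polymorphic in both breeds), the minimum number of pairs per class, and all names are assumptions for illustration. Dividing by n - 1 in both the covariance and the standard deviations gives the ordinary Pearson correlation, so those factors cancel in the code.

```python
from __future__ import annotations

import math
from collections import defaultdict
from typing import Iterable


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; the (n - 1) factors cancel between numerator and denominator."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def persistence_of_phase(pairs: Iterable[tuple[int, float, float]],
                         bin_bp: int = 100_000, max_bp: int = 10_000_000,
                         min_pairs: int = 3) -> dict[int, tuple[float, float, int]]:
    """Return {class_start_bp: (R_kk', fraction of pairs with reversed sign of r, n_pairs)}.

    Each input tuple is (distance_bp, r in breed k, r in breed k') for one SNP pair.
    """
    classes: dict[int, list[tuple[float, float]]] = defaultdict(list)
    for dist, r_k, r_k2 in pairs:
        if dist < max_bp:
            classes[(dist // bin_bp) * bin_bp].append((r_k, r_k2))
    result = {}
    for start in sorted(classes):
        vals = classes[start]
        if len(vals) < min_pairs:
            continue  # too few pairs for a meaningful correlation
        xs = [v[0] for v in vals]
        ys = [v[1] for v in vals]
        reversed_frac = sum(1 for a, b in vals if a * b < 0) / len(vals)
        result[start] = (pearson(xs, ys), reversed_frac, len(vals))
    return result


if __name__ == "__main__":
    toy = [(5_000, 0.8, 0.7), (20_000, -0.5, -0.6), (50_000, 0.3, -0.2),
           (80_000, 0.9, 0.85), (150_000, 0.4, 0.5), (160_000, -0.2, -0.1),
           (190_000, 0.6, 0.4)]  # hypothetical signed r values for two breeds
    for start, (r_phase, rev, n) in persistence_of_phase(toy).items():
        print(f"{start // 1000:>4}-{(start + 100_000) // 1000} kb: R = {r_phase:.2f}, "
              f"reversed sign = {rev:.0%}, n = {n}")
```

Reporting the reversed-sign fraction alongside R helps interpretation: a class can have a moderate R because many pairs carry weak LD in one breed, or because a sizeable share of pairs has reversed phase.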
Amaral AJ, Megens H-J, Crooijmans RPMA, Heuven HCM, Groenen MAM: Linkage disequilibrium decay and haplotype block structure in the pig. Genetics. 2008, 179 (1): 569-579. 10.1534/genetics.107.084277.
Conrad DF, Jakobsson M, Coop G, Wen XQ, Wall JD, Rosenberg NA, Pritchard JK: A worldwide survey of haplotype variation and linkage disequilibrium in the human genome. Nature Genetics. 2006, 38 (11): 1251-1260. 10.1038/ng1911.
Reich DE, Cargill M, Bolk S, Ireland J, Sabeti PC, Richter DJ, Lavery T, Kouyoumjian R, Farhadian SF, Ward R: Linkage disequilibrium in the human genome. Nature. 2001, 411 (6834): 199-204. 10.1038/35075590.
Corbin LJ, Blott SC, Swinburne JE, Vaudin M, Bishop SC, Woolliams JA: Linkage disequilibrium and historical effective population size in the Thoroughbred horse. Animal Genetics. 2010, 41: 8-15.
Pritchard JK, Donnelly P: Case-control studies of association in structured or admixed populations. Theor Popul Biol. 2001, 60 (3): 227-237. 10.1006/tpbi.2001.1543.
Nordborg M, Tavare S: Linkage disequilibrium: what history has to tell us. Trends Genet. 2002, 18 (2): 83-90. 10.1016/S0168-9525(02)02557-X.
Tenesa A, Navarro P, Hayes BJ, Duffy DL, Clarke GM, Goddard ME, Visscher PM: Recent human effective population size estimated from linkage disequilibrium. Genome Research. 2007, 17 (4): 520-526. 10.1101/gr.6023607.
Hayes BJ, Bowman PJ, Chamberlain AJ, Goddard ME: Invited review: Genomic selection in dairy cattle: Progress and challenges. J Dairy Sci. 2009, 92 (2): 433-443. 10.3168/jds.2008-1646.
Goddard ME, Hayes BJ: Genomic selection based on dense genotypes inferred from sparse genotypes. Proceedings of the 18th Conference of the Association for the Advancement of Animal Breeding and Genetics. 2009, Barossa Valley, Australia: Association for the Advancement of Animal Breeding and Genetics, 26-29.
Matukumalli LK, Lawley CT, Schnabel RD, Taylor JF, Allan MF, Heaton MP, O'Connell J, Moore SS, Smith TPL, Sonstegard TS: Development and Characterization of a High Density SNP Genotyping Assay for Cattle. PLoS One. 2009, 4 (4): e5350. 10.1371/journal.pone.0005350.
Groenen MAM, Wahlberg P, Foglio M, Cheng HH, Megens H-J, Crooijmans RPMA, Besnier F, Lathrop M, Muir WM, Wong GK-S: A high-density SNP-based linkage map of the chicken genome reveals sequence features correlated with recombination rate. Genome Res. 2009, 19 (3): 510-519.
Ramos AM, Crooijmans RPMA, Affara NA, Amaral AJ, Archibald AL, Beever JE, Bendixen C, Churcher C, Clark R, Dehais P: Design of a high density SNP genotyping assay in the pig using SNPs identified and characterized by next generation sequencing technology. PLoS One. 2009, 4 (8): e6524. 10.1371/journal.pone.0006524.
Goddard M, Hayes B, McPartlan H, Chamberlain AJ: Can the same genetic markers be used in multiple breeds? Proceedings of the 8th World Congress on Genetics Applied to Livestock Production. 2006, Belo Horizonte, Brazil, 16-22.
de Roos APW, Hayes BJ, Spelman RJ, Goddard ME: Linkage disequilibrium and persistence of phase in Holstein-Friesian, Jersey and Angus cattle. Genetics. 2008, 179 (3): 1503-1512. 10.1534/genetics.107.084301.
Sargolzaei M, Schenkel FS, Jansen GB, Schaeffer LR: Extent of linkage disequilibrium in Holstein cattle in North America. J Dairy Sci. 2008, 91 (5): 2106-2117. 10.3168/jds.2007-0553.
Uimari P, Tapio M: Extent of linkage disequilibrium and effective population size in Finnish Landrace and Finnish Yorkshire pig breeds. J Anim Sci. 2011, 89 (3): 609-614. 10.2527/jas.2010-3249.
Harmegnies N, Farnir F, Davin F, Buys N, Georges M, Coppieters W: Measuring the extent of linkage disequilibrium in commercial pig populations. Anim Genet. 2006, 37 (3): 225-231. 10.1111/j.1365-2052.2006.01438.x.
Goddard M: Genomic selection: prediction of accuracy and maximisation of long term response. Genetica. 2009, 136 (2): 245-257. 10.1007/s10709-008-9308-0.
Meuwissen TH, Hayes BJ, Goddard ME: Prediction of total genetic value using genome-wide dense marker maps. Genetics. 2001, 157 (4): 1819-1829.
VanRaden PM, Van Tassell CP, Wiggans GR, Sonstegard TS, Schnabel RD, Taylor JF, Schenkel FS: Invited review: Reliability of genomic predictions for North American Holstein bulls. J Dairy Sci. 2009, 92 (1): 16-24. 10.3168/jds.2008-1514.
Goddard ME, Hayes BJ: Genomic selection. J Anim Breed Genet. 2007, 124 (6): 323-330. 10.1111/j.1439-0388.2007.00702.x.
Wolc A, Stricker C, Arango J, Settar P, Fulton JE, O'Sullivan NP, Preisinger R, Habier D, Fernando R, Garrick DJ: Breeding value prediction for production traits in layer chickens using pedigree or genomic relationships in a reduced animal model. Genet Sel Evol. 2011, 43: 5. 10.1186/1297-9686-43-5.
Cleveland MA, Forni S, Garrick DJ, Deeb N: Prediction of Genomic Breeding Values in a Commercial Pig Population. Proceedings of the 9th World Congress on Genetics Applied to Livestock Production. 2010, Leipzig, Germany.
Browning BL, Browning SR: A Unified Approach to Genotype Imputation and Haplotype-Phase Inference for Large Data Sets of Trios and Unrelated Individuals. Am J Hum Genet. 2009, 84 (2): 210-223. 10.1016/j.ajhg.2009.01.005.
Cohen J, Cohen P, West SG, Aiken LS: Applied multiple regression/correlation analysis for the behavioral sciences. 3rd edition. 2003, Mahwah, NJ: Lawrence Erlbaum Associates.
Welsh CS, Stewart TS, Schwab C, Blackburn HD: Pedigree analysis of 5 swine breeds in the United States and the implications for genetic conservation. J Anim Sci. 2010, 88 (5): 1610-1618. 10.2527/jas.2009-2537.
Nsengimana J, Baret P, Haley CS, Visscher PM: Linkage disequilibrium in the domesticated pig. Genetics. 2004, 166 (3): 1395-1404. 10.1534/genetics.166.3.1395.
Du F, Clutter A, Lohuis M: Characterizing linkage disequilibrium in pig populations. Int J Biol Sci. 2007, 3 (3): 166-178.
Jafarikia M, Maignel L, Wyss S, Sullivan B: Linkage Disequilibrium in Canadian Swine Breeds. Proceedings of the 9th World Congress on Genetics Applied to Livestock Production. 2010, Leipzig, Germany.
Sved JA, McRae AF, Visscher PM: Divergence between human populations estimated from linkage disequilibrium. Am J Hum Genet. 2008, 83 (6): 737-743. 10.1016/j.ajhg.2008.10.019.
Moser G, Khatkar MS, Hayes BJ, Raadsma HW: Accuracy of direct genomic values in Holstein bulls and cows using subsets of SNP markers. Genet Sel Evol. 2010, 42: 37. 10.1186/1297-9686-42-37.
Weigel KA, de Los Campos G, Vazquez AI, Rosa GJ, Gianola D, Van Tassell CP: Accuracy of direct genomic values derived from imputed single nucleotide polymorphism genotypes in Jersey cattle. J Dairy Sci. 2010, 93 (11): 5423-5435. 10.3168/jds.2010-3149.
Weigel KA, Van Tassell CP, O'Connell JR, VanRaden PM, Wiggans GR: Prediction of unobserved single nucleotide polymorphism genotypes of Jersey cattle using reference panels and population-based imputation algorithms. J Dairy Sci. 2010, 93 (5): 2229-2238. 10.3168/jds.2009-2849.
Villa-Angulo R, Matukumalli LK, Gill CA, Choi J, Van Tassell CP, Grefenstette JJ: High-resolution haplotype block structure in the cattle genome. BMC Genet. 2009, 10: 19.
Johnston J, Kistemaker G, Sullivan PG: Comparison of different imputation methods. 2011 Interbull Open Meeting. 2011, Stavanger, Norway
Huang L, Li Y, Singleton AB, Hardy JA, Abecasis G, Rosenberg NA, Scheet P: Genotype-imputation accuracy across worldwide human populations. Am J Hum Genet. 2009, 84 (2): 235-250. 10.1016/j.ajhg.2009.01.013.
Dassonneville R, Brondum RF, Druet T, Fritz S, Guillaume F, Guldbrandtsen B, Lund MS, Ducrocq V, Su G: Effect of imputing markers from a low-density chip on the reliability of genomic breeding values in Holstein populations. J Dairy Sci. 2011, 94 (7): 3679-3686. 10.3168/jds.2011-4299.
Hayes BJ, Bowman PJ, Daetwyler HD, Kijas JW, van der Werf JHJ: Accuracy of genotype imputation in sheep breeds. Animal Genetics. 2011.
Ibanez-Escriche N, Fernando RL, Toosi A, Dekkers JC: Genomic selection of purebreds for crossbred performance. Genet Sel Evol. 2009, 41: 12. 10.1186/1297-9686-41-12.
de Roos AP, Hayes BJ, Goddard ME: Reliability of genomic predictions across multiple populations. Genetics. 2009, 183 (4): 1545-1553. 10.1534/genetics.109.104935.
Marchini J, Cutler D, Patterson N, Stephens M, Eskin E, Halperin E, Lin S, Qin ZS, Munro HM, Abecasis GR: A comparison of phasing algorithms for trios and unrelated individuals. Am J Hum Genet. 2006, 78 (3): 437-450. 10.1086/500808.
Devlin B, Risch N: A comparison of linkage disequilibrium measures for fine-scale mapping. Genomics. 1995, 29 (2): 311-322. 10.1006/geno.1995.9003.
R Development Core Team: R: A language and environment for statistical computing. 2009, Vienna, Austria: R Foundation for Statistical Computing
This project was supported by Agriculture and Food Research Initiative Competitive Grant no. 2010-65205-20342 from the USDA National Institute of Food and Agriculture, and by funding from the National Swine Registry. Technical assistance of Whitney Hosier and Nancy Raney is appreciated. Computer resources and programming advice were provided by the Michigan State University High Performance Computing Center (HPCC).
JPS, ROB, CS, and CWE designed the experiments. CS and ROB identified trios for data collection and oversaw the collection of tissue samples. CWE coordinated the genotyping. JPS and YMB completed statistical analysis and wrote the manuscript. All authors read and approved the paper.