This is a comprehensive study of LD with a high-density SNP panel in HF dairy cattle. Given the number of animals screened and the number of SNPs genotyped, it currently provides the best estimates of genome-wide LD in this breed. Pairwise measures of LD decline with increasing distance between SNPs. LD estimated by D' appears to be quite extensive and is much higher in cattle than in humans. This may be due to random drift caused by the relatively small effective population size of dairy cattle. Comparable estimates of extensive LD based on D' from microsatellites have been reported in cattle [17–21], sheep, pig, horse, dog and chicken. The LD between SNP markers reported in the present study is slightly lower than that estimated between microsatellite markers in earlier studies [17, 21]. This may be explained in part by differences in mutation rate between these two types of marker, reflecting the more recent origin of microsatellite polymorphisms. Secondly, it may be due to the higher power for detecting LD with multiallelic markers (e.g., microsatellites) than with biallelic SNP markers. The difference in LD detected using SNP and microsatellite loci was more pronounced in humans.
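The link between a small effective population size and extensive LD can be illustrated with Sved's (1971) drift-recombination approximation, E(r²) ≈ 1/(1 + 4Nₑc), where c is the recombination fraction between two loci. This approximation is not used in the study itself. The sketch below is an illustration only: it assumes a uniform map of 1 cM/Mb, and the Nₑ values are hypothetical rather than estimates for any breed or species.

```python
def kb_to_recombination_fraction(kb: float, cm_per_mb: float = 1.0) -> float:
    """Convert physical distance to a recombination fraction.

    Assumes a uniform map (cm_per_mb, 1 cM/Mb by default) and short distances,
    where the recombination fraction is roughly the map distance in Morgans.
    """
    return (kb / 1000.0) * cm_per_mb / 100.0


def expected_r2_sved(ne: float, c: float) -> float:
    """Sved (1971) approximation to E[r^2] under drift and recombination."""
    return 1.0 / (1.0 + 4.0 * ne * c)


c = kb_to_recombination_fraction(40.0)  # 40 kb -> c = 0.0004 under 1 cM/Mb
for ne in (100, 1_000, 10_000):  # hypothetical effective population sizes
    print(f"Ne={ne:>6}  E[r^2] at 40 kb = {expected_r2_sved(ne, c):.3f}")
```

Under these assumptions, the expected r² at 40 kb falls from about 0.86 (Nₑ = 100) to about 0.06 (Nₑ = 10,000). This shows qualitatively why a smaller Nₑ leaves more LD at a given distance.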
The D' metric has been suggested as a good measure of the extent of LD in a population and of the variation in LD across the genome [22, 15]. However, individual values of D' are more strongly influenced by variation in allele frequencies than values of r². This is reflected in the inflated D' values at low MAF. For non-syntenic SNP pairs, we observed higher values of D' for pairs with low mean MAF. SNPs with different MAFs are known to have different LD properties. Higher D' between loci with rare alleles is mainly due to two causes. The first is population genetic effects: rare alleles are, in general, younger than common alleles and hence may still be in LD. The second is sampling: smaller samples may fail to include the rare fourth gamete (haplotype) and can therefore inflate the D' estimate. On the other hand, SNPs with rare alleles tend to have lower r² values. In the present study, the inflated D' estimates between non-syntenic SNPs with rare alleles are probably due to sampling effects, caused by random drift or by the loss of rare haplotypes during sampling.
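The contrast between D' and r² for rare alleles can be made concrete by computing both from the four haplotype counts of a SNP pair. The following is a minimal sketch using the standard definitions D = p_AB − p_A·p_B, D' = D/D_max and r² = D²/[p_A(1 − p_A)p_B(1 − p_B)]. It returns |D'|, and the class and function names are hypothetical. The usage example assumes a pair of rare SNPs (MAF 0.05) in which the fourth gamete AB was not sampled, alongside a common pair for comparison.

```python
from dataclasses import dataclass


@dataclass
class PairwiseLD:
    d: float        # D = p_AB - p_A * p_B
    d_prime: float  # |D'| = |D| / D_max
    r2: float       # r^2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B))


def ld_from_haplotype_counts(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> PairwiseLD:
    """Compute D, |D'| and r^2 for two biallelic SNPs from phased haplotype counts."""
    n = n_AB + n_Ab + n_aB + n_ab
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n
    if p_A in (0.0, 1.0) or p_B in (0.0, 1.0):
        raise ValueError("both SNPs must be polymorphic in the sample")
    d = n_AB / n - p_A * p_B
    # D_max depends on the sign of D (Lewontin's normalisation)
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    r2 = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return PairwiseLD(d=d, d_prime=abs(d) / d_max, r2=r2)


# Rare SNPs (MAF 0.05) whose fourth gamete AB was not sampled
rare = ld_from_haplotype_counts(n_AB=0, n_Ab=5, n_aB=5, n_ab=90)
# Common SNPs (MAF 0.5) with all four haplotypes present
common = ld_from_haplotype_counts(n_AB=40, n_Ab=10, n_aB=10, n_ab=40)
print(f"rare:   |D'| = {rare.d_prime:.2f}, r2 = {rare.r2:.4f}")  # |D'| = 1.00, r2 = 0.0028
print(f"common: |D'| = {common.d_prime:.2f}, r2 = {common.r2:.4f}")  # |D'| = 0.60, r2 = 0.3600
```

A single missing haplotype is enough to force |D'| to 1, while r² for the same rare pair stays close to zero.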
The decline in LD is much steeper for r² than for D'. Such differences in the pattern of D' and r² over physical distance have previously been observed in humans [38, 39]. These two widely used measures have different statistical properties. D' reflects historical recombination, which is central to defining the extent and pattern of LD across a genome, whereas r² is more useful for predicting the power of association mapping. To obtain the same power as when testing the causative quantitative trait nucleotide (QTN)/mutation directly, the sample size required for association mapping is inversely proportional to r² [13, 40, 41]. From the pattern of decline of r², the average useful LD for single-point association mapping in this population extends only up to 40 kb. This suggests that at least 75,000 SNPs are required for a whole-genome association scan. At this distance, r² between adjacent SNPs decreases to an average of ~0.3 (median r² = 0.19). At this spacing, a QTN located at the midpoint of an interval would be at most 20 kb from an adjacent SNP, which would give an average r² of 0.4 (median r² = 0.28) between SNP and QTN. However, if more stringent criteria are applied for higher-power genome scans, the number of SNPs required would be 150,158 (one SNP every 20 kb) and 300,631 (one SNP every 10 kb). These give average r² values between adjacent SNPs of 0.4 (median r² = 0.28) and 0.6 (median r² = 0.62), respectively. In addition, r² varies considerably within a given distance interval, as indicated by its large standard deviation (Table 2). The 25th percentile of r² (Table 2) indicates that only 75% of SNP pairs in the 1–10 kb distance bin have r² greater than 0.2. The 25th percentile is similarly low for the other distance bins in Table 2. The variation in r² within a distance bin is partly due to variation in LD across genomic regions. In addition, r² depends on how closely the allele frequencies of the two markers match, and it is known to take very low values between markers even at very short distances. Most studies have ignored this variation when using average r² to suggest the number of SNPs required for genome scans. Accommodating it would require genotyping more SNPs in each interval, which would increase the estimated total number of SNPs required for association mapping. However, these partial correlations can be exploited by multi-marker haplotype analysis, which has more discriminatory power than individual SNPs for detecting a putative causal mutation. Recently, McKay et al. suggested one SNP every 100 kb to obtain an average r² of 0.15–0.2 between adjacent SNPs. Gautier et al. reported an LD analysis of 526 SNPs, mostly located on one chromosome, in 14 breeds. They suggested a common panel of 300,000 SNPs (one SNP every 10 kb) for association mapping across breeds, which is similar to the requirement for a high-power within-breed panel shown here.
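The arithmetic behind these panel sizes, and the inverse relationship between sample size and r², can be sketched as follows. The sketch assumes a genome length of about 3,000 Mb, the length implied by 75,000 SNPs at 40 kb spacing. It therefore returns round numbers rather than the exact counts quoted above, which depend on the assembly length actually used. The baseline of 1,000 animals for testing the QTN directly is hypothetical, and all function names are illustrative.

```python
import math

# Assumption: ~3,000 Mb genome, the length implied by 75,000 SNPs at 40 kb spacing
GENOME_LENGTH_KB = 3_000_000


def snps_required(spacing_kb: float, genome_kb: float = GENOME_LENGTH_KB) -> int:
    """Number of evenly spaced SNPs needed to cover the genome at a given spacing."""
    return math.ceil(genome_kb / spacing_kb)


def max_qtn_distance_kb(spacing_kb: float) -> float:
    """A QTN at the midpoint of an interval lies half the spacing from the nearest SNP."""
    return spacing_kb / 2.0


def sample_size_for_marker(n_direct: int, r2: float) -> int:
    """Animals needed when testing a marker with LD r2 to the QTN, to match the power
    of testing the QTN itself with n_direct animals (n is proportional to 1 / r2)."""
    if not 0.0 < r2 <= 1.0:
        raise ValueError("r2 must lie in (0, 1]")
    return math.ceil(n_direct / r2)


for spacing in (40, 20, 10):
    print(f"{spacing} kb spacing: {snps_required(spacing):,} SNPs, "
          f"QTN at most {max_qtn_distance_kb(spacing):g} kb from a SNP")

# Hypothetical baseline: 1,000 animals would suffice if the QTN itself were genotyped
for r2 in (0.3, 0.6):
    print(f"r2 = {r2}: about {sample_size_for_marker(1_000, r2):,} animals")
```

Halving the spacing doubles the panel but, through the higher r² between SNP and QTN, reduces the number of animals needed for a given power. This is the trade-off the stringency levels above express.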
The number of SNPs required for association mapping can be reduced by excluding redundant SNPs and making optimal use of the LD information present in the population. However, the pattern of LD differs across the genome. This can be addressed more precisely by identifying tag SNPs from haplotype block structure, as was done in the human HapMap project. Khatkar et al. presented an attempt to construct a haplotype block map of the bovine genome, with concomitant identification of tag SNPs from this dataset. Because it is based on 15,036 markers, however, it currently covers only a relatively small proportion of the bovine genome. Genome-wide identification of tag SNPs would be possible from a saturated haplotype block map of the bovine genome. Until such maps become available, the extent of LD as expressed by r² can provide an interim guide to the number and spacing of SNPs across the genome.
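Removing redundant SNPs by pairwise r² can be sketched with a simple greedy algorithm. It repeatedly chooses the SNP that captures the most still-untagged SNPs at or above an r² threshold. This is a generic pairwise tagging heuristic, not the haplotype-block-based procedure used by HapMap or by Khatkar et al. The function name, the threshold of 0.8 and the toy r² matrix are assumptions for illustration.

```python
from typing import List


def greedy_tag_snps(r2: List[List[float]], threshold: float = 0.8) -> List[int]:
    """Greedily pick tag SNPs so every SNP has r^2 >= threshold with some tag.

    r2 is a symmetric matrix of pairwise r^2 values (diagonal = 1).
    """
    n = len(r2)
    untagged = set(range(n))
    tags: List[int] = []
    while untagged:
        # Score each SNP by how many still-untagged SNPs it would capture
        best = max(range(n), key=lambda i: sum(
            1 for j in untagged if i == j or r2[i][j] >= threshold))
        covered = {j for j in untagged if j == best or r2[best][j] >= threshold}
        tags.append(best)
        untagged -= covered
    return tags


# Toy matrix: SNPs 0-2 form a correlated cluster, SNP 3 is independent
toy_r2 = [
    [1.0, 0.9, 0.6, 0.1],
    [0.9, 1.0, 0.85, 0.1],
    [0.6, 0.85, 1.0, 0.1],
    [0.1, 0.1, 0.1, 1.0],
]
print(greedy_tag_snps(toy_r2))  # [1, 3]
```

SNP 1 is chosen first because it captures both of its neighbours. SNP 3 then has to be kept because nothing else tags it, so four SNPs reduce to two tags.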
The extent of LD within a genome can be affected by a number of factors. Our results confirm that MAF directly affects the estimated extent of LD. The r² between common SNPs is higher, especially for SNPs at short physical distances, whereas D' between SNPs with low MAF is higher at longer distances. Sample size also affects the estimation of LD. The results of this study clearly indicate that D' is the estimate most affected by sample size, and reliable estimates of D' appear to require a sample of 400 or more. Similar observations were also made for Dvol. The required sample size would be even larger in humans because of their larger effective population size. Hence, analyses that use the D' matrix (such as construction of LD maps and HapMap) should be based on large samples, preferably drawn from within a single breed group.
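The effect of sample size on D' can be reproduced with a small simulation. The sketch assumes two unlinked SNPs (true D' = 0) that both carry a rare allele at frequency 0.05. It draws samples of haplotypes and averages |D'| over replicates, skipping samples in which either SNP is monomorphic. The sample sizes, allele frequency, number of replicates and random seed are illustrative choices, and samples are counted in haplotypes rather than animals.

```python
import random
from typing import List, Optional, Tuple


def abs_d_prime(haplotypes: List[Tuple[int, int]]) -> Optional[float]:
    """|D'| for two biallelic SNPs coded 0/1; None if either SNP is monomorphic."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    if p_a in (0.0, 1.0) or p_b in (0.0, 1.0):
        return None
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return abs(d) / d_max


def mean_abs_d_prime(n_haplotypes: int, freq: float = 0.05,
                     reps: int = 300, seed: int = 1) -> float:
    """Average |D'| over replicate samples from two unlinked SNPs (true D' = 0)."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        # Alleles at the two loci are drawn independently, so there is no true LD
        sample = [(int(rng.random() < freq), int(rng.random() < freq))
                  for _ in range(n_haplotypes)]
        value = abs_d_prime(sample)
        if value is not None:
            values.append(value)
    return sum(values) / len(values)


for n in (50, 200, 800, 2000):
    print(f"{n:5d} haplotypes: mean |D'| = {mean_abs_d_prime(n):.2f}")
```

Under these assumptions, small samples give a mean |D'| far above the true value of 0. This is largely because the rare AB haplotype is often missing, which forces |D'| to 1. The bias shrinks as the sample grows.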
However, the identification of tag SNPs, which is generally based on r² values, can be accomplished using smaller samples. In a simulation study by Visscher, similar estimates of the correlation between r² estimates were obtained from different samples. Likewise, a sample of 50 or more animals can provide a precise estimate of MAF for common SNPs in the population (Table 3).
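The precision of MAF estimates can be approximated with the binomial standard error, SE = √(p(1 − p)/(2N)), for N diploid animals, assuming alleles are sampled at random. This textbook approximation is not the method behind Table 3, and the MAF values and sample sizes below are illustrative. The coefficient of variation (SE/MAF) shows why a given sample size yields a relatively more precise estimate for common SNPs than for rare ones.

```python
import math


def maf_standard_error(maf: float, n_animals: int) -> float:
    """Binomial SE of an allele-frequency estimate from n diploid animals (2n alleles)."""
    return math.sqrt(maf * (1.0 - maf) / (2 * n_animals))


for maf in (0.05, 0.3):
    for n in (50, 400):
        se = maf_standard_error(maf, n)
        # CV = SE / MAF: the relative error of the estimate
        print(f"MAF={maf:.2f} n={n:4d} SE={se:.4f} CV={se / maf:.2f}")
```

With 50 animals, the SE for a SNP with MAF 0.3 is about 0.046, roughly 15% of the frequency itself. For MAF 0.05 the relative error is closer to 45%.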