- Research article
- Open Access
Study of whole genome linkage disequilibrium in Nellore cattle
BMC Genomics volume 14, Article number: 305 (2013)
Knowledge of the linkage disequilibrium (LD) between markers is important to establish the number of markers necessary for association studies and genomic selection. The objective of this study was to evaluate the extent of LD in Nellore cattle using a high density SNP panel and 795 genotyped steers.
After data editing, 446,986 SNPs were used for the estimation of LD, comprising 2508.4 Mb of the genome. The mean distance between adjacent markers was 4.90 ± 2.89 kb. The minor allele frequency (MAF) was less than 0.20 in a considerable proportion of SNPs. The overall mean LD between marker pairs measured by r2 and |D'| was 0.17 and 0.52, respectively. The LD (r2) decreased with increasing physical distance between markers from 0.34 (1 kb) to 0.11 (100 kb). In contrast to this clear decrease of LD measured by r2, the changes in |D'| indicated a less pronounced decline of LD. Chromosomes BTA1, BTA27, BTA28 and BTA29 showed lower levels of LD at any distance between markers. Except for these four chromosomes, the level of LD (r2) was higher than 0.20 for markers separated by less than 20 kb. At distances < 3 kb, the level of LD was higher than 0.30. The LD (r2) between markers was higher when the MAF threshold was high (0.15), especially when the distance between markers was short.
The level of LD estimated for markers separated by less than 30 kb indicates that the High Density Bovine SNP BeadChip will likely be a suitable tool for prediction of genomic breeding values in Nellore cattle.
Background
Nellore is a beef cattle (Zebu) breed that originated in India. The first specimens of the breed arrived in Brazil at the end of the 18th century and Nellore animals rapidly became the predominant breed in the Brazilian herd. There are about 200 million head of cattle in Brazil and most of them (about 80%) are Zebu animals and their crossbreds. Over the past decades, there has been increasing interest in using genetically evaluated animals in the Zebu population. As a consequence, several genetic evaluation programs exist for Zebu breeds, particularly for Nellore cattle. The main focus of these programs is growth and conformation traits, which are used as selection criteria.
The breeding value of animals can be obtained from genomic data by marker-assisted selection covering the whole genome, also called genomic selection [4, 5]. Genomic selection exploits the linkage disequilibrium (LD) between markers, assuming that the effects of chromosome segments will be the same across the whole population because the markers are in LD with the genes responsible for expression of the trait (quantitative trait loci, QTL). Therefore, marker density should be high enough to guarantee that all QTL are in LD with a marker or with a marker haplotype. LD maps are important tools for exploring the genetic basis of economically important traits in cattle. Likewise, comparing LD maps makes it possible to assess the diversity between cattle breeds with different biological attributes and to identify genome regions that were subject to different selection pressures.
The two measures most commonly used to evaluate LD between biallelic markers are r2 and |D'| [7–10]. Both parameters vary between 0 and 1. A value of |D'| < 1 indicates the occurrence of recombination between two loci, and |D'| = 1 indicates the lack of recombination between two loci. One disadvantage of |D'| is that it tends to be strongly overestimated in small samples and in the presence of rare or low-frequency alleles. The r2 parameter represents the squared correlation between alleles at two loci and is preferred in association studies since an inverse relationship exists between r2 and the size of the sample needed for the same detection power. Linkage disequilibrium is necessary to detect associations between a QTL and a marker.
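The practical consequence of the inverse relationship between r2 and sample size can be illustrated with a commonly used approximation: if N animals would be needed to detect a QTL when the causal variant itself is genotyped, roughly N / r2 animals are needed to detect it with the same power through a marker in LD with it. The Python sketch below assumes this approximation; the function name and the example numbers are illustrative and are not taken from the study.

```python
import math


def required_sample_size(n_at_causal_variant, r2):
    """Approximate sample size needed when testing a marker instead of the QTL.

    Assumes the common approximation that the sample size scales with 1 / r2,
    where r2 is the LD between the marker and the causal variant.
    """
    if not 0.0 < r2 <= 1.0:
        raise ValueError("r2 must be in the interval (0, 1]")
    # Round up, because a fractional animal cannot be sampled.
    return math.ceil(n_at_causal_variant / r2)
```

Under this approximation, a design that would need 1,000 animals at the causal variant would need 2,000 animals at r2 = 0.5 and 4,000 animals at r2 = 0.25.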
The LD between markers has been studied in the genome of taurine breeds. In this respect, Marques et al., 2008, analyzing 505 SNPs located on chromosome 14 of Holstein cattle, reported moderate levels of LD (r2 = 0.2) for markers separated by less than 100 kb. Similar results have been reported by McKay et al., 2007, who estimated the LD (r2) between 2,670 markers in eight cattle breeds. Villa-Angulo et al., 2009 studied the genomes of 19 taurine and Zebu breeds using a set of 32,826 SNPs. The authors observed that Zebu breeds have a higher proportion of low-frequency alleles and a lower level of LD than taurine breeds. Recently, Silva et al., 2010 genotyped 25 Gyr bulls using a panel of 54,000 markers (SNPs) and obtained a mean LD (r2) between adjacent markers of 0.21.
The first step necessary to determine the number of markers required for QTL mapping and genomic selection is the quantification of the extent of LD in the cattle genome. Therefore, the objective of the present study was to evaluate LD in Nellore cattle using a high density SNP panel (Illumina High Density Bovine SNP BeadChip®).
Results and discussion
The descriptive statistics of the SNP markers and the LD (r2 and |D'|) between adjacent syntenic markers for each autosome are shown in Table 1. A total of 446,986 (57.5%) markers met the filtering criteria and were included in the final analysis. This subset of markers comprised 2,508.4 Mb of the genome, with a mean distance between markers of 4.90 ± 2.89 kb. The SNPs were uniformly distributed across all autosomes, since the marker density was similar for all chromosomes, with a mean spacing ranging from 4.9 to 5.2 kb (Table 1). The autosomes differed in size, with BTA25 being the shortest chromosome (42.8 Mb) and BTA1 the longest (158.5 Mb).
After filtering of the SNP data, a MAF < 0.20 was observed in a considerable proportion of SNPs (Figure 1). Similar results have been reported for Zebu breeds. However, the mean MAF obtained in the present study (0.25) was slightly higher than that previously reported for Nellore cattle (0.19) and that reported by the Bovine HapMap Consortium using the Illumina Bovine SNP50K BeadChip for Nellore cattle (0.20). The threshold for MAF has been reported to affect the distribution and extent of LD. Chromosomes BTA2, BTA4, BTA7, BTA15, BTA17, BTA25 and BTA26 presented a higher proportion of low-frequency alleles (MAF < 0.10), whereas chromosomes BTA6, BTA8, BTA16, BTA22 and BTA23 presented a lower proportion of such alleles.
All possible SNP pairs on the same chromosome separated by ≤ 100 kb produced 9,254,142 SNP pairs for the estimation of LD across the 29 autosomes. The overall mean LD between marker pairs measured by r2 and |D'| was 0.17 and 0.52, respectively. Silva et al., 2010 genotyped 25 Gyr sires using a panel of 54,000 markers (SNPs) and obtained a mean LD between adjacent markers measured by r2 and |D'| of 0.21 and 0.68, respectively. The present results and those reported in previous studies confirm that the |D'| parameter overestimates LD, especially in cases of low MAF.
The mean LD between adjacent SNPs across autosomes ranged from 0.003 to 0.21 for r2 and from 0.12 to 0.59 for |D'| (Table 1). Silva et al., 2010 reported slightly higher values for Gyr cattle, ranging from 0.17 to 0.24 for r2 and from 0.60 to 0.72 for |D'|. Lower levels of LD (r2 < 0.16) were estimated for chromosomes BTA1, BTA27, BTA28 and BTA29. The relatively low level of LD obtained for these chromosomes is in contrast to findings previously published for Zebu breeds [6, 14]. Autosomal recombination rates have been reported to vary widely, a fact that, among others, leads to marked diversity in the pattern of LD in different genomic regions. However, the results obtained in this study for BTA1, BTA27, BTA28 and BTA29 can probably be attributed to sampling variation, since the number of markers, marker density, mean MAF and proportion of low-MAF markers did not differ from those of the other autosomes studied.
To analyze the decline in LD according to physical distance between markers, syntenic SNP pairs were classified into intervals (bins) based on the distance between markers, and mean values of r2 and |D'| were estimated for each bin per autosome (Figures 2 and 3) and for the whole genome (Table 2). The LD decreased with increasing physical distance between markers (Table 2). In contrast to this clear decrease of LD measured by r2, the changes in |D'| indicated a less pronounced decline of LD (Figures 2 and 3). Moderate levels of r2 (0.20 to 0.34) were observed at distances < 30 kb. When the distance between markers increased from 30 to 100 kb, the mean r2 value decreased from 0.20 to 0.11. A high variability in r2 estimates was observed for marker distances of more than 10 kb. Markers showing LD (r2) higher than 0.30 and 0.15 had an average spacing of 38.9 and 41.8 kb, respectively. However, not all markers with a spacing of 40 to 50 kb presented an r2 value higher than 0.30. For distances of less than 40 kb, the proportion of markers with an r2 > 0.15 and > 0.30 ranged from 35 to 57% and from 21 to 42%, respectively. This proportion was lower than the 68.34% reported for markers spaced 0 to 0.1 Mb apart in a study that genotyped 821 sires using 5,564 SNPs and applied the same threshold (0.30) for LD (r2). In a recent study in which 810 Holstein cattle were genotyped using the Illumina Bovine SNP50K panel, 29% of the SNP pairs separated by less than 100 kb showed LD (r2) > 0.25.
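The binning step described above can be sketched in Python as follows. This is a minimal illustration, not the procedure actually used to produce Table 2: the function name, the input layout (one `(distance_kb, r2)` tuple per SNP pair) and the 10 kb bin width are assumptions, since the bin boundaries of the study are not given in the text.

```python
from collections import defaultdict


def mean_r2_by_distance_bin(pairs, bin_width_kb=10.0, max_distance_kb=100.0):
    """Average r2 of SNP pairs within fixed-width distance bins.

    pairs: iterable of (distance_kb, r2) tuples, one per syntenic SNP pair.
    Returns a dict mapping the lower bound of each non-empty bin (kb) to the
    mean r2 of the pairs falling in that bin.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    n_bins = int(max_distance_kb // bin_width_kb)
    for distance_kb, r2 in pairs:
        # Pairs farther apart than the maximum distance are ignored.
        if distance_kb < 0 or distance_kb > max_distance_kb:
            continue
        # A pair exactly at the maximum distance goes into the last bin.
        index = min(int(distance_kb // bin_width_kb), n_bins - 1)
        sums[index] += r2
        counts[index] += 1
    return {i * bin_width_kb: sums[i] / counts[i] for i in sorted(counts)}
```

The same function can be applied to |D'| values, and calling it once per autosome and once on the pooled pairs would give per-chromosome and genome-wide decay curves.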
Except for autosomes BTA1, BTA27, BTA28 and BTA29, the level of LD (r2) was higher than 0.20 for markers separated by less than 20 kb, and higher than 0.30 for markers separated by less than 3 kb. For marker distances greater than 100 kb, the level of LD (r2) decreased from 0.11 (100 kb) to 0.05 (1,000 kb) (data not shown). McKay et al., 2007 estimated the LD between all syntenic marker pairs in eight cattle breeds (Bos taurus and Bos indicus) and reported a mean LD (r2) ranging from 0.15 to 0.20 for a physical distance of 100 kb between adjacent markers.
In the present study, certain autosomes presented higher LD than others. In addition, when autosomes with low levels of LD (r2 < 0.17) were excluded (BTA1, BTA27, BTA28 and BTA29), a linear relationship was observed between chromosome length and LD (r2), i.e., the level of LD increased with increasing chromosome size. Recombination rates have been reported to decrease as the length of the chromosome increases. In contrast, a recent study found no association between chromosome size and level of LD. However, that study used a Bos taurus cattle population and a much lower marker density.
The use of SNP pairs with low allele frequencies tends to underestimate LD. Polymorphisms with high allele frequencies are thus preferred for a less biased estimation of LD. We therefore analyzed the effect of MAF on the estimates of |D'| and r2 (Figures 4 and 5). The LD (r2) between markers was higher when the MAF threshold was high (0.15), particularly when the distance between markers was short (Figure 4). Yan et al., 2009, genotyping 632 maize lines using 1,229 SNP markers, showed that the LD (r2) between markers increased with increasing MAF threshold, especially in the case of very close SNP pairs (0–10 kb). For adjacent markers (< 10 kb), the |D'| remained unchanged for different MAF thresholds (Figure 5). For more distant markers, the |D'| was lower as the MAF threshold increased. It has been reported that the LD measured by |D'| is underestimated as the MAF threshold increases (above 0.25). When LD is determined by |D'|, the denominator in the formula is a product of allele frequencies. Thus, in the case of SNP pairs with low allele frequencies, D will be divided by a small number, resulting in a large value for |D'|. The results of the present study indicate a considerable variation in the magnitude and pattern of LD in the Nellore genome. As a consequence, two markers that are very close may show a low level of LD, whereas more distant markers may show a higher level of LD than expected. This variation is probably due to different recombination rates between and within chromosomes, heterozygosity, genetic drift, and effects of selection.
The level of LD between adjacent markers (distance of less than 30–40 kb) observed in the present study was lower than that reported in other studies on Bos taurus cattle and similar to that found in studies using Bos indicus. The differences between taurine and indicine breeds decrease for markers separated by 80 to 100 kb. However, it is generally difficult to compare the level of LD obtained in different studies because of differences in sample size, measures of LD, type of markers and marker density, as well as in the recent history of the populations. Nevertheless, differences between indicine and taurine cattle that arose during the historical process of domestication and selection, and as a consequence of the effective size of the populations, seem to explain the discrepancy in LD at short distances between markers. Another reason is that Bos indicus populations present a higher proportion of low-frequency alleles on the HD SNP chip than Bos taurus populations, which, in turn, influences LD estimates [6, 24].
Conclusions
The level of LD estimated for markers separated by less than 30 kb indicates that the High Density Bovine SNP BeadChip will likely be a suitable tool for prediction of genomic breeding values in Nellore cattle. Further studies investigating the magnitude of LD in a larger sample of animals from this population are needed to confirm the estimates obtained here.
Methods
Seven hundred and ninety-five Nellore bulls born in 2008 and 2009 from 117 sires, which belonged to three Brazilian beef cattle breeding programs, were used in the present study. This research did not involve humans, and Animal Care and Use Committee approval was not obtained for this study because the data were from an existing database. Genotyping was performed by high density bead array technology using the Illumina Infinium HD Assay® and Illumina HiScan system®. The High Density Bovine SNP BeadChip contains 777,962 SNP markers spread across the genome at a mean distance of 3.43 kb between markers. The HiScan images and genotypes were first analyzed using the Genome Studio® software (Illumina). A total of 1,465 markers were excluded due to unknown genome position and 15,116 markers were monomorphic. For the purposes of the present study, only autosomal markers with minor allele frequencies (MAF) higher than 0.05, 0.10 or 0.15 were included in the LD analysis. In addition, only markers with a call rate > 0.90 and heterozygote excess < 0.30 were considered. A total of 11,785 markers were excluded because they showed low mean cluster intensity (AB_R, AA_R or BB_R: mean < 0.3).
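The MAF and call rate criteria can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions rather than the Genome Studio workflow used in the study: genotypes are assumed to be coded as 0, 1 or 2 copies of one allele with `None` for a missing call, the function name is hypothetical, and the heterozygote excess and cluster intensity filters are not implemented because their exact definitions are not given in the text.

```python
def marker_passes_qc(genotypes, maf_threshold=0.05, min_call_rate=0.90):
    """Return True if a marker passes the call rate and MAF filters.

    genotypes: list with one entry per animal, coded as 0, 1 or 2 copies of
    one allele, or None when the genotype call is missing (assumed coding).
    """
    if not genotypes:
        return False
    called = [g for g in genotypes if g is not None]
    call_rate = len(called) / len(genotypes)
    # The text requires a call rate strictly greater than the threshold.
    if call_rate <= min_call_rate:
        return False
    # Each called animal carries two alleles at an autosomal marker.
    allele_freq = sum(called) / (2 * len(called))
    maf = min(allele_freq, 1.0 - allele_freq)
    # Monomorphic markers have MAF = 0 and are rejected here as well.
    return maf > maf_threshold
```

Running the filter with `maf_threshold` set to 0.05, 0.10 and 0.15 in turn would give the three marker sets whose LD estimates are compared in the MAF analysis.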
For DNA extraction, about 5 g of longissimus dorsi muscle sample was removed and stored in a 2 ml Eppendorf tube. The tubes were labeled with the identification of each animal and then stored in styrofoam boxes in a freezer at −20°C. Next, 25 to 30 mg of muscle tissue was weighed on an aluminum sheet using an analytical balance and transferred to Eppendorf tubes (1.5 to 2 ml). DNA was extracted from the muscle samples using the DNeasy Blood & Tissue Kit (Qiagen GmbH, Hilden, Germany) according to the manufacturer’s instructions.
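The LD between two SNPs was evaluated using r2 and the absolute value of D'. The r2 was calculated as follows:
\[ r^2 = \frac{(\mathrm{freq.AB} \cdot \mathrm{freq.ab} - \mathrm{freq.Ab} \cdot \mathrm{freq.aB})^2}{\mathrm{freq.A} \cdot \mathrm{freq.a} \cdot \mathrm{freq.B} \cdot \mathrm{freq.b}} \]
and D' was calculated as:
\[ D' = \begin{cases} \dfrac{D}{\min(\mathrm{freq.A} \cdot \mathrm{freq.b},\ \mathrm{freq.a} \cdot \mathrm{freq.B})} & \text{if } D > 0 \\[2ex] \dfrac{D}{\min(\mathrm{freq.A} \cdot \mathrm{freq.B},\ \mathrm{freq.a} \cdot \mathrm{freq.b})} & \text{if } D < 0 \end{cases} \qquad \text{with } D = \mathrm{freq.AB} - \mathrm{freq.A} \cdot \mathrm{freq.B} \]
where freq.A, freq.a, freq.B and freq.b are the frequencies of alleles A, a, B and b, respectively, and freq.AB, freq.ab, freq.aB and freq.Ab are the frequencies of haplotypes AB, ab, aB and Ab in the population, respectively. If the two loci are independent, the expected frequency of haplotype AB (freq.AB) is the product of freq.A and freq.B. A freq.AB higher or lower than the expected value indicates that these two loci tend to segregate together and are in LD. The measures of LD (r2 and |D'|) were calculated for all marker pairs of each chromosome using the SnppldHD software (Sargolzaei, M., University of Guelph, Canada).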
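As a minimal sketch of how the two measures relate to the haplotype frequencies, the Python function below uses the standard definitions D = freq.AB − freq.A × freq.B, r2 = D² / (freq.A × freq.a × freq.B × freq.b) and |D'| = |D| / Dmax. It is an illustration only and is not the SnppldHD implementation; the function and variable names are assumptions.

```python
def ld_measures(f_AB, f_Ab, f_aB, f_ab):
    """Return (r2, abs_d_prime) for two biallelic loci.

    The four arguments are the population frequencies of haplotypes
    AB, Ab, aB and ab and must sum to 1.
    """
    if abs(f_AB + f_Ab + f_aB + f_ab - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    # Allele frequencies are the marginal sums of the haplotype frequencies.
    f_A, f_a = f_AB + f_Ab, f_aB + f_ab
    f_B, f_b = f_AB + f_aB, f_Ab + f_ab
    denom = f_A * f_a * f_B * f_b
    if denom == 0.0:
        raise ValueError("both loci must be polymorphic")
    # D is the deviation of freq.AB from its expectation under independence.
    d = f_AB - f_A * f_B
    r2 = d * d / denom
    if d > 0:
        d_max = min(f_A * f_b, f_a * f_B)
    elif d < 0:
        d_max = min(f_A * f_B, f_a * f_b)
    else:
        return 0.0, 0.0
    return r2, abs(d) / d_max
```

For example, with haplotype frequencies AB = 0.05, Ab = 0, aB = 0.45 and ab = 0.50, allele A is rare (0.05) and one haplotype is absent, so |D'| equals 1 while r2 is only about 0.05. This is the behavior behind the overestimation of LD by |D'| when the MAF is low.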
Only maternal haplotypes were considered for the estimation of LD measures (r2 and |D'|). The exclusive use of maternal haplotypes is a common practice in studies estimating LD when the population consists of half-sib families, as was the case here. The reason is that the pedigree structure leads to an over-representation of paternal haplotypes in the sample, since sires have multiple progeny in the dataset, which might increase the frequency of certain haplotypes and consequently lead to overestimation of LD.
SNP: Single nucleotide polymorphism
MAF: Minor allele frequency
QTL: Quantitative trait loci.
Albuquerque LG, Mercadante MEZ, Pereira EJ: Recent studies on the genetic basis for selection of Bos indicus for beef production. Proceedings of the 8th World Congress on Genetics Applied to Livestock Production: 13–18 August 2006; Belo Horizonte, Brazil.
ANUALPEC: Anuário da Pecuária Brasileira. 2011, São Paulo: Instituto FNP
Yokoo MJI, Albuquerque LG, Lôbo RB, Sainz RD, Carneiro JML, Bezerra AF, Araujo FRC: Estimation of genetic parameters for hip height, weight and scrotal circumference in Nelore cattle. Rev. Bras. Zootecn. 2007, 36: 1761-1768. 10.1590/S1516-35982007000800008.
Bennewitz J, Solberg T, Meuwissen T: Genomic breeding value estimation using nonparametric additive regression models. Genet Sel Evol. 2009, 41: 20. 10.1186/1297-9686-41-20.
Calus MPL, Roos SPW, Veerkamp RF: Estimating genomic breeding values from the QTL-MAS workshop data using a single SNP and haplotype/IBD approach. BMC Proc. 2009, 3: S01-S10.
McKay SD, Schnabel RD, Murdoch BM, Matukumalli LK, Aerts J, Coppieters W, Crews D, Dias Neto E, Gill CA, Gao C, Mannen H, Stothard P, Wang Z, Van Tassell CP, Williams JL, Taylor JF, Moore SS: Whole genome linkage disequilibrium maps in cattle. BMC Genet. 2007, 8: 74.
Hill WG, Robertson A: Linkage disequilibrium in finite populations. Theor Appl Genet. 1968, 38: 226-231. 10.1007/BF01245622.
Hill WG: Estimation of effective population size from data on linkage disequilibrium. Genet Res. 1981, 38: 209-216. 10.1017/S0016672300020553.
Valdar W, Solberg LC, Gauguier D, Burnett S, Klenerman P, Cookson WO, Taylor MS, Rawlins JN, Mott R, Flint J: Genome-wide genetic association of complex traits in heterogeneous stock mice. Nat Genet. 2006, 38: 879-887. 10.1038/ng1840.
Bohmanova J, Sargolzaei M, Schenkel F: Characteristics of linkage disequilibrium in North American Holsteins. BMC Genomics. 2010, 11: 421. 10.1186/1471-2164-11-421.
Pritchard JK, Przeworski M: Linkage disequilibrium in humans: models and data. Am J Hum Genet. 2001, 69: 1-14. 10.1086/321275.
Marques E, Schnabel R, Stothard P, Kolbehdari D, Wang Z, Taylor JF, Moore SS: High density linkage disequilibrium maps of chromosome 14 in Holstein and Angus cattle. BMC Genet. 2008, 9: 45.
Villa-Angulo R, Matukumalli LK, Gill CA, Choi J, Van Tassell CP, Grefenstette JJ: High-resolution haplotype block structure in the cattle genome. BMC Genet. 2009, 10: 19.
Silva CR, Neves HHR, Queiroz SA, Sena JAD, Pimentel ECG: Extent of linkage disequilibrium in Brazilian Gyr dairy cattle based on genotypes of AI sires for dense SNP markers. Proceedings of the 9th World Congress on Genetics Applied to Livestock Production: 01–06 August 2010; Leipzig, Germany.
Matukumalli LK, Lawley CT, Schnabel RD, Taylor JF, Allan MF, Heaton MP, O'Connell J, Moore SS, Smith TP, Sonstegard TS, Van Tassell CP: Development and characterization of a high density SNP genotyping assay for cattle. PLoS One. 2009, 4: e5350. 10.1371/journal.pone.0005350.
Gibbs RA, Taylor JF, Van Tassell CP, Barendse W, Eversole KA, Gill CA, Green RD, Hamernik DL, Kappes SM, Lien S: Genome-wide survey of SNP variation uncovers the genetic structure of cattle breeds. Science. 2009, 324: 528-532.
Khatkar MS, Nicholas FW, Collins AR, Zenger KR, Cavanagh JA, Barris W, Schnabel RD, Taylor JF, Raadsma HW: Extent of genome-wide linkage disequilibrium in Australian Holstein-Friesian cattle based on a high density SNP panel. BMC Genomics. 2008, 9: 187. 10.1186/1471-2164-9-187.
Arias JA, Keehan M, Fisher P, Coppieters W, Spelman R: A high density linkage map of the bovine genome. BMC Genet. 2009, 10: 18.
Sargolzaei M, Schenkel FS, Jansen GB, Schaeffer LR: Extent of linkage disequilibrium in Holstein cattle in North America. J Dairy Sci. 2008, 91: 2106-2117. 10.3168/jds.2007-0553.
Qanbari S, Pimentel ECG, Tetens J, Thaller G, Lichtner P, Sharifi AR, Simianer H: The pattern of linkage disequilibrium in German Holstein cattle. Anim Genet. 2010, 41: 346-356.
Reich DE, Lander ES: On the allelic spectrum of human disease. Trends Genet. 2001, 17: 502-510. 10.1016/S0168-9525(01)02410-6.
Yan J, Shah T, Warburton ML, Buckler ES, McMullen MD, Crouch J: Genetic characterization and linkage disequilibrium estimation of a global maize collection using SNP markers. PLoS One. 2009, 4: e8451. 10.1371/journal.pone.0008451.
Tenesa A, Navarro P, Hayes BJ, Duffy DL, Clarke GM, Goddard ME, Visscher PM: Recent human effective population size estimated from linkage disequilibrium. Genome Res. 2007, 17: 520-526. 10.1101/gr.6023607.
Weiss KM, Clark AG: Linkage disequilibrium and the mapping of complex human traits. Trends Genet. 2002, 18: 19-24. 10.1016/S0168-9525(01)02550-1.
This work was supported by Fundação de Amparo à Pesquisa do Estado de São Paulo. We are indebted to the Conexão Delta G®, Paint - CRV Lagoa and Nelore Qualitas for providing the cattle tissue samples.
The authors declare that they have no competing interests.
RE and FB participated in the design of the study, performed the genome studio analysis, statistical analysis and drafted the manuscript, AAB participated in the design of the study, helped with the genome studio analysis, statistical analysis and to draft the manuscript, FRPS and DFC participated in the DNA extraction, carried out the molecular analysis and helped to draft the manuscript, DMG, RLT and RE participated in the collection and preparation of the samples, HNO participated in the design of the study and to draft the manuscript, HT helped to draft the manuscript, MS participated in the design of the study, helped with LD analysis and to draft the manuscript, FSS participated in the design of the study, helped with LD analysis and to draft the manuscript, RC participated in the design of the study, helped the genome studio analysis, statistical analysis and to draft the manuscript, JAF carried out the molecular analysis and helped to draft the manuscript, LGA conceived the study and participated in its design and coordination and helped to draft the manuscript. All authors read and approved the final manuscript.