Study of whole genome linkage disequilibrium in Nellore cattle

Background: Knowledge of the linkage disequilibrium (LD) between markers is important to establish the number of markers necessary for association studies and genomic selection. The objective of this study was to evaluate the extent of LD in Nellore cattle using a high density SNP panel and 795 genotyped steers. Results: After data editing, 446,986 SNPs were used for the estimation of LD, comprising 2,508.4 Mb of the genome. The mean distance between adjacent markers was 4.90 ± 2.89 kb. The minor allele frequency (MAF) was less than 0.20 in a considerable proportion of SNPs. The overall mean LD between marker pairs measured by r² and |D'| was 0.17 and 0.52, respectively. The LD (r²) decreased with increasing physical distance between markers from 0.34 (1 kb) to 0.11 (100 kb). In contrast to this clear decrease of LD measured by r², the changes in |D'| indicated a less pronounced decline of LD. Chromosomes BTA1, BTA27, BTA28 and BTA29 showed lower levels of LD at any distance between markers. Except for these four chromosomes, the level of LD (r²) was higher than 0.20 for markers separated by less than 20 kb. At distances < 3 kb, the level of LD was higher than 0.30. The LD (r²) between markers was higher when the MAF threshold was high (0.15), especially when the distance between markers was short. Conclusions: The level of LD estimated for markers separated by less than 30 kb indicates that the High Density Bovine SNP BeadChip will likely be a suitable tool for prediction of genomic breeding values in Nellore cattle.


Background
Nellore is a beef cattle (Zebu) breed that originated in India. The first specimens of the breed arrived in Brazil at the end of the 18th century and Nellore animals rapidly became the predominant breed in the Brazilian herd [1]. There are about 200 million head of cattle in Brazil and most of them (about 80%) are Zebu animals and their crossbreds [2]. Over the past decades, there has been increasing interest in using genetically evaluated animals in the Zebu population. As a consequence, several genetic evaluation programs of Zebu breeds exist, particularly for Nellore cattle. The main focus of these programs is growth and conformation traits, which are used as selection criteria [3].
The breeding value of animals can be obtained from genomic data by marker-assisted selection covering the whole genome, also called genomic selection [4,5]. Genomic selection exploits the linkage disequilibrium (LD) between markers, assuming that the effects of chromosome segments will be the same in the whole population since the markers are in LD with genes that are responsible for expression of the trait (quantitative trait loci, QTL). Therefore, the density of markers should be sufficiently high to guarantee that all QTL are in LD with a marker or with a marker haplotype. LD maps are important tools for exploring the genetic basis of economically important traits in cattle. Likewise, comparison of LD maps makes it possible to assess the diversity between cattle breeds with different biological attributes and to identify genome regions that were subject to different selection pressures [6].
The two measures most commonly used to evaluate LD between biallelic markers are r² and |D'| [7-10]. These parameters can vary between 0 and 1. A value of |D'| < 1 indicates the occurrence of recombination between two loci, and |D'| = 1 indicates the lack of recombination between two loci. One disadvantage of |D'| is that it tends to be strongly overestimated in small samples and in the presence of rare or low-frequency alleles. The r² parameter represents the correlation between two loci and is preferred in association studies since an inverse relationship exists between r² and the size of the sample needed for the same detection power. Linkage disequilibrium is necessary to detect associations between a QTL and a marker [11].
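The inverse relationship between r² and sample size can be made concrete with a commonly used approximation, which is stated here as an assumption rather than as a result of this study: if N individuals suffice to detect a QTL when it is genotyped directly, roughly N / r² individuals are needed to detect it with the same power through a marker in LD with it.

$$
N_{\text{marker}} \approx \frac{N_{\text{QTL}}}{r^2},
\qquad
r^2 = 0.34 \;\Rightarrow\; N_{\text{marker}} \approx 2.9\,N_{\text{QTL}},
\qquad
r^2 = 0.17 \;\Rightarrow\; N_{\text{marker}} \approx 5.9\,N_{\text{QTL}}
$$

The two r² values are only illustrative; they correspond to the LD at 1 kb and to the overall mean LD reported in the Results.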
The LD between markers has been studied in the genome of taurine breeds. In this respect, the authors of [12], analyzing 505 SNPs located on chromosome 14 of Holstein cattle, reported moderate levels of LD (r² = 0.2) for markers separated by less than 100 kb. Similar results were reported in [6], where the LD (r²) between 2,670 markers was estimated in eight cattle breeds. Villa-Angulo et al., 2009 [13] studied the genomes of 19 taurine and Zebu breeds using a set of 32,826 SNPs. The authors observed that Zebu breeds have a higher proportion of low-frequency alleles and a lower level of LD than taurine breeds. Recently, Silva et al. [14] genotyped 25 Gyr bulls using a panel of 54,000 markers (SNPs) and obtained a mean LD (r²) between adjacent markers of 0.21.
The first step necessary to determine the number of markers required for QTL mapping and genomic selection is the quantification of the extent of LD in the cattle genome. Therefore, the objective of the present study was to evaluate LD in Nellore cattle using a high density SNP panel (Illumina High Density Bovine SNP BeadChip®).

Results and discussion
The descriptive statistics of the SNP markers and the LD (r² and |D'|) between syntenic adjacent markers obtained for each autosome are shown in Table 1. A total of 446,986 (57.5%) markers met the filtering criteria and were included in the final analysis. This subset of markers comprised 2,508.4 Mb of the genome, with a mean distance between markers of 4.90 ± 2.89 kb. The SNPs were uniformly distributed across all autosomes since the marker density was similar for all chromosomes, ranging from 4.9 to 5.2 kb (Table 1). The autosomes differed in size, with BTA25 being the shortest chromosome (42.8 Mb) and BTA1 the longest (158.5 Mb). After filtering of the SNP data, a MAF < 0.20 was observed in a considerable proportion of SNPs (Figure 1). Similar results have been reported by [6] and [14] for Zebu breeds. However, the mean MAF obtained in the present study (0.25) was slightly higher than that reported by [15] for Nellore cattle (0.19) and by the Bovine Hapmap Consortium using the Illumina Bovine SNP50K BeadChip for Nellore cattle (0.20) [16]. According to [17], the threshold for MAF affects the distribution and extent of LD. Chromosomes BTA2, BTA4, BTA7, BTA15, BTA17, BTA25 and BTA26 presented a higher proportion of SNPs with low-frequency minor alleles (MAF < 0.10), whereas chromosomes BTA6, BTA8, BTA16, BTA22 and BTA23 presented a lower proportion of such SNPs.
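As an illustration of how a MAF is obtained and used as a filter, the following minimal sketch computes the MAF of each SNP from genotypes and keeps the SNPs at or above a threshold. The genotype coding (0, 1 or 2 copies of one allele, `None` for a missing call), the function names and the example threshold are assumptions made for illustration; the actual filtering criteria and software used in the study are not described in this section.

```python
from typing import Dict, List, Optional, Sequence


def minor_allele_frequency(genotypes: Sequence[Optional[int]]) -> float:
    """Return the MAF of one SNP.

    Genotypes are assumed to be coded as the number of copies of one
    allele (0, 1 or 2); None marks a missing call and is ignored.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("SNP has no called genotypes")
    # Each called animal carries two alleles.
    freq = sum(called) / (2 * len(called))
    return min(freq, 1.0 - freq)


def filter_by_maf(snps: Dict[str, Sequence[Optional[int]]],
                  threshold: float) -> List[str]:
    """Return the names of the SNPs whose MAF is at least `threshold`."""
    return [name for name, genotypes in snps.items()
            if minor_allele_frequency(genotypes) >= threshold]


if __name__ == "__main__":
    # Hypothetical genotypes of three SNPs in a handful of animals.
    example = {
        "snp1": [0, 0, 1, 2],
        "snp2": [2, 2, 2, 1, None],
        "snp3": [0, 0, 0, 0],
    }
    print(filter_by_maf(example, threshold=0.15))
```

Raising the threshold removes SNPs with rare alleles, which is the mechanism behind the MAF-dependent LD estimates discussed later in this section.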
Considering all possible SNP pairs on the same chromosome separated by ≤ 100 kb yielded 9,254,142 SNP pairs for the estimation of LD across the 29 autosomes. The overall mean LD between marker pairs measured by r² and |D'| was 0.17 and 0.52, respectively. Silva et al., 2010 [14] genotyped 25 Gyr sires using a panel of 54,000 markers (SNPs) and obtained a mean LD between adjacent markers measured by r² and |D'| of 0.21 and 0.68, respectively. The present results and those reported in previous studies confirm that the |D'| parameter overestimates LD, especially in cases of low MAF.
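The enumeration of same-chromosome SNP pairs within a maximum distance can be sketched as follows. This is not the procedure implemented in the software used in the study; the function name and the example positions are hypothetical, and positions are assumed to be sorted base-pair coordinates on a single chromosome.

```python
from typing import Iterator, List, Tuple


def pairs_within(positions: List[int],
                 max_dist: int = 100_000) -> Iterator[Tuple[int, int]]:
    """Yield index pairs (i, j), i < j, of SNPs at most `max_dist` bp apart.

    `positions` must be sorted in ascending order, so the inner loop can
    stop at the first SNP that is too far away.
    """
    for i in range(len(positions)):
        j = i + 1
        while j < len(positions) and positions[j] - positions[i] <= max_dist:
            yield i, j
            j += 1


if __name__ == "__main__":
    # Hypothetical SNP positions (bp) on one chromosome.
    positions = [1_000, 5_000, 90_000, 150_000, 400_000]
    print(list(pairs_within(positions)))
```

Because the loop stops early, the work grows with the number of pairs actually kept rather than with all possible pairs on the chromosome, which matters at a marker spacing of about 5 kb.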
The mean LD between adjacent SNPs across autosomes ranged from 0.003 to 0.21 for r² and from 0.12 to 0.59 for |D'| (Table 1). Silva et al., 2010 [14] reported slightly higher values for Gyr cattle, ranging from 0.17 to 0.24 for r² and from 0.60 to 0.72 for |D'|. Lower levels of LD (r² < 0.16) were estimated for chromosomes BTA1, BTA27, BTA28 and BTA29. The relatively low level of LD obtained for these chromosomes is in contrast to findings previously published for Zebu breeds [6,14]. According to [18], there is a wide variation in autosomal recombination rates, a fact, among others, that leads to marked diversity in the pattern of LD in different genomic regions. However, the results obtained in this study for BTA1, BTA27, BTA28 and BTA29 can probably be attributed to sampling variation since the number of markers, marker density, mean MAF or proportion of MAF did not differ from the other autosomes studied.
To analyze the decline in LD according to physical distance between markers, syntenic SNP pairs were classified into intervals (bins) based on the distance between markers, and mean values of r² and |D'| were estimated for each bin per autosome (Figures 2 and 3) and for the whole genome (Table 2). The LD decreased with increasing physical distance between markers (Table 2). In contrast to this clear decrease of LD measured by r², the changes in |D'| indicated a less pronounced decline of LD (Figures 2 and 3). Moderate levels of r² (0.20 to 0.34) were observed at distances < 30 kb. When the distance between markers increased from 30 to 100 kb, the mean r² value decreased from 0.20 to 0.11. A high variability in r² estimates was observed for marker distances of more than 10 kb. Markers showing LD (r²) higher than 0.30 and 0.15 had an average spacing of 38.9 and 41.8 kb, respectively. However, not all markers with a spacing of 40 to 50 kb presented an r² value higher than 0.3. For distances of less than 40 kb, the proportion of markers with an r² > 0.15 and > 0.30 ranged from 35 to 57% and from 21 to 42%, respectively. This proportion was lower than that reported in [19] (68.34%) for markers spaced 0 to 0.1 Mb apart, in a study that genotyped 821 sires using 5,564 SNPs and applied the same LD (r²) threshold (0.30). Recently, the authors of [20] genotyped 810 Holstein cattle using the Illumina Bovine SNP50K panel and found that, for SNPs separated by less than 100 kb, the proportion of those in LD (r²) > 0.25 was 29%.
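The binning step described above can be sketched as follows. The bin edges, the function name and the example values are assumptions for illustration only; the bins actually used are those of Table 2 and Figures 2 and 3, which are not reproduced here. Each bin is taken as the half-open interval (lower, upper].

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def mean_r2_by_distance_bin(
    pairs: Iterable[Tuple[int, float]],
    edges: List[int],
) -> Dict[Tuple[int, int], float]:
    """Average r2 per distance bin.

    `pairs` holds (distance_bp, r2) for each SNP pair; `edges` are the
    ascending bin boundaries. A pair falls in bin (lower, upper] and is
    skipped if its distance lies outside the outermost edges.
    """
    sums: Dict[Tuple[int, int], float] = defaultdict(float)
    counts: Dict[Tuple[int, int], int] = defaultdict(int)
    for distance, r2 in pairs:
        k = bisect_left(edges, distance)
        if k == 0 or k == len(edges):
            continue  # at or below the first edge, or beyond the last edge
        key = (edges[k - 1], edges[k])
        sums[key] += r2
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sorted(counts)}


if __name__ == "__main__":
    # Hypothetical (distance in bp, r2) values for a few SNP pairs.
    pairs = [(500, 0.4), (1_000, 0.2), (5_000, 0.2),
             (50_000, 0.1), (200_000, 0.05)]
    for bin_range, mean_r2 in mean_r2_by_distance_bin(
            pairs, edges=[0, 1_000, 10_000, 100_000]).items():
        print(bin_range, round(mean_r2, 3))
```

The same function can be applied to |D'| values, or run once per autosome to obtain chromosome-specific decay curves.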
Except for autosomes BTA1, BTA27, BTA28 and BTA29, the level of LD (r²) was higher than 0.20 for markers separated by less than 20 kb, and higher than 0.30 for markers separated by less than 3 kb. For marker distances higher than 100 kb, the level of LD (r²) decreased from 0.11 (100
In the present study, certain autosomes presented higher LD than others. In addition, when autosomes with low levels of LD (r² < 0.17) were excluded (BTA1, BTA27, BTA28 and BTA29), a linear relationship was observed between chromosome length and LD (r²), i.e., the level of LD increased with increasing chromosome size. According to [18], recombination rates decrease as the length of the chromosome increases. In a recent study, the authors of [10] found no association between chromosome size and level of LD. However, these authors used a Bos taurus cattle population and a much lower marker density.
The use of SNP pairs with low allele frequencies tends to lead to an underestimation of LD. Polymorphisms with high allele frequencies are thus preferred for a less biased estimation of LD [21]. We therefore analyzed the effect of MAF on the estimates of |D'| and r² (Figures 4 and 5). The LD (r²) between markers was higher when the MAF threshold was high (0.15), particularly when the distance between markers was short (Figure 4). Yan et al., 2009 [22], genotyping 632 maize lines using 1,229 SNP markers, showed that the LD (r²) between markers increased with increasing MAF threshold, especially in the case of very close SNP pairs (0-10 kb). For adjacent markers (< 10 kb), the |D'| remained unchanged for different MAF thresholds (Figure 5). For more distant markers, the |D'| was lower as the MAF threshold increased. According to [10], the LD measured by |D'| is underestimated as the MAF threshold increases (above 0.25). When LD is determined by |D'|, the denominator in the formula is a product of allele frequencies. Thus, in the case of SNP pairs with low allele frequencies, D will be divided by a small number, resulting in a large value for |D'| [21]. The results of the present study indicate a considerable variation in the magnitude and pattern of LD in the Nellore genome. As a consequence, two markers that are very close may show a low level of LD, whereas more distant markers may show a higher level of LD than expected. This variation is probably due to different recombination rates between and within chromosomes, heterozygosity, genetic drift, and effects of selection [21].
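The effect of a rare allele on |D'| can be illustrated with a small worked example. The haplotype and allele frequencies below are hypothetical, and the standard definitions of D, D' and r² are assumed, with p_A and p_B denoting the frequencies of alleles A and B and f_AB the frequency of haplotype AB.

$$
D = f_{AB} - p_A\,p_B,
\qquad
D' = \frac{D}{D_{\max}},
\qquad
D_{\max} = \min\bigl(p_A(1-p_B),\ (1-p_A)\,p_B\bigr)\ \text{for } D > 0
$$

$$
p_A = 0.05,\quad p_B = 0.50,\quad f_{AB} = 0.05
\;\Rightarrow\;
D = 0.05 - 0.025 = 0.025,
\quad
D_{\max} = \min(0.025,\ 0.475) = 0.025,
\quad
|D'| = 1
$$

$$
r^2 = \frac{D^2}{p_A(1-p_A)\,p_B(1-p_B)} = \frac{0.025^2}{0.05 \times 0.95 \times 0.5 \times 0.5} \approx 0.053
$$

In this example the rare allele A occurs only on haplotypes carrying B, so |D'| reaches its maximum of 1 even though the correlation between the two loci is weak (r² ≈ 0.05).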
The level of LD between adjacent markers (distance of less than 30-40 kb) observed in the present study was lower than that reported in other studies on Bos taurus cattle and similar to that found in studies using Bos indicus. The differences between taurine and indicine breeds decrease for markers separated by 80 to 100 kb. However, it is generally difficult to compare the level of LD obtained in different studies because of differences in sample size, measures of LD, type of markers and marker density, as well as because of the recent history of the population [11]. Nevertheless, differences between indicine and taurine cattle that arose during the historical process of domestication and selection, and as a consequence of the effective size of populations, seem to explain the discrepancy in LD at short distances between markers [23]. Another reason is the fact that Bos indicus populations present a higher proportion of low-frequency alleles in the HD SNP chip than Bos taurus populations which, in turn, influences LD estimates [6,24].

Conclusions
The level of LD estimated for markers separated by less than 30 kb indicates that the High Density Bovine SNP BeadChip will likely be a suitable tool for prediction of genomic breeding values in Nellore cattle. Further studies investigating the magnitude of LD in a larger sample of animals from this population are needed to confirm the estimates obtained here.

Methods
For DNA extraction, about 5 g of longissimus dorsi muscle sample was removed and stored in a 2 ml Eppendorf tube. The tubes were labeled with the identification of each animal and then stored in styrofoam boxes in a freezer at −20°C. Next, 25 to 30 mg of each muscle tissue specimen was weighed on an aluminum sheet using an analytical balance and transferred to Eppendorf tubes (1.5 to 2 ml). DNA was extracted from the muscle samples using the DNeasy Blood & Tissue Kit (Qiagen GmbH, Hilden, Germany) according to the manufacturer's instructions.
The LD between two SNPs was evaluated using r² and the absolute value of D'. The r² was calculated as follows:

$$
r^2 = \frac{D^2}{\text{freq. }A\,(1-\text{freq. }A)\;\text{freq. }B\,(1-\text{freq. }B)},
\qquad
D = \text{freq. }AB - \text{freq. }A \times \text{freq. }B
$$

where freq. A and freq. B are the frequencies of alleles A and B at the two loci and freq. AB is the frequency of haplotype AB in the population. If the two loci are independent, the expected frequency of haplotype AB (freq. AB) is calculated as the product between freq. A and freq. B. A freq. AB higher or lower than the expected value indicates that these two loci in particular tend to segregate together and are in LD. The measures of LD (r² and |D'|) were calculated for all marker pairs of each chromosome using the SnppldHD software (Sargolzaei, M., University of Guelph, Canada). Only maternal haplotypes were considered for the estimation of LD measures (r² and |D'|). The exclusive use of maternal haplotypes is a common practice in studies estimating LD when the population consists of half-sib families, as was the case here. The reason is that the pedigree structure leads to the over-representation of paternal haplotypes in the sample since sires have multiple progeny in the dataset, which might increase the frequency of certain haplotypes and consequently lead to an overestimation of LD [21].
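The two LD measures can be sketched in a few lines of code from the counts of the four haplotypes at a pair of biallelic loci. This is a minimal illustration using the standard definitions of D, r² and |D'|; it is not the SnppldHD implementation, and the function name and the lowercase labels a and b for the alternative alleles are assumptions.

```python
from typing import Tuple


def ld_from_haplotype_counts(n_AB: int, n_Ab: int,
                             n_aB: int, n_ab: int) -> Tuple[float, float]:
    """Return (r2, |D'|) for two biallelic loci from haplotype counts."""
    n = n_AB + n_Ab + n_aB + n_ab
    if n == 0:
        raise ValueError("no haplotypes supplied")
    freq_AB = n_AB / n
    freq_A = (n_AB + n_Ab) / n
    freq_B = (n_AB + n_aB) / n

    # Deviation of the observed AB frequency from its expectation
    # under independence of the two loci.
    d = freq_AB - freq_A * freq_B

    denom = freq_A * (1 - freq_A) * freq_B * (1 - freq_B)
    if denom == 0:
        raise ValueError("at least one locus is monomorphic")
    r2 = d * d / denom

    # D' rescales D by the largest value it could take given the
    # allele frequencies; the bound depends on the sign of D.
    if d >= 0:
        d_max = min(freq_A * (1 - freq_B), (1 - freq_A) * freq_B)
    else:
        d_max = min(freq_A * freq_B, (1 - freq_A) * (1 - freq_B))
    d_prime = abs(d) / d_max
    return r2, d_prime


if __name__ == "__main__":
    # Hypothetical counts among 100 maternal haplotypes.
    print(ld_from_haplotype_counts(50, 0, 0, 50))    # complete LD
    print(ld_from_haplotype_counts(25, 25, 25, 25))  # independent loci
```

In the setting described above, the counts would be taken from the maternal haplotypes only, so that over-represented sire haplotypes do not enter the frequencies.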

Competing interests
The authors declare that they have no competing interests.
Authors' contributions
RE and FB participated in the design of the study, performed the genome studio analysis and statistical analysis, and drafted the manuscript; AAB participated in the design of the study, helped with the genome studio analysis and statistical analysis, and helped to draft the manuscript; FRPS and DFC participated in the DNA extraction, carried out the molecular analysis and helped to draft the manuscript; DMG, RLT and RE participated in the collection and preparation of the samples; HNO participated in the design of the study and helped to draft the manuscript; HT helped to draft the manuscript; MS participated in the design of the study, helped with the LD analysis and helped to draft the manuscript; FSS participated in the design of the study, helped with the LD analysis and helped to draft the manuscript; RC participated in the design of the study, helped with the genome studio analysis and statistical analysis, and helped to draft the manuscript; JAF carried out the molecular analysis and helped to draft the manuscript; LGA conceived the study, participated in its design and coordination, and helped to draft the manuscript. All authors read and approved the final manuscript.