Genome wide association studies for body conformation traits in the Chinese Holstein cattle population

Background Genome-wide association study (GWAS) is a powerful tool for revealing the genetic basis of quantitative traits. However, studies using GWAS for conformation traits of cattle is comparatively less. This study aims to use GWAS to find the candidates genes for body conformation traits. Results The Illumina BovineSNP50 BeadChip was used to identify single nucleotide polymorphisms (SNPs) that are associated with body conformation traits. A least absolute shrinkage and selection operator (LASSO) was applied to detect multiple SNPs simultaneously for 29 body conformation traits with 1,314 Chinese Holstein cattle and 52,166 SNPs. Totally, 59 genome-wide significant SNPs associated with 26 conformation traits were detected by genome-wide association analysis; five SNPs were within previously reported QTL regions (Animal Quantitative Trait Loci (QTL) database) and 11 were very close to the reported SNPs. Twenty-two SNPs were located within annotated gene regions, while the remainder were 0.6–826 kb away from known genes. Some of the genes had clear biological functions related to conformation traits. By combining information about the previously reported QTL regions and the biological functions of the genes, we identified DARC, GAS1, MTPN, HTR2A, ZNF521, PDIA6, and TMEM130 as the most promising candidate genes for capacity and body depth, chest width, foot angle, angularity, rear leg side view, teat length, and animal size traits, respectively. We also found four SNPs that affected four pairs of traits, and the genetic correlation between each pair of traits ranged from 0.35 to 0.86, suggesting that these SNPs may have a pleiotropic effect on each pair of traits. Conclusions A total of 59 significant SNPs associated with 26 conformation traits were identified in the Chinese Holstein population. Six promising candidate genes were suggested, and four SNPs showed genetic correlation for four pairs of traits.


Background
Since the 1990s, body conformation traits have been used in dairy cattle breeding programs in many countries. Although these traits themselves are not of economic interest to breeders, they are closely related to many economic traits, such as the health, productivity, and lifetime of cattle. Vollema et al. [1] reported that some conformation traits such as body depth, rump angle, rump width, and udder depth were useful predictors of lifetime and longevity in Dutch dairy bull populations because of the genetic correlation between them. Lund et al. [2] showed that genetic correlations between health and type traits were generally moderate (−0.32 to 0.37) and that selection for improved udder conformation reduced the somatic cell count and the occurrence of clinical mastitis. Short and Lawlor [3] found that genetic correlations between linear type traits and first lactation yield ranged from 0.48 to 0.54. Pozveh et al. reported that body depth had genetic correlations with many other economic traits, such as the days from calving to first-insemination (0.79), calving interval (0. 35), and gestation length (0.34). Stature was also genetically correlated with gestation length (0.49) [4]. Therefore, quantitative trait loci (QTLs) associated with body conformation traits are economically as important as other economic traits.
With the availability of a high-density chip with single nucleotide polymorphisms (SNPs) for bovine, genomewide association study (GWAS) has become a useful tool for fine-scale QTL mapping. This approach has been widely applied to causative mutation detection in human [5,6], mouse [7] and cattle [8,9]. By using very large numbers of SNPs researcher can easily detect statistical associations between SNPs and phenotypes, and thus biologically meaningful candidate genes close to the significant SNPs are identified for further study. This procedure greatly narrows down the regions of the genome that contain the causative mutations. The associations can provide direct and necessary evidence for the function of a gene.
In this research, we performed a genome wide association study for 29 conformation traits in a Chinese Holstein population, which included 1314 Chinese Holstein cattle and 52,166 SNPs. A LASSO-like multiple-SNP method was applied to identify multiple SNPs simultaneously. The genes closest to the significant SNPs (within a 1 Mb region) were annotated.

Methods
Blood samples were collected from Chinese Holstein cattle when the regular quarantine inspection of the farms was conducted. The procedure for collecting the blood samples was carried out in strict accordance with the protocol approved by the Animal Welfare Committee of China Agricultural University (Permit Number: DK996).

Phenotype and genotype data
The Chinese Holstein population in this study comprised 1314 Chinese Holstein cows, the daughters of 22 sires. All the cows were from 22 dairy cattle farms in the Beijing Dairy Cattle Center and the Beijing Sanyuan Lvhe Dairy Farming Center where regular and standard performance tests, including measurement of conformation traits, have been carried out since 1999 as part of the Dairy Herd Improvement (DHI) system. According to the linear classification system defined by Dairy Data Center of China, Dairy Association of China (DAC) [25], 21 linear type traits were scored from 1 to 9, and eight composite traits were measured using an index with values and scored from 0 to 100. The 21 traits were animal size, stature, height at front end, chest width, body depth, loin strength, rump width, rump angle, bone quality, foot angle, rear legs side view, udder depth, udder texture, median suspensory, fore udder attachment, front teat placement, teat length, rear attachment height, rear attachment width, rear teat placement, and angularity. The eight function score traits were conformation (final score), dairy character, capacity, rump, feet and legs, fore udder, rear udder, and mammary system. Calculation of the scores for the eight composite traits was based on linear score, weights, and defective traits. The phenotypic values of the 21 conformation traits in the first lactation of the cows were measured by the Beijing Dairy Cattle Center [26] and then the genetic parameters of all 29 traits were estimated [27]. The estimated breeding values (EBVs) were calculated with a multiple-trait random regression test-day model using the RunGE software purchased from Canadian Dairy Network [28] by the Dairy Data Center of China. The descriptive statistics of the EBVs for the 29 traits in the 1314 individuals are listed in Table 1. The genetic correlations between each pair of traits were also calculated (see Additional file 1 for details). To conveniently generalize the results, the correlation coefficients (a total of 29*28/2 = 406 pair-wise values) were classified into five levels, -1 to −0.66, -0.66 to −0.33, -0.33 to 0.33, 0.33 to 0.66, and 0.66 to 1, and were defined as high-level negative correlation, medium-level negative correlation, weak-level correlation, medium-level positive correlation, high-level positive correlation, respectively ( Table 2). The results show that 15 of 406 pairs of traits (7.2%) have highlevel positive correlations, while most of the pairs have weak correlations (75.2%).
The animals were genotyped using the Illumina Bovi-neSNP50 BeadChip (Illumina Inc., San Diego, CA, USA). Some individuals were genotyped using the Illumina 54 K chip version1 containing 54,001 SNPs, while others were genotyped using the 54 K chip version 2 containing 54,609 SNPs. Genotype imputation was conducted for all the genotyped individuals using the Beagle software, version 3.1.0 [29,30]. After imputation, there were 56,270 SNPs in the marker data. SNPs were excluded from the analysis if the minor allele frequency (MAF) was less than 1%, the call rate was less than 90%, or the genotype frequency deviated from Hardy-Weinberg Equilibrium (HWE) with a P-value lower than 10 -6 . After quality control, 1314 individuals with 52,166 SNPs remained for the GWAS. After editing, the average distance between adjacent SNPs on the genome was 59.59 kb, and the median distance was 49.00 kb. Finally, the association analysis was conducted between each trait and 52,166 SNPs on 29 autosomes and X chromosome in the bovine genome.

Statistical analyses
Statistical tests of SNP effects were conducted using the expectation maximization algorithm based on an improved  least absolute shrinkage and selection operator (LASSO) [31] method. This method simultaneously estimates multiple SNP effects and shrinks the effects of zero-effect SNPs towards zero, and thus avoids complex model selection (Fang et al. 2013, unpublished).
The GWAS was carried out in two steps. First, single trait mixed model analysis (SMMA) was applied to estimate the effect of each SNP. Then, the first 500 markers (why 500 markers were used is explained in the Discussion section) with the lowest P-values were selected for the multiple-SNP analysis.
The linear model that was used to estimate the effect of the jth SNP can be expressed as: where y is the vector of EBVs, 1 is the vector with its elements of 1, μ is the population mean; x j is the vector of the genotype of the jth SNP marker, which is assigned to −1, 0, and 1 for genotypes AA, AB and BB, respectively, and β j is the SNP effect; g is the vector of polygenic effects, and Z is the design matrix related to the polygenic effect; e is the vector of random residuals. It is assumed that geN 0; Aσ 2 g and eeN 0; Iσ 2 e À Á , where A is the additive genetic relationship matrix based on pedigree, σ 2 g is the variance of polygenic effect, I is an identity matrix, and σ 2 e is the residual variance. When a single-SNP mixed model was applied, the computational time was extremely large because of the iterative calculation of variance components (Best Linear Unbiased Prediction). Therefore, we first approximately calculated the variance components without considering a QTL effect and then imposed the estimates of variance components on the mixed model equation, which does not need an iterative calculation and thus reduces computational time.
For the SMMA analysis, the significance of the SNP effect was tested using a t-test with null hypothesis of β= 0, and the Bonferroni correction was applied to control the false positives. So, the threshold for significant associations was -log 10 (0.05/N), where N is the number of SNP loci tested in the analysis.
The model to estimated effects of the SNPs selected from the first step can be expressed as: where X is the matrix of genotype covariables of the 500 SNPs, and β is the vector of SNP effects. An expectation-maximization algorithm was adopted to estimate the model parameters. The method assigned an [32], where the hyper-parameter λ 2 j =2 is assigned a conjugate gamma prior with gamma (a,b), where a and b are very small values, and both a and b are taken as 10 -6 . The prior of the residual polygenic effect follows the normal distribution g σ 2 g eN 0; Aσ 2 g , where σ 2 g is the residual polygenic variance and A is the additive genetic relationship matrix. The expectation-maximization algorithm was applied to estimate SNP effects β j by finding the maximum posterior mode, which treats the polygenic effect (g) as a missing variable (see Additional file 2 for details).
The threshold value for declaring the significance of the SNP was determined from the empirical distribution of the heritability of SNP j (the SNP with the largest heritability across the genome for each permutation), , derived from 1,000 permutations, j is the variance of the jth SNP, and p j is the allele frequency of the SNP. Here, heritability was used to measure the strength of each SNP, which is fairer than using the SNP effect, because the allele frequency of each SNP is different.

Identification of SNP locations and gene annotation
The locations of significant SNPs were reported based on the UMD3.1 assembly of bovine genome sequence (assembled by the Center for Bioinformatics and Computational Biology (CBCB) at University of Maryland). The genes that were closest to the significant SNPs (within 1 Mb) were determined by the National Animal Genome Research Program [33] and the National Center for Biotechnology Information [34]).

Results
A total of 59 genome-wise significant SNPs associated with 26 out of the 29 conformation traits were found by our improved LASSO method. Twenty-two of the SNPs were located within 22 known genes regions. We identified the 26 conformation traits into six trait group, and investigated the significant SNPs associated with each of these traits as described below.

Dairy character traits
Three and two SNPs were associated with dairy character and angularity respectively (Table 3). Among them, dairy character and angularity shared one common SNP, which was located 45 kb away from SLC25A24 on Bos taurus chromosome 3 (BTA3). For dairy character, one SNP was located within SCEL on BTA12 and the other SNP was 14 kb away from SPATA17. For angularity, the other SNP was 261 kb away from HTR2A.

Capacity traits
For body depth, height at front end, and animal size, each trait was associated with one significant SNP; for stature and loin strength, each trait was associated with two SNPs; and for chest width and capacity, each trait was associated with five SNPs (Table 4). Among them, the SNP on BTA3 was 7 kb away from DARC and was associated with both body depth and capacity; and the SNP on BTA25 was 9 kb away from TMEM130, and was associated with both body depth and animal size. The SNPs at 39 Mb on BTA9, 115 Mb on BTA6, 35 Mb on BTA15, 53 Mb on BTA12, and 10 Mb on BTA 18 were associated with capacity, stature, loin strength, height at front end, and chest width, respectively, and all of them were located in regions of the chromosomes that contained known genes. The remaining SNPs were at distances of 3 kb to 19 kb from the nearest known genes.

Rump traits
Eleven significant SNPs on different chromosomes were associated with rump traits (Table 5). Two and three of these SNPs were associated with rump and rump angle, respectively, and all of them were located within regions of the chromosomes that contained known genes. The remaining significant SNPs were at distances of 48 kb to 826 kb from the nearest known genes.

Feet and legs traits
Twelve significant SNPs were detected for feet and legs traits ( Table 6). Three of these SNPs, for feet and legs, foot angle, and rear leg side view, were located within DHX35 on BTA13, PLEKHB2 on BTA2, and DOCK10 on BTA2, respectively. Two SNPs on BTA3 and BTA27 for feet and legs, two SNPs on BTA1 and BTA15 for bone quality, three SNPs on BTA3, BTA4, and BTA22 for foot angle, two SNPs on BTA14 and BTA 24 for rear leg side view were located at distances of 3 kb to 420 kb from the nearest known genes.

Mammary system traits
A total of 17 significant SNPs were detected for mammary system traits (Table 7). Of these SNPs, one associated with rear udder was located within LOC100337279 on BTA14; two associated with udder texture were within LOC100295233 and DRG1 on BTA3 and BTA7, respectively; two associated with median suspensory fell were within LRP2 and MACROD2 on BTA2 and BTA13, respectively; one associated with fore teat placement was located within SLC39A11 on BTA19; and one associated Note: Heritability and threshold were obtained using the LASSO method; -log 10 (P) was calculated using SMMA. a SNP detected by SMMA only; b -log 10 (P) obtained from SMMA; c threshold of SMMA. Nearest gene are symbols of gene full name in the NCBI database (http://www.ncbi.nlm.nih.gov/). Note: Heritability and threshold were obtained using the LASSO method; -log 10 (P) was calculated using SMMA. a SNP detected by SMMA only; b -log 10 (P) obtained from SMMA; c threshold of SMMA. Nearest gene are symbols of gene full name in the NCBI database (http://www.ncbi.nlm.nih.gov/).
with rear teat placement was located within SH3RF3 on BTA11. The other 10 SNPs were located at distances of 960 bp to 448 kb from the nearest known genes.

Final conformation score
A SNPs on BTA5 (Table 3) was found to be associated with final conformation score, and this SNP was harbored within ANKRD54, which encodes an ankyrin repeat domain-containing protein.
The estimated heritability for 29 conformation traits obtained using improved LASSO was plotted and the figures are available in Additional file 3.
The results obtained using SMMA are also listed in Tables 6, 7 and 8. Only 11 significant SNPs were detected and eight of them were significantly associated with rump width. The other three SNPs were associated with rear legs side view, median suspensory, and feet and legs.
When we compared our results with those of Cole et al. [8] and Bolormaa et al. [21], we found that none of our  significant SNPs were the same as the SNPs reported by Cole et al. [8] or Bolormaa et al. [21]. However, some of our SNPs were close to the significant SNPs reported by Cole et al. [8] that were associated with different traits (Table 8).

Discussion
In this study, we performed a GWAS for 29 conformation traits in a population of Chinese Holstein cows. A two-step strategy was applied to estimate SNP effect, and first we selected 500 SNPs using SMMA. We originally planned to select SNPs with P-values < 0.01 (−log 10 (P) > 2), and we found that about 500 SNPs met this condition for the 29 traits (the -log 10 (P) values at the 500th marker were sorted into descending order for the 29 traits and ranged from 2.089 to 2.421). Therefore, we decided to use the top 500 SNPs for the multiple QTL analysis. In other words, the selected 500 SNPs include nearly all the SNPs with P-values < 0.01. We found five SNPs located within previously reported QTL regions that were associated with conformationrelated traits. The SNP on BTA12 associated with angularity is 261 kb away from HTR2A and is located within a QTL region that has been reported by Schrooten et al. [22] to be associated with angularity. The SNP on BTA29 associated with stature is 81 kb away from LOC782090 and is within a large QTL region that has been found to significantly affect Angus body height at maturity [35]. The SNP on BTA24 associated with rear leg side view is near ZNF521 and is within a QTL region that has been reported to have a significant effect on dairy cattle rear leg set [22]. The SNP on BTA10 associated with teat length is near PDIA6 and is located within a QTL region that has been shown to have a significant effect on teat length [36]. And, the SNP on BTA25 associated with animal size is near TMEM130 and is within a QTL region that has been reported to affecting calf size in Danish Holstein cattle [37]. Besides, most of significant SNPs that we detected in this study are located within QTL regions that have been reported previously to affect production, longevity, and reproduction traits in dairy cattle [21,35,36,38,39].
We also found several SNPs located within genes that are known to have functions related to the development and metabolism of animal tissues. The SNP (Hap-map40339-BTA-117016; Table 4) on BTA3 which was associated with both capacity and body depth is 7 kb away from the gene, Duffy blood group, chemokine receptor (DARC). Hai et al. [40] performed a bivariate GWAS in human to identify the SNPs associated with lean body mass and age at menarche and reported that DARC may play an important role in regulating the metabolisms of both these traits. The SNP (BTA-110160no-rs; Table 4) on BTA8 associated with chest width is 121 kb away from the growth arrest specific 1 (GAS1) gene. GAS1 is highly expressed in quiescent mammalian cells and its over-expression in normal and some cancer cell lines was reported to inhibit G0/G1 transition [41]. It was found that GAS1 was expressed by chondrocytes after the cartilage started to differentiate [41]. The SNP on BTA4 associated with foot angle is 37 kb away from the myotrophin (MTPN) gene, which plays an important role in cell and skeletal muscle growth [42]. These genes are suggested as functional candidate genes for body conformation traits.
Generally, different SNPs are associated with different traits, but some SNPs have been found to affect more than one trait. In our study, SNP Hapmap40339-BTA-117016 (Table 4) was associated with both capacity and body depth, SNP ARS-BFGL-NGS-115067 (Table 4) was associated with both capacity and animal size, SNP ARS-BFGL-NGS-14022 (Table 3) was associated with both dairy character and angularity, and SNP BTB-01238380 (Tables 3 and 4) was associated with both dairy character and height at front end. The genetic correlation between each of these pairs of genes was 0.51, 0.77, 0.86, and 0.35, which suggested that these four SNPs likely contribute to genetic correlation and perhaps have a pleiotropic effect on each pair of traits.