Genome wide association study and genomic prediction for fatty acid composition in Chinese Simmental beef cattle using high density SNP array

Background Fatty acid composition of muscle is an important trait contributing to meat quality. Recently, genome-wide association study (GWAS) has been extensively used to explore the molecular mechanism underlying important traits in cattle. In this study, we performed GWAS using high density SNP array to analyze the association between SNPs and fatty acids and evaluated the accuracy of genomic prediction for fatty acids in Chinese Simmental cattle. Results Using the BayesB method, we identified 35 and 7 regions in Chinese Simmental cattle that displayed significant associations with individual fatty acids and fatty acid groups, respectively. We further obtained several candidate genes which may be involved in fatty acid biosynthesis including elongation of very long chain fatty acids protein 5 (ELOVL5), fatty acid synthase (FASN), caspase 2 (CASP2) and thyroglobulin (TG). Specifically, we obtained strong evidence of association signals for one SNP located at 51.3 Mb for FASN using Genome-wide Rapid Association Mixed Model and Regression-Genomic Control (GRAMMAR-GC) approaches. Also, region-based association test identified multiple SNPs within FASN and ELOVL5 for C14:0. In addition, our result revealed that the effectiveness of genomic prediction for fatty acid composition using BayesB was slightly superior over GBLUP in Chinese Simmental cattle. Conclusions We identified several significantly associated regions and loci which can be considered as potential candidate markers for genomics-assisted breeding programs. Using multiple methods, our results revealed that FASN and ELOVL5 are associated with fatty acids with strong evidence. Our finding also suggested that it is feasible to perform genomic selection for fatty acids in Chinese Simmental cattle. Electronic supplementary material The online version of this article (doi:10.1186/s12864-017-3847-7) contains supplementary material, which is available to authorized users.


Background
Fatty acids are required by daily normal metabolism, and can be obtained from food and meat. Improving nutritional value of meat products for human health has attracted extensive attention in current society [1,2]. Fat content and fatty acid composition in beef products are associated with meat taste and flavor, and these are considered as main sensory properties in consumer's selection and acceptance [3].
Fatty acids are important indicators of beef meat quality, and previous studies have been conducted to examine fatty acids for various cattle breeds in different feeding environments [4]. Fatty acid composition are often lowly or moderately heritable traits in various populations with different genetic architecture [5]. Several studies have revealed that the level of heritability and genetic correlation theoretically allow for genetic improvement of fatty acid composition by selection of both major candidate genes and genomic selection strategies [6][7][8][9][10]. Therefore, application of molecular genetics approaches can provide an opportunity for genetic improvement for fatty acid composition of beef cattle [11][12][13].
During the last decades, tremendous works have been done to elucidate the genetic mechanism of fatty acids using candidate gene [14][15][16][17][18][19][20][21] and linkage mapping approaches [22][23][24]. In recent years, genome-wide association studies (GWAS) have been widely used to study the molecular mechanism underlying important traits in beef and dairy cattle [25][26][27][28][29]. Previous GWAS and genomic predictions have identified candidate markers associated with various fatty acid composition and evaluated the accuracy of genomic prediction for these traits [8,[30][31][32][33][34]. However, many studies were conducted in populations with relatively low density SNP arrays. Despite recent GWAS for fatty acid composition have been investigated using the high density BovineHD (770 K) SNP array, those studies were mostly limited in Nellore cattle [35,36]. On the other hand, extensive attention has been paid to investigate the accuracies of genomic prediction using multiple methods in different populations [32,33]. A recent study in Nellore cattle indicated that the accuracies of genomic prediction were moderate to high and it was feasible to apply genomic selection in cattle. However, their results were limited to carcass traits in Nellore population [37]. Therefore, understanding the molecular mechanisms underlying fatty acid composition and evaluating the accuracy of genomic predictions in other important cattle breeds still need further investigation.
The objectives of the current study were to explore the associated genomic regions and estimate the predictive accuracies for fatty acid composition using the BovineHD SNP array in Chinese Simmental population. In this study, we identified several potential candidate markers and genes associated with fatty acid composition. Our findings will facilitate the elucidation of the molecular mechanism and help us design optimal genomic selection strategies for fatty acid composition in cattle and other farm animals.

Bayesian based GWAS and candidate regions
We performed GWAS using the BayesB method for 11 individual fatty acids that showed estimated genomic heritability ≥ 0.10, including 3 saturated fatty acids (C14:0, C18:0 and C20:0), 3 monounsaturated fatty acids (C14:1 cis-9, C16:1 cis-9 and C18:1 cis-9) and 5 polyunsaturated fatty acids (C18:2 n-6, C18:3 n-3, C20:3 n-3, C20:5 n-3 and C22:5 n-3). To identify potential regions associated with fatty acids, we divided the genome into 100 kb windows, leading to 24,900 regions across the genome. Regions that explain more than 1% of additive genetic variances were considered as candidates and subject to further analyses to identify the associated genes within these regions. Summary statistics including genetic variances explained, position for the 100 kb windows, flanking rs number ID for these regions, and candidate genes of saturated fatty acid, monounsaturated fatty acid and polyunsaturated fatty acids were represented in Tables 2 and 3, respectively.

Saturated fatty acids
We detected a total of 16 candidate regions that explain more than 1% of genetic variance for saturated fatty acids. These regions were distributed on BTA2, BTA4, BTA6, BTA7, BTA12, BTA14, BTA15, BTA19, BTA22, BTA23 and BTA25 (Table 2). Among them, we found four, three and nine candidate regions for C14:0, C18:0 and C20:0, respectively (Additional file 2). Intriguingly, the detected window with the largest genetic variance (10.04%) near 51.3 Mb on BTA19 for C14:0, containing gene fatty acid synthase (FASN) that is related to fatty acid synthesis. We also found one region explaining about 1.46% of genetic variance for C14:0 and located at 25.1 Mb on BTA23. This region overlapped with the The concentrations of fatty acids were expressed as a percentage of total fatty acid methyl esters quantified elongation of very long chain fatty acids protein 5 (ELOVL5) whose function involves in fatty acid elongase activity (Fig. 2). In addition, we identified 10 candidate genes that are likely to be related to fatty acids composition embedded in these candidate regions ( Table 2).

Monounsaturated fatty acids
For monounsaturated fatty acids, we detected 8 candidate regions, each of which captured more than 1% of the total genetic variance across six chromosomes (Additional file 3). Notably, we detected the same region   Fig. 3), and this region was also associated with C14:0. The top associated region was detected near 30.3 Mb on BTA14, which explains 19.44% of genetic variance for C18:1 cis-9 ( Fig. 4), while no known gene was observed near this region. Overall, there are three, two and three regions identified to be associated with C14:1 cis-9, C16:1 cis-9 and C18:1 cis-9, respectively (Table 3).

Fatty acid groups
To systematically explore the genetic mechanism underlying fatty acid composition beyond individual fatty acid, we also conducted GWAS using BayesB method on eight fatty acid groups, including SFA, MUFA, PUFA, PUFA/SFA, n-3, n-6, n-6/n-3, and HI (Additional file 5). We found three associated regions for HI located at BTA4, BTA10 and BTA15, two regions for MUFA at BTA12 and BTA14, one for n-3 at BTA4 and one for n-6/n-3 at BTA20 (Table 4). Intriguingly, we observed two regions accounting for 10% of the genetic variance for MUFA, while two regions located at BTA4 and BTA20 accounting for 1.25% and 4.66% of genetic variance for n-3 and n-6/n-3, respectively. One region located at BTA12 overlapped with two genes claudin 10 (CLDN10) and DAZ interacting zinc finger protein 1 (DZIP1). For health index (HI), the genetic variances contributed by three identified regions were 1.52%, 1.19% and 2.03%, which located at BTA4, BTA10 and BTA15, respectively. These regions overlapped with sarcoglycan epsilon (SGCE), paternally expressed 10 (PEG10) and DEAD-box helicase 10 (DDX10).

Region-based association test and LD analyses
To explore potential associated loci which might fail to be identified due to the strict threshold for high density SNPs, we investigated two 100 kb associated regions on BTA19 and BTA23 (BTA19:51.3-51.4 Mb and BTA23: 25.1-25.2 Mb) using region-based association tests implemented in R package FREGAT. The two regions contain two candidate genes FASN and EVOL5 involved in fatty acid synthesis. For the region at 51.3 Mb on BTA19, we found 19 SNPs showing significant association with C14:0 (P < 0.01), and among them, five SNPs were identified within FASN, and one SNP near FASN with the strongest association signal (P = 5.17E-10). We found that the P value for region-based test for FASN was 0.0048, which indicated that FASN may be considered as a candidate gene for C14:0. Moreover, the LD and haploblock analyses revealed that this region showed high LD level with multiple haploblocks, which may imply a potential selection signature involved in fatty acids within this region in Chinese Simmental cattle population (Fig. 5a). For region at 25.1 Mb on BTA23, two SNPs were detected with P < 0.01, and the top SNP (BovineHD2300006955) was detected at 25.1 Mb showing significant association (P = 3.1E-5). This region partly overlapped with gene ELVOL5. Therefore, we next examined the 500 kb upstream and downstream of the region. However, no other SNPs were found which were significantly associated with C14:0 (Fig. 5b).

Discussion
Fatty acid is an important indicator of meat quality and taste, and its strongly influences consumer's preferences [3,38]. Previous genome-wide association studies have been conducted for fatty acid composition in multiple cattle breeds, including Angus, Japanese Black, Nellore and other crossbreds [8,30,32,35,36]. To our knowledge, this study is the first attempt to investigate molecular mechanism underling fatty acid composition

Genomic wide scan identified candidate regions and loci
We investigated 21 individual fatty acids including 6 saturated fatty acids, 4 monounsaturated fatty acids, 11 polyunsaturated fatty acids and 8 fatty acid groups. Despite the fact that single-SNP based GWAS methods have been widely used in many studies, these methods may not be powerful for studying the complex traits with low or moderate heritability. Due to the polygenic characteristics of fatty acid composition in cattle, GWAS using the Bayesian methods have enabled to identify many associated loci that have missed by the single-SNP regression approach [33,35,36]. BayesB has been widely used for GWAS of complex trait in farm animals [39][40][41][42]. In this study, we utilized BayesB and GRAMMAR-GC to identify candidate regions associated with fatty acids in Chinese Simmental cattle population. We detected a total of 35 candidate regions on 16 autosomes associated with fatty acid composition using BayesB. However, these identified regions may not include all potentially significant associated SNPs due to use of 100 kb window-based strategy in BayesB. Therefore, we conducted GWAS using the single locus GRAMMAR-GC method implemented in GenABLE package. With this approach, we detected a total of 44 and 8 significant associated SNPs for individual fatty acids and fatty acid groups using a suggestive adjust threshold. The suggested threshold was set to avoid overestimation of the significant SNPs caused by high LD level in the high density SNPs array [43]. In current study, we found a total of 24 candidate SNPs overlapping with these regions identified by BayesB. For instance, the same candidate peaks for C14:1, C14:1 cis-9 and C18:1 cis-9 were identified using both methods (Figs. 2-4). Utilization of multiple complementary methods is an effective way to detect candidate regions or SNPs and helps elucidate genetic architecture of complex traits in farm animals [33,44].

Candidate genes for fatty acid composition
Several genes were identified as potential candidates contributing to the genetic architecture of fatty acids in this study. Among them, we observed FASN at 51 Mb on BTA19 overlapped with a 100 kb associated region, which explains 10% and 6.5% of the genetic variances for C14:0 and C14:1 cis-9, respectively. Notably, we also found multiple significant SNPs around this gene that were associated with C14:0 using the GRAMMAR-GC method. A region-based association test revealed strong evidence of association for multiple SNPs within this gene. Furthermore, we found strong LD at the upstream of FASN (Fig. 5). This gene encodes a multifunctional protein enzyme to catalyze the synthesis of palmitate (C16:0) from acetyl-CoA and malonyl-CoA, in the presence of NADPH. Previous studies based on candidate gene approaches had revealed polymorphisms within FASN that were related to fatty acid composition in multiple beef cattle populations [45][46][47][48][49] and milk fat content in dairy cattle [50,51]. For instance, several studies were conducted to explore and evaluate the association between fatty acid composition and candidate SNPs using GWAS in Japanese beef cattle [8,30,52]. These studies provided multiple evidences that several SNPs near or within FASN may be regarded as responsible mutations for fatty acid composition and contribute largely to meat quality in the Japanese Black cattle population. Saatchi et al performed GWAS using BovineSNP50 in Angus beef cattle and reported FASN located at 51 Mb on BTA19 (from 51,384,922 to 51,403,614 bp) was associated with fatty acids [32]. Chen et al. found SNP rs41921177 (BTA19:51,326,750) located near FASN. This SNP rs41921177 had relative large effects on multiple fatty acids in both subcutaneous adipose and longissimus lumborum muscle tissues of crossbred beef cattle [33]. However, investigation of genetic  [8,49]. Another gene called ELOVL5 encodes a multi-pass membrane protein which is involved in the elongation of long-chain polyunsaturated fatty acids. This gene was identified in the associated region at 25.1 Mb on BTA23 accounting for 1.5% of the genetic variance for C14:1 cis-9. ELOVL5 plays an important role in de novo synthesis of specific MUFA species in mammalian cells, ELOVL5 knockdown decreased the elongation of C16:1 cis-9, n-7, and ELOVL5 overexpression increased synthesis of C18:1 cis-9, n-7 [53]. Also, previous study using mice models revealed that a reduced ELOVL5 activity can lead to hepatic steatosis, and endogenously synthesized PUFAs are critical regulators of fatty acid synthesis [54]. Lemos et al. reported a candidate region embedded in ELOVL5 can explain 4% of the genetic variance for C20:4 n-6 using ssGBLUP based on window association test in Nellore cattle [35]. The consistent role of ELOVL5 gene involved in fatty acid synthesis and composition was also extensively investigated in diverse pig populations [55][56][57].
Moreover, previous studies suggested that ELOVL5 are involved in the production of multiple acids including C16:0, C16:1, C18:0 and C18:1 in cattle [58]. Also, ELOVL5 was found associated with C20:1n9/C18:1n9 and C20:2n6/C18:2n6 in a F2 population derived from Erhulian pig [55]. As a result, ELOVL5 may have pleiotropic effects on multiple fatty acid composition and also appear to exhibit pleiotropic effects in multiple metabolic steps. However, we only identified ELOVL5 that was associated with C14:0 in Chinese Simmental cattle. In addition, several previous studies have suggested variants within SCD gene and the expression level of SCD gene should be significantly associated with fatty acids [18,20,30,32,47,48,59,60], the SCD gene was not detected in this study, probably due to heterogeneous genetic architecture of fatty acids differ across different populations.

Genomic prediction for fatty acid composition
Previous studies have investigated predictive abilities of genomic selection for fatty acid composition in American Angus [32], Japanese Black cattle [61] and Canadian beef cattle [33]. In the current study, we explored, for the first time, genomic prediction for fatty acid composition in Chinese Simmental cattle. We found that the accuracies of genomic prediction for most of fatty acids were relatively low (<0.30) using both GBLUP and BayesB, which was consistent with a previous report by Chen et al. [33]. This finding may be explained by the relatively low and inaccurate estimates of heritability for the measured fatty acid composition [62]. Our studies also revealed that BayesB provided slightly higher average regression coefficients as compared to GBLUP. Considering the complex architecture of fatty acid composition, this finding implied that BayesB which allows a fraction of SNPs to be allocated with relatively large effects is superior over GBLUP which assumes the same genetic variance for each SNP. Fatty acid composition are commonly recognized as complex traits with a polygenic nature and, to some extent, they are difficult to measure, thus the application of genomic selection for fatty acids will be valuable in future selection breeding programs. With increasing public understanding of the relationships between diet and health, much attention should be paid to the studies of some important fatty acids related to human health [63]. As consumer become more health conscious, they have increased preference for better tasting and healthier products in their diet such as unsaturated fatty acid levels. Further investigation of causal mutations will promote our understanding of lipid metabolism, fat deposition and application of selection for fatty acids in cattle.

Conclusions
We identified several significant associated regions and loci as the potential candidate markers for genomics-assisted breeding programs. Using multiple methods, our results revealed that FASN and ELOVL5 associated with fatty acids with strong evidences. Our analyses also suggested that it is feasible to perform genomic selection for fatty acids in the Chinese Simmental cattle population.

Ethics statement
All animals used in the study were treated following the guidelines established by the Council of China Animal Welfare. Protocols of the experiments were approved by the Science Research Department of the Institute of Animal Sciences, Chinese Academy of Agricultural Sciences (CAAS) (Beijing, China).

Genotyping and quality control
In total, 723 Simmental cattle were genotyped for the Illumina BovineHD BeadChip. Before statistical analysis, SNPs were pre-processed using PLINK v1.07 [67]. SNPs were selected based on minor allele frequency (>0.05), proportion of missing genotypes (<0.05), and Hardy-Weinberg equilibrium (P > 10E-6). Moreover, individuals with more than 10% missing genotypes were excluded. After these quality controls, the final data consisted of 685 individuals and 595,715 autosomal SNPs.

Heritability and genetic correlation estimation
Phenotypic and genetic (co) variances of fatty acids were estimated using the pairwise bivariate animal model implemented in the ASReml v3.0 software package [68]. The model is where y 1 and y 2 are vectors of phenotypic values of trait 1 and 2, respectively; × 1 and × 2 are incidence matrices for fixed effects; b 1 and b 2 are the vectors of the fixed effects; Z 1 and Z 2 are incidence matrices relating the phenotypic observations to vectors of the polygenic (a) effects for two traits; e 1 and e 2 are random residuals for two traits. The polygenic effects were treated as random and assumed to be mutually independent.
Variances of the random effects are defined as V(a) = Gσ a 2 for the polygenes and V(e) = Iσ e 2 for the residuals, where G is the additive genetic relationship matrix, I is the identity matrix, σ a 2 is the additive genetic variance and σ e 2 is the residual variance. Matrix G matrix was inferred from the SNP markers according to the study of VanRaden [69]. Fixed effects in the model included effects of gender, farm and year. In addition, ages at slaughter, days between slaughter and fatty acid extraction, hot carcass weight and marbling score were considered as covariates in the model. Genomic heritability of each trait was estimated using Pairwise bivariate analyses were performed for each combination of fatty acids to estimate the (co) variance components, phenotypic and genetic correlations as well as the heritability.

Genome-wide association study using BayesB
Fatty acid composition was adjusted for fixed effects and covariates using a linear mixed model, and fixed effects and covariates were defined above. We conducted genome-wide association analyses using BayesB, which analyzed all autosomal SNP simultaneously and assumed different genetic variance for each SNP [40,70]. The model is described as follows, where y i is the adjusted phenotypic value for the i th individual, u is the mean (after removing fixed effects and all covariants), M is the number of SNP loci, Z ij is the j th SNP genotype of animal i coded as the number of B alleles in the genotype, α j is the average effect of allele substitution for SNP j, and is assumed to be normally distributed N (0, σ j 2 ), δ j is an indicator variable to show the presence (δ j = 1) and absence (δ j = 0) of marker j, and the presence is given a prior probability, and e i is the residual error with an assumed normal distribution N (0, σ e 2 ). The prior distribution of variance σ j 2 (or σ e 2 ) is assumed to be a scaled inverse Chi-square with degrees of freedom v α = 4 (or v e = 10) and scale parameter S α 2 (or S e 2 ). The scale parameter was usually derived from an assumed additive-genetic variance [71]. π was set to 0.9998, which meant that about 100-150 SNPs were fitted simultaneously in each MCMC iteration. The Markov chains were run for 50,000 cycles of iterations with the first 10,000 iterations being discarded as burnin followed by additional 40,000 iterations to form the posterior sample. All SNPs effects were estimated from the posterior sample. We performed GWAS for all the 21 fatty acids but only reported the results for the traits with genomic heritability ≥ 0.10. We inferred the associations for fatty acids using a 100 kb window rather than single marker [35,36]. There were 24,900 SNP windows across the 29 autosomes. The variance for each window was estimated using the genetic value of all adjacent SNPs within 100 kb window, and proportion of genetic variance explained by each window was obtained by dividing the variance of window breeding value by the variance of the whole genome breeding value. Genome windows with the highest posterior mean proportion of genetic variance ≥1% were considered as the most important regions associated with the traits. Positional candidate genes were investigated for the 100 kb windows using the UCSC Genome Browser, which allowed visualization of SNP based on the Bos taurus genome assembly UMD 3.1.

Genome wide association study using GRAMMAR
We also performed genome-wide association study using GRAMMAR-GC implemented in an R package GenA-BEL v1.8-0 [72]. The method accounts for population stratification and covariance structure of individuals inferred from all by SNPs. Bonferroni corrected threshold of 8.39E-08 (0.05/595,715) was adopted for the top 5% genome-wide significance. This correction was highly conservative for GWAS using high density SNPs array. To avoid the "overcorrection" for SNPs that may not truly independent due to LD across genome, we used a suggestive P value (P = 0.05/163,473) as thresholds proposed by Duggal et al., considering approximate the number of "independent" SNPs by counting 1 SNP per LD block, plus all SNPs outside of the LD blocks (interblock SNPs) [43].

Region-based association test and haploblock analyses
Region-based association test is a more powerful approach of gene mapping than the association test of an individual genetic variant. In this study, we performed the region-based association test for several target 100 kb regions identified using BayesB. SNPs in these regions and the adjusted fatty acid composition were investigated with this region-based association test using R package FREGAT [73]. The LD of these regions were estimated using PLINK v1.7 software [67].

Genomic prediction
Genomic best linear unbiased prediction (GBLUP) and BayesB were used for genomic prediction. Five-fold cross validation was used to evaluate the accuracy of genomic prediction. The data were split into five approximately equal-sized groups. For each cross validation, four groups were used as the training sample to estimate parameters and the remaining group was used as the test sample. The linear model is written as, where y is the vector of adjusted phenotypic values in the training sample, μ is the overall mean, a is the vector of breeding values for all animals, e is the vector of residuals errors and Z is the incidence matrix for the random effects. For the BayesB, SNP effects were estimated based on the training population using the statistical model described in the GWAS analyses. The GEBV for animal i in the validation population was predicted by summing up SNP effects over all loci as follows: GEBV i = ∑ j = 1 M Z ij α j , where α j is the estimated effect for SNP j. Predictive accuracy was measured as the correlation between the estimated breeding values and the adjusted phenotypic values divided by the square root of heritability separately for each of the 5-fold cross-validation replicates.

Additional files
Additional file 1: Table S1.