Genome-wide association and prediction of direct genomic breeding values for composition of fatty acids in Angus beef cattlea

Background As consumers continue to request food products that have health advantages, it will be important for the livestock industry to supply a product that meet these demands. One such nutrient is fatty acids, which have been implicated as playing a role in cardiovascular disease. Therefore, the objective of this study was to determine the extent to which molecular markers could account for variation in fatty acid composition of skeletal muscle and identify genomic regions that harbor genetic variation. Results Subsets of markers on the Illumina 54K bovine SNPchip were able to account for up to 57% of the variance observed in fatty acid composition. In addition, these markers could be used to calculate a direct genomic breeding values (DGV) for a given fatty acids with an accuracy (measured as simple correlations between DGV and phenotype) ranging from -0.06 to 0.57. Furthermore, 57 1-Mb regions were identified that were associated with at least one fatty acid with a posterior probability of inclusion greater than 0.90. 1-Mb regions on BTA19, BTA26 and BTA29, which harbored fatty acid synthase, Sterol-CoA desaturase and thyroid hormone responsive candidate genes, respectively, explained a high percentage of genetic variance in more than one fatty acid. It was also observed that the correlation between DGV for different fatty acids at a given 1-Mb window ranged from almost 1 to -1. Conclusions Further investigations are needed to identify the causal variants harbored within the identified 1-Mb windows. For the first time, Angus breeders have a tool whereby they could select for altered fatty acid composition. Furthermore, these reported results could improve our understanding of the biology of fatty acid metabolism and deposition.


Background
In response to the constant bombardment of healthrelated stories, consumers are becoming more health conscious and are becoming increasingly aware of the amount and type of fats and fatty acids they consume. Red meat is often perceived as a fatty protein source with certain health risks associated with its consumption. Beef could be viewed more favorably from a human health standpoint if strategies could be applied to decrease saturated fatty acid (SFA) content while increasing the concentration of beneficial polyunsaturated fatty acids (PUFA), especially omega-3 PUFA, and conjugated linoleic acid.
Beef producers continue to strive to produce a high quality product that meets consumer demands in a costeffective manner. While fatty acid profiles can be altered through the diet [1,2], identification of genetic markers that would allow producers to select beef for altered fatty acid composition could ultimately increase value and consumer satisfaction with beef. While producers have recently selected cattle with a higher propensity to marble, because of the premiums that they are paid, some consumers favor lower concentrations of SFA because of their perceived negative effect on human health. Therefore, the goal of the present study was to assess the utility of genetic markers to select for fatty acids composition, identify regions of the genome that account for genetic variation, and evaluate genome architecture of fatty acid regulation.

Results and discussion
Summary statistics for the fatty acid phenotypes analyzed in this study are reported in Table 1.

Posterior genetic and residual variances and heritability
The discovery process generates an estimate, similar to pedigree-based heritability, of the proportion of phenotypic variation that can be accounted for using SNP markers for each of the fatty acids studied on a beef meat or fat percent basis ( Table 2). The proportion of phenotypic variance explained (h 2 ) by SNP genotypes varied from a very low amount (0.06) for 18:1c13, 18:1t6pt9, 18:3n6, and 20:3n3, which indicates that the marker predictions will be poor, to relatively high (> 0.49) for 14:0, 14:1,16:0, 16:1, 18:0, 18:1c9 and 24:0, which indicates potential for relatively good marker predictions. In general, the percentage of phenotypic variance explained by markers was higher when fatty acids were analyzed on a fat percent compared with beef basis. This result is not unexpected given that, on a beef basis, the level of any given fatty acid is influenced by both its relative amount in comparison to other fatty acids as well as the amount of lipid present in the given sample. In contrast, on a fat percent basis, only variation relative to other fatty acids is taken into account. If total fatty acid content is included as a covariate when analyzing fatty acids on a beef basis, heritabilities similar to a weight percent basis are obtained (data not shown). This comparison would indicate that much of the variation in heritability estimates between methods result from variation in total fatty acid content. On a fat percent basis, fatty acids with chain length >18 carbons (with the exception of 24:0), had lower heritability (0.06 to 0.24) than shorter chain fatty acids (0.08 to 0.57). This heritability difference might indicate that genes involved in the production and/or metabolism of these longer-chain fatty acids are under selective pressure to minimize variation. Alternatively, given the fact that de novo fatty acid synthesis in cattle is limited to primarily 14, 16 and 18 carbon fatty acids [3], it is possible that the observed variation in longer chain fatty acids result from host genetic variation influencing the population of rumen microbiota, which modify ingested fatty acids [4,5].
Given the relatively high amount of phenotypic variance in 14:0, 16:0, 18:0, 18:1c9, (14:0 + 16:0)/All and AI that variation that can be accounted for by molecular markers, it should be possible to select for a more heart healthy fatty acid composition.

Direct genomic breeding values (DGV) coefficients, correlations and accuracy
The numbers of individuals in each K-means clustered group are shown in Table 3. The pooled regression coefficients and the simple correlations between DGV and phenotypes over 5 K-means clustered groups, and the realized accuracies of DGV for some fatty acid traits are in Table 4. The pooled regression coefficient ranged from 1.53 for CLAc9t11 to −0.47 for 20:3n3, the pooled simple correlation ranged from 0.43 for 14:0, MCFA, and AI to −0.02 for 20:3n3, while the accuracies of genomic prediction varied from 0.57 for 14:0, LCFA, and MCFA to −0.06 for 20:3n3 (Table 4). Given the higher accuracies associated with 14:0 and 16:0, it should be possible to develop a selection index to minimize these two fatty acids. Alternatively, producers could use ratios like (14:0 + 16:0)/All or AI to select for animals that have decreased levels of shorter chain saturated fatty acids.

Whole genome association
The 1-Mb SNP windows with the highest genetic variances and a posterior probability of having non-zero genetic variance greater than 90% (PPI) for fatty acids on a fat percent (Table 5) and beef basis (Table 6), respectively. The proportion of genetic variance explained by 1-Mb SNP windows ranged from 78.6% for 18:3n6 to 1.6% for 24:5 (Table 5) on a fat percent basis, and 60.5% for 10:0 and 1.5% for 24:0 on a beef basis ( Table 6). Many of the 1-Mb windows were associated with more than one fatty acid. For example, the 51 st Mb window on chromosome 19 was associated with 14:0, 14:1, 16:0, 16:1, 18:1c9, LCFA, MCFA, MUFA, SFA, (14:0+16:0)/ All, and AI on a fat percent basis. Whereas, only the 49 th Mb on chromosome 24 was associated with 17:1 (Table 5). No other region on chromosome 24 was associated with any other fatty acid.
Many of the 1-Mb windows that were identified harbored good candidate genes. For example, fatty acid synthase (FASN) is located on chromosome 19 between 51,384,922 and 51,403,614 bp, which is almost exactly in the middle of this 1-Mb window. Previously, our group reported that variants in FASN were associated with fatty acid composition in Angus [11]. In addition, FASN has been reported to be associated with bovine adipose composition, milk fat content, and fatty acid composition of beef in several different breeds of cattle, which indicates that it has a conserved role across genetic backgrounds [12][13][14][15][16][17][18][19][20][21][22]. Interestingly, there are several different variants that are responsible for FASN effects in the different breeds [11,12]. Furthermore, Sterol-CoA desaturase (SCD) is located on chromosome 26 between 21,132,751 and Table 1 The summary statistics of mean (μ), standard deviation (SD) and coefficient of variation (CV) for all studied fatty acids traits in both meat and fat percent bases Beef meat basis 1 Fat percent basis 2  Tables 5 and 6). Previously, SCD variants have been reported to be associated with fatty acid composition of meat and milk fat [17,18,[20][21][22][23][24][25][26][27][28][29]. In contrast, other 1-Mb regions contain no obvious candidate genes, for example the 20 th Mb window on chromosome 28 that is associated with cis-11 18:1. After the 1-Mb window that harbors FASN, a region on chromosome 29 (18 th Mb window) could account for the second greatest amount of genetic variance in 14:0, 14:1, 16:0, 16:1, cis-9 C18:1, LCFA, and MCFA. This region has not previously been reported to be associated with any adipose trait other than subcutaneous fat thickness (http://animalgenome.org/cgi-bin/QTLdb/ BT/index) [30]. Interestingly, thyroid hormone responsive (THRSP) has been reported to act at the level of transcription to regulate genes that encode enzymes required for long-chain fatty acid synthesis [31]. In addition in knockout studies, it has been reported that THRSP null mice showed a marked deficiency in de novo lipogenesis. Moreover, knockout studies have also revealed that THRSP may work in the cytoplasm by tethering FASN to the microtubule [32]. Thus, it would appear that THRSP is a good candidate gene, which was recently reported to be associated with fatty acid composition in Korean cattle [33]. It should be noted that none of the 1-Mb windows that harbor SREBP1, ACACA, PPARG, FABP4, ACSL1, LEP, or LXRA, which are all genes that have been previously associated with fatty acid composition in beef [17,[34][35][36][37][38], were associated with variation in any fatty acid in this study. When taken in concert with the fact that different FASN alleles appear to be segregating in different breeds [11,16], this may indicate that the genetic mechanisms controlling fatty acid composition may vary greatly from breed to breed. This is further supported by the fact that the FASN region in Japanese Black cattle appears to account for the vast majority of the genetic variance, while in contrast several regions are reported here for American Angus.

Within regions correlation
The correlations between DGV within 19_51, 26_21 and 29_18 windows (windows harboring the candidate genes FASN, SCD and THRSP, respectively) for each pair of C14:0, C14:1, C16:0, C16:1, C18:0 and cis-9 C18:1 fatty acids on the fat percent basis are summarized in Figure 1. There are two clear patterns in the within windows estimated correlations between fatty acid. The first pattern involves the regions located on chromosome 19 (19_51) and 29 (29_18), which harbor FASN and THRSP as candidate genes, respectively. Estimates of the DGV correlations were very high and positive among 14:0, 14:1, 16:0 and 16:1, however regional DGV correlation between this group of fatty acids and 18:0 and cis-9 18:1 were large and negative. While the DGV correlation between 18:0 and cis-9 18:1 were very high and positive. Regions 19_51 and 29_18 were found to be associated to all fatty acids except for 18:0, where only the region on chromosome 29 was identified (Table 5). These results indicate that both, FASN and THRSP, exhibit pleiotropic effects for most fatty acids and act in a coordinate manner to contribute to the formation of fatty acid involved in de novo synthesis. However, for the formation of 18:0 and cis-9 18:1 a different elongase [39] is required. Therefore, the negative correlation may indicate competition between enzymes for the same substrate.
The second correlation pattern involves the region on chromosome 26 (26_21), which harbors SCD. Correlations were in general lower than the ones obtained in the previous two regions. The within region correlation between the 14:0, 16:1 and 18:0 were all strong and positive. Weaker positive correlations were also observed with 16:0. However, the correlations of DGV for those fatty acids with 14:1 and cis9 18:1 were negative.  The amount of fatty acid in 1 gram beef meat. 2 The percent of fatty acid in total fatty acid. 3 n-3 fatty acids. 4 n-6 fatty acids. 5 Atherogenic Index. 6 NA = Not Analyzed. Table 2 The posterior estimates of genetic (σ 2 g ) and residual (σ 2 e ) variances, and the estimated heritability (h 2 ) for all studied fatty acids traits in both meat and fat percent bases summarize the within regions correlations among the same fatty acids on the beef meat basis. The same patterns of correlations were obtained on the beef basis as those obtained for fat percentage basis except for 16:0 (at all three windows) and cis9 18:1 (at 26_21 and 29_18 windows) where no QTL was detected on these regions for these fatty acids on the beef basis analysis ( Table 6). Patterns of correlations illustrate how the selection to change fatty acid composition of fat could have a differential effect depending upon the region that is affected by selection. Thus, the use of genomic information creates an opportunity for a more precise selection by using specific regions information rather than pedigree based selection. On the other hand, we have been assuming that the observed correlations are due to pleiotropic effects, which might not be the case. To what extent the correlations are due to selection for increased marbling in the Angus population is unknown.

Conclusion
This study is the first genome selection and genome wide association analyses for fatty acid composition in American Angus sired cattle. Fatty acid composition is of paramount importance due to their role in cardiovascular health. The genetic dissection of fatty acid composition could lead to a better understanding of the molecular mechanisms that control fatty acid content in meat. We utilized a large Angus-sired population to calculate genomic breeding values of individual animals and to identify genomic regions harboring genetic variation associated with fatty acid composition. Molecular markers were able The amount of fatty acid in 1 gram beef meat. 2 The percent of fatty acid in total fatty acid.   Table 4 The pooled regression coefficient of phenotype on DGV (b (P,DGV) ), the pooled simple correlation between DGV and phenotype (r (DGV,P) ), and the realized accuracy 1 of DGV for some fatty acid traits as percent in total fatty acid As the pooled simple correlations between DGV and phenotypes in validation groups divided by the square root of trait heritability. Table 2 The posterior estimates of genetic (σ 2 g ) and residual (σ 2 e ) variances, and the estimated heritability (h 2 ) for all studied fatty acids traits in both meat and fat percent bases (Continued)  to account for between 6 and 57% of the observed variance in an individual fatty acid. In addition, the accuracy of the DGV (measured as simple correlations between DGV and phenotype) ranged from −0.06 to 0.57. Furthermore, we identified 57 1-Mb windows with a posterior probability of inclusion (> 0.90) that harbor genetic variation associated with individual fatty acid content. This large number of genomic regions might indicate the presence of an elaborate molecular mechanism that control fatty acid content in skeletal muscle. In addition, the correlation of DGV among the different fatty acids within specific genomic regions might help to articulate the genetic correlations between any two traits. Taken together these results provide the most comprehensive evaluation of the genetic mechanisms that control fatty acid composition in skeletal muscle.

Methods
All animal work was approved by the Iowa State University Animal Care and Use committee before the conduction of this study.

Genotype and Phenotype data
A total of 2,177 Angus-sired calves sired by 134 Angus sires were genotyped with the BovineSNP50 BeadChip (Illumina, San Diego, CA). Sixty-seven animals that had incomplete phenotype or fixed effect information were removed, leaving 2,110 animals represented by bulls (n = 500), steers (n = 1,210), and heifers (n = 400), born between 2002 and 2008. Production characteristics and additional detail of the sample collection and preparation of these cattle were reported previously [40]. After external fat and connective tissue were removed, the 1.27-cm steaks were freeze ground in liquid nitrogen to produce a powder that was analyzed for fatty acid composition. Total lipid was extracted with a chloroform and methanol (2:1, vol:vol) mixture and then quantified [41]. The individual lipid spots were derivatized to methyl esters with acetyl chloride in methanol prior to gas chromatography for determination of fatty acid composition. Fatty acid methyl esters (FAME) were analyzed by gas chromatography (model 3400, Varian, Palo Alto, CA) using a Supelco SP-2380 column (30 m × 0.25 mm i.d. × 0.20 μm film thickness) and a flame ionization detector. The column started at a temperature of 100°C and was ramped up to 170°C at a rate of 2°C per minute, followed by an increase to 180°C at 0.5°C per minute and to 250°C at 10°C per minute. The total running time was 62 min. The temperature of the injector was programmed to increase from 68°C to 250°C at a rate of 250°C per minute. The detector was maintained at 220°C.
The phenotypic observations on fatty acid composition were used as response variables to estimate marker effects for each fatty acid separately. In total, 49 fatty acid traits were analyzed in this study. Each trait was measured in two different ways: 1) beef basis = weight of a given fatty acid, g×10 -5 , in 1 gram meat, 2) fat percent = weight of a given fatty acid in relation to total extracted fatty acid times 100. The individual fatty acids analyzed were  Posterior probability of inclusion (non-zero genetic variance).

Statistical model
In this study, all 53,367 SNP markers were used as predictors with fatty acid phenotypes as response variables to estimate SNP effects. The "BayesB" method [43] that fits a mixture model where non-zero SNP effects are drawn from distributions with marker specific variance and some known fraction of markers (π) have zero effect was used to estimate marker effects for genomic predictions. For each trait the following model was fit to the estimate marker effects: y¼XbþZuþe; where y is the vector of observations for a particular fatty acid trait; b is the vector of fixed effects including population mean, contemporary group (defined as feed location-harvest date-sex), and covariates including subcutaneous fat thickness at 12 th rib, longissimus muscle area at 12 th rib, hot carcass weight, and the amount of chemically extracted fat; u is a vector of random marker effects, where element j of u has σ 2 u j > 0 (with probability 1 -π) or σ 2 u j ¼ 0 (with probability π) as described by [44]; X and Z are design matrices which relate phenotypic observations to fixed and marker effects, respectively, with each element of Z representing allelic state (i.e., number of B alleles from the Illumina A/B calling system); and e is the vector of random residuals~N(0, σ 2 e ). In this study, parameter π was set to 0.999 for all analyses as high π values were estimated for fatty acid traits in preliminary analyses using BayesCπ method [44]. MCMC methods with 41,040 iterations were used to obtain estimates of marker effects and variances as the posterior means of the corresponding sampled values after discarding the first 1,000 samples to allow for burn-in. In preliminary analyses, the BayesC method [45], which has been shown to be less sensitive to prior assumptions than BayesB [44], was first fitted using prior genetic and residual variances equal to half of total phenotypic variance of each trait and π=0.95 to obtain posterior estimates of genetic and residual variances for constructing priors of genetic and residual scale parameters for BayesB, and to estimate the heritabilities (as the ratios of posterior means of genetic variances over the posterior phenotypic variances) of fatty acid traits.
The DGV for individual i was derived by multiplying the number of copies of B alleles by their corresponding posterior mean SNP effect, and summing these values over all k marker loci: where DGV i is the DGV for individual i, z ij is the marker genotype of individual i for marker j, andû j is the posterior mean effect of marker j obtained from the 40,000 post burn-in samples. Estimated effects of markers within each 1-Mb window (defined by the UMD3.1 assembly) were used every 40 th iteration to compute genomic breeding values of all animals for every window. The variance of DGV for any particular window (across all animals) were used to compute the genetic variance of that window. Unmapped markers were considered as an extra window. Posterior probability of inclusion (PPI) for a given window, which is the proportion of samples in which at least one SNP from a given window was included in the model with a non-zero effect, was used for significance testing [46]. A window with PPI > 90%  (across 1,000 samples obtained from 40,000 post burn-in samples) was selected as a window containing (or being) a QTL. The PPI has close connections with frequentist approaches that control the false discovery rate [47]. All analyses were performed using GenSel software [48].
Estimates of the proportion of genetic variation explained by each 1-Mb window obtained from the genome-wide association study was plotted against genomic location using SNPLOTz v.1.52 [49]. Individual 1-Mb that explained the largest proportion of genetic variation were then visualized in GBrowse [50] for detailed inspection of the chromosomal region containing the 1-Mb window. Gene searches were performed for these genomic regions with the highest genetic variances.

Accuracies of DGV
A cross-validation strategy was applied to estimate the accuracies of DGV for traits that may be of interest for breeding. First, the genotyped animals were divided into 6 unequally sized mutually exclusive groups using Kmeans clustering whereby genomic relatedness was increased within each group and decreased between each of the groups. In this way the detection of true linkage disequilibrium is favored versus just family linkage. Two resultant small groups were combined together to make a single, fifth group. The method of VanRaden et al. [51] was used to construct a genomic relationship matrix between genotyped animals. The Hartigan and Wong [52] algorithm, implemented using R [53] was used for Kmeans clustering based on a difference matrix obtained from the genomic relationships among the genotyped animals. Details concerning K-means clustering for assigning animals to groups are in Saatchi et al. [54].
Second, a training analysis was undertaken whereby the data excluded one group to train on the remaining groups to estimate marker effects, which then were used to predict DGV of individuals from the omitted group (validation set). This analysis resulted in every animal having its predicted DGV obtained without using its own phenotype nor those of close relatives in training. For each trait, the realized accuracy of DGV was calculated as the pooled correlations between DGV and phenotypes in validation groups divided by the square root of trait heritability.

Correlation of within 1-Mb region DGV
The DGV for each of three important 1-Mb windows (19_51, 26_21 and 29_18), which harbor the candidate genes FASN, SCD and THRSP, respectively, were calculated for C14:0, C14:1, C16:0, C16:1, C18:0 and cis-9 C18:1 fatty acids (those involved in de novo synthesis and other abundant fatty acids that are generated by further elongation and desaturation) on both fat percent and beef meet bases. The correlations between DGV for a given 1-Mb window were estimated for each pair of fatty acids using posterior mean of covariances and relevant variances to gain an insight into possible pleiotropic effects of QTL regions associated with these fatty acids.
Endnote a