Genome-wide association and genotype by environment interactions for growth traits in U.S. Gelbvieh cattle

Background Single nucleotide polymorphism (SNP) arrays have facilitated discovery of genetic markers associated with complex traits in domestic cattle; thereby enabling modern breeding and selection programs. Genome-wide association analyses (GWAA) for growth traits were conducted on 10,837 geographically diverse U.S. Gelbvieh cattle using a union set of 856,527 imputed SNPs. Birth weight (BW), weaning weight (WW), and yearling weight (YW) were analyzed using GEMMA and EMMAX (via imputed genotypes). Genotype-by-environment (GxE) interactions were also investigated. Results GEMMA and EMMAX produced moderate marker-based heritability estimates that were similar for BW (0.36–0.37, SE = 0.02–0.06), WW (0.27–0.29, SE = 0.01), and YW (0.39–0.41, SE = 0.01–0.02). GWAA using 856K imputed SNPs (GEMMA; EMMAX) revealed common positional candidate genes underlying pleiotropic QTL for Gelbvieh growth traits on BTA6, BTA7, BTA14, and BTA20. The estimated proportion of phenotypic variance explained (PVE) by the lead SNP defining these QTL (EMMAX) was larger and most similar for BW and YW, and smaller for WW. Collectively, GWAAs (GEMMA; EMMAX) produced a highly concordant set of BW, WW, and YW QTL that met a nominal significance level (P ≤ 1e-05), with prioritization of common positional candidate genes; including genes previously associated with stature, feed efficiency, and growth traits (i.e., PLAG1, NCAPG, LCORL, ARRDC3, STC2). Genotype-by-environment QTL were not consistent among traits at the nominal significance threshold (P ≤ 1e-05); although some shared QTL were apparent at less stringent significance thresholds (i.e., P ≤ 2e-05). Conclusions Pleiotropic QTL for growth traits were detected on BTA6, BTA7, BTA14, and BTA20 for U.S. Gelbvieh beef cattle. Seven QTL detected for Gelbvieh growth traits were also recently detected for feed efficiency and growth traits in U.S. Angus, SimAngus, and Hereford cattle. Marker-based heritability estimates and the detection of pleiotropic QTL segregating in multiple breeds support the implementation of multiple-breed genomic selection.


Background
Growth traits are commonly recorded and used as selection criteria within modern beef cattle breeding programs and production systems; primarily because of their correlation with increased overall meat production and other economically important traits [1][2][3][4]. Some of the most commonly investigated growth traits include birth weight (BW), weaning weight (WW) and yearling weight (YW); with BW considered as both a production indicator, and a primary selection criterion for improving calving ease by reducing dystocia events [1,2,[5][6][7]. Moreover, while previous studies have demonstrated that low estimated breeding values (EBVs) for BW are associated with reductions in both calf viability [6] and growth rates [5,7], increased dystocia rates may also occur if sires with high EBVs for BW are used in conjunction with dams that possess small pelvic size. Therefore, modern beef breeding programs and production systems generally strive to increase calving ease, and maximize other growth-related traits such as WW and YW, particularly considering the known correlations between growth traits and other economically important carcass and reproductive traits [3,5,7].
Given the increasing economic importance of growth traits in beef cattle, a number of studies have sought to identify quantitative trait loci (QTL) influencing bovine body weight, growth, and aspects of stature, including both linkage studies and modern genome-wide association analyses [2,[8][9][10][11][12][13]. Several recent studies have also established moderate heritability estimates for bovine growth traits in U.S. beef cattle including BW, WW, and YW [14][15][16][17], with a number of relevant QTL and positional candidate genes identified to date, including orthologous genes that affect both human and bovine height [2,[18][19][20][21][22]. Notably, with the advent of the bovine genome assembly [23], the development of the Illumina Bovine SNP50 and 778K HD assays [23,24], and more recently, the demonstrated ability to impute high density genotypes with high accuracy [25], an industry-supported research framework [26] has emerged that allows for very large-sample studies to be conducted without the costs associated with directly ascertaining high density genotypes (≥ 778K) for all study animals.
Herein, we used 10,837 geographically diverse U.S. Gelbvieh beef cattle and a union set of 856,527 (856K) imputed array variants to conduct GWAA with marker-based heritability estimates for BW, WW, and YW. Additionally, we used thirty-year climate data and K-means clustering to assign all Gelbvieh beef cattle to discrete U.S. climate zones for the purpose of estimating genotype-by-environment (GxE) interactions for BW, WW, and YW. This study represents the largest, high-density, single breed report to date with both standard GWAA and GxE GWAA for BW, WW, and YW. Additionally, we also evaluate the general concordance of GWAAs conducted using two popular methods (GEMMA; EMMAX) [27][28][29]. The results of this study are expected to positively augment current beef cattle breeding programs and production systems, particularly for U.S. Gelbvieh cattle, but also serve to highlight the increasing potential for eliciting economic impacts from industrysupported research frameworks that were developed for enhancing U.S. food security.

Results and discussion
Heritability estimates for BW, WW, and YW in U.S. Gelbvieh beef cattle Herein, we used two approaches to generate markerbased heritability estimates for all investigated traits. Specifically, standardized relatedness matrices produced with GEMMA (G s ) [27] and genomic relationship matrices (GRM) normalized via Gower's centering approach and implemented in EMMAX [25,[28][29][30], were used to compare the chip or pseudo-heritability estimates for each investigated trait (Table 1). Notably, both approaches produced moderate heritability estimates with small standard errors for BW, WW, and YW; and heritability estimates for YW were highest among all investigated traits for U.S. Gelbvieh beef cattle. Moderate heritability estimates produced here using both approaches further support the expectation of positive economic gains resulting from the implementation of genomic selection [30].
GxE GWAA for BW, WW, and YW in U.S. Gelbvieh beef cattle To investigate the potential for significant GxE interactions in relation to BW, WW, and YW in U.S. Gelbvieh beef cattle, we conducted six additional single-marker (856K) analyses using both GEMMA and EMMAX [27][28][29]. For all a GEMMA chip heritability [27]; EMMAX pseudo-heritability [28,29] Fig. 1 Birth weight (BW) QTL. Manhattan plot with GEMMA -log 10 P-values. Lead and supporting SNPs for QTL represented at or above the blue line (P ≤ 1e-05; −log 10 P-values ≥ 5.00) for n = 10,837 U.S. Gelbvieh beef cattle. A summary of all markers passing the nominal significance threshold [31] is presented in Table 2 analyses, we included a variable for Gelbvieh geographic zone, which was generated via K-means clustering using thirty-year U.S. climate data, and treated as an interaction term (See Methods). Notably, a BW GxE QTL detected on BTA2 (2_32 Mb; lead SNP is intergenic) revealed multiple biologically relevant positional candidate genes, including GRB14, which has been shown to regulate insulin in mice [72], and FIGN, which has been associated with plasma folate levels in humans (Fig. 4, Table 5, Additional File 2) [73]. Importantly, maternal folate levels have been shown to influence human birthweight [74], and a role for insulin regulation in bovine feed efficiency and growth traits has also been described [30]. Beyond BTA2, BW GxE QTL were also detected on BTA17 (17_66 Mb) and BTA13 (13_67 Mb). Positional candidate genes for these QTL have been implicated in the removal of uracil residues from DNA and apoptosis (UNG) as well as human obesity (CTNNBL1) (Fig. 4, Table 5, Figure S4, Table S6, Additional File 1) [75,76]. Examination of the lead SNPs for all GxE QTL detected for Gelbvieh BW (Table 5, Table S6, Additional File 1, Additional File 2) revealed three noncoding variants, which is suggestive of quantitative (i.e., regulatory) effects.
Genomic inflation factors and correlation coefficients for Pvalues obtained from all GxE BW analyses are shown in Tables S2-S3 (Additional File 1). Our analyses (GEMMA, EMMAX) to evaluate the potential for significant GxE interactions with respect to WW in U.S. Gelbvieh beef cattle produced evidence for one GxE QTL on BTA2 (2_18 Mb) which was only detected by GEMMA, and included relatively few supporting SNPs (P ≤ 1e-05, Table 6; Fig. 5, Figure S5, Additional File 1). The lead SNP defining this QTL was located in exon 304 of TTN, and encoded a nonsynonymous variant ( Table 6, Fig. 5, Additional File 2). Interestingly, TTN is known to function as a myofilament system for skeletal and cardiac muscle, with mouse M-line deficient knockouts resulting in sarcomere disassembly as well as muscle atrophy and death [77][78][79].
Analyses (GEMMA; EMMAX) to evaluate the potential for significant GxE interactions with respect to YW in U.S. Gelbvieh beef cattle revealed two GxE QTL with three positional candidate genes (LRAT/LOC101904475/FGG) on BTA17 (17_03 Mb), and one positional candidate gene on BTA5 (PHF21B at 116 Mb; P ≤ 1e-05, Table 7, Fig. 6, Table  S7, Figure S6, Additional File 1, Additional File 2). The signal on BTA17 (i.e., GEMMA lead SNP in Intron 4 of LOC101904475 and supporting SNPs) was replicated by EMMAX ( Figure S6, Additional File 1); but at a less stringent significance threshold (i.e. P < 6e-04). Notably, while the function of LOC101904475 remains unclear, LRAT is known to catalyze esterification of retinol (i.e., from Vitamin A) [80], and Vitamin A has been shown to promote growth in beef cattle as well as humans [81][82][83]. However, FGG is also an intriguing candidate, as fibrinogen has been shown to constrict blood vessels [84]. This vasoconstriction may alter the ability to cope with heat stress, but in the context of cattle production, the relationship between vasoconstriction and fescue toxicosis is perhaps more noteworthy. Fescue toxicosis is the result of ergot alkaloids produced by the endophytic fungus in fescue forage [85], especially the Kentucky 31 variety. One of the major symptoms of fescue toxicosis is vasoconstriction, thus variation in FGG expression levels may potentially alter cattle's innate degree of vasoconstriction; perhaps further complicating both fescue toxicosis and heat     Table 4 stress. The other interesting positional candidate gene on BTA5 (PHF21B) is known to be involved in the modulation of stress responses, and the regulation of cellular division [86,87].

Conclusions
Herein, we present evidence for pleiotropic QTL influencing BW, WW, and YW in U.S. Gelbvieh beef cattle, and further confirm the involvement of PLAG1 in various aspects of bovine growth and stature across breeds [2,14,18,21,30,[32][33][34]. Additionally, we also present compelling evidence for QTL segregating in multiple breeds; with at least seven U.S. Gelbvieh growth QTL that were also detected for feed efficiency and growth traits in U.S. Angus, SimAngus, and Hereford beef cattle [30]. Despite the involvement of major genes such as NCAPG, PLAG1 and LCORL, more of the phenotypic variance in Gelbvieh BW, WW, and YW was explained by many other genome-wide loci (See Additional File 1, Additional File 2). Moreover, we demonstrate that most of the Gelbvieh QTL are detectable by two different large-sample analyses (GEMMA; EMMAX). However, some discordant QTL detected by the GxE GWAAs can also be attributed to differences in the model specifications for these analyses, as implemented by GEMMA and EMMAX (See Methods). While relatively few GxE QTL were detected, the identified GxE QTL harbor physiologically meaningful positional candidates. Moreover, the results of this study demonstrate that imputation to a union set of high-density SNPs (i.e., 856K) for use in large-sample analyses can be expected to facilitate future discoveries at a fraction of the cost associated with direct genotyping, which also underscores the present impact of genomic tools and resources developed by the domestic cattle research community.

Methods
Cattle phenotypes were received from the American Gelbvieh Association (pre-adjusted for age of animal [i.e. 205day weight for WW] and age of dam as per breed association practice), and corresponding genotypes were transferred from their service provider Neogen GeneSeek. For GWAA analyses, the phenotypes were pre-adjusted for sex and contemporary group consisting of 5-digit breeder zipcode, birth year, and birth season (Spring, Summer, Fall, and Winter) using the mixed.solve() function from the rrBLUP package v4.4 [88] in R v3.3.3 [89].
To group individuals into discrete climate zones, Kmeans clustering was performed on three continuous climate variables. Thirty-year normal values for temperature, precipitation, and elevation were drawn from the PRISM climate dataset [90]. Each one km square of the continental United States was assigned to one of nine climate zones using K-means clustering implemented in the RStoolbox R package [91,92]. The optimal number of zones was identified using the pamk function from the R package fpc [93]. Individuals were assigned to zones based Fig. 4 Birth weight genotype-by-environment (BW GxE) QTL. Manhattan plot with GEMMA -log 10 P-values. Lead and supporting SNPs for QTL represented at or above the blue line (P ≤ 1e-05; −log 10 P-values ≥ 5.00) for n = 10,837 U.S. Gelbvieh beef cattle. A summary of all markers passing the nominal significance threshold [31] is presented in Table 5 Table on the zip code of their breeder as recorded in the American Gelbvieh Association herdbook. Quality control was performed on genotypes for 13,166 Gelbvieh individuals using PLINK 1.9 [94]. Individuals with call rates < 0.90 were removed on an assay-by-assay basis (For assay information see Additional File 3). Variants with call rates < 0.90 or Hardy-Weinberg Equilibrium (HWE) P-values <1e-20 were also removed. For this analysis, only autosomal chromosomes were analyzed. After filtering, genotypes for the 12,422 individuals that remained were merged using PLINK and then phased using EagleV2.4 [95]. Genotypes inferred by Eagle were removed with bcftools [96]. Imputation was performed with IMPUTE2 [97] using the "merge_ref_panels" flag. This allowed the phased haplotypes for 315 individuals genotyped on the Illumina HD (Illumina, San Diego, CA) and 559 individuals genotyped on the GGP-F250 (GeneSeek, Lincoln, NE) to be recursively imputed and treated as reference haplotypes. These reference haplotypes were used to impute the remaining 11,598 low-density genotypes from various assays (Additional File 3) to the shared number of markers between the two high-density research chips. The resulting dataset consisted of 12,422 individuals with 856,527 markers each (UMD3.1). To account for uncertainty in imputation, IMPUTE2 reports dosage genotypes. Hard-called genotypes were inferred from dosages using PLINK. When making hard-calls, PLINK treats genotypes with uncertainty > 0.1 as missing. This resulted in a hard-called dataset of 856,527 variants, which includes genotypes set as missing. Prior to the execution of all GWAAs (GEMMA; EMMAX), we filtered the Gelbvieh samples and all SNP loci as follows: Gelbvieh sample call rate filtering (< 90% call rate excluded); thereafter SNP filtering by call rate (> 15% missing excluded), MAF (< 0.01 excluded), polymorphism (monomorphic SNPs excluded), and HWE (excludes SNPs with HWE P < 1e-50), which resulted in 618,735 SNPs. Additionally, prior to all GWAAs (GEMMA; EMMAX) hard-called genotypes were numerically recoded as 0, 1, or 2, based on the incidence of the minor allele. Missing hard-called genotypes (i.e., that met our filtering criteria) were modeled as the SNP's average value (0, 1, or 2) across all samples.
Using the numerically recoded hard-called genotypes and the adjusted Gelbvieh phenotypes, we employed GEMMA to conduct univariate linear mixed model GWAAs where the general mixed model can be specified as y = Wα + xβ + u + ϵ; where y represents a n-vector of quantitative traits for n-individuals, W is an n x c matrix of specified covariates (fixed effects) including a column of 1s, α is a c-vector of the corresponding coefficients including the intercept, x represents an n-vector of SNP genotypes, β represents the effect size of the SNP, u is an n-vector of random effects, and ϵ represents an n-vector of errors [27]. Moreover, it should also be noted that u ∼  [77][78][79]; Rabbit, rat, human; aids in myofibrillar assembly, positioning of myosin filaments in muscle, coordinates multiple signaling pathways for gene activation, protein folding, quality control and degradation, heart disease relation a Indicates a predicted nonsynonymous mutation Arg➔Gln, exon 304 Fig. 5 Weaning weight genotype-by-environment (WW GxE) QTL. Manhattan plot with GEMMA -log 10 P-values. Lead and supporting SNPs for QTL represented at or above the blue line (P ≤ 1e-05; −log 10 P-values ≥ 5.00) for n = 10,837 U.S. Gelbvieh beef cattle. A summary of all markers passing the nominal significance threshold [31] is presented in Table 6 Smith et al. BMC Genomics (2019) 20:926 MVN n (0, λτ −1 Κ) and ϵ ∼ MVN n (0, λτ −1 Ι n ), where MVN denotes multivariate normal distribution, λτ −1 is the variance of the residual errors, λ is the ratio between the two variance components, Κ is a known n x n relatedness matrix, and Ι n represents an n x n identity matrix [27]. Using this general approach, GEMMA evaluated the alternative hypothesis for each SNP (H 1 : β ≠ 0) as compared to the null (H 0 : β = 0) by performing a likelihood ratio test with maximum likelihood estimates (−lmm 2) as follows: , with l 1 and l 0 being the likelihood functions for the null and alternative models, respectively, whereλ 0 andλ 1 represent the maximum likelihood estimates for the null and the alternative models, respectively, and where P-values come from a X 2 distribution, as previously described [27]. Herein, the only fixed-effect covariate specified for all GWAAs was an environmental variable (geographic zone for each individual). For all GxE GWAAs (−gxe command), the environmental variable (geographic zone for each individual) was treated as an interaction term, where the resulting P-values represent the significance of the genotype x environment interaction. Specifically, for the GxE GWAAs in GEMMA, the model is specified as y = Wα + x snp β snp + x env β env + x snp × x env β snp × env + u + ϵ; where y represents a n-vector of quantitative traits for n-individuals, W is an n x c matrix of specified covariates (fixed effects) including a column of 1s, α is a cvector of the corresponding coefficients including the intercept, x snp represents an n-vector of SNP genotypes, β snp represents the effect size of the SNP, x env represents an nvector of environmental covariates, β env represents the fixed effect of the environment, β snp × env is the interaction between SNP genotype and environment, u is an n-vector of random effects, and ϵ represents an n-vector of errors. GEMMA evaluated the alternative hypothesis for each interaction (H 1 : β snp × env ≠ 0) as compared to the null (H 0 : β snp × env = 0). Marker-based relatedness matrices (G s ) relating instances of the random effect specified to each of the growth phenotypes among all genotyped cattle were used to estimate the proportion of variance explained (PVE) by the hard-called genotypes in GEMMA, which is also commonly referred to as the "chip heritability" [27,98]. For all investigated traits, single-marker P-values obtained from GEMMA (−lmm 2, −gxe) were used to generate Manhattan plots in R (manhattan command) and QTL were defined by ≥ 2 SNP loci with MAF ≥ 0.01 (i.e., a lead SNP plus at least one additional supporting SNP within 1 Mb) which also met a nominal significance threshold (P ≤ 1e-05) [30,31]. Using hard-called genotypes and the adjusted Gelbvieh phenotypes, we performed a second set of GWAAs using a mixed linear model with variance component estimates, as  6 Yearling weight genotype-by-environment (YW GxE) QTL. Manhattan plot with GEMMA -log 10 P-values. Lead and supporting SNPs for QTL represented at or above the blue line (P ≤ 1e-05; −log 10 P-values ≥ 5.00) for n = 10,837 U.S. Gelbvieh beef cattle. A summary of all markers passing the nominal significance threshold [31] is presented in Table 7 implemented by EMMAX [28][29][30][99][100][101]. Briefly, the general mixed model used in this approach can be specified as: y = Xβ + Zu + ϵ, where y represents a n × 1 vector of phenotypes, X is a n × q matrix of fixed effects, β is a q × 1 vector representing the coefficients of fixed effects, and Z is a n × t matrix relating the random effect to the phenotypes of interest [30,[99][100][101]. Herein, we must assume that Varð uÞ ¼ σ 2 g K and VarðϵÞ ¼ σ 2 e I , such that VarðyÞ ¼ σ 2 g ZK Z 0 þσ 2 e I , however, in this study Z represents the identity matrix I, and K represents a kinship matrix of all Gelbvieh samples with hard-called genotypes. Moreover, to solve the mixed model equations using a generalized least squares approach, we must estimate the variance components (σ 2 g and σ 2 e ) as previously described [28-30, 99, 100]. For this study, we estimated the variance components using the REML-based EMMA approach [29], with stratification accounted for and controlled using the genomic relationship matrix [25,30], as computed from the Gelbvieh hardcalled genotypes. Moreover, the only fixed-effect covariate specified for all GWAAs was an environmental variable (geographic zone for each individual). For all EMMAX GxE GWAAs utilizing hard-called genotypes, we used an implementation of EMMAX [29,102] where interaction-term covariates may be specified; with the environmental variable (geographic zone for each individual) specified as the interaction term. The basis of this approach is rooted in full versus reduced model regression [99], where interaction-term covariates are included in the model as follows: each specified interaction-term covariate serves as one reducedmodel covariate; each specified interaction-term covariate is also multiplied, element by element, with each SNP predictor (i.e., SNP × geographic zone) to create an interaction term to be included in the full model. Specifically, given n measurements of a Gelbvieh growth phenotype that is influenced by m fixed effects and n instances of one random effect, with one or more GxE effects (e) whereby the interaction is potentially with one predictor variable, we model this using a full and a reduced model. The full model can be specified as y = X c β kc + X i β ki + X k β kp + X ip β ip + u full + ϵ full , and the reduced model as y = X c β krc + X i β kri + X k β rkp + u reduced + ϵ reduced , where y is an n-vector of observed phenotypes, X c is an n × m matrix of m fixed-effect covariates, X i is an n × e matrix of e fixed terms being tested for GxE interactions, X k is an n-vector containing the covariate or predictor variable that may be interacting, and X ip is an n × e matrix containing the e interaction terms created by multiplying the columns of X i element-by-element with X k . All of the β terms correspond to the X terms as written above, and to the full or the reduced model, as specified, with u and ϵ representing the random effect and error terms, respectively. Like the EMMAX method without interactions [28,29], we approximate this by finding the variance components once, using the parts of the above equations that are independent of X k as follows: y = X c β cvc + X i β ivc + u vc + ϵ vc , where vc indicates the variance components. To estimate the variance components, we must again assume that Varð u vc Þ ¼ σ 2 g K and Varðϵ vc Þ ¼ σ 2 e I , such that VarðyÞ ¼ σ 2 g K þσ 2 e I . The EMMA technique can then be used to estimate the variance components σ 2 g and σ 2 e as well as a matrix B (and its inverse) such that BB 0 ¼ H ¼ VarðyÞ Thereafter, for every marker (k) we can compute (as an EMMAX-type approximation) the full and reduced models as: for the full model, where B −1 (u full + ϵ full ) is assumed to be an error term proportional to the identity matrix, and as B −1 X c β krc + B −1 X i β kri + B −1 X k β rkp + B −1 (u reduced + ϵ reduced ) for the reduced model, where B −1 (u reduced + ϵ reduced ) is assumed to be an error term proportional to the identity matrix. To estimate the significance of the full versus reduced model, an F-test was performed; with all analyses utilizing the EMMAX method [28,29] (i.e., GWAAs, GxE GWAAs) produced and further evaluated by constructing Manhattan plots within SVS v8.8.2 (Golden Helix, Bozeman, MT). Moreover, while SVS explicitly computes the full model mentioned above and outputs all of its β values, it only performs an optimization of the reduced model computation, which is sufficient to determine the SSE of the reduced-model equation, and thereafter, estimate the full versus reduced model P-value via F-test. This optimization is used to solve: MB −1 y = MB −1 X k β rkp + ϵ MB , where M = (I − QQ′), and Q is derived from performing the QR algorithm, as QR = B −1 [X c | X i ]. All Gelbvieh QTL were defined by ≥ 2 SNP loci with MAF ≥ 0.01 (i.e., a lead SNP plus at least one additional supporting SNP within 1 Mb) which also met a nominal significance threshold (P ≤ 1e-05) [30,31], and all EMMAX marker-based pseudo-heritability estimates were produced as previously described [28-30, 99, 100].
Additional file 3. Summary of SNP panels used in analyses, including number of SNPs and individuals available before and after filtering.