Identification of candidate genes and enriched biological functions for feed efficiency traits by integrating plasma metabolites and imputed whole genome sequence variants in beef cattle

Feed efficiency is one of the key determinants of beef industry profitability and sustainability. However, the cellular and molecular background behind feed efficiency is largely unknown. This study combines imputed whole genome DNA variants and 31 plasma metabolites to dissect genes and biological functions/processes that are associated with residual feed intake (RFI) and its component traits including daily dry matter intake (DMI), average daily gain (ADG), and metabolic body weight (MWT) in beef cattle. Regression analyses between feed efficiency traits and plasma metabolites in a population of 493 crossbred beef cattle identified 5 (L-valine, lysine, L-tyrosine, L-isoleucine, and L-leucine), 4 (lysine, L-lactic acid, L-tyrosine, and choline), 1 (citric acid), and 4 (L-glutamine, glycine, citric acid, and dimethyl sulfone) plasma metabolites associated with RFI, DMI, ADG, and MWT (P-value < 0.1), respectively. Combining the results of metabolome-genome wide association studies using 10,488,742 imputed SNPs, 40, 66, 15, and 40 unique candidate genes were identified as associated with RFI, DMI, ADG, and MWT (P-value < 1 × 10−5), respectively. These candidate genes were found to be involved in some key metabolic processes including metabolism of lipids, molecular transportation, cellular function and maintenance, cell morphology and biochemistry of small molecules. This study identified metabolites, candidate genes and enriched biological functions/processes associated with RFI and its component traits through the integrative analyses of metabolites with phenotypic traits and DNA variants. Our findings could enhance the understanding of biochemical mechanisms of feed efficiency traits and could lead to improvement of genomic prediction accuracy via incorporating metabolite data.


Background
Feeding-related costs are the major expense in beef cattle enterprises, representing 55 % -75 % of total production costs [1][2][3]. Reducing feed inputs per unit of production could significantly improve profitability by 9 to 33 % in beef production [4]. Additionally, with the projected increase of the global population to 9.6 billion by the year 2050, the growing demand for beef is likely to put more pressure on already limited production resources such as water, land, fertilizers and labor [5]. Moreover, studies have shown that more feed efficient beef cattle consume less feed for the same amount of beef produced, and meanwhile, have a reduced methane emission [6]. Therefore, improvements in feed efficiency of beef cattle can increase producer profitability and simultaneously lower the environmental footprint of beef production.
Residual feed intake (RFI) is an important indicator of feed efficiency, which is usually defined as the difference between an animal's actual daily dry matter intake (DMI) and the expected daily DMI given the animal's average daily gain (ADG) and metabolic body weight (MWT) [7]. Currently, measuring individual animal feed intake to calculate RFI is a complex and expensive process. Numerous studies in beef cattle have revealed moderate to high heritability estimates (0.16-0.68) for RFI [8][9][10][11], and thus make RFI suitable for genetic/genomic selection of efficient beef cattle. Over the decades, genome-wide association studies (GWAS) have detected thousands of single nucleotide polymorphisms (SNPs) and hundreds of candidate genes associated with RFI in beef cattle [12][13][14][15]. However, cellular and molecular functions associated with transcriptomic, metabolomic and proteomic levels of omic data, and detailed knowledge regarding the biological processes involved in feed efficiency still remain largely unknown. Metabolites are substrates or products of metabolic processes and are the results of combined endogenous and exogenous production [16], thus metabolites are considered as intermediate phenotypes between the genomic (base) and phenotypic (top) levels [16]. Integration of metabolomic data into feed efficiency studies could help reveal the relationship between animal genetics and physiological phenotypes (i.e. RFI and its component traits), thereby increasing the fundamental understanding of biological functions related to feed efficiency and improving genetic/genomic selection efficacy in beef cattle. Therefore, the objective of this study was to use metabolites as intermediate phenotypes to study genes and biological functions/processes related to feed efficiency in beef cattle. In this study, feed efficiency data were collected from a beef cattle population consisting of 493 crossbred bulls, heifers, and steers. Thirty-one metabolites and their concentration levels (µM) were quantified from plasma of these animals on the first day of feedlot tests. Linear regression models were applied to identify metabolites associated with RFI and its component traits (DMI, ADG, and MWT). Whole genome sequence variants were imputed and used in metabolome-genome wide association studies (mGWAS) to identify significant SNPs for trait associated metabolites. Candidate genes were mapped based on significant SNPs and gene functional enrichment analyses were subsequently performed on candidate genes of each trait to predict biological functions/processes associated with feed efficiency in beef cattle.

Associations between feed efficiency traits and metabolites
Of the 31 metabolites analyzed, 11 were found to be significantly associated with the feed efficiency traits (Pvalue < 0.1) and the results of regression analyses are shown in Table 1. Among the significantly associated metabolites with each trait, ten metabolites showed Pvalues less than 0.05, and four metabolites (choline for DMI, glycine, citric acid, and dimethyl sulfone for MWT) showed P-values ranging from 0.05 to 0.1 (0.09, 0.05, 0.06, and 0.09, respectively). At P-values less than 0.1, five metabolites, including L-valine, lysine, Ltyrosine, L-isoleucine, and L-leucine, were significantly associated with RFI, accounting for 5.90 % of the The unit of metabolite concentration is µM 3 The significance level of regression analysis is P-value < 0. phenotypic variance in RFI. Lysine, L-lactic acid, Ltyrosine, and choline were significantly associated with DMI, and these four metabolites accounted for 4.04 % of phenotypic variance in DMI. Of note, lysine and Ltyrosine were significantly associated with both RFI and DMI. Citric acid was the only metabolite that was significantly associated with ADG and accounted for 0.93 % of phenotypic variance in ADG. Four metabolites, Lglutamine, glycine, citric acid, and dimethyl sulfone, were significantly associated with MWT and accounted for 3.39 % of phenotypic variance of MWT.

Significant SNPs and candidate genes associated with metabolites
Heritability estimates of 11 metabolites associated with feed efficiency traits were calculated (Additional file 1: Table S1). However, these estimates had large standard errors that may result from the limited number of animals utilized in this study (n = 493). Thus, these estimates could be used as reference information and further study may be warranted. Metabolome-genome wide association studies were performed for the 11 metabolites associated with the feed efficiency traits. The range of P-value and allele substitution effect of significant SNPs, the range and average of proportion of metabolite phenotypic variance explained by each significant SNP, and the number of quantitative trait loci (QTLs) and candidate genes identified for each metabolite are summarized in Table 2. The details of the significant SNPs, including SNP position, allele substitution effect of each SNP, P-value and percentage of metabolite phenotypic variance explained by each SNP, for the 11 metabolites associated with feed efficiency traits are provided in Additional file 2. The candidate genes identified within a 140-kbp window of each significant SNP are shown in Additional file 3. In summary, 40, 66, 15 and 40 unique candidate genes were identified as related to RFI, DMI, ADG, and MWT, respectively (Table 3). Besides, 24 candidate genes were overlapped for RFI and DMI, 15 candidate genes were overlapped for ADG and MWT and 1 gene was common between DMI and MWT (Additional file 1: Table  S2 and Additional file 4: Fig. S1).

Significantly enriched biological functions and gene networks for feed efficiency traits
Of the 40, 66, 15, and 40 unique candidate genes, 39, 65, 15, and 39 genes for RFI, DMI, ADG, and MWT were mapped to the IPA database for functional enrichment analyses, respectively. In summary, 24, 25, 18, and 28 significant cellular and molecular functions were identified for RFI, DMI, ADG, and MWT (P-value < 0.05), respectively as presented in Additional file 1: Table S3-S6. The top five enriched cellular and molecular functions with corresponding candidate genes for each feed efficiency trait are shown in Table 4. Of the top five enriched cellular and molecular functions, lipid metabolism was the biological function with the lowest P-value for DMI and also significantly associated with RFI and MWT (Additional file 1: Table S4 and Table S7). Molecular transport was one of the top five biological The unit of metabolite concentration is µM 2 The P-value range (minimum to maximum) of significant SNPs, the significance level is P-value < 1 × 10 −5 3 β range the range of allele substitution effect of each significant SNP 4 V SNP /V P range the range of metabolite phenotypic variance explained by each significant SNP (%) 5 V SNP /V P mean the average of metabolite phenotypic variance explained by each significant SNP (%) 6 No. of QTLs the number of QTLs identified for each metabolite 7 No. of genes the number of candidate genes identified for each metabolite functions associated with DMI, ADG, and MWT. Small molecule biochemistry and nucleic acid metabolism were two top biological functions associated with both DMI and MWT. Among all significant biological functions, 15 biological functions were common for all four feed efficiency traits, and other biological functions shared among different feed efficiency traits are shown in Additional file 1: Table S7 and Additional file 4: Fig. S2. Additionally, in order to gain insight into potentially important biological functions, gene networks of lipid metabolism and carbohydrate metabolism were investigated and constructed through IPA. Within the lipid metabolism function for DMI, 16 candidate genes (ACACB, ADGRF5, ALDH3B1, AQP9, CCDC80, CHKA, CPT1A, DAB1, DDX5, HNF1A, IGHMBP2, LRP5, NOS1, PLSCR1, PVALB, and SSPN) were involved (Fig. 1). The lipid metabolism included nine subfunctions which were concentration of fatty acid, concentration of lipid, concentration of phosphatidylcholine, concentration of triacylglycerol, fatty acid metabolism, cholesterol metabolism, synthesis of fatty acid, synthesis of lipid, and transport of lipid (Fig. 1). Interestingly, seven genes (ACACB, AQP9, CCDC80, CHKA, CPT1A, HNF1A, and LRP5) involved in the lipid metabolism were also involved in the carbohydrate metabolism for DMI, which engaged three subfunctions including oxidation of Dglucose, concentration of D-glucose, and quantity of carbohydrate (Fig. 2).

Discussion
The role of metabolites in variation of feed efficiency traits Variation in RFI and its component traits could represent differences among animals in terms of metabolic process activity. For example, a study has shown that low RFI steers tend to have more efficient metabolic process activity and are able to meet their maintenance requirement with less energy intake than high RFI steers [17]. Blood is the major highway for absorption and transportation of nutrients to the different organs and tissues, and metabolites carried by blood are directly involved in metabolic processes as substrates or products, making blood metabolites prime candidates for further studies of feed efficiency in beef cattle. Additionally, some blood metabolites have the potential to serve as biomarkers for selection of efficient beef cattle [18,19].
In this study, 5 (L-valine, lysine, L-tyrosine, Lisoleucine, and L-leucine), 4 (lysine, L-lactic acid, Ltyrosine, and choline), 1 (citric acid) and 4 (L-glutamine, glycine, citric acid, and dimethyl sulfone) plasma metabolites were identified to be associated with RFI, DMI, ADG, and MWT, respectively (Table 1). Individual metabolites accounted for 0.59-1.50 % of the total phenotypic variance of RFI and its component traits. The results suggest that the feed efficiency traits could be associated with many metabolites with small effects. However, the identified metabolites associated with the feed efficiency traits in this study may require validation in independent beef cattle populations especially as a more relaxed threshold (P-value < 0.1) was used. Furthermore, we would like to highlight that only 31 metabolites were detected by the targeted method of NMR used in the current study. We therefore recommend that metabolomic profiles with more metabolites should be investigated in future with larger samples in order to identify more metabolites that are associated with RFI or its component traits.
To date, several metabolomic studies have attempted to identify relationships between serum or plasma The unit of metabolite concentration is µM metabolite levels and RFI in beef cattle [18,20,21]. We found good agreement between the results from those studies and the current study. In the current study, valine, lysine, tyrosine, and leucine showed higher concentrations in beef cattle with high RFI than those with low RFI. In line with our results, Karisa et al. [18] and Foroutan et al. [20] observed higher concentrations of valine, lysine, and tyrosine in beef cattle with high RFI as compared to those with low RFI. Similarly, Jorge-Smeding et al. reported that concentrations of valine and lysine were decreased in heifers with low RFI [21]. Additionally, Foroutan et al. reported the concentration of leucine was higher in high-RFI beef cattle [20], which is consistent with our results. The consistency of results from different studies suggests that these metabolites have the potential to be used as biomarkers for feed efficiency.
It is worth noting that, the three metabolites (isoleucine, leucine, and valine) associated with RFI are three essential branched-chain amino acids. These three metabolites share the first enzymatic steps in their oxidative pathways, including a reversible transamination followed by an irreversible oxidative decarboxylation to coenzyme-A derivatives [22]. The respective oxidative pathways subsequently diverge and at the final steps yield acetyl-and/or propionyl-CoA that enter the tricarboxylic acid cycle (TCA cycle) [22]. For animals, the TCA cycle is the main energy producing (mainly from carbohydrates and fatty acids) metabolic pathway [23], and some of the processes of the TCA cycle pathway have been previously reported to be associated with feed efficiency in beef cattle [18] and pigs [24]. Additionally, in this study, citric acid was the only metabolite that was significantly associated with ADG and was overlapping for ADG and MWT. Citric acid is an important intermediate in the TCA cycle [23] indicating a potential relationship between the TCA cycle related metabolic processes and feed efficiency traits. Interestingly, two other metabolites (lysine and Ltyrosine) were identified as associated with both RFI and DMI in this study. In the current study, we observed that The P-value range (minimum to maximum) of significant biological functions, the significance level is P-value < 0.05 the concentrations of lysine and L-tyrosine were significantly positively correlated (r = 0.29, P-value < 0.001). A previous study reported a higher positive correlation (r > 0.75, P-value < 0.001) between lysine and tyrosine [24]. The association of lysine and L-tyrosine with both RFI and DMI could be due to the significant positive correlation between lysine and tyrosine and the fact that RFI has a high and positive genetic correlation with DMI (r g = 0.66±0.11 to 0.75±0.10) [11,25]. Furthermore, lysine and tyrosine were reported as important amino acids involved in some important metabolic processes in beef cattle, such as amino acid metabolism and urea cycle [21], further supporting them as potential biomarkers for feed efficiency traits.
Candidate genes, enriched molecular functions and gene networks for feed efficiency traits In this study, we identified 40, 66, 15, and 40 unique candidate genes as related to RFI, DMI, ADG, and MWT respectively via integrative analyses of regression analyses and mGWAS (Table 3). In a previous study, Zhang et al. performed GWAS based on imputed whole genome sequence variants for RFI, DMI, ADG, and MWT using 7,500 beef cattle and reported 596, 268, 179, and 532 candidate genes for RFI, DMI, ADG, and MWT, respectively [12]. Comparing their results with those in this study, we found 10, 23, 6, and 7 candidate genes in common between the two studies for RFI, DMI, ADG, and MWT, respectively (Additional file 1: Table  S8). These overlapping genes indicated that metabolites are potentially important intermediate phenotypes between candidate genes and feed efficiency traits. Additionally, results from our study provide more knowledge and better understanding of how the previously identified candidate genes exert their influence on the variability of RFI and its component traits via intermediate phenotype metabolites. For instance, Zhang et al. reported that some genes were associated with more than one trait such as, ADGRF1 and ADGRF5 which were associated with both RFI and DMI [12]. However, the potential mechanism of how these genes could influence the two traits remained unclear. According to the results of the current study, these two genes were both associated with L-tyrosine as a common metabolite which was associated with RFI and DMI (Table 3). Similarly, according to Zhang et al., SLC28A3 was associated with ADG and MWT [12], and our results showed this gene was associated with citric acid as a common metabolite which was associated with ADG and MWT (Table 3). Interestingly, Zhang et al. identified ADGRF1, ADGRF5, GTPBP8, and NEPRO as associated with both RFI and DMI [12] and the same genes for RFI and DMI were identified in the current study. However, the results of this study indicated that the molecular background of these associations might be different. L-tyrosine might explain the associations of ADGRF1, ADGRF5 with RFI and DMI, because we identified that ADGRF1 and ADGRF5 were associated with L-tyrosine which was a metabolite associated with both RFI and DMI. As for GTPBP8 and NEPRO, both genes were associated with another common metabolite called lysine that was identified to be associated with both RFI and DMI in the current study. Additionally, we observed that certain genes might be associated with the same feed efficiency trait through different metabolites. For example, SHROOM3 was associated with L-valine and L-leucine and these two metabolites were associated with RFI (Table 3). Our study also showed that certain genes could be associated with different feed efficiency traits through different metabolites. For example, AQP9 was associated with DMI and MWT through L-lactic acid and glycine, respectively (Table 3). Therefore, our integrative analyses of feed efficiency traits, metabolites, and whole genome sequence variants will enhance our understanding on genetic influence of feed efficiency traits in beef cattle. Some candidate genes identified for feed efficiency traits in the current study have been reported in our previous transcriptomic studies involving animals related to those used in the current study [26,27]. For instance, CCDC80 was reported as a differentially expressed gene between beef steers with divergent RFI [26]. Additionally, CCDC80, CUX2, and ALDH3B1 were differentially expressed in the liver of beef steers for DMI, and SERPINE3 was a differentially expressed gene for ADG [27]. Our current study identified the same genes associated with these traits through integrating metabolites (Table 3). Indeed, CCDC80, CUX2, ALDH3B1, and SERPINE3 were associated with lysine, L-lactic acid, choline, and citric acid, respectively. Therefore, our results potentially provide further insight into how these differentially expressed genes affect the feed efficiency traits in beef cattle. It is worth noting that CUX2 has also been reported to be associated with DMI in the American [13] and Canadian beef population [12]. Therefore, these genes identified as associated with the same feed efficiency traits using genomic, transcriptomic and metabolomic data suggest the importance of these genes in influencing feed efficiency traits in beef cattle. Furthermore, some differentially expressed genes may affect RFI by influencing metabolites associated with its component traits (DMI, ADG, and MWT). For example, TCIRG1, AMN, and AQP9 were reported as differentially expressed genes in high-and low-RFI beef cattle [28,29] and these three genes were identified to be respectively associated with DMI, ADG, and MWT through different metabolites in this study.
Identification of enriched molecular processes, pathways and gene networks associated with feed efficiency traits using candidate genes from these different omics studies shed some light on underlying biological mechanism and gene interactions for complex traits. For the five topmost biological functions associated with RFI in the current study, cellular assembly and organization, cell morphology, cellular function and maintenance, and molecular transport were four biological functions that overlapped with the five topmost biological functions reported by Zhang et al. for RFI [12]. Lipid metabolism, small molecule biochemistry, and nucleic acid metabolism were three common top biological functions for DMI in the two studies. Lipid metabolism and small molecule biochemistry were also identified as two of the five topmost biological functions in our previous transcriptomic study for DMI in beef cattle [27]. Molecular transport, small molecule biochemistry, and cell morphology were three overlapping top biological functions for MWT in Zhang et al. [12] and in this study. These three biological functions were also top biological functions for MWT in our previous transcriptomic study [27]. For ADG, cell-to-cell signaling and interaction was a common top biological functions in Zhang et al. [12], Mukiibi et al. [27] and in the current study. Our results and those reported by previous studies indicated the overlapping top five biological functions have a potentially important relationship with feed efficiency traits in beef cattle. These important functions could further help to prioritize candidate genes and related functional SNPs associated with phenotypes.
Additionally, we would like to note that attention should be paid to nutrient or energy metabolic processes, such as lipid metabolism, since several studies have reported its potential role in feed efficiency related to DMI and RFI [11,12,26,27,[29][30][31][32][33][34]. Nkrumah et al. [34] and Mao et al. [11] reported that more efficient beef cattle tended to have less backfat and slightly less marbling. Transcriptomic studies reported that more efficient beef cattle were associated with differentially expressed genes related to reducing lipid metabolism in liver [26,27], implying an important relationship between lipid metabolism and feed efficiency. Weber et al. identified differentially expressed genes in multiple tissues (pituitary, skeletal muscle, liver, visceral adipose, and duodenum) of beef cattle with divergent RFI, and their pathway analyses showed that many of the differentially expressed genes were involved in the immune system and fat metabolism [29]. In this study, lipid metabolism was the most significant biological functions for DMI and also significantly associated with RFI and MWT. Lipid metabolism was identified as one of the top biological functions for ADG in previous studies [12,27] but it was not shown in the current study, which is likely due to limitations of relatively small number of metabolites analyzed. In addition, of the candidate genes identified for the metabolites, there is limited knowledge on how candidate genes influence the respective plasma metabolite levels. For instance, enzyme choline kinase alpha is encoded by CHKA [35]. In the biosynthesis pathway of phosphatidylcholine, the enzyme can catalyze the phosphorylation of choline to phosphocholine [36,37]. However, little is known on how concentrations of choline vary among animals due to their gene variants. Nevertheless, our integrative study of feed efficiency, blood metabolites, and DNA variants has provided additional insight into relationships between gene functionalities, metabolites, and feed efficiency traits, which may help develop strategies to enhance genomic prediction of feed efficiency traits with incorporation of metabolite data.

Conclusions
This study combined genomic, metabolomic and phenotypic data to investigate molecules and biological functions/processes related to feed efficiency in beef cattle. Several plasma metabolites associated with RFI and its component traits were identified, and some of metabolites showed the potential to serve as biomarkers for feed efficiency in beef cattle. Multiple candidate genes were identified as associated with RFI and its component traits based on the results of regression analyses between feed efficiency traits and metabolites, and mGWAS. Gene functional enrichment analyses indicated that lipid metabolism may have an important role in feed efficiency. Our findings showed good consistency with previous metabolomic studies and GWAS studies for feed efficiency and also added more information regarding biological mechanisms of feed efficiency. Therefore, this integrative method could enhance the understanding of genetic influence, metabolites and biological functions/processes involved in feed efficiency traits, which could lead to improvement of genomic prediction accuracy via incorporating metabolite data.

Methods
Animal population, data collection of feed efficiency traits and metabolites All animals in this study were cared for according to the guidelines of the Canadian Council on Animal Care (1993). The population of animals was obtained from the Phenomic Gap Project that aimed to generate phenotypes of feed efficiency, carcass and meat quality as well as genomic data for Canadian crossbred beef animals [38]. Details of animal management, the herd, and animal breeds were previously described [12,[39][40][41]. In summary, the population used in this study consisted of 493 crossbred bulls (n = 93), heifers (n = 125) and steers (n = 275) that were born between 2002 and 2011. These animals were from five different commercial herds and they were tested in feedlots from 2003 to 2012 [38]. The major breed components were primarily Charolais (n = 73), Hereford-Angus crosses (n = 191) and a Beefbooster composite breed (predominantly Charolais-based, n = 229). The GrowSafe system (GrowSafe Systems Ltd., Airdrie, Alberta, Canada) was used to measure the feed intake of finishing calves at the feeding test station for a period of 76 to 112 days. Serial body weights (BW) in kg were measured for each animal at the beginning and end of the test and at approximately 14-day intervals during the test. The daily DMI in kg was calculated as an average of dry matter intake over the test period and further standardized according to the energy content of the diet. The initial BW in kg at the start of the feeding test and the ADG in kg were derived from a linear regression of the serial BW measurements against time (day) [11,12,34,42]. The MWT in kg was calculated as midpoint BW 0.75 while the midpoint BW was computed as the sum of the initial BW in kg and the product of ADG multiplied by half of the days on test [11,12,34,42]. The RFI in kg of DMI per day was computed as the difference between the standardized daily DMI and the expected DMI that was predicted based on animal ADG and MWT [7]. Blood samples were collected from all animals by jugular venipuncture in the early morning on the first day of feedlot tests and immediately frozen at -80°C for storage. These blood samples were used to quantify metabolites using nuclear magnetic resonance (NMR) spectroscopy. The procedure of metabolite quantification using NMR was previously described by Li et al. [39]. Thirty-one metabolites and their concentration levels (µM) were quantified from plasma. Blood samples were also used to extract DNA for genotyping using the Illumina BovineSNP50 v2 BeadChip (Illumina Inc., CA, USA).

SNP genotype imputation, quality control and population admixture analyses
Theoretically, a higher marker density could improve the power of GWAS to identify significant SNPs, therefore, the 50 K genotypes were imputed to whole genome sequence variants using Beagle 5.1 software [43]. The SNP imputation for animals used in this study was completed using a step-wise approach as described by Zhang et al. [12] and Wang et al. [44] based on the latest genome assembly ARS-UCD 1.2. After the imputation, 53,258,178 SNPs and indels (they are all termed SNPs for simplicity) on 29 autosomes were obtained. Quality control for imputed whole genome sequence variants was performed to exclude DNA variants based on the following criteria: SNPs on 29 autosomes that had an imputation accuracy < 0.95, minor allele frequency < 0.05, and failed to pass the Hardy-Weinberg equilibrium test (P-value < 0.0001). Finally, a total of 10,488,742 SNPs remained after quality control and were used in further analyses.
Breed composition of each animal was predicted based on the 50 K genotypes using ADMIXTURE software to account for population stratification [45,46]. In order to find the best possible number of ancestors or breeds (K value), a 5-fold cross-validation procedure was performed as described in Zhang et al. [12]. The breed composition prediction had the smallest cross-validation error when the value of K = 6. The most accurate breed composition was then obtained for each individual and presented in Additional file 1: Table S9.
Data consolidation, quality control for feed efficiency traits and metabolites The variation in feed efficiency traits and metabolites could be affected by multiple systematic effects. A linear regression model implemented in R statistical software was used to assess the significant systematic effects that were associated with feed efficiency traits or metabolites. Animal type (bull, heifer, steer), birth year, herd, feedlot pen, age at the feedlot test, and breeding composition were found to be the significantly associated factors for both the feed efficiency traits and metabolites (P-value < 0.05). Therefore, phenotypic values of the feed efficiency traits and metabolites were pre-adjusted for the above factors using liner regression models. Residuals with more or less than 3 standard deviations from the mean of residuals were considered as outliers and were excluded. Additional file 1: Table S10 provides descriptive statistics of phenotypic data on the feed efficiency traits and metabolites.

Regression analyses between feed efficiency traits and metabolites and metabolome-genome wide association studies
After quality control and pre-adjustment of phenotypic data, regression analyses were conducted to identify associations between four feed efficiency traits and thirtyone metabolites using R statistical software. A feed efficiency trait and a metabolite were considered to be significantly associated when a P-value < 0.1 of the regression analyses was observed. This step was intended to determine the relationship between feed efficiency traits and metabolites. The mGWAS (metabolome-genome wide association studies) were performed for metabolites that were significantly associated with the feed efficiency traits using the mlma (mixed linear model association) option as implemented in the GCTA package [47] based on the following linear mixed model: where y ij is the adjusted metabolite value of the ith animal with the jth SNP (i.e. the ijth animal), b j is the allele substitution effect of the jth SNP, x ij is the jth SNP genotype of animal i coded as 0, 1, 2 for genotypes A 1 A 1 , A 1 A 2 , and A 2 A 2 , respectively, a ij is the additive polygenic effect of the ijth animalÑ 0; Gσ 2 a À Á , and e ij is the random residual effectÑ 0; Iσ 2 e À Á . The genomic relationship matrix G was derived based on total filtered SNP markers (i.e. 10,488,742 SNPs) as described by Yang et al. [48], which is essentially the same as the second VanRaden formulation [49]. The same G matrix was used to estimate variance components and heritability of metabolites via restricted maximum likelihood (REML) as implemented in the GCTA package.
The SNPs with P-value < 1 × 10 −5 were classified to be significantly associated with the metabolite according to the recommendation of The Wellcome Trust Case Control Consortium [50]. The phenotypic variance of the metabolite explained by each significant SNP was calculated by 2pqβ 2 S 2 Ã100%, where p and q denote the SNP allele frequency of A 1 and A 2 , respectively; β is the SNP allele substitution effect that was estimated by generalized least square and the significance of SNP allele substitution effect was conducted via a generalized least square F-test as implemented in the GCTA package; 2pqβ 2 is the additive variance of the SNP, and S 2 is the phenotypic variance of the metabolite.

Identification of candidate genes and functional enrichment analyses for feed efficiency traits
To identify candidate genes for concentration of each metabolite, a 140-kbp window (70-kbp upstream and 70-kbp downstream) of each significant SNP was surveyed based on SNP annotation information from ARS-UCD 1.2 bovine genome assembly from the Ensembl BioMart database (accessed on 02 February, 2021). The 70-kbp was the chromosomal length within which a high linkage disequilibrium phase correlation (r 2 > 0.77) was maintained across a sample of Canadian beef cattle breeds [51]. Small nucleolar RNA and micro-RNA were excluded because we are interested in protein coding genes. Then candidate genes (Entrez gene IDs) of all metabolites that were associated with the feed efficiency traits (RFI, DMI, ADG, or MWT) as identified in the regression analyses were combined and imported into the Ingenuity Pathway Analysis software (accessed on 02 February, 2021) (IPA; www.Ingenuity.com) to predict the enriched biological functions and gene networks for feed efficiency traits. Biological functions were considered significantly enriched if the P-value for the overlap comparison test between the input gene list and the knowledge base of IPA for a given biological function was less than 0.05. In order to provide insight into cellular and molecular functions associated with feed efficiency traits, gene networks for some significant biological functions were constructed in IPA.
Additional file 1: Table S1. Additive genetic variances and heritability estimates for 11 metabolites based on the imputed whole genome sequence (WGS) variants in a beef cattle multibreed population; Table  S2. Uniquely common candidate genes for feed efficiency traits in a beef cattle multibreed population; Table S3. Enriched biological functions significantly associated with RFI in a beef cattle multibreed population; Table S4. Enriched biological functions significantly associated with DMI in a beef cattle multibreed population; Table S5. Enriched biological functions significantly associated with ADG in a beef cattle multibreed population; Table S6. Enriched biological functions significantly associated with MWT in a beef cattle multibreed population; Table S7. Uniquely common biological functions for feed efficiency traits in a beef cattle multibreed population; Table S8. The comparison of candidate genes between the current study and Zhang et al; Table S9. Genomic breed composition for animals; Table S10. Descriptive statistics of phenotypic data on feed efficiency traits and metabolites.
Additional file 2. This file contains a summary of SNPs significantly associated with the 11 metabolites.
Additional file 3. This file contains a summary of candidate genes for the 11 metabolites.
Additional file 4: Figure S1. Uniquely common candidate genes for feed efficiency traits in a beef cattle multibreed population; Figure S2.