
Transcriptome-based prediction for polygenic traits in rice using different gene subsets

Abstract

Background

Transcriptome-based prediction of complex phenotypes is a relatively new statistical method that links genetic variation to phenotypic variation. The selection of large-effect genes based on a priori biological knowledge is beneficial for predicting oligogenic traits; however, such a simple gene selection method is not applicable to polygenic traits because causal genes or large-effect loci are often unknown. Here, we used several gene-level features and tested whether it was possible to select a gene subset that resulted in better predictive ability than using all genes for predicting a polygenic trait.

Results

Using the phenotypic values of shoot and root traits and transcript abundances in leaves and roots of 57 rice accessions, we evaluated the predictive abilities of the transcriptome-based prediction models. Leaf transcripts predicted shoot phenotypes, such as plant height, more accurately than root transcripts, whereas root transcripts predicted root phenotypes, such as crown root length, more accurately than leaf transcripts. Furthermore, we used the following three features to train the prediction model: (1) tissue specificity of the transcripts, (2) ontology annotations, and (3) co-expression modules for selecting gene subsets. Although models trained by a gene subset often resulted in lower predictive abilities than the model trained by all genes, some gene subsets showed improved predictive ability. For example, using genes expressed in roots but not in leaves, the predictive ability for crown root diameter was improved by more than 10% (R2 = 0.59 when using all genes; R2 = 0.66, using 1,554 root-specifically expressed genes). Similarly, genes annotated as “gibberellic acid sensitivity” showed higher predictive ability than using all genes for root dry weight.

Conclusions

Our results highlight both the possibility and difficulty of selecting an appropriate gene subset to predict polygenic traits from transcript abundance, given the current biological knowledge and information. Further integration of multiple sources of information, as well as improvements in gene characterization, may enable the selection of an optimal gene set for the prediction of polygenic phenotypes.


Background

Genotype-to-phenotype prediction is a major challenge in genetics, particularly for polygenic traits controlled by many small-effect genetic variants. Genomic prediction [1] is one of the most widely used statistical methods for predicting polygenic traits. Genomic prediction has successfully predicted genetic variations in different phenotypes of many species, as summarized in previous review articles [2, 3]. Popular statistical models in genomic prediction, such as GBLUP or Bayesian alphabet models [4], simply assume the additive effects of genetic variants. Although dominance or epistatic effects can be included in the statistical model, in addition to additive effects [5], improvements in predictive ability depend on the prediction target trait [6, 7] and are not always large.

Transcriptome-based prediction is another statistical approach that may be able to exploit the non-additive and interaction effects of causal genes to predict complex traits, as transcript abundance is often determined by complex cis and trans regulations caused by multiple DNA variants [8]. Although transcriptome-based predictions have not been investigated as extensively as genomic predictions, promising predictive abilities have been reported for multiple crop species. Starting from an early study using expression data quantified by microarrays for 21 maize parental inbred lines [9], transcriptome-based prediction has been evaluated for the prediction of hybrid performance, not only in maize [10, 11] but also in rice [12] and oilseed rape [13]. Recent studies have applied transcriptome-based predictions to inbred lines, showing improved predictive ability using transcripts in addition to DNA variants [14,15,16]. While these studies in crop science mainly used linear models for prediction, machine learning algorithms have also been widely applied to predict phenotypes from transcripts, particularly in systems biology or medical science [17]. Moreover, the use of transcripts makes it possible to borrow information from different species. For example, predictive ability for nitrogen use efficiency among maize cultivars was improved when using (homologous) genes that responded to nitrogen treatments not only in maize but also in Arabidopsis [18]. In summary, the transcriptome is not just a new type of omics data for predicting complex traits; it may also enable us to exploit available biological and/or evolutionary features of genes to improve predictive ability.

When causal variants or genes are known a priori for the prediction target phenotype, the predictive ability can be improved by including their effects in the prediction model. In genomic prediction, predictive ability can often be improved by using large-effect variants as fixed covariates in the prediction model [19, 20]. Similarly, in transcriptome-based predictions, the predictive abilities of tocochromanols in maize grains were improved by selectively using the transcript abundance of some causal genes in addition to genome-wide DNA variants [21].

In contrast to oligogenic traits, for which causal or large-effect variants are easily identifiable, large-effect variants or causal genes are often unknown or do not exist for polygenic traits. Therefore, we need a different approach to select a subset of DNA variants or genes. For genomic prediction, studies have proposed the use of genomic features to select a set of DNA variants. For example, genomic prediction models using SNPs in exons, gene-coding regions, and intergenic regions were built for the prediction of quantitative traits in chickens, showing that the best subset of SNPs depends on the prediction target trait [22]. Similarly, a previous study divided SNPs into two or three subsets according to their proximity to genes, recombination rate, chromatin openness, minor allele frequency, and genomic evolutionary rate profiling score to predict maize agronomic traits [7]. In contrast, gene subset selection in transcriptome-based prediction of polygenic traits has not been well studied. To date, one study in black poplar has compared the predictive abilities of transcriptome-based prediction using core and peripheral genes in a co-expression network for various polygenic traits, showing lower predictive abilities of core genes compared with those of peripheral genes [23], and a subsequent study showed the importance of trans-eQTLs compared to cis-eQTLs for predictive abilities [16]. These studies imply that the predictive ability of transcriptome-based prediction can be improved by selecting an appropriate gene subset. However, at present, gene subset selection methods for transcriptome-based predictions have not been comprehensively investigated.

We used a set of rice accessions with high genetic diversity to investigate the methods for selecting gene subsets for transcriptome-based predictions. As rice is one of the most intensively studied crop species, multiple types of ontologies have been developed, including gene ontology (GO), trait ontology (TO), and plant ontology (PO), which are available at RAP-DB (https://rapdb.dna.affrc.go.jp/index.html) [24] and Oryzabase (https://shigen.nig.ac.jp/rice/oryzabase/) [25]. These ontology annotations enabled us to define gene subsets and test their predictive abilities in transcriptome-based predictions. To facilitate genetic studies, the World Rice Core Collection (WRC) was developed by the Genebank of the National Agriculture and Food Research Organization of Japan [26]. WRC accessions were recently resequenced [27], and their phenotypes and transcript abundances were quantified under an upland field condition [28]. Thus, three types of data–genome, transcriptome, and phenotype–are available for the WRC panel. Although we conducted an association study using this dataset [29], neither genomic prediction nor transcriptome-based prediction has been evaluated using this dataset to date. As the transcript abundance was quantified in both leaves and roots, we could characterize genes by the tissue specificity of their transcripts. Some genes were specifically expressed in leaves or roots, while others were commonly expressed in both tissues. Because our phenotype data include both root and shoot phenotypes, we can test, for example, whether root-specifically expressed genes show higher predictive ability for root phenotypes. Therefore, our WRC dataset provides a good opportunity to evaluate the performance of transcriptome-based prediction using multiple gene-level features.

In this study, using the transcriptome data for leaves and roots quantified in WRC accessions, we defined gene subsets based on three gene-level features: tissue specificity of transcripts, ontology annotation, and co-expression modules. The predictive abilities of transcriptome-based prediction using different gene subsets were evaluated for both shoot and root phenotypes, enabling us to discover some gene subsets that showed higher predictive ability than using all genes for a prediction target phenotype.

Methods

Phenotype, genotype, and transcriptome datasets

All the datasets used in this study have been published in our previous studies [27,28,29], and the details of each experiment are described in the respective studies. Below, we provide some essential properties of the datasets that are directly related to this study.

All statistical analyses were performed on the 57 rice accessions from the WRC (Supplementary Table S1). These accessions were evaluated in an upland field at the Institute of Crop Science, National Agriculture and Food Research Organization (Ibaraki, Japan; 36.0289 °N, 140.0997 °E) from June 5th to August 1st in 2018, including phenotyping and tissue sampling to quantify transcript abundance using RNA-seq. We used three aboveground phenotypes: plant height (PH), tiller number (TN), and shoot dry weight (SDW) and six root phenotypes: crown root length (RL_C), lateral root length (RL_L), crown root diameter (RD_C), lateral root diameter (RD_L), root dry weight (RDW), and ratio of deep rooting (RDR). Details of the phenotyping methods have been described in our previous studies [28,29,30], and the phenotypic values are publicly available as a supplementary file in the previous study [28].

The transcriptome data acquisition and processing pipeline has been described in our previous studies [28, 29]. For each accession, leaves and root tips were sampled from a plant in each of the three-block replicates in the field, and equal amounts of the three RNA samples were pooled before RNA-seq library preparation. Read count matrices of 55,986 genes in the leaves and roots were generated for the 57 accessions. For each read count matrix, fragments per kilobase of exon per million reads (FPKM) values were calculated based on the trimmed mean of M values normalization in the {edgeR} package version 3.38.1 [31]. Following the calculation pipeline in our previous studies [28, 29], the log2(FPKM + 1) value was calculated and defined as the expression level, and a gene was regarded as expressed in a sample if the expression level was equal to or greater than 1. In this study, we removed genes expressed in none of the 57 accessions and retained 21,146 and 22,338 genes in the leaves and roots, respectively.
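The expression-level definition and the filtering rule above can be sketched as follows. This is a minimal plain-Python illustration, not the edgeR-based pipeline used in the study; the function names, gene IDs, and FPKM values are hypothetical.

```python
import math

EXPRESSION_THRESHOLD = 1.0  # expressed when log2(FPKM + 1) >= 1


def expression_level(fpkm):
    """Return log2(FPKM + 1), the expression level defined in the text."""
    return math.log2(fpkm + 1.0)


def is_expressed(fpkm):
    """True when the expression level reaches the threshold of 1."""
    return expression_level(fpkm) >= EXPRESSION_THRESHOLD


def drop_unexpressed_genes(fpkm_by_gene):
    """Keep genes that are expressed in at least one sample.

    fpkm_by_gene maps a gene ID to a list of FPKM values, one per accession.
    """
    return {
        gene: values
        for gene, values in fpkm_by_gene.items()
        if any(is_expressed(v) for v in values)
    }


# Toy data: hypothetical gene IDs and FPKM values for three accessions.
toy = {
    "geneA": [0.0, 0.4, 0.9],   # never reaches the threshold, so it is removed
    "geneB": [0.2, 1.0, 5.0],   # expressed in two accessions
    "geneC": [3.0, 7.0, 12.0],  # expressed in all accessions
}
kept = drop_unexpressed_genes(toy)
```

Because log2(FPKM + 1) >= 1 is equivalent to FPKM >= 1, the same rule can be read directly on the FPKM scale.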

The genotype dataset used in this study is identical to that used in a previous association study [29]. The original sequence data of the 57 WRC accessions were obtained in a resequencing study [27]. We further filtered the variants based on minor allele frequency and pruned them based on pairwise linkage disequilibrium, retaining 427,751 SNPs for the 57 accessions to perform statistical analyses [29]. Because filtering and pruning sometimes affect the predictive ability of the genomic prediction model, we first tested the predictive abilities of genomic prediction using different sets of SNPs (with/without indels, with/without filtering, and with/without pruning) and confirmed that the filtered and pruned sets of 427,751 SNPs resulted in the best or close to the best predictive abilities for each of the nine phenotypes (Supplementary Figure S1).

Transcriptome-based prediction: baseline models and validation method

Our first prediction analysis compared the genomic prediction model (model abbreviation: Gonly) with transcriptome-based prediction models using all genes expressed in leaves and/or roots. A genomic relationship matrix (GRM) was calculated from the 427,751 SNPs based on VanRaden’s first formula [32] using the A.mat function in the {rrBLUP} package version 4.6.3 [33]. Similarly, transcriptome-based relationship matrices (TRMs) were calculated with a linear kernel, as described in previous studies [15, 21], from the standardized expression matrices of the 21,146 and 22,338 genes expressed in leaves and roots, respectively (i.e., for each gene, the log2(FPKM + 1) values among the 57 lines were scaled to mean = 0 and variance = 1). The transcriptome-based prediction model used either one of the two TRMs (model abbreviations: “Lall” for leaf and “Rall” for root) or both (“Lall+Rall”). Each of these three models was evaluated without and with the GRM (the latter marked by a “G+” prefix in the model abbreviation), resulting in six transcriptome-based prediction models; together with Gonly, these form the seven baseline models summarized in Table 1. The effects of the four subpopulations (indica, aus, japonica, and admixed) were included in all the models as fixed effects. All prediction analyses, including those described in the following sections, were performed using the {EMMREML} package version 3.1 [34].
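A linear-kernel TRM of the kind described above can be sketched in a few lines. This is an illustration in plain Python rather than the R code used in the study; the division by the number of genes, the use of the sample variance (n - 1 denominator), and all function names are assumptions of the sketch, and the toy matrix is hypothetical.

```python
import statistics


def standardize_columns(expr):
    """Scale each gene (column) to mean 0 and variance 1 across lines.

    expr is a list of rows (one per line/accession); each row holds the
    log2(FPKM + 1) values of the genes. Assumption: sample variance.
    """
    n_genes = len(expr[0])
    columns = [[row[j] for row in expr] for j in range(n_genes)]
    scaled_columns = []
    for col in columns:
        mean = statistics.mean(col)
        sd = statistics.stdev(col)
        if sd == 0:
            raise ValueError("a constant gene cannot be standardized")
        scaled_columns.append([(v - mean) / sd for v in col])
    # Transpose back so that rows are lines and columns are genes.
    return [[col[i] for col in scaled_columns] for i in range(len(expr))]


def linear_kernel_trm(expr):
    """Return the n x n matrix Z Z^T / m for the standardized matrix Z."""
    z = standardize_columns(expr)
    n, m = len(z), len(z[0])
    return [
        [sum(z[i][k] * z[j][k] for k in range(m)) / m for j in range(n)]
        for i in range(n)
    ]


# Toy expression matrix: 4 lines x 3 genes (hypothetical values).
toy_expr = [
    [1.0, 2.0, 0.5],
    [2.0, 1.0, 1.5],
    [3.0, 4.0, 1.0],
    [4.0, 3.0, 3.0],
]
trm = linear_kernel_trm(toy_expr)
```

With this scaling the matrix is symmetric, each of its rows sums to zero (every gene is centered), and its trace equals n - 1, which gives a quick sanity check on any TRM built this way.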

Table 1 Summary of the seven baseline prediction models

The predictive ability was defined using the R2 statistic (the coefficient of determination, which is the squared correlation between the phenotypic and predicted values) evaluated by leave-one-out cross-validation (shown as our main results) and 10-fold cross-validation repeated 10 times (to evaluate stability of the predictive ability caused by random splitting of training and test sets; shown in supplementary files). When the correlation between the phenotypic and predicted values was negative, R2 was set to zero. This evaluation method was applied to all the prediction analyses, including those described below.
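The evaluation rule above (squared correlation, set to zero when the correlation is negative, computed on leave-one-out predictions) can be sketched as follows. The callback `fit_and_predict` is a hypothetical stand-in for whatever prediction model is being evaluated; it is not an interface from the packages used in the study.

```python
import math


def predictive_ability(observed, predicted):
    """Squared Pearson correlation, set to zero if the correlation is negative."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / math.sqrt(var_o * var_p)
    return r * r if r > 0 else 0.0


def leave_one_out(phenotypes, fit_and_predict):
    """Predict each line from a model trained on all the other lines.

    fit_and_predict(train_indices, test_index) is a hypothetical callback
    returning the predicted value for the held-out line.
    """
    n = len(phenotypes)
    predictions = []
    for test in range(n):
        train = [i for i in range(n) if i != test]
        predictions.append(fit_and_predict(train, test))
    return predictive_ability(phenotypes, predictions)


# Toy phenotypes (hypothetical) and a trivial training-mean "model".
y = [1.0, 2.0, 4.0, 7.0]


def training_mean(train, test):
    return sum(y[i] for i in train) / len(train)


loo_r2 = leave_one_out(y, training_mean)
```

The training-mean predictor shows why the truncation matters: under leave-one-out its predictions are perfectly negatively correlated with the observations, so the squared correlation alone would be 1, whereas the rule above reports 0.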

Gene subset selection based on tissue specificity of transcript abundance

Some genes are commonly expressed in both leaves and roots, whereas others are specifically expressed in the leaves or roots. Hypothesizing that the genetic variation of belowground phenotypes could be mainly explained by the genes specifically expressed in roots and vice versa for the aboveground phenotypes, we evaluated the predictive abilities of the transcriptome-based prediction models using tissue-specific or commonly expressed gene subsets.

Of the 21,146 genes expressed in at least one of the 57 accessions in the leaves, 11,604 genes were expressed in all 57 accessions, and 812 of them were not expressed in any of the 57 accessions in the roots (i.e., leaf-specific). Similarly, of the 22,338 genes expressed in at least one of the 57 accessions in roots, 13,647 genes were expressed in all 57 accessions, and 1,554 of them were not expressed in any of the 57 accessions in leaves (i.e., root-specific). Meanwhile, there were 9,460 genes expressed in all 57 accessions in both leaves and roots (i.e., commonly expressed). Using the above three subsets of genes, we built four transcriptome-based prediction models using the TRM calculated based on (1) the expression level of the 9,460 commonly expressed genes in leaves (Lcmn), (2) the expression level of the 812 leaf-specifically expressed genes in leaves (Llsp), (3) the expression level of the 9,460 commonly expressed genes in roots (Rcmn), and (4) the expression level of the 1,554 root-specifically expressed genes in roots (Rrsp). For each of the four TRMs, we evaluated transcriptome-based predictions with and without GRM. Therefore, a total of eight transcriptome-based prediction models were newly evaluated in this analysis (Table 2), and the predictive abilities of those eight models were compared with the predictive abilities of the four corresponding models using all genes (Lall, G + Lall, Rall, and G + Rall) described in the previous section.
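The three gene subsets defined above follow from simple counting rules, sketched here in plain Python. The input format (number of accessions in which each gene is expressed, per tissue), the function name, and the toy gene IDs are assumptions made for illustration.

```python
def classify_genes(leaf_expressed, root_expressed, n_accessions):
    """Split genes into leaf-specific, root-specific, and commonly expressed.

    leaf_expressed / root_expressed map a gene ID to the number of
    accessions in which the gene is expressed in that tissue.
    """
    genes = set(leaf_expressed) | set(root_expressed)
    leaf_specific, root_specific, common = set(), set(), set()
    for gene in genes:
        n_leaf = leaf_expressed.get(gene, 0)
        n_root = root_expressed.get(gene, 0)
        if n_leaf == n_accessions and n_root == n_accessions:
            common.add(gene)          # expressed in all accessions in both tissues
        elif n_leaf == n_accessions and n_root == 0:
            leaf_specific.add(gene)   # all accessions in leaves, none in roots
        elif n_root == n_accessions and n_leaf == 0:
            root_specific.add(gene)   # all accessions in roots, none in leaves
    return leaf_specific, root_specific, common


# Toy counts for 57 accessions (hypothetical gene IDs).
N = 57
leaf_counts = {"g1": 57, "g2": 57, "g3": 0, "g4": 30}
root_counts = {"g1": 57, "g2": 0, "g3": 57, "g4": 57}
leaf_sp, root_sp, common = classify_genes(leaf_counts, root_counts, N)
```

Genes such as `g4`, which are expressed in only some accessions in one tissue, fall into none of the three subsets under these rules.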

Table 2 Summary of the transcriptome-based prediction models using gene subset based on the expressed tissue

Gene subset selection based on ontology annotation

Gene annotation enables the extraction of a set of genes involved in a specific biological pathway or localized to a specific cellular component. If the prediction target phenotype has a strong biological relationship with one or more annotation terms, the predictive ability may be improved by selectively using the genes carrying those terms in the transcriptome-based prediction. Rice is one of the most intensively studied crop species, and three types of gene annotations (GO, PO, and TO) are available. Thus, we evaluated the predictive abilities of transcriptome-based prediction models trained on the gene subsets belonging to each ontology term.

Because our transcriptome data were originally mapped to gene models with MSU IDs, we needed to map the MSU IDs to other rice gene IDs to assign ontology annotations. GO terms for each RAP ID were listed in “IRGSP-1.0_representative_annotation_2024-01-11.tsv”, downloaded from RAP-DB. Similarly, TO and PO terms for each Oryzabase ID were listed in “OryzabaseGeneListAll_20240125010001.txt”, downloaded from Oryzabase. First, we converted the MSU IDs to RAP and Oryzabase IDs. Each MSU ID was converted to a RAP ID using the ID converter file “RAP-MSU_2024-01-11.txt” downloaded from RAP-DB, and the RAP ID was further converted to an Oryzabase ID using the annotation summary file “IRGSP-1.0_representative_annotation_2024-01-11.tsv”. We converted only the IDs with a one-to-one correspondence: when one MSU ID had more than one RAP ID or when one RAP ID had more than one Oryzabase ID, we did not convert the former ID to the latter. Using the converted RAP and Oryzabase IDs, we extracted gene subsets for each GO, PO, and TO term. There were 3,401 ontology terms in total (2,437 terms for GO, 340 terms for TO, and 624 terms for PO), but most had a small number of genes. To avoid testing so many ontology terms that the results would become difficult to interpret, we excluded ontology terms with fewer than 50 expressed genes in either the leaf or root transcriptome data from our prediction analysis. This left 186 terms to be analyzed (Supplementary Table S2). For each of the 186 ontology terms, genes belonging to the ontology term were selected as the gene subset to calculate the TRM for transcriptome-based prediction. For each phenotype, we selected the best model from the four previously evaluated combinations of TRM and GRM (L, G + L, R, and G + R) and trained the model using each gene subset instead of all expressed genes.
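The ID conversion and term filtering described above can be sketched as follows. This is a plain-Python illustration with hypothetical IDs and function names; it does not parse the actual RAP-DB or Oryzabase files. The filter reads the exclusion rule as requiring at least the minimum number of expressed genes in both tissues, which is one interpretation of the sentence above.

```python
from collections import defaultdict


def one_to_one(pairs):
    """Keep source IDs that map to exactly one target ID.

    pairs is an iterable of (source_id, target_id) tuples; a source ID
    with more than one distinct target is left unconverted.
    """
    targets = defaultdict(set)
    for source, target in pairs:
        targets[source].add(target)
    return {s: next(iter(t)) for s, t in targets.items() if len(t) == 1}


def retained_terms(term_to_genes, leaf_genes, root_genes, min_genes=50):
    """Return terms with at least min_genes expressed genes in both tissues."""
    kept = []
    for term, genes in term_to_genes.items():
        n_leaf = len(genes & leaf_genes)
        n_root = len(genes & root_genes)
        if n_leaf >= min_genes and n_root >= min_genes:
            kept.append(term)
    return sorted(kept)


# Toy mapping with hypothetical IDs.
msu_to_rap = one_to_one([
    ("MSU_1", "RAP_1"),
    ("MSU_2", "RAP_2"),
    ("MSU_2", "RAP_3"),  # ambiguous: MSU_2 is not converted
    ("MSU_3", "RAP_4"),
])

# Toy ontology terms; min_genes is lowered to 2 to suit the tiny example.
terms = {"term_x": {"RAP_1", "RAP_4"}, "term_y": {"RAP_1"}}
kept = retained_terms(
    terms,
    leaf_genes={"RAP_1", "RAP_4"},
    root_genes={"RAP_1", "RAP_4"},
    min_genes=2,
)
```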

Different gene subsets result in different predictive abilities, even if the subsets are chosen at random. This random variation in predictive ability caused by the sampling (selection) of genes needed to be evaluated, because hundreds of ontology terms with different gene subset sizes were compared in this ontology-based prediction analysis. To evaluate the random variation in the predictive abilities, we chose 50, 75, 100, 250, 500, 750, 1,000, 2,500, and 5,000 genes at random 1,000 times each and evaluated the predictive abilities of the same transcriptome-based prediction model for each phenotype. The top 1% predictive ability (i.e., the 10th highest R2 value of 1,000) was calculated for each subset size and each model, and the highest of these top 1% values among the nine subset sizes was used as the empirical threshold to declare a significant improvement in predictive ability.
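The empirical threshold described above reduces to a rank statistic per subset size followed by a maximum over sizes. The sketch below assumes the R2 values for the random subsets have already been computed; the function names are hypothetical, and the toy R2 values are synthetic numbers chosen only to make the arithmetic easy to follow.

```python
import random


def draw_random_subsets(gene_ids, size, n_draws, seed=0):
    """Draw n_draws random gene subsets of the given size (without replacement)."""
    rng = random.Random(seed)
    return [rng.sample(gene_ids, size) for _ in range(n_draws)]


def top_one_percent(values):
    """Return the value ranked at the top 1% (the 10th highest of 1,000)."""
    rank = max(1, len(values) // 100)
    return sorted(values, reverse=True)[rank - 1]


def empirical_threshold(r2_by_size):
    """Highest top-1% predictive ability across all subset sizes."""
    return max(top_one_percent(values) for values in r2_by_size.values())


# Synthetic R2 values standing in for 1,000 random subsets per size.
r2_by_size = {
    50: [i / 2000 for i in range(1000)],   # top 1% value: 990 / 2000
    250: [i / 1000 for i in range(1000)],  # top 1% value: 990 / 1000
}
threshold = empirical_threshold(r2_by_size)

# Drawing the subsets themselves (hypothetical gene IDs).
subsets = draw_random_subsets([f"gene{i}" for i in range(100)], size=50, n_draws=3)
```

Taking the maximum over subset sizes gives a single threshold per phenotype, so an ontology term is only declared significant if it beats the best random subsets of any size.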

Gene subset selection based on co-expression network

Co-expression network analysis divides genes into clusters with common biological or genetic features. Whereas a previous study investigated the differences in predictive abilities between core and peripheral genes in an estimated network structure [23], we evaluated the differences in predictive abilities among different co-expression modules.

The co-expression network was estimated for each leaf and root expression dataset using the standard pipeline of the weighted gene correlation network analysis (WGCNA) package version 1.72.5 [35]. First, we retained the genes expressed in more than 50% of the 57 accessions, leaving 15,138 and 16,901 genes in the leaf and root expression data, respectively. The similarity matrix was calculated using the optimal soft-threshold parameter (power = 5 for leaf and 6 for root dataset) determined using the pickSoftThreshold function. The hierarchical clustering was performed by the hclust function with method = “average” option. The generated tree (dendrogram) was divided into clusters using the cutreeDynamic function, and closely related clusters were merged using the mergeCloseModules function, setting all parameters to default values. This resulted in 17 and 11 co-expression modules (including the “gray” module, which is the set of unclustered genes) for leaf and root expression data, respectively (Supplementary Table S3). The transcriptome-based prediction was performed using the same method as the baseline models but using the genes belonging to each co-expression module instead of all expressed genes. The predictive abilities of the different modules were compared with one of the three baseline models: Gonly, Lall, or Rall (the model with the highest predictive ability for the target phenotype was used as the baseline). For some clusters showing better predictive abilities than the baseline model, GO enrichment analysis was performed using the enricher function in the {clusterProfiler} package version 4.10.0, with its default parameters [36, 37], using the converted RAP IDs and corresponding gene ontology terms described in the previous section. We set the false discovery rate adjusted (FDR-adjusted) P value < 0.10 as our threshold to declare a significant enrichment.
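The role of the soft-threshold power can be illustrated with a small sketch: raising the absolute correlation between two genes to the chosen power shrinks weak correlations much more than strong ones. The code below is plain Python, not the WGCNA implementation; the unsigned form |cor|^power is an assumption (the text does not state the network type), and the gene names and expression profiles are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def soft_threshold_adjacency(profiles, power):
    """Unsigned adjacency |cor(g, h)| ** power for every pair of genes."""
    genes = sorted(profiles)
    return {
        (g, h): abs(pearson(profiles[g], profiles[h])) ** power
        for g in genes
        for h in genes
    }


# Toy profiles across four accessions (hypothetical values).
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with g1
    "g3": [1.0, 3.0, 2.0, 4.0],  # correlation of 0.8 with g1
}
adjacency = soft_threshold_adjacency(profiles, power=5)
```

In this toy example, a correlation of 0.8 between `g1` and `g3` becomes an adjacency of 0.8 ** 5 (about 0.33) at power = 5, while the perfectly correlated pair keeps an adjacency of 1.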

Results

Transcriptome-based prediction using all genes expressed in leaves and roots

Aboveground phenotypes may be more accurately predicted using transcript abundance in the leaves, whereas belowground phenotypes may be more accurately predicted using transcript abundance in the roots. To test this hypothesis, the predictive abilities of the seven prediction models using different combinations of expression and SNP datasets, summarized in Table 1, were evaluated for the nine phenotypes (Fig. 1; Supplementary Figure S2; Supplementary Table S4 and S5). Predictive abilities (R2) approximately ranged from 0.3 to 0.6 for most of the phenotypes irrespective of the model, except for RDR (R2 = 0.01–0.10 depending on the models) and RD_L (R2 < 0.01 in all models). The low predictive abilities of RD_L and RDR may have been caused by the small phenotypic variations among the subpopulations, as visualized in our previous study [28].

Fig. 1

Predictive abilities of the genomic and transcriptome-based prediction using all expressed genes. Predictive abilities of the six transcriptome-based models using leaf and/or root expression data with/without genomic relationship matrix were compared to those of the genomic prediction model for the nine phenotypes. The model with the highest predictive ability for each phenotype is denoted by a star symbol

At least one of the six transcriptome-based prediction models outperformed the genomic prediction (Gonly) for each of the nine phenotypes (Fig. 1). Prediction using the root expression data resulted in high predictive abilities for belowground phenotypes, with the largest improvement in R2 observed for RD_C (R2 = 0.34 in Gonly, and R2 = 0.59 in Rall). Similarly, the predictive abilities of PH and SDW were improved using leaf expression data instead of, or in addition to, SNP genotype data. For example, the predictive ability for SDW was greatly improved using leaf expression data (R2 = 0.36 in Gonly and R2 = 0.54 in Lall). Intriguingly, the predictive ability of TN was highest when using root expression data (Rall model, R2 = 0.55), implying that tiller number is more closely related to transcript abundance in roots than in leaves.

Models using both leaf and root expression data (Lall+Rall and G + Lall+Rall) did not show the best predictive ability for any phenotype, suggesting the importance of selectively using an appropriate dataset rather than using all datasets. Given these results, we did not evaluate the transcriptome-based prediction models using both leaf and root expression data in subsequent analyses.

Transcriptome-based prediction using gene subsets based on tissue-specificity of transcript abundance

One may hypothesize that genes specifically expressed in roots may have stronger relationships with belowground phenotypes than commonly expressed genes in both leaves and roots (and vice versa for aboveground phenotypes with leaf-specifically expressed genes). A contrasting hypothesis is that most polygenic phenotypes, typically those related to plant size, such as SDW and RDW, are controlled by complex genetic interactions across all plant tissues. Therefore, we evaluated the predictive abilities of eight transcriptome-based prediction models using different subsets of genes with and without GRM, as summarized in Table 2.

The predictive abilities of the 12 transcriptome-based prediction models (eight models in Table 2 and four baseline models using all expressed genes) for the nine phenotypes are shown in Fig. 2 (Supplementary Figure S3; Supplementary Table S6 and S7). For seven of the nine phenotypes tested in this study, neither the tissue-specific nor the commonly expressed gene subsets improved the predictive ability. In particular, the use of a tissue-specific gene subset significantly decreased the predictive ability for PH (R2 = 0.36 in Lall, while R2 = 0.29 in Llsp), TN (R2 = 0.55 in Rall, while R2 = 0.46 in Rrsp), and RL_L (R2 = 0.55 in Rall, while R2 = 0.51 in Rrsp). This means that not only tissue-specifically expressed genes but also commonly expressed genes should be exploited for the prediction of polygenic phenotypes.

Fig. 2

Predictive abilities of the transcriptome-based prediction based on commonly expressed or leaf/root specifically expressed genes. Twelve transcriptome-based prediction models were evaluated using different sets of data for the nine phenotypes. For better visualization, models using the leaf expression data are shown in the top panel, while the models using the root expression data are displayed in the bottom panel. The model with the highest predictive ability for each phenotype is denoted by a star symbol. Results of the four baseline models (Lall, G + Lall, Rall, and G + Rall) are reiterated in this figure for comparison with the corresponding results shown in Fig. 1

In contrast, the predictive ability for RD_C was improved using the root-specifically expressed gene subset (R2 = 0.59 in Rall, while R2 = 0.66 in Rrsp). Although the predictive abilities were extremely low, the same improvement using root-specifically expressed genes was observed for RD_L (R2 < 0.01 in Rall, while R2 = 0.01 in Rrsp). Compared to the other phenotypes tested in this study, root diameter is thought to be a ‘local’ phenotype that might be less affected by the biological mechanisms in other tissues. This may explain why the Rrsp model performed the best for RD_C and RD_L.

Transcriptome-based prediction using gene subsets based on ontologies

If an ontology term has a close relationship with the prediction target phenotype, genes belonging to that term are expected to show a higher predictive ability than randomly chosen genes. We evaluated the predictive ability of the transcriptome-based prediction for nine phenotypes using each of the 186 gene subsets according to gene, trait, and plant ontologies, resulting in 9 × 186 = 1,674 predictive abilities, as shown in Supplementary Table S8 and S9. These ontology-based prediction results were compared with the prediction results obtained using randomly selected gene subsets (Supplementary Table S10 and S11).

The prediction results for the RDW are shown in Fig. 3 as a representative case, and the same figures for the other phenotypes are provided in Supplementary Figure S4. Overall, the predictive ability was improved by using more genes in the model, and the best top 1% predictive ability was observed when using a random set of 250 genes (R2 = 0.63; Supplementary Table S10). Two ontology terms “gibberellic acid sensitivity” (TO:0000166) and “phosphorus sensitivity” (TO:0000102) passed the empirical significance threshold for RDW. Considering the role of gibberellic acid in root development [38, 39] and the relationship between exogenous phosphorus availability and root growth [40, 41], this result seems reasonable.

Fig. 3

Predictive abilities of the transcriptome-based prediction using ontology-based gene subsets for root dry weight. Gray shaded area represents top and bottom 1% predictive abilities obtained from the different sizes of 1,000 random gene subsets, while the black curve represents the average of the 1,000 predictive abilities. The black horizontal dashed line represents the predictive ability using all expressed genes, while the red horizontal dashed line represents the highest top 1% predictive ability among the different sizes of random gene subsets. If an ontology term resulted in a higher predictive ability than the threshold shown by the red dashed line, its name is displayed in a colored box

In addition to the above two ontology terms for RDW, nine combinations of ontology terms and phenotypes showed significant improvements in the predictive ability (Table 3). For example, a transcriptome prediction model using transcription factor genes in leaves improved the predictive ability of PH and SDW. Although some well-known transcription factors affect shoot growth, such as OsSPL14 [42] and OsPIL1 [43], it is difficult to interpret this result because almost all biological processes are transcriptionally regulated to some extent.

Table 3 Ontology terms showing significantly improved predictive ability

Prior to this analysis, we hypothesized that root-related ontology terms would improve the predictive ability of root phenotypes. However, as shown in Table 4, models using gene subsets according to root-related ontology terms did not improve the predictive ability compared to the model using all genes in most cases. This suggests a limitation of the ontology in predicting genetic variation among diverse rice accessions using expression levels. For instance, gene functions are often specific to growth stages. In our root expression dataset, 198/845 (23.4%) genes having the “root” annotation (PO: 0009005) were not expressed (Supplementary Table S2), implying that those genes could be expressed at a different growth stage. Furthermore, some genes may have functional variations owing to their protein sequence and not transcript abundance. In short, ontologies do not guarantee a quantitative relationship between transcriptomic and phenotypic variation among genotypes, particularly in our analysis of diverse rice accessions using single-time-point RNA-seq data.

Table 4 Predictive ability using gene subsets according to the root-related ontology terms

Transcriptome-based prediction using gene subsets based on co-expression modules

Different co-expression modules have different biological or genetic characteristics. Therefore, when selecting a gene subset according to a co-expression module, the predictive abilities of transcriptome-based predictions may vary among the modules. Using the standard WGCNA pipeline, we estimated 17 and 11 modules for the leaf and root expression data, respectively. Using the genes in each module, the predictive ability of the transcriptome-based prediction models with and without the GRM was evaluated for the nine phenotypes. Thus, a total of (17 + 11) modules × 9 phenotypes × 2 models (with/without GRM) = 504 predictive abilities were evaluated in this analysis (Supplementary Table S12 and S13) and compared with the model using all genes.
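The size of this evaluation grid can be checked by enumerating it. In the sketch below, the phenotype codes are those listed under Abbreviations, whereas the module labels are placeholder indices, because only some module colors are named in the text.

```python
from itertools import product

# Nine target phenotypes (codes from the Abbreviations list)
phenotypes = ["PH", "TN", "SDW", "RL_C", "RL_L", "RD_C", "RD_L", "RDW", "RDR"]

# 17 leaf modules and 11 root modules, labeled with placeholder indices
modules = [("leaf", i) for i in range(1, 18)] + [("root", i) for i in range(1, 12)]

# Each module-phenotype pair is evaluated without and with the GRM
use_grm = [False, True]

grid = list(product(modules, phenotypes, use_grm))
n_models = len(grid)  # (17 + 11) * 9 * 2
```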

Among the 504 combinations of co-expression modules, target phenotypes, and inclusion/exclusion of GRM, only 39 showed improved predictive ability (Supplementary Table S7). There were five modules in the leaves (turquoise, brown, yellow, purple, and salmon) and seven modules in the roots (turquoise, blue, brown, green, black, red, and cyan) that improved the predictive ability for at least one phenotype when used in the transcriptome-based prediction model without GRM (Fig. 4). Of these, only two combinations improved R2 by 0.10 or more over the baseline model: the yellow module in the leaf for PH (R2 = 0.50; +0.13 from the baseline model) and the cyan module in the root for TN (R2 = 0.65; +0.10 from the baseline model). Similar results were obtained using the GRM (Supplementary Figure S5).
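The numbers in this paragraph imply two quantities that are not stated explicitly: the share of combinations that improved, and the baseline R2 of the two highlighted cases. The short calculation below derives both from the reported values only; because those values are rounded to two decimals, the implied baselines are approximate.

```python
N_COMBINATIONS = 504  # modules x phenotypes x with/without GRM
N_IMPROVED = 39       # combinations with improved predictive ability

share_improved = round(N_IMPROVED / N_COMBINATIONS * 100, 1)  # in percent

# Reported R2 and gain for the two highlighted module-phenotype combinations
reported = {
    ("leaf", "yellow", "PH"): {"r2": 0.50, "gain": 0.13},
    ("root", "cyan", "TN"): {"r2": 0.65, "gain": 0.10},
}

# Baseline (all genes) R2 implied by the reported values
implied_baseline = {
    key: round(vals["r2"] - vals["gain"], 2) for key, vals in reported.items()
}
```

Under these figures, fewer than one in ten combinations improved on the all-gene model.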

Fig. 4

Improvement of the predictive abilities of the transcriptome-based prediction based on different co-expression modules. This figure depicts the transcriptome-based prediction models without GRM. The predictive performance of each module and phenotype combination can be found in Supplementary Table S7

The yellow module estimated from the leaf expression data consisted of 1,121 genes. GO enrichment analysis revealed the enrichment of 11 GO terms (Supplementary Table S14), including ATP-dependent chromatin remodeler activity (GO: 0140658), zinc ion binding (GO: 0008270), and nucleic acid binding (GO: 0003676). Although these gene annotations can be associated with various plant phenotypes, we found a few genes with experimental evidence of their effects on plant height. For example, one of the 14 genes with the ‘ATP-dependent chromatin remodeler activity’ annotation in the yellow leaf module was OsINO80 (LOC_Os03g22900), and its knockout line showed reduced plant height due to a reduction in endogenous gibberellic acid [44]. Likewise, one of the 52 genes with the ‘zinc ion binding’ annotation in this module was SDG725 (LOC_Os02g34850), and its downregulation caused a dwarf phenotype [45]. In addition, two (semi-) dwarf genes, ASD1 [46] and OsDSL1 [47], were included in this module.

The cyan module estimated from the root transcriptome data is a relatively small gene set consisting of 228 genes expressed in the roots. Although we applied GO enrichment analysis, all four ontology terms enriched at a 10% FDR cutoff contained fewer than ten genes (Supplementary Table S15), which was too few for meaningful interpretation. Therefore, we examined the list of manually curated publications in RAP-DB for all 228 genes, but none of the genes were mentioned in published studies showing an effect on tiller number.

Discussion

The predictive ability of transcriptome-based predictions may be improved by selecting an appropriate gene subset; however, clear methods for selecting genes, particularly for predicting polygenic traits, have not been established. Therefore, in this study, we performed transcriptome-based predictions on the WRC dataset using three types of gene-level features to select gene subsets: tissue specificity of transcripts, ontologies, and co-expression modules. The results based on the tissue specificity of transcripts were biologically reasonable: root diameter was more accurately predicted by using root-specifically expressed genes than by using all genes, although most phenotypes were most accurately predicted by using all genes. In contrast, the results based on gene ontologies illustrated the difficulty of predicting polygenic traits. We expected that genes with root-related ontology terms would show high predictive ability for root phenotypes, but our results did not support this simple idea. Similarly, the results obtained using the co-expression modules were not easily interpretable. Overall, we found it difficult to select an appropriate gene subset for polygenic traits given current biological knowledge and information. Below, we summarize the lessons and challenges from our analyses and discuss possible ways to improve gene subset selection for transcriptome-based prediction.

The connection between the tissue to quantify transcript abundance and the target phenotype is important for transcriptome-based prediction. While most previous studies tested the predictive abilities using the transcripts quantified in one tissue [8,9,10,11,12,13,14,15, 19], our dataset consisting of both shoot and root phenotypes with expression data quantified in both leaves and roots enabled a head-to-head comparison of the predictive ability using the transcripts in two different tissues. The predictive abilities for root phenotypes were increased by training the prediction model using transcripts in roots compared to the case using transcripts in leaves. However, it is not always easy to relate a target phenotype to a single tissue. For example, tiller number is often regarded as a shoot phenotype but was most accurately predicted by root transcriptome data in our study. This may reflect the pleiotropic genetic control of tiller and root growth, including the synchronized development of roots and shoots from a unit called the phytomer [48, 49]. Furthermore, transcript abundance is affected by the growth stage, time of day, tissue, or environmental conditions. For example, a recent study on black poplar reported that the use of transcriptome data in addition to genome data in the prediction model was beneficial for traits measured in the same environment as transcriptomic data collection [16]. To better understand genotype-to-phenotype relationships, it is necessary to further evaluate the differences and stability of transcript abundance across sampling conditions in terms of their predictive ability for different phenotypes.

By testing thousands of gene subsets according to three gene-level features (tissue specificity of transcripts, ontologies, and co-expression modules), we found a few combinations of gene subsets and target phenotypes that showed higher predictive ability than using all genes. Some of the identified combinations are biologically interpretable. For instance, root diameter was most accurately predicted using genes specifically expressed in the roots, implying that local genetic effects at the root tip mainly determine the root diameter. In rice root tips, the number of asymmetric cell divisions in the epidermis-endodermis initial cells determines the number of cortical cell layers and, consequently, the root diameter [50]. We also found that the predictive ability for RDW was much higher when using the genes annotated as “gibberellic acid sensitivity” than when using all genes, which is consistent with the importance of gibberellic acid for root development [38, 39]. These results imply that our biological knowledge can be exploited for gene subset selection to improve the ability of transcriptome-based predictions. Meanwhile, most gene subsets showed lower predictive abilities than when all genes were used, indicating the difficulty in predicting polygenic traits using a small number of genes.
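Selecting genes that are expressed in one tissue but not in another amounts to a set difference between the two lists of expressed genes. The sketch below is a minimal illustration under stated assumptions: the expression tables are plain dictionaries of made-up values, and the detection rule (`min_fpkm` reached in at least `min_accessions` accessions) is a hypothetical choice, not the criterion used in this study.

```python
def expressed_genes(expr, min_fpkm=1.0, min_accessions=1):
    """Genes whose abundance reaches `min_fpkm` in at least `min_accessions` accessions."""
    return {
        gene
        for gene, values in expr.items()
        if sum(v >= min_fpkm for v in values) >= min_accessions
    }


def tissue_specific(expr_target, expr_other, **rule):
    """Genes detected in the target tissue but not in the other tissue."""
    return expressed_genes(expr_target, **rule) - expressed_genes(expr_other, **rule)


# Toy abundance tables: gene -> values across two accessions (made-up numbers)
root_expr = {"gA": [3.0, 0.0], "gB": [2.0, 5.0], "gC": [0.0, 0.2]}
leaf_expr = {"gA": [0.0, 0.0], "gB": [1.5, 0.0], "gC": [0.0, 0.0]}

root_specific = tissue_specific(root_expr, leaf_expr)
```

The size of the resulting subset depends strongly on the detection rule, so any comparison with the all-gene model should report the rule alongside the number of genes retained.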

Ontology annotation is one of the most popular types of gene-level information that is widely used in genetic studies. Therefore, we hypothesized that certain ontology terms, such as root-related ones for the prediction of root phenotypes, would result in higher predictive abilities than using all genes. However, the results highlight the difficulty of gene subset selection based on ontology annotations. The selected genes contribute to the prediction only if (1) the gene is expressed in the sampled tissue at the time of sampling, (2) the transcript abundance varies among different genotypes, and (3) the transcript abundance and target traits are correlated. Although ontologies provide concise information about the function of genes, they do not provide information about growth-stage-dependent gene expression patterns or the tissue specificity of transcripts. It may be worth integrating other types of information with ontological annotations to select gene subsets. For example, the expression atlas [51,52,53], which is a collection of transcript abundances in multiple tissues, cells, or at different times, may complement ontology annotation for selecting a gene subset. Combining multiple sources of information to select an optimal gene subset for transcriptome-based predictions is an important topic for future studies.
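The three conditions listed above can be written as explicit filters on an annotated gene set. The following sketch is an illustration rather than a procedure used in this study: the cutoffs `min_mean`, `min_var`, and `min_abs_r` are hypothetical, the data are made up, and in a real analysis condition (3) would need to be evaluated on training accessions only to avoid leaking information from the test set.

```python
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def informative_genes(expr, trait, min_mean=1.0, min_var=0.1, min_abs_r=0.3):
    """Keep genes that satisfy all three conditions.

    expr:  dict mapping gene -> list of transcript abundances, one per accession
    trait: list of phenotypic values in the same accession order
    """
    kept = []
    for gene, values in expr.items():
        expressed = mean(values) >= min_mean                    # condition (1)
        variable = pvariance(values) >= min_var                 # condition (2)
        correlated = abs(pearson(values, trait)) >= min_abs_r   # condition (3)
        if expressed and variable and correlated:
            kept.append(gene)
    return kept


trait = [1.0, 2.0, 3.0, 4.0, 5.0]
expr = {
    "gene_silent": [0.0, 0.0, 0.0, 0.0, 0.0],   # fails (1): not expressed
    "gene_flat": [5.0, 5.0, 5.0, 5.0, 5.0],     # fails (2): no variation
    "gene_noise": [4.0, 6.0, 5.0, 6.0, 4.0],    # fails (3): uncorrelated with the trait
    "gene_signal": [2.0, 4.0, 6.0, 8.0, 10.0],  # passes all three
}
selected = informative_genes(expr, trait)
```

An ontology term supplies none of these three pieces of information, which is one way to see why a term that is biologically relevant can still yield a gene subset with little predictive value.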

Given that the size of the dataset used in this study was limited (n = 57), it was difficult to draw clear conclusions about how to select an appropriate gene subset. It would be worth testing our approach, or a related one, using hundreds of lines, as is often done in genome-wide association studies. On the other hand, particularly when considering the application of transcriptome-based prediction to plant breeding or related agricultural sciences, it is not always possible to quantify phenotypic values for many cultivars or individuals. For example, some of our target phenotypes were root traits, which require extremely time- and labor-consuming phenotyping in an experimental field, such as the backhoe-assisted monolith method [30]. Therefore, it is important to develop a prediction method that uses various types of available information to accurately predict complex traits even when the size of the training data is limited, as discussed in a previous study using approximately 20 lines of maize and Arabidopsis [18].

To improve genotype-to-phenotype prediction, it may be worth putting more effort into characterizing genes under field conditions. Most genes are characterized based only on experiments under laboratory conditions (e.g., growth chamber), but there is often a gap between the molecular function revealed in the laboratory and the biological or physiological effects in the field [54, 55]. Fortunately, the decreasing cost of microarray-based RNA quantification and RNA-seq is making it possible to conduct large-scale field transcriptomics studies, which may contribute to bridging this “lab-field gap” [56, 57]. If most genes are characterized based not only on in vitro or in vivo functions but also on their biological role in natura [58], we may be able to define an optimal gene subset even for a complex phenotype in the field.

Conclusion

In the present study, we assessed the predictive ability of leaf and root expression data for shoot and root phenotypes in WRC accessions. Root phenotypes were better predicted by the transcripts in roots than in leaves, indicating the importance of the appropriate selection of sampling tissue to achieve a higher predictive ability. Furthermore, we evaluated three gene-level features–tissue specificity of transcripts, ontology annotation, and co-expression module–to select gene subsets for transcriptome-based prediction of polygenic traits in rice accessions. While some gene subsets showed higher predictive abilities than all genes for some phenotypes, our results showed the difficulty in selecting appropriate gene subsets according to a single feature. These results indicate the complex genetic mechanisms of the agronomic phenotypes among the rice accessions; however, much remains to be investigated. Although we tested the three types of information individually in the present study, it may be possible to select an optimal gene subset for each target phenotype by aggregating and combining multiple sources of information as well as characterizing the genes under field conditions.

Data availability

All the data used in the present analyses have been published previously. Raw sequence data were deposited in the DNA Data Bank of Japan Sequence Read Archive in a previous study [27]. Transcriptome data are available from the Gene Expression Omnibus (GSE162313), and phenotype data are available in the supplementary file of a previous study [28]. All code for the data analysis is available on Figshare (https://doi.org/10.6084/m9.figshare.26067532.v1).

Abbreviations

WRC:

World rice core collection

GO:

Gene ontology

TO:

Trait ontology

PO:

Plant ontology

PH:

Plant height

TN:

Tiller number

SDW:

Shoot dry weight

RL_C:

Crown root length

RL_L:

Lateral root length

RD_C:

Crown root diameter

RD_L:

Lateral root diameter

RDW:

Root dry weight

RDR:

Ratio of deep rooting

FPKM:

Fragments per kilobase of exon per million reads

GRM:

Genomic relationship matrix

TRM:

Transcriptome-based relationship matrix

WGCNA:

Weighted gene correlation network analysis

FDR:

False discovery rate

References

  1. Meuwissen TH, Hayes BJ, Goddard ME. Prediction of total genetic value using genome-wide dense marker maps. Genetics. 2001;157(4):1819–29. https://doi.org/10.1093/genetics/157.4.1819.

  2. Hickey JM, Chiurugwi T, Mackay I, Powell W; Implementing Genomic Selection in CGIAR Breeding Programs Workshop Participants. Genomic prediction unifies animal and plant breeding programs to form platforms for biological discovery. Nat Genet. 2017;49(9):1297–303. https://doi.org/10.1038/ng.3920.

  3. Voss-Fels KP, Cooper M, Hayes BJ. Accelerating crop genetic gains with genomic selection. Theor Appl Genet. 2019;132(3):669–86. https://doi.org/10.1007/s00122-018-3270-8.

  4. Gianola D, de los Campos G, Hill WG, Manfredi E, Fernando R. Additive genetic variability and the Bayesian alphabet. Genetics. 2009;183(1):347–63. https://doi.org/10.1534/genetics.109.103952.

  5. Varona L, Legarra A, Toro MA, Vitezica ZG. Non-additive effects in genomic selection. Front Genet. 2018;9:78. https://doi.org/10.3389/fgene.2018.00078.

  6. Ishimori M, Hattori T, Yamazaki K, Takanashi H, Fujimoto M, Kajiya-Kanegae H, et al. Impacts of dominance effects on genomic prediction of sorghum hybrid performance. Breed Sci. 2020;70(5):605–16. https://doi.org/10.1270/jsbbs.20042.

  7. Ramstein GP, Larsson SJ, Cook JP, Edwards JW, Ersoz ES, Flint-Garcia S, et al. Dominance effects and functional enrichments improve prediction of agronomic traits in hybrid maize. Genetics. 2020;215(1):215–30. https://doi.org/10.1534/genetics.120.303025.

  8. Azodi CB, Pardo J, VanBuren R, de Los Campos G, Shiu SH. Transcriptome-based prediction of complex traits in maize. Plant Cell. 2020;32(1):139–51. https://doi.org/10.1105/tpc.19.00332.

  9. Frisch M, Thiemann A, Fu J, Schrag TA, Scholten S, Melchinger AE. Transcriptome-based distance measures for grouping of germplasm and prediction of hybrid performance in maize. Theor Appl Genet. 2010;120(2):441–50. https://doi.org/10.1007/s00122-009-1204-1.

  10. Guo Z, Magwire MM, Basten CJ, Xu Z, Wang D. Evaluation of the utility of gene expression and metabolic information for genomic prediction in maize. Theor Appl Genet. 2016;129(12):2413–27. https://doi.org/10.1007/s00122-016-2780-5.

  11. Schrag TA, Westhues M, Schipprack W, Seifert F, Thiemann A, Scholten S, et al. Beyond genomic prediction: combining different types of omics data can improve prediction of hybrid performance in maize. Genetics. 2018;208(4):1373–85. https://doi.org/10.1534/genetics.117.300374.

  12. Xu S, Xu Y, Gong L, Zhang Q. Metabolomic prediction of yield in hybrid rice. Plant J. 2016;88(2):219–27. https://doi.org/10.1111/tpj.13242.

  13. Knoch D, Werner CR, Meyer RC, Riewe D, Abbadi A, Lücke S, et al. Multi-omics-based prediction of hybrid performance in canola. Theor Appl Genet. 2021;134(4):1147–65. https://doi.org/10.1007/s00122-020-03759-x.

  14. Hu H, Campbell MT, Yeats TH, Zheng X, Runcie DE, Covarrubias-Pazaran D, et al. Multi-omics prediction of oat agronomic and seed nutritional traits across environments and in distantly related populations. Theor Appl Genet. 2021;134(12):4043–54. https://doi.org/10.1007/s00122-021-03946-4.

  15. Hershberger J, Tanaka R, Wood JC, Kaczmar N, Wu D, Hamilton JP, et al. Transcriptome-wide association and prediction for carotenoids and tocochromanols in fresh sweet corn kernels. Plant Genome. 2022;15(2):e20197. https://doi.org/10.1002/tpg2.20197.

  16. Wade AR, Duruflé H, Sanchez L, Segura V. eQTLs are key players in the integration of genomic and transcriptomic data for phenotype prediction. BMC Genomics. 2022;23(1):476. https://doi.org/10.1186/s12864-022-08690-7.

  17. Chantaraamporn J, Phumikhet P, Nguantad S, Techo T, Charoensawan V. Machine learning applications for transcription level and phenotype predictions. IUBMB Life. 2022;74(12):1273–87. https://doi.org/10.1002/iub.2693.

  18. Cheng CY, Li Y, Varala K, Bubert J, Huang J, Kim GJ, et al. Evolutionarily informed machine learning enhances the power of predictive gene-to-phenotype relationships. Nat Commun. 2021;12(1):5627. https://doi.org/10.1038/s41467-021-25893-w.

  19. Bernardo R. Genomewide selection when major genes are known. Crop Sci. 2014;54:68–75. https://doi.org/10.2135/cropsci2013.05.0315.

  20. Spindel JE, Begum H, Akdemir D, Collard B, Redoña E, Jannink JL, et al. Genome-wide prediction models that incorporate de novo GWAS are a powerful new tool for tropical rice improvement. Heredity. 2016;116(4):395–408. https://doi.org/10.1038/hdy.2015.113.

  21. Tanaka R, Wu D, Li X, Tibbs-Cortes LE, Wood JC, Magallanes-Lundback M, et al. Leveraging prior biological knowledge improves prediction of tocochromanols in maize grain. Plant Genome. 2023;16(4):e20276. https://doi.org/10.1002/tpg2.20276.

  22. Morota G, Abdollahi-Arpanahi R, Kranis A, Gianola D. Genome-enabled prediction of quantitative traits in chickens using genomic annotation. BMC Genomics. 2014;15:109. https://doi.org/10.1186/1471-2164-15-109.

  23. Chateigner A, Lesage-Descauses MC, Rogier O, Jorge V, Leplé JC, Brunaud V, et al. Gene expression predictions and networks in natural populations supports the omnigenic theory. BMC Genomics. 2020;21(1):416. https://doi.org/10.1186/s12864-020-06809-2.

  24. Sakai H, Lee SS, Tanaka T, Numa H, Kim J, Kawahara Y, et al. Rice annotation project database (RAP-DB): an integrative and interactive database for rice genomics. Plant Cell Physiol. 2013;54(2):e6. https://doi.org/10.1093/pcp/pcs183.

  25. Yamazaki Y, Sakaniwa S, Tsuchiya R, Nonomura KI, Kurata N. Oryzabase: an integrated information resource for rice science. Breed Sci. 2010;60(5):544–8. https://doi.org/10.1270/jsbbs.60.544.

  26. Kojima Y, Ebana K, Fukuoka S, Nagamine T, Kawase M. Development of an RFLP-based rice diversity research set of germplasm. Breed Sci. 2005;55(4):431–40. https://doi.org/10.1270/jsbbs.55.431.

  27. Tanaka N, Shenton M, Kawahara Y, Kumagai M, Sakai H, Kanamori H, et al. Whole-genome sequencing of the NARO world rice core collection (WRC) as the basis for diversity and association studies. Plant Cell Physiol. 2020;61(5):922–32. https://doi.org/10.1093/pcp/pcaa019.

  28. Kawakatsu T, Teramoto S, Takayasu S, Maruyama N, Nishijima R, Kitomi Y, et al. The transcriptomic landscapes of rice cultivars with diverse root system architectures grown in upland field conditions. Plant J. 2021;106(4):1177–90. https://doi.org/10.1111/tpj.15226.

  29. Wei S, Tanaka R, Kawakatsu T, Teramoto S, Tanaka N, Shenton M, et al. Genome- and transcriptome-wide association studies to discover candidate genes for diverse root phenotypes in cultivated rice. Rice. 2023;16(1):55. https://doi.org/10.1186/s12284-023-00672-x.

  30. Teramoto S, Kitomi Y, Nishijima R, Takayasu S, Maruyama N, Uga Y. Backhoe-assisted monolith method for plant root phenotyping under upland conditions. Breed Sci. 2019;69(3):508–13. https://doi.org/10.1270/jsbbs.19019.

  31. Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26(1):139–40. https://doi.org/10.1093/bioinformatics/btp616.

  32. VanRaden PM. Efficient methods to compute genomic predictions. J Dairy Sci. 2008;91(11):4414–23. https://doi.org/10.3168/jds.2007-0980.

  33. Endelman JB. Ridge regression and other kernels for genomic selection with R package rrBLUP. Plant Genome. 2011;4(3):250–5. https://doi.org/10.3835/plantgenome2011.08.0024.

  34. Akdemir D, Godfrey OU. EMMREML: Fitting Mixed Models with Known Covariance Structures. 2015. https://CRAN.R-project.org/package=EMMREML

  35. Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics. 2008;9:559. https://doi.org/10.1186/1471-2105-9-559.

  36. Yu G, Wang LG, Han Y, He QY. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS. 2012;16(5):284–7. https://doi.org/10.1089/omi.2011.0118.

  37. Wu T, Hu E, Xu S, Chen M, Guo P, Dai Z, et al. clusterProfiler 4.0: a universal enrichment tool for interpreting omics data. Innov. 2021;2(3):100141. https://doi.org/10.1016/j.xinn.2021.100141.

  38. Tanimoto E. Tall or short? Slender or thick? A plant strategy for regulating elongation growth of roots by low concentrations of gibberellin. Ann Bot. 2012;110(2):373–81. https://doi.org/10.1093/aob/mcs049.

  39. Shtin M, Dello Ioio R, Del Bianco M. It’s time for a change: the role of gibberellin in root meristem development. Front Plant Sci. 2022;13:882517. https://doi.org/10.3389/fpls.2022.882517.

  40. Cao Y, Ai H, Jain A, Wu X, Zhang L, Pei W, et al. Identification and expression analysis of OsLPR family revealed the potential roles of OsLPR3 and 5 in maintaining phosphate homeostasis in rice. BMC Plant Biol. 2016;16(1):210. https://doi.org/10.1186/s12870-016-0853-x.

  41. Sun L, Tian J, Zhang H, Liao H. Phytohormone regulation of root growth triggered by P deficiency or Al toxicity. J Exp Bot. 2016;67(12):3655–64. https://doi.org/10.1093/jxb/erw188.

  42. Wang J, Zhou L, Shi H, Chern M, Yu H, Yi H, et al. A single transcription factor promotes both yield and immunity in rice. Science. 2018;361(6406):1026–8. https://doi.org/10.1126/science.aat7675.

  43. Todaka D, Nakashima K, Maruyama K, Kidokoro S, Osakabe Y, Ito Y, et al. Rice phytochrome-interacting factor-like protein OsPIL1 functions as a key regulator of internode elongation and induces a morphological response to drought stress. Proc Natl Acad Sci. 2012;109(39):15947–52. https://doi.org/10.1073/pnas.1207324109.

  44. Li C, Liu Y, Shen WH, Yu Y, Dong A. Chromatin-remodeling factor OsINO80 is involved in regulation of gibberellin biosynthesis and is crucial for rice plant growth and development. J Integr Plant Biol. 2018;60(2):144–59. https://doi.org/10.1111/jipb.12603.

  45. Sui P, Jin J, Ye S, Mu C, Gao J, Feng H, et al. H3K36 methylation is critical for brassinosteroid-regulated plant growth and development in rice. Plant J. 2012;70(2):340–7. https://doi.org/10.1111/j.1365-313x.2011.04873.x.

  46. Kadambari G, Vemireddy LR, Srividhya A, Nagireddy R, Jena SS, Gandikota M, et al. QTL-Seq-based genetic analysis identifies a major genomic region governing dwarfness in rice (Oryza sativa L.). Plant Cell Rep. 2018;37(4):677–87. https://doi.org/10.1007/s00299-018-2260-2.

  47. Kubo FC, Yasui Y, Ohmori Y, Kumamaru T, Tanaka W, Hirano HY. DWARF WITH SLENDER LEAF1 encoding a histone deacetylase plays diverse roles in rice development. Plant Cell Physiol. 2020;61(3):457–69. https://doi.org/10.1093/pcp/pcz210.

  48. Nemoto K, Morita S, Baba T. Shoot and root development in rice related to the phyllochron. Crop Sci. 1995;35(1):24–9. https://doi.org/10.2135/cropsci1995.0011183X003500010005x.

  49. Rebouillat J, Dievart A, Verdeil JL, Escoute J, Giese G, Breitler JC, et al. Molecular genetics of rice root development. Rice. 2009;2:15–34. https://doi.org/10.1007/s12284-008-9016-5.

  50. Coudert Y, Périn C, Courtois B, Khong NG, Gantet P. Genetic control of root development in rice, the model cereal. Trends Plant Sci. 2010;15(4):219–26. https://doi.org/10.1016/j.tplants.2010.01.008.

  51. Nobuta K, Venu RC, Lu C, Beló A, Vemaraju K, Kulkarni K, et al. An expression atlas of rice mRNAs and small RNAs. Nat Biotechnol. 2007;25(4):473–7. https://doi.org/10.1038/nbt1291.

  52. Fujita M, Horiuchi Y, Ueda Y, Mizuta Y, Kubo T, Yano K, et al. Rice expression atlas in reproductive development. Plant Cell Physiol. 2010;51(12):2060–81. https://doi.org/10.1093/pcp/pcq165.

  53. Wang L, Xie W, Chen Y, Tang W, Yang J, Ye R, et al. A dynamic gene expression atlas covering the entire life cycle of rice. Plant J. 2010;61(5):752–66. https://doi.org/10.1111/j.1365-313X.2009.04100.x.

  54. Zaidem ML, Groen SC, Purugganan MD. Evolutionary and ecological functional genomics, from lab to the wild. Plant J. 2019;97(1):40–55. https://doi.org/10.1111/tpj.14167.

  55. Hashida Y, Tezuka A, Nomura Y, Kamitani M, Kashima M, et al. Fillable and unfillable gaps in plant transcriptome under field and controlled environments. Plant Cell Environ. 2022;45(8):2410–27. https://doi.org/10.1111/pce.14367.

  56. Nagano AJ, Sato Y, Mihara M, Antonio BA, Motoyama R, et al. Deciphering and prediction of transcriptome dynamics under fluctuating field conditions. Cell. 2012;151(6):1358–69. https://doi.org/10.1016/j.cell.2012.10.048.

  57. Nagano AJ, Kawagoe T, Sugisaka J, Honjo MN, Iwayama K, et al. Annual transcriptome dynamics in natural environments reveals plant seasonal adaptation. Nat Plants. 2019;5(1):74–83. https://doi.org/10.1038/s41477-018-0338-z.

  58. Quintana-Murci L, Alcaïs A, Abel L, Casanova JL. Immunology in natura: clinical, epidemiological and evolutionary genetics of infectious diseases. Nat Immunol. 2007;8(11):1165–71. https://doi.org/10.1038/ni1535.

Acknowledgements

We thank the editor and anonymous reviewers for their insightful comments. We would like to thank Editage (www.editage.jp) for English language editing.

Funding

This work was supported by the Cabinet Office, Government of Japan, Moonshot Research and Development Program for Agriculture, Forestry and Fisheries (funding agency: Bio-oriented Technology Research Advancement Institution, No. JPJ009237).

Author information

Contributions

RT conceptualized the study, conducted the formal analyses, and drafted the original manuscript. T. Kawakatsu, NT, MS, and YU acquired and curated the datasets. T. Kawai, SY, and YU supervised the study and edited the manuscript. All the authors have read and approved the final manuscript.

Corresponding author

Correspondence to Ryokei Tanaka.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

12864_2024_10803_MOESM1_ESM.docx

Supplementary Material 1. Additional file 1: Supplementary Figures (docx file). Supplementary Figures S1 to S5 with figure titles and legends.

Supplementary Material 2. Additional file 2: Supplementary Tables (xlsx file). Supplementary Tables S1 to S15.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Tanaka, R., Kawai, T., Kawakatsu, T. et al. Transcriptome-based prediction for polygenic traits in rice using different gene subsets. BMC Genomics 25, 915 (2024). https://doi.org/10.1186/s12864-024-10803-3

  • DOI: https://doi.org/10.1186/s12864-024-10803-3

Keywords