Correlation exploration of metabolic and genomic diversity in rice
© Mochida et al; licensee BioMed Central Ltd. 2009
Received: 25 May 2009
Accepted: 1 December 2009
Published: 1 December 2009
It is essential to elucidate the relationship between metabolic and genomic diversity to understand the genetic regulatory networks associated with the changing metabolo-phenotype among natural variation and/or populations. Recent innovations in metabolomics technologies allow us to grasp the comprehensive features of the metabolome. Metabolite quantitative trait analysis is a key approach for the identification of genetic loci involved in metabolite variation using segregated populations. Although several attempts have been made to find correlative relationships between genetic and metabolic diversity among natural populations in various organisms, it is still unclear whether it is possible to discover such correlations between each metabolite and the polymorphisms found at each chromosomal location. To assess the correlative relationship between the metabolic and genomic diversity found in rice accessions, we compared the distance matrices for these two "omics" patterns in the rice accessions.
We selected 18 accessions from the world rice collection based on their population structure. To determine the genomic diversity of the rice genome, we genotyped 128 restriction fragment length polymorphism (RFLP) markers to calculate the genetic distance among the accessions. To identify the variations in the metabolic fingerprint, a soluble extract from the seed grain of each accession was analyzed with one dimensional 1H-nuclear magnetic resonance (NMR). We found no correlation between global metabolic diversity and the phylogenetic relationships among the rice accessions (rs = 0.14) by analyzing the distance matrices (calculated from the pattern of the metabolic fingerprint in the 4.29- to 0.71-ppm 1H chemical shift) and the genetic distance on the basis of the RFLP markers. However, local correlation analysis between the distance matrices (derived from each 0.04-ppm integral region of the 1H chemical shift) against genetic distance matrices (derived from sets of 3 adjacent markers along each chromosome), generated clear correlations (rs > 0.4, p < 0.001) at 34 RFLP markers.
This combinatorial approach will be valuable for exploring the correlative relationships between metabolic and genomic diversity. It will facilitate the elucidation of complex regulatory networks and those of evolutionary significance in plant metabolic systems.
It is essential to elucidate the relationships between metabolic and genomic diversity to understand the genetic causes of phenotypic variation among natural populations. Visible and chemical variations associated with genomic diversity provide the information required to identify key genes associated with such phenotypic changes. Patterns of nucleotide polymorphism and the population structure inferred for a natural population allow us to understand the evolutionary significance of genetic variation under the influence of geographic factors or different environments [2, 3].
The technical development of metabolite profiling has provided quantitative phenotypic information on the metabolome among strains and identified genes involved in metabolic networks. Therefore, a comprehensive exploration of the correlations between metabolic and genomic diversity, achieved by superimposing data from these two "omics" approaches, could provide information regarding both the broad and specific relationships between metabolo-phenotypes and genotypes. This information would aid the identification of genetic associations between the metabolic and/or visible phenotypes [5, 6].
Metabolite quantitative trait loci (mQTL) analysis using segregating populations has been applied to various plant species as a popular forward genetics approach [4, 5, 7–9]. Although mQTL analysis has been used to dissect the genetic loci involved in metabolo-phenotypic changes in mapping populations of various organisms, no methods have been developed to identify the correlations between genetic polymorphisms and metabolo-phenotypic differences in divergent accessions or individuals from natural populations. To date, several attempts have been made to find correlative relationships between genetic and metabolic diversity in various organisms [10–13]. It is still unclear whether it is possible to discover such correlations between each metabolite and the polymorphic pattern found at each chromosomal location to explain natural variation. Combinatorial approaches of population genomics coupled with metabolo-phenotyping should play a significant role in exploring the association of genetic variation with metabolic changes that have a significant impact on evolution and adaptation.
In this study, to assess the correlative relationship between metabolic and genomic diversity found in 18 rice accessions selected from the world rice collection, we compared the distance relationships of accessions for these two omics patterns at two levels: globally, using all marker polymorphisms vs. the metabolomic fingerprints of the NMR spectra, and locally, using each chromosomal region vs. each spectral region. To perform these analyses, we developed a robust procedure to explore the relationship between metabolic and genomic diversity. As a result, we present the first correlation map of metabolic and genomic diversity among rice accessions.
Results and Discussion
The presentation of the correlations as a 2D heat map showed large non-correlative areas studded with correlative areas, which can be explored in omics-wide comparisons. The heat map showed clear correlations between the profiles of soluble metabolites and local genomic diversity at each chromosomal location. To identify metabolites, we performed 2D 1H-13C NMR analysis (Additional files 6, 7, and 8). Diversity in the choline profile (3.2 ppm) showed a high correlation with genetic diversity around the R1613 RFLP marker, located at 176.5 cM on chromosome (Chr.) 1 (rs = 0.67, P < 0.01). The lipid methylene signal (1.22 ppm) showed correlations with polymorphic sites at multiple chromosomal locations, with the highest located around the C43 RFLP marker at 92.8 cM on Chr. 5 (rs = 0.66, P < 0.01), and the second highest around R1943 at 123.7 cM on Chr. 8 (rs = 0.58, P < 0.01). The correlation peaks corresponding to each of the lipids (1.00, 1.16, 1.22, and 1.60 ppm) with Chr. 2 (153.6 and 95.7 cM), Chr. 5 (41.4, 92.8, 103.5, 112.3, and 9.8 cM), and Chr. 11 (123.3 and 85.8 cM) were in good accordance with the covalent bond networks within molecules of each compound. Furthermore, sucrose (3.40, 3.59, and 3.78 ppm) with Chr. 1 (186.8 cM) and Chr. 4 (31.4 cM) also showed good accordance. Similar intramolecular covalent bond networks were reported in NMR experiments of biological metabolite mixtures, including covariance total correlation spectroscopy (TOCSY) analysis of insect venom and TOCSY analysis of intact human gut biopsies.
Local comparison between the dendrogram based on the correlative RFLP polymorphisms and NMR spectra showed a similar tree structure between both genetic and metabolic variation among the accessions. For example, the dendrogram of genetic polymorphisms around the C43 RFLP marker (92.8 cM on Chr. 5) and the diversity of the lipid methylene signal indicated the genetic and metabolic divergence between indica I and two other groups, japonica and indica II (Additional file 9).
To display the chromosomal locations of the correlative regions at 34 RFLP markers, the results of the correlation analysis were superimposed onto a genetic map of rice (Additional file 10). This genetic map should allow us to identify metabolic QTLs associated with each of the metabolic abundances allocated on a genetic map. It would be beneficial if future genetic analyses narrowed down candidate genes and/or isolated such metabolic QTLs in combination with other genomic resources for rice such as genetic marker systems and genome annotation databases.
In this study, we developed a robust approach to explore the correlation between genomic and metabolomic diversity observed in natural variation, which is shown as a schematic flow in Additional file 11. The procedure is summarized as follows: (1) selection of appropriate accessions based on population structures; (2) acquisition of the metabolic profile of selected accessions using various types of metabolomics platforms and the calculation of the distance among accessions on the basis of metabolite abundance or spectrum intensities; (3) genotyping of selected accessions using DNA markers covering the whole genome and the calculation of local genetic distance among accessions from genotype data of adjacent marker sets; (4) calculation of the coefficients of correlation between metabolic and genetic distances in all-against-all combinations, to discover the correlative relationships between each metabolite and each chromosomal region, and display of their distribution on a genetic map. In this analysis of rice, we did not find a correlation between metabolic and genomic diversity as observed in previously reported studies. On the other hand, local correlation analysis allowed us to identify small correlative areas within mostly non-correlative spaces. This suggests that local correlation analysis could be an effective method to discover such correlative areas allocated to chromosomal regions together with linkage maps or variation data of genomic sequences [18, 19].
This study illustrated a new integrative approach to explore the links between metabolomics and genomics. The procedure could be extended to a higher resolution by using denser polymorphic marker maps in various species.
This study assessed the correlative relationships between genetic and metabolic diversity among rice accessions. The data present the first correlative relationship for the chromosomal distribution of each metabolic profile in rice. The combinatorial use of genomic and metabolomic data demonstrated here should facilitate the elucidation of complex regulatory networks and those of evolutionary significance in plant metabolic systems.
Rice seed materials
The rice accessions were chosen from the RFLP-based Rice Diversity Research Set of rice germplasms, which contains 68 rice accessions and is developed and maintained by Genebank (http://www.gene.affrc.go.jp/about_en.php) at the National Institute of Agrobiological Sciences (NIAS). The accessions were classified into three major groups by principal components analysis (PCA) of the RFLP data: japonica, indica I, and indica II. Population structure analysis using Structure software (v. 2.1, http://pritch.bsd.uchicago.edu/software/structure2_1.html) was also conducted on the world rice collection to determine the population structure in more detail, especially for the japonica type [20, 21]. Based on the results of the PCA and Structure analyses, we selected nine accessions each of japonica (temperate, tropical I, II, and III) and indica (indica I and indica II) to be included in the correlation analysis between genetic and metabolomic diversity.
RFLP genotyping and phylogenetic analysis of rice accessions
We used 128 RFLP markers to evaluate the genetic diversity of the 18 accessions by the Southern hybridization method of Kojima et al. (2005). Each of the marker alleles was scored as Nipponbare-type (1), Kasalath-type (2), or other types (3-7) (Additional file 3), and the presence or absence of each allele was recorded. The genetic distances between the 18 rice accessions were calculated with the "restdist" program of the PHYLIP (v. 3.67) package http://evolution.genetics.washington.edu/phylip.html; we used the distance matrix to calculate the coefficients of correlation with the metabolic diversity matrix and to calculate a phylogenetic tree by the UPGMA method using the "neighbor" program of PHYLIP. The physical position of each RFLP marker was retrieved from the Rice Annotation Project Database (RAP-DB) http://rapdb.dna.affrc.go.jp/.
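The allele scoring described above can be illustrated with a minimal Python sketch. It expands each marker score (1 to 7) into presence/absence values and computes a pairwise distance among accessions. Note the assumptions: the Dice-type dissimilarity used here is a simple stand-in chosen for illustration, not the restriction-fragment distance model implemented in PHYLIP's "restdist", and the accession names and genotype scores are hypothetical toy data.

```python
from itertools import combinations


def presence_absence(scores, n_types=7):
    """Expand per-marker allele scores (1..n_types) into a flat 0/1 vector."""
    vec = []
    for s in scores:
        vec.extend(1 if s == t else 0 for t in range(1, n_types + 1))
    return vec


def dice_distance(a, b):
    """1 - 2*shared/(n_a + n_b): fraction of allele bands not shared.

    Assumption: a stand-in for illustration, not the restdist model.
    """
    shared = sum(x & y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 1.0 - 2.0 * shared / total if total else 0.0


def distance_matrix(genotypes):
    """Return sorted accession names and a dict of pairwise distances."""
    names = sorted(genotypes)
    vecs = {n: presence_absence(genotypes[n]) for n in names}
    d = {(n, n): 0.0 for n in names}
    for p, q in combinations(names, 2):
        d[p, q] = d[q, p] = dice_distance(vecs[p], vecs[q])
    return names, d


# Hypothetical toy data: three accessions scored at four markers.
genotypes = {
    "acc_A": [1, 1, 2, 1],
    "acc_B": [1, 1, 2, 1],
    "acc_C": [2, 2, 2, 3],
}
names, d = distance_matrix(genotypes)
for p, q in combinations(names, 2):
    print(p, q, d[p, q])
```

A matrix of this kind could then be passed to any clustering routine (for example, a UPGMA implementation) or compared with a metabolic distance matrix.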
NMR spectroscopy and quantitative analysis
We prepared NMR samples from rice seeds, as previously described [22, 23] with slight modifications. All 1D Watergate spectra were acquired at 298 K on a Bruker Avance DRX 500 NMR spectrometer operating at 500.13 MHz and equipped with a 1H inverse triple-resonance probe with triple-axis gradients.
The 1D NMR spectra were integrated between 0.0 and 10.0 ppm over a series of 0.04-ppm regions by our custom integration software. After exclusion of the water and DSS (2,2-dimethyl-2-silapentane-5-sulfonate) resonances, each integral region was normalized to the total integral region (Additional file 2). The data were analyzed by a partial least-squares projection based on the spectral bins obtained from the 1D spectral analysis, and by hierarchical clustering analysis using a package running on R software.
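The binning and normalization step can be sketched as follows. This is a rough illustration in Python rather than the custom integration software used in the study: the function name `bin_spectrum`, the excluded ppm ranges for water and DSS, and the toy spectrum are all assumptions, since the text does not give the exact exclusion boundaries.

```python
def bin_spectrum(ppm, intensity, lo=0.0, hi=10.0, width=0.04, exclude=()):
    """Sum intensities into fixed-width ppm bins, drop excluded regions,
    and normalize each remaining bin to the total of the remaining bins."""
    n_bins = int(round((hi - lo) / width))
    sums = [0.0] * n_bins
    for x, y in zip(ppm, intensity):
        if lo <= x < hi:
            idx = min(int((x - lo) / width), n_bins - 1)
            sums[idx] += y

    kept = {}
    for i, s in enumerate(sums):
        center = round(lo + i * width + width / 2, 3)
        # A bin is dropped when its center falls inside an excluded range.
        if any(e_lo <= center < e_hi for e_lo, e_hi in exclude):
            continue
        kept[center] = s

    total = sum(kept.values())
    return {c: (s / total if total else 0.0) for c, s in kept.items()}


# Hypothetical toy spectrum: (chemical shift in ppm, intensity).
ppm = [1.01, 1.02, 3.21, 4.70, 0.00]
intensity = [2.0, 2.0, 4.0, 100.0, 50.0]

# Assumed exclusion ranges for the water and DSS resonances.
bins = bin_spectrum(ppm, intensity, exclude=[(4.6, 5.0), (0.0, 0.04)])
print(bins[1.02], bins[3.22])
```

Bins are keyed by their center (for example, 1.02 for the 1.00 to 1.04 ppm region), so the large water and DSS signals in the toy data do not contribute to the normalization.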
Correlation display of genetic and metabolic variation
A Euclidean distance matrix based on the 0.04-ppm integral regions of the chemical shift from 0.708 to 4.292 ppm (104 bins) in the 1D NMR spectra was calculated by using the "dist" function in R. Genetic diversity among the 18 rice accessions was calculated from the genotype data of all alleles of the 128 scored RFLP markers. To assess the correlation between genome- and metabolome-wide diversity among the 18 accessions, we calculated Spearman's coefficient of correlation between the genetic and Euclidean distance matrices. Local metabolic diversity among the 18 accessions was calculated as a Euclidean distance matrix in each of the 104 bins. Local genomic diversity was also calculated as a genetic distance matrix using sliding bins that included three adjacent RFLP markers along each rice chromosome. To explore the local correlations between genomic and metabolic diversity, we calculated Spearman's rs and p-values for all combinations of the local metabolic and genomic distance matrices using the "cor.test" function in R (Additional file 4). An RFLP map with correlative chromosomal positions was generated with MapChart ver. 2.2.
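The local all-against-all comparison can be sketched in Python as below. This is a simplified illustration, not the R workflow used in the study: the per-window genetic distance is assumed to be the fraction of mismatched markers (the study used a genetic distance matrix per window), markers are treated as lying on a single chromosome, p-values are omitted, and all data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pair_distances(values):
    """Per-bin metabolic distance: |difference| for every accession pair
    (the Euclidean distance in one dimension)."""
    return [abs(values[i] - values[j])
            for i, j in combinations(range(len(values)), 2)]


def window_distances(genotypes, start, size=3):
    """Assumed local genetic distance: fraction of mismatched markers
    within a window of `size` adjacent markers, for every accession pair."""
    out = []
    for i, j in combinations(range(len(genotypes)), 2):
        a = genotypes[i][start:start + size]
        b = genotypes[j][start:start + size]
        out.append(sum(x != y for x, y in zip(a, b)) / len(a))
    return out


def ranks(xs):
    """Average ranks; tied values share their mean rank."""
    order = sorted(range(len(xs)), key=lambda k: xs[k])
    r = [0.0] * len(xs)
    k = 0
    while k < len(order):
        m = k
        while m + 1 < len(order) and xs[order[m + 1]] == xs[order[k]]:
            m += 1
        for t in range(k, m + 1):
            r[order[t]] = (k + m) / 2 + 1
        k = m + 1
    return r


def spearman(x, y):
    """Spearman's rs as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(vx * vy) if vx and vy else float("nan")


def local_correlation_map(bins, genotypes, size=3):
    """rs for every (spectral bin, marker window) combination."""
    n_markers = len(genotypes[0])
    return {
        (b, s): spearman(pair_distances(v), window_distances(genotypes, s, size))
        for b, v in bins.items()
        for s in range(n_markers - size + 1)
    }


# Hypothetical toy data: 4 accessions, 2 spectral bins, 4 markers.
bins = {"1.22": [0.1, 0.1, 0.9, 0.9], "3.20": [0.5, 0.2, 0.5, 0.2]}
genotypes = [[1, 1, 1, 1], [1, 1, 1, 2], [2, 2, 2, 1], [2, 2, 2, 2]]
cmap = local_correlation_map(bins, genotypes)
print(cmap["1.22", 0], cmap["3.20", 0])
```

In the toy data, the first bin separates the accessions in the same way as the first marker window, whereas the second bin does not; the resulting grid of rs values is the quantity that a heat map of bins against chromosomal positions would display.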
Annotation of candidate metabolites by 2D NMR of 13C-labelled rice seeds
13C-labelled rice extracts were prepared by previously described methods with minor modifications [27, 28]. In brief, seeds were powdered, 20 mg of which was suspended in 600 μL standard buffer (100 mM potassium phosphate, pH 7.0, and 1.0 mM 2,2-dimethyl-2-silapentane-5-sulfonate in D2O), heated at 65°C for 15 min, and centrifuged at 12,000 × g for 5 min. The supernatant (500 μL) was decanted into a 5 mm diameter NMR tube.
Two-dimensional (2D) heteronuclear single quantum coherence (HSQC) spectra were recorded on a Bruker Avance 500 spectrometer equipped with an inverse triple resonance CryoProbe with a Z-axis gradient, operating at 500.13 MHz for 1H frequency (176.061 MHz for 13C frequency), and the temperature of the NMR samples was maintained at 298 K. A total of 128 complex f1 (13C) and 1024 complex f2 (1H) points were recorded with 96 scans per f1 increment, resulting in a total recording time of about 6 h. The spectral window and offset frequency in the f1 dimension were 7042.593 Hz (40 ppm) and 11971.59 Hz (68 ppm), respectively. The spectral window in the f2 dimension was 11160.7 Hz (16 ppm). The offset frequency in the f2 dimension was 3330 Hz (4.75 ppm).
The 2D HSQC spectra of the 13C-labelled rice samples were processed using the NMRPipe software package. To quantify the signal intensities, a Lorentzian-to-Gaussian window with a Lorentzian line width of 10 Hz and a Gaussian line width of 15 Hz was applied in both dimensions before Fourier transformation. An automatic polynomial baseline correction was subsequently applied in the f1 dimension. The indirect dimension was zero-filled to 4096 points in the final data matrix. Cross peaks of each metabolite were matched to our standard 1H/13C chemical shift database. The database is implemented with an in-house Java program, which allows for the systematic batch identification of large numbers of metabolites by simply matching the observed 13C-HSQC peaks with peaks in the database. Our chemical shift data are continuously updated and are available on the PRIMe website (http://prime.psc.riken.jp/). A queried peak was classified as annotated if the chemical shift difference in each dimension between the observed peak and a peak in the database was less than a specified tolerance value. Typically, tolerances of 0.03 ppm for 1H and 0.53 ppm for 13C were used in this study (Additional file 8). From these candidate metabolites, an identification or assignment was defined as unique if there was only one candidate in the database within the specified tolerances for an observed peak (Additional file 6).
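The tolerance-based matching rule described above can be sketched in Python as follows. This is not the in-house Java program; the function name `annotate_peaks`, the status labels "ambiguous" and "unannotated", the metabolite names, and the chemical shift values are hypothetical and serve only to illustrate the rule with the stated tolerances of 0.03 ppm (1H) and 0.53 ppm (13C).

```python
def annotate_peaks(observed, database, tol_h=0.03, tol_c=0.53):
    """Match observed (1H, 13C) peaks against database entries.

    A database entry is a candidate when the shift difference is below
    the tolerance in both dimensions; a peak with exactly one candidate
    is treated as a unique assignment.
    """
    results = []
    for h_obs, c_obs in observed:
        candidates = sorted({
            name for name, h_db, c_db in database
            if abs(h_obs - h_db) < tol_h and abs(c_obs - c_db) < tol_c
        })
        if len(candidates) == 1:
            status = "unique"
        elif candidates:
            status = "ambiguous"
        else:
            status = "unannotated"
        results.append({"peak": (h_obs, c_obs),
                        "candidates": candidates,
                        "status": status})
    return results


# Hypothetical database entries: (name, 1H shift in ppm, 13C shift in ppm).
database = [
    ("metabolite_A", 3.20, 56.6),
    ("metabolite_B", 3.21, 56.9),
    ("metabolite_C", 1.22, 32.0),
]
# Hypothetical observed HSQC cross peaks.
observed = [(1.23, 32.2), (3.205, 56.7), (7.00, 120.0)]

for row in annotate_peaks(observed, database):
    print(row["peak"], row["status"], row["candidates"])
```

Requiring agreement in both dimensions is what lets the 13C shift separate compounds whose 1H signals overlap; peaks with more than one candidate would need additional evidence before being assigned.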
We thank E. Chikayama (RIKEN) for technical assistance and data analysis, and T. Hirayama (RIKEN) and D. Saisho (Okayama University) for their valuable discussion and suggestions in this study. We also thank Y. Nishizuka (RIKEN) for providing technical assistance in rice RFLP marker analysis. This work was supported in part by the Research and Development Program for New Bio-industry Initiatives of the Bio-oriented Technology Research Advancement Institution (BRAIN).
- Dumas ME, Wilder SP, Bihoreau MT, Barton RH, Fearnside JF, Argoud K, D'Amato L, Wallis RH, Blancher C, Keun HC, et al: Direct quantitative trait locus mapping of mammalian metabolic phenotypes in diabetic and normoglycemic rat models. Nat Genet. 2007, 39: 666-672. 10.1038/ng2026.
- Mitchell-Olds T, Schmitt J: Genetic mechanisms and evolutionary significance of natural variation in Arabidopsis. Nature. 2006, 441: 947-952. 10.1038/nature04878.
- Keurentjes JJ: Genetical metabolomics: closing in on phenotypes. Curr Opin Plant Biol. 2009, 12: 223-230. 10.1016/j.pbi.2008.12.003.
- Schauer N, Semel Y, Roessner U, Gur A, Balbo I, Carrari F, Pleban T, Perez-Melis A, Bruedigam C, Kopka J, et al: Comprehensive metabolic profiling and phenotyping of interspecific introgression lines for tomato improvement. Nat Biotechnol. 2006, 24: 447-454. 10.1038/nbt1192.
- Schauer N, Semel Y, Balbo I, Steinfath M, Repsilber D, Selbig J, Pleban T, Zamir D, Fernie AR: Mode of inheritance of primary metabolic traits in tomato. Plant Cell. 2008, 20: 509-523. 10.1105/tpc.107.056523.
- Fu J, Keurentjes JJ, Bouwmeester H, America T, Verstappen FW, Ward JL, Beale MH, de Vos RC, Dijkstra M, Scheltema RA, et al: System-wide molecular evidence for phenotypic buffering in Arabidopsis. Nat Genet. 2009, 41: 166-167. 10.1038/ng.308.
- Lisec J, Meyer RC, Steinfath M, Redestig H, Becher M, Witucka-Wall H, Fiehn O, Torjek O, Selbig J, Altmann T, et al: Identification of metabolic and biomass QTL in Arabidopsis thaliana in a parallel analysis of RIL and IL populations. Plant J. 2008, 53: 960-972. 10.1111/j.1365-313X.2007.03383.x.
- Morreel K, Goeminne G, Storme V, Sterck L, Ralph J, Coppieters W, Breyne P, Steenackers M, Georges M, Messens E, et al: Genetical metabolomics of flavonoid biosynthesis in Populus: a case study. Plant J. 2006, 47: 224-237. 10.1111/j.1365-313X.2006.02786.x.
- Rowe HC, Hansen BG, Halkier BA, Kliebenstein DJ: Biochemical networks and epistasis shape the Arabidopsis thaliana metabolome. Plant Cell. 2008, 20: 1199-1216. 10.1105/tpc.108.058131.
- Laurentin H, Ratzinger A, Karlovsky P: Relationship between metabolic and genomic diversity in sesame (Sesamum indicum L.). BMC Genomics. 2008, 9: 250. 10.1186/1471-2164-9-250.
- Wolde-Meskel E, Terefework Z, Frostegard A, Lindstrom K: Genetic diversity and phylogeny of rhizobia isolated from agroforestry legume species in southern Ethiopia. Int J Syst Evol Microbiol. 2005, 55: 1439-1452. 10.1099/ijs.0.63534-0.
- Wolde-Meskel E, Terefework Z, Lindstrom K, Frostegard A: Metabolic and genomic diversity of rhizobia isolated from field standing native and exotic woody legumes in southern Ethiopia. Syst Appl Microbiol. 2004, 27: 603-611. 10.1078/0723202041748145.
- Seymour FA, Cresswell JE, Fisher PJ, Lappin-Scott HM, Haag H, Talbot NJ: The influence of genotypic variation on metabolite diversity in populations of two endophytic fungal species. Fungal Genet Biol. 2004, 41: 721-734. 10.1016/j.fgb.2004.02.007.
- Kojima Y, Ebana K, Fukuoka S, Nagamine T, Kawase M: Development of an RFLP-based Rice Diversity Research Set of Germplasm. Breeding Science. 2005, 55: 431-440. 10.1270/jsbbs.55.431.
- Keurentjes JJ, Fu J, de Vos CH, Lommen A, Hall RD, Bino RJ, van der Plas LH, Jansen RC, Vreugdenhil D, Koornneef M: The genetics of plant metabolism. Nat Genet. 2006, 38: 842-849. 10.1038/ng1815.
- Zhang F, Dossey AT, Zachariah C, Edison AS, Bruschweiler R: Strategy for automated analysis of dynamic metabolic mixtures by NMR. Application to an insect venom. Anal Chem. 2007, 79: 7748-7752. 10.1021/ac0711586.
- Wang Y, Cloarec O, Tang H, Lindon JC, Holmes E, Kochhar S, Nicholson JK: Magic angle spinning NMR and 1H-31P heteronuclear statistical total correlation spectroscopy of intact human gut biopsies. Anal Chem. 2008, 80: 1058-1066. 10.1021/ac701988a.
- Ware D, Jaiswal P, Ni J, Pan X, Chang K, Clark K, Teytelman L, Schmidt S, Zhao W, Cartinhour S, et al: Gramene: a resource for comparative grass genomics. Nucleic Acids Res. 2002, 30: 103-105. 10.1093/nar/30.1.103.
- Tanaka T, Antonio BA, Kikuchi S, Matsumoto T, Nagamura Y, Numa H, Sakai H, Wu J, Itoh T, Sasaki T: The Rice Annotation Project Database (RAP-DB): 2008 update. Nucleic Acids Res. 2008, 36 (Database issue): D1028-1033.
- Ebana K, Kojima S, Fukuoka S, Uga Y, Kawase K, Okuno K: DNA polymorphism detected in the rice diversity research set of germplasm. 5th International Rice Genetics Symposium. 2005.
- Pritchard JK, Stephens M, Donnelly P: Inference of population structure using multilocus genotype data. Genetics. 2000, 155: 945-959.
- Kikuchi J, Shinozaki K, Hirayama T: Stable isotope labeling of Arabidopsis thaliana for an NMR-based metabolomics approach. Plant Cell Physiol. 2004, 45: 1099-1104. 10.1093/pcp/pch117.
- Kikuchi J, Hirayama T: Practical aspects of uniform stable isotope labeling of higher plants for heteronuclear NMR-based metabolomics. Methods Mol Biol. 2007, 358: 273-286.
- Piotto M, Saudek V, Sklenar V: Gradient-tailored excitation for single-quantum NMR spectroscopy of aqueous solutions. J Biomol NMR. 1992, 2: 661-665. 10.1007/BF02192855.
- Liu X, Deng Z, Gao S, Sun X, Tang K: Molecular cloning and characterization of a glutathione S-transferase gene from Ginkgo biloba. DNA Seq. 2007, 18: 371-379. 10.1080/10425170701389063.
- Voorrips RE: MapChart: software for the graphical presentation of linkage maps and QTLs. J Hered. 2002, 93: 77-78. 10.1093/jhered/93.1.77.
- Sekiyama Y, Kikuchi J: Towards dynamic metabolic network measurements by multi-dimensional NMR-based fluxomics. Phytochemistry. 2007, 68: 2320-2329. 10.1016/j.phytochem.2007.04.011.
- Tian CJ, Chikayama E, Tsuboi Y, Shinozaki K, Kikuchi J, Hirayama T: Top-down phenomics of Arabidopsis thaliana-One and two-dimensional NMR metabolic profiling and transcriptome analysis of albino mutants. J Biol Chem. 2007, 282: 18532-18541. 10.1074/jbc.M700549200.
- Bodenhausen G, Ruben DJ: Natural Abundance N-15 Nmr by Enhanced Heteronuclear Spectroscopy. Chemical Physics Letters. 1980, 69: 185-189. 10.1016/0009-2614(80)80041-8.
- Delaglio F, Grzesiek S, Vuister GW, Zhu G, Pfeifer J, Bax A: Nmrpipe - a Multidimensional Spectral Processing System Based on Unix Pipes. J Biomol NMR. 1995, 6: 277-293. 10.1007/BF00197809.
- Chikayama E, Suto M, Nishihara T, Shinozaki K, Hirayama T, Kikuchi J: Systematic NMR analysis of stable isotope labeled metabolite mixtures in plant and animal systems: coarse grained views of metabolic pathways. PLoS ONE. 2008, 3: e3805. 10.1371/journal.pone.0003805.
- Akiyama K, Chikayama E, Yuasa H, Shimada Y, Tohge T, Shinozaki K, Hirai MY, Sakurai T, Kikuchi J, Saito K: PRIMe: a Web site that assembles tools for metabolomics and transcriptomics. In Silico Biol. 2008, 8: 339-345.