EXPath: a database of comparative expression analysis inferring metabolic pathways for plants
© Chien et al.; licensee BioMed Central Ltd. 2015
Published: 21 January 2015
In general, gene expression is altered conditionally to catalyze specific metabolic pathways. Large numbers of microarray datasets have been produced to monitor gene expression levels in parallel under numerous experimental treatments. Although several studies have linked gene expression data to metabolic pathways, none of them is dedicated to plants. Moreover, advanced analyses, such as pathway enrichment or how genes are expressed under different conditions, are not provided.
Therefore, we developed EXPath, which not only comprehensively collects public microarray expression data from over 1000 samples under biotic stress, abiotic stress, and hormone secretion, but also uses this abundant resource for coexpression analysis and identification of differentially expressed genes (DEGs), ultimately inferring the enriched KEGG pathways and gene ontology (GO) terms of three model plants: Arabidopsis thaliana, Oryza sativa, and Zea mays. Users can access the expression patterns of genes of interest under various conditions through five main functions in EXPath (Gene Search, Pathway Search, DEGs Search, Pathways/GO Enrichment, and Coexpression analysis), which are presented in a user-friendly interface and are valuable for further research.
In conclusion, EXPath, freely available at http://expath.itps.ncku.edu.tw, is a database resource that collects and utilizes gene expression profiles derived from microarray platforms under various conditions to infer metabolic pathways for plants.
Plants, classified as the kingdom Plantae, provide a source of energy and oxygen in ecosystems and the majority of agricultural production worldwide. To maintain their autotrophic mechanisms and their resistance to environmental impacts (e.g., extreme weather, soil salinity, and pests), plants rely on elaborate control and coordination of gene expression at the molecular level under various environments or conditions, which is critical for growth, development, and crop yield. Because their lack of motility forces plants to tolerate external stresses, genes involved in stress responses, signal transduction pathways, and induced transcription factors (TFs) have progressively been discovered through comparative genomics approaches [2–5]. Moreover, phytohormones, which are believed to modulate plant growth and diverse developmental processes, have been linked to environmental variation in Arabidopsis and maize [6–8]. This evidence suggests that the complexity of gene regulation in important pathways and biochemical reactions plays a key role in plant survival and self-defense under different circumstances.
In general, gene expression is altered conditionally to catalyze specific metabolic pathways [9, 10]. Comprehensively investigating how genes are activated or repressed in vital biological processes under various conditions, i.e., identifying differentially expressed genes (DEGs), is essential for understanding gene functions and coexpression patterns in metabolic routes. In recent decades, large numbers of microarray datasets have been produced to monitor gene expression levels in parallel under numerous experimental treatments. This high-throughput measurement of transcript abundance facilitates comparative expression analysis by combining microarray expression data from different samples and even different species. Owing to the abundance of microarray datasets for plants, many databases and resources have rapidly collected gene expression data and made them publicly accessible. Among them, Gene Expression Omnibus (GEO) provides the most extensive collection of microarray expression datasets, presented through GEO DataSets, GEO Profiles, and GEO2R Analysis. Although GEO2R Analysis allows users to compare multiple expression datasets and identify DEGs, GEO Profiles can only display the expression level of one gene across the samples of each dataset. Another powerful tool, eFP Browser, is easily adapted to analyze microarray and other large-scale datasets in plants using pictographic representations. Additionally, PLEXdb, GENEVESTIGATOR, NASCArrays, and RiceXPro are useful repositories of microarray gene expression profiles for Arabidopsis, rice, and other plants [15–18]. On the other hand, to provide comprehensive insight into plant metabolic pathways, which consist of metabolites and enzymes, relevant databases have been established recently. Gramene, a comparative resource for plants, includes ten plant metabolic pathway databases, e.g., AraCyc, RiceCyc, MaizeCyc, BrachyCyc, and SorghumCyc.
Moreover, MetNet Online integrates information on metabolic pathways and regulatory networks for Arabidopsis thaliana, Glycine max, and Vitis vinifera. Other examples of pathway knowledge bases include Arabidopsis Reactome and Pathway Studio [21, 22].
Construction and content
Repository for microarray gene expression data
Categories of microarray samples in the EXPath expression database.
# of samples
P. syringae pv. tomato DC3000
Magnaporthe oryzae strain Guy11
X. oryzae pv. oryzae
X. oryzae pv. oryzicola
Sporisorium reilianum f. sp. zeae (Kühn)
Data processing and normalization
After the raw microarray data were processed, they were normalized to avoid systematic biases arising from variation between different experiments (GEO series, GSEs) and samples. In this work, robust multi-array average (RMA) normalization was performed with the justRMA function of the affy package, which is part of the Bioconductor project [27, 28]. For genes with raw intensities from multiple probes and replicates, outliers were first filtered with the interquartile range (IQR) rule, retaining only the data between the lower and upper quartiles. The average of all retained replicates was then taken as the expression level of each gene under a given condition.
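The replicate-summarization step can be sketched in Python as follows. This is a minimal sketch, not EXPath's code: the helper name `summarize_replicates` is hypothetical, and it reads "retaining only the data between the lower and upper quartiles" literally (keep values in [Q1, Q3], with quartiles computed like R's default type-7 quantile). It falls back to a plain mean when fewer than three values are available, an assumption made because a two-point sample has no value inside its own interquartile range.

```python
from statistics import mean, quantiles


def summarize_replicates(values):
    """Collapse one gene's normalized intensities (probes and/or replicates)
    into a single expression level. Hypothetical helper for illustration."""
    if not values:
        raise ValueError("no intensities given")
    if len(values) < 3:
        # Too few points for a meaningful quartile filter; just average.
        return mean(values)
    # method="inclusive" corresponds to R's default quantile (type 7).
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    # Literal reading of the text: keep only values between Q1 and Q3.
    kept = [v for v in values if q1 <= v <= q3]
    return mean(kept)
```

For example, `summarize_replicates([100, 110, 120, 1000])` has Q1 = 107.5 and Q3 = 340, so it keeps 110 and 120 and returns 115. A conventional Tukey rule (fences at 1.5 × IQR beyond the quartiles) would retain more points; the text does not say which variant EXPath uses.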
Collection of annotation files
EXPath offers general gene information, including descriptions, cDNA and protein sequences, Pfam protein families, GO terms, and involved pathways, for users' reference. The annotation files of Arabidopsis thaliana, Oryza sativa, and Zea mays were downloaded from TAIR10, RAP-DB, and MaizeGDB, respectively [29–31]. Descriptions, cDNA and protein sequences, and Pfam protein families were acquired from Ensembl BioMarts, except that the descriptions of Arabidopsis thaliana came from TAIR10 and the descriptions and sequences of Oryza sativa came from RAP-DB. The latest GO terms and pathway assignments were collected from the Gene Ontology Consortium and the KEGG database [33, 34].
Comparative expression analysis
Differentially expressed genes
To identify genes that are differentially expressed under given conditions, EXPath applies a t-test using the t.test() function in R. Users can select a treatment from the well-categorized biotic stress, abiotic stress, or hormone secretion samples of the three model plants, and then set the time point and the fold-change and p-value cutoffs. Fold-change statistics and lists of up-regulated and down-regulated DEGs are also provided in EXPath.
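A DEG search of this kind can be sketched as below. It is a minimal illustration under assumptions, not EXPath's implementation: `find_degs` and its default cutoffs (fold change 2, p < 0.05) are hypothetical, fold change is assumed to be the ratio of mean non-log intensities (treatment / control), and a gene must pass both cutoffs. The test is SciPy's `ttest_ind` with `equal_var=False`, i.e. Welch's t-test, which is also the default of R's `t.test()`; the text does not state which t-test variant EXPath uses.

```python
from statistics import mean

from scipy import stats


def find_degs(control, treatment, fc_cutoff=2.0, p_cutoff=0.05):
    """Split genes into up- and down-regulated DEG lists.

    control, treatment: dict mapping gene ID -> list of normalized,
    non-log-transformed intensities (replicates) for one condition.
    """
    up, down = [], []
    for gene, ctrl in control.items():
        treat = treatment[gene]
        # Fold change as a ratio of mean intensities (treatment / control).
        fc = mean(treat) / mean(ctrl)
        # Welch two-sample t-test (R's t.test() default, var.equal = FALSE).
        p = stats.ttest_ind(treat, ctrl, equal_var=False).pvalue
        if p >= p_cutoff:
            continue
        if fc >= fc_cutoff:
            up.append(gene)
        elif fc <= 1.0 / fc_cutoff:
            down.append(gene)
    return up, down
```

Using a symmetric threshold (FC ≥ 2 for up, FC ≤ 0.5 for down) treats two-fold induction and two-fold repression alike, which is the usual convention for ratio-scale fold changes.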
Coexpression gene groups
Coexpressed genes are groups of genes that are expressed simultaneously under specific conditions. In theory, they tend to share transcriptional regulation and to participate in the same biological processes or pathways. To test this idea in Arabidopsis thaliana, we calculated the coexpression levels of 111 KEGG pathways that each contain more than 10 genes using Pearson's correlation coefficient (PCC). Among them, 92.6% of pathways show positive correlation (most PCCs lie between 0.6 and 0.9; see Figure S1, Additional file 1), which suggests that genes involved in the same pathway are generally coexpressed. In EXPath, Pearson's correlation coefficient and Spearman's rank correlation coefficient are computed with the cor() function in R to identify genes with coexpression patterns. Normalized intensities without log transformation were used to calculate the correlation coefficients because, as we mentioned previously, log transformation may alter the original expression levels. Users can choose positive or negative correlation and the conditions (abiotic stress, biotic stress, hormone treatment, or overall conditions) they intend to explore. The expression patterns of coexpressed gene groups are visualized using the z-score transformation:
$$z = \frac{x - \mu}{\sigma}$$
Here z denotes the z-score, whereas x, μ, and σ represent the raw intensity and the mean and standard deviation of the gene's expression levels, respectively.
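The correlation and z-score steps described above can be sketched in plain Python. This is a hedged illustration rather than EXPath's implementation: the function names, the default cutoff of 0.8, and the rule that a gene qualifies when r ≥ cutoff (positive) or r ≤ −cutoff (negative) are assumptions. Spearman's coefficient is computed the way R's `cor(method = "spearman")` does, as Pearson's r on tie-averaged ranks, and σ is taken as the sample standard deviation, matching R's `sd()`; the text does not say which estimator EXPath uses.

```python
from math import sqrt
from statistics import mean, stdev


def pearson(x, y):
    """Pearson's correlation coefficient between two expression profiles."""
    mx, my = mean(x), mean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    return num / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def ranks(values):
    """1-based ranks with ties averaged, as R's rank() does by default."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson's r computed on ranks."""
    return pearson(ranks(x), ranks(y))


def coexpressed(query, profiles, method="pearson", cutoff=0.8, positive=True):
    """Genes whose profiles correlate with the query gene beyond a cutoff.

    profiles: dict mapping gene ID -> normalized, non-log intensities.
    """
    corr = pearson if method == "pearson" else spearman
    hits = {}
    for gene, profile in profiles.items():
        if gene == query:
            continue
        r = corr(profiles[query], profile)
        if (positive and r >= cutoff) or (not positive and r <= -cutoff):
            hits[gene] = r
    return hits


def zscore(profile):
    """z = (x - mu) / sigma for each value, with sigma the sample SD."""
    mu, sigma = mean(profile), stdev(profile)
    return [(x - mu) / sigma for x in profile]
```

Plotting `zscore(profile)` for every gene in a coexpressed group puts all genes on a common scale, so genes with very different absolute intensities can be compared by the shape of their response alone.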
Enriched KEGG pathways and GO terms
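The enrichment of each KEGG pathway or GO term in a gene group is evaluated with the hypergeometric test:
$$p = 1 - \sum_{k=0}^{i-1} \frac{\binom{M}{k}\binom{N-M}{n-k}}{\binom{N}{n}}$$
where N is the number of background genes, M is the number of genes involved in a specific KEGG pathway or GO term, and i of the n genes in gene group X belong to that KEGG pathway or GO term. The dhyper() and phyper() functions in R were used to obtain the hypergeometric p-value for each gene group.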
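The hypergeometric p-value described above can be computed exactly with integer binomial coefficients. This is a minimal sketch with a hypothetical function name; it sums the upper tail P(X ≥ i) directly, which gives the same value as R's `phyper(i - 1, M, N - M, n, lower.tail = FALSE)`.

```python
from math import comb


def hypergeom_pvalue(i, n, M, N):
    """P(X >= i) when n genes are drawn from N background genes,
    M of which belong to the KEGG pathway or GO term."""
    upper = min(n, M)
    # Exact integer arithmetic; comb(a, b) is 0 when b > a.
    hits = sum(comb(M, k) * comb(N - M, n - k) for k in range(i, upper + 1))
    return hits / comb(N, n)
```

For instance, with N = 10 background genes, M = 4 annotated to a term, and a group of n = 3 genes that all hit the term (i = 3), the p-value is C(4,3) / C(10,3) = 4/120 = 1/30. Summing the upper tail, rather than subtracting the lower tail from 1, avoids cancellation error when the p-value is very small.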
Utility and discussion
Basic implementation in EXPath
Advanced combination analysis in EXPath
In addition to being usable separately, the five functions in EXPath are connected to each other through linkage buttons and hyperlinks in the output webpages. For example, the Gene Search result page not only maps the query gene to KEGG pathways, illustrating the gene's involvement in the corresponding pathway map together with its microarray expression levels under specified conditions, but also provides a linkage button for performing coexpression analysis. Furthermore, advanced combination analysis, the most practical application in EXPath, offers a powerful pipeline for comparative expression analysis in plants. When DEGs Search is combined with Pathways/GO Enrichment, the differentially expressed genes between control and treatment samples are identified first. Users can then designate up-regulated genes, down-regulated genes, or all DEGs for Pathways/GO Enrichment. The enriched KEGG pathways or GO terms of DEGs help plant scientists understand, for instance, resistance to abiotic stresses, the pathogenicity of microbes or viruses, and even the effects of hormone treatments. Another combination, Coexpression analysis followed by Pathways/GO Enrichment, identifies coexpressed genes in order to infer gene functions and biological roles comprehensively. The case study below describes this application in detail.
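The DEGs Search to Pathways/GO Enrichment combination can be sketched as a single function. This is an illustration under assumptions, not EXPath's code: `enrich_degs`, its parameters, and the choice to rank terms by raw p-value (no multiple-testing correction, which the text does not mention) are hypothetical. The p-value is the upper-tail hypergeometric probability used for Pathways/GO Enrichment, here computed with SciPy's `hypergeom.sf`.

```python
from scipy.stats import hypergeom


def enrich_degs(up, down, annotations, background, which="all", top=10):
    """Hypothetical DEGs Search -> Pathways/GO Enrichment pipeline.

    up, down: gene IDs from a DEG search; annotations: term -> set of gene
    IDs; background: set of all gene IDs; which: "up", "down" or "all".
    """
    selected = {"up": set(up), "down": set(down), "all": set(up) | set(down)}[which]
    genes = selected & background
    N, n = len(background), len(genes)
    results = []
    for term, members in annotations.items():
        members = set(members) & background
        i = len(genes & members)
        if i == 0:
            continue
        # P(X >= i); SciPy's order is (k, population, successes, draws).
        p = hypergeom.sf(i - 1, N, len(members), n)
        results.append((term, i, float(p)))
    results.sort(key=lambda r: r[2])
    return results[:top]
```

Running it once with `which="up"` and once with `which="down"` mirrors the choice offered in the interface: terms enriched only among induced genes and terms enriched only among repressed genes often point to different biological responses.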
Case study: JAZ10
Enriched pathways of the JAZ10 coexpressed gene group (partial; only the top 10 results are shown).
Hit number (Query)
Percentage in query
alpha-Linolenic acid metabolism
Linoleic acid metabolism
Phenylalanine, tyrosine and tryptophan biosynthesis
Plant hormone signal transduction
Isoquinoline alkaloid biosynthesis
Biosynthesis of amino acids
Tropane, piperidine and pyridine alkaloid biosynthesis
Enriched GO terms of the JAZ10 coexpressed gene group (partial; only the top 10 results are shown).
Hit number (Query)
Percentage in query
response to jasmonic acid
response to wounding
jasmonic acid biosynthetic process
response to fungus
abscisic acid-activated signaling pathway
jasmonic acid mediated signaling pathway
response to stress
response to ethylene
hyperosmotic salinity response
EXPath is an overarching repository geared towards plant scientists that facilitates both the retrieval of microarray gene expression data from publicly available resources and comparative expression analysis. As a novel database integrating gene expression data with metabolic pathways, EXPath infers pathways that give insight into gene functions, the pathogenicity of external invaders, and plant defense mechanisms. Through its five main functions (i.e., Gene Search, Pathway Search, DEGs Search, Pathways/GO Enrichment, and Coexpression analysis) and their advanced combination analysis, EXPath provides an effective interface for users to explore information of interest that will be valuable for further research. Although EXPath facilitates the comparison of expression levels among genes involved in designated pathways, the limited number of plant genes recorded in the KEGG database restricts the scope of comparative expression analysis. Another limitation is the scarcity of public expression datasets for plants other than Arabidopsis, rice, and maize. In the future, in addition to anticipating more plant genes in the KEGG database, we will continue to survey relevant samples with publicly released expression profiles, especially those derived from treatments of biotic stress, abiotic stress, hormone secretion, and even development.
Availability and requirements
The EXPath database is publicly available at http://EXPath.itps.ncku.edu.tw.
This research was supported by grants from the National Science Council of the Republic of China under Contracts NSC 102-2313-B-006 -004 and MOST 103-2311-B-006 -001.
This article has been published as part of BMC Genomics Volume 16 Supplement 2, 2015: Selected articles from the Thirteenth Asia Pacific Bioinformatics Conference (APBC 2015): Genomics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcgenomics/supplements/16/S2
- Atkinson NJ, Urwin PE: The interaction of plant biotic and abiotic stresses: from genes to the field. Journal of experimental botany. 2012, 63 (10): 3523-3543. 10.1093/jxb/ers100.
- Wang W, Vinocur B, Altman A: Plant responses to drought, salinity and extreme temperatures: towards genetic engineering for stress tolerance. Planta. 2003, 218 (1): 1-14. 10.1007/s00425-003-1105-5.
- Cushman JC, Bohnert HJ: Genomic approaches to plant stress tolerance. Current opinion in plant biology. 2000, 3 (2): 117-124. 10.1016/S1369-5266(99)00052-7.
- Mittler R: Abiotic stress, the field environment and stress combination. Trends in plant science. 2006, 11 (1): 15-19. 10.1016/j.tplants.2005.11.002.
- Rizhsky L, Liang H, Shuman J, Shulaev V, Davletova S, Mittler R: When defense pathways collide. The response of Arabidopsis to a combination of drought and heat stress. Plant physiology. 2004, 134 (4): 1683-1696. 10.1104/pp.103.033431.
- Ren H, Gao Z, Chen L, Wei K, Liu J, Fan Y, Davies WJ, Jia W, Zhang J: Dynamic analysis of ABA accumulation in relation to the rate of ABA catabolism in maize tissues under water deficit. Journal of experimental botany. 2007, 58 (2): 211-219.
- Gray WM: Hormonal regulation of plant growth and development. PLoS biology. 2004, 2 (9): E311. 10.1371/journal.pbio.0020311.
- Wang Y, Liu C, Li K, Sun F, Hu H, Li X, Zhao Y, Han C, Zhang W, Duan Y, et al: Arabidopsis EIN2 modulates stress response through abscisic acid response pathway. Plant molecular biology. 2007, 64 (6): 633-644. 10.1007/s11103-007-9182-7.
- Boavida LC, Borges F, Becker JD, Feijo JA: Whole genome analysis of gene expression reveals coordinated activation of signaling and metabolic pathways during pollen-pistil interactions in Arabidopsis. Plant physiology. 2011, 155 (4): 2066-2080. 10.1104/pp.110.169813.
- Yonekura-Sakakibara K, Tohge T, Matsuda F, Nakabayashi R, Takayama H, Niida R, Watanabe-Takahashi A, Inoue E, Saito K: Comprehensive flavonol profiling and transcriptome coexpression analysis leading to decoding gene-metabolite correlations in Arabidopsis. The Plant cell. 2008, 20 (8): 2160-2176. 10.1105/tpc.108.058040.
- Schena M, Shalon D, Davis RW, Brown PO: Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science. 1995, 270 (5235): 467-470. 10.1126/science.270.5235.467.
- Movahedi S, Van Bel M, Heyndrickx KS, Vandepoele K: Comparative co-expression analysis in plant biology. Plant, cell & environment. 2012, 35 (10): 1787-1798. 10.1111/j.1365-3040.2012.02517.x.
- Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, Marshall KA, Phillippy KH, Sherman PM, Holko M, et al: NCBI GEO: archive for functional genomics data sets--update. Nucleic acids research. 2013, 41 (Database issue): D991-995.
- Winter D, Vinegar B, Nahal H, Ammar R, Wilson GV, Provart NJ: An "Electronic Fluorescent Pictograph" browser for exploring and analyzing large-scale biological data sets. PloS one. 2007, 2 (8): e718. 10.1371/journal.pone.0000718.
- Dash S, Van Hemert J, Hong L, Wise RP, Dickerson JA: PLEXdb: gene expression resources for plants and plant pathogens. Nucleic acids research. 2012, 40 (Database issue): D1194-1201.
- Sato Y, Takehisa H, Kamatsuki K, Minami H, Namiki N, Ikawa H, Ohyanagi H, Sugimoto K, Antonio BA, Nagamura Y: RiceXPro version 3.0: expanding the informatics resource for rice transcriptome. Nucleic acids research. 2013, 41 (Database issue): D1206-1213.
- Zimmermann P, Hirsch-Hoffmann M, Hennig L, Gruissem W: GENEVESTIGATOR. Arabidopsis microarray database and analysis toolbox. Plant physiology. 2004, 136 (1): 2621-2632. 10.1104/pp.104.046367.
- Craigon DJ, James N, Okyere J, Higgins J, Jotham J, May S: NASCArrays: a repository for microarray data generated by NASC's transcriptomics service. Nucleic acids research. 2004, 32 (Database issue): D575-577.
- Monaco MK, Stein J, Naithani S, Wei S, Dharmawardhana P, Kumari S, Amarasinghe V, Youens-Clark K, Thomason J, Preece J, et al: Gramene 2013: comparative plant genomics resources. Nucleic acids research. 2014, 42 (Database issue): D1193-1199.
- Sucaet Y, Wang Y, Li J, Wurtele ES: MetNet Online: a novel integrated resource for plant systems biology. BMC bioinformatics. 2012, 13: 267. 10.1186/1471-2105-13-267.
- Nikitin A, Egorov S, Daraselia N, Mazo I: Pathway studio--the analysis and navigation of molecular networks. Bioinformatics. 2003, 19 (16): 2155-2157. 10.1093/bioinformatics/btg290.
- Tsesmetzis N, Couchman M, Higgins J, Smith A, Doonan JH, Seifert GJ, Schmidt EE, Vastrik I, Birney E, Wu G, et al: Arabidopsis reactome: a foundation knowledgebase for plant systems biology. The Plant cell. 2008, 20 (6): 1426-1436. 10.1105/tpc.108.057976.
- Jensen PA, Papin JA: Functional integration of a metabolic network model and expression data without arbitrary thresholding. Bioinformatics. 2011, 27 (4): 541-547. 10.1093/bioinformatics/btq702.
- Beltrame L, Bianco L, Fontana P, Cavalieri D: Pathway Processor 2.0: a web resource for pathway-based analysis of high-throughput data. Bioinformatics. 2013, 29 (14): 1825-1826. 10.1093/bioinformatics/btt292.
- Zheng HQ, Chiang-Hsieh YF, Chien CH, Hsu BK, Liu TL, Chen CN, Chang WC: AlgaePath: comprehensive analysis of metabolic pathways using transcript abundance data from next-generation sequencing in green algae. BMC genomics. 2014, 15: 196. 10.1186/1471-2164-15-196.
- Kilian J, Whitehead D, Horak J, Wanke D, Weinl S, Batistic O, D'Angelo C, Bornberg-Bauer E, Kudla J, Harter K: The AtGenExpress global stress expression data set: protocols, evaluation and model data analysis of UV-B light, drought and cold stress responses. The Plant journal: for cell and molecular biology. 2007, 50 (2): 347-363. 10.1111/j.1365-313X.2007.03052.x.
- Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, Speed TP: Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics. 2003, 4 (2): 249-264. 10.1093/biostatistics/4.2.249.
- Gautier L, Cope L, Bolstad BM, Irizarry RA: affy--analysis of Affymetrix GeneChip data at the probe level. Bioinformatics. 2004, 20 (3): 307-315. 10.1093/bioinformatics/btg405.
- Lamesch P, Berardini TZ, Li D, Swarbreck D, Wilks C, Sasidharan R, Muller R, Dreher K, Alexander DL, Garcia-Hernandez M, et al: The Arabidopsis Information Resource (TAIR): improved gene annotation and new tools. Nucleic acids research. 2012, 40 (Database issue): D1202-1210.
- Sakai H, Lee SS, Tanaka T, Numa H, Kim J, Kawahara Y, Wakimoto H, Yang CC, Iwamoto M, Abe T, et al: Rice Annotation Project Database (RAP-DB): an integrative and interactive database for rice genomics. Plant & cell physiology. 2013, 54 (2): e6. 10.1093/pcp/pcs183.
- Schaeffer ML, Harper LC, Gardiner JM, Andorf CM, Campbell DA, Cannon EK, Sen TZ, Lawrence CJ: MaizeGDB: curation and outreach go hand-in-hand. Database: the journal of biological databases and curation. 2011, 2011: bar022.
- Kinsella RJ, Kahari A, Haider S, Zamora J, Proctor G, Spudich G, Almeida-King J, Staines D, Derwent P, Kerhornou A, et al: Ensembl BioMarts: a hub for data retrieval across taxonomic space. Database: the journal of biological databases and curation. 2011, 2011: bar030.
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature genetics. 2000, 25 (1): 25-29. 10.1038/75556.
- Kanehisa M, Goto S, Sato Y, Kawashima M, Furumichi M, Tanabe M: Data, information, knowledge and principle: back to metabolism in KEGG. Nucleic acids research. 2014, 42 (Database issue): D199-205.
- Chien CH, Chiang-Hsieh YF, Tsou AP, Weng SL, Chang WC, Huang HD: Large-Scale Investigation of Human TF-miRNA Relations Based on Coexpression Profiles. BioMed research international. 2014, 2014: 623078.
- Vanholme B, Grunewald W, Bateman A, Kohchi T, Gheysen G: The tify family previously known as ZIM. Trends in plant science. 2007, 12 (6): 239-244. 10.1016/j.tplants.2007.04.004.
- Chini A, Fonseca S, Fernandez G, Adie B, Chico JM, Lorenzo O, Garcia-Casado G, Lopez-Vidriero I, Lozano FM, Ponce MR, et al: The JAZ family of repressors is the missing link in jasmonate signalling. Nature. 2007, 448 (7154): 666-671. 10.1038/nature06006.
- Moreno JE, Shyu C, Campos ML, Patel LC, Chung HS, Yao J, He SY, Howe GA: Negative feedback control of jasmonate signaling by an alternative splice variant of JAZ10. Plant physiology. 2013, 162 (2): 1006-1017. 10.1104/pp.113.218164.
- Chung HS, Howe GA: A critical role for the TIFY motif in repression of jasmonate signaling by a stabilized splice variant of the JASMONATE ZIM-domain protein JAZ10 in Arabidopsis. The Plant cell. 2009, 21 (1): 131-145. 10.1105/tpc.108.064097.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.