GMMAD: a comprehensive database of human gut microbial metabolite associations with diseases
BMC Genomics volume 24, Article number: 482 (2023)
Abstract
Background
Metabolites, the natural products of gut microbes, are crucial factors affecting disease. Comprehensive identification and annotation of the relationships among diseases, metabolites, and microbes can provide efficient and targeted routes to understanding the mechanisms of complex diseases and to developing new markers and drugs.
Results
We developed Gut Microbial Metabolite Association with Disease (GMMAD), a manually curated database of associations among human diseases, gut microbes, and metabolites of gut microbes. This initial release (i) contains 3,836 disease-microbe associations and 879,263 microbe-metabolite associations, which were extracted from the literature and available resources and then manually curated; (ii) defines an association strength score and a confidence score. Using these two scores, GMMAD predicted 220,690 disease-metabolite associations, all involving metabolites of gut microbes. We expect that positive effective associations (with both scores exceeding the suggested thresholds) will help identify disease markers and understand pathogenic mechanisms from the perspective of gut microbes. Negative effective associations could serve as biomarkers and have potential as drug candidates. Evidence from the literature was experimentally consistent with this proposal; (iii) provides a user-friendly web interface that allows users to browse, search, and download information on associations among diseases, metabolites, and microbes. The resource is freely available at http://guolab.whu.edu.cn/GMMAD.
Conclusions
As a unique online resource for gut microbial metabolite-disease associations, GMMAD can help researchers explore disease-metabolite-microbe mechanisms and screen drug and marker candidates for different diseases.
Background
Our gut harbours trillions of microbes that play essential roles in many physiological and pathological processes. Disturbances of gut microbiome homeostasis can cause many diseases. For example, enrichment of Fusobacterium nucleatum can induce colorectal cancer metastasis [1]. Many other diseases, such as Parkinson disease and diabetes, are also affected by microbes, yet the mechanisms remain unclear [2, 3]. Thanks to advances in technologies such as sequencing, we can now observe and analyse the composition and status of gut microbes. This can help diagnose diseases early and even suggest new treatments based on the increase or decrease of different microbial populations in disease [4]. Although numerous studies have investigated the mechanisms linking diseases, microbes, and metabolites, these associations remain scattered across the literature [5].
In recent decades, a growing number of studies have shown that microbes affect human health through their metabolites [6]. Metabolites are key factors driving the interaction between human gut microbes and diseases. For example, trimethylamine N-oxide (TMAO), a metabolite derived from the gut microbiota, has been widely reported to be associated with cardiovascular disease [7]. Recent studies have also reported that TMAO might be a key activator of antitumor immunity [8, 9]. Another metabolite associated with cardiovascular disease is phenylacetylglutamine (PAGln). Stanley L. Hazen et al. found that gut microbes convert dietary phenylalanine into phenylacetic acid, which is then combined with glutamine (Gln) to generate PAGln. PAGln can enhance platelet activation and is associated with adverse cardiovascular events such as myocardial infarction and stroke [10]. Based on these findings, gut microbial metabolites are believed to have potential as drugs for various diseases, and their disruption has also been identified as a signature of many diseases.
Some resources focus on disease-microbe or microbe-metabolite associations. For example, HMDAD collects text-mining-based microbe-disease associations from peer-reviewed publications [11]; its current version contains 483 disease-microbe entries covering 39 diseases and 292 microbes. gutMDisorder is a major repository for dysbiosis of the gut microbiota in disorders and interventions [12]; it contains 2,263 associations between 579 gut microbes and 123 disorders or 77 interventions in humans. mBodyMap is a curated database of microbes across the human body and their associations with health and disease [13]; it contains a total of 63,148 runs, including 14,401 metagenomes and 48,747 amplicons related to health and 56 human diseases. Researchers have also developed prediction methods to mine potential disease-microbe associations. NTSHMDA predicts human microbe-disease associations using a random walk that integrates network topological similarity [14]. ABHMDA reveals disease-microbe associations by calculating the relation probability of a disease-microbe pair with a strong classifier, and it can be applied to new diseases without any known related microbes [15]. Many other methods and models can also predict disease-microbe associations [16,17,18,19]. On the other hand, a few resources store the metabolites of microbes, such as VMH [20] and NJS16 [21]. However, resources linking diseases to the metabolites of gut microbes are still lacking.
To fill this gap, we constructed the Gut Microbial Metabolite Association with Disease (GMMAD) database, which aims to provide users with comprehensive metabolism-related information on gut microbes in human diseases. Users can easily browse the associations among diseases, microbes, and metabolites stored in GMMAD, and screen potential candidate disease signatures or drugs from our predicted disease-metabolite associations (Fig. 1). The current version of GMMAD documents 45,058 meaningful predicted disease-metabolite associations involving 31 diseases and 2,299 metabolites. We expect GMMAD to be a timely and valuable resource for gut microbe research.
Materials and methods
Data collection for disease-microbe and microbe-metabolite associations
Disease names in GMMAD follow the Medical Subject Headings (MeSH) disease categories [22], and the database covers 113 diseases. Annotation information for microbes was collected from the Taxonomy database [23], and metabolite details were obtained from the PubChem database [24]. For the association data, we integrated several public databases. Disease-microbe associations were collected from HMDAD, gutMDisorder, and Peryton [25]. All data from other public databases had to meet the naming standards for diseases (MeSH), microbes (NCBI Taxonomy), and metabolites (PubChem). Because most current microbial sequencing technologies, such as metagenomic sequencing and 16S rRNA sequencing, can accurately identify microbes only at the genus level, all microbes in our disease-microbe associations are defined at the genus level. Microbe-metabolite associations were obtained from VMH [20], NJS16 [21], and the study of Han S. et al. [6]. The microbes collected from VMH were at the strain level. We mapped the genus-level gut microbes in disease-microbe associations to the strain-level microbes in microbe-metabolite associations based on lineage data obtained from Taxonomy. Specifically, if a genus was found to increase/decrease in a particular disease, all strains within that genus were assumed to increase/decrease in that disease. For example, if Faecalibacterium (genus level) is decreased in obesity, Faecalibacterium cf. prausnitzii KLE1255 and Faecalibacterium prausnitzii A2-165 (two strains belonging to Faecalibacterium) are both assumed to be decreased in obesity. Furthermore, we searched PubMed for related literature using keyword combinations such as ‘gut microbiota AND cancer AND metabolites’ and ‘intestinal microbiota AND diabetes AND metabolites’.
We then downloaded all published articles and available supplementary files describing associations among diseases, microbes, and metabolites. Finally, at least two researchers manually extracted experimentally supported associations from the selected articles. Across all associations, we removed contradictory entries to keep the data in GMMAD consistent, and we also removed redundant entries (upper part of Fig. 1). Both association types take discrete values: disease-microbe associations have two directions (increase or decrease), and microbe-metabolite associations are either “appearing” or “deficiency”. For microbe-metabolite associations, we only record metabolites that appear in a specific microbe.
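Two of the curation steps described above, removing redundant and contradictory disease-microbe records and propagating a genus-level change to every strain of that genus, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' processing code: the function names `harmonize` and `propagate_to_strains`, the input shapes, and the placeholder names `disease_B` and `genus_Y` are hypothetical, and the strain-to-genus mapping is assumed to have been extracted beforehand from NCBI Taxonomy lineages. The text does not say whether contradictory pairs were dropped entirely or resolved case by case; dropping them is an assumption here.

```python
from collections import defaultdict


def harmonize(records):
    """Deduplicate (disease, microbe, direction) records and drop contradictory pairs."""
    directions = defaultdict(set)
    for disease, microbe, direction in records:
        directions[(disease, microbe)].add(direction)  # a set collapses duplicates
    # Assumption: a pair reported as both "increase" and "decrease" is dropped entirely.
    return {pair: next(iter(dirs)) for pair, dirs in directions.items() if len(dirs) == 1}


def propagate_to_strains(genus_changes, strain_to_genus):
    """Copy each genus-level change in a disease to every strain of that genus.

    genus_changes: {(disease, genus): "increase" | "decrease"}
    strain_to_genus: {strain: genus}, assumed to come from NCBI Taxonomy lineages
    """
    strains_by_genus = defaultdict(list)
    for strain, genus in strain_to_genus.items():
        strains_by_genus[genus].append(strain)
    strain_changes = {}
    for (disease, genus), direction in genus_changes.items():
        # As in the text, every strain of the genus inherits the genus-level direction.
        for strain in strains_by_genus.get(genus, []):
            strain_changes[(disease, strain)] = direction
    return strain_changes


# Example from the text: Faecalibacterium (genus level) is decreased in obesity.
lineage = {
    "Faecalibacterium cf. prausnitzii KLE1255": "Faecalibacterium",
    "Faecalibacterium prausnitzii A2-165": "Faecalibacterium",
    "Actinomyces georgiae DSM 6843": "Actinomyces",
}
genus_level = harmonize([
    ("Obesity", "Faecalibacterium", "decrease"),
    ("Obesity", "Faecalibacterium", "decrease"),  # redundant duplicate
    ("disease_B", "genus_Y", "increase"),         # hypothetical contradictory pair
    ("disease_B", "genus_Y", "decrease"),
])
print(propagate_to_strains(genus_level, lineage))
```

Here the duplicate Faecalibacterium record collapses to one entry, the contradictory placeholder pair is dropped, and both Faecalibacterium strains inherit the "decrease" direction, while the Actinomyces strain is left out because its genus has no recorded change.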
Association strength score and confidence score
For the core disease-metabolite relationships, we defined quantitative measures. We assumed that metabolites shared by multiple decreased microbes are beneficial for treating a specific human disease, whereas those shared by multiple increased microbes promote the development of that disease [26, 27]. We first defined an association strength score to infer the association between diseases and metabolites of gut microbes. We denote by M and N the numbers of microbes that increase and decrease, respectively, in a specific disease, and by m and n the numbers of these increasing and decreasing microbes that generate a specific metabolite. The formula for the association strength score is:
Meanwhile, we defined a confidence score to judge the credibility of association strength scores:
Because the microbial levels in the two associations differ (genus level for disease-microbe and strain level for microbe-metabolite), we used microbial lineage data to link them, resolving all microbes to the strain level. For three diseases, all \({S}_{\mathrm{as}}\) values are 1, and for five diseases, all are -1. On inspection, each of these diseases, such as ‘Colitis, Microscopic’ and ‘Gastroesophageal Reflux’, is associated with only one microbe. \({S}_{\mathrm{as}}\) approximates the change in abundance of a specific metabolite by counting the increasing or decreasing microbial strains in the gut of a diseased individual compared with a healthy one. If a metabolite appears only in decreased microbes, its association strength score \({S}_{\mathrm{as}}\) will be strongly negative; in principle, such a metabolite may be beneficial for treating the disease, or at least strongly associated with it. Hence, a strong positive score indicates that the metabolite might be considered a potential disease marker, while a strong negative score indicates that the metabolite might be a candidate drug or potential diagnostic marker for the disease. We only considered associations with an absolute association strength score > 0.05 and a confidence score > 1 to be meaningful.
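Both scores are built from the counts M, N, m, and n, and the filtering rule is stated explicitly, so the bookkeeping around the scores can be sketched without the scoring formulas themselves. The Python below is a minimal sketch under that constraint: it counts M, N, m, and n for one disease and one metabolite from strain-level data, and it applies the stated thresholds (\(|{S}_{\mathrm{as}}| > 0.05\) and \({S}_{\mathrm{ac}} > 1\)) to scores passed in as inputs. It deliberately takes the two scores as inputs rather than guessing at their formulas. The names `Counts`, `count_inputs`, `is_meaningful`, and `interpret`, the data shapes, and the strain labels `s1` to `s3` are hypothetical.

```python
from typing import Dict, NamedTuple, Set


class Counts(NamedTuple):
    M: int  # strains that increase in the disease
    N: int  # strains that decrease in the disease
    m: int  # increasing strains that produce the metabolite
    n: int  # decreasing strains that produce the metabolite


def count_inputs(strain_direction: Dict[str, str],
                 strain_metabolites: Dict[str, Set[str]],
                 metabolite: str) -> Counts:
    """Count M, N, m, n for one disease and one metabolite.

    strain_direction: {strain: "increase" | "decrease"} for a single disease
    strain_metabolites: {strain: set of metabolites the strain produces}
    """
    increased = [s for s, d in strain_direction.items() if d == "increase"]
    decreased = [s for s, d in strain_direction.items() if d == "decrease"]
    # Summing booleans counts the strains that produce the metabolite.
    m = sum(metabolite in strain_metabolites.get(s, set()) for s in increased)
    n = sum(metabolite in strain_metabolites.get(s, set()) for s in decreased)
    return Counts(len(increased), len(decreased), m, n)


def is_meaningful(s_as: float, s_ac: float) -> bool:
    """Thresholds stated in the text: |S_as| > 0.05 and S_ac > 1."""
    return abs(s_as) > 0.05 and s_ac > 1


def interpret(s_as: float) -> str:
    """Sign convention stated in the text."""
    if s_as > 0:
        return "potential disease marker"
    return "candidate drug or potential diagnostic marker"


directions = {"s1": "increase", "s2": "increase", "s3": "decrease"}
produces = {"s1": {"x"}, "s3": {"x", "y"}}
print(count_inputs(directions, produces, "x"))  # Counts(M=2, N=1, m=1, n=1)
# The reported nadide scores for T2DM (S_as = -0.625, S_ac = 3.13) pass both thresholds.
print(is_meaningful(-0.625, 3.13), interpret(-0.625))
```

Note that a confidence score of exactly 1 fails the filter, because the stated rule uses strict inequalities.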
Database construction
All data in GMMAD are stored and managed using MySQL (version 5.6.50). The web interface was built with HTML/CSS/JS on a Linux and Apache platform. The data processing programs are written in Python (version 2.7.5). GMMAD has been tested in the Google Chrome (version 104.0.5112.81) and Firefox (version 101.0.1) browsers. The GMMAD database is freely available at http://guolab.whu.edu.cn/GMMAD.
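To illustrate how scored disease-metabolite associations might be stored and queried, the sketch below uses Python's built-in `sqlite3` module as a lightweight stand-in for the MySQL backend named above. The table name `disease_metabolite`, its columns, and the helper `search_disease` are hypothetical and are not GMMAD's actual schema; the two rows use the T2DM scores reported in the Results.

```python
import sqlite3

# In-memory SQLite database; GMMAD itself uses MySQL, so this schema is illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE disease_metabolite (
           disease    TEXT NOT NULL,  -- MeSH disease name
           metabolite TEXT NOT NULL,  -- PubChem-standardized name
           s_as       REAL NOT NULL,  -- association strength score
           s_ac       REAL NOT NULL,  -- confidence score
           PRIMARY KEY (disease, metabolite)
       )"""
)
conn.executemany(
    "INSERT INTO disease_metabolite VALUES (?, ?, ?, ?)",
    [
        ("Diabetes Mellitus, Type 2", "nadide", -0.625, 3.13),
        ("Diabetes Mellitus, Type 2", "Melatonin", -0.500, 2.00),
    ],
)


def search_disease(conn, disease):
    """Return meaningful associations for one disease, strongest negative first."""
    return conn.execute(
        """SELECT metabolite, s_as, s_ac FROM disease_metabolite
           WHERE disease = ? AND ABS(s_as) > 0.05 AND s_ac > 1
           ORDER BY s_as""",
        (disease,),
    ).fetchall()


print(search_disease(conn, "Diabetes Mellitus, Type 2"))
```

Putting the thresholds in the `WHERE` clause keeps only meaningful associations in search results, and ordering by `s_as` mirrors browsing entries by association strength.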
Results
Database content
Using our method, we predicted 220,690 disease-metabolite association entries, with association strength scores distributed from -1 to 1 (Fig. 2). After integrating and compiling all associations obtained from public databases and papers, GMMAD contains 3,836 disease-microbe associations and 879,263 microbe-metabolite associations, involving 113 diseases, 893 gut microbes, and 2,448 metabolites of gut microbes (Fig. 3A). After score-based filtering, 45,058 meaningful disease-metabolite associations remain, involving 31 diseases and 2,299 metabolites. Because each microbe produces thousands of metabolites and each disease has hundreds of associated microbes, the number of three-way disease-microbe-metabolite associations is huge. Displaying all of them would not help users search and would make the database hard to maintain. Thus, GMMAD only shows pairwise associations among the three terms: disease, microbe, and metabolite. For each entry, GMMAD stores the human disease name (e.g. Crohn disease), microbe name (e.g. Faecalibacterium or Actinomyces georgiae DSM 6843), metabolite name (e.g. nadide), alteration pattern (increase or decrease), experimental method (e.g. 16S rRNA sequencing), association strength score and confidence score of disease-metabolite associations, disease details (e.g. disease description), metabolite information (e.g. charged formula, average molecular mass), and microbe information. IDs for diseases, microbes, metabolites, and PubMed articles (PMIDs) are also provided, linking to MeSH, Taxonomy, PubChem, and PubMed for more details.
Figure 3B-E shows the top 10 diseases, microbes, and metabolites in the different types of associations. Colorectal neoplasm is the disease with the most entries (Fig. 3B), suggesting that colorectal neoplasm is closely connected to gut microbes and their metabolites. Bacteroides is the microbe associated with the most diseases (Fig. 3C). In addition, Yokenella regensburgei ATCC 43003 and ala_L (L-alanine) are the microbe and metabolite with the most entries in microbe-metabolite associations (Fig. 3D, E).
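Rankings such as those in Fig. 3B-E amount to counting entries per term. For deduplicated disease-microbe pairs, counting entries per microbe is the same as counting the diseases each microbe is associated with. A minimal sketch with Python's `collections.Counter`; the helper `top_terms` and the placeholder records are hypothetical and are not GMMAD data.

```python
from collections import Counter


def top_terms(associations, position, k=10):
    """Return the k most frequent terms at one position of association tuples.

    associations: iterable of tuples such as (disease, microbe, direction)
    position: index of the term to count (0 for disease, 1 for microbe)
    """
    return Counter(a[position] for a in associations).most_common(k)


records = [
    ("disease_A", "genus_X", "increase"),
    ("disease_A", "genus_Y", "decrease"),
    ("disease_B", "genus_X", "increase"),
]
print(top_terms(records, 0))  # [('disease_A', 2), ('disease_B', 1)]
print(top_terms(records, 1))  # [('genus_X', 2), ('genus_Y', 1)]
```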
GMMAD implementation
GMMAD provides a user-friendly interface that allows users to browse, search, and download data (lower part of Fig. 1). The tree browser organizes the data according to three types of associations: “Disease-Metabolite”, “Disease-Microbe” and “Microbe-Metabolite”. Each root category is divided into several sub-categories named “Disease”, “Metabolite”, “Microbe” and “Validated associations” (Fig. 4). Clicking a term in these sub-categories lists all associations for that term in the table on the right. On the ‘Search’ page, users can search associations by the name or ID of diseases, microbes, and metabolites (Fig. 4). Example search terms appear when users click “Example” at the bottom of each search box. In the results table, users can click the icon in the upper right corner to download the search results. The ‘Detail’ link of each entry leads to more detailed information, including the microbe's family, disease description, metabolite formula, sample information, and so on (Fig. 4). The ‘Download’ page allows users to download all disease-metabolite, disease-microbe, and microbe-metabolite association data in .txt format. GMMAD also provides external links to related public databases. All tutorials and details are available on the ‘Help’ page.
Validation of predicted disease-metabolite association
To prioritize and filter disease-metabolite associations and reveal the most interesting results, we defined an association strength score and a confidence score. As a rough evaluation of the accuracy of our method, we collected 138 experimentally determined disease-metabolite associations from 49 published studies. Of these, 36 associations had meaningful scores, and 26 of them (72.2%) had a predicted direction consistent with the experimental observation (Supplementary Table 1). Our aim is not to identify all disease-metabolite associations, but to ensure that the associations we select are genuine with high probability and worth studying as drug or marker candidates.
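The direction-consistency check described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' evaluation code: the function name `direction_consistency` and the record shape are hypothetical, and a positive \({S}_{\mathrm{as}}\) is assumed to predict that the metabolite is raised in the disease (negative: lowered), which is how "consistent direction" is interpreted here.

```python
def direction_consistency(records, as_threshold=0.05, ac_threshold=1.0):
    """Fraction of meaningful predictions whose sign matches the experiment.

    records: iterable of (s_as, s_ac, observed), where observed is +1 if the
    metabolite was reported as raised in the disease and -1 if lowered.
    Returns (number scored as meaningful, number consistent, rate).
    """
    # Keep only predictions that pass the stated thresholds.
    scored = [(s_as, obs) for s_as, s_ac, obs in records
              if abs(s_as) > as_threshold and s_ac > ac_threshold]
    # Assumption: a positive S_as predicts a raised metabolite.
    consistent = sum((s_as > 0) == (obs > 0) for s_as, obs in scored)
    rate = consistent / len(scored) if scored else 0.0
    return len(scored), consistent, rate
```

With the figures reported above (36 associations with meaningful scores, 26 of them consistent), the rate is 26/36 ≈ 0.722, that is, 72.2%.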
For example, querying “Diabetes Mellitus, Type 2” (T2DM) in the “Search (Disease-Metabolite)” box returns 2,137 T2DM-associated metabolites. Each entry has an association strength score and a confidence score, and users can browse entries ordered by score. When we sort all T2DM entries by association strength score, “nadide” has the third strongest negative association with T2DM (\({S}_{\mathrm{as}}\) = -0.625 and \({S}_{\mathrm{ac}}\) = 3.13). Nadide (also called NAD+, nicotinamide adenine dinucleotide) is an indispensable coenzyme in the human body and plays an important role in the tricarboxylic acid cycle. It is crucial in various biological processes such as human metabolism, stress, and cell differentiation. In patients with T2DM, NAD+ synthesis is severely impaired [28, 29]. Many researchers are trying to promote NAD+ synthesis as a treatment for patients with type 2 diabetes mellitus, and several drugs based on this mechanism have been developed [30, 31]. Similarly, “Melatonin” has the fifth strongest negative association with T2DM (\({S}_{\mathrm{as}}\) = -0.500 and \({S}_{\mathrm{ac}}\) = 2.00). Melatonin is well known for its sleep-promoting effects. In recent years, studies have found that melatonin has a hypoglycaemic effect, possibly by improving insulin resistance, protecting pancreatic β-cells, and regulating the hypothalamus–pituitary–adrenal axis. These findings highlight melatonin and melatonin-related bacteria and metabolites as potential therapeutic targets for type 2 diabetes [32, 33]. Together, these examples indicate that our method can effectively surface useful disease-metabolite associations.
Discussion
Despite extensive research on the association between gut microbes and diseases, the mechanisms by which gut microbes act in various diseases remain unclear. In recent years, a growing number of researchers have found that gut microbes influence diseases through their metabolites [34, 35].
Many disease-microbe databases have been built, such as HMDAD [11], gutMDisorder [12], and Peryton [25]. To the best of our knowledge, none of these databases focuses on associations between diseases and the metabolites of gut microbes. In contrast, the current version of GMMAD contains 45,058 meaningful disease-metabolite associations.
As the resource that uniquely stores quantitative associations between diseases and gut microbial metabolites, GMMAD can suggest potential markers and drugs for specific diseases. When these metabolites are considered as drug candidates, their safety is likely to be less of a concern than that of conventional drug candidates, because they occur naturally in the human gut environment. At the same time, because these associations are predictions, they should be validated experimentally before practical application. The role of our database is to pick out the most promising drug and biomarker candidates from gut microbial metabolites and rank them with quantitative scores. By consulting GMMAD, experimental scientists obtain a refined list of candidates that merit closer attention. As data accumulate, disease associations can be resolved more directly at the level of microbial strains, and metabolome research can provide more reliable microbe-metabolite associations. We will update the database with higher-quality associations between specific diseases and gut microbial metabolites. In the future, besides content updates, we will add more features to the database, such as the ability to search for broad chemical classes of compounds beyond specific compounds. We will also provide visualization tools for showing the association network.
Availability of data and materials
GMMAD is publicly available: http://guolab.whu.edu.cn/GMMAD.
References
Chen S, Su T, Zhang Y, Lee A, He J, Ge Q, et al. Fusobacterium nucleatum promotes colorectal cancer metastasis by modulating KRT7-AS/KRT7. Gut Microbes. 2020;11(3):511–25.
Massey W, Brown JM. The gut microbial endocrine organ in type 2 diabetes. Endocrinology. 2021;162(2):bqaa235.
Tan AH, Lim SY, Lang AE. The microbiome-gut-brain axis in Parkinson disease - from basic research to the clinic. Nat Rev Neurol. 2022;18(8):476–95.
Chen Y, Zhou J, Wang L. Role and Mechanism of Gut Microbiota in Human Disease. Front Cell Infect Microbiol. 2021;11:625913.
Dang AT, Marsland BJ. Microbes, metabolites, and the gut-lung axis. Mucosal Immunol. 2019;12(4):843–50.
Han S, Van Treuren W, Fischer CR, Merrill BD, DeFelice BC, Sanchez JM, et al. A metabolomics pipeline for the mechanistic interrogation of the gut microbiome. Nature. 2021;595(7867):415–20.
Wang Z, Klipfell E, Bennett BJ, Koeth R, Levison BS, Dugar B, et al. Gut flora metabolism of phosphatidylcholine promotes cardiovascular disease. Nature. 2011;472(7341):57–63.
Wang H, Rong X, Zhao G, Zhou Y, Xiao Y, Ma D, et al. The microbial metabolite trimethylamine N-oxide promotes antitumor immunity in triple-negative breast cancer. Cell Metab. 2022;34(4):581-94 e8.
Mirji G, Worth A, Bhat SA, El Sayed M, Kannan T, Goldman AR, et al. The microbiome-derived metabolite TMAO drives immune activation and boosts responses to immune checkpoint blockade in pancreatic cancer. Sci Immunol. 2022;7(75):eabn0704.
Nemet I, Saha PP, Gupta N, Zhu W, Romano KA, Skye SM, et al. A cardiovascular disease-linked gut microbial metabolite acts via adrenergic receptors. Cell. 2020;180(5):862–77 e22.
Ma W, Zhang L, Zeng P, Huang C, Li J, Geng B, et al. An analysis of human microbe-disease associations. Brief Bioinform. 2017;18(1):85–97.
Cheng L, Qi C, Zhuang H, Fu T, Zhang X. gutMDisorder: a comprehensive database for dysbiosis of the gut microbiota in disorders and interventions. Nucleic Acids Res. 2020;48(D1):D554–60.
Jin H, Hu G, Sun C, Duan Y, Zhang Z, Liu Z, et al. mBodyMap: a curated database for microbes across human body and their associations with health and diseases. Nucleic Acids Res. 2022;50(D1):D808–16.
Luo J, Long Y. NTSHMDA: prediction of human microbe-disease association based on random walk by integrating network topological similarity. IEEE/ACM Trans Comput Biol Bioinform. 2020;17(4):1341–51.
Peng LH, Yin J, Zhou L, Liu MX, Zhao Y. Human Microbe-Disease Association Prediction Based on Adaptive Boosting. Front Microbiol. 2018;9:2440.
Zou S, Zhang J, Zhang Z. A novel approach for predicting microbe-disease associations by bi-random walk on the heterogeneous network. PLoS ONE. 2017;12(9):e0184394.
Liu Y, Wang SL, Zhang JF. Prediction of microbe-disease associations by graph regularized non-negative matrix factorization. J Comput Biol. 2018;25(12):1385–94.
Niu YW, Qu CQ, Wang GH, Yan GY. RWHMDA: Random Walk on Hypergraph for Microbe-Disease Association Prediction. Front Microbiol. 2019;10:1578.
Wu C, Gao R, Zhang D, Han S, Zhang Y. PRWHMDA: Human Microbe-Disease Association Prediction by Random Walk on the Heterogeneous Network with PSO. Int J Biol Sci. 2018;14(8):849–57.
Noronha A, Modamio J, Jarosz Y, Guerard E, Sompairac N, Preciat G, et al. The virtual metabolic human database: integrating human and gut microbiome metabolism with nutrition and disease. Nucleic Acids Res. 2019;47(D1):D614–24.
Sung J, Kim S, Cabatbat JJT, Jang S, Jin YS, Jung GY, et al. Global metabolic interaction network of the human gut microbiota for context-specific community-scale analysis. Nat Commun. 2017;8:15393.
Nelson SJ, Schopen M, Savage AG, Schulman JL, Arluk N. The MeSH translation maintenance system: structure, interface design, and implementation. Stud Health Technol Inform. 2004;107(Pt 1):67–9.
Federhen S. The NCBI Taxonomy database. Nucleic Acids Res. 2012;40(Database issue):D136-43.
Kim S, Chen J, Cheng T, Gindulyte A, He J, He S, et al. PubChem in 2021: new data content and improved web interfaces. Nucleic Acids Res. 2021;49(D1):D1388–95.
Skoufos G, Kardaras FS, Alexiou A, Kavakiotis I, Lambropoulou A, Kotsira V, et al. Peryton: a manual collection of experimentally supported microbe-disease associations. Nucleic Acids Res. 2021;49(D1):D1328–33.
Le Chatelier E, Nielsen T, Qin J, Prifti E, Hildebrand F, Falony G, et al. Richness of human gut microbiome correlates with metabolic markers. Nature. 2013;500(7464):541–6.
Verhaar BJH, Prodan A, Nieuwdorp M, Muller M. Gut microbiota in hypertension and atherosclerosis: a review. Nutrients. 2020;12(10):2982.
Verdin E. NAD(+) in aging, metabolism, and neurodegeneration. Science. 2015;350(6265):1208–13.
Garten A, Schuster S, Penke M, Gorski T, de Giorgis T, Kiess W. Physiological and pathophysiological roles of NAMPT and NAD metabolism. Nat Rev Endocrinol. 2015;11(9):535–46.
Yoshino J, Mills KF, Yoon MJ, Imai S. Nicotinamide mononucleotide, a key NAD(+) intermediate, treats the pathophysiology of diet- and age-induced diabetes in mice. Cell Metab. 2011;14(4):528–36.
Milne JC, Lambert PD, Schenk S, Carney DP, Smith JJ, Gagne DJ, et al. Small molecule activators of SIRT1 as therapeutics for the treatment of type 2 diabetes. Nature. 2007;450(7170):712–6.
Huang X, Qiu Y, Gao Y, Zhou R, Hu Q, He Z, et al. Gut microbiota mediate melatonin signalling in association with type 2 diabetes. Diabetologia. 2022;65(10):1627–41.
Karamitri A, Jockers R. Melatonin in type 2 diabetes mellitus and obesity. Nat Rev Endocrinol. 2019;15(2):105–25.
Agus A, Clement K, Sokol H. Gut microbiota-derived metabolites as central regulators in metabolic disorders. Gut. 2021;70(6):1174–82.
Oliphant K, Allen-Vercoe E. Macronutrient metabolism by the human gut microbiome: major fermentation by-products and their impact on host health. Microbiome. 2019;7(1):91.
Acknowledgements
Not applicable.
Funding
This study was supported by the National Key Research and Development Program [2018YFA0903702].
Author information
Contributions
FBG conceived the study, guided it throughout, and revised the manuscript. CYW, XK, and QQW collected the datasets. CYW wrote the code for the database, prepared the figures and tables, and wrote the first draft of the manuscript. GQZ, ZSC, and ZXD reviewed the manuscript before submission. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Additional file 1:
Supplementary Table 1. Predictive scores of experimentally validated disease-metabolite association data.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Wang, CY., Kuang, X., Wang, QQ. et al. GMMAD: a comprehensive database of human gut microbial metabolite associations with diseases. BMC Genomics 24, 482 (2023). https://doi.org/10.1186/s12864-023-09599-5
DOI: https://doi.org/10.1186/s12864-023-09599-5