C2Maps: a network pharmacology database with comprehensive disease-gene-drug connectivity relationships
© Huang et al.; licensee BioMed Central Ltd. 2012
Published: 26 October 2012
Network pharmacology has emerged as a new topic of study in recent years. It aims to study the myriad relationships among proteins, drugs, and disease phenotypes. The concept of molecular connectivity maps has been proposed to establish comprehensive knowledge links between molecules of interest in a given biological context. Molecular connectivity maps between drugs and genes/proteins in specific disease contexts can be particularly valuable, since the functional approach with these maps helps researchers gain global perspectives on both the therapeutic profiles and toxicological profiles of candidate drugs.
To assess drug pharmacological effect, we assume that "ideal" drugs for a patient can treat or prevent the disease by modulating the patient's gene expression profiles to levels similar to those in healthy people. Starting from this hypothesis, we build comprehensive disease-gene-drug connectivity relationships with drug-protein directionality (inhibit/activate) information based on a computational connectivity maps (C2Maps) platform. An interactive interface for directionality annotation of drug-protein pairs with literature evidence from PubMed has been added to the new version of C2Maps. We also upload the curated directionality information of drug-protein pairs specific to three complex diseases: breast cancer, colorectal cancer and Alzheimer disease.
For relevant drug-protein pairs with directionality information, we use breast cancer as a case study to demonstrate the functionality of disease-specific searching. Based on the results obtained from searching, we perform pharmacological effect evaluation for two important breast cancer drugs on treating patients diagnosed with different breast cancer subtypes. The evaluation is performed on a well-studied breast cancer gene expression microarray dataset to portray how useful the updated C2Maps is in assessing drug efficacy and toxicity information.
The C2Maps platform is an online bioinformatics resource that provides biologists with directional relationships between drugs and genes/proteins in specific disease contexts based on network mining, literature mining, and drug effect annotating. A new insight to assess overall drug efficacy and toxicity can be provided by using the C2Maps platform to identify disease relevant proteins and drugs. The case study on breast cancer correlates very well with the existing pharmacology of the two breast cancer drugs and highlights the significance of C2Maps database.
Screening millions of chemical compounds to identify "hit" compounds for specific disease gene/protein targets has been a mainstream paradigm for modern drug discovery. While the conventional "One disease, One gene, and One drug" paradigm works effectively for simple genetic disorders, it fails to produce effective drugs for complex diseases such as cancer. In complex diseases, many genes may be contributing to the disease's phenotype; therefore, identifying a "magic bullet" drug compound can be quite elusive.
Polypharmacology, which focuses on multi-target drugs, has become a new paradigm in drug discovery. Polypharmacology drugs have conventionally been viewed to have undesirable 'promiscuity'. However, recent research studies show, in the case of both older psychiatric drugs and modern anticancer therapies, that this promiscuity is intrinsic to the drug's therapeutic efficacy. Although there are over 40 drug-target (protein-compound interaction) databases according to Pathguide (e.g. DrugBank, STITCH, CTT, CTD and BindingDB), a disease-specific searching platform is still needed to fully understand drug effects on the human body.
A new cancer systems biology approach to drug discovery has emerged in recent years. The primary focus of this paradigm is to understand the actions of drugs by considering targets in the context of the biological networks. By focusing on a systems level, it provides a better way to examine complicated diseases that can be caused by several gene mutations, such as cancer. However, most methods published so far focus on modeling the structure of the drug target network qualitatively. To examine a drug's effect on a molecular network representative of the disease, more quantitative and accurate modeling techniques need to be developed by utilizing the concept of network pharmacology or network medicine.
In post-genome biology, molecular connectivity maps have been proposed to establish comprehensive knowledge links between molecules of interest in a given biological context. Molecular connectivity maps between drugs and genes/proteins in a disease-specific context can be particularly valuable because they allow researchers to evaluate drugs against each other using their unique gene/protein-drug association profiles. The functional approach to drug comparisons helps researchers gain global perspectives on both the toxicological profiles and therapeutic profiles of candidate drugs. Furthermore, the time it takes to develop high quality drugs in new therapeutic areas can also be reduced by using this method.
One approach for developing molecular connectivity map data is to generate disease-specific protein-drug association profiles computationally by mining biomolecular interaction networks and PubMed literature. The Connectivity Maps (C2Maps) web server is an online bioinformatics resource that provides biologists with potential relationships between drugs and genes/proteins in specific disease contexts based on network mining and literature mining. It builds on the concept of network pharmacology by examining many drugs at the same time and studying the drug-disease relationship through the underlying protein interaction network rather than through each drug's direct targets. C2Maps provides quantitative measurements of the relevance of proteins and drugs to a specific disease by applying network mining and statistical testing methods from text mining, and thus offers new insight for assessing overall drug efficacy and toxicity.
Co-occurrences of proteins and drugs found by the literature mining in C2Maps do not necessarily tell researchers what type of relationship they have, therapeutic or toxic. To overcome this limitation, we further standardize the classification of protein-drug relationships and then perform literature curation to determine each drug's effect on proteins at a higher resolution. Such valuable information is not readily available from the existing drug-target (protein-compound interaction) databases (e.g. DrugBank, STITCH, CTT), although it may be scattered within descriptions or referenced text.
To assess drug pharmacological effect, such as drug efficacy and toxicity, we assume that "ideal" drugs for a patient diagnosed with a certain disease should modulate the patient's gene expression profiles to levels similar to those in normal healthy people. Therefore, for statistically over-expressed genes, drugs should be able to inhibit expression down to the normal range. Similarly, for statistically under-expressed genes, drugs should be able to activate expression up to the normal range. In this way, drugs can treat or prevent the disease by reversing gene expression levels from the disease state to the normal range, thus modulating cellular function as in normal cells.
By assuming that a drug might be a potential treatment option for a disease if the gene expression profiles of the disease and the drug are opposing, an earlier study identified novel drug indications in diet-induced obesity or Alzheimer's disease. Another work by Atul utilized the same gene expression data and algorithms with large scale gene expression data from GEO to study associations between 100 diseases and 164 drug molecules. They found candidate therapeutics for 53 of the diseases. These studies are proof of principle that public genomics databases and similar hypotheses can benefit drug discovery. Although the second release of that data resource makes gene expression data publicly available for more than 1000 compounds, numerous compounds are not part of the database. Another limitation of this overly simplified hypothesis is that it doesn't differentiate important genes from unimportant ones. Ideally, a biologically meaningful scoring method needs to be developed.
This is the first time that we have published this comprehensive C2Maps database server. Although earlier work provides the underlying computational methodology, it only covers a small number of diseases such as Alzheimer's disease.
We create an interactive interface for directionality annotation of drug-protein pairs with literature evidence from PubMed.
We curate the directionality information of drug-protein pairs for three disease phenotypes: breast cancer, colorectal cancer and Alzheimer disease, from 5133, 4869 and 3928 PubMed abstracts, respectively. We also upload this curated directionality information into C2Maps and perform a statistical analysis on it. Curation of additional diseases, such as pancreatic cancer and autism, is still ongoing.
We enhance the functionality of disease-specific searching for relevant proteins and drugs with directionality information.
We update the comprehensive disease-gene-drug connectivity data in the C2Maps database, which in the current version includes 19,569,563 PubMed abstracts and 142,523 unique 3-star protein interactions.
We also use breast cancer as a case study to demonstrate the functionality of disease-specific searching for relevant drug-protein pairs with directionality information.
Based on the searching result, we show the feasibility of performing drug pharmacological effect evaluation for two important breast cancer drugs to show the power of updated C2Maps in drug efficacy and toxicity assessment.
The network mining component takes a query disease term as the input and generates a ranked list of disease-relevant proteins as the output, through 1) MeSH term matching, 2) disease-associated gene searching from OMIM, 3) network expanding in HAPPI, and 4) network-based protein ranking;
The text mining component takes an input list of genes or proteins (the disease-relevant proteins from the previous component) and creates, as the output, a list of enriched disease candidate drugs that are significantly associated with those proteins, through 1) gene/protein name mapping using UniProtKB, 2) article abstract retrieving from PubMed, 3) drug/chemical compound identification using MeSH terms, and 4) disease-specific drug-protein pair ranking;
The drug effect annotating component allows users to 1) retrieve the disease-specific drug-protein association list, 2) curate drug-protein directionality information from PubMed abstracts, 3) annotate this drug-protein directionality information interactively, and 4) browse disease-specific drug-protein directionality information online.
Here, p and q are indices for proteins in the cancer-related interaction network PPI, k is an empirical constant (k=2 in this study), conf(p, q) is the confidence score assigned to each interaction between protein p and q, and N(p, q) holds the value of 1 if the protein p interacts with q.
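The scoring formula itself did not survive in this copy of the text. As a minimal sketch only, the snippet below assumes the score takes the form r_p = k * ln(sum over q of conf(p, q)) - ln(sum over q of N(p, q)), which is consistent with the symbols defined above but is an assumption rather than a quotation of the original equation. The function name `relevance_scores` and the toy interactions are hypothetical.

```python
import math
from collections import defaultdict

def relevance_scores(interactions, k=2.0):
    """Score each protein from (p, q, conf) interaction triples.

    ASSUMED form (not taken verbatim from the source):
        r_p = k * ln(sum_q conf(p, q)) - ln(sum_q N(p, q))
    where N(p, q) = 1 when p interacts with q.
    Confidence sums must be positive for the logarithm to be defined.
    """
    conf_sum = defaultdict(float)
    degree = defaultdict(int)
    for p, q, conf in interactions:
        # An interaction is undirected, so it counts for both partners.
        for node in (p, q):
            conf_sum[node] += conf
            degree[node] += 1
    return {
        p: k * math.log(conf_sum[p]) - math.log(degree[p])
        for p in conf_sum
    }

# Hypothetical toy network; the confidence values are illustrative only.
toy = [("A", "B", 0.9), ("A", "C", 0.8), ("B", "C", 0.5)]
scores = relevance_scores(toy)
ranked = sorted(scores, key=scores.get, reverse=True)
```

Under this assumed form, a protein is rewarded for high-confidence interactions and penalized for simply having many partners, so heavily connected proteins do not dominate the ranking on degree alone.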
Here, the disease-specific sample is generated by sampling the entire collection of retrieved abstracts, and each such sample has a fixed size. The random sample is generated by randomly sampling the entire set of PubMed abstracts and has its own sample size. For the term being tested, the average document frequency and the document frequency variance are computed in both the disease-specific sample and the random sample. A two-tailed t-test was then performed to calculate the p-value. A thorough description of the computational components and algorithms used, along with data sets and data processing parameters, is given by Li et al.
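The comparison of document frequencies can be illustrated with a short sketch. The text only states that a two-tailed t-test was used; the unequal-variance (Welch) form below, the normal approximation for the p-value, and all names and numbers are assumptions made for illustration.

```python
import math

def welch_t(mean_a, var_a, n_a, mean_b, var_b, n_b):
    """Return (t, df) for a two-sample t-test from summary statistics.

    mean/var: average document frequency and its variance in each group.
    n:        number of observations behind each mean (hypothetical here).
    """
    se2 = var_a / n_a + var_b / n_b
    t = (mean_a - mean_b) / math.sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = se2 ** 2 / (
        (var_a / n_a) ** 2 / (n_a - 1) + (var_b / n_b) ** 2 / (n_b - 1)
    )
    return t, df

def two_sided_p_normal(t):
    """Two-sided p-value via a normal approximation (reasonable for large df)."""
    return math.erfc(abs(t) / math.sqrt(2))

# Illustrative numbers only: disease-specific sample vs. random sample.
t_stat, dof = welch_t(0.30, 0.01, 100, 0.25, 0.01, 100)
p_value = two_sided_p_normal(t_stat)
```

For small samples, the normal approximation should be replaced by the exact t-distribution from a statistics library.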
The C2Maps platform follows a multi-tier architecture design. The back end was implemented as PL/SQL packages in the Oracle 11 g database server, with the Oracle Text engine enabled, to ensure scalable querying of PubMed text documents. The C2Maps application middleware was implemented in the Oracle Application Express (APEX) server, which bridged between the Apache web server and the Oracle database server.
Table: Current statistics for the included database records (rows include Human Protein-Protein Interaction, Unique HAPPI 3-star interactions, and Disease and Drug Terminology).
Therapeutic: if the drug activates the under-expressed protein or inhibits the over-expressed protein, we define that the drug has a therapeutic effect on that protein
Toxic: if the drug activates the over-expressed protein or inhibits the under-expressed protein, we define that the drug has a toxic effect on that protein
Ambiguous: if there is missing directionality information for either the nodes (i.e. proteins/drugs) or edges.
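The three definitions above can be expressed as a small decision function. This is a sketch of the stated rules only; the function name and the string labels are hypothetical, not part of the C2Maps platform.

```python
def classify_effect(expression, action):
    """Classify a drug's effect on one protein.

    expression: "over", "under", or None when the status is unknown
    action:     "activate", "inhibit", or None when the direction is unknown
    """
    if expression not in ("over", "under") or action not in ("activate", "inhibit"):
        # Missing directionality for a node or an edge.
        return "ambiguous"
    # A drug is therapeutic when it pushes expression back toward normal.
    reverses = (expression, action) in {("over", "inhibit"), ("under", "activate")}
    return "therapeutic" if reverses else "toxic"
```

For example, `classify_effect("over", "inhibit")` returns `"therapeutic"`, while `classify_effect("over", None)` returns `"ambiguous"`.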
Curation of drug-protein relations from PubMed abstracts
"BRCA1 mRNA and protein levels were significantly decreased in estrogen-depleted MCF-7 and BT20T cells and increased again after stimulation with beta-estradiol".
"Treatment of cells with cycloheximide (CHX) prevented the activation of p53 in all phases of the cell cycle and its accumulation in G1/S and S".
"Hydroxyurea-mediated DNA synthesis arrest of S phase MCF7 cells led to a loss of BRCA1 from these structures".
"GRalpha and GRbeta transcripts are coordinately upregulated in CEM-C7 cells and coordinately downregulated in IM-9 cells by dexamethasone".
The drug-protein relation is not mentioned in the text.
Activation - "Subsequent injection of tamoxifen triggers the transient activation of Akt/PKB in mice." (Tamoxifen and AKT1_HUMAN, PMID: 12640620).
Inhibition - "Treatment of cells with Cycloheximide (CHX) prevented the activation of p53 in all phases of the cell cycle and its accumulation in G1/S and S." (P53_HUMAN and Cycloheximide, PMID:9484835).
Indirect Yes - "Hydroxyurea-mediated DNA synthesis arrest of S phase MCF7 cells led to a loss of BRCA1 from these structures." (BRCA1_HUMAN and Hydroxyurea, PMID:9267023).
Ambiguous - "GRalpha and GRbeta transcripts are coordinately upregulated in CEM-C7 cells and coordinately downregulated in IM-9 cells by dexamethasone." (GCR_HUMAN and Dexamethasone, PMID:12974663).
PubMed evidence for Tamoxifen's effect on ESR1
The major computational components of the C2Maps platform were developed using validated computational techniques. In the network mining component, protein interaction network expansion was able to reduce the initial biases and low data coverage that may have existed in the seed list of proteins. We used the new HAPPI database instead of other protein interaction databases because of its overall better data quality (comparable to or better than data in the HPRD database for quality star grades of 3 and above) and coverage (more than 280,000 human protein interactions with star grades of 3 and above), which was thoroughly described in Chen et al. In the text mining component, the PubMed abstract retrieval for each protein was shown to improve Information Retrieval (IR) recall performance without sacrificing precision. The quality of disease drug identifications was shown to outperform comparable systems with balanced sensitivity, specificity, and positive predictive values (for details, refer to Li et al.).
Performance assessment of C2Maps in varying cancers.
Drug efficacy can be measured by the ability of a drug to produce the desired phenotypic effect or molecular effect. To evaluate drug efficacy at the molecular level based on our hypothesis illustrated in Figure 2, we need to know how drugs affect the expression of their interacting genes and how those genes are expressed in disease conditions. We obtained the former from the above case study of C2Maps. To get the latter, we performed differential expression analysis on a well-studied microarray dataset, GSE3191. This experiment contains the breast cancer subtypes luminal A and basal-like as well as normal breast tissues. We obtained the differential genes for both breast cancer subtypes, luminal A and basal-like, when compared to normal. We identified 579 differential genes between luminal A and normal, and 773 differential genes between basal-like and normal. We used these two sets for the following case study.
Tamoxifen relevant proteins and their directionality
In Figure 7d, we portray the drug-protein interactions for Tamoxifen in basal patients. Three of its 15 interacting proteins are differentially expressed between basal patients and normal. Tamoxifen has only 1 therapeutic effect (activating under-expressed JUN) but 2 toxic effects (activating over-expressed E2F1 and inhibiting under-expressed IRS1). However, all three proteins are relatively insignificant for breast cancer. This implies a neutral role overall when using Tamoxifen in basal patients, since it is not able to reverse its interacting proteins in the basal condition (Figure 7d). This agrees well with the clinical fact that basal or triple negative breast cancer patients fail to benefit from Tamoxifen treatment.
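The tally for Tamoxifen in basal patients can be reproduced with a few lines. This sketch uses only the three proteins and directions named in the paragraph above; the variable and function names are hypothetical.

```python
from collections import Counter

# Expression status in basal-like vs. normal tissue, as stated in the text.
expression = {"JUN": "under", "E2F1": "over", "IRS1": "under"}
# Tamoxifen's curated action on each protein, as stated in the text.
tamoxifen_action = {"JUN": "activate", "E2F1": "activate", "IRS1": "inhibit"}

# Combinations in which the drug moves expression back toward normal.
REVERSING = {("over", "inhibit"), ("under", "activate")}

def tally_effects(expression, action):
    """Count therapeutic and toxic effects over differentially expressed proteins."""
    counts = Counter()
    for protein, direction in action.items():
        status = expression.get(protein)
        if status is None:
            continue  # protein is not differentially expressed; no effect counted
        label = "therapeutic" if (status, direction) in REVERSING else "toxic"
        counts[label] += 1
    return counts

summary = tally_effects(expression, tamoxifen_action)
```

Here `summary` holds one therapeutic and two toxic effects, matching the counts reported for Tamoxifen in basal patients. A raw count ignores how relevant each protein is to the disease, which is why the text also weighs protein significance.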
Plicamycin was an approved antineoplastic antibiotic for a variety of advanced forms of cancer. It was withdrawn from the market in 2000. In Figure 7f, we show the drug-protein interactions for Plicamycin in luminal A patients. It has 2 interacting proteins with directionality annotations (shown in Table 5), and neither is significant in breast cancer, both having a low r_p score. Only 1 of these 2 proteins is differentially expressed between luminal A and normal. Plicamycin has a toxic effect overall by inhibiting under-expressed MYC. This implies a neutral or toxic effect when using Plicamycin in luminal A subtype breast cancer patients, since it is not able to reverse its interacting proteins in the disease condition (Figure 7f). This may help explain why it was withdrawn in 2000.
In this study, we present an upgraded C2Maps platform to evaluate drug pharmacological effects based on the hypothesis that an ideal drug can reverse gene expression levels in a disease back to those in normal conditions. This online platform enables users to query high-coverage protein-drug connectivity maps in real time. It enables users to research up-to-date knowledge of connectivity maps for a specific disease, explore therapeutic protein targets, design repurposed drug compounds, and assess toxicological impacts of drug compounds on disease-relevant genes/proteins. Three efficacy case studies demonstrate the feasibility of applying the literature-mined drug directionality data from C2Maps to drug efficacy studies. It will be a major resource for biomedical researchers interested in developing disease-specific therapeutic and diagnostic applications based on progress in network biology and network pharmacology.
We will increase the functionality of drug-orientated searching for relevant disease phenotypes and proteins in the C2Maps. It will allow users to input drug names, not just disease names. It should be able to retrieve all the disease names and genes/proteins related to this drug. This function will be very useful for drug repurposing.
We will increase the functionality of disease-orientated browsing for relevant proteins and drugs in the C2Maps by using disease phenotype trees. It will allow users to browse the database by clicking the disease name. The current version only supports the disease-specific searching function without any browsing function.
We will also enhance the functionality of interactive directionality information annotation for drug-protein pairs in the C2Maps by using natural language processing (NLP) techniques. The literature curation for breast cancer, colorectal cancer and Alzheimer disease took three experts nearly one year's effort to complete. While this ensures the data quality, it is time-consuming. With the gold-standard dataset from curation, we will apply NLP techniques to allow users to curate and annotate directionality information from PubMed abstracts more easily and semi-automatically.
Based on “Predicting drug efficacy based on the integrated breast cancer pathway model”, by Hui Huang, Xiaogang Wu, Sara Ibrahim, Marianne McKenzie and Jake Y Chen, which appeared in Genomic Signal Processing and Statistics (GENSIPS), 2011 IEEE International Workshop on. © 2011 IEEE.
We would like to thank the IUPUI solution center, MURI, UROP and Indiana Center for Systems Biology and Personalized Medicine for financial support. We thank Dr. Davide Bolchini for his guidance on web site usability. We also thank Sina Reinhard for her curation on Alzheimer's disease, and Taiwo Ajumobi for her curation on autism.
This article has been published as part of BMC Genomics Volume 13 Supplement 6, 2012: Selected articles from the IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS) 2011. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcgenomics/supplements/13/S6.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.