Analysis of signaling networks distributed over intracellular compartments based on protein-protein interactions
© Popik et al.; licensee BioMed Central Ltd. 2014
Published: 19 December 2014
Biological processes are usually distributed over various intracellular compartments. Proteins from diverse cellular compartments are often involved in similar signaling networks. However, the difference in the reaction rates between similar proteins among different compartments is usually quite high. We suggest that the estimation of frequency of intracompartmental as well as intercompartmental protein-protein interactions is an appropriate approach to predict the efficiency of a pathway.
Using data from the STRING, ANDSystem, IntAct and UniProt databases, a PPI frequency matrix of intra/inter-compartmental interaction efficiencies was constructed. This matrix included 15 human-specific cellular compartments. An approach for estimating pathway efficiency using the matrix of intra/inter-compartmental PPI frequencies, based on analysis of the distribution of reaction efficiencies, was suggested. KEGG pathway efficiencies were investigated using the developed method. The KEGG pathways were clustered and ranked by their efficiency. "Amino acid metabolism" and "Genetic information processing" showed the highest efficiencies among the functional classes of KEGG pathways. "Nervous system" and "Signaling molecules interaction" contained the least efficient pathways. Statistically significant differences were found between the efficiencies of KEGG and randomly generated pathways. Based on these observations, the validity of this approach was discussed.
The estimation of the efficiency of signaling networks is a complicated task because it requires kinetic data on the reactions. However, the proposed method does not require such data and can be used for preliminary analysis of different protein networks.
Estimation of the efficiency of signaling networks is one of the most relevant problems in the study of biological systems. Analysis of the effectiveness of biological networks is needed to meet the challenges of medicine and biotechnology [1, 2]. In particular, the search for drug targets [3, 4], prediction of gene expression, and engineering of organisms and plant systems can be performed via analysis of various regulatory networks. Common methods for systems analysis of signaling pathways include different modeling approaches, such as flux models, kinetic models, Boolean models [9, 10], Petri net models [11, 12] and stochastic modeling methods. Each method has both advantages and limitations. Ordinary Differential Equation (ODE) modeling provides qualitative and quantitative information about processes, though the search for reaction parameters is a time-consuming and difficult task. Flux and Boolean models allow steady-state analysis, but do not describe the process dynamics. Modeling and analysis using stochastic methods are computationally expensive. All methods require evaluation of reaction parameters, which in turn implies the need for experimental data.
One of the difficulties in modeling a signaling pathway is that biological processes in cells are allocated to different intracellular compartments. Thus, the effectiveness of a pathway can be directly influenced by the distribution of the involved proteins over intracellular localizations.
Previously we developed CELLmicrocosmos PathwayIntegration (CmPI) to support and visualize the subcellular localization prediction of protein-related data such as protein-interaction networks. Here, we propose a method for evaluating pathway efficiency on the basis of data on the intracellular localization of proteins involved in protein-protein interactions (PPI). The current analysis showed that proteins involved in PPI are preferentially localized in the same cellular compartment. Moreover, it is shown that Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways differ significantly in efficiency from random pathways. All KEGG pathways have been clustered into eight groups by the distribution of their reaction efficiencies. The clusters differ statistically in average efficiency. Functional classes of the KEGG pathways were ranked by their efficiency.
Results and discussion
Method for estimating efficiency of signaling pathways
The method for estimating the efficiency of a pathway is based on consideration of PPI frequencies between different intracellular compartments. We assume that if PPIs in general occur more frequently between proteins from particular compartments, interactions involving proteins located in these compartments are more effective within pathways. Thus, the optimality of the distribution of pathway reactions over intracellular localizations may reflect the efficiency of the pathway, with the optimal distribution being the one where the frequency of observed interactions between proteins localized in the intracellular compartments involved in the pathway has a maximum value.
To analyze the effectiveness of intra/inter-compartmental interactions, 15 major locations of eukaryotic cells were selected: Cytoplasm, Nucleus, Secreted, Membrane, Chromosome, Endoplasmic reticulum, Golgi apparatus, Endosome, Lysosome, Mitochondrion, Cell junction, Lipid-anchor, Cell projection, Peroxisome and Cytoplasmic vesicle. The localizations were selected according to the following rules. We considered only the highest hierarchy level of localizations presented in UniProt; data on localizations lower in the hierarchy were added to the corresponding localization at the highest hierarchy level. Finally, we took the 15 localizations containing more than 200 proteins with available PPI data. We used data on 16,000 human proteins with information about their compartmentalization (Figure S1). For this group of proteins, 600,000 cases of PPI were reported.
On the basis of these data, we estimate the efficiency of a reaction and of a molecular-genetic network using the following approach.
The efficiencies of reactions between every pair of compartments form a symmetric matrix (Additional file 1 Table S1). The efficiencies of reactions occurring in the same compartment are presented on the diagonal of the matrix. The off-diagonal elements reflect the efficiencies of reactions between proteins from two different localizations. In most cases, diagonal elements have higher values than the other elements of the same row or column. Thus, reactions between proteins from the same compartment take place more efficiently than reactions between proteins from different compartments. The only exception is the membrane compartment. In this case the diagonal element is the smallest compared to other compartments.
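As an illustration of how such a symmetric matrix could be assembled, the following minimal Python sketch counts observed PPIs per unordered compartment pair and divides by the number of possible protein pairs. The normalization, the function name `ppi_frequency_matrix` and the toy data are assumptions made for illustration; they are not the definition used in the paper.

```python
from collections import Counter
from itertools import combinations_with_replacement


def ppi_frequency_matrix(localization, interactions):
    """Return {(comp_a, comp_b): frequency} keyed by sorted compartment pairs.

    localization: dict mapping protein -> set of compartments
    interactions: iterable of (protein_a, protein_b) pairs
    Assumed normalization: observed PPIs between two compartments divided by
    the number of possible protein pairs between them.
    """
    # Number of proteins annotated to each compartment.
    members = Counter()
    for comps in localization.values():
        members.update(comps)

    # Count each interaction once per unordered compartment pair it touches.
    observed = Counter()
    for a, b in interactions:
        pairs = {
            tuple(sorted((ca, cb)))
            for ca in localization.get(a, ())
            for cb in localization.get(b, ())
        }
        observed.update(pairs)

    matrix = {}
    for ca, cb in combinations_with_replacement(sorted(members), 2):
        if ca == cb:
            possible = members[ca] * (members[ca] - 1) // 2
        else:
            possible = members[ca] * members[cb]
        matrix[(ca, cb)] = observed[(ca, cb)] / possible if possible else 0.0
    return matrix


# Hypothetical toy data, not taken from the paper.
toy_localization = {
    "P1": {"Nucleus"}, "P2": {"Nucleus"}, "P3": {"Nucleus"},
    "P4": {"Membrane"}, "P5": {"Membrane"},
}
toy_interactions = [("P1", "P2"), ("P2", "P3"), ("P1", "P4")]
toy_matrix = ppi_frequency_matrix(toy_localization, toy_interactions)
```

Because keys are stored as sorted pairs, a lookup for compartments in either order resolves to the same entry, which is what makes the matrix symmetric in practice.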
The efficiency of a molecular-genetic network Q involving N reactions is defined as a function of the efficiencies of its reactions. For a reaction of the network that is a PPI, the reaction efficiency is determined by the two proteins reacting in it. For a reaction that is not a PPI, we consider proteins taken from two reactions of the network.
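The exact formula is not reproduced here, so the following Python sketch only illustrates one plausible reading: the network efficiency is taken as the mean of its reaction efficiencies (the later sections refer to a mean efficiency, Eff). Taking the best compartment pair for proteins with several localizations is a further assumption, as are the function names and toy values.

```python
from statistics import mean


def reaction_efficiency(matrix, localization, protein_a, protein_b):
    """Efficiency of one PPI reaction, looked up in a symmetric matrix.

    Assumption: when a protein has several localizations, the highest
    matrix value over all compartment pairs is used.
    """
    values = [
        matrix.get(tuple(sorted((ca, cb))), 0.0)
        for ca in localization.get(protein_a, ())
        for cb in localization.get(protein_b, ())
    ]
    return max(values) if values else 0.0


def pathway_efficiency(matrix, localization, reactions):
    """Mean reaction efficiency over all reactions (assumed aggregation)."""
    if not reactions:
        raise ValueError("a pathway needs at least one reaction")
    return mean(
        reaction_efficiency(matrix, localization, a, b) for a, b in reactions
    )


# Hypothetical matrix values and localizations, for illustration only.
example_matrix = {
    ("Membrane", "Membrane"): 0.1,
    ("Membrane", "Nucleus"): 0.2,
    ("Nucleus", "Nucleus"): 0.6,
}
example_localization = {"A": {"Nucleus"}, "B": {"Nucleus"}, "C": {"Membrane"}}
example_reactions = [("A", "B"), ("B", "C")]
example_eff = pathway_efficiency(
    example_matrix, example_localization, example_reactions
)
```

Reactions that are not PPIs would need their own rule for choosing the protein pair; that rule is not covered by this sketch.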
KEGG pathways analysis
In total, 282 human KEGG pathways comprising 50,000 reactions were analyzed (Additional file 2 Table S2).
To compare the mean efficiency (Eff) of KEGG pathways with that of random pathways, the Eff distributions of the 282 KEGG pathways and of more than 10,000 random pathways were calculated. Random pathways were generated by permutation of KEGG pathways in the following way: for each KEGG pathway, we generated 1000 random pathways by replacing the proteins in each reaction with proteins chosen at random from the list of all KEGG proteins. If a protein is involved in several reactions of the pathway, it is replaced in all these reactions by the same random protein. The Eff distribution of KEGG pathways (Figure 1B) was found to differ significantly from the Eff distribution of random pathways (chi-square test, p-value < 10E-5).
It was also important to check whether there is a correlation between the length of the pathways and their efficiency. The Pearson correlation coefficient was R = -0.1 (p-value < 0.01). The value of R is low, so no firm conclusions can be drawn. However, it is negative, suggesting a weak inverse relationship between efficiency and the length of the pathways.
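The permutation procedure can be sketched in Python as follows. The key point is that each protein of the pathway is mapped to one random replacement, so a protein appearing in several reactions is replaced consistently. The function name, the toy pathway and the choice of sampling with replacement from the pool are illustrative assumptions.

```python
import random


def randomize_pathway(reactions, protein_pool, rng):
    """Replace every protein in a pathway by a randomly chosen pool protein.

    The same original protein always receives the same replacement within
    one random pathway. Replacements are drawn with replacement from the
    pool (an assumption; the source does not specify this detail).
    """
    proteins = sorted({p for reaction in reactions for p in reaction})
    replacement = {p: rng.choice(protein_pool) for p in proteins}
    return [(replacement[a], replacement[b]) for a, b in reactions]


# Hypothetical pathway and protein pool, for illustration only.
pathway = [("A", "B"), ("B", "C")]
pool = ["P1", "P2", "P3", "P4", "P5"]
rng = random.Random(42)  # fixed seed so the sketch is reproducible

# 1000 random pathways per original pathway, as in the text.
random_pathways = [randomize_pathway(pathway, pool, rng) for _ in range(1000)]
```

Each random pathway keeps the reaction topology of the original, so only the compartment assignments behind the reactions change.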
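For reference, the Pearson correlation coefficient between pathway length and efficiency can be computed as in the Python sketch below. The pathway lengths and efficiencies shown are made-up values used only to exercise the function; they are not results from the paper.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical pathway lengths (number of reactions) and efficiencies.
lengths = [10, 20, 30, 40]
efficiencies = [0.5, 0.4, 0.45, 0.3]
r = pearson_r(lengths, efficiencies)
```

A negative value of `r` indicates that longer pathways tend to have lower efficiency in the sample; the significance test reported in the text is not part of this sketch.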
KEGG functional classes of pathways were ranked by the mean efficiency of included pathways. The highest efficiency is observed within pathways from "Amino acid metabolism", "Genetic information processing" and "Carbohydrate metabolism" classes.
Evaluation of the efficiency of signaling networks remains an important issue. A method for preliminary analysis of networks lacking data on kinetic parameters was suggested to avoid one of the main obstacles to the practical application of existing methods for modeling the dynamics of molecular-genetic systems. The matrix of intra/inter-compartmental interaction efficiencies was constructed for 15 specific human cellular localizations based on PPI data and data on protein distribution over cellular compartments. The analysis of the matrix revealed that the frequency of PPI between proteins from the same compartment is higher than the frequency of PPI between proteins from different compartments. A new method for evaluating pathway efficiency was proposed; all human KEGG pathways were characterized by their mean efficiency and clustered based on correlation distances between the distributions of pathway reaction efficiencies. The distribution of pathway functional classes over clusters shows that some classes are mainly represented in one cluster.
The proposed method can be used for the preliminary analysis of the effectiveness of various signaling networks, including networks for which there is not enough data to model them with more accurate methods.
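The correlation distance used for clustering can be illustrated with the short Python sketch below, which compares two pathways through normalized histograms of their reaction efficiencies. The number of bins, the value range and the sample efficiencies are assumptions chosen for illustration; the clustering algorithm itself is not shown.

```python
from math import sqrt


def histogram(values, bins, lo, hi):
    """Normalized histogram of values over [lo, hi] with equal-width bins."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = int((v - lo) / width)
        counts[min(max(idx, 0), bins - 1)] += 1  # clamp to the valid range
    return [c / len(values) for c in counts]


def correlation_distance(u, v):
    """One minus the Pearson correlation of two equally long profiles."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm = sqrt(
        sum((a - mean_u) ** 2 for a in u) * sum((b - mean_v) ** 2 for b in v)
    )
    if norm == 0:
        raise ValueError("correlation distance is undefined for flat profiles")
    return 1.0 - cov / norm


# Hypothetical reaction efficiencies of two pathways, scaled to [0, 1].
profile_a = histogram([0.1, 0.15, 0.2, 0.8], bins=4, lo=0.0, hi=1.0)
profile_b = histogram([0.9, 0.85, 0.95, 0.1], bins=4, lo=0.0, hi=1.0)
distance_ab = correlation_distance(profile_a, profile_b)
```

A pairwise matrix of such distances could then be passed to any standard hierarchical clustering routine.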
Material and methods
PPI data were extracted from STRING, IntAct, and ANDSystem. STRING is a database containing known and predicted protein interactions. The interactions include direct (physical) and indirect (functional) associations. IntAct is a database containing protein-protein interaction data. All interactions are derived from literature curation or direct user submissions. The ANDSystem is designed to reconstruct and analyze associative gene networks. The ANDSystem incorporates utilities for automated knowledge extraction from scientific texts published in PubMed, and for analysis of information from various databases. In addition, the ANDCell database contains information on molecular-genetic events retrieved from texts and databases. Data on the subcellular localization of human proteins were extracted from ANDSystem, which contains data from the UniProt database in addition to the text mining-based information. The classification of the pathways by their efficiency was conducted on a set of pathways from the KEGG database. 282 human pathways were analyzed (Release 70.1, June 1, 2014).
The work was financially supported by Russian Science Foundation grant "Programmed cell death induced via death receptors: Delineating molecular mechanisms of apoptosis initiation via molecular modeling", No 14-44-00011.
Publication of this article has been funded by Russian Science Foundation grant No 14-44-00011.
This article has been published as part of BMC Genomics Volume 15 Supplement 12, 2014: Selected articles from the IX International Conference on the Bioinformatics of Genome Regulation and Structure\Systems Biology (BGRS\SB-2014): Genomics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcgenomics/supplements/15/S12.
- Karlebach G, Shamir R: Modelling and analysis of gene regulatory networks. Nature Reviews Molecular Cell Biology. 2008, 9 (10): 770-780. 10.1038/nrm2503.
- Hopkins AL: Network pharmacology: the next paradigm in drug discovery. Nature chemical biology. 2008, 4 (11): 682-690. 10.1038/nchembio.118.
- Csermely P, Agoston V, Pongor S: The efficiency of multi-target drugs: the network approach might help drug design. Trends in Pharmacological Sciences. 2005, 26 (4): 178-182. 10.1016/j.tips.2005.02.007.
- Cascante M, Boros LG, Comin-Anduix B, de Atauri P, Centelles JJ, Lee PWN: Metabolic control analysis in drug discovery and disease. Nature biotechnology. 2002, 20 (3): 243-249. 10.1038/nbt0302-243.
- Stelling J, Klamt S, Bettenbrock K, Schuster S, Gilles ED: Metabolic network structure determines key aspects of functionality and regulation. Nature. 2002, 420 (6912): 190-193. 10.1038/nature01166.
- Shachar-Hill Y: Metabolic network flux analysis for engineering plant systems. Current opinion in biotechnology. 2013, 24 (2): 247-255. 10.1016/j.copbio.2013.01.004.
- Kauffman KJ, Prakash P, Edwards JS: Advances in flux balance analysis. Current opinion in biotechnology. 2003, 14 (5): 491-496. 10.1016/j.copbio.2003.08.001.
- Ishii N, Suga Y, Hagiya A, Watanabe H, Mori H, Yoshino M, Tomita M: Dynamic simulation of an in vitro multi-enzyme system. FEBS letters. 2007, 581 (3): 413-420. 10.1016/j.febslet.2006.12.049.
- Chaves M: Robustness and fragility of Boolean models for genetic regulatory networks. J Theor Biol. 2005, 235: 431-449.
- Fumiã HF, Martins ML: Boolean network model for cancer pathways: predicting carcinogenesis and targeted therapy outcomes. PloS one. 2013, 8 (7): e69008. 10.1371/journal.pone.0069008.
- Baldan P, Cocco N, Marin A, Simeoni M: Petri nets for modelling metabolic pathways: a survey. Natural Computing. 2010, 9 (4): 955-989. 10.1007/s11047-010-9180-6.
- Voss K, Heiner M, Koch I: Steady state analysis of metabolic pathways using Petri nets. In silico biology. 2003, 3 (3): 367-387.
- Cazzaniga P, Pescini D, Besozzi D, Mauri G, Colombo S, Martegani E: Modeling and stochastic simulation of the Ras/cAMP/PKA pathway in the yeast Saccharomyces cerevisiae evidences a key regulatory function for intracellular guanine nucleotides pools. Journal of biotechnology. 2008, 133 (3): 377-385. 10.1016/j.jbiotec.2007.09.019.
- McConnachie G, Langeberg LK, Scott JD: AKAP signaling complexes: getting to the heart of the matter. Trends in molecular medicine. 2006, 12 (7): 317-323. 10.1016/j.molmed.2006.05.008.
- Sommer B, Kormeier B, Demenkov PS, Arrigo P, Hippe K, Ates Ö, Hofestädt R: Subcellular localization charts: a new visual methodology for the semi-automatic localization of protein-related data sets. Journal of bioinformatics and computational biology. 2013, 11 (01).
- UniProt Consortium: The universal protein resource (UniProt). Nucleic acids research. 2008, 36 (suppl 1): D190-D195.
- Kanehisa M, Goto S: KEGG: kyoto encyclopedia of genes and genomes. Nucleic acids research. 2000, 28 (1): 27-30. 10.1093/nar/28.1.27.
- Szklarczyk D, Franceschini A, Kuhn M, Simonovic M, Roth A, Minguez P, von Mering C: The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. Nucleic acids research. 2011, 39 (suppl 1): D561-D568.
- Kerrien S, Alam-Faruque Y, Aranda B, Bancarz I, Bridge A, Derow C, Hermjakob H: IntAct--open source resource for molecular interaction data. Nucleic acids research. 2007, 35 (suppl 1): D561-D565.
- Demenkov PS, Aman EE, Ivanisenko VA: Associative network discovery (AND)-the computer system for automated reconstruction networks of associative knowledge about molecular-genetic interactions. Comput Technol. 2008, 13 (2): 15-19.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.