NetWalker: a contextual network analysis tool for functional genomics
© Komurov et al.; licensee BioMed Central Ltd. 2012
Received: 23 November 2011
Accepted: 25 June 2012
Published: 25 June 2012
Functional analyses of genomic data within the context of a priori biomolecular networks can give valuable mechanistic insights. However, such analyses are not a trivial task, owing to the complexity of biological networks and lack of computational methods for their effective integration with experimental data.
We developed a software application suite, NetWalker, as a one-stop platform featuring a number of novel holistic (i.e. assessing the whole data distribution without requiring data cutoffs) data integration and analysis methods for network-based comparative interpretations of genome-scale data. The central analysis components, NetWalk and FunWalk, are novel random walk-based network analysis methods that assess entire data distributions together with network connectivity to prioritize the molecular and functional networks, respectively, that are most highlighted in the supplied data. Extensive interoperability between the analysis components and with external applications, including R, adds to the flexibility of data analyses. Here, we present a detailed computational analysis of our microarray gene expression data from MCF7 cells treated with lethal and sublethal doses of doxorubicin.
NetWalker, a detailed step-by-step tutorial containing the analyses presented in this paper and a manual are available at the web site http://netwalkersuite.org.
Keywords: Biological networks, NetWalker, NetWalk, Network analyses
In order to provide the research community with a software tool featuring advanced methods for a priori network analyses, we developed NetWalker (http://netwalkersuite.org). The NetWalker architecture is designed to enable network analyses based on holistic (i.e. no cutoff) integration of experimental data with a priori networks and to allow extensive interoperability between analysis components and with external applications. NetWalker features NetWalk and FunWalk, random walk-based analysis methods for prioritization of network interactions and functional processes, respectively, based on assessment of local network connectivity in conjunction with experimental data. Unlike other tools designed for similar purposes, NetWalk and FunWalk allow for interactive comparative analyses of the most active networks and functional processes, respectively, between samples. These comparisons are made through Edge Flux Tables (EFTable) and Function Tables (FunTable), respectively, which give the user flexibility in querying, analyzing and visualizing the networks of greatest interest. In addition, intuitive interoperability between analysis and visualization components in NetWalker, as well as with external applications, including R, adds flexibility in data analyses (see Manual in Additional file 1 for more details). In order to demonstrate the use of analysis functionalities in NetWalker for network-based analyses of microarray gene expression data, we have conducted an analysis of our in-house gene expression dataset from doxorubicin responses of p53-positive cells.
In NetWalker, analysis objects are of five types: NetWalker Interactome Knowledgebase (NIK), DataSet, EFTable, FunTable and Graph (Figure 3). The NIK is a pre-compiled knowledgebase of human genes, their functional annotations and their biomolecular interactions. The NIK is loaded at application startup, and it cannot be modified from within the application. The next three objects (DataSet, EFTable and FunTable) are tables, and Graph represents network views of interest. Tables in NetWalker feature standard functions for statistical manipulation, clustering, heatmap coloring, advanced filtering and network plotting, which give the user flexibility in analyzing each table. DataSet handles primary datasets, such as gene expression datasets, uploaded by the user. NetWalk is run on selected columns of a DataSet, and generates an EFTable and a FunTable (described next). EFTable is a table of interactions and their scores assigned for each condition that NetWalk was run on. FunTable is a table of functional terms and their scores for each column that FunWalk was run on. Graphs can be derived from any of these three tables by a simple export, or by direct query of the NIK.
NetWalk and EFTables
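As an illustration of the kind of per-interaction score an EFTable holds, the following is a minimal sketch of a data-biased random walk on a small network. It is not the NetWalk implementation: the function name, the uniform restart term and the toy node labels are assumptions made here for illustration. Each step moves to a neighbor with probability proportional to that neighbor's data-derived weight, and the flux through an interaction is the probability of being at its source node multiplied by the transition probability.

```python
from collections import defaultdict

def biased_walk_edge_flux(edges, weights, restart=0.05, iterations=500):
    """Return (node_probability, edge_flux) for a weight-biased random walk.

    edges   : iterable of undirected (u, v) pairs
    weights : {node: positive weight derived from the data}
    restart : assumed small probability of restarting at a uniformly chosen node
    """
    # Build an undirected adjacency list.
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    nodes = sorted(neighbors)

    # Transition probability i -> j is proportional to the weight of j.
    transition = {}
    for i in nodes:
        total = sum(weights[j] for j in neighbors[i])
        transition[i] = {j: weights[j] / total for j in neighbors[i]}

    # Power iteration; the uniform restart keeps the walk from oscillating.
    p = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        nxt = {n: restart / len(nodes) for n in nodes}
        for i in nodes:
            for j, t in transition[i].items():
                nxt[j] += (1.0 - restart) * p[i] * t
        p = nxt

    # Flux through a directed edge: probability at i times P(i -> j).
    flux = {(i, j): p[i] * t for i in nodes for j, t in transition[i].items()}
    return p, flux

# Toy star network with hypothetical node labels; C carries the highest weight.
toy_edges = [("A", "B"), ("B", "C"), ("B", "D")]
toy_weights = {"A": 1.0, "B": 1.0, "C": 4.0, "D": 1.0}
probabilities, fluxes = biased_walk_edge_flux(toy_edges, toy_weights)
```

In this toy network the interaction leading from B to the highly weighted node C receives more flux than the interactions leading from B to A or D, which is the qualitative behavior expected of a data-biased walk.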
FunWalk and Function Tables (FunTable)
where p_w is the probability of f based on experimental data w, while p_r is that after setting all w = 1. The score s(f) can be interpreted as a relative visitation probability of interactions defined by the functional term f compared to random chance due to network topology and functional set size. Since the lower term in Eq. 7 contains all the bias due to network topology (e.g. more studied genes) and set sizes, the log-likelihood function s is controlled for these biases.
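The description above suggests that the score compares two visitation probabilities on a logarithmic scale. A minimal sketch, assuming the score is the logarithm of the ratio of the data-based probability to the topology-only probability (the exact form of Eq. 7 and the logarithm base are assumptions here, and the function name is hypothetical):

```python
import math

def functional_score(p_data, p_topology):
    """Log ratio of the data-based to the topology-only visitation probability.

    p_data     : probability of term f computed with the experimental weights w
    p_topology : probability of term f after setting all w = 1
    The natural logarithm is an assumption; the source does not state the base.
    """
    if p_data <= 0 or p_topology <= 0:
        raise ValueError("probabilities must be positive")
    return math.log(p_data / p_topology)

# A term visited twice as often as topology alone predicts scores above zero;
# a term visited exactly as often as expected scores zero.
enriched = functional_score(0.02, 0.01)
neutral = functional_score(0.01, 0.01)
```

Under this assumption, a positive score means the term is visited more often than network topology and set size alone would explain, and a negative score means it is visited less often.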
Since FunWalk considers functional annotation terms rather than genes, it only considers terms that are shared as annotations across molecular interactions defined in the network. In this way, FunWalk prioritizes subnetworks containing common functional annotations that are also over-represented in the data. FunWalk uses NetWalk results to score each functional term for its enrichment in the given dataset. FunWalk results are displayed as Function Tables (FunTable), with each row representing a functional term and each column showing the scores in a given experimental condition. Any selected rows in a FunTable can be exported directly to a NetView window to view the network interactions associated with the given functional terms.
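One way to read the restriction to common annotations is that an interaction carries a functional term only when both of its genes are annotated with that term. The sketch below follows that reading, which is an interpretation rather than a documented rule; the function name, gene labels and term labels are hypothetical.

```python
def shared_terms_per_interaction(edges, annotations):
    """Map each interaction to the functional terms shared by both of its genes.

    edges       : iterable of (gene_u, gene_v) pairs
    annotations : {gene: set of functional terms}
    Interactions whose genes share no term are omitted.
    """
    shared = {}
    for u, v in edges:
        common = annotations.get(u, set()) & annotations.get(v, set())
        if common:
            shared[(u, v)] = common
    return shared

# Hypothetical genes and terms, for illustration only.
toy_annotations = {
    "G1": {"term_a", "term_b"},
    "G2": {"term_a"},
    "G3": {"term_c"},
}
toy_edges = [("G1", "G2"), ("G2", "G3")]
terms_by_edge = shared_terms_per_interaction(toy_edges, toy_annotations)
```

Here only the G1-G2 interaction is retained, labeled with term_a, because G2 and G3 have no annotation in common.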
NetView, network implementation and functions
NetView windows provide a graphical view of networks of interest. NetView contains a number of functions for visual manipulation of the graph, such as different layouts, coloring and functional analyses. For visual representation of network graphs, we have used the commercial yFiles library for Java (http://www.yworks.com). The yFiles library offers extensive support for nested graphs, which are important for implementing nested grouping in various network layouts. Utilizing yFiles' support of nested graphs, we have implemented manual and automated grouping of network components.
NetWalker Interactome Knowledgebase (NIK)
NetWalker uses a pre-compiled knowledgebase of genes, functional terms and biomolecular relationships, which is loaded at application startup.
There are currently four interaction types incorporated into the NIK: 1) protein-protein interactions, 2) transcription factor-target interactions, 3) neighboring metabolic reactions, and 4) neighboring interactions from Reactome.
Protein-protein interactions were obtained from HPRD (Human Protein Reference Database), BIND (Biomolecular Interaction Network Database), MINT, BioGRID and IntAct. Directed signaling interactions were obtained from KEGG and the NCI Pathway Interaction Database (http://pid.nci.nih.gov/). Interactions from MINT, BioGRID, IntAct and NCI were obtained from Pathway Commons.
Transcription factor-target interactions were obtained from BIND (queried as protein-DNA interactions), Reactome (obtained from Pathway Commons) and the NCI Pathway Interaction Database (obtained from Pathway Commons).
Neighboring metabolic reactions are assigned to a pair of genes if the product of the reaction catalyzed by one gene is the reactant of the reaction catalyzed by the other. For example, HK2 (hexokinase II) catalyzes the reaction Glucose + ATP <-> Glucose-6-phosphate + ADP, while GPI (glucose phosphate isomerase) catalyzes the reaction Glucose-6-phosphate -> Fructose-6-phosphate. Since Glucose-6-phosphate is a product of one and the reactant of the other, these two genes are assigned an interaction in the network. See Figure 4B for examples of metabolic interactions (orange interactions). Information on genes and their metabolic reactions was obtained from KEGG, the Human Metabolome Database (HMDB) and BiGG.
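The neighboring-reaction rule can be sketched as a set intersection between the products of one reaction and the reactants of another. This is a simplified illustration, assuming one reaction per gene and ignoring reversibility; the function name is hypothetical, and how the NIK treats ubiquitous metabolites such as ATP and ADP is not stated in the text and is not handled here.

```python
from itertools import permutations

def metabolic_neighbors(reactions):
    """Return gene pairs where a product of one reaction is a reactant of the other.

    reactions : {gene: (set of reactants, set of products)}
    Each pair is returned once as a frozenset, regardless of direction.
    """
    pairs = set()
    for gene_a, gene_b in permutations(reactions, 2):
        products_a = reactions[gene_a][1]
        reactants_b = reactions[gene_b][0]
        if products_a & reactants_b:
            pairs.add(frozenset((gene_a, gene_b)))
    return pairs

# HK2 and GPI as in the example above; LDHA is added as an unconnected reaction.
toy_reactions = {
    "HK2": ({"Glucose", "ATP"}, {"Glucose-6-phosphate", "ADP"}),
    "GPI": ({"Glucose-6-phosphate"}, {"Fructose-6-phosphate"}),
    "LDHA": ({"Pyruvate", "NADH"}, {"Lactate", "NAD+"}),
}
neighbor_pairs = metabolic_neighbors(toy_reactions)
```

With these three reactions, only HK2 and GPI are linked, through Glucose-6-phosphate.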
Neighboring reaction interactions were obtained from Reactome.
Functional terms: Functional annotations of genes from Gene Ontology are used as functional terms for genes in the NIK. These are also loaded at application startup to aid in functional analyses.
The authors will be continuously updating NIK with new interactions from the underlying databases, with new interactions from additional sources and with additional functional annotations of genes. Updated NIK files will be provided at the web site for download.
Datasets are imported into NetWalker in DataSet Tables (see above and Manual). The column whose values will be used by NetWalker as gene identifiers of rows is set by the user from within DataSet. At this point, NetWalker will automatically match the values in the given column to Gene nodes in the NIK. Currently, supported IDs are Gene Symbols, aliases, Affymetrix probe IDs, Entrez Gene IDs, RefSeq, Ensembl, Mouse Genome Database, Rat Genome Database and VEGA IDs.
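The identifier matching step can be pictured as a lookup from any supported identifier to a single gene node. The sketch below is not NetWalker's matching code: the index structure, the function name and the case-insensitive comparison are assumptions, and the small index shown is a hand-made example rather than NIK content.

```python
def match_identifiers(values, id_index):
    """Match identifier strings from a dataset column to gene nodes.

    values   : identifiers taken from the user-selected column
    id_index : {identifier in upper case: gene node name} (hypothetical structure)
    Returns (matched, unmatched), where matched maps each input value to its gene.
    """
    matched, unmatched = {}, []
    for value in values:
        key = value.strip().upper()  # assumed normalization, not documented
        gene = id_index.get(key)
        if gene is None:
            unmatched.append(value)
        else:
            matched[value] = gene
    return matched, unmatched

# Hand-made index mixing gene symbols, aliases and Entrez Gene IDs.
toy_index = {
    "TP53": "TP53", "P53": "TP53", "7157": "TP53",
    "CDKN1A": "CDKN1A", "P21": "CDKN1A", "1026": "CDKN1A",
}
matched, unmatched = match_identifiers(["p53", "1026", "NOT_A_GENE"], toy_index)
```

Reporting the unmatched values separately makes it easy to see which rows of a dataset would be left out of a network analysis.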
In order to maximize flexibility of analyses in NetWalker, we have implemented an interface with R, a popular statistical programming environment, using a network connection. We provide an R workspace file along with the application, which contains the currently implemented functions for the R-NetWalker interface. Currently, we have implemented functions for exchange of dataset/table and network objects between R and NetWalker. Details on the use of this functionality and sample uses can be found in the Manual.
The software is released under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported license (CC BY-NC-SA 3.0), which allows use, modification and sharing of the software and of its components for non-profit purposes.
Memory requirements and speed
Since NetWalker uses large matrix multiplications for NetWalk and FunWalk, we recommend at least 2 GB of memory, although we have successfully run it on systems with less memory. A NetWalk run in NetWalker takes a few seconds per data column, depending on the system. Since EFTables are very large objects (~300,000 EF values per data column), running very large datasets with NetWalk will require large memory (>4 GB) on a 64-bit system running a 64-bit JRE.
The yFiles library used in NetWalker allows for visualizations and handling of large networks. We have been able to generate and visualize a network of ~1,000 nodes from an EFTable in under 5 seconds.
Comparison of features in NetWalker, Cytoscape, BiologicalNetworks and VisANT
Pre-compiled interactome knowledgebase
NetWalker Interactome Knowledgebase
No central knowledgebase, can import external interaction sets
Dataset import and processing
Clustering and heatmaps
Network building with genes of interest
Shortest paths, common interactors, filtering
Pre-defined canonical pathways
Whole distribution-based network scoring method
Unique network integration/analysis method
NetWalk, FunWalk, EF Tables, GeneConnector, FunTable
ActiveModules, other plugins
Functional enrichment analysis
Fisher’s Exact Probability
Analyses/visualizations of sequence/structure data
Support for non-mammalian species data/networks
Interoperability with R
Results and discussion
In order to demonstrate the use of functionalities in NetWalker on a real dataset, we undertook an analysis of microarray gene expression data from MCF7 cells before and after treatment with lethal (10 µM) and sublethal (1 µM) doses of the chemotherapy drug doxorubicin. We imported the dataset into NetWalker and averaged gene expression values across experimental triplicates for each condition. We normalized gene expression values at each time point to those at time point 0 to reflect fold change. Then, we ran NetWalk and FunWalk on each of the normalized columns to perform a comparative network analysis of cellular responses to sublethal and lethal doxorubicin doses.
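The preprocessing described above (averaging triplicates, then normalizing to time point 0) can be sketched as follows. The column names and expression values are invented for illustration and are not taken from the dataset. The sketch assumes values on a linear scale, so that normalization is a ratio; for log-transformed values the equivalent step would be a subtraction.

```python
from statistics import mean

def average_replicates(row, groups):
    """Average the replicate columns of one gene for each condition.

    row    : {column name: expression value}
    groups : {condition: list of replicate column names}
    """
    return {cond: mean(row[col] for col in cols) for cond, cols in groups.items()}

def fold_change(averages, baseline):
    """Divide each condition average by the baseline (time point 0) average."""
    base = averages[baseline]
    return {cond: value / base for cond, value in averages.items() if cond != baseline}

# Toy values for a single hypothetical gene (not data from the study).
toy_row = {
    "t0_rep1": 100.0, "t0_rep2": 110.0, "t0_rep3": 90.0,
    "1uM_rep1": 190.0, "1uM_rep2": 200.0, "1uM_rep3": 210.0,
    "10uM_rep1": 380.0, "10uM_rep2": 400.0, "10uM_rep3": 420.0,
}
toy_groups = {
    "t0": ["t0_rep1", "t0_rep2", "t0_rep3"],
    "1uM": ["1uM_rep1", "1uM_rep2", "1uM_rep3"],
    "10uM": ["10uM_rep1", "10uM_rep2", "10uM_rep3"],
}
toy_fold_changes = fold_change(average_replicates(toy_row, toy_groups), "t0")
```

Each resulting fold-change column would then be a candidate input column for a NetWalk run.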
In order to make a heatmap of the most significant network interactions in the doxorubicin response (EF heatmap), we selected the most significant interactions from the 1 and 10 µM conditions and generated a clustered heatmap. Figure 4A-B shows the heatmap of the most significant interactions associated with increased gene expression in response to low or high doses of doxorubicin, together with a network corresponding to the highlighted cluster. This cluster represents interactions that are associated with increased expression in high dose but reduced expression in low dose doxorubicin treatment, revealing a bimodal response.
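The selection step can be illustrated with a simple top-k filter over an EF table. The text does not state how "most significant" was defined, so the rule used here (the k highest-scoring interactions in each condition, combined across conditions) is an assumption, as are the function name and the toy scores.

```python
def top_interactions(ef_table, conditions, k):
    """Return the union of the k highest-scoring interactions per condition.

    ef_table   : {interaction: {condition: score}}
    conditions : condition names to rank by
    k          : number of interactions to keep per condition
    """
    selected = set()
    for cond in conditions:
        ranked = sorted(ef_table, key=lambda edge: ef_table[edge][cond], reverse=True)
        selected.update(ranked[:k])
    return sorted(selected)

# Toy EF table with hypothetical interactions and scores.
toy_ef_table = {
    ("A", "B"): {"1uM": 2.0, "10uM": 0.1},
    ("B", "C"): {"1uM": -1.0, "10uM": 3.0},
    ("C", "D"): {"1uM": 0.2, "10uM": 0.3},
}
selected_edges = top_interactions(toy_ef_table, ["1uM", "10uM"], k=1)
```

The rows selected this way would form the matrix passed to clustering. An interaction such as B-C in the toy table, high in one condition and low in the other, is the kind of pattern a clustered heatmap makes visible.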
In order for users to analyze their data and extract network models of interest, the software should allow them to a) import and handle the dataset, b) integrate the dataset with a large knowledgebase of biomolecular interactions, c) query networks from the knowledgebase that are most related to their data, d) identify and visualize the networks of interest, and e) visually enhance the network for better representation of the experimental condition. If any of these components is missing, the software will be incomplete, and it will be difficult for a biologist without bioinformatics training to use it for data analysis. For example, VisANT, PINA, BioLayout and Osprey, although offering network construction, management and visualization tools, do not offer functionalities for importing and processing datasets or for network integration with user-supplied genomic data, which makes it difficult for biologists to use these tools for network-based data analyses. Cytoscape is a widely used, excellent tool primarily designed for advanced visualization of networks, but it does not offer content in the form of a knowledgebase. To our knowledge, BiologicalNetworks and NetWalker are the only software platforms that offer all of the functionalities described above. However, NetWalker is the only software that offers efficient holistic (i.e. no cutoff) data analysis methods (NetWalk and FunWalk) for comparative network and functional analyses. The most distinguishing feature of NetWalker relative to other software is that both NetWalker and the NetWalker Interactome Knowledgebase are designed to emphasize whole-distribution-based analysis methods (see Manual for more details), enabling more flexible data analyses and model building.
Novel functions can be integrated into existing software applications, such as Cytoscape, instead of developing a stand-alone application. However, Cytoscape is designed more as a visualization tool for biological networks, with some excellent features for visual mapping of data and further visual manipulations. Consequently, Cytoscape is not a database-centric software, like BiologicalNetworks and NetWalker, and the functions it provides, both core and through plugins, mainly concern the networks of interest (usually relatively small networks) uploaded or created by the user. Accordingly, the core API that is used by plugins only provides functions to access the current uploaded networks. In contrast, NetWalker (and BiologicalNetworks) features a pre-compiled knowledgebase of prior information, which is used to query the user-supplied data to extract most relevant networks. In addition, since handling of NetWalk and FunWalk results, their analyses, query and visualizations (EFTable, FunTable and functions therein) are best done with a specialized software architecture, we developed NetWalker as a separate suite to maximize user experience in using these methods. In addition, NetWalk and FunWalk are only pilot methods for the use of biased random walk models in network-based holistic data analyses, and we are currently working on a suite of novel algorithms to be incorporated into NetWalker to enable whole system-based analyses and automated mechanistic model building. Therefore, NetWalker should also be viewed as a novel platform for random walk based holistic network analyses.
Availability and requirements
NetWalker is available for download for academic use at http://netwalkersuite.org. Windows and Mac versions are provided. The Windows version of NetWalker runs on Windows XP and Windows 7 systems. We have tested the Mac version on a Mac computer with OS X version 10.7. Since NetWalk computations in NetWalker involve many large matrix multiplications, we recommend at least 2 GB of RAM. Most modern processors (Dual Core, Core2 Duo, etc.) will suffice to run NetWalker with reasonable performance.
- Smoot ME, Ono K, Ruscheinski J, Wang PL, Ideker T: Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics. 2011, 27 (3): 431-432. 10.1093/bioinformatics/btq675.
- Baitaluk M, Sedova M, Ray A, Gupta A: BiologicalNetworks: visualization and analysis tool for systems biology. Nucleic Acids Res. 2006, 34 (Web Server issue): W466-471.
- Hu Z, Mellor J, Wu J, Yamada T, Holloway D, Delisi C: VisANT: data-integrating visual framework for biological networks and modules. Nucleic Acids Res. 2005, 33 (Web Server issue): W352-357.
- Breitkreutz BJ, Stark C, Tyers M: Osprey: a network visualization system. Genome Biol. 2003, 4 (3): R22. 10.1186/gb-2003-4-3-r22.
- Theocharidis A, van Dongen S, Enright AJ, Freeman TC: Network visualization and analysis of gene expression data using BioLayout Express(3D). Nat Protoc. 2009, 4 (10): 1535-1550. 10.1038/nprot.2009.177.
- Calvano SE, Xiao W, Richards DR, Felciano RM, Baker HV, Cho RJ, Chen RO, Brownstein BH, Cobb JP, Tschoeke SK: A network-based analysis of systemic inflammation in humans. Nature. 2005, 437 (7061): 1032-1037. 10.1038/nature03985.
- Komurov K, White MA, Ram PT: Use of data-biased random walks on graphs for the retrieval of context-specific networks from genomic data. PLoS Comput Biol. 2010, 6 (8): e1000889.
- Nikolsky Y, Nikolskaya T, Bugrim A: Biological networks and analysis of experimental data in drug discovery. Drug Discov Today. 2005, 10 (9): 653-662. 10.1016/S1359-6446(05)03420-3.
- Tomlins SA, Mehra R, Rhodes DR, Cao X, Wang L, Dhanasekaran SM, Kalyana-Sundaram S, Wei JT, Rubin MA, Pienta KJ: Integrative molecular concept modeling of prostate cancer progression. Nat Genet. 2007, 39 (1): 41-51. 10.1038/ng1935.
- Ekins S, Bugrim A, Brovold L, Kirillov E, Nikolsky Y, Rakhmatulin E, Sorokina S, Ryabov A, Serebryiskaya T, Melnikov A: Algorithms for network analysis in systems-ADME/Tox using the MetaCore and MetaDrug platforms. Xenobiotica. 2006, 36 (10-11): 877-901.
- Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES: Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005, 102 (43): 15545-15550. 10.1073/pnas.0506580102.
- Lee S, Kim J: A comparative study on gene-set analysis methods for assessing differential expression associated with the survival phenotype. BMC Bioinformatics. 2011, 12: 377. 10.1186/1471-2105-12-377.
- Shojaie A, Michailidis G: Analysis of gene sets based on the underlying regulatory network. J Comput Biol. 2009, 16 (3): 407-426. 10.1089/cmb.2008.0081.
- Luo W, Friedman MS, Shedden K, Hankenson KD, Woolf PJ: GAGE: generally applicable gene set enrichment for pathway analysis. BMC Bioinformatics. 2009, 10: 161. 10.1186/1471-2105-10-161.
- Mishra GR, Suresh M, Kumaran K, Kannabiran N, Suresh S, Bala P, Shivakumar K, Anuradha N, Reddy R, Raghavan TM: Human protein reference database: 2006 update. Nucleic Acids Res. 2006, 34 (Database issue): D411-414.
- Bader GD, Donaldson I, Wolting C, Ouellette BF, Pawson T, Hogue CW: BIND: The Biomolecular Interaction Network Database. Nucleic Acids Res. 2001, 29 (1): 242-245. 10.1093/nar/29.1.242.
- Chatr-aryamontri A, Ceol A, Palazzi LM, Nardelli G, Schneider MV, Castagnoli L, Cesareni G: MINT: the Molecular INTeraction database. Nucleic Acids Res. 2007, 35 (Database issue): D572-574.
- Breitkreutz BJ, Stark C, Reguly T, Boucher L, Breitkreutz A, Livstone M, Oughtred R, Lackner DH, Bahler J, Wood V: The BioGRID Interaction Database: 2008 update. Nucleic Acids Res. 2008, 36 (Database issue): D637-640.
- Kerrien S, Alam-Faruque Y, Aranda B, Bancarz I, Bridge A, Derow C, Dimmer E, Feuermann M, Friedrichsen A, Huntley R: IntAct: open source resource for molecular interaction data. Nucleic Acids Res. 2007, 35 (Database issue): D561-565.
- Kanehisa M, Goto S: KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000, 28 (1): 27-30. 10.1093/nar/28.1.27.
- Cerami EG, Gross BE, Demir E, Rodchenkov I, Babur O, Anwar N, Schultz N, Bader GD, Sander C: Pathway Commons, a web resource for biological pathway data. Nucleic Acids Res. 2011, 39 (Database issue): D685-690.
- Joshi-Tope G, Gillespie M, Vastrik I, D'Eustachio P, Schmidt E, de Bono B, Jassal B, Gopinath GR, Wu GR, Matthews L: Reactome: a knowledgebase of biological pathways. Nucleic Acids Res. 2005, 33 (Database issue): D428-432.
- Wishart DS, Knox C, Guo AC, Eisner R, Young N, Gautam B, Hau DD, Psychogios N, Dong E, Bouatra S: HMDB: a knowledgebase for the human metabolome. Nucleic Acids Res. 2009, 37 (Database issue): D603-610.
- Schellenberger J, Park JO, Conrad TM, Palsson BO: BiGG: a Biochemical Genetic and Genomic knowledgebase of large scale metabolic reconstructions. BMC Bioinformatics. 2010, 11: 213. 10.1186/1471-2105-11-213.
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000, 25 (1): 25-29. 10.1038/75556.
- Wu J, Vallenius T, Ovaska K, Westermarck J, Makela TP, Hautaniemi S: Integrated network analysis platform for protein-protein interactions. Nat Methods. 2009, 6 (1): 75-77. 10.1038/nmeth.1282.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.