We have presented a user-friendly network analysis and interpretation tool called atBioNet and described three case studies in which atBioNet was used to identify key functional modules and to generate hypotheses about the underlying mechanisms of diseases from protein/gene lists comprising candidate biomarkers from omics technologies. atBioNet leverages existing knowledge from seven publicly available PPI databases and adds powerful network analysis and visualization tools. The system can expand knowledge from a list of seed proteins/genes through analysis of the resulting functional modules. The functional modules are identified with SCAN, a fast structural clustering method, and annotated with KEGG pathways.
Recent advances in omics technologies have generated huge amounts of publicly available PPI data. Several visualization and network analysis tools have been developed to leverage these data for different purposes. VisANT is an integrative framework for the analysis, mining, and visualization of pathways and integrated omics data. VisANT generates networks for use in systems biology research from input proteins/genes by querying integrated PPI data from multiple sources. The resulting network is annotated with information from KEGG and GO. PINA is another network construction, analysis, and visualization tool that contains information from six public PPI databases. It contains ~2400 pre-determined modules. Given a list of input proteins/genes, PINA determines the over-represented modules by performing an enrichment test and then offers biological context for the modules, which are annotated with GO, KEGG, protein domains, and MSigDB. Unlike PINA, atBioNet constructs modules dynamically at the time of the query, which allows novel modules to be generated based on the input proteins/genes. NAViGaTOR focuses mainly on 2D or 3D visualization of PPI networks as well as GO annotation of the nodes. Cytoscape allows users to build a customized pipeline to analyze PPI data by using different plug-ins and annotation tools, but the effective use of Cytoscape requires a thorough understanding of the available tools and plug-ins and expertise in organizing and interpreting the output.
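The enrichment test applied to pre-determined modules can be illustrated with a one-sided hypergeometric test. This is only a sketch: the text does not state which test PINA uses, so the hypergeometric form, the function name, and all parameter names are assumptions made for illustration.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, module_size, input_size, overlap):
    """One-sided hypergeometric p-value, P(X >= overlap).

    Assumption (not taken from the tools discussed): the input list is
    treated as a random draw of `input_size` proteins from a universe of
    `universe_size` proteins, of which `module_size` belong to the module.
    """
    max_overlap = min(module_size, input_size)
    total = comb(universe_size, input_size)
    # Sum the probability mass of every overlap at least as large as observed.
    tail = sum(
        comb(module_size, k) * comb(universe_size - module_size, input_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Toy numbers, purely hypothetical: 3 of 3 input proteins fall in a
# 3-protein module drawn from a 10-protein universe.
p_value = hypergeom_enrichment_p(universe_size=10, module_size=3,
                                 input_size=3, overlap=3)
```

A small p-value indicates that the module contains more input proteins than expected by chance. In practice the p-values for all tested modules would also need a multiple-testing correction.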
atBioNet performs functional module analysis and biomarker identification by integrating public PPI data sources. atBioNet begins from the hypothesis that proteins/genes in the same module are likely to be involved in the same biological functions or processes. This approach allows unannotated proteins/genes to be proposed as potential biomarkers for the same human disease with which the input proteins/genes are associated. Furthermore, subnetworks are detected using the SCAN algorithm, which has been demonstrated to be a powerful tool for large-scale network analysis from both statistical and biological points of view.
More specifically, the SCAN algorithm analyzes networks quickly, efficiently, and accurately. SCAN's runtime scales linearly with the size of the network, which makes it suitable for extremely large networks with hundreds of thousands or even millions of nodes. Moreover, SCAN accurately finds clusters and also identifies nodes that play crucial roles, with only one traversal of the network. The power of SCAN has been demonstrated in applications including PPI networks [33, 61] and social networks [62, 63], in addition to the three case studies examined in this study.
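To make the clustering idea concrete, the following is a minimal sketch of SCAN-style structural clustering, not the atBioNet implementation. It assumes an undirected network given as a dictionary of neighbor sets in which every node appears as a key; the parameter names `eps` and `mu`, their default values, and the toy graph are illustrative assumptions. For readability the sketch precomputes every neighborhood rather than reproducing the single-pass traversal described above.

```python
from collections import deque
from math import sqrt


def scan(adj, eps=0.7, mu=2):
    """Sketch of SCAN-style clustering on an undirected graph.

    adj: dict mapping each node to the set of its neighbors.
    Returns (labels, hubs, outliers); labels maps clustered nodes to ids.
    """
    # Closed neighborhood: the node itself plus its direct neighbors.
    closed = {v: set(nbrs) | {v} for v, nbrs in adj.items()}

    def sigma(u, v):
        # Structural similarity: shared closed neighborhood, normalized.
        return len(closed[u] & closed[v]) / sqrt(len(closed[u]) * len(closed[v]))

    eps_nbrs = {v: {u for u in closed[v] if sigma(u, v) >= eps} for v in closed}
    cores = {v for v in closed if len(eps_nbrs[v]) >= mu}

    labels, next_id = {}, 0
    for v in closed:
        if v in labels or v not in cores:
            continue
        # Grow a new cluster from an unassigned core node.
        labels[v] = next_id
        queue = deque([v])
        while queue:
            current = queue.popleft()
            for u in eps_nbrs[current]:
                if u not in labels:
                    labels[u] = next_id
                    if u in cores:  # only cores extend the cluster further
                        queue.append(u)
        next_id += 1

    hubs, outliers = set(), set()
    for v in closed:
        if v in labels:
            continue
        # Unclustered node: a hub if it touches two or more clusters.
        touched = {labels[u] for u in adj[v] if u in labels}
        (hubs if len(touched) >= 2 else outliers).add(v)
    return labels, hubs, outliers


# Hypothetical toy network: two triangles bridged by "h", with "o" hanging off "h".
toy = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "h"},
    "d": {"e", "f", "h"}, "e": {"d", "f"}, "f": {"d", "e"},
    "h": {"c", "d", "o"}, "o": {"h"},
}
labels, hubs, outliers = scan(toy, eps=0.8, mu=3)
```

With these settings the two triangles form separate clusters, the bridging node is reported as a hub, and the pendant node as an outlier, mirroring the module, hub, and outlier roles discussed in this section.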
From a clinical point of view, the rationale behind the functional module analysis and biomarker discovery performed by atBioNet is to find effective and robust biomarkers for a disease. When the number of candidate genes is too small to identify a functional module, additional proteins/genes can be added from atBioNet's database to expand the network. In contrast, when a large number of input proteins/genes are associated with a phenotype, atBioNet focuses on detecting functional modules, hub genes (e.g., transcription factors or regulatory genes), and outlier genes based solely on the list of seed proteins/genes. Thus, potential biomarkers that are important to multiple biological processes, mechanisms, or functions can be identified.
The three case studies presented here each used the default network parameters, and the results were consistent with existing knowledge about these diseases. atBioNet provides several options for network analysis, such as the choice of the starting PPI database and control of the stringency of node additions during network construction. The particular options used will depend on the specific research question and scenario; for example, for a very large list of seed proteins/genes, the user may choose to construct a network using only the seed proteins/genes without adding any additional nodes. To build a more reliable network, the user can choose a smaller, more stringent database.
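The difference between a seed-only network and an expanded network can be sketched as follows. This is an illustration under stated assumptions, not atBioNet's code: the function name, the edge-list input, and in particular the `min_seed_links` parameter (one hypothetical way to express the stringency of node additions) are invented for the example.

```python
def build_seed_network(ppi_edges, seeds, add_neighbors=True, min_seed_links=1):
    """Build a network around seed proteins from an undirected PPI edge list.

    add_neighbors=False keeps only interactions among the seeds themselves.
    add_neighbors=True also adds non-seed proteins that interact directly
    with at least `min_seed_links` distinct seeds (hypothetical stringency knob).
    """
    seeds = set(seeds)
    nodes = set(seeds)
    if add_neighbors:
        seed_links = {}  # non-seed protein -> set of seeds it interacts with
        for u, v in ppi_edges:
            if u in seeds and v not in seeds:
                seed_links.setdefault(v, set()).add(u)
            elif v in seeds and u not in seeds:
                seed_links.setdefault(u, set()).add(v)
        nodes |= {n for n, linked in seed_links.items()
                  if len(linked) >= min_seed_links}
    # Keep every interaction whose two endpoints both survived.
    edges = {tuple(sorted((u, v))) for u, v in ppi_edges
             if u in nodes and v in nodes and u != v}
    return nodes, edges


# Hypothetical toy interactions; S1 and S2 are the seeds.
ppi = [("S1", "S2"), ("S1", "X"), ("S2", "X"), ("S2", "Y"), ("Y", "Z")]
seed_only_nodes, seed_only_edges = build_seed_network(ppi, ["S1", "S2"],
                                                      add_neighbors=False)
expanded_nodes, expanded_edges = build_seed_network(ppi, ["S1", "S2"])
strict_nodes, strict_edges = build_seed_network(ppi, ["S1", "S2"],
                                                min_seed_links=2)
```

Raising the threshold keeps only added proteins that are supported by several seeds, which trades network coverage for reliability in the same spirit as choosing a more stringent database.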
Moreover, all three case studies are based on a single genomic signature as the seed for network analysis. The network approach should be more powerful when multiple signatures reported in different studies of a particular disease are used together, which can improve the accuracy of the functional modules used to interpret the underlying mechanisms of the disease. It has been well documented that different studies of the same disease often produce gene signatures with few overlapping elements, but these signatures might reflect different mechanisms associated with the disease. Using atBioNet, different signatures can be integrated into a genome-wide network view, which can further our understanding of biomarker specificity and broaden the search space, thus potentially offering a more comprehensive view of the PPI networks underlying the disease.
Another potential use of atBioNet is to study the mechanisms related to the therapeutic use of drug combinations, which have become very effective owing to advances in medicinal research in recent years. The signature genes associated with each drug can be combined, and the union list used as a seed for network analysis. While individual drugs may affect a set of regulatory genes or pathways, combining drug actions in the context of the biological mechanisms underlying the disease condition could lead to more effective therapies for a complicated clinical situation.
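Forming the union seed list from several signatures is straightforward; the sketch below also records which signature contributed each gene, so that modules can later be traced back to their sources. The function name, the dictionary input, and the placeholder identifiers are hypothetical.

```python
def combine_signatures(signatures):
    """Merge several gene signatures into one seed list.

    signatures: dict mapping a signature name (e.g., a drug or study)
    to an iterable of gene identifiers.
    Returns (sorted union of genes, dict gene -> set of contributing names).
    """
    provenance = {}
    for name, genes in signatures.items():
        for gene in genes:
            provenance.setdefault(gene, set()).add(name)
    return sorted(provenance), provenance


# Placeholder identifiers, not real signatures.
seed_list, provenance = combine_signatures({
    "drug_A": ["G1", "G2"],
    "drug_B": ["G2", "G3"],
})
```

Genes contributed by more than one signature are natural candidates for shared mechanisms, while genes unique to one signature may point to drug-specific effects.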
In the current version, atBioNet contains only human protein-protein interactions. Our next major revision will expand the available data to include the STRING and NCBI PID databases and will cover PPI data from other species. Another limitation of the current atBioNet is that the biological annotation relies exclusively on KEGG. In the future we will add other biological annotation sources, such as GO, BioCarta pathways, and disease-centric databases. Additionally, owing to memory constraints in Java, there is an upper limit of approximately 3000 seed proteins/genes when using the “add all directly connected nodes” option in atBioNet. Nevertheless, the user can allocate more memory to the application to allow network analysis of a larger number of seed proteins/genes.