Interfacing cellular networks of S. cerevisiae and E. coli: Connecting dynamic and genetic information
© de Matos Simoes et al.; licensee BioMed Central Ltd. 2013
Received: 31 July 2012
Accepted: 25 April 2013
Published: 11 May 2013
In recent years, various types of cellular networks have become pervasive in biology and are now routinely used for studying eukaryotic and prokaryotic organisms. Still, the relation and the biological overlap among phenomenological and inferential gene networks, e.g., between the protein interaction network and the gene regulatory network inferred from large-scale transcriptomic data, are largely unexplored.
We provide in this study an in-depth analysis of the structural, functional and chromosomal relationship between a protein-protein network, a transcriptional regulatory network and an inferred gene regulatory network, for S. cerevisiae and E. coli. Further, we study global and local aspects of these networks and their biological information overlap by comparing, e.g., the functional co-occurrence of Gene Ontology terms by exploiting the available interaction structure among the genes.
Although the individual networks represent different levels of cellular interactions with global structural and functional dissimilarities, we observe crucial functions of their network interfaces for the assembly of protein complexes, proteolysis, transcription, translation, metabolic and regulatory interactions. Overall, our results shed light on the integrability of these networks and their interfacing biological processes.
With the advent of systems and network biology it is now generally acknowledged that the concerted interactions on all cellular levels between genes and their gene products within a cell are governed by various types of gene networks [1–5]. For instance, the transcriptional regulatory network regulates the expression of genes, whereas protein interaction networks provide a map of diverse types of protein interactions leading, e.g., to the formation of complexes. Unfortunately, large parts of these gene networks are currently unknown, leaving us with a fragmented understanding of these networks.
Whereas the first stage of the new era of network biology consisted in the construction and inference of such networks, the next step consists in their analysis, interpretation and integration [6–12]. However, in order to perform an integration of different types of networks, we need to enhance our understanding of the biological information contributed by the individual gene networks as well as their functional overlap. Such an overlap is required because otherwise these networks could not be sensibly integrated with each other due to a lack of common interfacing processes allowing an information flow from one level to the other.
The purpose of the present paper is to study the biological overlap on the genomics and genetics level among three different types of cellular networks, namely the transcriptional regulatory network (TRN), the protein-protein interaction network (PPN) and the gene regulatory network (GRN). A TRN and a PPN are phenomenological networks because they are constructed from direct measurements of physical interactions (bindings) between molecular entities, whereas a GRN is an inferential network that needs to be statistically inferred from indirect interaction measurements in the form of gene expression data. We study these three cellular networks for S. cerevisiae and E. coli, because the information available about these organisms is most advanced compared to other, more complex organisms. Besides this, S. cerevisiae is the simplest eukaryote system that contains crucial differences in its principal cellular organization compared to prokaryote organisms like E. coli. For instance, it is known that the transcription regulation of genes is much more intricate in eukaryotes, which utilize a combinatorial coding of different transcription factors [13, 14]. Further, eukaryotes maintain a combination of cis-regulating and trans-acting factors that is absent in prokaryotes [14, 15].
Protein-protein networks describe the interactions between proteins in the form of physical interactions, such as between proteins of a protein complex, and transient interactions, such as protein modification interactions (e.g. phosphorylation). The majority of experimentally available protein-protein interactions are measured by mass spectrometry methods and large-scale yeast-two-hybrid experiments (Y2H) [16–18]. A number of databases collect protein interaction data from small-scale and large-scale experiments, such as BioGrid, IntAct, MINT and MPact, which are also jointly available in meta databases that consider only physical protein interactions. The interactions in a transcriptional regulatory network describe protein-DNA interactions, where transcription factors bind to specific DNA motifs and regulate the gene expression activity of a given target gene. In general, these interactions are measured from protein location data, e.g., from ChIP-Chip or ChIP-seq experiments that are performed for individual transcription factors [24–26].
Finally, gene regulatory networks are networks inferred from large-scale microarray gene expression data sets that are frequently composed of multiple observational or experimental conditions. The most popular inference methods for gene regulatory networks are based on mutual information [27, 28]. A gene regulatory network describes the relationship between gene-pairs based on their mutual dependency of gene expression. Predictions observed in gene regulatory networks have been validated in small-scale experiments for individual transcription factors [29–31]. The gene regulatory network is thus most often equated to the transcriptional regulatory network. However, an inferred gene regulatory network is known to reflect multiple levels of the gene network including also physical interactions, e.g., of proteins belonging to the same protein complex. In our study we therefore distinguish the term gene regulatory network from the term transcriptional regulatory network.
Biological networks that are experimentally derived are given by a binary representation of validated interactions occurring on multiple cellular levels, e.g., between proteins, proteins and DNA, proteins and RNA, RNA and RNA. However, if only one binary network is given, this network lacks the information of the underlying dynamics of temporal and spatial processes that regulate, realize and coordinate the regulatory programs for gene expression, metabolism, growth, differentiation and proliferation of a cell or organism. Hence, it represents the ‘average’ molecular interactions among all these processes. In contrast, the high-throughput data analysis of large-scale gene expression, sequencing or proteomics allows measuring a snapshot from a multitude of specific conditions and cellular states. One major advantage of the inferred interactions of a gene regulatory network is that large-scale gene expression datasets take into account the cellular context and the average over the underlying condition-specific contexts. Understanding the relationship between experimental and inferential networks may allow interpreting the role of the cellular and condition-specific contexts from a gene regulatory network in the light of large-scale experimentally evaluated interactions.
In this paper, we investigate phenomenological and inferential cellular networks for S. cerevisiae and E. coli. More precisely, we compare the structural topology and the functional overlap of the transcriptional regulatory network (TRN), the protein-protein interaction network (PPN) and the gene regulatory network (GRN) for S. cerevisiae and E. coli on the genomic-scale and the pathway- and interaction-level. Further, we study the genetic connection between interacting genes and the co-localization of genes on the chromosomes. The purpose of our study is to shed light on the integrative abilities of these networks to obtain a multi-level description of the biological processes within a cellular context. We are particularly interested in understanding the role the GRN can play in such an integration. The reason for this is that, so far, discussions about the integration of networks largely exclude the GRN and focus on phenomenological networks like the TRN or the PPN solely. This seems understandable, because a GRN represents an inferred network based on statistical inference algorithms and, in addition, gene expression data used to infer a GRN do not capture any type of direct physical interactions among molecules. Instead, they measure merely the concentration of the expression level and, hence, provide indirect information about molecular interactions only.
Our results section is subdivided into four major parts (see Figure 1B). In part one, we provide a structural analysis of the S. cerevisiae and E. coli cellular networks. In part two, we conduct a functional network analysis by means of the Gene Ontology (GO) database and in part three, we study structural and functional network features in an integrated manner. Finally, in part four of the results section, we analyze the connection between the chromosomal location of genes and their interconnectedness, as provided by the gene regulatory network (GRN), transcription regulatory network (TRN) and protein-protein interaction network (PPN).
Structural analysis of the S. cerevisiae and E. coli cellular networks
General overview of the cellular networks
Summary statistics of the A) S. cerevisiae and B) E. coli gene regulatory network (GRN), protein-protein network (PPN) and transcription regulatory network (TRN)
The network degree assortativity coefficient measures the average degree-degree correlation of connected nodes in a network. A positive coefficient suggests assortative mixing (nodes are likely connected to nodes with similar degree), a negative coefficient disassortative mixing (low degree nodes are likely connected to high degree nodes) and a zero coefficient non-assortative mixing. The assortativity coefficient is close to zero for the GRN (κ = 0.01) and negative for the PPN (κ = −0.5968) and TRN (κ = −0.1314) (Table 1).
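The definition above can be illustrated with a minimal sketch, assuming a simple undirected network given as an edge list without duplicate edges or self-loops. The function name `degree_assortativity` and the toy networks are hypothetical and are not taken from the study; the coefficient is computed as the Pearson correlation of the degrees found at the two ends of every edge.

```python
from collections import Counter
from math import sqrt


def degree_assortativity(edges):
    """Pearson correlation of the node degrees at the two ends of each edge.

    Assumes a simple undirected graph (no duplicate edges, no self-loops).
    Returns NaN when all degrees are equal, because the correlation is
    undefined in that case.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    # Every undirected edge is counted in both orientations so that the
    # measure does not depend on how the edge list happens to be written.
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / sqrt(var_x * var_y)


# Toy networks: a star is maximally disassortative, a short path mildly so.
star = [("hub", "a"), ("hub", "b"), ("hub", "c")]
path = [(0, 1), (1, 2), (2, 3)]
kappa_star = degree_assortativity(star)
kappa_path = degree_assortativity(path)
```

In the star network the hub is connected only to low-degree nodes, which is the disassortative pattern indicated by the negative coefficients reported for the PPN and the TRN.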
Pairwise degree-rank, transitivity and betweenness correlation
In order to investigate the global structural similarities among the three different types of cellular networks, we perform a pairwise Spearman’s rank correlation test for the degree-ranks, betweenness and transitivity of the genes. We find that between the S. cerevisiae gene regulatory network (GRN) and protein-protein interaction network (PPN) the graph measures degree rank (ρ = 0.09, p = 8.9e−11), betweenness (ρ = 0.045, p = 9.4e−4) and transitivity (ρ = 0.025, p = 0.04) show a significant correlation. For the pairwise comparisons of the other networks, no significant correlation of the graph measures is observed. For E. coli, none of the three pairwise comparisons of the node degrees between the GRN, TRN and PPN shows a significant rank correlation coefficient, indicating crucial structural differences on a global level.
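As an illustration of such a pairwise comparison, the following sketch computes Spearman's ρ for a per-gene graph measure (for example the degree) in two networks, restricted to the genes present in both. It is a minimal pure-Python version under the assumption that each network's measure is available as a dictionary from gene to value; the function names and the toy values are hypothetical, and the significance test itself is omitted.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of the positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; NaN if one of the inputs is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / sqrt(var_x * var_y)


def spearman_on_shared_genes(measure_a, measure_b):
    """Spearman's rho of a graph measure over the genes present in both networks."""
    genes = sorted(set(measure_a) & set(measure_b))
    ranks_a = average_ranks([measure_a[g] for g in genes])
    ranks_b = average_ranks([measure_b[g] for g in genes])
    return pearson(ranks_a, ranks_b)


# Hypothetical node degrees of the same genes in two networks.
degree_grn = {"g1": 1, "g2": 2, "g3": 3, "g4": 10}
degree_ppn = {"g1": 5, "g2": 6, "g3": 7, "g9": 1}
rho = spearman_on_shared_genes(degree_grn, degree_ppn)
```

Only g1, g2 and g3 enter the comparison here, because g4 and g9 each occur in one network only.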
Structural and functional overlap between the cellular networks
Between the three S. cerevisiae cellular networks, we observe a percentage of shared edges in the range of 0.1% to 0.6% (Figure 3A), and between the three E. coli networks in the range of 0.7% to 1.5% (Figure 3B). Except for the interface of the S. cerevisiae TRN and PPN, the number of edges shared between the cellular networks (GRN-PPN, GRN-TRN) is statistically significant. The gene regulatory network (GRN) and the protein-protein network (PPN) show the largest percentage of shared edges, 0.6%, in S. cerevisiae. For E. coli the largest percentage of shared edges is 1.5% and is observed between the gene regulatory network (GRN) and the transcriptional regulatory network (TRN).
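A network interface in this sense can be sketched as the set of edges that two networks have in common. The example below is a minimal sketch that treats all edges as undirected and, as an explicit assumption, reports the percentage relative to the union of both edge sets; the text does not state which denominator underlies the reported percentages. Function names and gene labels are hypothetical.

```python
def undirected(edges):
    """Represent each edge as an unordered gene pair, dropping self-loops."""
    return {frozenset(edge) for edge in edges if edge[0] != edge[1]}


def shared_edge_stats(edges_a, edges_b):
    """Return the shared edges, their percentage and the genes they involve.

    Assumption: the percentage is taken relative to the union of the two
    edge sets; other denominators are equally conceivable.
    """
    a, b = undirected(edges_a), undirected(edges_b)
    shared = a & b
    union = a | b
    percent = 100.0 * len(shared) / len(union) if union else 0.0
    genes = set().union(*shared)
    return shared, percent, genes


# Hypothetical toy networks with gene labels A-E.
grn = [("A", "B"), ("B", "C"), ("C", "D")]
ppn = [("B", "A"), ("C", "D"), ("D", "E")]
shared, percent, genes = shared_edge_stats(grn, ppn)
```

Because the pairs are unordered, the edge written as A-B in one network and as B-A in the other counts as shared.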
For S. cerevisiae the GRN and the PPN interface share a total of 667 edges among 785 genes (p=0) (Figure 3A and B). The largest connected components (with more than 10 genes) of the shared edges are shown in Figure 3C. The shown subnetworks consist in total of 248 genes and 316 interactions.
The S. cerevisiae TRN to GRN interface consists of 0.2% shared edges corresponding to 41 edges and 64 genes. The edges correspond to protein-DNA interactions from 37 transcription factors to 27 target genes (Figure 3D). The genes with the largest degrees are the transcription factors THI2 (thiamine biosynthesis) with 4 edges, and AFT1 (iron homeostasis), CIN5 (stress response), GAL80 (GAL genes repression) and YAP6 (stress response) with 3 edges each.
The PPN and the TRN interface share 0.1% of their edges, corresponding to 92 edges between 115 genes. The edges include 53 transcription factors and 62 target genes (Figure 3E). The genes with the largest degrees are the transcription factors SWI6 (cell cycle) with 6 edges, and YAP1 (stress response), STE12 (MAP kinase signaling) and MSN2 (stress response) with 5 edges each.
In order to interpret the functional role of shared edges that define an interface between two networks, we performed a Gene Ontology enrichment analysis (GEA) and a gene pair enrichment analysis (GPEA), see ‘Methods’ section, for overlapping edges between two networks for the GO category ‘Biological Process’. The GEA tests for a functional enrichment of a gene set from a collection of connected genes present in a comparison of two cellular networks. In a GEA, the genes annotated by the same term are assumed to be associated with each other. The GPEA, however, gives a more objective functional assessment for the individual edges of a network interface rather than for the collection of genes participating in the interface. In this section we therefore prefer the detailed assessment of the GPEA over the GEA, as the GPEA allows performing the functional analysis from an edge-centered view compared to the gene-centered view of a GEA. The GEA gives similar results for all pairwise comparisons (Additional file 1: Figures S1 A,B and Additional file 1: Figure S2).
The GPEA of the S. cerevisiae GRN-TRN interface comprises 39 terms with p_fdr ≤ 0.01 (Figure 4B). The processes involve the positive regulation of metabolic and biosynthetic processes (RNA), regulation of gene expression, galactose and alcohol metabolic process and response to chemical stimulus. The GPEA for the S. cerevisiae TRN-PPN interface shows 79 terms with p_fdr ≤ 0.01 (Figure 4C) and describes biological processes involved in positive regulation of metabolic and biosynthetic processes (e.g., RNA), regulation of gene expression, metabolic processes for alcohol, sulfur, nitrogen, methionine, aspartate, cellular responses to stress, abiotic and organic substances and growth.
The S. cerevisiae GRN-PPN interface shows a close relationship to the actual cellular protein to mRNA interface of cytosolic and mitochondrial ribosomes for protein translation from transcribed mRNA, proteolysis, mitochondrial respiration and cell cycle. In contrast, the network comparisons to the TRN have the biological processes in common that are involved in the regulation of gene expression and biosynthetic and metabolic processes.
Between the E. coli cellular networks, we observe a percentage in the range of 0.7% to 1.5% of shared edges for the pairwise comparisons (Figure 3B). In contrast to the S. cerevisiae networks, we observe for the E. coli networks a higher percentage of shared edges between the GRN and the TRN despite the fact that the absolute number of shared edges is largest between the GRN and the PPN (Figure 3B).
The GEA of the genes in the E. coli interfaces shows enriched terms only for the GRN-PPN interface and a single term for the GRN-TRN interface; it is therefore not considered further.
The GPEA results compared between S. cerevisiae and E. coli show similarities for the GRN-PPN interface in processes involved in protein complex assembly, respiration, ATP synthesis, translation and metabolic processes. For the GRN-TRN interface, stress response related processes appear more prominent in E. coli than in S. cerevisiae: in the E. coli GRN-TRN interface, terms related to response are more apparent than terms describing regulation. The TRN-PPN interface in E. coli shows only 5 terms related to carbohydrate metabolic processes that are also present in the S. cerevisiae TRN-PPN interface.
Gene-pair enrichment analysis (GPEA) between cellular networks
In the previous section, we performed a detailed analysis of the functional role of the edges of the interfaces by a GPEA between the GRN, PPN and TRN networks of S. cerevisiae and E. coli. In order to quantify the global functional similarities between the GRN, PPN and TRN, we performed a gene pair enrichment analysis (GPEA) for each of the three networks individually and compared the overlap of significant Gene Ontology terms between them.
Note that in a GPEA interface analysis, we test for the enrichment of edges shared by two networks between the genes in a particular GO term. For a global GPEA that is performed in this section we test for the enrichment of edges in an individual network between the genes in a particular GO term. For each comparison, we include only the subnetwork of genes that are actually present in both networks. For this analysis we used a significance level of α = 10^−4 and a Bonferroni adjustment of p-values to correct for multiple testing.
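To make the edge-centered test concrete, the following sketch shows one plausible formulation of a GPEA. It is an assumption for illustration and not necessarily the procedure of the ‘Methods’ section: all gene pairs of the network form the population, the pairs among the genes of a GO term form the category of interest, and the observed network edges are treated as a sample evaluated with a one-sided hypergeometric test followed by a Bonferroni adjustment. All names and the toy data are hypothetical, and the exact integer arithmetic is only practical for small examples.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for a hypergeometric variable, computed with exact integers."""
    upper = min(successes, draws)
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(population, draws)


def gpea(edges, term_to_genes, alpha=1e-4):
    """Edge-centered enrichment per GO term (illustrative formulation).

    Returns {term: (Bonferroni-adjusted p-value, significant at alpha)}.
    """
    net_edges = {frozenset(e) for e in edges if e[0] != e[1]}
    genes = set().union(*net_edges)
    total_pairs = comb(len(genes), 2)

    raw = {}
    for term, members in term_to_genes.items():
        members = set(members) & genes          # only genes present in the network
        term_pairs = comb(len(members), 2)
        if term_pairs == 0:
            continue                            # a term needs at least one gene pair
        k = sum(1 for e in net_edges if e <= members)
        raw[term] = hypergeom_upper_tail(k, total_pairs, term_pairs, len(net_edges))

    n_tests = len(raw)
    adjusted = {term: min(1.0, p * n_tests) for term, p in raw.items()}
    return {term: (p, p <= alpha) for term, p in adjusted.items()}


# Hypothetical toy network and annotation.
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "E")]
toy_terms = {"T1": {"A", "B", "C"}, "T2": {"D", "E", "F"}}
gpea_result = gpea(toy_edges, toy_terms)
```

In the toy example all three pairs among the T1 genes are edges, which gives the smaller p-value, but with so few genes neither term passes the strict threshold α = 10^−4.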
In the supplementary material, we show a GPEA Gene Ontology map of the common significant terms of the GRN-PPN, GRN-TRN and TRN-PPN comparisons for S. cerevisiae (Additional file 1: Figure S4) and E. coli (Additional file 1: Figure S5). We do not observe major differences between the GPEA interface analysis and the global GPEA for the individual networks. The significant terms describe similar biological processes as described in the previous section for the GPEA of the network interface of a pair of cellular networks.
Functional congruence of connected genes within the cellular networks
In order to assess the functional congruence of interactions in the three cellular networks, we quantify the co-occurrence of functional annotations from Gene Ontology for connected genes. More specifically, for each network, we count how often directly connected genes share a common Gene Ontology term. For the comparison, the count frequencies of the co-occurring GO terms are normalized to values between 0 and 1.
In order to compare the results with reference networks, we randomize the gene labels. That means, for each network, we estimate randomized count frequencies from randomizations of the gene labels. In the following, we show the resulting cumulative distributions of the co-occurrence frequency for the three GO categories ‘Biological process’, ‘Cellular component’ and ‘Molecular function’ separately (see Figure 7C-G). For E. coli we excluded the GO category ‘Molecular function’ from the analysis because this category contained only 100 terms.
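The co-occurrence count and its randomized reference can be sketched as follows. This is a minimal illustration under stated assumptions: annotations are given as a dictionary from gene to a set of GO terms, the reference is obtained by permuting the gene labels over the nodes of the network (which leaves the topology unchanged), and the summary statistic is simply the fraction of edges whose two genes share at least one term. The normalization used for Figure 7 may differ, and all names are hypothetical.

```python
import random


def shared_term_counts(edges, gene_to_terms):
    """Number of GO terms shared by the two genes of every edge."""
    return [
        len(gene_to_terms.get(u, set()) & gene_to_terms.get(v, set()))
        for u, v in edges
    ]


def fraction_sharing(edges, gene_to_terms):
    """Fraction of edges whose two genes share at least one GO term."""
    counts = shared_term_counts(edges, gene_to_terms)
    return sum(c > 0 for c in counts) / len(counts)


def shuffle_gene_labels(edges, rng):
    """Permute the gene labels over the nodes; the wiring stays the same."""
    genes = sorted({g for edge in edges for g in edge})
    shuffled = genes[:]
    rng.shuffle(shuffled)
    relabel = dict(zip(genes, shuffled))
    return [(relabel[u], relabel[v]) for u, v in edges]


def randomized_baseline(edges, gene_to_terms, n_rounds=100, seed=0):
    """Mean fraction of term-sharing edges over label-randomized networks."""
    rng = random.Random(seed)
    values = [
        fraction_sharing(shuffle_gene_labels(edges, rng), gene_to_terms)
        for _ in range(n_rounds)
    ]
    return sum(values) / len(values)


# Hypothetical toy annotation.
toy_edges = [("A", "B"), ("C", "D")]
toy_terms = {"A": {"t1", "t2"}, "B": {"t1", "t2"}, "C": {"t3"}, "D": {"t4"}}
observed = fraction_sharing(toy_edges, toy_terms)
baseline = randomized_baseline(toy_edges, toy_terms)
```

A network whose observed value lies clearly above the randomized baseline carries functional information in its edges, which is the comparison drawn between the solid and dashed curves.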
First, from Figure 7C-G one can see that the three corresponding networks with randomized gene labels (dashed lines) can be distinguished in all cases from the networks with non-randomized gene labels. This indicates that all three cellular networks (GRN, TRN and PPN) contain a considerable amount of biological information about S. cerevisiae and E. coli, because otherwise their curves would overlap with those of the networks with randomized gene labels. The more GO terms are shared between connected genes, the further the cumulative distribution is shifted to the right (convergence to ‘1.0’ is prolonged). Accordingly, one can see that all three cellular networks contain more information about the GO category ‘Cellular Component’ than about ‘Biological Process’ and ‘Molecular Function’.
Integrated structural and functional analysis of the cellular networks
Next, we perform a local structural comparison on the pathway-level of the cellular networks for S. cerevisiae and E. coli. That means, we conduct an integrated functional and structural analysis of these networks by identifying sets of genes that belong to particular GO terms, for which we assess their structural similarity, by using five graph-based centrality measures and Spearman’s rank correlation test.
Results for S. cerevisiae
Table panels (values not recoverable): A) S. cerevisiae Biological Process; B) S. cerevisiae Cellular Component; C) S. cerevisiae Molecular Function; D) S. cerevisiae transitivity centrality GRN/PPN (Cellular Component), reporting p_fdr for the terms Cytosolic proteasome complex, Microtubule associated complex and SCF ubiquitin ligase complex.
For E. coli the results are similar to S. cerevisiae, where almost none of the tested pathways showed a significant correlation between pairs of the three networks (GRN, TRN and PPN) for any of the five centrality measures (results not shown).
Chromosomal co-location and distance of interacting genes
The genes of bacterial species like E. coli are organized in operons on a circular genome. In contrast, eukaryotic species like S. cerevisiae have a nucleus, where the DNA of the genome is located in the form of distinct chromosomes. Hence, eukaryotic species have a more complex genomic organization and structure. Due to this higher complexity, predicted interactions resulting from co-regulation, co-localization or co-expression may be more difficult to evaluate and interpret in eukaryotes than in networks from E. coli. This may be a reason why the relationship between co-localization and the interactions described in cellular networks has not been investigated in great detail so far.
For this reason, we study the chromosomal co-location and the distance between interacting genes for the three networks (GRN, TRN and PPN). First, we estimate the percentages of interactions for genes that are co-located on the same chromosomes and for interactions that correspond to two genes that are located on different chromosomes. For the gene regulatory network, we observe a fraction of 17.14% of the interacting genes co-located on the same chromosome, while for the protein-protein network and the transcriptional regulatory network we observe only 8.05% (PPN) and 7.92% (TRN) of the interacting genes co-located on the same chromosome.
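The co-location fraction described above amounts to a simple count over the edges of a network. The sketch below assumes a dictionary that maps each gene to its chromosome and skips edges with an unmapped gene; the function name and the toy data are hypothetical.

```python
def same_chromosome_fraction(edges, gene_to_chrom):
    """Fraction of interactions whose two genes lie on the same chromosome.

    Edges with a gene of unknown location are ignored. Returns NaN if no
    edge can be evaluated.
    """
    located = [
        (u, v) for u, v in edges if u in gene_to_chrom and v in gene_to_chrom
    ]
    if not located:
        return float("nan")
    same = sum(gene_to_chrom[u] == gene_to_chrom[v] for u, v in located)
    return same / len(located)


# Hypothetical toy data: three located interactions, two of them within chromosome IV.
toy_edges = [("A", "B"), ("A", "C"), ("B", "D"), ("D", "X")]
toy_chrom = {"A": "IV", "B": "IV", "C": "XII", "D": "IV"}
fraction = same_chromosome_fraction(toy_edges, toy_chrom)
```

The complementary value, one minus this fraction, corresponds to interactions between genes on different chromosomes.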
Next, we study the global degree ranks of the chromosomes for the three networks. The degree ranks of the chromosomes are calculated by the number of interactions corresponding to a particular chromosome. For each chromosome, we count the number of interactions by summing the degrees of each gene corresponding to a particular chromosome. For each network, the chromosomes are then ranked based on the count frequencies. We perform a pairwise comparison using Spearman’s rank correlation test between TRN, GRN and PPN, where we consider only genes present in both networks. From this analysis, we find a significant correlation for all pairwise comparisons (p ≤ 2.2e−16 with r between 0.84 and 0.88).
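Summing the degrees of all genes on a chromosome is equivalent to counting the edge endpoints that fall on that chromosome, which the following minimal sketch exploits. The function names are hypothetical, and breaking ties alphabetically is an assumption made only to keep the toy ranking deterministic; a rank correlation test would normally assign average ranks to ties.

```python
from collections import Counter


def chromosome_degree_sums(edges, gene_to_chrom):
    """Sum of gene degrees per chromosome, counted as edge endpoints."""
    totals = Counter()
    for u, v in edges:
        for gene in (u, v):
            if gene in gene_to_chrom:
                totals[gene_to_chrom[gene]] += 1
    return totals


def rank_chromosomes(totals):
    """Rank 1 goes to the chromosome with the largest degree sum.

    Ties are broken alphabetically by chromosome name (an assumption).
    """
    ordered = sorted(totals, key=lambda chrom: (-totals[chrom], chrom))
    return {chrom: rank for rank, chrom in enumerate(ordered, start=1)}


# Hypothetical toy data.
toy_edges = [("A", "B"), ("A", "C"), ("C", "D")]
toy_chrom = {"A": "I", "B": "I", "C": "II", "D": "III"}
totals = chromosome_degree_sums(toy_edges, toy_chrom)
ranks = rank_chromosomes(totals)
```

The resulting per-chromosome ranks of two networks can then be compared with a rank correlation in the same way as the per-gene measures.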
Next, we study the co-localization distance of connected gene pairs within chromosomes for the three cellular networks. More precisely, we extract the genomic start and end coordinates of gene-pairs from the chromosomes and calculate the relative distances, δ, between connected genes in the GRN, TRN and PPN. In order to obtain comparable values for chromosomes of differing length, δ is normalized by the size of the chromosomes.
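The normalization can be sketched as follows. The text does not specify how the start and end coordinates enter the distance, so this sketch assumes, for illustration only, that δ is the distance between the gene midpoints divided by the chromosome length; a possible wrap-around on a circular genome is not considered. Function names, coordinates and lengths are hypothetical.

```python
def relative_distance(gene_a, gene_b, chrom_length):
    """Normalized distance between two genes on the same chromosome.

    gene_a and gene_b are (start, end) coordinates. Assumption: the distance
    is measured between the gene midpoints.
    """
    mid_a = (gene_a[0] + gene_a[1]) / 2
    mid_b = (gene_b[0] + gene_b[1]) / 2
    return abs(mid_a - mid_b) / chrom_length


def network_distances(edges, gene_to_locus, chrom_lengths):
    """Relative distances for all edges whose genes share a chromosome.

    gene_to_locus maps gene -> (chromosome, start, end).
    """
    distances = []
    for u, v in edges:
        if u not in gene_to_locus or v not in gene_to_locus:
            continue
        chrom_u, start_u, end_u = gene_to_locus[u]
        chrom_v, start_v, end_v = gene_to_locus[v]
        if chrom_u != chrom_v:
            continue                      # only within-chromosome pairs
        distances.append(
            relative_distance(
                (start_u, end_u), (start_v, end_v), chrom_lengths[chrom_u]
            )
        )
    return distances


# Hypothetical coordinates on a chromosome of length 10000.
loci = {"A": ("I", 0, 100), "B": ("I", 900, 1000), "C": ("II", 50, 150)}
lengths = {"I": 10000, "II": 5000}
deltas = network_distances([("A", "B"), ("A", "C")], loci, lengths)
```

With this normalization δ lies between 0 and 1, so that values from chromosomes of very different size can be pooled or compared.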
Interestingly, the GRN of S. cerevisiae and E. coli shows the strongest co-localization of connected genes (Figure 8A and B, GRN in green). The reason for this may come from the different nature of this network type, because in contrast to the two phenomenological networks TRN and PPN, the GRN is inferred from gene expression data. In [33, 40] it has been shown that such a GRN contains signatures of both phenomenological networks, that means, in the GRN one can find transcription regulations as well as protein-protein interactions. Further, in [41, 42] it has been found that inference algorithms used to estimate a GRN systematically favor molecular interactions involving genes having only a moderate number of interactions. In turn, this could hint that genes co-localized on the same chromosome are less connected.
Figure 8C shows the homogeneity of the relative co-location distance distributions among the chromosomes of S. cerevisiae for the GRN, TRN and PPN. In this figure, ‘M’ indicates the mitochondrial chromosome, and ‘adjacent’ and ‘global’ provide the average distances over all adjacent gene-pairs on a chromosome and over all possible gene-pairs regardless of their location on the chromosome, respectively. Overall, the average distance between two interacting genes or proteins is around δ = 0.3 in S. cerevisiae and E. coli (global) whereas the distance between adjacent genes is below δ = 0.001. In general, for S. cerevisiae the differences between the GRN, TRN and PPN are mild. Only for chromosome 4 (p_bonf = 4.44e−6) and chromosome 12 (p_bonf = 9.99e−03) do we obtain a significant difference from a one-way ANOVA testing the equality of the mean distances of the three cellular networks for each chromosome at a significance level of α = 0.05. This indicates that none of the three networks carries strongly different information about the chromosomes.
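The per-chromosome test can be illustrated by a minimal sketch of the one-way ANOVA F statistic together with a Bonferroni adjustment. The sketch is pure Python and stops at the F statistic and its degrees of freedom; converting F into a p-value requires the F distribution, for which a statistics library would be used in practice. The function names and the toy distances are hypothetical.

```python
def one_way_anova_f(groups):
    """F statistic and degrees of freedom of a one-way ANOVA.

    groups is a list of samples, e.g. the relative distances of the GRN,
    TRN and PPN on one chromosome. Requires some within-group variation.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical distances for three networks on one chromosome.
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
# Hypothetical nominal p-values for three chromosomes.
adjusted = bonferroni([0.01, 0.5, 0.2])
```

With one test per chromosome, the adjustment multiplies each nominal p-value by the number of chromosomes tested before it is compared with α = 0.05.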
GEA and GPEA for chromosomal subnetworks
In this last results section, we want to study the functional enrichment of genes for each individual chromosome and gene-pairs of the S. cerevisiae cellular networks that are co-located on the same chromosome.
For the GRN, TRN and PPN we perform a Gene Ontology enrichment analysis (GEA) for the category ‘Biological Process’. The GEA is performed for each individual chromosome, using a hypergeometric test, where all genes of a particular network are defined as background. For the GEA, we choose a significance level of α = 0.01 for nominal p-values to define a set of significant terms for each chromosome. Further, we perform a GPEA for the GRN, TRN and PPN subnetworks of the genes for each individual chromosome. For the GPEA we apply a Bonferroni multiple hypothesis testing correction with a significance level of α = 0.05. In addition, we estimate the fraction of overlapping significant terms of the GEA and GPEA.
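The gene-centered hypergeometric test can be sketched as follows, assuming the genes of one chromosome as the sample, the genes annotated with one GO term as the category of interest and all genes of the network as background. The function name `gea_p_value` and the toy gene identifiers are hypothetical; the exact integer arithmetic is adequate for an illustration but slow for genome-scale inputs.

```python
from math import comb


def gea_p_value(chrom_genes, term_genes, background):
    """One-sided hypergeometric p-value for the enrichment of a GO term.

    background: all genes of the network
    chrom_genes: genes of the chromosome under study (the sample)
    term_genes: genes annotated with the GO term
    Returns P(X >= observed overlap).
    """
    background = set(background)
    sample = set(chrom_genes) & background
    annotated = set(term_genes) & background
    overlap = len(sample & annotated)

    population = len(background)
    successes = len(annotated)
    draws = len(sample)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(overlap, min(successes, draws) + 1)
    )
    return tail / comb(population, draws)


# Hypothetical toy data: 10 background genes, 5 of them annotated with the term.
background = [f"g{i}" for i in range(10)]
term = background[:5]
p_enriched = gea_p_value(background[:5], term, background)   # sample equals the term genes
p_depleted = gea_p_value(background[5:], term, background)   # sample avoids the term genes
```

A term would be called significant for a chromosome if this p-value falls below the chosen level, here α = 0.01 for nominal p-values.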
In Figure 9E-J, we show an overview of the fractions of significant terms of the GPEA, GEA and their overlap for each chromosome. The fractions for the GRN (Figure 9E), PPN (Figure 9F) and TRN (Figure 9G) show slight variations among the chromosomes.
S. cerevisiae GPEA analysis Biological Process for individual chromosomes
S. cerevisiae GRN to PPN: Cellular metabolic process; Primary metabolic process; Macromolecule metabolic process; Cellular macromolecule metabolic process; rRNA metabolic process; Ribonucleoprotein complex biogenesis
S. cerevisiae GRN to TRN: Cellular metabolic process; Primary metabolic process; Macromolecule metabolic process; Cellular macromolecule metabolic process
S. cerevisiae TRN to PPN: Cellular metabolic process; Primary metabolic process; Cellular biosynthetic process; Regulation of primary metabolic process; Regulation of cellular metabolic process; Regulation of macromolecule biosynthetic process; Regulation of cellular macromolecule biosynthetic process
In this paper, we investigated relations between a transcription regulatory network, a protein-protein network and a gene regulatory network for S. cerevisiae and E. coli. For these cellular networks, we studied structural, functional and chromosomal properties (I) on the genomic-scale (global) involving the entire network, (II) on the pathway level (local) considering only well defined Gene Ontology terms and (III) on the level of individual interactions. That means our investigation comprised various relevant biological scales of these cellular networks.
From a structural analysis, we found that the three cellular networks (GRN, TRN and PPN) are considerably different from each other. This result is consistent on the genomic-, pathway- and interaction-level. For instance, on the interaction-level, a pairwise comparison between the three networks revealed that the percentage of common interactions (edges) is in general only in the range of 0.1% to 1.5%. This holds for E. coli and S. cerevisiae (see Figure 3A and B). However, we would like to point out that the GRN and the PPN are more similar to each other than the other two pairs of network combinations, at least for S. cerevisiae. An indicator for this is, e.g., the significance of Spearman’s rank correlation coefficient of the degree, betweenness and transitivity distributions.
We studied the functional relationship between the cellular networks by a gene pair enrichment analysis (GPEA) in Gene Ontology terms for the network interfaces that are defined by the set of shared edges between two networks. The functional analysis showed a vast diversity of the biological processes in the network interfaces, although the fractions of shared edges between the cellular networks are low. The S. cerevisiae GRN and PPN interface showed the largest variety of biological processes and protein complexes related to the translation of mRNA (cytosolic and mitochondrial ribosome complex), proteolysis (proteasome complex), metabolic processes, mitochondrial respiration (respiration chain complex II and IV, ATPase synthase complex, succinate dehydrogenase complex), cell cycle (M phase, MCM complex) and transcription (RNA polymerase complex, Cdc73/Paf1 complex). The most prominent and largest Gene Ontology ‘biological processes’ are related to ribosome biogenesis, translation and proteolysis. Proteins are directly synthesized from mRNA and, thus, the identified translational processes correspond to the physical interface between the PPN and the GRN inferred at the mRNA level. Note that the proteolysis process is also related to the protein translation process, e.g., by post-translational protein processing. Further, the large variety of identified protein complexes in the GRN/PPN interface can be explained by the fact that genes belonging to the same protein complex must be tightly coordinated in space and time for the complex to be functional. Protein complexes have been observed to have highly dependent expression profiles and are thus likely to be identified in a gene regulatory network.
For E. coli the GPEA of the GRN and PPN interface showed similar results, where we observe biological processes related to protein translation, protein complex assembly and organization, gene expression, aerobic and anaerobic respiration, ATP synthesis, metabolic processes, ion transport and stress response. The majority of the observed biological processes for E. coli such as translation, protein complex assembly, metabolic processes, respiration and ATP synthesis are in agreement with the observation for S. cerevisiae, which indicates to some extent a functional conservation of the GRN-PPN interface between both species.
In the S. cerevisiae and E. coli TRN comparisons, we observe regulatory terms for metabolic processes, gene expression and response that are expected for transcription factor related interactions. For the comparison of the S. cerevisiae TRN-GRN, the relatively low percentages of shared edges are mainly explained by the complex relationships of regulatory protein-DNA interactions that regulate the expression of genes. The observed higher percentage of shared edges in the TRN-GRN compared to the TRN-PPN in both species can be explained by the closer relationship between the gene expression dependencies inferred by a GRN and transcription factor to target gene interactions. In contrast to S. cerevisiae, for E. coli the percentage of shared edges of the GRN-TRN is slightly larger than for the GRN-PPN. This may result from the less complex regulation of gene expression in E. coli.
The experimental evaluation of single interactions in large inferred networks is very labor and cost intensive. For this reason, a functional co-occurrence of Gene Ontology  or pathway annotation is widely used to measure or weight the reliability of interactions between genes [33, 40, 44]. The principle idea of this approach is based on the concept ofguilt by association that emerged from the observation that genes with similar expression profiles also tend to share similar biological functions . From a functional co-occurrence analysis of the three networks using Gene Ontology terms from the categories ‘Biological Process’, ‘Cellular Component’ and ‘Molecular Function’, we found that each of the networks contains a considerable amount of biological information, because the biological information content of networks with randomized gene labels can be clearly distinguished (see Figure 7A-E). This is particularly interesting for the GRN, because it demonstrates that the information that can be extracted from such networks is, in terms of its biological knowledge, as valuable as the information extracted from the phenomenological networks (TRN and PPN).
Interestingly, the main difference between the gene regulatory network and the phenomenological networks is that a GRN is inferred from large-scale data by statistical methods. It is thus a different type of network compared to the TRN and the PPN, which are obtained from direct measurements of molecular interactions, e.g., via ChIP-chip or Y2H experiments. A potential reason for the rich biological information content of the GRN could be the way the underlying data are measured, namely in vivo. That is, expression data usually come from cell cultures, tissues or biopsies. In contrast, many PPN are based on yeast two-hybrid measurements that are made outside a cellular and condition specific context.
Finally, we studied genetic information of interacting genes in the three networks. We found that there is a significant difference between the chromosomal co-location and the distance among interacting genes for E. coli and S. cerevisiae. While for E. coli there is a strong co-location effect for close neighbor genes, especially for the TRN and the GRN, this connection is largely absent in S. cerevisiae (see Figure 8A and B). This means that in the TRN of E. coli transcription factors and the regulated genes are frequently closely located, whereas for S. cerevisiae this is not the case. For S. cerevisiae this effect is quite homogeneous across all chromosomes (see Figure 8C). Interestingly, the GRN contains the largest fraction of co-localized interacting genes for both organisms, which is around 17%.
From a GPEA and GEA for S. cerevisiae, we found that the PPN contains many more chromosome specific interactions than the GRN and the TRN (see Figure 9F and I). This holds for an analysis of interactions within chromosomes (see Figure 9E-G) and between them (see Figure 9H-J). Also, the number of common significant terms is largest between the GRN and PPN compared to all other network pairs (see Figure 9C). This is again an indicator that the GRN contains a considerable amount of information from protein interactions.
In summary, we hypothesize from the results of our analysis that the GRN plays a pivotal role when integrating the phenomenological TRN and PPN. The reason for this is that, as seen from our results, the overlap between the PPN and TRN is in general much smaller than the overlap between the PPN and the GRN. This holds for a structural, functional and chromosomal analysis, independently, for E. coli and S. cerevisiae. This increased overlap comes largely from genes corresponding to the same protein complex and from the capability of the BC3NET inference method, used to infer the GRNs in this study, to infer such interactions [33, 40]. Hence, a GRN seems not only beneficial but necessary as an interface to integrate a TRN with a PPN.
Aside from our analysis, this is also plausible for biological reasons, as can be seen in Figure 1, because the data used for the inference of a GRN come from the concentration of mRNAs, which are intermediate between the DNA and the protein level. A further reason in favor of including a GRN in such an integration is the type of information it represents. As explained above, in contrast to the phenomenological TRN and PPN, gene expression data represent the dynamics of the cellular system rather than static information, because the dynamical concentration levels of mRNAs are converted into a snapshot of the underlying molecular interactions actually happening within these samples. This effect is amplified by the fact that TRN and PPN are usually generated without considering multiple conditions or outside the cellular context. In contrast, if tissues or biopsies are used as samples to measure gene expression, such data are more representative of the dynamical processes within a cell. However, due to the nature of the employed experimental assays (Y2H or ChIP-chip), neither a PPN nor a TRN, alone or in combination, is sufficient to provide a condition specific map of molecular interactions. Instead, these networks correspond to cell type specific networks providing information about potential interactions. We hypothesize that if one combines these networks with a condition specific network, like the GRN, then the resulting integrated network conveys condition specific information induced by the GRN. The reason for this condition specific behavior of a GRN, as discussed above, comes from the way these networks are obtained, namely by inference from in vivo samples. This suggests that the integration of different cellular networks should always consist of a combination of condition specific and cell type specific (condition unspecific) networks in order to obtain a phenotype specific model.
As shown by our analysis, the observed overlap between the inferential GRN and the two phenomenological networks (PPN and TRN) provides ample opportunities for such an integration.
The BC3NET approach for GRN inference
The BC3NET algorithm is a bagging approach for C3NET [47, 48]. It is based on three major steps: (1) the generation of bootstrap data sets, (2) the inference of an ensemble of C3NET networks and (3) the aggregation of the network ensemble into a weighted network, where a binomial test is performed for the edges with subsequent correction for multiple hypothesis testing.
Briefly, the C3NET algorithm selects for each gene at most one edge to a gene neighbor which has the strongest mutual dependency as measured by the mutual information. For each inferred edge, a non-parametric significance test for mutual information is performed. The null distribution for the test is generated by a randomization of the gene expression matrix. We use a Bonferroni multiple hypothesis testing correction with a significance level of α = 0.05.
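The selection step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the original implementation: it assumes that a symmetric mutual information matrix `mi` and a boolean matrix `significant` (the outcome of the significance test) are already available, and the names `c3net_edges`, `mi` and `significant` are hypothetical.

```python
def c3net_edges(mi, significant):
    """Return the undirected edge set chosen by a C3NET-like selection step.

    mi          -- symmetric matrix (list of lists) of mutual information values
    significant -- matrix of booleans; True if the pair passed the significance test
    Both inputs are assumptions of this sketch; estimating them is not shown.
    """
    n = len(mi)
    edges = set()
    for i in range(n):
        # Candidate neighbors of gene i: all other genes with a significant MI value.
        candidates = [j for j in range(n) if j != i and significant[i][j]]
        if not candidates:
            continue  # gene i contributes no edge
        # Each gene selects at most one edge: the neighbor with maximal MI.
        best = max(candidates, key=lambda j: mi[i][j])
        edges.add(frozenset((i, best)))
    return edges


# Toy mutual information matrix for four genes (made-up values).
mi = [
    [0.0, 0.9, 0.2, 0.1],
    [0.9, 0.0, 0.5, 0.3],
    [0.2, 0.5, 0.0, 0.05],
    [0.1, 0.3, 0.05, 0.0],
]
# Toy significance mask: pairs with MI above 0.15 count as significant.
significant = [[v > 0.15 for v in row] for row in mi]
edges = c3net_edges(mi, significant)
```

Note that a gene can still end up with several edges in the resulting network, because other genes may select it as their strongest neighbor (gene 1 in the toy example).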
From a bootstrap ensemble consisting of 100 data sets, a gene regulatory network is inferred using C3NET for each of these data sets. For the network inference, we use a B-spline estimator. A B-spline estimator uses a weighted discretization method to estimate mutual information from continuous values. For each bin, weights are estimated for the corresponding gene expression values from overlapping polynomial B-spline functions. Finally, the ensemble of networks is aggregated into a weighted network, where the weights describe the ensemble consensus rate for an edge. We use a binomial test to decide whether or not an edge should be included in the resulting network. We retain edges at a significance level of α = 0.05 with a Bonferroni multiple hypothesis testing correction.
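The aggregation step can be illustrated with the following Python sketch. It is not the BC3NET implementation: bootstrap resampling and the C3NET inference are not shown, the null probability `p0` of observing an edge in a single bootstrap network is a free parameter of this sketch, and correcting over the number of edges observed at least once is likewise an assumption. The names `aggregate_ensemble` and `binomial_upper_tail` are hypothetical.

```python
from collections import Counter
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def aggregate_ensemble(networks, p0, alpha=0.05):
    """Aggregate bootstrap networks into a weighted consensus network.

    networks -- list of edge sets, one per bootstrap data set
    p0       -- assumed null probability that an edge appears in one network
    Returns {edge: consensus rate} for edges passing a Bonferroni-corrected,
    one-sided binomial test.
    """
    n_boot = len(networks)
    counts = Counter(edge for net in networks for edge in net)
    n_tests = len(counts)  # assumption: one test per edge seen at least once
    kept = {}
    for edge, k in counts.items():
        p_adj = min(1.0, binomial_upper_tail(k, n_boot, p0) * n_tests)
        if p_adj <= alpha:
            kept[edge] = k / n_boot  # ensemble consensus rate as edge weight
    return kept


# Toy ensemble of 10 networks: one edge recurs 9 times, another appears once.
edge_ab = frozenset(("geneA", "geneB"))
edge_cd = frozenset(("geneC", "geneD"))
networks = [{edge_ab} for _ in range(9)] + [{edge_cd}]
consensus = aggregate_ensemble(networks, p0=0.1)
```

In the toy ensemble only the recurring edge survives the test and receives a weight of 0.9.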
S. cerevisiae and E. coli gene expression data
We use the S. cerevisiae Affymetrix ygs98 RMA normalized gene expression compendium available from the Many Microbe Microarrays Database M3D. The yeast compendium dataset comprises 9,335 probesets and 904 samples from experimental and observational data from anaerobic and aerobic growth conditions, gene knockout and drug perturbation experiments. We map the yeast Affymetrix probeset IDs to gene symbols using the annotation of the ygs98.db Bioconductor package. Multiple probesets for the same gene are summarized by the median expression value. The resulting expression matrix comprises a total of 9,163 features for 4,837 gene symbols and 4,326 probesets that cannot be assigned to a gene symbol.
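As an illustration of the median summarization, the following Python sketch collapses several probesets of one gene into a single expression profile and keeps unmapped probesets as separate features. The data layout and the names `summarize_by_gene`, `expression` and `probe_to_gene` are assumptions made for this example; the probeset and gene identifiers are invented.

```python
from collections import defaultdict
from statistics import median


def summarize_by_gene(expression, probe_to_gene):
    """Collapse probeset rows into one row per gene using the per-sample median.

    expression    -- {probeset_id: [value per sample]}
    probe_to_gene -- {probeset_id: gene_symbol}; unmapped probesets keep their ID
    """
    groups = defaultdict(list)
    for probe, values in expression.items():
        groups[probe_to_gene.get(probe, probe)].append(values)
    # zip(*rows) iterates sample-wise across all probesets of one gene.
    return {gene: [median(col) for col in zip(*rows)] for gene, rows in groups.items()}


expression = {
    "probe_1": [1.0, 4.0],
    "probe_2": [3.0, 2.0],
    "probe_3": [5.0, 5.0],  # no gene symbol available
}
probe_to_gene = {"probe_1": "GENE_A", "probe_2": "GENE_A"}
summary = summarize_by_gene(expression, probe_to_gene)
```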
Further, we use the Escherichia coli gene expression compendium from the Many Microbe Microarrays Database M3D. The Escherichia coli compendium (version 4, build 6) comprises a total of 7,459 probesets corresponding to 7,258 unique probeset descriptions and 907 samples. We map the Escherichia coli probeset IDs to gene symbols or transcription units from the provided probeset descriptions from M3D. The dataset comprises a total of 4,335 mapped gene symbols and 2,923 mapped transcription units. Multiple probesets for the same gene or transcription unit are summarized by the median expression value.
Cellular networks of S. cerevisiae and E. coli
For S. cerevisiae, we use as gene regulatory network (GRN) the BC3NET gene regulatory network described previously. This network consists of 9,163 genes and 27,493 edges. The transcriptional regulatory network (TRN) consists of 4,441 genes with 157 transcription factors and includes a total of 12,873 interactions. We use an undirected version of the TRN for our analysis. We map ORF identifiers to gene symbols using the Bioconductor org.Sc.sgd.db package. For E. coli, we inferred a BC3NET gene regulatory network from the E. coli gene expression compendium using the B-spline estimator. Finally, the transcriptional regulatory network for E. coli was assembled from protein-DNA interactions from RegulonDB. The network includes transcription factor to target gene and transcription factor to transcription factor interactions.
Protein-Protein interaction network (PPN)
For Saccharomyces cerevisiae we use interactions mapped to gene symbols from PINA (version 2012-12-10) [23, 35]. PINA is a meta database of PPI data from BioGrid, DIP, IntAct, MINT and MPact. The network consists of 6,169 genes and 112,562 interactions.
The PPN for Escherichia coli was constructed from binary protein interactions from the DIP (version 2011-10-27), IntAct (version 2012-12), MINT (version 2012-10-26) and MPIDB (version 2009-11-18) databases. From IntAct, MINT and MPIDB, the UniProtKB gene symbols of the interactors were extracted. For the DIP network, UniProtKB, RefSeq or DIP cross-references to gene symbols were available from EBI (http://www.ebi.ac.uk) or, if the gene symbol was missing, were extracted from UniProt entries. The interactions from DIP (12,636 interactions), IntAct (16,517), MINT (5,250) and MPIDB (2,215) were merged, resulting in an undirected protein-protein interaction network. The resulting network consists of 3,619 genes and 20,198 interactions.
Network functional co-occurrence analysis
We compare the edge reliability between inferred and phenomenological cellular networks for E. coli and S. cerevisiae. The edge reliability is quantified based on the extent of co-occurrence of functional Gene Ontology annotation of connected genes for each network. We compute the cumulative distribution of the number of shared Gene Ontology terms for the edges of the cellular networks for the Gene Ontology classes Biological Process, Molecular Function and Cellular Component.
where d gives the count frequency score of co-occurrence. In the next step, we estimate the cumulative distribution from d over all gene pairs to compare the GO co-occurrence between networks. The resulting distribution vector is scaled to the origin. In order to judge the extent of functional co-occurrence that is expected by random chance, we randomize the gene labels for each network and compute r from 100 randomizations.
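The idea behind this analysis can be sketched as follows. The Python code counts the GO terms shared by the two genes of each edge, derives a cumulative distribution over the edges, and repeats the count after shuffling the gene labels. It is a simplified illustration under stated assumptions: the annotation is a plain mapping from gene to a set of terms, the scaling step mentioned above is omitted, and all function names, gene names and term labels are hypothetical.

```python
import random


def shared_term_counts(edges, annotation):
    """Number of GO terms shared by the two genes of each edge."""
    return [len(annotation.get(a, set()) & annotation.get(b, set())) for a, b in edges]


def cumulative_distribution(counts, max_count):
    """Fraction of edges sharing at most c terms, for c = 0..max_count."""
    n = len(counts)
    return [sum(1 for x in counts if x <= c) / n for c in range(max_count + 1)]


def randomized_counts(edges, annotation, rng):
    """Shared-term counts after shuffling gene labels, keeping the topology."""
    genes = sorted({g for e in edges for g in e})
    shuffled = genes[:]
    rng.shuffle(shuffled)
    relabel = dict(zip(genes, shuffled))
    return shared_term_counts([(relabel[a], relabel[b]) for a, b in edges], annotation)


annotation = {
    "g1": {"term_a", "term_b"},
    "g2": {"term_a", "term_b"},
    "g3": {"term_c"},
    "g4": {"term_c"},
}
edges = [("g1", "g2"), ("g3", "g4"), ("g1", "g3")]
counts = shared_term_counts(edges, annotation)
cdf = cumulative_distribution(counts, max_count=2)
# One of the (for example 100) label randomizations used as a null reference.
null_counts = randomized_counts(edges, annotation, random.Random(0))
```

Comparing `cdf` with the distributions obtained from many randomizations indicates how much functional co-occurrence exceeds what is expected by chance.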
Interfaces between cellular networks
where E_i and E_j define the sets of edges in networks i and j.
where m is the total number of observed interactions of the joined network 1 and network 2, N is the number of all possible interactions between the genes shared by both networks and k is the number of shared edges between network 1 and network 2.
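To make the overlap test concrete, the following Python sketch computes a hypergeometric upper-tail probability for the number of edges shared by two undirected networks, restricted to their common genes. The parametrization used here (population of all possible gene pairs, edges of one network as marked items, edges of the other network as draws) is a common textbook formulation and an assumption of this sketch; it is not necessarily identical to the parametrization with m, N and k used in this study. The function names are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n_pairs, n_marked, n_drawn):
    """P(X >= k) when drawing n_drawn pairs without replacement from n_pairs
    pairs, n_marked of which are edges in the other network."""
    upper = min(n_marked, n_drawn)
    total = sum(
        comb(n_marked, i) * comb(n_pairs - n_marked, n_drawn - i)
        for i in range(k, upper + 1)
    )
    return total / comb(n_pairs, n_drawn)


def edge_overlap_pvalue(edges_1, edges_2):
    """Shared edge count and its p-value for two sets of frozenset edges."""
    genes_1 = {g for e in edges_1 for g in e}
    genes_2 = {g for e in edges_2 for g in e}
    common = genes_1 & genes_2
    # Keep only edges whose two genes occur in both networks.
    sub_1 = {e for e in edges_1 if e <= common}
    sub_2 = {e for e in edges_2 if e <= common}
    p = len(common)
    n_pairs = p * (p - 1) // 2  # all possible interactions among shared genes
    shared = len(sub_1 & sub_2)
    return shared, hypergeom_upper_tail(shared, n_pairs, len(sub_1), len(sub_2))


def edge(a, b):
    return frozenset((a, b))


net_1 = {edge("a", "b"), edge("b", "c"), edge("c", "d")}
net_2 = {edge("a", "b"), edge("b", "c"), edge("a", "d")}
shared, p_value = edge_overlap_pvalue(net_1, net_2)
```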
Network centrality measures
where A is the adjacency matrix of the network.
where i ≠ v and N is the total number of nodes in the network. If no path exists between two nodes, d(v,i) is set to the total number of nodes N.
where |e_ij| is the number of edges from vertex v_i to all direct neighbors v_j and the denominator gives the total number of edges in the clique of k vertices.
Network pathway analysis using centrality measures
For the comparison between two networks, we consider only the subnetworks of common genes. For two cellular networks, G_a and G_b, we estimate the degree, betweenness, transitivity, hubscore and closeness centrality values for all genes for a Gene Ontology (GO) term. Then, for each GO term, we perform a Spearman’s rank correlation test for the ranks of the values for each centrality measure between a pair of networks. We adjust p-values using an FDR correction for a given significance level of α = 0.05.
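The rank correlation underlying this comparison can be sketched in plain Python. The example computes Spearman's rho for the centrality values of the genes of one GO term in two networks; the p-value of the test and the FDR adjustment are not shown. The function names and the toy degree values are assumptions made for illustration.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i : j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: the Pearson correlation of the ranks.

    Undefined (division by zero) if all values of x or of y are identical.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Toy degree values of the same five genes in two networks G_a and G_b.
degree_a = [10, 4, 7, 1, 3]
degree_b = [8, 5, 6, 2, 1]
rho = spearman_rho(degree_a, degree_b)
```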
Gene ontology enrichment analysis and annotation
We obtain Gene Ontology annotations from the Bioconductor packages org.Sc.sgd.db for S. cerevisiae and org.EcK12.eg.db for E. coli. Gene Ontology terms and class definitions (BP, MF, CC) were extracted from the GO.db Bioconductor package. The Gene Ontology enrichment analysis (GEA) was performed with the topGO package using a hypergeometric test.
Gene Pair Enrichment Analysis (GPEA)
The p-value gives an estimate of the probability of observing k or more edges between genes from the given GO term. For the analysis, we consider Gene Ontology Biological Process terms with more than 2 and fewer than 1,000 genes. The p-values are adjusted using a Bonferroni multiple hypothesis testing procedure. We select terms that are significant at a Bonferroni-adjusted level of 0.0001.
In the following, we describe the GPEA for functional gene pair enrichment of shared edges between two networks. For each Gene Ontology term, we perform a hypergeometric test (one-sided Fisher exact test) for the enrichment of gene pairs sharing the same functional annotation between two networks (analogous to Eq. 8). The test is based on four quantities: the joint number of edges of the two networks for p genes, the total number n of edges common to the two networks, the joint number m of edges of the two subnetworks S for a GO term t, and the number of edges of the subnetwork S common to the two networks for the GO term t. The p-value gives an estimate of the probability of observing k or more edges between genes from the given GO term. For the analysis, we consider Gene Ontology Biological Process terms with more than 2 and fewer than 500 genes. The p-values are adjusted using a Benjamini-Hochberg (FDR) multiple hypothesis testing procedure. We select terms that are significant at an FDR-adjusted level of 0.01.
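For readers unfamiliar with the adjustment, the Benjamini-Hochberg procedure can be written compactly in Python. The sketch below returns adjusted p-values in the input order, so that terms can be selected by comparing them with the chosen threshold; the function name and the toy p-values are illustrative assumptions.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value to enforce monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


raw = [0.001, 0.02, 0.03, 0.5]
adjusted = benjamini_hochberg(raw)
# Indices of the terms that remain significant at an adjusted level of 0.01.
selected = [i for i, p in enumerate(adjusted) if p <= 0.01]
```

A Bonferroni adjustment, as used for the within-network GPEA, would instead multiply each p-value by the number of tests and cap the result at 1.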
Gene ontology graph visualization
For the visualization of the Gene Ontology graphs we use our currently unpublished R package drawgo. For a set of defined Gene Ontology terms, a Gene Ontology subgraph is extracted from Gene Ontology that includes the set of significant Gene Ontology terms and the corresponding parental terms. In order to reduce the size of the GO graph for visualization, we iteratively delete non-significant parental terms from the graph. The child terms of a deleted parental GO term are connected to the parent GO terms of the deleted term. Consequently, the connections between Gene Ontology terms in the visualization do not necessarily show direct parent-child connections, but also include more distant ancestor-child connections when non-significant direct parents were deleted. The layout in drawgo is based on a force-based grid layout. The graph procedures and the visualization are based on igraph.
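The pruning step described above can be illustrated with a short Python sketch. It is not the drawgo implementation, which is an R package and is not shown in this article; it only demonstrates the reconnection rule on a graph stored as a mapping from each term to the set of its direct parents. The function name and the term labels are hypothetical, and every term is assumed to be a key of the mapping.

```python
def prune_go_graph(parents, significant):
    """Remove non-significant terms from a GO subgraph, reconnecting their
    children to their parents.

    parents     -- {term: set of direct parent terms}; every term is a key
    significant -- set of terms that must be kept
    Returns a new {term: set of parents} mapping containing only kept terms.
    """
    graph = {term: set(ps) for term, ps in parents.items()}
    for term in [t for t in graph if t not in significant]:
        grandparents = graph.pop(term)
        for ps in graph.values():
            if term in ps:
                # Bypass the deleted term: link its child to its parents.
                ps.discard(term)
                ps |= grandparents
    return graph


parents = {
    "root": set(),
    "A": {"root"},
    "B": {"A"},
    "C": {"B"},
    "D": {"A"},
}
pruned = prune_go_graph(parents, significant={"root", "C", "D"})
```

In the toy graph the non-significant terms A and B are removed, so that C and D end up attached directly to the root, which corresponds to the more distant ancestor-child connections mentioned above.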
Relative gene location distance
We retrieved the E. coli K12 GenBank RefSeq coordinates from the UCSC Microbial Genome Browser. We define the relative distance δ ∈ [0,1] between two genes g_i and g_j that are co-located on the same chromosome by the distance between the midpoints of the two genes normalized by the size of the chromosome.
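The midpoint of a gene g_i is given by
$$m(g_i) = \mathrm{start}(g_i) + \frac{\mathrm{end}(g_i) - \mathrm{start}(g_i)}{2},$$
where end(g_i) ≥ start(g_i); start() gives the physical start and end() the physical end coordinate in bp (base pair) units. The relative distance is then
$$\delta(g_i, g_j) = \frac{m(g_j) - m(g_i)}{L_k}$$
for m(g_j) > m(g_i), where L_k is the size in bp of the chromosome k on which g_i and g_j are co-located.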
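A small Python sketch of the relative gene distance follows. It implements the verbal definition given in this section (distance between gene midpoints divided by the chromosome size) and treats the coordinates as linear, so a wrap-around on a circular chromosome is not considered; that simplification, the function names and the toy coordinates are assumptions of this example.

```python
def midpoint(start, end):
    """Midpoint of a gene in bp; requires end >= start."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return start + (end - start) / 2


def relative_distance(gene_i, gene_j, chromosome_size):
    """Distance between gene midpoints as a fraction of the chromosome size.

    gene_i, gene_j -- (start, end) coordinates in bp on the same chromosome
    """
    m_i, m_j = midpoint(*gene_i), midpoint(*gene_j)
    # abs() makes the result independent of the order of the two genes.
    return abs(m_j - m_i) / chromosome_size


# Two hypothetical genes on a chromosome of 100,000 bp.
delta = relative_distance((1000, 2000), (11000, 13000), chromosome_size=100000)
```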
Matthias Dehmer thanks the Standortagentur Tirol for financial support.
1. Alon U: An Introduction to Systems Biology: Design Principles of Biological Circuits. 2006, Boca Raton: Chapman & Hall/CRC.
2. Barabasi A, Oltvai Z: Network biology: understanding the cell’s functional organization. Nat Rev Genet. 2004, 5: 101-113. 10.1038/nrg1272.
3. Emmert-Streib F, Glazko G: Network Biology: A direct approach to study biological function. Wiley Interdiscip Rev Syst Biol Med. 2011, 3 (4): 379-391. 10.1002/wsbm.134.
4. Palsson B: Systems Biology. 2006, Cambridge, New York: Cambridge University Press.
5. Vidal M: A unifying view of 21st century systems biology. FEBS Lett. 2009, 583 (24): 3891-3894. 10.1016/j.febslet.2009.11.024.
6. Bebek G, Koyuturk M, Price ND, Chance MR: Network biology methods integrating biological data for translational science. Brief Bioinform. 2012, 13 (4): 446-459. 10.1093/bib/bbr075.
7. Analysis of Complex Networks: From Biology to Linguistics. Edited by: Emmert-Streib F. 2009, Weinheim: Wiley-VCH.
8. Emmert-Streib F, Dehmer M: Networks for Systems biology: conceptual connection of data and function. IET Syst Biol. 2011, 5 (3): 185. 10.1049/iet-syb.2010.0025.
9. Hwang D, Rust A, Ramsey S, Smith J, Leslie D, Weston A, de Atauri P, Aitchison J, Hood L, Siegel A, Bolouri H: A data integration methodology for systems biology. Proc Natl Acad Sci USA. 2005, 102 (48): 17296-17301. 10.1073/pnas.0508647102.
10. Mueller L, Kugler K, Graber A, Emmert-Streib F, Dehmer M: Structural measures for network biology Using QuACN. BMC Bioinformatics. 2011, 12 (1): 492. 10.1186/1471-2105-12-492.
11. Wang Y, Chen B: Integrated cellular network of transcription regulations and protein-protein interactions. BMC Syst Biol. 2010, 4: 20. 10.1186/1752-0509-4-20.
12. Yeger-Lotem E, Margalit H: Detection of regulatory circuits by integrating the cellular networks of protein-protein interactions and transcription regulation. Nucleic Acids Res. 2003, 31: 6053-6061. 10.1093/nar/gkg787.
13. Chen L: Combinatorial gene regulation by eukaryotic transcription factors. Curr Opin Struct Biol. 1999, 9 (1): 48-55. 10.1016/S0959-440X(99)80007-4.
14. Harbison CT, Gordon DB, Lee TI, Rinaldi NJ, Macisaac KD, Danford TW, Hannett NM, Reynolds DB, Yoo J, Tagne J-b, et al: Transcriptional regulatory code of a eukaryotic genome. Nature. 2004, 431 (7004): 99-104. 10.1038/nature02800.
15. Barrera LO, Ren B: The transcriptional regulatory code of eukaryotic cells–insights from genome-wide analysis of chromatin organization and transcription factor binding. Curr Opin Cell Biol. 2006, 18 (3): 291-298. 10.1016/j.ceb.2006.04.002.
16. Bantscheff M, Schirle M, Sweetman G, Rick J, Kuster B: Quantitative mass spectrometry in proteomics: a critical review. Anal Bioanal Chem. 2007, 389 (4): 1017-1031. 10.1007/s00216-007-1486-6.
17. Koegl M, Uetz P: Improving yeast two-hybrid screening systems. Brief Funct Genomic Proteomic. 2007, 6 (4): 302-312.
18. Vidal M, Cusick ME, Barabási A-L: Interactome networks and human disease. Cell. 2011, 144 (6): 986-998. 10.1016/j.cell.2011.02.016.
19. Stark C, Breitkreutz B, Reguly T, Boucher L, Breitkreutz A, Tyers M: BioGRID: a general repository for interaction datasets. Nucleic Acids Res. 2006, 34: D535-D539. 10.1093/nar/gkj109.
20. Aranda B, Achuthan P, Alam-Faruque Y, Armean I, Bridge A, Derow C, Feuermann M, Ghanbarian A, Kerrien S, Khadake J, Kerssemakers J, Leroy C, Menden M, Michaut M, Montecchi-Palazzi L, Neuhauser S, Orchard S, Perreau V, Roechert B, van Eijk K, Hermjakob H: The IntAct molecular interaction database in 2010. Nucleic Acids Res. 2010, 38: D525-D531. 10.1093/nar/gkp878.
21. Licata L, Briganti L, Peluso D, Perfetto L, Iannuccelli M, Galeota E, Palma A, Nardozza A, Santonico E, Castagnoli L, Sacco F: MINT, the molecular interaction database 2012 update. Nucleic Acids Res. 2012, 40: D857-D861. 10.1093/nar/gkr930.
22. Guldener U, Munsterkotter M, Oesterheld M, Pagel P, Ruepp A, Mewes H, Stumpflen V: MPact: the MIPS protein interaction resource on yeast. Nucleic Acids Res. 2006, 34: D436-D441. 10.1093/nar/gkj003.
23. Wu J, Vallenius T, Ovaska K, Westermarck J, Makela T, Hautaniemi S: Integrated network analysis platform for protein-protein interactions. Nat Methods. 2009, 6: 75-77. 10.1038/nmeth.1282.
24. Buck M, Lieb J: ChIP-chip: considerations for the design, analysis, and application of genome-wide chromatin immunoprecipitation experiments. Genomics. 2004, 83 (3): 349-360. 10.1016/j.ygeno.2003.11.004.
25. Kidder BL, Hu G, Zhao K: ChIP-Seq: technical considerations for obtaining high-quality data. Nat Immunol. 2011, 12 (10): 918-922. 10.1038/ni.2117.
26. Park K, Kim D: Localized network centrality and essentiality in the yeast-protein interaction network. Proteomics. 2009, 9 (22): 5143-5154. 10.1002/pmic.200900357.
27. Cover T, Thomas J: Information Theory. 1991, John Wiley & Sons, Inc.
28. Emmert-Streib F, Glazko G, Altay G, de Matos Simoes R: Statistical inference and reverse engineering of gene regulatory networks from observational expression data. Front Genet. 2012, 3: 8.
29. Basso K, Margolin A, Stolovitzky G, Klein U, Dalla-Favera R, Califano A: Reverse engineering of regulatory networks in human B cells. Nat Genet. 2005, 37: 382-390. 10.1038/ng1532.
30. Faith J, Hayete B, Thaden J, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins J, Gardner T: Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol. 2007, 5: e8. 10.1371/journal.pbio.0050008.
31. Margolin A, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A: ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics. 2006, 7 (Suppl 1): S7. 10.1186/1471-2105-7-S1-S7.
32. Ashburner M, Ball C, Blake J, Botstein D, Butler H, Cherry J, Davis A, Dolinski K, Dwight S, Eppig J, Harris M, Hill D, Issel-Tarver L, Kasarskis A, Lewis S, Matese J, Richardson J, Ringwald M, Rubin G, Sherlock G: Gene ontology: tool for the unification of biology. The gene ontology consortium. Nat Genet. 2000, 25: 25-29. 10.1038/75556.
33. de Matos Simoes R, Emmert-Streib F: Bagging statistical network inference from large-scale gene expression data. PLoS ONE. 2012, 7 (3): e33624. 10.1371/journal.pone.0033624.
34. Balaji S, Babu M, Iyer L, Luscombe N, Aravind L: Comprehensive analysis of combinatorial regulation using the transcriptional regulatory network of yeast. J Mol Biol. 2006, 360: 213-227. 10.1016/j.jmb.2006.04.029.
35. Cowley M, Pinese M, Kassahn K, Waddell N, Pearson J, Grimmond S, Biankin A, Hautaniemi S, Wu J: PINA v2.0: mining interactome modules. Nucleic Acids Res. 2012, 40: D862-D865. 10.1093/nar/gkr967.
36. Newman M: Assortative mixing in networks. Phys Rev Lett. 2002, 89: 208701.
37. Newman M: Networks: An Introduction. 2010, Oxford: Oxford University Press.
38. Sheskin DJ: Handbook of Parametric and Nonparametric Statistical Procedures. 2004, Boca Raton: CRC Press.
39. Newman MEJ: The structure and function of complex networks. SIAM Rev. 2003, 45: 167-256. 10.1137/S003614450342480.
40. de Matos Simoes R, Tripathi S, Emmert-Streib F: Organizational structure of the peripheral gene regulatory network in B-cell lymphoma. BMC Syst Biol. 2012, 6: 38. 10.1186/1752-0509-6-38.
41. Altay G, Emmert-Streib F: Revealing differences in gene network inference algorithms on the network-level by ensemble methods. Bioinformatics. 2010, 26 (14): 1738-1744. 10.1093/bioinformatics/btq259.
42. Emmert-Streib F, Altay G: Local network-based measures to assess the inferability of different regulatory networks. IET Syst Biol. 2010, 4 (4): 277-288. 10.1049/iet-syb.2010.0028.
43. Jansen R, Greenbaum D, Gerstein M: Relating whole-genome expression data with protein-protein interactions. Genome Res. 2002, 12: 37-46. 10.1101/gr.205602.
44. Madhamshettiwar P, Maetschke S, Davis M, Reverter A, Ragan M: Gene regulatory network inference: evaluation and application to ovarian cancer allows the prioritization of drug targets. Genome Med. 2012, 4 (5): 41.
45. Wolfe C, Kohane I, Butte A: Systematic survey reveals general applicability of “guilt-by-association” within gene coexpression networks. BMC Bioinformatics. 2005, 6: 227. 10.1186/1471-2105-6-227.
46. Maslov S, Sneppen K: Specificity and stability in topology of protein networks. Science. 2002, 296: 910-913. 10.1126/science.1065103.
47. Altay G, Emmert-Streib F: Inferring the conservative causal core of gene regulatory networks. BMC Syst Biol. 2010, 4: 132. 10.1186/1752-0509-4-132.
48. Altay G, Emmert-Streib F: Structural Influence of gene networks on their inference: Analysis of C3NET. Biol Direct. 2011, 6: 31. 10.1186/1745-6150-6-31.
49. Daub C, Steuer R, Selbig J, Kloska S: Estimating mutual information using B-spline functions–an improved similarity measure for analysing gene expression data. BMC Bioinformatics. 2004, 5: 118. 10.1186/1471-2105-5-118.
50. Faith J, Driscoll M, Fusaro V, Cosgrove E, Hayete B, Juhn F, Schneider S, Gardner T: Many Microbe Microarrays Database: uniformly normalized Affymetrix compendia with structured experimental metadata. Nucleic Acids Res. 2008, 36: D866-D870.
51. Gama-Castro S, Salgado H, Peralta-Gil M, Santos-Zavaleta A, Muniz-Rascado L, Solano-Lira H, Jimenez-Jacinto V, Weiss V, Garcia-Sotelo J, Lopez-Fuentes A, Porron-Sotelo L, Alquicira-Hernandez S, Medina-Rivera A, Martinez-Flores I, Alquicira-Hernandez K, Martinez-Adame R, Bonavides-Martinez C, Miranda-Rios J, Huerta A, Mendoza-Vargas A, Collado-Torres L, Taboada B, Vega-Alvarado L, Olvera M, Olvera L, Grande R, Morett E, Collado-Vides J: RegulonDB version 7.0: transcriptional regulation of Escherichia coli K-12 integrated within genetic sensory response units (Gensor Units). Nucleic Acids Res. 2011, 39: D98-D105. 10.1093/nar/gkq1110.
52. Xenarios I, Salwinski L, Duan X, Higney P, Kim S, Eisenberg D: DIP, the database of interacting proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Res. 2002, 30: 303-305. 10.1093/nar/30.1.303.
53. Goll J, Rajagopala S, Shiau S, Wu H, Lamb B, Uetz P: MPIDB: the microbial protein interaction database. Bioinformatics. 2008, 24: 1743-1744. 10.1093/bioinformatics/btn285.
54. Freeman LC: Centrality in social networks: Conceptual clarification. Social Networks. 1979, 1: 215-239.
55. Watts D, Strogatz S: Collective dynamics of ‘small-world’ networks. Nature. 1998, 393: 440-442. 10.1038/30918.
56. Kleinberg JM: Authoritative sources in a hyperlinked environment. J ACM. 1999, 46 (5): 604-632. 10.1145/324133.324140.
57. Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Royal Stat Soc, Ser B (Methodological). 1995, 57: 289-300.
58. Gentleman R, Carey V, Bates D, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini A, Sawitzki G, Smith C, Smyth G, Tierney L, Yang J, Zhang J: Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004, 5: R80. 10.1186/gb-2004-5-10-r80.
59. Alexa A, Rahnenfuhrer J, Lengauer T: Improved scoring of functional groups from gene expression data by decorrelating GO graph structure. Bioinformatics. 2006, 22: 1600-1607. 10.1093/bioinformatics/btl140.
60. Beissbarth T, Speed T: GOstat: find statistically overrepresented Gene Ontologies within a group of genes. Bioinformatics. 2004, 20: 1464-1465. 10.1093/bioinformatics/bth088.
61. Csardi G, Nepusz T: igraph-package. 2008.
62. Schneider K, Pollard K, Baertsch R, Pohl A, Lowe T: The UCSC archaeal genome browser. Nucleic Acids Res. 2006, 34: D407-D410. 10.1093/nar/gkj134.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.