Open Access

Evidence for systems-level molecular mechanisms of tumorigenesis

  • Pilar Hernández1,
  • Jaime Huerta-Cepas2,
  • David Montaner2,
  • Fátima Al-Shahrour2,
  • Joan Valls1,
  • Laia Gómez1,
  • Gabriel Capellá1,
  • Joaquín Dopazo2 and
  • Miguel Angel Pujana1
BMC Genomics 2007, 8:185

DOI: 10.1186/1471-2164-8-185

Received: 17 November 2006

Accepted: 20 June 2007

Published: 20 June 2007



Cancer arises from the consecutive acquisition of genetic alterations. Increasing evidence suggests that as a consequence of these alterations, molecular interactions are reprogrammed in the context of highly connected and regulated cellular networks. Coordinated reprogramming would allow the cell to acquire the capabilities for malignant growth.


Here, we determine the coordinated function of cancer gene products (i.e., proteins encoded by differentially expressed genes in tumors relative to healthy tissue counterparts, hereafter referred to as "CGPs") defined as their topological properties and organization in the interactome network. We show that CGPs are central to information exchange and propagation and that they are specifically organized to promote tumorigenesis. Centrality is identified by both local (degree) and global (betweenness and closeness) measures, and systematically appears in down-regulated CGPs. Up-regulated CGPs do not consistently exhibit centrality, but both types of cancer products determine the overall integrity of the network structure. In addition to centrality, down-regulated CGPs show topological association that correlates with common biological processes and pathways involved in tumorigenesis.


Given the current limited coverage of the human interactome, this study proposes that tumorigenesis takes place in a specific and organized way at the molecular systems-level and suggests a model that comprises the precise down-regulation of groups of topologically-associated proteins involved in particular functions, orchestrated with the up-regulation of specific proteins.


In recent years, functional genomic and proteomic approaches have generated a vast quantity of data through which cellular processes, pathways and pathologies can be deciphered. In particular, microarray-based studies have provided genome-wide expression data for almost every type of human cancer [1]. As a consequence of genetic and molecular analyses, the sequence of events that contributes to certain types of human cancer, for example colorectal cancer [2], is relatively well characterized.

Although our understanding of the genetic determinants of tumorigenesis has been greatly enhanced by these approaches [3], other levels of molecular complexity have emerged [4–6]. Cancer arises from the consecutive acquisition of genetic alterations that, in general, can be recognized as the combination of the loss of function or transcriptional down-regulation of particular genes (tumor suppressor genes) and the activation or transcriptional up-regulation of other genes (oncogenes) [3]. Downstream of the genetic alterations are expression changes in many genes in cancer cells, mediated in part by the activation or inactivation of transcription factors [7, 8]. It is thought, then, that genetic and molecular alterations promote tumorigenesis in the context of highly connected and regulated gene and protein networks [4–6, 9]. Cellular transformation therefore requires dynamic interconnectedness, where specific changes in the information circuitry primarily dictated by up- or down-regulated genes activate or deactivate pathways and, finally, change the cell phenotype. In order to develop a systems-level understanding of cellular transformation it would therefore be necessary to determine the properties and organization of CGPs (proteins encoded by differentially expressed genes in tumors relative to healthy tissue counterparts) in cellular networks.

This study examines the topological properties of CGPs in the human interactome network. Wachi et al. [10] previously reported increased connectivity of differentially expressed proteins in lung cancer tissues, and Jonsson and Bates (2006) [11] reported differences in the global topological features of mutated cancer proteins relative to non-mutated proteins. However, there is no comprehensive study of different cancer types that examines both the local and global topological properties of CGPs and their organization relative to the structural integrity of the network and to molecular mechanisms of tumorigenesis. The results of these analyses suggest that CGPs are central to information exchange and propagation, and that their topological organization supports fundamental biological processes of neoplasia.


Integration of interactome and cancer transcriptomes

To investigate the systems-level organization of CGPs, we integrated interactome and cancer transcriptome data sets (Figure 1). The interactome data set contains compiled and filtered binary human protein-protein interactions from all currently available databases (HPRD, BIND, DIP, MINT, INTACT and MIPS; detailed in Gandhi et al. [12]). This data set is mainly derived from one-at-a-time experimentally demonstrated interactions compiled through a literature curation process [13], which suggests a high degree of reliability. The corresponding scale-free interactome network contains 7,388 proteins and 24,109 interactions, which follow a power-law distribution with an average degree of 6.52 (Additional file 1). The longest distance between any two proteins is 15 and the average distance is 4.50. This interactome network constitutes a scaffold in which different types of functional genomics data can be integrated to ascertain the coordinated function of proteins under particular conditions.
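The network summary statistics quoted above can be reproduced from a list of binary interactions. The following is a minimal sketch, assuming an undirected network in which self-interactions and duplicate pairs are ignored; the function name `degree_table` and the toy protein identifiers are hypothetical and are not taken from the study.

```python
from collections import Counter

def degree_table(edges):
    """Degree of each node from undirected pairs (self-loops and duplicates ignored)."""
    seen = set()
    degree = Counter()
    for a, b in edges:
        if a == b:
            continue  # assumption: self-interactions do not count towards degree
        key = (a, b) if a <= b else (b, a)
        if key in seen:
            continue  # the same interaction reported twice counts once
        seen.add(key)
        degree[a] += 1
        degree[b] += 1
    return degree

# Hypothetical toy interactions; ("P3", "P2") duplicates ("P2", "P3").
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P3", "P2")]
deg = degree_table(edges)
average = sum(deg.values()) / len(deg)      # mean number of interactions per protein
distribution = Counter(deg.values())        # how many proteins have each degree

# Each edge has two ends, so the average degree is 2 * edges / nodes.
reported_average = 2 * 24109 / 7388
```

With the reported 24,109 interactions and 7,388 proteins, `2 * edges / nodes` gives roughly 6.53, in line with the average degree of 6.52 stated in the text.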
Figure 1

Study strategy. Integration of binary protein-protein interactions and gene expression data sets for the investigation of the topological properties and organization of cancer gene products (CGPs) in the human interactome network.

To analyze cancer transcriptomes, we chose data sets of high-incidence cancer types containing a large number of tumors and healthy tissue samples in order to obtain more consistent gene lists (data sets with at least 10 samples of each type). Four expression data sets were analyzed, corresponding to prostate, lung and colorectal samples [14–17] (Additional file 2). We first focused our analysis on the prostate data sets because they were independently generated and contain publicly available raw data, which meant an identical statistical methodology could be applied for differential gene expression detection and to replicate findings [14, 15]. These data sets contain data for 50 healthy tissue samples each and for 52 and 38 tumor samples, respectively. Differentially expressed genes between healthy and tumor samples were then identified using an empirical Bayes moderated t-test and adjusting P values with a false discovery rate of 5%. Thus, 1,429 and 981 CGPs encoded by up- and down-regulated genes in prostate tumors were mapped in the interactome network, respectively. Up- and down-regulated gene sets overlapped between studies by 50.33% and 41.05%, respectively. Accordingly, both studies also showed a similar distribution of Gene Ontology (GO) [18] terms annotation in the complete gene ranking (Additional file 3), which essentially supports a good agreement between the expression data sets. The numbers of differentially expressed genes obtained in this analysis are consistent with the numbers given in the original publications. A comparison of healthy and tumor tissues is likely to reveal more dramatic expression differences than a comparison of tumor subtypes, thus identifying differentially expressed genes that are involved in all stages of the neoplastic process.
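The false discovery rate step can be illustrated with a short sketch. The text does not name the adjustment procedure, so the Benjamini-Hochberg method is assumed here; the moderated t-test itself is not reimplemented, and the P values below are invented for illustration only.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest so that the adjusted
    # values never increase as the raw P value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw P values for five genes.
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
adj = benjamini_hochberg(raw)
# Genes declared differentially expressed at a 5% false discovery rate.
significant = [i for i, q in enumerate(adj) if q <= 0.05]
```

In this toy example only the first two genes pass the 5% threshold, although four raw P values are below 0.05.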

In order to extend the analysis to different types of CGPs, we used expression data sets derived from the study of lung samples (230 tumors and 17 healthy), which included different cellular types, and colorectal samples (18 tumors and 36 healthy) [16, 17]. The lung expression data set was analyzed using the same statistical methodology as described for the prostate, while genes differentially expressed in colorectal tumors identified on a different microarray platform were taken from a public repository [19]. Sets of differentially expressed probes for each cancer type are detailed in (Additional file 2). Integration of the human interactome and cancer transcriptomes was then completed by matching GeneIDs.

Centrality of CGPs

Analysis of the topological properties of CGPs in the interactome network was focused on centrality by measuring: i/ degree, which accounts for the total number of first interactions; ii/ betweenness, which accounts for the frequency with which a node in a network is found in the shortest path between any two other nodes; and iii/ closeness, which accounts for the proximity of a node to all other nodes in a network. To determine the significance of each measure, we compared the median of CGPs to the median of the total of nodes in the network using the Mann-Whitney U test. We also compared the results to equivalent randomly selected protein sets in the interactome.
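The two significance checks described above can be sketched as follows. This is an illustration under assumptions, not the authors' code: the Mann-Whitney U statistic is computed by direct pair counting (without deriving a P value), and the empirical P value uses an add-one correction that the text does not specify. The function names and the default of 1,000 random sets in this form are hypothetical.

```python
import random
from statistics import mean

def mann_whitney_u(x, y):
    """U statistic for x versus y: pairs with x_i > y_j count 1, ties count 0.5."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

def empirical_p(observed_mean, all_values, set_size, n_sets=1000, seed=0):
    """Fraction of random sets whose mean is at least the observed mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sets):
        sample = rng.sample(all_values, set_size)
        if mean(sample) >= observed_mean:
            hits += 1
    # Assumption: add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_sets + 1)

# Hypothetical degree values: a small "CGP" set against the whole network.
network_degrees = list(range(1, 101))
cgp_degrees = [90, 95, 99]
u = mann_whitney_u(cgp_degrees, network_degrees)
p_emp = empirical_p(mean(cgp_degrees), network_degrees, len(cgp_degrees))
```

A U value close to `len(x) * len(y)` indicates that the first set tends to have larger values than the second.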

The analysis of prostate CGPs revealed higher values for degree, betweenness and closeness than in the complete interactome set (Mann-Whitney U test P values < 10^-5) or equivalent randomly selected sets (empirical P values < 0.01) (Figure 2 and Additional file 4). The results for the two prostate expression data sets were concordant. Importantly, higher values of centrality for lung and colorectal CGPs were also observed (Additional file 4). These results indicate that centrality in the interactome network is a common property of proteins encoded by differentially expressed genes in tumors relative to healthy tissue counterparts.
Figure 2

Centrality of CGPs. Results of the Mann-Whitney U test (MW) are shown at the top right in each box. Results of comparing each centrality measure between prostate CGPs (vertical arrow; mean value) and 1,000 equivalent randomly selected protein sets (curves; mean values) (data sets 1 [14] and 2 [15]) are also shown. CGP mean values and 95% confidence intervals (CI), as well as the corresponding empirical P values, are shown.

To further examine the topological properties of CGPs, we analyzed the manner in which they are related to their neighbors by examining their constraint, which accounts for the dependency of a node on its neighborhood. CGPs showed significantly lower average values of this measure (Mann-Whitney U test P values < 10^-7; empirical P values < 0.01) (Additional file 4). CGPs therefore appear to act independently of their neighborhood, which supports the importance of these products in terms of information exchange and propagation within the interactome network studied.

Following this, we analyzed whether the topological properties of CGPs were mainly determined by one specific type of differentially expressed gene (i.e. up-regulated or down-regulated). This analysis highlighted that centrality is a property consistently found in down-regulated CGPs, while results for up-regulated CGPs were not conclusive (Additional file 5). The fact that up-regulated CGPs do not show consistent centrality measures could be the result of technical or biological differences between studies, for example that prostate tumor samples were collected at different stages [14, 15]. On the other hand, this observation might also suggest that down-regulation plays a major role in tumorigenesis at the interactome network level.

Possible centrality differences between cancer subtypes were investigated by analyzing the lung data set according to the pathological description of tumors (adenoid, carcinoid, and squamous) [17]. Overlaps of 50.55%, 74.94% and 50.48%, respectively, were observed for adenoid-carcinoid, adenoid-squamous, and carcinoid-squamous down-regulated CGPs sets. In this case, all three subtypes showed centrality measures consistent with the analysis of prostate and colorectal down-regulated CGPs (Additional file 4). Once again, up-regulated CGPs showed heterogeneity of average values and value distributions.

Centrality analysis using different sets of experimentally- or computationally-generated interactions

Comparison of publicly available protein-protein interaction repositories has revealed small, although significant, overlaps and considerable selection and detection bias [20, 21]. To evaluate the consistency of the above results, we performed similar centrality analyses using three different sets of experimentally- or computationally-generated interactions: i/ in vivo experimental interactions only; ii/ interactions with two or more experimental evidences as compiled by Gandhi et al. [12] (interactions found in vivo and/or in vitro, including yeast two-hybrid interactions); and iii/ computationally-generated interactions using a homology-based method [22]. This final data set was carefully validated using true positive interactions sourced from the HPRD database and false positive interactions for proteins localized in incompatible cellular compartments based on Gene Ontology (GO) annotations [11]. Using these three interactions sets, the number of nodes and edges in each network were 6,022 and 15,990; 5,009 and 9,950; and 10,691 and 57,846, respectively.

Centrality was then examined in each network for CGPs of the two prostate cancer data sets, the lung cancer data set including three pathological sub-classes, and the colorectal cancer data set referred to above, distinguishing between down- and up-regulated CGPs. Importantly, the results of these analyses are fully consistent with increased local and global centrality and with lower constraint of CGPs, particularly for down-regulated CGPs (Additional file 6). In addition, the results using the homology-based network also showed increased centrality and lower constraint for up-regulated CGPs. This observation may be due to the higher number of nodes and edges in the network, which could diminish sampling errors relative to the anticipated complete interactome or, in contrast, to an unknown intrinsic bias of the homology-based method. Overall, analysis of the three interactome data sets further supports the hypothesis that high centrality is a fundamental property of CGPs.

CGPs attack and interactome structure integrity

To better understand the relative importance of each centrality measure for CGPs, a strategy was used that consisted of determining the structural integrity of the interactome network after removing nodes with different topological characteristics [23–25]. We calculated the number of proteins remaining in the main component of the network (i.e., the part containing the largest number of connected proteins) after removing CGPs, selected nodes with the same degree distribution as CGPs but with lower values of betweenness and closeness centrality, and, in extreme cases, the hubs (proteins with the highest degrees).

As expected from the association between centrality and vulnerability [23, 24], removing CGPs had a lesser effect on structure integrity than did hubs removal. However, removing CGPs always produced a more dramatic effect than removing selected nodes with the same degree distribution but with lower values of betweenness and closeness. The number of proteins remaining in the main component was consistently smaller when CGPs were removed than when these selected nodes were removed (Figure 3 and Table 1). After deleting n nodes, the size of the main component is not only reduced by n but also by other nodes that are attached to CGPs. For example, removing 795 down-regulated prostate CGPs reduced the total number of nodes in the main component by 1,026 (7,092 to 6,066), while removing 795 proteins with the same degree distribution but with lower values of betweenness and closeness reduced the total number of nodes by 682 (7,092 to 6,410). Although the differences affect a small percentage of nodes in the main component (5.6% to 1.5%), the same tendency was observed for all the up- and down-regulated CGP sets examined (12 in total). These results suggest that the positions of both types of CGPs in the interactome network are more important than their degree distributions reflect.
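The attack analysis can be sketched as a breadth-first search for the largest connected component after a set of nodes has been deleted. The toy network and node names below are hypothetical; they are chosen so that two nodes of equal degree (`c` and `y`) differ in how much the main component shrinks when each is removed. The last lines only restate the arithmetic reported in the text.

```python
from collections import deque

def largest_component_size(adj, removed=()):
    """Size of the largest connected component once `removed` nodes are deleted."""
    removed = set(removed)
    seen = set()
    best = 0
    for start in adj:
        if start in removed or start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            u = queue.popleft()
            size += 1
            for w in adj[u]:
                if w not in removed and w not in seen:
                    seen.add(w)
                    queue.append(w)
        best = max(best, size)
    return best

# Hypothetical toy network: a triangle x-y-z linked to a chain d-e through c.
toy = {
    "x": {"y", "z", "c"}, "y": {"x", "z"}, "z": {"x", "y"},
    "c": {"x", "d"}, "d": {"c", "e"}, "e": {"d"},
}
intact = largest_component_size(toy)                 # all six nodes connected
after_bridge = largest_component_size(toy, {"c"})    # degree 2, lies between modules
after_inner = largest_component_size(toy, {"y"})     # degree 2, inside the triangle

# Reductions reported in the text for 795 removed nodes.
drop_cgps = 7092 - 6066          # down-regulated prostate CGPs
drop_matched = 7092 - 6410       # degree-matched nodes with lower betweenness/closeness
```

Removing `c` leaves a main component of three nodes, whereas removing `y` leaves five, even though both nodes have degree 2.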
Table 1

Interactome attack. Number of nodes remaining in the main component after selective removal of the indicated node sets (including same degree nodes), reported for prostate cancer (data sets 1 and 2), lung cancer and colorectal cancer. [Table values not preserved in this copy.]

*Number of CGPs mapped on the complete human interactome (i.e. number of nodes removed in this analysis)

**Main component difference between removing nodes with the same degree distribution as CGPs but with lower values of betweenness and closeness, and CGPs
Figure 3

CGPs attack and interactome network structure integrity. Interactome network examples after removing an equivalent number of hubs, prostate down-regulated CGPs (data set 1 [14]) or selected proteins with the same degree distribution as CGPs, but with lower values of betweenness and closeness. Disconnected nodes from the main component are shown in inset to emphasize the difference between CGPs and selected proteins.

Topological and functional association of CGPs

Using the experimentally-based data sets, analysis of the level of inter-connection with neighboring proteins through the average clustering coefficient (CC) and by examining cliques (i.e. fully connected network sub-graphs) did not reveal significant over-representation of CGPs when corrected by multiple testing (not shown). In agreement with the lower constraint values observed, these observations suggest that CGPs perform their systems-level function principally by exploiting centrality (degree, betweenness and closeness), although not by virtue of being highly inter-connected in their neighborhood. However, the same analysis using the homology-based data set revealed significant differential CC values for both down- and up-regulated CGPs in different tumor types (Additional file 6). Larger, experimentally-based data sets are therefore needed to clarify the reasons for this discrepancy.

Next, we assessed whether the average network distances between CGPs were lower than the average in the main component. Thus, we determined the shortest distance between CGPs and compared this to the shortest distance between any two proteins in the main component. Lower distances were observed between CGPs – up-regulated, down-regulated, or both – when compared to the average distance in the main component (4.09 – 4.34 against 4.50, respectively) (Table 2). Accordingly, the maximum distances between CGPs were always found to be smaller than the maximum distance between any two proteins in the main component (10–12 against 15, respectively). These results suggest the topological association of CGPs regardless of CC or up/down-regulation.
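The distance comparison can be sketched by running a breadth-first search from each protein of interest and averaging the shortest distances over all pairs within the set. The function names and the toy path network are hypothetical, and pairs with no connecting path are simply skipped, which is an assumption.

```python
from collections import deque
from itertools import combinations

def bfs_distances(adj, source):
    """Shortest-path length (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def subset_distances(adj, nodes):
    """Average and maximum shortest distance over connected pairs within `nodes`."""
    nodes = list(nodes)
    dist_from = {s: bfs_distances(adj, s) for s in nodes}
    lengths = [dist_from[s][t] for s, t in combinations(nodes, 2) if t in dist_from[s]]
    return sum(lengths) / len(lengths), max(lengths)

# Hypothetical toy network: a path A-B-C-D-E.
toy = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C", "E"}, "E": {"D"}}
whole_avg, whole_max = subset_distances(toy, toy)              # all proteins
sub_avg, sub_max = subset_distances(toy, ["B", "C", "D"])      # a "CGP-like" subset
```

Here the subset has a smaller average and maximum distance than the whole network, which is the kind of pattern the text describes for CGPs relative to the main component.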
Table 2

Topological association of prostate CGPs. Network distance (shortest, average) between CGPs for data set 1 and data set 2, compared with the main component. [Table values not preserved in this copy.]

*Number of CGPs in the main component

Distances between CGPs can be represented in a matrix format where clusters are identified (Figure 4a and Additional file 7). We then investigated whether these topological associations or clusters of CGPs have functional implications for mechanisms of tumorigenesis. In this analysis, proportions of GO terms [18] and pathway (KEGG) [26] annotations were compared between clusters showing small network distances (≤ 3 shortest distance) and the remaining CGPs (≥ 4) in each matrix. Results showed that down-regulated CGPs in clusters participate in common biological processes or pathways involved in tumorigenesis (Table 3). Thus, the GO analysis revealed the coordinated down-regulation of CGPs involved in cell adhesion and cell communication processes, which would facilitate the metastatic behavior of cancer cells, and the coordinated down-regulation of CGPs involved in programmed cell death, which would in turn prolong cancer cell life and allow tumorigenesis to progress by accumulating genetic and molecular alterations. KEGG analysis revealed the coordinated down-regulation of pathways commonly associated with tumorigenesis, such as the extracellular matrix-receptor interaction pathway. It also revealed the coordinated down-regulation of pathways known to play a critical role in prostate carcinogenesis, for example the insulin signaling pathway [27].
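A comparison of annotation proportions between a cluster and the remaining CGPs can be sketched as a one-sided hypergeometric (Fisher-type) test. The text states only that proportions were compared, so the choice of test is an assumption, as are the function name and the counts in the example.

```python
from math import comb

def enrichment_p(k, cluster_size, annotated_total, population):
    """P(X >= k): chance of seeing at least k annotated proteins in the cluster.

    `population` is the number of CGPs in the matrix, `annotated_total` how many
    of them carry the term, and `cluster_size` the number of CGPs in the cluster.
    """
    denom = comb(population, cluster_size)
    upper = min(cluster_size, annotated_total)
    tail = sum(
        comb(annotated_total, i) * comb(population - annotated_total, cluster_size - i)
        for i in range(k, upper + 1)
    )
    return tail / denom

# Hypothetical counts: 20 CGPs, 5 annotated with a term, cluster of 5 holding 4 of them.
p = enrichment_p(4, 5, 5, 20)
```

In practice each term would be tested separately and the resulting P values adjusted for multiple testing, as indicated by the FDR-adjusted column of Table 3.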
Table 3

Topological and functional association of prostate CGPs

Columns: Non-redundant significant terms*; GO level; P value FDR-adjusted. [GO level and P value entries not preserved in this copy.]

Data set 1

  Cluster A
    BP: Protein amino acid phosphorylation
    CC: Plasma membrane
    MF: Protein-tyrosine kinase activity

  Cluster B
    BP: Cell-matrix adhesion
    CC: Extracellular space
    MF: Metalloendopeptidase inhibitor activity

  Cluster C
    BP: Intracellular signaling cascade
    CC: Cytoskeleton
    MF: Protein-tyrosine kinase activity
    KEGG: T cell receptor signaling pathway; Adherens junction; Focal adhesion

  Cluster D
    BP: Macromolecule biosynthesis
    CC: Cytosolic ribosome
    MF: Structural constituent of ribosome
    KEGG: Ribosome

Data set 2

  Cluster E
    MF: Purine nucleotide binding

  Cluster F
    CC: Extracellular space
    MF: Extracellular matrix structural constituent
    KEGG: Extracellular matrix receptor interaction

  Cluster G
    BP: Regulation of programmed cell death
    MF: Protein kinase activity
    KEGG: Insulin signaling pathway

  Cluster H
    BP: Phosphate transport
    CC: Extracellular space
    MF: Metalloendopeptidase inhibitor activity
    KEGG: Extracellular matrix receptor interaction

*BP (Biological Process), CC (Cellular Component), MF (Molecular Function), KEGG (Kyoto Encyclopedia of Genes and Genomes)
Figure 4

Topological and functional association of CGPs. (a). Matrices of network distances between prostate CGPs (three categories: < 4 shortest distance shown in red; 4–6 shown in green; and > 6 shown in blue). CGP matrix clusters with significant enrichment in GO or KEGG annotations involved in tumorigenesis-related processes are indicated. (b). Functional association of prostate CGPs. Cell communication (cluster C) and cell death (cluster G) biological processes are shown. Green, red and black nodes correspond to down-regulated, up-regulated, and non-differentially expressed proteins, respectively. Dashed circles and lines connect proteins common to both processes. Protein kinase C isozymes are denoted by the prefix PRKC.

Most up-regulated CGP topological associations did not show significant enrichment in GO or KEGG annotations; however, when all CGPs were considered together, both up- and down-regulated CGPs participating in common biological processes and pathways were found closely located in the network. For example, up- and down-regulated CGPs within and connecting cell communication and cell adhesion functions are protein kinase C isozymes (Figure 4b), which are well known regulators of cell proliferation and transformation of prostate epithelial cells [28]. This CGP organization might reflect a change in the flow of information between different processes so as to promote tumorigenesis.


A criterion of centrality for a particular node in a network can be given by local (degree) or by global (betweenness and closeness) measures. A higher degree does not necessarily mean that a node is more important for information exchange and propagation, so more global measures are needed than degree measures. The three measures of centrality therefore reflect the possibilities of a particular protein choosing alternative paths, acting as a broker between different proteins, for example connecting distinct complexes or signaling pathways, or being closer to any other proteins for information propagation. By virtue of centrality, the hundreds of differentially expressed proteins in tumors are likely to promote tumorigenesis at the interactome network level in a coordinated manner. Viewed alternatively, proteins with a less central position within the interactome network might not be able to have a global impact on the cellular behavior determined by the protein-protein interactions involved in cellular transformation.

Protein-protein interaction repositories are incomplete and not fully reliable, based on the observed selection and detection biases [20, 21]. Gandhi et al. [12] demonstrated that there is minimal overlap across currently known experimental species interactome data sets. In addition, recent work by Mika and Rost [29] has shown that interactions are more conserved within species than across species and that homology transfers are only accurate at high levels of identity. These observations raise the question of the specificity and sensitivity of large-scale homology-based generated interactomes with respect to other approaches. The principle of conserved protein-protein interactions or "interologs" was first used by Matthews et al. [30] and subsequently extended by several authors [31–33]. The Jonsson et al. [11, 22] data set used in the present study applied a new confidence score to predict interactions, which was based on both the level of homology and the amount of experimental data available that supported a particular interaction. By benchmarking the score the authors obtained relatively good percentages of sensitivity and specificity (~80–85%) for a reasonable cut-off [11], which indicates high reliability of the data set. This observation corroborates our results by replicating the findings with respect to centrality.

In addition to selection and detection biases, the limited coverage of current data sets relative to the anticipated complete human interactome suggests that results derived from any currently available set should be interpreted with an element of caution, as has been demonstrated for other well-established topology characteristics [34]. The results using the Jonsson et al. [22] data set show increased centrality for up-regulated CGPs and differential CC values that were not consistently observed when using other interaction sets. The larger size of this data set could reduce the effect of sampling and may facilitate the detection of weak effects. This apparent discrepancy will probably remain unexplained until larger coverage of the anticipated human experimental interactome has been obtained. Nevertheless, we analyzed hundreds of CGPs, most of which belonged to different sets across different cancer types, which makes this study less likely to present a bias in gene selection.

In a previous study focused on lung cancer, it was suggested that up-regulated CGPs in squamous lung tumors have higher connectivity [10], yet the same observation was not supported for down-regulated CGPs. This discrepancy could be due to the small number of samples profiled (five tumors and matched healthy tissues) but also to the examination of another interactome network generated mainly from computationally-generated interactions. On the other hand, the lung data set we used [17] has been extensively examined and validated, which suggests that the apparent centrality inconsistency of down-regulated CGPs is not due to the existence of different subsets of lung CGPs.

This study proposes a model for systems-level molecular mechanisms of tumorigenesis that includes the down-regulation of specific biological processes represented by topological associations of CGPs in the interactome network, combined with the up-regulation of particular proteins that could depend on the cell type, tumor type or tumor stage. Since we analyzed tumor panels, our results reflect average, possibly secondary molecular events in cancer. These changes are initiated by combinations of genetic alterations in tumor suppressor genes and oncogenes, which lead to extensive downstream variation of expression modules carrying specific functions in cancer cells [7, 8]. The analysis of tumors ordered in stages would provide additional information on the systems-level molecular mechanisms of cancer progression. The final outcome of CGP organization could be a change in the flow of information, specific to each cancer type that will determine the neoplastic process. Centrality could then be used in combination with dynamic information (i.e., gene or pathway up- or down-regulation) to specifically disrupt cancer cell networks by disturbing proteins that are critical to both aspects.


Taking into account the current limited coverage of the anticipated, complete human interactome, this study suggests that the proteins encoded by differentially expressed genes in tumors relative to healthy tissue counterparts occupy central positions in the interactome network. Our results suggest a systems-level tumorigenesis model that comprises the precise down-regulation of groups of topologically-associated proteins involved in particular functions, orchestrated with the up-regulation of specific proteins.


Human interactome network

In generating the human interactome network, a previously compiled data set was used, mainly containing experimentally demonstrated interactions compiled through a literature-curation process, combined with data from different types of experimental and computational evidence [12]. In our analyses, proteins with no assigned Entrez GeneID were excluded, thus yielding a final interactome network containing 7,388 proteins and 24,109 interactions. The network was analyzed using Cytoscape [35] and UCINET [36]. In removing network hubs, proteins were selected from the highest degree value (> 9 when analyzing prostate CGPs to > 19 when analyzing colorectal CGPs). The number of proteins/nodes removed from the network in each case was identical amongst hubs, CGPs and selected proteins with the same degree distribution as CGPs but with lower values of betweenness and closeness.
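The selection of comparison node sets can be sketched as follows. The text specifies only that hubs were taken from the highest degree values and that control proteins share the degree distribution of CGPs while having lower betweenness and closeness; the greedy matching below (one unused non-CGP node of equal degree and lowest betweenness per CGP) is a hypothetical procedure, and all names and values are illustrative.

```python
def top_hubs(degree, n):
    """The n nodes with the highest degree (ties broken by name)."""
    return sorted(degree, key=lambda v: (-degree[v], v))[:n]

def degree_matched_controls(degree, betweenness, cgps):
    """For each CGP, pick an unused non-CGP node of equal degree and lowest betweenness."""
    cgps = set(cgps)
    pool = {}
    for v in degree:
        if v not in cgps:
            pool.setdefault(degree[v], []).append(v)
    for candidates in pool.values():
        # Highest betweenness first, so pop() returns the lowest remaining one.
        candidates.sort(key=lambda v: (betweenness[v], v), reverse=True)
    controls = []
    for c in sorted(cgps):
        candidates = pool.get(degree[c], [])
        if candidates:  # a CGP with no equal-degree partner is left unmatched
            controls.append(candidates.pop())
    return controls

# Hypothetical degree and betweenness values.
degree = {"a": 3, "b": 3, "c": 3, "d": 1, "e": 1}
betweenness = {"a": 0.5, "b": 0.1, "c": 0.3, "d": 0.0, "e": 0.0}
hubs = top_hubs(degree, 2)
controls = degree_matched_controls(degree, betweenness, {"a", "d"})
```

Because each control is drawn from the same degree class as its CGP, the two sets have identical degree distributions whenever every CGP finds a partner.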

Gene expression analysis

The GEPAS package [37] was used for the analysis of expression data. Background correction, normalization and averaging of expression values were performed with the Robust Multi-array Average (RMA) algorithm [38]. Differentially expressed genes between healthy and tumor samples were declared after the calculation of an empirical Bayes moderated t-statistic, with P values adjusted using a false discovery rate of 5%. Previously analyzed colorectal data [16] were downloaded from a public repository [19]. Data sets and probe lists are detailed in Table S1. The FatiScan tool [39] was used to assess enrichment of GO terms in the complete gene ranking according to the empirical Bayes moderated t-statistic.

Topological analysis

The degree of a vertex or protein in the interactome network was calculated by counting the number of edge-ends at that node. Betweenness was calculated in accordance with Freeman's formulation [40]. Thus, betweenness centrality C B (v) for vertex v is calculated as follows:
C B ( v ) = s v t V σ s t ( v ) σ s t MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGdbWqdaWgaaWcbaGaemOqaieabeaakiabcIcaOiabdAha2jabcMcaPiabg2da9maaqababaWaaSaaaeaaiiGacqWFdpWCdaWgaaWcbaGaem4CamNaemiDaqhabeaakiabcIcaOiabdAha2jabcMcaPaqaaiab=n8aZnaaBaaaleaacqWGZbWCcqWG0baDaeqaaaaaaeaacqWGZbWCcqGHGjsUcqWG2bGDcqGHGjsUcqWG0baDcqGHiiIZcqWGwbGvaeqaniabggHiLdaaaa@4C80@
where σ st is the number of shortest geodesic paths from s to t and σ st (v) the number of shortest geodesic paths from s to t that pass through the vertex v. This value was normalized by dividing by (n -1) × (n - 2), where n is the number of vertices. Closeness centrality was calculated according to Sabidussi's formulation [41]. Thus, the closeness C C (v) for a vertex v is the reciprocal of the sum of geodesic distances to all other vertices in graph G, and is calculated as follows:
$$C_C(v) = \frac{1}{\sum_{t \in V} d_G(v, t)}$$
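
The three centrality measures defined above can be sketched on a small unweighted, undirected network as follows. This is a minimal illustration rather than the implementation used in the study: the toy network and all function names are hypothetical, and summing closeness distances over reachable vertices only is an assumption made here for disconnected graphs.

```python
from collections import deque


def bfs_counts(adj, source):
    """Distances and numbers of shortest paths from source to reachable nodes."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
    return dist, sigma


def centralities(adj):
    """Degree, normalized betweenness and closeness for every vertex (n > 2)."""
    nodes = list(adj)
    n = len(nodes)
    info = {s: bfs_counts(adj, s) for s in nodes}
    degree = {v: len(adj[v]) for v in nodes}
    betweenness, closeness = {}, {}
    for v in nodes:
        dist_v, sigma_v = info[v]
        total = 0.0
        for s in nodes:
            if s == v:
                continue
            dist_s, sigma_s = info[s]
            for t in nodes:
                if t == v or t == s or t not in dist_s:
                    continue
                # v lies on a shortest s-t path iff d(s,v) + d(v,t) == d(s,t).
                if v in dist_s and t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                    total += sigma_s[v] * sigma_v[t] / sigma_s[t]
        # Normalization by (n - 1)(n - 2), as described in the text.
        betweenness[v] = total / ((n - 1) * (n - 2))
        # Assumption: unreachable vertices are ignored in the distance sum.
        far = sum(d for t, d in dist_v.items() if t != v)
        closeness[v] = 1.0 / far if far else 0.0
    return degree, betweenness, closeness


# Hypothetical toy network; not data from the study.
toy_adj = {
    "A": {"B"},
    "B": {"A", "C", "D"},
    "C": {"B"},
    "D": {"B", "E"},
    "E": {"D"},
}
toy_degree, toy_betweenness, toy_closeness = centralities(toy_adj)
```

The sum runs over ordered pairs $(s, t)$, so each unordered pair is counted twice, which is why dividing by $(n - 1)(n - 2)$ bounds the value by 1.
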
Hierarchy and constraint were calculated using Burt's formulation [42]. Constraint is a summary measure that indicates the level of independence of a node from its neighbourhood, depending on the number of edges that connect it to neighbour nodes. Thus, constraint is calculated as follows:
$$c_{ij} = \left( p_{ij} + \sum_{q} p_{iq}\, p_{qj} \right)^{2}$$
for $q \neq i, j$, where $p_{ij}$ is the proportion of node $i$ connections to $j$. Hierarchy is the extent to which constraint is concentrated in a single node and is calculated as follows:
$$H = \frac{\sum_{j} \left( \frac{c_{ij}}{C/N} \right) \ln \left( \frac{c_{ij}}{C/N} \right)}{N \ln N}$$
The clustering coefficient is the local density of a node's connections and is defined as the ratio between the observed number of connections $L_i$ among the $k_i$ neighbours of a particular node $i$ and the total number of possible connections among them, $k_i (k_i - 1)/2$. Thus, the clustering coefficient is calculated as follows:
$$C(i) = \frac{2 L_i}{k_i (k_i - 1)}$$
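
The constraint, hierarchy and clustering coefficient formulas above can be sketched for an unweighted, undirected network as follows. Several points are assumptions made for illustration and are not stated in the text: $p_{ij}$ is taken as $1/k_i$ for each neighbour $j$ of $i$, $C$ is taken as the sum of $c_{ij}$ over the $N$ neighbours of $i$, and hierarchy is set to 0 when $N < 2$, where $N \ln N = 0$. The toy network and function names are hypothetical.

```python
import math


def clustering_coefficient(adj, i):
    """C(i) = 2 L_i / (k_i (k_i - 1)); defined here as 0 when k_i < 2."""
    neighbours = list(adj[i])
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(
        1
        for a in range(k)
        for b in range(a + 1, k)
        if neighbours[b] in adj[neighbours[a]]
    )
    return 2.0 * links / (k * (k - 1))


def proportion(adj, i, j):
    """p_ij for an unweighted network: 1/k_i if i and j interact, else 0."""
    return 1.0 / len(adj[i]) if j in adj[i] else 0.0


def dyadic_constraints(adj, i):
    """c_ij for every neighbour j of node i."""
    result = {}
    for j in adj[i]:
        # Indirect paths i -> q -> j; q != i holds because there are no self-loops.
        indirect = sum(
            proportion(adj, i, q) * proportion(adj, q, j)
            for q in adj[i]
            if q != j
        )
        result[j] = (proportion(adj, i, j) + indirect) ** 2
    return result


def hierarchy(adj, i):
    """H for node i, following the formula above."""
    c = dyadic_constraints(adj, i)
    n_contacts = len(c)
    if n_contacts < 2:
        return 0.0  # assumption: H is set to 0 where N ln N = 0
    mean_c = sum(c.values()) / n_contacts  # C / N
    numerator = sum((v / mean_c) * math.log(v / mean_c) for v in c.values())
    return numerator / (n_contacts * math.log(n_contacts))


# Hypothetical toy network: a triangle A-B-C with a pendant node D on A.
toy_net = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
c_A = dyadic_constraints(toy_net, "A")
h_A = hierarchy(toy_net, "A")
cc_A = clustering_coefficient(toy_net, "A")
```

Here node A is constrained more strongly by B and C, which also interact with each other, than by the pendant node D, so its hierarchy is greater than zero.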

The Bron and Kerbosch algorithm [43] was used to find all cliques greater than a specified size [44].
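
A minimal sketch of clique enumeration is given below. It uses the basic Bron-Kerbosch recursion without pivoting, which enumerates maximal cliques, followed by a size filter. Whether the study used this variant is not stated, and the toy network, function names and size threshold are hypothetical.

```python
def bron_kerbosch(adj, r=None, p=None, x=None):
    """Yield every maximal clique of an undirected graph (no pivoting)."""
    if r is None:
        r, p, x = set(), set(adj), set()
    if not p and not x:
        # No candidate can extend r and none was excluded: r is maximal.
        yield frozenset(r)
        return
    for v in list(p):
        yield from bron_kerbosch(adj, r | {v}, p & adj[v], x & adj[v])
        # v has been fully explored: move it from candidates to excluded.
        p = p - {v}
        x = x | {v}


def cliques_larger_than(adj, size):
    """Maximal cliques containing more than `size` vertices."""
    return [clique for clique in bron_kerbosch(adj) if len(clique) > size]


# Hypothetical toy network: two triangles sharing vertex C, plus edge E-F.
clique_net = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D", "E"},
    "D": {"C", "E"},
    "E": {"C", "D", "F"},
    "F": {"E"},
}
all_cliques = set(bron_kerbosch(clique_net))
large_cliques = set(cliques_larger_than(clique_net, 2))
```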

To assess significance, the non-parametric Mann-Whitney U test was used to determine whether the median of the centrality measures differed between protein sets. In addition, the properties of CGPs were compared to those of equivalent, randomly selected protein sets in the interactome. One thousand random iterations were performed in each case, from which average values, confidence intervals and empirical P values were obtained.
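
The random-set comparison can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the study: the test statistic is taken to be the mean of a centrality measure, the test is one-sided, and the empirical P value uses a add-one correction, none of which is specified in the text. The `mann_whitney_u` helper returns only the U statistic, not its P value. All names and the toy degree values are hypothetical.

```python
import random
from statistics import mean


def mann_whitney_u(xs, ys):
    """U statistic for xs versus ys, counting ties as one half."""
    return sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x in xs
        for y in ys
    )


def empirical_p_value(values, cgp_ids, n_iter=1000, seed=0):
    """Compare the mean of `values` over cgp_ids with random sets of equal size.

    Returns (observed mean, mean of the random-set means, empirical P value).
    """
    rng = random.Random(seed)
    universe = list(values)
    observed = mean(values[g] for g in cgp_ids)
    null_means = []
    for _ in range(n_iter):
        sample = rng.sample(universe, len(cgp_ids))
        null_means.append(mean(values[g] for g in sample))
    exceed = sum(1 for m in null_means if m >= observed)
    # Assumption: one-sided test with an add-one correction.
    p_value = (exceed + 1) / (n_iter + 1)
    return observed, mean(null_means), p_value


# Hypothetical degrees for 20 proteins; the "CGP" set holds the five largest.
toy_values = {"P%d" % i: i for i in range(1, 21)}
toy_cgps = ["P16", "P17", "P18", "P19", "P20"]
obs_mean, null_mean, emp_p = empirical_p_value(toy_values, toy_cgps)
toy_u = mann_whitney_u([3, 4], [1, 2])
```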

Functional association analysis

The Stats and Graphics packages in R [45] were used to analyze and plot, respectively, the matrix of network distances between CGPs. Hierarchical clustering with an average linkage method was applied to the matrix so as to arrange CGPs according to their network distances. The FatiGO+ tool [46] was then used to assess the enrichment of GO and KEGG annotations in the set of CGPs within a particular cluster relative to the remaining CGPs in the same matrix. Calculated P values were adjusted by FDR, taking into account the total number of genes interrogated in each case.
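
Average linkage clustering of a network-distance matrix can be sketched as below. The study used R for this step; the naive Python version here is only an illustration of the linkage rule, and the distance values, protein labels and function name are hypothetical.

```python
from itertools import combinations


def average_linkage(dist, labels):
    """Agglomerate labels by average linkage; return the merge history.

    dist is a symmetric dict of dicts holding pairwise network distances.
    Each merge is recorded as (cluster_a, cluster_b, average distance).
    """
    clusters = [(label,) for label in labels]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Average of all pairwise distances between the two clusters.
            d = sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append((a, b, d))
    return merges


# Hypothetical shortest-path distances between four CGPs.
toy_pairs = {
    ("P1", "P2"): 1, ("P1", "P3"): 3, ("P1", "P4"): 4,
    ("P2", "P3"): 3, ("P2", "P4"): 4, ("P3", "P4"): 1,
}
toy_labels = ["P1", "P2", "P3", "P4"]
toy_dist = {label: {} for label in toy_labels}
for (a, b), d in toy_pairs.items():
    toy_dist[a][b] = d
    toy_dist[b][a] = d
toy_merges = average_linkage(toy_dist, toy_labels)
```

The two tightly connected pairs are merged first, and the final merge joins them at the average of their four cross-distances.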



We are grateful to C. Lázaro, Adolfo A. Ferrando and E. Serra for valuable discussions, and to three anonymous reviewers for their helpful criticism. We are also extremely grateful to all those involved in the maintenance of publicly available protein-protein interaction databases. MAGP would like to offer his personal thanks to Marc Vidal, for introducing him to and developing his knowledge of the world of systems biology. This work was supported by the Fundació la Caixa (grant BM05-254-00 awarded to MAGP), the Catalan Institute of Oncology (PH), the Spanish Ministry of Education and Science (grant BIO2005-01078 awarded to JD), the NRC Canada-SEPOCT Spain and the National Institute of Bioinformatics, a platform of Genome Spain. MAGP is a Ramón y Cajal Researcher with the Spanish Ministry of Education and Science. JHC and DM are supported by the Fundación Genoma España.

Authors’ Affiliations

Bioinformatics and Biostatistics Unit, and Translational Research Laboratory, Catalan Institute of Oncology, IDIBELL, L'Hospitalet
Functional Genomics Unit, Bioinformatics Department, CIPF


  1. Liu ET: Classification of cancers by expression profiling. Curr Opin Genet Dev. 2003, 13 (1): 97-103. 10.1016/S0959-437X(03)00008-X.
  2. Sancho E, Batlle E, Clevers H: Signaling pathways in intestinal development and cancer. Annu Rev Cell Dev Biol. 2004, 20: 695-723. 10.1146/annurev.cellbio.20.010403.092805.
  3. Hanahan D, Weinberg RA: The hallmarks of cancer. Cell. 2000, 100 (1): 57-70. 10.1016/S0092-8674(00)81683-9.
  4. Khalil IG, Hill C: Systems biology for cancer. Curr Opin Oncol. 2005, 17 (1): 44-48. 10.1097/01.cco.0000150951.38222.16.
  5. Kitano H: Cancer as a robust system: implications for anticancer therapy. Nat Rev Cancer. 2004, 4 (3): 227-235. 10.1038/nrc1300.
  6. Barabasi AL, Oltvai ZN: Network biology: understanding the cell's functional organization. Nat Rev Genet. 2004, 5 (2): 101-113. 10.1038/nrg1272.
  7. Rhodes DR, Kalyana-Sundaram S, Mahavisno V, Barrette TR, Ghosh D, Chinnaiyan AM: Mining for regulatory programs in the cancer transcriptome. Nat Genet. 2005, 37 (6): 579-583. 10.1038/ng1578.
  8. Segal E, Friedman N, Koller D, Regev A: A module map showing conditional activity of expression modules in cancer. Nat Genet. 2004, 36 (10): 1090-1098.
  9. Rhodes DR, Chinnaiyan AM: Integrative analysis of the cancer transcriptome. Nat Genet. 2005, 37 Suppl: S31-7. 10.1038/ng1570.
  10. Wachi S, Yoneda K, Wu R: Interactome-transcriptome analysis reveals the high centrality of genes differentially expressed in lung cancer tissues. Bioinformatics. 2005, 21 (23): 4205-4208. 10.1093/bioinformatics/bti688.
  11. Jonsson PF, Bates PA: Global topological features of cancer proteins in the human interactome. Bioinformatics. 2006, 22 (18): 2291-2297. 10.1093/bioinformatics/btl390.
  12. Gandhi TK, Zhong J, Mathivanan S, Karthick L, Chandrika KN, Mohan SS, Sharma S, Pinkert S, Nagaraju S, Periaswamy B, Mishra G, Nandakumar K, Shen B, Deshpande N, Nayak R, Sarker M, Boeke JD, Parmigiani G, Schultz J, Bader JS, Pandey A: Analysis of the human protein interactome and comparison with yeast, worm and fly interaction datasets. Nat Genet. 2006, 38 (3): 285-293. 10.1038/ng1747.
  13. Peri S, Navarro JD, Kristiansen TZ, Amanchy R, Surendranath V, Muthusamy B, Gandhi TK, Chandrika KN, Deshpande N, Suresh S, Rashmi BP, Shanker K, Padma N, Niranjan V, Harsha HC, Talreja N, Vrushabendra BM, Ramya MA, Yatish AJ, Joy M, Shivashankar HN, Kavitha MP, Menezes M, Choudhury DR, Ghosh N, Saravana R, Chandran S, Mohan S, Jonnalagadda CK, Prasad CK, Kumar-Sinha C, Deshpande KS, Pandey A: Human protein reference database as a discovery resource for proteomics. Nucleic Acids Res. 2004, 32 (Database issue): D497-501. 10.1093/nar/gkh070.
  14. Singh D, Febbo PG, Ross K, Jackson DG, Manola J, Ladd C, Tamayo P, Renshaw AA, D'Amico AV, Richie JP, Lander ES, Loda M, Kantoff PW, Golub TR, Sellers WR: Gene expression correlates of clinical prostate cancer behavior. Cancer Cell. 2002, 1 (2): 203-209. 10.1016/S1535-6108(02)00030-2.
  15. Stuart RO, Wachsman W, Berry CC, Wang-Rodriguez J, Wasserman L, Klacansky I, Masys D, Arden K, Goodison S, McClelland M, Wang Y, Sawyers A, Kalcheva I, Tarin D, Mercola D: In silico dissection of cell-type-associated patterns of gene expression in prostate cancer. Proc Natl Acad Sci U S A. 2004, 101 (2): 615-620. 10.1073/pnas.2536479100.
  16. Notterman DA, Alon U, Sierk AJ, Levine AJ: Transcriptional gene expression profiles of colorectal adenoma, adenocarcinoma, and normal tissue examined by oligonucleotide arrays. Cancer Res. 2001, 61 (7): 3124-3130.
  17. Bhattacharjee A, Richards WG, Staunton J, Li C, Monti S, Vasa P, Ladd C, Beheshti J, Bueno R, Gillette M, Loda M, Weber G, Mark EJ, Lander ES, Wong W, Johnson BE, Golub TR, Sugarbaker DJ, Meyerson M: Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc Natl Acad Sci U S A. 2001, 98 (24): 13790-13795. 10.1073/pnas.191502998.
  18. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000, 25 (1): 25-29. 10.1038/75556.
  19. Rhodes DR, Yu J, Shanker K, Deshpande N, Varambally R, Ghosh D, Barrette T, Pandey A, Chinnaiyan AM: ONCOMINE: a cancer microarray database and integrated data-mining platform. Neoplasia. 2004, 6 (1): 1-6.
  20. Mathivanan S, Periaswamy B, Gandhi TK, Kandasamy K, Suresh S, Mohmood R, Ramachandra YL, Pandey A: An evaluation of human protein-protein interaction data in the public domain. BMC Bioinformatics. 2006, 7 Suppl 5: S19. 10.1186/1471-2105-7-S5-S19.
  21. Futschik ME, Chaurasia G, Herzel H: Comparison of human protein-protein interaction maps. Bioinformatics. 2007, 23 (5): 605-611. 10.1093/bioinformatics/btl683.
  22. Jonsson PF, Cavanna T, Zicha D, Bates PA: Cluster analysis of networks generated through homology: automatic identification of important protein communities involved in cancer metastasis. BMC Bioinformatics. 2006, 7: 2. 10.1186/1471-2105-7-2.
  23. Albert R, Jeong H, Barabasi AL: Error and attack tolerance of complex networks. Nature. 2000, 406 (6794): 378-382. 10.1038/35019019.
  24. Jeong H, Mason SP, Barabasi AL, Oltvai ZN: Lethality and centrality in protein networks. Nature. 2001, 411 (6833): 41-42. 10.1038/35075138.
  25. Han JD, Bertin N, Hao T, Goldberg DS, Berriz GF, Zhang LV, Dupuy D, Walhout AJ, Cusick ME, Roth FP, Vidal M: Evidence for dynamically organized modularity in the yeast protein-protein interaction network. Nature. 2004, 430 (6995): 88-93. 10.1038/nature02555.
  26. Kanehisa M, Goto S, Hattori M, Aoki-Kinoshita KF, Itoh M, Kawashima S, Katayama T, Araki M, Hirakawa M: From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res. 2006, 34 (Database issue): D354-7. 10.1093/nar/gkj102.
  27. Papatsoris AG, Karamouzis MV, Papavassiliou AG: Novel insights into the implication of the IGF-1 network in prostate cancer. Trends Mol Med. 2005, 11 (2): 52-55. 10.1016/j.molmed.2004.12.005.
  28. Gonzalez-Guerrico AM, Meshki J, Xiao L, Benavides F, Conti CJ, Kazanietz MG: Molecular mechanisms of protein kinase C-induced apoptosis in prostate cancer cells. J Biochem Mol Biol. 2005, 38 (6): 639-645.
  29. Mika S, Rost B: Protein-protein interactions more conserved within species than across species. PLoS Comput Biol. 2006, 2 (7): e79. 10.1371/journal.pcbi.0020079.
  30. Matthews LR, Vaglio P, Reboul J, Ge H, Davis BP, Garrels J, Vincent S, Vidal M: Identification of potential interaction networks using sequence-based searches for conserved protein-protein interactions or "interologs". Genome Res. 2001, 11 (12): 2120-2126. 10.1101/gr.205301.
  31. Yu H, Luscombe NM, Lu HX, Zhu X, Xia Y, Han JD, Bertin N, Chung S, Vidal M, Gerstein M: Annotation transfer between genomes: protein-protein interologs and protein-DNA regulogs. Genome Res. 2004, 14 (6): 1107-1118. 10.1101/gr.1774904.
  32. Brown KR, Jurisica I: Online predicted human interaction database. Bioinformatics. 2005, 21 (9): 2076-2082. 10.1093/bioinformatics/bti273.
  33. Lehner B, Fraser AG: A first-draft human protein-interaction map. Genome Biol. 2004, 5 (9): R63. 10.1186/gb-2004-5-9-r63.
  34. Han JD, Dupuy D, Bertin N, Cusick ME, Vidal M: Effect of sampling on topology predictions of protein-protein interaction networks. Nat Biotechnol. 2005, 23 (7): 839-844. 10.1038/nbt1116.
  35. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T: Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003, 13 (11): 2498-2504. 10.1101/gr.1239303.
  36. []
  37. Montaner D, Tarraga J, Huerta-Cepas J, Burguet J, Vaquerizas JM, Conde L, Minguez P, Vera J, Mukherjee S, Valls J, Pujana MA, Alloza E, Herrero J, Al-Shahrour F, Dopazo J: Next station in microarray data analysis: GEPAS. Nucleic Acids Res. 2006, 34 (Web Server issue): W486-91. 10.1093/nar/gkl197.
  38. Irizarry RA, Bolstad BM, Collin F, Cope LM, Hobbs B, Speed TP: Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res. 2003, 31 (4): e15. 10.1093/nar/gng015.
  39. Al-Shahrour F, Minguez P, Vaquerizas JM, Conde L, Dopazo J: BABELOMICS: a suite of web tools for functional annotation and analysis of groups of genes in high-throughput experiments. Nucleic Acids Res. 2005, 33 (Web Server issue): W460-4. 10.1093/nar/gki456.
  40. Freeman LC: A set of measures of centrality based on betweenness. Sociometry. 1977, 40: 35. 10.2307/3033543.
  41. Sabidussi G: The centrality of a graph. Psychometrika. 1966, 31 (4): 581-603. 10.1007/BF02289527.
  42. Burt RS: Structural Holes: The social structure of competition. 1992, Cambridge, Harvard University Press.
  43. Bron C, Kerbosch J: Finding all cliques of an undirected graph. Commun ACM. 1973, 16 (9): 575-577. 10.1145/362342.362367.
  44. Luce R, Perry A: A method of matrix analysis of group structure. Psychometrika. 1949, 14: 95-116. 10.1007/BF02289146.
  45. Website title [].
  46. Al-Shahrour F, Diaz-Uriarte R, Dopazo J: FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes. Bioinformatics. 2004, 20 (4): 578-580. 10.1093/bioinformatics/btg455.


© Hernández et al; licensee BioMed Central Ltd. 2007

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.