Learning contextual gene set interaction networks of cancer with condition specificity
© Jung et al.; licensee BioMed Central Ltd. 2013
Received: 14 August 2012
Accepted: 29 January 2013
Published: 19 February 2013
Identifying similarities and differences in the molecular constitutions of various types of cancer is one of the key challenges in cancer research. The manifestations of a cancer depend on complex molecular interactions, including gene regulatory networks and gene-environment interactions. This complexity makes it challenging to decipher the molecular origin of the cancer. In recent years, many studies have reported methods to uncover the heterogeneity of complex cancers, which are often categorized into different subtypes. The challenge is to identify diverse molecular contexts within a cancer, to relate them to different subtypes, and to learn the underlying molecular interactions specific to those contexts, so that context-specific treatment can be recommended to patients.
In this study, we describe a novel method to discern molecular interactions specific to certain molecular contexts. Unlike conventional approaches that build modular networks of individual genes, our focus is to identify cancer-generic and subtype-specific interactions between contextual gene sets, where each contextual gene set consists of genes that share coherent transcriptional patterns across a subset of samples. We then apply a novel formulation for quantifying the effect of the samples from each subtype on the calculated strength of the observed interactions. Two cancer data sets were analyzed to support the validity of the condition-specificity of the identified interactions. When compared to an existing approach, the proposed method was much more sensitive in identifying condition-specific interactions, even in a heterogeneous data set. The results also revealed that network components specific to different types of cancer are related to different biological functions than cancer-generic network components. We found not only results consistent with previous studies, but also new hypotheses on biological mechanisms specific to certain cancer types that warrant further investigation.
The analysis of the contextual gene sets and the characterization of the networks of interactions among these sets revealed distinct functional differences underlying various types of cancer. The results show that our method successfully reveals many subtype-specific regions in the identified maps of biological contexts, which represent biological functions that can be connected to specific subtypes.
Keywords: Molecular context; Cancer; Condition-specificity; Gene regulatory network; Glioblastoma
Many computational and mathematical techniques have been developed to infer molecular patterns of biological and translational interest from gene expression data profiled from human tumors. As most of these methodologies rely heavily on simple correlation of changes in mRNA abundance as the primary measure of relatedness, their sensitivity and specificity are intrinsically limited by the highly heterogeneous, idiosyncratic nature of tumor gene expression patterns. Early expectations were that tumors arising from a particular tissue of origin would show striking similarities in molecular pathology, owing to very common sets of oncogenic processes accounting for the initiation and progression of each tumor type. The finding that samples of tumors taken at different points in the course of an individual's disease were more similar to each other than to any other tumor in the study was therefore quite surprising, and it stands as a noteworthy result of early expression profiling studies. One tumor type, chronic myelogenous leukemia (CML), has been found to have a large number of genes behaving in a very homogeneous fashion; however, this kind of behavior has been the exception rather than the rule. The relative homogeneity of CML is consistent with the expectation that a cancer type would be highly homogeneous if its process of oncogenesis were highly similar across patients, and in CML the means of transformation is indeed simple and constant. As the biochemical mechanisms of tumor growth and survival have been subjected to ever more detailed analysis, it has become clear that for most tumor types there is substantial variation in how tumors use available normal and altered cellular functions to achieve relentless growth and disproportionate survival.
Hence, the identification of genomic patterns specific to certain biological contexts has been gaining interest in recent studies, as the heterogeneity in biological data becomes better appreciated. Biological contexts of interest can be derived from disease subtypes or from different clinical outcomes within the same subtype, such as responses to therapy. One of the early approaches to identifying context-specific patterns involved searching for co-regulated sets of genes and relating the gene sets to the biological or clinical characteristics of the samples. Gasch and Eisen used a modified fuzzy k-means clustering method to find gene sets and showed correlations between those gene sets and the experimental conditions under which yeast cells respond to environmental changes. Segal et al. used existing knowledge sources as well as clustering techniques to find gene sets that are either functionally related or coherently expressed, and then determined their specificity to particular types of tumors.
More recently, new strategies to identify the context specificity of biological interactions have been proposed, as it is now widely accepted that biological interactions are coordinated systematically but distinctively, depending on the biological context. These approaches are often based on network models of biological interactions. Identifying context specificity in biological interactions can reveal the environmental conditions under which the activities of biological network components vary, which can contribute significantly both to reinforcing confidence in the network and to understanding the exact mechanisms of transcriptional or translational regulation. Several studies have shown that it is possible to identify context-specific activity in known biological networks [5–8]. In these studies, biological networks were built from existing prior knowledge, and context-specific gene or protein expression data were used to annotate the context specificity of biological interactions. While this approach is useful for understanding context specificity in already characterized networks, its utility is limited by the scope of the network, as context specificity is identified only for interactions already known from prior knowledge. Given this limitation, a more desirable approach is to identify biological interaction networks and their context specificity simultaneously from high-throughput data. Grzegorczyk et al. proposed a method to learn a non-homogeneous Bayesian network that can represent multiple conditions, using a mixture model that unifies the networks of different conditions. Such an approach is ideal when sufficient samples and computing resources are available. However, its scalability is significantly limited in practice by the complexity of the model; indeed, their study only shows results from networks of up to 11 genes.
Another possible approach is to apply conventional network learning methods to the samples of each subtype separately and compare the results. However, this approach requires many samples for each condition to achieve reliable results, and comparing models inferred from different conditions can be questionable, especially when different numbers of samples are available across conditions.
In this study, we propose a novel method to learn contextual gene set interaction networks, which represent maps of functional modules in target biological systems as statistical interactions between gene sets, and to identify the condition/subtype-specificity of the inferred interactions. Our method comprises two novel approaches: the first uses context-specific gene sets instead of individual genes as network nodes, and the second measures condition-specificity with a formulation based on a probabilistic graphical model. Using gene sets instead of individual genes as nodes can significantly increase the scalability of the method. To identify context-specific gene sets, we use a computational method that we developed to model context-specific genomic regulation [10–12]. The identified gene sets, termed contextual gene sets, have coherent expression patterns specific to the subset of samples in which they show statistically significant coherency. This property helps to identify networks of gene sets and their condition-specificity. Another novel aspect of our approach is the use of the conventional homogeneous Bayesian network model both to learn networks and to measure condition-specificity. Homogeneous network models require significantly less computation than the non-homogeneous mixture model of Grzegorczyk et al., so our approach is more scalable and can be applied to larger problems. For example, we found only limited applications of the non-homogeneous mixture Bayesian network model proposed by Grzegorczyk et al., whereas the conventional homogeneous Bayesian network model has been widely used for applications of varying sizes, from about one hundred to almost one thousand genes [14, 15]. However, such a homogeneous model does not represent condition specificity by itself.
To overcome this limitation, we designed a novel formulation that quantifies the effect of the samples from different conditions/subtypes on the formation of networks, in order to measure the degree and the statistical significance of condition specificity. Brief results of a simulation study are also given to show the feasibility of identifying condition-specificity.
Two cancer data sets were used to show the benefit of identifying condition/subtype-specificity: 1) a refractory cancer gene expression data set with 113 cancer patient samples of 32 different tissue types, and 2) The Cancer Genome Atlas (TCGA) glioblastoma multiforme (GBM) gene expression data. Each resulting contextual gene set interaction network shows both cancer-generic and subtype-specific interactions. By comparison with the result of a conventional biclustering-based approach on the refractory cancer data set, we show that the proposed method is much more sensitive in identifying context-specific interactions. We also found that the identified cancer-generic and subtype-specific sub-networks have different functional roles. In addition to comparing functional annotations, we related the identified subtype-specific interactions to supporting evidence from other knowledge sources. These results show that our approach to identifying condition specificity in learned networks can provide novel information about biological functions specific to the given conditions.
Results and discussion
Overview of learning contextual gene set interaction networks and identifying condition specificity
To infer networks of contextual gene sets, each contextual gene set is represented as a single variable. This requires transforming the original gene expression matrix into a gene set expression matrix, in which the value of a contextual gene set for a sample is a representative value of all genes in the set. The expression values of the genes in a contextual gene set for a sample are summarized as UP or DOWN if the majority of the genes are over-expressed or under-expressed, respectively, and as NOCHANGE otherwise (STEP II). We focus on cases of statistically significant up-regulation or down-regulation, and most results in this study come from such cases.
A contextual gene set interaction network is learned from the summarized contextual gene set expression data by evaluating the likelihood of dependency between each pair of contextual gene sets given all samples, and building a connection if the dependency likelihood is larger than a given threshold (STEP III). Inferring interaction networks from the summarized data has a few advantages over the traditional approach in which all genes are used. Since all the genes in a contextual gene set are aggregated into a single variable, the number of variables (nodes) is significantly smaller; the method therefore incurs lower computational cost and suffers less from the curse of dimensionality, leading to more reliable estimation of the probability statistics of network models.
An interaction between two contextual gene sets indicates a probabilistic dependency between their summarized expressions. Gene sets with a dependency are expressed in a coordinated manner, where the expression status of one gene set depends on that of the other. However, samples from different conditions can contribute differently to a dependency, as they may reflect different activities of biological functions. Based on this idea, we identify condition-specific regions in the built network by measuring the effect of the samples of each condition on the likelihood of dependency. To measure the effect of a condition on a dependency, we evaluated the likelihood of the dependency without the samples of the condition and compared it with the original likelihood obtained using all available samples (STEP IV). If the original likelihood is significantly higher than the likelihood without the samples from the condition, the samples under that condition have contributed significantly to the dependency. This implies that the dependency exists mainly because of the samples from the condition, so it is declared a condition-specific dependency.
Example and advantage of identifying condition-specificity and contextual gene set
Example of identifying condition-specificity
Advantage of contextual gene sets over biclusters
Contextual gene set interaction network from the refractory cancer gene expression data
Contextual gene set interaction network of refractory cancer with tissue type specificity
The number of specific interactions for each tissue type from the refractory cancer data (columns: contextual gene set, all identified interactions; tissue types listed include smooth muscle from uterus and T cell lymphoma)
Discrepancy in biological functions between cancer-generic and tissue-centric contextual gene sets
Significantly associated GO terms to only cancer-generic or tissue-centric contextual gene sets
GO terms relevant only to cancer-generic contextual gene sets:
GO:0007166: cell surface receptor signal transduction
GO:0007049: cell cycle
GO:0006396: RNA processing
GO:0008283: cell proliferation
GO:0016070: RNA metabolism
GO:0043207: response to external biotic stimulus
GO:0000375: RNA splicing, via transesterification reactions
GO:0000377: RNA splicing, via transesterification reactions with bulged adenosine as nucleophile
GO:0000398: nuclear mRNA splicing, via spliceosome
GO:0006397: mRNA processing
GO:0006959: humoral immune response
GO:0008380: RNA splicing
GO:0009613: response to pest, pathogen or parasite
GO:0016071: mRNA metabolism
GO:0030333: antigen processing
GO:0000067: DNA replication and chromosome cycle
GO:0000075: cell cycle checkpoint
GO:0006950: response to stress
GO:0016064: humoral defense mechanism
GO:0000279: M phase
GO terms relevant only to tissue-centric contextual gene sets:
GO:0006163: purine nucleotide metabolism
GO:0006164: purine nucleotide biosynthesis
GO:0006629: lipid metabolism
GO:0009150: purine ribonucleotide metabolism
GO:0009152: purine ribonucleotide biosynthesis
GO:0009259: ribonucleotide metabolism
GO:0009260: ribonucleotide biosynthesis
GO:0044255: cellular lipid metabolism
GO:0046148: pigment biosynthesis
GO:0000904: cellular morphogenesis
GO:0006099: tricarboxylic acid cycle
GO:0006119: oxidative phosphorylation
GO:0006144: purine base metabolism
GO:0006188: IMP biosynthesis
GO:0006189: 'de novo' IMP biosynthesis
GO:0006510: ATP-dependent proteolysis
GO:0006554: lysine catabolism
GO:0006570: tyrosine metabolism
GO:0006582: melanin metabolism
GO:0006583: melanin biosynthesis from tyrosine
Cancer-generic network region
Top 10 most significant annotations for 12 contextual gene sets of region (A), Figure 3
MSigDB annotation (Source)
Graft versus host disease (KEGG)
Type I diabetes mellitus (KEGG)
Natural killer cell mediated cytotoxicity (KEGG)
Generation of second messenger molecules (REACTOME)
Allograft rejection (KEGG)
Viral myocarditis (KEGG)
Translocation of ZAP70 to immunological synapse (REACTOME)
Signaling in immune system (REACTOME)
Leishmania infection (KEGG)
Autoimmune thyroid disease (KEGG)
The regions (B) – (E) in Figure 4-I are the examples of network regions where many tissue-specific interactions exist. For each region, the expression patterns of contextual gene sets show strong correlation for the corresponding tissue type samples (see Additional file 4: Figure S2(B-E)).
The melanoma-specific region (B) in Figure 4-I showed an association with apoptosis. The five under-expressed contextual gene sets were related to abnormal pigmentation, cell death signaling pathways and apoptosis. Individual contextual gene sets reveal further details of tissue-specific functional abnormalities. For example, G122 and G223 in region (B) are related to nicotinamide and nicotinamide adenine dinucleotide (NAD+) metabolism. Through these pathways, the coenzyme NAD+ accepts or donates electrons in redox reactions, which play significant roles in releasing energy from nutrients through ATP generation. This possible abnormality in energy generation may be related to the fact that melanoma, like other cancers, is strongly positive in positron emission tomography (PET) scans because of its intense energy demand: tumors up-regulate glucose transporters and consequently have high levels of glycolysis. One over-expressed contextual gene set in this region was related to GTP binding (P=1.71E−5) and guanyl nucleotide binding (P=1.37E−5), which is supported by a report of the expression of small GTP-binding protein genes of the RAS family in melanoma.
The ovary-specific region (D) in Figure 4-I includes six contextual gene sets that are under-expressed in most of the ovarian samples and are associated with ovary-specific functional annotations such as reproduction and pregnancy. This may reflect the loss of normal ovarian function in ovarian cancer patients. Besides the ovary-specific annotations, these gene sets are also related to the β-Arrestin pathway and to caspase-mediated cleavage of cytoskeletal proteins. Arrestins can block G protein-mediated signaling and redirect signaling to alternative G protein-independent pathways. Regarding the latter annotation, caspase-mediated cleavage of cytoskeletal actin has been reported to play a positive role in the morphological changes of apoptosis.
Contextual gene set interaction network from the GBM data of TCGA
Contextual gene set interaction network with phenotype/genotype specificity
The number of specific interactions for each GBM subtype and sample condition (column: number of specific interactions; conditions include age < 40)
Functional difference between GBM-generic and subtype-centric contextual gene sets
The number of subtype-centric contextual gene sets and associated annotation terms (columns: $T_k$-centric gene sets; $|\mathrm{Annot}(T_k)|$; $|\mathrm{Annot}(T_k) \cap \mathrm{Annot}(GBM)|$ (overlap %))
Comparison of contextual gene sets with other gene signatures of GBM subtypes
Comparison between GBM subtype-centric contextual gene sets and GBM subtype signature genes reported by Verhaak et al. (columns: subtype-centric contextual gene set; Verhaak et al.)
Comparison between contextual gene sets and MSigDB gene sets identified with GSEA (column: contextual gene set)
GBM-generic network region
The network region (A) in Figure 6 is an example of a GBM-generic region, containing 12 contextual gene sets. The heat map (Additional file 6: Figure S4(A)) of these contextual gene sets shows that their expression is closely correlated across all patient samples, which makes the interactions among them GBM-generic. Based on the annotations of the over-expressed contextual gene sets in their corresponding contextual conditions, this network region is mainly related to tight junctions and intercellular adhesion, which occur in epithelia and brain endothelia. Considering other annotations such as epithelial cell differentiation and morphogenesis of an epithelium, this network component may represent active epithelia construction in GBM, which could imply active blood vessel construction. Among the MSigDB annotations of the under-expressed contextual gene set, a transcriptional start site motif matching the annotation for the vitamin D receptor (VDR) was statistically significant. Given reports that vitamin D kills GBM cells, one possible hypothesis is that the loss of vitamin D susceptibility in GBM patients is related to the low activity of genes targeted by VDR, although the main cause of such low activity remains to be determined.
GBM subtype-specific regions
The Classical-specific region (B) in Figure 6 includes six contextual gene sets, of which one is over-expressed and the other five are under-expressed across Classical samples. From the MSigDB analysis, one of the significant annotations of the over-expressed contextual gene set is a transcriptional start site motif that matches the annotation for ELK1, a member of the ETS oncogene family, which is involved in pro-apoptotic and pro-differentiation processes in neuronal cells and in the MAPK-ELK1 signaling pathway that contributes to cell survival. This implies that genes targeted by ELK1 are over-expressed and that ELK1 may be activating many of its downstream genes. The genes in the under-expressed contextual gene sets were related to central nervous system development (P=2.58E−4), which may imply abnormalities in neural system development. A transcriptional start site motif matching the annotation for the androgen receptor (AR) was also related to the under-expressed contextual gene sets. It has been reported that AR was detected in a higher proportion of gliomas, although it has also been suggested that the proliferative effect in GBM may not be mediated through AR activation. However, there are reports that activated AR enhances the susceptibility of GBM cells to chemotherapeutics and radiation therapy and promotes cell death. These reports suggest that, for a subgroup of patients with low AR activity, activating AR in combination with other therapeutics could be a potential treatment. The Classical-specific region (C) reveals N-Myc over-expression; N-Myc directly regulates a number of genes associated with the Classical phenotype gene signature, including EGFR.
For the Mesenchymal-specific region (D) in Figure 6, five contextual gene sets over-expressed in Mesenchymal samples are related to cell surface interactions (integrin cell surface interactions, P=7.3E−5) and several cancer-related signaling pathways. We observed up-regulation of collagen I and IV ECM components (COL1A2 up-regulated in 33 of 58 Mesenchymal samples, P=1.22E−15; COL4A5 up-regulated in 10 Mesenchymal samples, P=0.1069), which signifies increased ECM production. TGF beta receptor II is also up-regulated (46 of 58 Mesenchymal samples, P<8.12E−21), which is associated with epithelial-mesenchymal transition (EMT). The Jak-STAT signaling pathway was also associated (P=1.84E−4), with PIK3CD and JAK2 up-regulated. Genes involved in integrin cell surface interactions (COL1A2, ITGA11, RAP1B, COL4A5 and APBB1IP) were also included, and these can lead to MAPK signaling. The other 10 contextual gene sets, under-expressed in many Mesenchymal samples, include down-regulated CTNNA2 (down-regulated in 18 of 58 Mesenchymal samples, P=1.89E−4), which is known to control the stability of dendritic spines and synaptic contacts. This suggests that EMT in GBM is also accompanied by low CTNNA2 activity and the resulting instability of neuronal cell-cell structures. These gene sets are also related to microtubule cytoskeleton organization and biogenesis (P=1E−5). In addition, we observed down-regulation of tubulin-related genes (TUBB8 in 30 of 58 Mesenchymal samples, P=2.17E−5; TUBGCP6 in 23 of 58 Mesenchymal samples, P=2.95E−3) and of NEK1 (25 of 58 Mesenchymal samples, P=4E−3) and NEK11 (16 of 58 Mesenchymal samples, P=0.263), which are involved in proliferation. This suggests the hypothesis that proliferative activity may be low in Mesenchymal cells, given their highly migratory behavior.
A brief summary and a heat map of gene expression in this region are shown in Figure 7(A). These findings suggest active EMT, abnormalities in maintaining cell structures, and low proliferative activity, which fit the characteristics of the Mesenchymal subtype well.
The Neural-specific region (E) in Figure 6 may represent cell cycle progression driven by p27 phosphorylation. The over-expressed G4 includes p27 (12 of 33 Neural samples, P=0.159) and RBX1 (16 of 33 Neural samples, P=5.51E−5). p27 can block cell cycle progression; however, an F-box protein binds phosphorylated p27 with the involvement of RBX1, causing p27 degradation and cell cycle progression. Consistent with this, the three under-expressed contextual gene sets were related to dephosphorylation (P=2.45E−3), which would favor keeping p27 in its phosphorylated form. A summary and a heat map of this region are also shown in Figure 7(B).
The Proneural-specific region (F) in Figure 6 includes over-expressed contextual gene sets that are related to cell cycle, and under-expressed contextual gene sets related to homeostatic processes. The Proneural-specific region (G) also shows active cell cycle processes, and degraded immune responses.
Other genotype/phenotype-specific regions
In addition to the four GBM subtypes, other condition-specific interactions were also identified. Among genetic mutations, only EGFR mutation had associated interactions, in regions (H) and (I). The condition of age < 40 is associated with regions (K), (L) and (M). An interesting result is region (J), which is associated with MGMT methylation (Figure 7(C)). In region (J), the two over-expressed contextual gene sets connected by an MGMT methylation-specific interaction were related to the cell cycle, especially DNA replication and checkpoints (DNA-dependent DNA replication, P=7.84E−10; mitotic cell cycle, P=3.47E−9; mitotic M-M/G1 phases, P=1.59E−7). This association of MGMT methylation with DNA replication and cell cycle checkpoints is plausible, considering that MGMT is involved in repairing damaged DNA during replication. MGMT expression may have been disrupted by methylation, eventually leading to the loss of its function at the cell cycle checkpoint. The review by Casorelli et al. also covers the role of the MGMT repair protein in cancer.
High heterogeneity in cancer has been evident since early studies, and approaches to reveal the heterogeneity embedded in genomic profiling data are showing promising results. In this work, we used a novel method that measures the effect of expression samples from certain conditions on the components of contextual gene set interaction networks in order to identify condition-specificity. In addition to the simulation experiment, two cancer data sets were analyzed to support the validity of identifying condition-specificity: the refractory cancer data with 32 tissue types and the TCGA GBM data with four subtypes. Contextual gene set interaction networks were built with tissue/phenotype/genotype-specificities. The resulting networks showed different interaction patterns across conditions, and they provided new hypotheses as well as results consistent with previous studies. Bayesian network learning was used in this work to estimate the likelihood of dependency more accurately, but simpler measures such as correlation or mutual information can also be used in the same formulation. A more thorough analysis of condition-specific interactions and related contextual gene sets will follow this study to further investigate the biological mechanisms of condition specificity in cancer. We are also developing methods to classify new patient samples based on the identified contextual features. Specifically, the results from the TCGA GBM data in this study are being used to subtype additional GBM patient samples from TCGA as well as GBM xenograft models with drug response information. The results of such validation using independent GBM samples will be discussed in future studies.
The refractory cancer and TCGA-GBM gene expression data
For the refractory cancer data, gene expression data of 21,073 probes (from the Agilent-011521 Human 1A Microarray G4110A) and 113 patient samples (32 different types of refractory cancer) were used in this study. Patient consent was obtained as described previously. The patients were aged 27 to 75 years, and no juvenile patients were included. For each tumor type, its (normal) tissue of origin was used as a baseline, the ratio of the tumor to its tissue of origin was computed using a statistical model, and the ratio was quantized to three discrete values, UP, DOWN and NOCHANGE, using a two-fold change as the threshold. The two-fold threshold was chosen so that the quantized values represent changes large enough to be reliably reproducible. The distribution of the 113 samples among the tumor types is listed in Additional file 1: Table S1.
As for the GBM study, we downloaded the GBM gene expression data from The Cancer Genome Atlas (http://cancergenome.nih.gov/). Of these, 202 samples with four known subtypes (54 Classical, 58 Mesenchymal, 33 Neural and 57 Proneural) were used in this study.
Since we used only gene-level summary values for the TCGA-GBM data, we used a heuristic approach to discretize the data for our analysis. The expression values of 17,814 genes in the GBM samples were converted to z-scores using 10 normal samples as a reference. The standardized expression values were quantized to three levels, UP, DOWN and NOCHANGE, using one standard deviation as the threshold. A higher threshold resulted in too many NOCHANGE values, which made the data less informative for the analysis. Detailed sample information is available in Additional file 3.
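The two-fold quantization step can be illustrated with a minimal Python sketch. It assumes the tumor-to-normal ratios have already been estimated (the statistical model used to compute them is not reproduced here) and that a ratio exactly at the threshold counts as changed; the function name `quantize_ratio` is hypothetical.

```python
def quantize_ratio(ratio: float, fold: float = 2.0) -> str:
    """Map a tumor/normal expression ratio to 'UP', 'DOWN' or 'NOCHANGE'.

    Assumption: a ratio of exactly `fold` (or 1/`fold`) counts as changed.
    """
    if ratio <= 0:
        raise ValueError("expression ratios must be positive")
    if ratio >= fold:
        return "UP"
    if ratio <= 1.0 / fold:
        return "DOWN"
    return "NOCHANGE"


ratios = [3.1, 1.4, 0.5, 0.8, 2.0]
print([quantize_ratio(r) for r in ratios])
# ['UP', 'NOCHANGE', 'DOWN', 'NOCHANGE', 'UP']
```

Using a symmetric threshold on the ratio (2 and 1/2) treats a two-fold increase and a two-fold decrease alike, which is equivalent to thresholding the log2 ratio at plus or minus 1.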
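The z-score discretization for the TCGA-GBM data can be sketched for a single gene as follows. This is an illustration under assumptions: the reference mean and standard deviation are taken from the normal samples of that gene, the sample standard deviation (`statistics.stdev`) is used (the text does not say which estimator was applied), and values exactly at one standard deviation are treated as NOCHANGE. The name `zscore_quantize` and the example values are hypothetical.

```python
import statistics
from typing import List


def zscore_quantize(tumor: List[float], normal: List[float], k: float = 1.0) -> List[str]:
    """Discretize one gene's tumor values against its normal reference samples.

    Each value is converted to a z-score using the mean and sample standard
    deviation of the normal reference, then mapped to UP (z > k),
    DOWN (z < -k) or NOCHANGE.
    """
    mu = statistics.mean(normal)
    sd = statistics.stdev(normal)  # assumption: sample SD of the reference
    if sd == 0:
        raise ValueError("reference samples have zero variance")
    levels = []
    for x in tumor:
        z = (x - mu) / sd
        if z > k:
            levels.append("UP")
        elif z < -k:
            levels.append("DOWN")
        else:
            levels.append("NOCHANGE")
    return levels


# Ten hypothetical normal reference values for a single gene.
normal_ref = [5.0, 5.2, 4.8, 5.1, 4.9, 5.0, 5.3, 4.7, 5.0, 5.0]
print(zscore_quantize([5.5, 4.0, 5.1], normal_ref))  # ['UP', 'DOWN', 'NOCHANGE']
```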
Identifying contextual gene sets
We define a contextual gene set as a set of genes that show a consistent expression pattern under a biological context. This is based on the assumption that once a biological context reaches a steady state, the genes involved in the process show consistent patterns under that context. This requires identifying subsets of samples that represent biological contexts. We use the context-mining algorithm [10, 12] to find such subsets as contextual conditions, where a contextual condition is a subset of samples sharing groups of closely related coherent expression patterns (called context-motifs in the algorithm). In the context-mining process, two consistency statistics, conditioning (δ) and crosstalk (η), are used to find context-motifs, and a permutation test is applied to check their statistical significance. In our study, δ=0.1, η=0.3 and a significance level of P<0.05 (Benjamini-Hochberg corrected) were used. Using a graphical representation of the context-motifs, the Markov cluster (MCL) algorithm is used to cluster closely related context-motifs into biological contexts. The inflation parameter was set to 2 in our study, as suggested by the developer of MCL. A representative set of samples from each identified context is determined as a contextual condition using the sample association score (SAS), and we used SAS < 0.5 in our study. For each contextual condition, the consistency statistics δ and η were used again to find contextual gene sets consisting of genes with consistent over-expression or under-expression, with the same δ=0.1, η=0.3 and P<0.05.
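Two generic building blocks named in this step, Benjamini-Hochberg correction and Markov clustering with inflation 2, can be illustrated with a small pure-Python sketch. It does not implement the context-mining algorithm or its δ and η statistics, and the MCL function is a toy version (dense matrices, simple convergence check) rather than the reference `mcl` program; all function names are hypothetical.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity (capped at 1).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def _normalize_columns(mat: List[List[float]]) -> List[List[float]]:
    n = len(mat)
    out = [row[:] for row in mat]
    for j in range(n):
        total = sum(mat[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                out[i][j] = mat[i][j] / total
    return out


def _matmul(a: List[List[float]], b: List[List[float]]) -> List[List[float]]:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adj: List[List[float]], inflation: float = 2.0,
        max_iter: int = 100, tol: float = 1e-9) -> List[List[int]]:
    """Toy Markov clustering of a symmetric, non-negative adjacency matrix."""
    n = len(adj)
    # Add self-loops, then turn the graph into a column-stochastic matrix.
    mat = _normalize_columns(
        [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        expanded = _matmul(mat, mat)  # expansion step
        inflated = _normalize_columns(
            [[v ** inflation for v in row] for row in expanded])  # inflation step
        delta = max(abs(inflated[i][j] - mat[i][j])
                    for i in range(n) for j in range(n))
        mat = inflated
        if delta < tol:
            break
    # Rows that keep non-zero mass are attractors; their non-zero columns form a cluster.
    clusters: List[List[int]] = []
    for i in range(n):
        members = [j for j in range(n) if mat[i][j] > 1e-6]
        if members and members not in clusters:
            clusters.append(members)
    return clusters


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
two_triangles = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(mcl(two_triangles))  # [[0, 1, 2], [3, 4, 5]]
```

Higher inflation values sharpen column distributions faster and tend to produce more, smaller clusters; the value of 2 used in the study is the commonly recommended default.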
Summarizing gene sets
To infer networks of contextual gene sets, each contextual gene set must be represented as a single variable, because most network models treat each node as a single random variable. For each sample $s_k$, the expression values of the $m$ genes in a contextual gene set $G_i$ are summarized into a single representative value $G_{ik}$: $G_{ik}$ is UP if more than $r$% of the genes in $G_i$ are over-expressed and this over-representation is statistically significant (hypergeometric P below a given threshold), and DOWN is assigned analogously for under-expression. Otherwise, $G_{ik}$ is given the value NOCHANGE. In this study, $r$=50% and a P threshold of 0.05 were used.
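The summarization rule can be sketched as below. The text does not state the background of the hypergeometric test, so this sketch assumes it is all genes in the same sample, with the number of UP (or DOWN) genes in that sample as the success count. The helper names and the toy data are hypothetical; only r=50% and P<0.05 come from the text.

```python
from math import comb


def hypergeom_upper_tail(x, N, K, n):
    """P(X >= x) for X ~ Hypergeometric(population N, K successes, n draws)."""
    upper = min(K, n)
    if x > upper:
        return 0.0
    # Sum integer counts first, then divide once (math.comb returns 0 when k > n).
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(x, 0), upper + 1))
    return numerator / comb(N, n)


def summarize_gene_set(sample_levels, gene_set, r=0.5, p_threshold=0.05):
    """Summarize one (non-empty) contextual gene set in one sample as UP, DOWN or NOCHANGE.

    sample_levels maps every gene in the sample to "UP", "DOWN" or "NOCHANGE".
    Assumption: the hypergeometric background is all genes in this sample.
    """
    N = len(sample_levels)
    m = len(gene_set)
    for level in ("UP", "DOWN"):
        K = sum(1 for v in sample_levels.values() if v == level)
        x = sum(1 for g in gene_set if sample_levels.get(g) == level)
        if x / m > r and hypergeom_upper_tail(x, N, K, m) < p_threshold:
            return level
    return "NOCHANGE"


# Hypothetical sample: 100 genes, g0-g9 UP, g90-g99 DOWN, the rest NOCHANGE.
levels = {f"g{i}": "UP" if i < 10 else "DOWN" if i >= 90 else "NOCHANGE" for i in range(100)}
set_a = {"g0", "g1", "g2", "g3"}     # 4/4 genes UP
set_b = {"g0", "g20", "g21", "g22"}  # only 1/4 genes UP
set_c = {"g90", "g91", "g92", "g50"} # 3/4 genes DOWN
print([summarize_gene_set(levels, s) for s in (set_a, set_b, set_c)])  # ['UP', 'NOCHANGE', 'DOWN']
```

Requiring both the fraction rule and the hypergeometric test guards against calling a small gene set UP simply because many genes in that sample are UP.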
Learning contextual gene set interaction networks
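The dependency $d_{ij}$ between two contextual gene sets $G_i$ and $G_j$ is the fraction of $R$ runs of BANJO in which they are connected:

$$d_{ij} = \frac{1}{R}\sum_{k=1}^{R} F(BN_k, G_i \leftrightarrow G_j),$$

where $BN_k$ is the Bayesian network structure from the $k$-th run of BANJO, and $F$ is a function that returns 1 if $BN_k$ contains $G_i \leftrightarrow G_j$ and 0 otherwise. The direction of connections in the Bayesian networks was ignored, because either direction of a connection represents the same dependency between two contextual gene sets. If $d_{ij}$ is larger than a given threshold $d_\theta$, we declare that a dependency exists between $G_i$ and $G_j$. In our study, we used $R$=1,024 and $d_\theta$=0.5.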
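Given the edge lists of the $R$ learned structures, the dependency score is a simple edge frequency. The sketch below assumes each BANJO run has already been parsed into a list of directed `(parent, child)` pairs; that input format and the function names are hypothetical, while the direction-agnostic counting and the strict comparison with $d_\theta$=0.5 follow the text.

```python
from collections import Counter


def dependency_scores(networks):
    """Fraction of networks in which each unordered pair of nodes is connected."""
    R = len(networks)
    counts = Counter()
    for edges in networks:
        # Ignore direction: A->B and B->A count as the same undirected pair,
        # and a pair is counted at most once per network.
        pairs = {tuple(sorted((a, b))) for a, b in edges if a != b}
        counts.update(pairs)
    return {pair: c / R for pair, c in counts.items()}


def dependent_pairs(networks, d_theta=0.5):
    """Pairs whose dependency score is strictly larger than d_theta."""
    return sorted(p for p, d in dependency_scores(networks).items() if d > d_theta)


# Hypothetical output of R = 4 structure-learning runs over gene sets A, B and C.
runs = [
    [("A", "B"), ("B", "C")],
    [("B", "A")],
    [("A", "C")],
    [("A", "B")],
]
print(dependency_scores(runs)[("A", "B")])  # 0.75
print(dependent_pairs(runs))                # [('A', 'B')]
```

With R=1,024 runs as in the text, the same counting applies unchanged.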
Identifying condition-specific network components
where . One characteristic of γ is that is proportional to , which is a conventional measure of the relevance of $G_i \leftrightarrow G_j$ to the type $T_k$.
Theorem 1. Once is given, is proportional to .
The benefit of this property is that γ can be used in place of the conventional measure , especially when the direct measurement of can be unreliable because of the limited number of samples for each condition, as is the case in many biological applications.
To measure the statistical significance of , a permutation test is performed using instead of , which is built by randomly selecting samples from $S_U$. If H out of M permutations give γ greater than or equal to , the statistical significance P of is H/M. When is larger than a given threshold $\gamma_\theta$ and its P is lower than a threshold, $G_i \leftrightarrow G_j$ is declared to be specific to the condition $T_k$. In our study, we limited M to 100 for each significance test of γ, because each permutation requires repeating Bayesian network learning many times. We used $\gamma_\theta$=2, which indicates a two-fold or higher increase in the dependency relationship when the samples of condition $T_k$ are added, and P=0.05 as threshold values.
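The H/M rule and the final decision can be sketched as follows. This is only an outline under stated assumptions: `gamma_for_subset` is a hypothetical callable standing in for recomputing γ with a random sample subset (in the study this involves re-running Bayesian network learning), the random subset is assumed to have the same size as the condition's sample set, and `>=` against $\gamma_\theta$ reflects the "two-fold or higher" reading. The toy statistic in the usage lines is invented purely for illustration.

```python
import random


def permutation_p_value(gamma_observed, gamma_for_subset, all_samples, subset_size, M=100, seed=0):
    """Return H/M, where H counts permutations whose gamma >= the observed gamma."""
    rng = random.Random(seed)
    pool = list(all_samples)
    H = 0
    for _ in range(M):
        # Assumption: the random subset has the same size as the condition's samples.
        subset = rng.sample(pool, subset_size)
        if gamma_for_subset(subset) >= gamma_observed:
            H += 1
    return H / M


def is_condition_specific(gamma_observed, p_value, gamma_theta=2.0, alpha=0.05):
    """Two-fold or higher increase in dependency and a significant permutation P."""
    return gamma_observed >= gamma_theta and p_value < alpha


# Hypothetical stand-in statistic: the real gamma would come from network learning.
def toy_gamma(subset):
    return 1.0 + sum(subset) / 100


p = permutation_p_value(10.0, toy_gamma, range(50), subset_size=10, M=100)
print(p, is_condition_specific(10.0, p))  # 0.0 True
```

With M=100, the smallest non-zero P is 0.01, so a threshold of 0.05 requires that fewer than 5 permutations match or exceed the observed γ.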
Annotation of gene sets
GATHER was used to identify GO terms that are significantly associated with each contextual gene set, considering terms from the molecular function and biological process categories. P=0.01 was used as the significance threshold. To annotate specific regions of a contextual gene set interaction network, we computed the overlap of the genes (all, over-expressed, or under-expressed) in a region with the pathway gene sets in MSigDB and evaluated its statistical significance with a hypergeometric P. After false discovery rate (FDR) correction of the P values using the Benjamini-Hochberg method, an FDR-corrected P of 0.01 was used as the significance threshold.
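The pathway-overlap test and the Benjamini-Hochberg correction can be sketched together. The gene universe size, pathway sizes and overlaps below are hypothetical placeholders rather than MSigDB data, and the function names are illustrative; the hypergeometric upper tail and the standard step-up BH adjustment are what the text describes, with the 0.01 cut-off taken from it.

```python
from math import comb


def overlap_p_value(overlap, universe_size, pathway_size, region_size):
    """Hypergeometric P(X >= overlap) for genes shared by a network region and a pathway."""
    upper = min(pathway_size, region_size)
    if overlap > upper:
        return 0.0
    numerator = sum(
        comb(pathway_size, i) * comb(universe_size - pathway_size, region_size - i)
        for i in range(max(overlap, 0), upper + 1)
    )
    return numerator / comb(universe_size, region_size)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg FDR-adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P to the smallest, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical region of 40 genes tested against three pathways in a 10,000-gene universe:
# (overlap, pathway size) pairs.
tests = [(8, 100), (2, 200), (5, 50)]
raw = [overlap_p_value(x, 10_000, k, 40) for x, k in tests]
fdr = benjamini_hochberg(raw)
significant = [i for i, q in enumerate(fdr) if q < 0.01]
print(significant)
```

The step-up pass from the largest P downward is what keeps a smaller raw P from ever receiving a larger adjusted value than a bigger raw P.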
Chronic myelogenous leukemia
The Cancer Genome Atlas
Iterative Signature Biclustering Algorithm
nicotinamide adenine dinucleotide
positive in positron emission tomography
sample association score
This study was funded in part by the National Cancer Institute, National Institutes of Health, under Contract No. 29XS195 and 1U01CA168397-01 (SJ, JK, M Berens and SK). The content of this paper does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does the mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government. The results published here are in whole or part based upon data generated by The Cancer Genome Atlas pilot project established by the NCI and NHGRI. Information about TCGA and the investigators and institutions who constitute the TCGA research network can be found at http://cancergenome.nih.gov/.
- Perou CM, Sorlie T, Eisen MB, van de Rijn M, Jeffrey SS, Rees CA, Pollack JR, Ross DT, Johnsen H, Akslen LA, Fluge O, Pergamenschikov A, Williams C, Zhu SX, Lonning PE, Borresen-Dale AL, Brown PO, Botstein D: Molecular portraits of human breast tumours. Nature. 2000, 406 (6797): 747-752. 10.1038/35021093.
- Nowicki MO, Pawlowski P, Fischer T, Hess G, Pawlowski T, Skorski T: Chronic myelogenous leukemia molecular signature. Oncogene. 2003, 22 (25): 3952-3963. 10.1038/sj.onc.1206620.
- Gasch A, Eisen M: Exploring the conditional coregulation of yeast gene expression through fuzzy k-means clustering. Genome Biol. 2002, 3 (11): Research0059.1-0059.22. 10.1186/gb-2002-3-11-research0059.
- Segal E, Friedman N, Koller D, Regev A: A module map showing conditional activity of expression modules in cancer. Nat Genet. 2004, 36 (10): 1090-1098. 10.1038/ng1434.
- Luscombe N, Madan Babu, Yu H, Snyder M, Teichmann S, Gerstein M: Genomic analysis of regulatory network dynamics reveals large topological changes. Nature. 2004, 431 (7006): 308-312. 10.1038/nature02782.
- Shlomi T, Cabili M, Herrgard M, Palsson B, Ruppin E: Network-based prediction of human tissue-specific metabolism. Nat Biotechnol. 2008, 26 (9): 1003-1010. 10.1038/nbt.1487.
- Bossi A, Lehner B: Tissue specificity and the human protein interaction network. Mol Syst Biol. 2009, 5: 260.
- Shiraishi T, Matsuyama S, Kitano H: Large-scale analysis of network bistability for human cancers. PLoS Comput Biol. 2010, 6 (7): e1000851. 10.1371/journal.pcbi.1000851. [http://dx.doi.org/10.1371/journal.pcbi.1000851]
- Grzegorczyk M, Husmeier D, Edwards K, Ghazal P, Millar A: Modelling non-stationary gene regulatory processes with a non-homogeneous Bayesian network and the allocation sampler. Bioinformatics. 2008, 24 (18): 2071-2078. 10.1093/bioinformatics/btn367.
- Kim S, Sen I: Mining Molecular Contexts of Cancer via in-silico Conditioning. Proceedings of Sixth International Conference on Comput Syst Bioinf. 2007, San Diego, CA, 169-179.
- Dougherty E, Brun M, Trent J, Bittner M: Conditioning-Based Modeling of Contextual Genomic Regulation. IEEE/ACM Trans Comput Biol Bioinf. 2009, 6 (2): 310-320.
- Ramesh A, Trevino R, Von Hoff D, Kim S: Clustering Context-Specific Gene Regulatory Networks. Proceedings of Pac Symp Biocomput, Volume 15. 2010, Fairmont Orchid, HI, 444-455.
- Tamada Y, Kim S, Bannai H, Imoto S, Tashiro K, Kuhara S, Miyano S: Estimating gene networks from gene expression data by combining Bayesian network model with promoter element detection. Bioinformatics. 2003, 19 (suppl 2): ii227-ii236. 10.1093/bioinformatics/btg1082. [http://bioinformatics.oxfordjournals.org/content/19/suppl_2/ii227.abstract]
- Friedman N, Linial M, Nachman I, Pe’er D: Using Bayesian Networks to Analyze Expression Data. J Comput Biol. 2000, 7: 601-620. 10.1089/106652700750050961.
- Vignes M, Vandel J, Allouche D, Ramadan-Alban N, Cierco-Ayrolles C, Schiex T, Mangin B, de Givry S: Gene Regulatory Network Reconstruction Using Bayesian Networks, the Dantzig Selector, the Lasso and Their Meta-Analysis. PLoS ONE. 2011, 6 (12): e29165. [http://dx.doi.org/10.1371/journal.pone.0029165]
- Von Hoff DD, Stephenson JJ, Rosen P, Loesch DM, Borad MJ, Anthony S, Jameson G, Brown S, Cantafio N, Richards DA, Fitch TR, Wasserman E, Fernandez C, Green S, Sutherland W, Bittner M, Alarcon A, Mallery D, Penny R: Pilot Study Using Molecular Profiling of Patients’ Tumors to Find Potential Targets and Select Treatments for Their Refractory Cancers. J Clin Oncol. 2010, 28 (33): 4877-4883. 10.1200/JCO.2009.26.5983. [http://jco.ascopubs.org/content/28/33/4877.abstract]
- Kervizic G, Corcos L: Dynamical modeling of the cholesterol regulatory pathway with Boolean networks. BMC Syst Biol. 2008, 2: 99. 10.1186/1752-0509-2-99. [http://dx.doi.org/10.1186/1752-0509-2-99]
- Ihmels J, Friedlander G, Bergmann S, Sarig O, Ziv Y, Barkai N: Revealing modular organization in the yeast transcriptional network. Nat Genet. 2002, 31 (4): 370-377.
- Ashburner M, Ball C, Blake J, Botstein D, Butler H, Cherry J, Davis A, Dolinski K, Dwight S, Eppig J, Harris M, Hill D, Issel-Tarver L, Kasarskis A, Lewis S, Matese J, Richardson J, Ringwald M, Rubin G, Sherlock G: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000, 25: 25-29. 10.1038/75556.
- Chang J, Nevins J: GATHER: a systems approach to interpreting genomic signatures. Bioinformatics. 2006, 22 (23): 2926-2933. 10.1093/bioinformatics/btl483.
- Belenky P, Bogan K, Brenner C: NAD+ metabolism in health and disease. Trends Biochem Sci. 2007, 32: 12-19. 10.1016/j.tibs.2006.11.006.
- Chen D, Guo J, Gahl W: RAB GTPases expressed in human melanoma cells. Biochimica et Biophysica Acta (BBA)/Molecular Cell Research. 1997, 1355: 1-6. 10.1016/S0167-4889(96)00169-3.
- Shembade N, Parvatiyar K, Harhaj NS, Harhaj EW: The ubiquitin-editing enzyme A20 requires RNF11 to downregulate NF-[kappa]B signalling. EMBO J. 2009, 28 (5): 513-522. 10.1038/emboj.2008.285.
- DeWire SM, Ahn S, Lefkowitz RJ, Shenoy SK: Beta-Arrestins and Cell Signaling. Annu Rev Physiol. 2007, 69: 483-510. 10.1146/annurev.physiol.69.022405.154749. [http://dx.doi.org/10.1146/annurev.physiol.69.022405.154749]
- Mashima T, Naito M, Tsuruo T: Caspase-mediated cleavage of cytoskeletal actin plays a positive role in the process of morphological apoptosis. Oncogene. 1999, 18: 2423-2430. 10.1038/sj.onc.1202558.
- Verhaak RGW, Hoadley KA, Purdom E, Wang V, Qi Y, Wilkerson MD, Miller CR, Ding L, Golub T, Mesirov JP, Alexe G, Lawrence M, O’Kelly M, Tamayo P, Weir BA, Gabriel S, Winckler W, Gupta S, Jakkula L, Feiler HS, Hodgson JG, James CD, Sarkaria JN, Brennan C, Kahn A, Spellman PT, Wilson RK, Speed TP, Gray JW, Meyerson M, Getz G, Perou CM, Hayes DN: Integrated Genomic Analysis Identifies Clinically Relevant Subtypes of Glioblastoma Characterized by Abnormalities in PDGFRA, IDH1, EGFR, and NF1. Cancer Cell. 2010, 17: 98-110. 10.1016/j.ccr.2009.12.020. [http://linkinghub.elsevier.com/retrieve/pii/S1535610809004322]
- Subramanian A, Tamayo P, Mootha V, Mukherjee S, Ebert B, Gillette M, Paulovich A, Pomeroy S, Golub T, Lander E, Mesirov J: Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA. 2005, 102 (43): 15545-15550. 10.1073/pnas.0506580102.
- Magrassi L, Adomi L, Montorfano G, Rapelli S, Butti G, Berra B, Milanesi G: Vitamin D metabolites activate the sphingomyelin pathway and induce death of glioblastoma cells. Acta Neurochir. 1998, 140 (7): 707-713. 10.1007/s007010050166.
- Besnard A, Galan-Rodriguez B, Vanhoutte P, Caboche J: Elk-1 a transcription factor with multiple facets in the brain. Front Neurosci. 2011, 5: 35. [http://dx.doi.org/10.3389/fnins.2011.00035]
- Booy E, Henson E, Gibson S: Epidermal growth factor regulates Mcl-1 expression through the MAPK-Elk-1 signalling pathway contributing to cell survival in breast cancer. Oncogene. 2011, 30: 2367-2378. 10.1038/onc.2010.616.
- Kabat GC, Etgen AM, Rohan TE: Do Steroid Hormones Play a Role in the Etiology of Glioma? Cancer Epidemiol, Biomarkers & Prev. 2010, 19: 2421-2427. 10.1158/1055-9965.EPI-10-0658.
- Merritt RL, Foran CM: Influence of Persistent Contaminants and Steroid Hormones on Glioblastoma Cell Growth. J Toxicol Environ Health, Part A. 2006, 70: 19-27. 10.1080/15287390600748807. [http://dx.doi.org/10.1080/15287390600748807]
- Badeaux AM: The membrane androgen receptor as a therapeutic target for glioblastoma. PhD thesis, University of North Texas Health Science Center at Fort Worth 2012
- Gatson JW, Singh M: Activation of a Membrane-Associated Androgen Receptor Promotes Cell Death in Primary Cortical Astrocytes. Endocrinology. 2007, 148 (5): 2458-2464. 10.1210/en.2006-1443. [http://endo.endojournals.org/content/148/5/2458.abstract]
- Perini G, Diolaiti D, Porro A, Della Valle: In vivo transcriptional regulation of N-Myc target genes is controlled by E-box methylation. Proc Natl Acad Sci USA. 2005, 102 (34): 12117-12122. 10.1073/pnas.0409097102. [http://www.pnas.org/content/102/34/12117.abstract]
- Howe A, Aplin AE, Alahari SK, Juliano R: Integrin signaling and cell growth control. Curr Opin Cell Biol. 1998, 10 (2): 220-231. 10.1016/S0955-0674(98)80144-0. [http://www.sciencedirect.com/science/article/pii/S0955067498801440]
- Abe K, Chisaka O, van Roy F, Takeichi M: Stability of dendritic spines and synaptic contacts is controlled by [alpha]N-catenin. Nat Neurosci. 2004, 7 (4): 357-363. 10.1038/nn1212. [http://dx.doi.org/10.1038/nn1212]
- Sherr CJ, Roberts JM: CDK inhibitors: positive and negative regulators of G1-phase progression. Genes & Dev. 1999, 13 (12): 1501-1512. 10.1101/gad.13.12.1501. [http://genesdev.cshlp.org/content/13/12/1501.short]
- Nakayama KI, Nakayama K: Ubiquitin ligases: cell-cycle control and cancer. Nat Rev Cancer. 2006, 6 (5): 369-381. 10.1038/nrc1881. [http://dx.doi.org/10.1038/nrc1881]
- Casorelli I, Russo TM, Bignami M: Role of Mismatch Repair and MGMT in Response to Anticancer Therapies. Anti-Cancer Agents in Medicinal Chemistry. 2008, 8: 368-380. 10.2174/187152008784220276.
- Chen Y, Kamat VG, Dougherty ER, Bittner ML, Meltzer PS, Trent JM: Ratio statistics of gene expression levels and applications to microarray data analysis. Bioinformatics. 2002, 18 (9): 1207-1215. 10.1093/bioinformatics/18.9.1207.
- van Dongen S: Graph Clustering by Flow Simulation. PhD thesis, University of Utrecht 2000
- Sen I, Verdicchio M, Jung S, Trevino R, Bittner M, Kim S: Context-Specific Gene Regulations in Cancer Gene Expression Data. Proc Pac Symp Biocomput. 2009, Fairmont Orchid, HI, 75-86.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.