Learning contextual gene set interaction networks of cancer with condition specificity

Background: Identifying similarities and differences in the molecular makeup of various types of cancer is one of the key challenges in cancer research. The manifestation of a cancer depends on complex molecular interactions, including gene regulatory networks and gene-environment interactions, and this complexity makes it difficult to decipher the molecular origin of the disease. In recent years, many studies have reported methods to uncover the heterogeneous presentations of complex cancers, which are often categorized into different subtypes. The challenge is to identify the diverse molecular contexts within a cancer, relate them to different subtypes, and learn the underlying molecular interactions specific to each context, so that context-specific treatments can be recommended to patients.

Results: In this study, we describe a novel method to discern molecular interactions specific to certain molecular contexts. Unlike conventional approaches that build modular networks of individual genes, we focus on identifying cancer-generic and subtype-specific interactions between contextual gene sets, where a contextual gene set is a set of genes that share coherent transcriptional patterns across a subset of samples. We then apply a novel formulation to quantify the effect of the samples from each subtype on the calculated strength of the observed interactions. Two cancer data sets were analyzed to support the validity of the condition specificity of the identified interactions. Compared with an existing approach, the proposed method was much more sensitive in identifying condition-specific interactions, even in a heterogeneous data set. The results also revealed that network components specific to different types of cancer are related to different biological functions than cancer-generic network components are. We found not only results that are consistent with previous studies but also new hypotheses on biological mechanisms specific to certain cancer types that warrant further investigation.

Conclusions: The analysis of contextual gene sets and the characterization of the interaction networks composed of these sets revealed distinct functional differences underlying various types of cancer. The results show that our method successfully reveals many subtype-specific regions in the identified maps of biological contexts, which represent biological functions that can be connected to specific subtypes.


Supplementary Information I: Synthetic data generation from the Boolean network model of the cholesterol pathway
We assume that only the steady states of the pathway can be observed for data sampling. The proper way to sample steady states would be to run the Boolean network from every possible initial state until an attractor is reached and to derive the distribution of steady states from the statistics of the attractors. However, the Boolean network model of the cholesterol regulatory pathway has 33 variables, giving a total of $2^{33} \approx 8.59 \times 10^{9}$ possible states, so it is infeasible to trace the dynamics from all initial states. For this reason, we use an empirical approach to decide the sampling probability of each state. Suppose that a state $s$ belongs to an attractor $A$, where an attractor is a subset of states from which the Boolean network cannot move to any state outside the attractor. The probability of observing $s$ and $A$ is

$$P(s, A) = P(A)\,P(s \mid A).$$

Because the attractors of a deterministic Boolean network are disjoint, a steady state $s$ belongs to exactly one attractor $A$, and the probability of observing $s$ becomes

$$P(s) = P(s, A) = P(A)\,P(s \mid A).$$

This can be interpreted as observing $s$ by first observing $A$ and then observing $s$ among the states in $A$.
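To make the factorization concrete, the sketch below draws states in two stages (first an attractor, then a state within it) and compares the empirical frequencies with P(A) · P(s | A). The attractor names, their probabilities, and the conditional distributions are hypothetical toy values, not taken from the cholesterol model.

```python
import random
from collections import Counter

# Hypothetical toy distribution: two disjoint attractors with P(A) and P(s | A).
P_A = {"A1": 0.7, "A2": 0.3}
P_s_given_A = {
    "A1": {"s1": 0.5, "s2": 0.5},
    "A2": {"s3": 0.2, "s4": 0.8},
}

def exact_p_s():
    # Each state belongs to exactly one attractor, so P(s) = P(A) * P(s | A).
    return {s: P_A[A] * p for A, cond in P_s_given_A.items() for s, p in cond.items()}

def two_stage_sample(rng):
    # First observe an attractor A, then observe a state s among the states of A.
    A = rng.choices(list(P_A), weights=list(P_A.values()))[0]
    cond = P_s_given_A[A]
    return rng.choices(list(cond), weights=list(cond.values()))[0]

rng = random.Random(1)
n = 100_000
counts = Counter(two_stage_sample(rng) for _ in range(n))
for s, p in sorted(exact_p_s().items()):
    # exact probability vs. empirical frequency from the two-stage sampler
    print(s, round(p, 3), round(counts[s] / n, 3))
```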
In this simulation, we consider the following scenario for observing the steady states of the cholesterol regulatory pathway under two conditions, the absence and the presence of statins:
1. Observe a steady state of the pathway without statins (by fixing the state of statins to "0").
2. Provide statins to the pathway as a perturbation (by setting the state of statins to "1").
3. After the pathway becomes stable, observe a steady state of the pathway.

Following this scenario, we obtained L attractor samples for each condition, without statins and with statins. Among the $2^{32}$ possible states in which statins is set to "0", L states are randomly selected as initial states. From each initial state, state transitions proceed according to the Boolean network model until an attractor is reached; because the model is deterministic and its state space is finite, every initial state eventually reaches an attractor. The attractors sampled in this way correspond to the set of steady states without statin perturbation. From each sampled attractor, a steady state is chosen uniformly at random, and the perturbation of setting statins to "1" is applied to the chosen state. State transitions then proceed again until an attractor is reached. The attractors sampled in this case correspond to the set of steady states after statin perturbation. From the sampled attractors, we defined $P(s)$ for a state $s$ in an attractor $A$ as

$$P(s) = \frac{f_A}{L} \cdot \frac{1}{|A|},$$

where $f_A$ is the frequency of $A$ (the number of initial states that reached $A$) and $|A|$ is its size (the number of states in $A$).

Table S2. The statistics of sampled attractors. Sampled attractors for each condition, without statins and with statins perturbation. Size represents the number of states in an attractor. Frequency indicates the number of initial states that arrived at that attractor.
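The sampling procedure described above can be sketched on a small network. This sketch assumes synchronous updates and uses a hypothetical four-variable toy network (statins, a, b, c) with made-up rules, not the 33-variable cholesterol model. Statins is treated as an input node that keeps its value, which plays the role of fixing it. Initial states are drawn uniformly with replacement, attractors are found by iterating until a state repeats, and one state per sampled attractor is chosen uniformly before statins is set to 1.

```python
import random
from collections import Counter

# Hypothetical toy network (NOT the cholesterol model); state = (statins, a, b, c).
def step(state):
    statins, a, b, c = state
    return (
        statins,                   # input node: keeps its value
        int(not c),                # a' = NOT c
        int(a and not statins),    # b' = a AND NOT statins
        b,                         # c' = b
    )

def find_attractor(state):
    """Apply synchronous updates until a state repeats; return the cycle reached."""
    seen, trajectory = {}, []
    while state not in seen:
        seen[state] = len(trajectory)
        trajectory.append(state)
        state = step(state)
    # States from the first visit of the repeated state onward form the attractor.
    return frozenset(trajectory[seen[state]:])

def sample_attractors(L, n_vars=4, seed=0):
    rng = random.Random(seed)
    before, after = Counter(), Counter()  # attractor -> frequency
    for _ in range(L):
        # Random initial state with statins fixed to 0.
        init = (0,) + tuple(rng.randint(0, 1) for _ in range(n_vars - 1))
        A = find_attractor(init)
        before[A] += 1
        # Choose one steady state uniformly from A, then set statins to 1.
        s = rng.choice(sorted(A))
        after[find_attractor((1,) + s[1:])] += 1
    return before, after

before, after = sample_attractors(L=1000)
for A, freq in before.items():
    print("without statins: size", len(A), "frequency", freq)
for A, freq in after.items():
    print("with statins: size", len(A), "frequency", freq)
```

In this toy network, the condition without statins has a 6-state and a 2-state cyclic attractor, whereas setting statins to 1 drives every trajectory to a single fixed point.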
We used $L = 10^{5}$ samples for each condition, without statins and with statins. The statistics of the sampled states and attractors are listed in Table S2. Based on these statistics and $P(s)$, 100 states were sampled from the condition without statins to form the no_statins data set, and 100 states were sampled from the condition with statins perturbation to form the statins data set.
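The final sampling step can be sketched as follows, assuming P(A) is estimated as the fraction of the L initial states that reached A and P(s | A) = 1/|A|, matching the uniform choice used in the perturbation step. The attractor counts in the example are hypothetical, and `state_probabilities` and `sample_states` are illustrative names.

```python
import random

def state_probabilities(attractor_counts, L):
    """P(s) = (frequency of A / L) * (1 / |A|) for every state s in attractor A (assumed)."""
    probs = {}
    for A, freq in attractor_counts.items():
        for s in A:
            probs[s] = (freq / L) * (1.0 / len(A))
    return probs

def sample_states(probs, n=100, seed=0):
    # Draw n steady states with replacement, weighted by P(s).
    rng = random.Random(seed)
    states = sorted(probs)
    return rng.choices(states, weights=[probs[s] for s in states], k=n)

# Hypothetical counts: a 2-state attractor reached 3 times, a fixed point reached once.
counts = {frozenset({(0, 1), (1, 0)}): 3, frozenset({(1, 1)}): 1}
probs = state_probabilities(counts, L=4)
sample = sample_states(probs, n=100)
print(probs)
```

Because the frequencies of all sampled attractors add up to L, the probabilities P(s) sum to 1 over all observed steady states.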

Supplementary Figure S1
The summarized gene set expression data of refractory cancer patients. There are 339 contextual gene sets for 113 cancer patient samples of 32 tissue types. The expression levels of the genes in each contextual gene set were summarized into one of three discrete values: UP (red), DOWN (green), and NOCHANGE (black).