 Research
 Open Access
 Published:
CMRF: analyzing differential gene regulation in two group perturbation experiments
BMC Genomicsvolume 13, Article number: S2 (2012)
Abstract
Background
Microarray experiments often measure expressions of genes taken from sample tissues in the presence of external perturbations such as medication, radiation, or disease. The external perturbation can change the expressions of some genes directly or indirectly through gene interaction network. In this paper, we focus on an important class of such microarray experiments that inherently have two groups of tissue samples. When such different groups exist, the changes in expressions for some of the genes after the perturbation can be different between the two groups. It is not only important to identify the genes that respond differently across the two groups, but also to mine the reason behind this differential response. In this paper, we aim to identify the cause of this differential behavior of genes, whether because of the perturbation or due to interactions with other genes.
Results
We propose a new probabilistic Bayesian method CMRF based on Markov Random Field to identify such genes. CMRF leverages the information about gene interactions as the prior of the model. We compare the accuracy of CMRF with SSEM and Student's t test and our old method SMRF on semisynthetic dataset generated from microarray data. CMRF obtains high accuracy and outperforms all the other three methods. We also conduct a statistical significance test using a parametric noise based experiment to evaluate the accuracy of our method. In this experiment, CMRF generates significant regions of confidence for various parameter settings.
Conclusions
In this paper, we solved the problem of finding primarily differentially regulated genes in the presence of external perturbations when the data is sampled from two groups. The probabilistic Bayesian method CMRF based on Markov Random Field incorporates dependency structure of the gene networks as the prior to the model. Experimental results on synthetic and real datasets demonstrated the superiority of CMRF compared to other simple techniques.
Background
Microarray experiments often measure expressions of genes taken from sample tissues in the presence of external perturbations such as medication, radiation, or disease [1, 2]. Typically in such experiments, gene expressions are measured before and after the application of external perturbation, and are called control data and noncontrol data, respectively. In this paper, we focus on an important class of such microarray experiments that inherently have two groups of tissue samples. Different groups in a microarray measurement can exist in many different ways. For instance, samples can be taken from members of multiple closely related species (e.g. rat versus mouse). Within the same species there can be subgroups with different phenotypes (e.g. African American versus Caucasian American). Another example is when the samples have already been through several alternative external perturbations (e.g. fasting and not fasting). When such different groups exist, it is not only important to observe overall changes in gene expression, but also to observe how different groups respond to the external perturbation. For example, Taylor et al. applied medications on 36 Caucasian American and 33 African American patients infected with Hepatitis C [3]. Gene expressions were collected before and after the medication.
In a perturbation experiment, some of the genes respond by noticeably changing their expression values between the control and noncontrol data. Genes that change their expressions in a statistically significant way are referred to as differentially expressed (DE), while those that do not, are referred to as equally expressed (EE) genes. In the context of two groups, we refer to a gene that has the same state in both the groups, i.e. either DE or EE for both the groups, as equally regulated (ER) gene. On the contrary, if a gene is DE in one group and EE in the other, we denote it as differentially regulated (DR).
Genes for any organism typically interact with each other via regulatory and signaling networks. For simplicity, we will refer to them as gene networks for the rest of this paper. A small portion of an example gene network can be seen in Figure 1.
Once an external perturbation is applied, a gene can be DE in one of the two ways  as a direct effect of the perturbation or via interaction with other DE genes through gene networks. We denote a gene as primarily affected DE, if it is DE due to the external perturbation. Similarly, a gene is secondarily affected DE, if it is DE due to another gene in the gene network. Figure 1 shows the state of the genes in the Pancreatic Cancer pathway after a hypothetical external perturbation is applied. In this figure, genes KRas, Raf and Cob42Roc are primarily affected and MEK, Ral and RalGDS are secondarily affected through interactions.
Recall that for a gene to be DR, it has to be EE in one group and DE in another group. For such a gene, if it happens to be DE in one group because of the external perturbation, we call it as primarily differentially regulated (PDR) gene. When it is DE in one group because of the interaction with other DE genes in the gene networks, we will refer to it by secondarily differentially regulated (SDR) gene. In this paper, we consider the problem of identifying the PDR genes in a given set of control and noncontrol gene expressions from two groups of samples.
Existing methods to identify the primarily affected DE genes using association analysis techniques [4, 5], haploinsufficiency profiling [6–8] and chemicalgenetic interaction mapping [9] are limited to applications where additional information such as fitness based assays of drug response or a library of genetic mutants is available. Bernardo et al. suggested a regression based approach named MNI that assumes that the internal genetic interactions are offset by the external perturbation [10]. It estimates genegene interaction coefficients from the control data and uses them to predict the target genes in the noncontrol data. Cosgrove et. al. proposed a method named SSEM that is similar to MNI [11]. SSEM models the effect of perturbation by an explicit change of gene expression from that of the unperturbed state.
We have also developed a method to detect the primarily and secondarily affected genes in perturbation experiments with a single data group [12]. We will call this method SMRF (single MRF) in the rest of this paper for it applies MRF on single group datasets. In that paper we developed a Bayesian probabilistic method based on Markov Random Field that leverages the information from gene networks as the prior belief of the model.
Though these methods analyze primary and secondary effects of perturbation on gene expressions, they are not directly applicable for multigroup perturbation experiments.
Several recent studies aim to identify DE genes in multiple groups of data points. maSigPro is a two stage regression based method that identifies genes that demonstrate differential gene expression profiles across multiple experimental groups [13]. Hong et al. proposed a functional hierarchical model for detecting temporally differentially expressed genes between two experimental conditions [14]. They modeled gene expressions by basis function expansion and estimate the parameters using a Monte Carlo EM algorithm. Tai et al. ranked DE genes using data from replicated microarray time course experiments, where there are multiple biological conditions [15]. They derived a multisample multivariate empirical Bayes statistic for ranking genes. Angelini et al. proposed a Bayesian method for detecting temporally DE genes between two experimental conditions [16]. Deun et al. developed a Bayesian method to find the genes that are differentially expressed in a single tissue or condition over multiple tissues or conditions [17]. All these methods identify differentially expressed genes in multiple groups. However, none of these methods analyzed the primary and secondary effects in a two group perturbation experiment. In this paper, we develop a method to solve this problem.
Our approach
In this paper, we propose a new probabilistic Bayesian method CMRF to find the PDR genes in two group perturbation experiment dataset as defined above. We call this method CMRF (Comparative MRF) for it applies MRF on two groups of data for comparison purpose. Our Bayesian method incorporates information about relationship from gene networks as prior beliefs. We consider the gene network as a directed graph where each node represents a gene, and a directed edge from gene g_{ i } to gene g_{ j } represents a genetic interaction (e.g g_{ i } activates or inhibits g_{ j }). We define two genes as neighbors of each other if they share a directed edge. For example, in Figure 1, genes KRas and Raf are neighbors as KRaf activates Ras. We also classify a neighbor as incoming or outgoing, if it is at the start or at the end of the directed edge respectively. In Figure 1, Raf is an incoming neighbor of MEK and MEK is an outgoing neighbor of Raf. When the expression level of a gene is altered, it can affect some of its outgoing neighbors. Thus, the gene expression can change due to external perturbation or because of one or more of the affected incoming neighbors.
We represent the external perturbation by a hypothetical gene (i.e. metagene) g_{0} in the gene network. We add an edge from the metagene to all the other genes because the external perturbation has the potential to affect all the other genes. So, g_{0} is an incoming neighbor to all the other genes. We call the resulting network the extended gene network. CMRF estimates the probability that a gene g_{ j } is DR due to an alteration in the activity of gene ${g}_{i}\left(\forall {g}_{i}\in \mathcal{G}\cup \left\{{g}_{0}\right\},\phantom{\rule{2.77695pt}{0ex}}{g}_{j}\in \mathcal{G}\right)$ if there is an edge from g_{ i } to g_{ j } in the extended network. We use a Bayesian model in our solution with the help of Markov Random Field (MRF) [18] to capture the dependency between the genes in the extended gene network. We define feature functions that encapsulate the domain knowledge available from gene networks and gene expression data. CMRF optimizes the joint posterior distribution over the random variables in the MRF using Iterated Conditional Modes (ICM) [19]. The optimization provides the state of the genes, the regulation of the genes and the probabilistic estimate of pairwise interactions between the genes including the metagene. Given this, we can rank the genes according to the data likelihood that a gene is DR because of the metagene g_{0}, and obtain a list of possible PDR genes.
Figure 2 illustrates different components of CMRF and the connectivity between them. Note that, (C) corresponds to the Bayesian prior based on MRF.
We compare the accuracy of CMRF with that of SSEM and Students t test on semisynthetic dataset generated from microarray data in Cosgrove et al [11]. We also compare CMRF with our old method SMRF that we developed to identify the primarily affected DE genes in a single group perturbation data [12]. CMRF obtains high accuracy and outperforms all the other three methods. Also, we conduct a statistical significance test using a parametric noise based experiment to evaluate the accuracy of CMRF. In this experiment our model demonstrates reasonable confidence regions for various values of the parameters.
The rest of the paper is organized as follows. Section Results and discussion presents the results of our experiments. Section Methods describes our methods in detail. Section Conclusions concludes our discussion.
Results and discussion
In this section we discuss the experiments we conducted to evaluate the quality of CMRF. We implemented CMRF in MATLAB and Java. We obtained the code for Differential Evolution from http://www.icsi.berkeley.edu/~storn/code.html. We compared CMRF with SSEM as SSEM is one of the most recent methods that considers identifying primarily affected genes in a perturbation experiments [11].
We obtained SSEM from http://gardnerlab.bu.edu/SSEMLasso. We executed our code on a QuadCore AMD Opteron 2 Ghz workstation with 32 GB of memory.
Dataset
We used four different sets of data to conduct the experiments in this paper.

Dataset 1. The first dataset was collected by Smirnov et al. [20]. This dataset was generated using 10 Gy ionizing radiation over immortalized B cells obtained from 155 members of 15 Centre d'tude du Polymorphisme Humain (CEPH) Utah pedigrees [21]. Microarray snapshots were obtained before (at zeroth hour) and after (at second and sixth hours) the application of radiation.

Dataset 2. The second dataset corresponds to a drug response experiment conducted by Taylor et al [3]. Medications were applied on 36 Caucasian American and 33 African American patients infected with Hepatitis C. Gene expressions were collected before the medication was started and at 1, 2, 7, 14, 28 days after the medication was administered.
Both dataset 1 and 2 are microarray time series data with more than two time points. We adapted these two time series data two create control and noncontrol data suitable for our experiments. We used the data before perturbation as control data. For the noncontrol data we calculated the expected expression of a gene at each points after the perturbation. We selected the one with highest absolute difference from the expected expression of control data for that gene.

Dataset 3. We created dataset 3 using dataset 1. We used the control group of dataset 1 as the control group of dataset 3. Then, we changed the expression values of some of the randomly selected genes to model the primary effect of external perturbation. From that perturbed dataset, we simulated the secondary effects using the sigmoid method described in Garg et al. [22]. We denote the parameter for primary perturbation effect by deviation. Deviation is the ratio of the change of expression value Δx of a gene to its original expression value x (i.e. $derivation=\frac{\Delta x}{x}$) which is normalized between zero and one. We tuned the other parameters of the method to create a meaningful dataset as follows; alpha = 1, β = 0.01, k_{ ac }= 1.0, k_{ in }= 1, h = 0.1.

Dataset 4. We create this dataset from dataset 1 in two steps as follows.

 Selection of genes. In order to carry out experiments on larger scale data with known PDR genes, we generated data in the presence of a hypothetical perturbation from the real datasets as follows. We first select three sets of genes. Each set consists of some primarily affected genes and a higher number of secondarily affected genes. Here, we describe how we construct each of the three sets of affected genes. We first select a random gene from the network and label it as a primarily affected DE gene. We then traverse its outgoing neighbors in a breadth first search manner. As we visit a gene during traversal, we label it as a secondarily affected DE gene with a probability of 1 − (1 − q)^{η}, where η is the number of incoming DE neighbors. Here q is the probability that a gene is DE due to a DE predecessor (0.4 in our experiments). We repeat these steps to create the desired number of primarily affected genes.
After we obtain the three set of genes, we assign one set to both D_{ A }and D_{ B }groups. We assign the other two sets of genes to different groups. These two set of genes are differentially regulated as they are affected in only one group and not in the other. The three groups can contain different number of primarily and secondarily affected genes. We call these three sets of genes as primarily differentially regulated, secondarily differentially regulated and equally regulated genes.

Generation of gene expression. Once we identify these three sets of genes in the two groups, we create control and noncontrol data for D_{ A }and D_{ B }over N samples. We use the control part of the real dataset in Smirnov et al. as the control part of our synthetic dataset in both D_{ A }and D_{ B }[20]. To generate the noncontrol dataset, we traverse each of the genes that participate in the gene networks. Consider a gene g_{ i }with mean and standard deviation of expression in the control dataset given by µ_{ i }and σ_{ i }respectively.
If the gene is EE we generate its noncontrol data points from the a normal distribution given by the parameters (µ_{ i }, ${\sigma}_{i}^{2}$). If the gene is DE, we use the same variance but different mean as that of the control group. For the primarily and secondarily affected genes we use µ_{ i }± d_{ p }and µ_{ i }± d_{ s }respectively, where d_{ p }>d_{ s }.

To summarize, we used the same variance in the noncontrol group as that in the control group. However, for an affected gene we changed the value of the mean in the noncontrol group from that in the control group. For a primarily affected gene we applied a higher deviation of mean than that of the secondarily affected genes.
Regulatory networks
We collected 24,663 genetic interactions from the 105 regulatory and signaling pathways of KEGG database [23]. Overall 2,335 genes belong to at least one pathway in KEGG. In our model, we considered only the genes that take part in the gene networks.
Comparison to other methods
Our method provides us a list of differentially regulated genes. We sort the list of those genes as follows. Consider a DR gene g_{ i }, which is DE in D_{ A } and EE in D_{ B }. We calculate the likelihood of being EE in D_{ A } and DE in D_{ B } for that gene. We can interpret this step as the probability of being DR, but in a reverse way. We could instead use the probability that the gene is DE in D_{ A } and EE in D_{ B }. However, according to our observation, the earlier metric provides a much better accuracy. We sort all the DR genes with increasing order of that likelihood.
As per our knowledge, no other method exists that differentiates between the primary and secondary effects in a twogroup perturbation experiment. There exist some studies in identifying primarily affected genes in single group datasets. We compared the accuracy of CMRF to three such methods namely, SMRF, Student's t test and SSEM.
Experimental setup
Given an input dataset, using each of the four methods, we ranked all the genes. Highly ranked genes have higher chance of being a PDR according to each method. However, as other three methods are not tailored to solve this problem, we created separate ranking on D_{ A } and D_{ B }. Then, out of those two ranks, we created a unified rank of differentially regulated genes. We shall elaborate on this unified rank creation later. We, first, explain how we create ranks on individual groups D_{ A } and D_{ B } for other three methods.

SMRF. We apply the SMRF to each group separately and obtain a set of differentially expressed genes. We sort the genes in decreasing order of joint likelihood with the metagene. A higher joint likelihood implies a higher chance of being primarily affected.

SSEM. We train SSEM on the control dataset, where it learns the correlation between the genes. We test SSEM on the noncontrol dataset of each group, where it produces a rank for each single data point.

Student's t test. We use the function called ttest2 from MATLAB. We apply it on every individual gene, where it takes control and noncontrol dataset as input and produces a pvalue as output. We assume that the null hypothesis corresponds that the gene is EE. So a substantially lower pvalue implies a higher chance of being primarily affected. We perform the test on all the genes and rank them according the increasing order of pvalues.
Now we describe how we create an unified ranking of differentially regulated genes for these three methods. We denote the ranks from data group D_{ A } and D_{ B } by R_{ A } and R_{ B } respectively. The unified rank is defined by R_{ U } . We denote the number of genes in each rank to be ω_{ A } and ω_{ B } respectively. We scan both the ranks simultaneously from first position to ω = min(ω_{ A }, ω_{ B }). While scanning at the kth position, we denote the equally regulated set obtained till that position by Λ_{ k } = R_{ A } (1: k) ∩ R_{ B } (1: k). We include R_{ T } (k) to the unified rank R_{ U } if R_{ T } (k) ∉ Λ_{ k }, T ∈ {A, B}. For SSEM we obtain a separate R_{ U } for each data point. We average the accuracies over all these ranks.
Results
In this experiment we used dataset 3, that we have just described. To observe the accuracy of CMRF at varying degree of difficulties, we conducted experiments with four different values of deviation, namely, {0.5, 0.6, 0.7, 0.8}. However, we discuss only two of them in this paper (see Figure 3) since for other two parameters the results are similar. The results we discuss correspond to the cases when deviation = {0.6, 0.8}.
Figures 3(a) and 3(b) show the sensitivity of the four methods with the two deviation settings. The former one corresponds to the computationally harder case as the difference between the noncontrol groups of primarily and secondarily affected genes is small. As the deviation increases identifying primarily affected genes becomes easier. Form the figure, we observe that CMRF is significantly more accurate than the other three methods for all datasets consistently. It reaches almost 50% sensitivity (i.e., it can find around 1518 primarily affected genes out of 30) in the top 50 ranked genes, when the deviation is 0.6. On the other hand, its achieves a sensitivity of 0.6 when the deviation is 0.8. We obtained similar results for other deviations, which we do not discuss here. The method in SMRF reaches to 30% and 40% accuracy, however at a slower pace. The ttest reaches around 25% and 30% sensitivity at ranking position 50 for these two cases respectively. SSEM's sensitivity is below 0.1 for all experiments even within the top 50 positions.
We believe that there are three major factors for the success of our method over the other competing methods. First, the other methods do not simultaneously handle two groups of datasets and are not able to generate an unified ranking of differentially regulated genes. CMRF encompasses both groups in a single model and probabilistically determines the PDR genes. Hence, it is more shielded against the false positives introduced during the unification of ranking. Second, CMRF can successfully incorporate the gene interactions using MRFs while others ignore this information. Finally, in real perturbation experiments, multiple genes are often primarily affected. CMRF is capable of dealing with both large and small number of primarily affected genes, while performances of other methods deteriorate as the number of primarily affected genes grows. Thus, we conclude that our method is more suitable for real perturbation experiments.
Statistical significance experiment
The experiments in the last section enable us to compare the accuracy of CMRF with that of the other methods on synthetic datasets. We also wanted to evaluate the accuracy of CMRF on real dataset. However, we do not have any gold standard available that enlists true set of PDR genes. Hence, we conducted a set of statistical significance experiments to estimate the confidence of our accuracy. Specifically, we obtained the control data from a real dataset, perturbed it in a controlled way for a number of genes. We calculated the likelihood probabilities of those genes and created a distribution. We repeated this process with varying amount of perturbation. Finally, we executed CMRF on a real dataset and analyzed the result.
Results
We obtained the real dataset from drug response experiment conducted by Taylor et al [3], which is actually dataset 2. Apart from this real dataset, we create different versions of dataset 4 by varying d_{ p } as {0.1, 0.2, 0.3,..., 3.0}. If d_{ p } > 1.1, we set d_{ s } to 1, otherwise d_{ s } = 0.5 × d_{ p }. Thus, we have 30 synthetic datasets in total. In every dataset, we fix the number of primarily and secondarily differentially regulated genes to 50 and 172 respectively. To decide whether a gene g_{ i } is DR, when g_{ i } is DE in D_{ A } and EE in D_{ B }, we define a nullhypothesis H_{0i}: g_{ i } is DR, but in the reverse way, i.e. g_{ i } is EE in D_{ A } and DE in D_{ B }. We calculate the likelihood of being EE in D_{ A } and DE in D_{ B } for that gene, as described. For gene g_{ i }, we denote the log likelihood of accepting H_{0i}by LL_{ i }. In every dataset, we create a box plot of the 50 LL_{ i } values, as the number of DR genes in each dataset is 50. A lower value LL_{ i } indicates that g_{ i } has a higher probability of being differentially regulated.
Figure 4 illustrates the statistical significance of the experiments over the datasets with d_{ p } = 1.2 to 2.0. The box plot demonstrates a relationship between the Pvalue and d_{ p }. A higher value of d_{ p } indicates a lower Pvalue and hence, a high chance of being PDR. We also observe that the variance of Pvalue increases with the increase of d_{ p }.
We also executed CMRF on the real datasets without any modification. Interestingly, on the real dataset from Taylor et al. [3] (dataset 2), we did not obtain any genes as differentially regulated. A careful observation concludes that when both the number of data points and the gap d_{ p } (i.e. the signal to noise ratio) is low, the coefficients γ_{6} and γ_{7} in the prior density become strong and all genes are identified as equally regulated. However, when either the number of data points or d_{ p } is significantly high, the data can overcome the prior. In the current real dataset, the number of data points is only 33 and the gaps between the control and noncontrol group were less than 1.2 × σ. As a result, CMRF identifies no differentially regulated genes in the dataset. Thus, we can conclude that either there is not much difference between the two groups in the real data, or the data does not contain enough data points, so that our model can highlight difference between the two groups.
In Figure 4 we present the results for d_{ p } = 1.2 to 2.0. Note, that for d_{ p } < 1.2 × σ our model did not identify any DR genes. Here also, we attribute a similar reason for not finding any DR genes as both d_{ p } and the number of data points are small. On the other hand, in Section Comparison to other methods, when we execute CMRF on synthetic datasets with 155 data points we were able to identify a substantial number of true PDR genes even with d_{ p } = 0.02 × σ. To substantiate our conclusion that there exists little difference between the two groups in the real dataset, we conducted a set of permutation tests. We shuffled the two original groups to create new sets of data. We repeated this process for a number of times (40 in the present experiment) and executed CMRF on each of them. For every derived dataset, CMRF did not find any DR genes. Hence, this experiment bolsters the claim that there are no DR genes in the original real data.
An interesting question can be raised is that "is there indeed no DR genes in the real dataset from Taylor et al. [3]?" Another similar question can be "will our method be able to detect DR and PDR genes from similar other real datasets?" We believe that CMRF requires a bigger dataset for DR and PDR genes to be discovered. For example, CMRF is able to identify the DR and PDR genes from the synthetic dataset that contains substantially higher number of data points than that of the real dataset. Since the difference between control and noncontrol groups of a DE gene is small compared to the variance of the data points, it is difficult to detect that subtle effect of perturbation with a small dataset. For a small dataset, the prior due to third hypothesis becomes strong and the two corresponding parameters γ_{6} and γ_{7} assumes extreme values. Thus the support from data is not sufficient to overcome the prior and hence, the method is not able to identify the DR and PDR genes. There are two solutions to overcome this problem. First of them is to employ a bigger dataset. With the advancement of comparatively inexpensive and high throughput technologies bigger dataset are increasingly common nowadays. From that perspective, CMRF is supposed to perform more accurately in the near future. A second option to circumvent the problem is to restrict the growth of the two parameters γ_{6} and γ_{7}. If we have knowledge about the values of these two parameters, we can assign then as input to the program and refrain from estimating their values. This will enable us to employ a comparatively noninformative prior which will be easier for the data to overcome. Also, we can use specific bound over those variables while estimating them to avoid them becoming stronger.
Conclusions
Microarray experiments often measure expressions of genes taken from sample tissues in the presence of external perturbations such as medication, radiation, or disease. Typically in such experiments, gene expressions are measured before and after the application of external perturbation.
In this paper, we solved the problem of finding primarily differentially regulated genes in the presence of external perturbations when the data is sampled from two groups. The probabilistic Bayesian method based on Markov Random Field incorporates dependency structure of the gene networks as the prior to the model. Experimental results on synthetic and real datasets demonstrated the superiority of CMRF compared to other simple techniques.
Methods
In this section we describe different components of CMRF. Section Notation and problem formulation describes the notation and formulates the problem. Section Overview of the solution provides a high level overview of the solution. Section Computation of the prior density function describes the calculation of the prior density function of MRF. Section Approximation of the objective function discusses the definition of a tractable objective function. Section Computation of likelihood density function discusses the calculation of the likelihood function. Finally, Section Objective function optimization describes the algorithm to optimize the objective function.
Notation and problem formulation
In this section, we describe our notation and formally define the problem. We define a Bayesian model for gene expression in a twogroup perturbation experiment. We classify the random variables of the model into two different groups, namely observed variables and hidden variables. We have the values for the observed variables, while we estimate the values of the hidden variables.
Observed variables
We define two sets of observed variables, one for microarray gene expression data and another for the neighborhood in the extended gene network.

Microarray data. We denote the number of genes by M and the number of data points in the two groups D_{ A }and D_{ B }by N_{ A }and N_{ B }respectively. We represent the set of genes with $\mathcal{G}=\left\{{g}_{1},\phantom{\rule{2.77695pt}{0ex}}{g}_{2},\phantom{\rule{2.77695pt}{0ex}}\cdots \phantom{\rule{0.3em}{0ex}},\phantom{\rule{2.77695pt}{0ex}}{g}_{M}\right\}$. For each gene and for each group the microarray data contains the gene expression values before and after the perturbation, i.e. control and noncontrol data respectively. We denote the expression value of the ith gene from the jth sample in the control data of group D_{ A }with y_{ Aij }. We represent the same for the noncontrol data with ${y}_{Aij}^{\prime}$. Thus the expression values of the gene g_{ i }for all the samples in D_{ A }for control and noncontrol data are ${\mathbf{y}}_{\mathbf{A}\mathbf{i}}=\left\{{y}_{Ai1},\phantom{\rule{2.77695pt}{0ex}}{y}_{Ai2},\phantom{\rule{2.77695pt}{0ex}}\cdots {y}_{Ai{N}_{A}}\right\}$ and ${\mathbf{y}}_{\mathbf{A}\mathbf{i}}\prime =\left\{{y}_{Ai\mathsf{\text{1}}}\prime ,\phantom{\rule{2.77695pt}{0ex}}{y}_{Ai\mathsf{\text{2}}}\prime ,\phantom{\rule{2.77695pt}{0ex}}\cdots {y}_{Ai{N}_{A}}\prime \right\}$ respectively. We denote all the expression values in group D_{ A }for gene g_{ i }with ${Y}_{Ai}\left(\mathsf{\text{i}}.\mathsf{\text{e}}.\phantom{\rule{2.77695pt}{0ex}}{Y}_{Ai}={y}_{Ai}\cup {{y}^{\prime}}_{Ai}\right)$. We denote the collection of the gene expressions of all the genes in group D_{ A }by ${\mathcal{Y}}_{A}={\bigcup}_{i=1}^{M}{Y}_{Ai}$. We define ${\mathcal{Y}}_{B}$ similarly for all the genes in D_{ B }. We refer the complete gene expression data using variable $\mathcal{Y}={\mathcal{Y}}_{A}\cup {\mathcal{Y}}_{B}$.

Neighborhood variables. We use the term $\mathcal{W}=\left\{{W}_{ij}\right\}$ to indicate if two genes g_{ i }and g_{ j }are neighbors in the extended gene network. If g_{ i }is an incoming neighbor of g_{ j }(i.e. g_{ j }has an incoming edge from g_{ i }), then we set the value of W_{ ij }(1 ≤ i, j ≤ M) to 1. It is 0 otherwise.
Hidden variables
We define three sets of hidden variables, These variables govern the state of genes, regulations of genes and interactions among genes respectively.

State variables. We use ${\mathcal{S}}_{A}=\left\{{S}_{Ai}\right\}$ and ${\mathcal{S}}_{B}=\left\{{S}_{Bi}\right\}$, (1 ≤ i ≤ M) to denote the states of the genes in group D_{ A }and D_{ B }. S_{ Ai }= 1 if g_{ i }is DE in D_{ A }and 0 if it is EE in D_{ A }. We define S_{ Bi }similarly. We assume that the metagene g_{0} is DE for both D_{ A }and D_{ B }. Thus, S_{ A }_{0} = S_{ B }_{0} =1.

Regulation variables. We denote the regulation condition of gene g_{ i }with Z_{ i }. Table 1 enumerates different values of Z_{ i }for the values of S_{ Ai }and S_{ Bi }. In this formulation, the cases Z_{ i }= {2, 3} indicate that g_{ i }is DR, whereas Z_{ i }= {1, 4} indicate that g_{ i }is ER. The metagene is guaranteed to be ER, since S_{ A }_{0} = S_{ B }_{0} = 1.

Interaction variables. In order to govern the joint regulation states of genes g_{ i }and g_{ j }we define interaction variables $\mathcal{X}=\left\{{X}_{ij}\right\},\left(1\le i,j\le M\right)$. Mathematically, X_{ ij }= 4 × (Z_{ i }− 1) + Z_{ j }. Note that, this equation is created to maintain brevity of the mapping between the interaction variables and the regulation variables by carefully assigning different numeric constants between one and 16 to appropriate values of an interaction variable. Table 1 enumerates different values of X_{ ij }for values of Z_{ i }and Z_{ j }. Specifically, X_{0j}∈ {2, 3} and X_{0j}∈{1, 4} correspond to the cases where g_{ j }is DR and ER respectively because of interaction with the metagene g_{0}.
It is easy to see that the hidden variables follow a hierarchical structure. For instance, the value of Z_{ i } depends on the values of S_{ Ai } and S_{ Bi }. Similarly, the value of X_{ ij } depends on the values of Z_{ i } and Z_{ j }. Thus, the value of the dependent variable X_{ ij } is based on the values of four independent variables S_{ Ai }, S_{ Bi }, S_{ Aj } and S_{ Bj } . Table 1 enumerates the values of Z_{ i }, Z_{ j } and X_{ ij } for different values of S_{ Ai }, S_{ Bi }, S_{ Aj } and S_{ Bj } .
It is worth noting that the different values that we assign to the hidden variables are categorical in nature.
Problem formulation
Let $\mathcal{G}=\left\{{g}_{1},\phantom{\rule{2.77695pt}{0ex}}{g}_{2},\phantom{\rule{2.77695pt}{0ex}}\cdots \phantom{\rule{0.3em}{0ex}},\phantom{\rule{2.77695pt}{0ex}}{g}_{M}\right\}$ denote the set of all genes. Using the definition of the neighborhood variables , we denote the collection $\left(\mathcal{G},\phantom{\rule{2.77695pt}{0ex}}\mathcal{W}\right)$ by which essentially represents the gene networks. We denote the metagene by g_{0}. Given an observed data $\left\{\mathcal{V},\phantom{\rule{2.77695pt}{0ex}}\mathcal{Y}\right\}$ we want to estimate the probabilities $p\phantom{\rule{0.3em}{0ex}}\left({X}_{ij}=x\mathcal{X}{X}_{ij},\phantom{\rule{2.77695pt}{0ex}}\mathcal{Y},\phantom{\rule{2.77695pt}{0ex}}\mathcal{V}\right),\phantom{\rule{2.77695pt}{0ex}}x\in \left\{1,2,\phantom{\rule{2.77695pt}{0ex}}\cdots 16\right\}$.
A higher value of p (X_{0j}= {2, 3}·) indicates a higher probability of a gene g_{ j } being PDR. Using the estimated values of p (X_{0j}·), ∀_{ j } ∈ {1, 2,... M}, we can create an ordered list of candidate PDR genes.
Overview of the solution
This section describes a high level overview of our approach to estimate p(X_{0j}·), ∀_{ j } ∈ {1, 2,... M}. One simple approach can be using a hypothesis test to find out the PDR genes in the given dataset [15]. However, the available hypothesis tests do not consider the interactions among genes in the gene network. Also, deciding on the significance of test can be a complex step. Another approach can be to use SSEM to create a rank of the potential primarily affected genes in each group separately [11]. Then we can select the top k genes in each group and perform a set difference to obtain the PDR genes. Though SSEM considers the correlation between the genes, it does not utilize any known information from the gene networks.
We build a Bayesian probabilistic method based on Markov Random Field where we leverage the information from gene networks as the prior belief of the model. Using Bayes theorem [24] we can write the joint probability density of interaction variables as,
The first term in the numerator, $P\left(\mathcal{Y}\mathcal{X},\phantom{\rule{2.77695pt}{0ex}}\mathcal{V},\phantom{\rule{2.77695pt}{0ex}}{\theta}_{Y}\right)$, is the likelihood of the observed expression data given the interaction variables and gene network. θ_{ Y } represents the parameters for the likelihood function. A detailed discussion of how we compute this likelihood can be found in Section Computation of likelihood density function. The second term in the numerator $P\left(\mathcal{X}\mathcal{V},\phantom{\rule{2.77695pt}{0ex}}{\theta}_{X}\right)$ represents this prior belief. θ_{ X } represents the parameters for the prior density function. We define a Markov Random Field (MRF) over the interaction variables and the priors are encoded via feature functions in the MRF. Details of the priors and the associated feature functions are outlined in Section Computation of the prior density function. The denominator of Equation 1 is the normalization constant that represents the sum of the product of the likelihood and the prior over all possible assignments of interaction variables .
Given the joint probability density function outlined in Equation 1, our original problem reduces to obtaining assignments for the interaction variables and the parameters θ_{ X } and θ_{ Y } that maximize it.
A Maximum Likelihood Estimation (MLE) of Equation 1 is practically infeasible even for a small number of genes since the number of terms in the denominator grows exponentially. Instead we use a pseudolikelihood version of the objective function as shown in Section Approximation of the objective function. We use Iterative Conditional Modes (ICM) [19] and Differential Evolution [25] in an alternating optimization technique to maximize the pseudolikelihood with respect to , θ_{ X } and θ_{ Y }.
After the optimization, we obtain an assignment for , θ_{ X } and θ_{ Y }. Using these assignments and the observed data, we estimate the posterior probability of all X_{ ij } variables. Using the estimated values of p(X_{0j}·), ∀_{ j } ∈ {1, 2,... M}, we create an ordered list of candidate PDR genes. We elaborate on each of these steps next.
Figure 2 illustrates different portions of CMRF and the connectivity between them.
Computation of the prior density function
In this section, we describe how we incorporate gene network as the the prior belief into our Bayesian model. From the structure and properties of gene network, we build three hypotheses and embed them into our model. We present the entire concept in three numbered subsections.
1. Statement of hypotheses
Here we state the three hypotheses on the biological networks in brief.

Hypothesis 1. In each group D_{ T }(T ∈ {A, B}), the metagene g_{0} can change the state of all the other genes. Thus, all the genes can be directly affected by the external perturbation.

Hypothesis 2. In each group D_{ T }(T ∈ {A, B}), a gene g_{ i }can change the states of its outgoing neighbors g_{ j }in the same data group, i.e. a gene can be indirectly affected by the perturbation through genetic interactions.

Hypothesis 3. Each gene has a high probability of being equally regulated. This follows from the observation that, often the difference between the expressions of most of the genes in two groups is small. We expect that the response of genes in these groups is very similar.
Clearly, when the data does not follow one or more of the hypotheses, the optimization function can overcome the prior belief with a strong support from the data.
2. Markov Random Field construction
In order to compute the prior density function, we define a Markov Random Field (MRF) over the variables [18]. MRF is a probabilistic model, where the state of a variable depends only on the states of its neighbors. MRF is useful to model our problem as the states of genes depend on their neighbors. Here, the MRF is an undirected graph $\Psi =\left(\mathcal{X},\phantom{\rule{2.77695pt}{0ex}}\mathcal{E}\right)$, where $\mathcal{X}=\left\{{X}_{ij}\right\}$ variables represent the vertices of the graph (i.e. each interaction variable X_{ ij } corresponds to a vertex). We denote the set of edges with $\mathcal{E}=\left\{\left({X}_{ij},\phantom{\rule{2.77695pt}{0ex}}{X}_{pj}\right){W}_{pi}={W}_{ij}=1\right\}\cup \left\{\left({X}_{ij},\phantom{\rule{2.77695pt}{0ex}}{X}_{ik}\right){W}_{jk}={W}_{ij}=1\right\}$. Thus, two variables in share an edge if they share a common subscript at the same position and the two genes corresponding to the other subscript interact in the gene network. For example, in Figure 5(b), X_{35} and X_{25} are neighbors, as they share 5 (i.e. gene g_{5}) as the second subscript and g_{2} and g_{3} interact in the gene network in Figure 5(a).
One important point to note is that, this graph does not use the state variables or the regulation variables to model the dependencies between the genes. Rather, it establishes those dependencies over the variables. For example, in Figure 5(b) we draw the MRF graph corresponding to the hypothetical gene network in Figure 5(a). In the gene network, there is an edge from g_{2} to g_{3}. So, g_{2} can potentially change the state of g_{3}. We create an edge from X_{12} to X_{13} that corresponds to the edge from g_{2} to g_{3}. As g1 is common for X_{12} and X_{13}, if they assume the same value (i.e. X_{12} = X_{13}), it implies that the genes g_{2} and g_{3} are in same state (i.e. S_{ T }_{2} = S_{ T }_{3},T ∈ {A, B}). We formulate these dependency constraints using a set of unary and binary functions called feature functions. We discuss these feature functions next.
3. Development of feature functions
We denote the neighbors of X_{ ij } in the MRF graph as ${X}_{ij}^{*}=\left\{{X}_{kj}{W}_{ki}=1\right\}\cup \left\{{X}_{ip}{W}_{jp}=1\right\}$. We define a clique over each X_{ ij } and its neighbors ${X}_{ij}^{*}$ by C_{ ij } provided W_{ ij } = 1. A feature function f(C_{ ij } ) is a Boolean function defined over the clique C_{ ij } . This function evaluates to one or zero, if it is satisfied or not, respectively. We define a potential function ψ(C_{ ij } ) corresponding to f(C_{ ij }) as an exponential function given by exp(γf (C_{ ij } )). Here γ is a coefficient associated with f(C_{ ij }) that represents the relevance of f(C_{ ij }) in the MRF. According to HammersleyClifford theorem, we express the joint density function of the MRF over as product of potential functions defined over that MRF as, $p\phantom{\rule{0.3em}{0ex}}\left(\mathcal{X}{\theta}_{X}\right)=\frac{1}{\Delta}{\prod}_{{C}_{ij},{W}_{ij}=1}\psi \left({C}_{ij}\right)$[26]. In this formulation, Δ is the normalization function $\mathrm{\Delta}={\displaystyle {\sum}_{\mathcal{X}}}{\displaystyle {\prod}_{{C}_{ij}}}\psi ({X}_{ij})$. To limit the complexity of our model, we consider only cliques of size one and two.
We define seven feature functions to capture the dependencies among the variables in according to the three hypotheses.
Unary feature functions
F_{1}, F2, F3. A primary component of the prior density function is modeling the frequency of X_{ ij } itself. Here, we focus on two values of X_{ ij } namely X_{ ij } = {2, 3}, since they correspond to the events that a gene g_{ j } is DR due to the metagene g_{0}. When X_{ ij } = 2, g_{ j } is DE in D_{ A } and EE in D_{ B }. To capture this, we define a feature function F_{1}(X_{ ij }) which returns one when X_{ ij } = 2. It returns zero otherwise. Similarly, X_{ ij } = 3 when g_{ j } is EE in D_{ A } and DE in D_{ B }. We define another feature function F_{2}(X_{ ij }), which returns one when X_{ ij } = 3. We capture all the other values of X_{ ij } by a feature function called F_{3}(X_{ ij }). It returns zero when X_{ ij } ∈ {2, 3} and equals to one otherwise. Table 2 enumerates the the domains and ranges of F_{1}, F_{2} and F_{3}.
Binary feature functions
F_{4}, F_{5}. Let ϒ represent the hypothesis that in a group D_{ T } ,T ∈ {A, B} a gene g_{ j } including the metagene can change the state of one of its outgoing neighbors g_{ k }. We make a stronger hypothesis ϒ° that, ϒ holds simultaneously in D_{ A } and D_{ B } with high probability. Note that, this stronger hypothesis is based on the assumption that the genes in both D_{ A } and D_{ B } express in a similar fashion. This assumption is meaningful as in these twogroup perturbation experiments the different groups belong to similar biological conditions [3].
ϒ° is encoded in domain as follows. Consider four genes g_{ p }, g_{ i }, g_{ j } and g_{ k }, such that g_{ p } → g_{ i }, g_{ i } → g_{ j } and g_{ j } → g_{ k }. Here → indicates that the gene on the left activates or inhibits the gene on the right. By definition, (X_{ pj }, X_{ ij }) and (X_{ ij } , X_{ ik }) are edges in the MRF. Note that the first edge corresponds to an incoming neighbor g_{ p } of g_{ i }, while the second edge corresponds to an outgoing neighbor g_{ k } of g_{ j } . We discriminate between these two sets of neighbors of X_{ ij } , as they are related to the incoming neighbors of g_{ i } and outgoing neighbors of g_{ j } respectively. It can be shown that, for the first set of edges, X_{ pj } equals to X_{ ij } if and only if (iff) Z_{ p } = Z_{ i }, i.e. ϒ° holds true. Similarly, for the second set of edges X_{ ij } equals to X_{ ik } iff Z_{ j } = Z_{ k }, which in tern implies that ϒ° is satisfied.
We define two sets of feature functions to formalize these equalities based on the incoming neighbors of g_{ i } and the outgoing neighbors of g_{ j }.

Left external equality. We denote the incoming neighbors of g_{ i }with In (g_{ i }). We write a feature function f_{4}(X_{ pj }, X_{ ij }), ∀_{ p }, g_{ p }∈ In (g_{ i }). f_{4}(X_{ pj }, X_{ ij }) = 1 if Z_{ i }= Z_{ p }and W_{ pi }= W_{ ij }= 1. Otherwise, f_{4}(X_{ pj }, X_{ ij }) = 0. We denote the summation of this function over all the incoming neighbors of g_{ i }as,
$${F}_{4}\left({X}_{ij}\right)=\sum _{p,\phantom{\rule{0.3em}{0ex}}{W}_{ij}=1,{W}_{pi}=1}{f}_{4}\left({X}_{ij},\phantom{\rule{2.77695pt}{0ex}}{X}_{pj}\right).$$ 
Right external equality. We denote the outgoing neighbors of g_{ j }as Out (g_{ j }). We define a feature function f_{5}(X_{ ik }, X_{ ij }), ∀_{ k }, g_{ k }∈ Out (g_{ j }). f_{5} (X_{ ik }, X_{ ij }) = 1 if S_{ k }= S_{ j }and W_{ jk }= W_{ ij }= 1. Otherwise, f_{5} (X_{ ik }, X_{ ij }) = 0. We denote the summation of this function over all the outgoing neighbors of g_{ j }as,
$${F}_{5}\left({X}_{ij}\right)=\sum _{k,\phantom{\rule{0.3em}{0ex}}{W}_{ij}=1,{W}_{jk}=1}{f}_{5}\left({X}_{ij},\phantom{\rule{2.77695pt}{0ex}}{X}_{ik}\right).$$
Tables 3 and 4 enumerate the values of f_{4} and f_{5} for different values of X_{ ij }. The missing entries in these tables correspond to the cases which can not occur during the optimization. For instance, in Table 3, a missing entry corresponds to different values of Z_{ j } in X_{ ij } and X_{ pj } which is not possible.
For feature functions f_{4} and f_{5}, X_{ pj } or X_{ ik } may not represent any interactions from the extended gene network when W_{ pj } = 0 or W_{ ik } = 0 respectively. We represent them by dotted rectangles in Figure 5(b).
Unary feature functions
F_{6}, F_{7}. We introduce two unary feature functions to incorporate our last hypothesis, that all genes are ER with a high probability. We consider two genes g_{ i } and g_{ j } such that g_{ i } → g_{ j }. This hypothesis holds true, if g_{ i } is equally regulated or g_{ j } is equally regulated.

Left internal equality. We define this feature function to capture the events when g_{ i }is equally regulated. As, g_{ j }can assume any state, this feature function holds true for eight different values of X_{ ij }. We denote the feature function by f_{6}(X_{ ij }, t) that returns one if its two arguments are equal and zero otherwise. We denote the summation of this functions over all these eight values of X_{ ij }as,
$${F}_{6}\left({X}_{ij}\right)=\sum _{i,j,{W}_{ij}=1,t\in \left\{1,\cdots ,4,\phantom{\rule{2.77695pt}{0ex}}13,\cdots \phantom{\rule{0.3em}{0ex}},16\right\}}{f}_{6}\left({X}_{ij},\phantom{\rule{2.77695pt}{0ex}}t\right).$$ 
Right internal equality. We define this feature function to capture the events when g_{ j }is equally regulated. As, g_{ i }can assume any state, this feature function holds true for eight different values of X_{ ij }. We denote the feature function by f_{7}(X_{ ij }, t) that returns one if its two arguments are equal and zero otherwise. We denote the summation of this functions over all these eight values of X_{ ij }as,
$${F}_{7}\left({X}_{ij}\right)=\sum _{i,j,{W}_{ij}=1,t\in \left\{1,4,5,8,9,12,13,16\right\}}{f}_{7}\left({X}_{ij},\phantom{\rule{2.77695pt}{0ex}}t\right).$$
The last two columns of Table 2 enumerate these two internal equalities.
Based on these feature functions, we define the joint density function of as,
In the above equation γ_{ k }, k ∈ {1, 2,... 7} are the coefficients of the seven feature functions in MRF.
In the next section, we discuss how we approximate the objective function of the MRF and the data. We also describe how we formulate the posterior probability density function for X_{ ij }.
Approximation of the objective function
A direct maximization of the objective function given by Equation 1 is intractable, as it requires evaluation of exponential number of terms in the denominator. We employ pseudolikelihood as an established substitute to Equation 1 [27]. Pseudolikelihood is the simple product of the conditional probability density function of the X_{ ij } variables. Geman et al. proved the consistency of the maximum pseudolikelihood estimate [28]. The approximated objective function can be written as,
The posterior density function F_{ ij } of X_{ ij } as,
Derivation of F_{ ij }.
In step 2 of the derivation, we substitute by Y_{ Ai }, Y_{ Bi }, Y_{ Aj } and Y_{ Bj } as X_{ ij } is independent of all Y_{ Ck } such that k ≠ {i, j} and C ≠ {A, B}. Also, in the 15th step we assume that X_{ ij } is independent of θ_{ Y } given $\mathcal{X}{X}_{ij}$ and θ_{ X }. □
Derivation of $p\left({X}_{ij}\mathcal{X}{X}_{ij},{\theta}_{X}\right)$, W_{ ij } = 1.
A (X_{ ij }) is $exp\left({\sum}_{k\in \left\{1,2,\cdots \phantom{\rule{0.3em}{0ex}},7\right\}}{\gamma}_{k}{F}_{k}\left({X}_{ij}\right)\right)$ and B(ij) is given by $exp\left({\sum}_{m,n,ij\ne mn,k\in \left\{1,2,\cdots \phantom{\rule{0.3em}{0ex}},7\right\}}{\gamma}_{k}{F}_{k}\left({X}_{mn}\right)\right)$. Here, we denote the prior density parameters {γ_{1}, γ_{2},...·γ_{7}} by θ_{ X }. Canceling B(ij) from numerator and denominator the density function simplifies to,
□
There are two different terms in objective function of Equation 4. $p\left({X}_{ij}\mathcal{X}{X}_{ij},{\theta}_{X}\right)$ stands for the conditional prior density function of X_{ ij } which we just have derived from using Bayes rule. In the next section, we discuss the likelihood function $p\left({Y}_{Ai},\phantom{\rule{2.77695pt}{0ex}}{Y}_{Bi},\phantom{\rule{2.77695pt}{0ex}}{Y}_{Aj},\phantom{\rule{2.77695pt}{0ex}}{Y}_{Bj}{X}_{ij},\phantom{\rule{2.77695pt}{0ex}}{X}_{ij}^{*},\phantom{\rule{2.77695pt}{0ex}}{\theta}_{Y}\right)$.
Computation of likelihood density function
In this section, we describe how we derive the likelihood function in three numbered subsections. Here, we assume that gene expressions in a group follow a normal distribution, We can rewrite the derivations if gene expressions follow some other distribution.
1. Likelihood for a single gene
Consider a set of measurements for a gene g_{ i } that follows a single Gaussian distribution by z_{ i } = {z_{ i }_{1}, z_{ i }_{2},..., z_{ iN }}. We denote the latent mean of z_{ i } as µ and the standard deviation as σ. As different genes can have different average expressions, we assume that µ follows a genome wise distribution with mean µ_{0} and standard deviation τ [29]. Thus, for z_{ i }, the likelihood for the data points in that group is given by,
The derivation of Equation 5 can be obtained from Demichelis et al [30]. If a gene is DE, its expression measurements in control and noncontrol groups follow different distributions [29]. On the other hand, for equally expressed genes, all the measurements in both groups share the same mean. The likelihood function for a DE gene g_{ i } in group D_{ T } ,T ∈ {A, B} is given by,
Similarly, for EE genes it is given by,
For instance, the likelihood of a gene to be DE in group D_{ A } is given by ${\mathcal{L}}_{{A}_{{}_{DE}}}\left({g}_{i}\right)$.
2. Likelihood for a regulation variable
As for a gene g_{ i }, the regulation variable Z_{ i } can assume four different values from 1 to 4, the equations of the likelihood that a gene is DR or ER also take four different forms given by,
2. Likelihood for an interaction variable
We have 16 different forms for the likelihood of the X_{ ij } due to its 16 different values. However, here, we shall derive only for X_{ ij } = 1, as for the other values of X_{ ij } we have a similar derivation.
From the definition of ${X}_{ij},\phantom{\rule{2.77695pt}{0ex}}p\left({Z}_{i}={\tau}_{i},\phantom{\rule{2.77695pt}{0ex}}{Z}_{j}={\tau}_{i},\phantom{\rule{2.77695pt}{0ex}}{\theta}_{Y}{X}_{ij}=1,\phantom{\rule{2.77695pt}{0ex}}{X}_{ij}^{*},\phantom{\rule{2.77695pt}{0ex}}{\theta}_{Y}\right)$ equals to 1 when Z_{ i } = 1 and Z_{ j } = 1. Its value is zero for all other values of Z_{ i } and Z_{ j }. So, continuing from the last step of Equation 8,
In a similar way, we can derive the likelihood functions for all the 16 different values of X_{ ij } variables. A special case arises when g_{ i } is the metagene, i.e. g_{0}. We assume that ${\mathcal{L}}_{{\text{T}}_{{}_{\text{DE}}}}\left({g}_{0}\right)=1$ and ${\mathcal{L}}_{{\text{T}}_{\text{EE}}}\left({g}_{0}\right)=0,\phantom{\rule{2.77695pt}{0ex}}T\in \left\{A,\phantom{\rule{2.77695pt}{0ex}}B\right\}$. Thus, the likelihood of the metagene given Z_{0} = 1 equals to 1. Its value is zero otherwise.
Objective function optimization
So far, we have described how we compute the posterior density function. The final challenge is to find the values of the hidden variables that maximize the objective function (Equation 3). We develop an iterative algorithm to address this challenge.
In our model we have three different sets of parameters. The nodes of the MRF given by consist of one set. Other two sets are the parameters of conditional probability density function of X_{ ij } and likelihood function of observed data given by θ_{ X } = {γ_{1},... γ_{7}} and θ_{ Y } = {µ_{0}, σ, τ), respectively. In each iteration, we first estimate θ_{ X } and θ_{ Y } based on the estimated value of in the previous iteration. Next, based on the estimated parameters, we estimate that maximize the objective function in Equation 3.
The likelihood function is nonconvex in terms of the parameters θ_{ Y } = {µ_{0}, σ, τ). Also, the conditional density is nonconvex in terms of θ_{ X } = {γ_{1},... γ_{7}}. We use a global optimization method called differential evolution to optimize both of them [25]. To optimize the objective function in Equation 3, we employ the ICM algorithm described by Besag [19]. Briefly, our iterative algorithm works as follows.

1.
Obtain an initial estimate of variables. In our implementation we use student's ttest assuming the data follows normal distribution. We use 5% confidence interval for this purpose.

2.
Estimate parameters θ_{ Y } that maximizes the data likelihood function given by,
$$\text{arg}\underset{{\theta}_{Y}}{\text{max}}\prod _{{X}_{ij},{W}_{ij}=1}p\left({Y}_{Ai},\phantom{\rule{2.77695pt}{0ex}}{Y}_{Bi},\phantom{\rule{2.77695pt}{0ex}}{Y}_{Aj},\phantom{\rule{2.77695pt}{0ex}}{Y}_{Bj}{X}_{ij},\phantom{\rule{2.77695pt}{0ex}}{X}_{ij}^{*},\phantom{\rule{2.77695pt}{0ex}}{\theta}_{Y}\right)$$We implement this step using Differential Evolution, which is similar to the genetic algorithm.

3.
Calculate an estimate of the parameters θ_{ X } that maximizes the conditional prior density function by,
$$\text{arg}\underset{{\theta}_{X}}{\text{max}}\prod _{{X}_{{}_{ij}},{W}_{{}_{ij}}=1}p\left({X}_{ij}\mathcal{X}\left\{{X}_{ij}\right\},\phantom{\rule{2.77695pt}{0ex}}{\theta}_{X}\right)$$We also implement this step using Differential Evolution.

4.
Carry out a single cycle of ICM using the current estimate of , θ_{ X } and θ_{ Y }. For all S_{ i } , maximize ${\prod}_{{X}_{mn}}p\left({X}_{mn}\mathcal{X}{X}_{mn},\phantom{\rule{2.77695pt}{0ex}}\mathcal{Y},\phantom{\rule{2.77695pt}{0ex}}{\theta}_{X},\phantom{\rule{2.77695pt}{0ex}}{\theta}_{Y}\right)$ when ${X}_{mn}\in \left\{{X}_{rt}r=i\phantom{\rule{2.77695pt}{0ex}}\text{or}\phantom{\rule{2.77695pt}{0ex}}t=i,\phantom{\rule{2.77695pt}{0ex}}{W}_{rt}=1\right\}$.

5.
Go to step 2 for a fixed number of cycles or until converges to a certain predefined value.
We optimize the objective function in terms of the S_{ i } (1 ≤ i ≤ M) variables instead of X_{ ij } variables. Specifically, in step 4, we go over all the S_{ i } variables, and optimize F_{ ij } function (given by Equation 4) for only those X_{ ij } variables that are impacted by the change of S_{ i }. Figure 2 illustrates different components of CMRF and the connectivity between them.
The optimization procedure is guaranteed to converge since in every iteration the value of the objective function increases. We continue the iterative process, until the changes in estimates of the parameters between two consecutive iterations reach below a certain cutoff level.
References
 1.
Cheng R, Zhao A, Alvord W, Powell D, Bare R, Masuda A, Takahashi T, Anderson L, Kasprzak K: Gene expression doseresponse changes in microarrays after exposure of human peripheral lung epithelial cells to nickel(II). Toxicol Appl Pharmacol. 2003, 191: 2239. 10.1016/S0041008X(03)00228X.
 2.
Ideker T, Thorsson V, Ranish J, Christmas R, Buhler J, Eng J, Bumgarner R, Goodlett D, Aebersold R, Hood L: Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science. 2001, 292 (5518): 92934. 10.1126/science.292.5518.929.
 3.
Taylor K, PenaHernandez K, Davis J, Arthur G, Duff D, Shi H, Rahmatpanah F, Sjahputera O, Caldwell C: Largescale CpG methylation analysis identifies novel candidate genes and reveals methylation hotspots in acute lymphoblastic leukemia. Cancer Res. 2007, 67 (6): 261725. 10.1158/00085472.CAN063993.
 4.
Hughes TR, Marton MJ, Jones AR, Roberts CJ, Stoughton R, Armour CD, Bennett HA, Coffey E, Dai H, He YD, Kidd MJ, King AM, Meyer MR, Slade D, Lum PY, Stepaniants SB, Shoemaker DD, Gachotte D, Chakraburtty K, Simon J, Bard M, Friend SH: Functional discovery via a compendium of expression profiles. Cell. 2000, 102: 109126. 10.1016/S00928674(00)000155.
 5.
Marton MJ, Derisi JL, Bennett HA, Iyer VR, Meyer MR, Roberts CJ, Stoughton R, Burchard J, Slade D, Dai H, Bassett DE, Hartwell LH, Brown PO, Friend SH: Drug target validation and identification of secondary drug target effects using DNA microarrays. Nat Med. 1998, 4 (11): 12931301. 10.1038/3282.
 6.
Giaever G, Shoemaker DD, Jones TW, Liang H, Winzeler EA, Astromoff A, Davis RW: Genomic profiling of drug sensitivities via induced haploinsufficiency. Nature Genetics. 1999, 21 (3): 278283. 10.1038/6791.
 7.
Giaever G, Flaherty P, Kumm J, Proctor M, Nislow C, Jaramillo DF, Chu AM, Jordan MI, Arkin AP, Davis RW: Chemogenomic profiling: Identifying the functional interactions of small molecules in yeast. Proceedings of the National Academy of Sciences of the United States of America. 2004, 101 (3): 793798. 10.1073/pnas.0307490100.
 8.
Lum P, Armour C, Stepaniants S, Cavet G, Wolf M, Butler J, Hinshaw J, Garnier P, Prestwich G, Leonardson A, GarrettEngele P, Rush C, Bard M, Schimmack G, Phillips J, Roberts C, Shoemaker D: Discovering modes of action for therapeutic compounds using a genomewide screen of yeast heterozygotes. Cell. 2004, 116: 12137. 10.1016/S00928674(03)010353.
 9.
Parsons AB, Brost RL, Ding H, Li Z, Zhang C, Sheikh B, Brown GW, Kane PM, Hughes TR, Boone C: Integration of chemicalgenetic and genetic interaction data links bioactive compounds to cellular target pathways. Nature Biotechnology. 2003, 22: 6269.
 10.
di Bernardo D, Thompson MJ, Gardner TS, Chobot SE, Eastwood EL, Wojtovich AP, Elliott SJ, Schaus SE, Collins JJ: Chemogenomic profiling on a genomewide scale using reverseengineered gene networks. Nature Biotechnology. 2005, 23 (3): 377383. 10.1038/nbt1075.
 11.
Cosgrove EJ, Zhou Y, Gardner TS, Kolaczyk ED: Predicting gene targets of perturbations via networkbased filtering of mRNA expression compendia. Bioinformatics. 2008, 24 (21): 24822490. 10.1093/bioinformatics/btn476.
 12.
Bandyopadhyay N, Somaiya M, Kahveci T, Ranka S: Modeling perturbations using gene networks. Proc LSS Comput Syst Bioinform Conf. 2010, 2637.
 13.
Conesa A, Nueda MJ, Ferrer A, Talón M: maSigPro: a method to identify significantly differential expression profiles in timecourse microarray experiments. Bioinformatics. 2006, 22 (9): 10961102. 10.1093/bioinformatics/btl056.
 14.
Hong F, Li H: Functional Hierarchical Models for Identifying Genes with Different TimeCourse Expression Profiles. Biometrics. 2006, 62 (2): 534544. 10.1111/j.15410420.2005.00505.x.
 15.
Chuan Tai Y, Speed TP: On Gene Ranking Using Replicated Microarray Time Course Data. Biometrics. 2009, 65: 4051. 10.1111/j.15410420.2008.01057.x.
 16.
Angelini C, De Canditiis D, Pensky M: Bayesian models for twosample timecourse microarray experiments. Computational Statistics & Data Analysis. 2009, 53 (5): 15471565. 10.1016/j.csda.2008.07.015.
 17.
Van Deun K, Hoijtink H, Thorrez L, Van Lommel L, Schuit F, Van Mechelen I: Testing the hypothesis of tissue selectivity: the intersectionunion test and a Bayesian approach. Bioinformatics. 2009, 25 (19): 25882594. 10.1093/bioinformatics/btp439.
 18.
Li SZ: Markov Random Field Modeling in Image Analysis. 2009, Springer Publishing Company, Incorporated, 3
 19.
Besag J: On the statistical analysis of dirty pictures. Journal of the Royal Statistical Society. 1986, 48 (3): 259302.
 20.
Smirnov D, Morley M, Shin E, Spielman R, Cheung V: Genetic analysis of radiationinduced changes in human gene expression. Nature. 2009, 459 (7246): 58791. 10.1038/nature07940.
 21.
Dausset J, Cann H, Cohen D, Lathrop M, Lalouel J, White R: Centre d'etude du polymorphisme humain (CEPH): collaborative genetic mapping of the human genome. Genomics. 1990, 6 (3): 5757. 10.1016/08887543(90)90491C.
 22.
Garg A, Mendoza L, Xenarios I, DeMicheli G: Modeling of multiple valued gene regulatory networks. Conf Proc IEEE Eng Med Biol Soc. 2007, 2007: 1398404.
 23.
Kanehisa M, Goto S: KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000, 28: 2730. 10.1093/nar/28.1.27.
 24.
Bishop CM: Pattern Recognition and Machine Learning (Information Science and Statistics). 2006, Secaucus, NJ, USA: SpringerVerlag New York, Inc
 25.
Storn R, Price K: Differential evolution — a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization. 1997, 11 (4): 341359. 10.1023/A:1008202821328.
 26.
Hammersley JM, Clifford P: Markov fields on finite graphs and lattices. Unpublished manuscript. 1968
 27.
Besag J: Efficiency of pseudolikelihood estimation for simple Gaussian fields. Biometrika. 1977, 64 (3): 616618. 10.1093/biomet/64.3.616.
 28.
Geman S, Graffigne C: Markov random field image models and their applications to computer vision. Proceedings of the International Congress of Mathematics: Berkley. 1986, 14961517.
 29.
Kendziorski C, Newton M, Lan H, Gould M: On parametric empirical Bayes methods for comparing multiple groups using replicated gene expression profiles. Stat Med. 2003, 22 (24): 3899914. 10.1002/sim.1548.
 30.
Demichelis F, Magni P, Piergiorgi P, Rubin M, Bellazzi R: A hierarchical Naive Bayes Model for handling sample heterogeneity in classification problems: an application to tissue microarrays. BMC Bioinformatics. 2006, 7: 51410.1186/147121057514.
Acknowledgements
This work was supported partially by NSF under grants CCF0829867 and IIS0845439.
This article has been published as part of BMC Genomics Volume 13 Supplement 2, 2012: Selected articles from the First IEEE International Conference on Computational Advances in Bio and medical Sciences (ICCABS 2011): Genomics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcgenomics/supplements/13/S2.
Author information
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
NB conceived the study, analyzed the data, implemented the methods, supplied the analysis tools, designed the experiments, performed the experiments and wrote the paper. MS conceived the study and participated in writing the paper. SR conceived the study, designed the experiments and participated in writing the paper. TK conceived the study, designed the experiments and participated in writing the paper.
Rights and permissions
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Published
DOI
Keywords
 Differential Evolution
 Differentially Express
 Differentially Express Gene
 Real Dataset
 Markov Random Field