Identifying regulational alterations in gene regulatory networks by state space representation of vector autoregressive models and variational annealing

Background
In analyzing the effects of cell treatments such as drug dosing, identifying changes in gene network structure between normal and treated cells is a key task. One possible approach is to estimate networks separately from data on normal and treated cells and then compare their structures. However, this approach usually fails to estimate accurate gene networks because the time series are short and the measurements are noisy. Approaches that identify changes in regulation by using time series data from both conditions efficiently are therefore needed.

Methods
We propose a new statistical approach, based on the state space representation of the vector autoregressive model, that estimates gene networks under two different conditions in order to identify changes in regulation between them. In the mathematical model, hidden binary variables are newly introduced to indicate the presence of each regulation under each condition. These variables enable efficient use of the data: data from both conditions are used for regulations common to both, while only the corresponding data are used for condition-specific regulations. In addition, the similarity of the networks under the two conditions is taken into account automatically through the design of the potential function for the hidden binary variables. To estimate the hidden binary variables, we derive a new variational annealing method that searches for the configuration of the binary variables maximizing the marginal likelihood.

Results
For the performance evaluation, we use time series data from two topologically similar synthetic networks and confirm that the proposed approach estimates both common regulations and changes in regulation with higher coverage and precision than existing approaches in almost all experimental settings. For a real data application, the proposed approach is applied to time series data from normal human lung cells and from human lung cells treated by stimulating EGF receptors and dosing the anticancer drug Gefitinib. In the treated cells, a cancer cell condition is simulated by stimulating the EGF receptors, and this effect is expected to be counteracted by the selective inhibition of EGF receptors by Gefitinib. The gene expression profiles nevertheless differ between the two conditions, and the genes related to the identified changes are considered possible off-targets of Gefitinib.

Conclusions
On synthetically generated time series data, the proposed approach identifies changes in regulation more accurately than existing methods. By applying the proposed approach to time series data from normal and treated human lung cells, candidate off-target genes of Gefitinib are found. According to published clinical information, one of these genes may be related to a factor in interstitial pneumonia, which is a known side effect of Gefitinib.


Proof of Proposition in Main Manuscript
We give a proof of Proposition 1 in the main manuscript.

More Details on Procedures of Variational Annealing on Proposed Model
In the variational annealing on the proposed model, we calculate the Q functions for the hidden variables X, the parameters Θ, and the binary variables E iteratively, while gradually cooling the temperature τ toward zero at each iteration cycle. Here, we show the calculation procedures of Q(X), Q(Θ), and Q(E) as the variational E-step, variational M-step, and variational A-step, respectively, under the complete likelihood of the proposed model.
For notational brevity, we denote the expectation of a value x with respect to a probability distribution Q(y) as $\langle x \rangle_{Q(y)}$.
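The effect of cooling τ can be illustrated with a minimal sketch. It assumes, purely for illustration, that the distribution of each binary variable is tempered by dividing its log-odds by τ, as is common in deterministic annealing; the function names, the geometric cooling schedule, and the example log-odds are hypothetical and are not taken from the model above.

```python
import math

def tempered_bernoulli(log_odds, tau):
    """Q(e = 1) for a binary variable whose log-odds are divided by temperature tau."""
    z = log_odds / tau
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def anneal(log_odds, tau_start=10.0, tau_end=0.01, cooling=0.5):
    """Cool tau geometrically (an assumed schedule) and record Q(e = 1) at each step."""
    tau = tau_start
    history = []
    while tau > tau_end:
        # In the full procedure, the variational E-, M-, and A-steps would be run here
        # before the temperature is lowered; this sketch only tracks the tempering.
        history.append((tau, [tempered_bernoulli(s, tau) for s in log_odds]))
        tau *= cooling
    return history

# Hypothetical log-odds for three candidate regulations.
history = anneal([2.0, -1.0, 0.3])
```

At high temperature every probability stays close to 0.5, so the search is not committed to any configuration; as τ approaches zero, each probability moves toward 0 or 1 according to the sign of its log-odds.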

Variational E-step
The proposed model can be regarded as a state space model in terms of the hidden variables X. In this state space model, the system matrix is given by $A \circ E^{(c)}$, the elementwise product of A and $E^{(c)}$, and the observation matrix is the p-dimensional identity matrix. Therefore, the parameters of Q(X) are the mean of $x_t$, the variance of $x_t$, and the cross-time covariance of $x_{t-1}$ and $x_t$. These parameters can be calculated via the variational Kalman filter, using expectations taken with respect to Q(Θ) and Q(E). For details of the variational Kalman filter, see Chapter 5 of [1]. Let the mean of $x_t$, the variance of $x_t$, and the cross-time covariance of $x_{t-1}$ and $x_t$ be $\mu_{x_t}$, $\Sigma_t$, and $\Sigma_{t,t-1}$, respectively.
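The state space form described above can be sketched as a small simulation. This is a simplified illustration under stated assumptions: a first-order model, independent Gaussian system and observation noise with hypothetical standard deviations `sys_sd` and `obs_sd`, and the convention that entry (i, j) of the system matrix is the regulation from gene j to gene i. All names and numerical values are made up for the example.

```python
import random

def masked_system_matrix(A, E):
    """Elementwise product of A and E: A[i][j] is kept only where E[i][j] == 1."""
    return [[a * e for a, e in zip(row_a, row_e)] for row_a, row_e in zip(A, E)]

def simulate(A, E, x0, steps, sys_sd=0.1, obs_sd=0.1, rng=None):
    """Simulate hidden states x_t and observations y_t for one condition.

    The system matrix is the masked matrix; the observation matrix is the
    identity, so y_t is x_t plus observation noise.
    """
    rng = rng or random.Random(0)
    F = masked_system_matrix(A, E)
    x = list(x0)
    states, observations = [], []
    for _ in range(steps):
        # State transition: x_t = F x_{t-1} + system noise.
        x = [sum(F[i][j] * x[j] for j in range(len(x))) + rng.gauss(0.0, sys_sd)
             for i in range(len(x))]
        # Observation with identity observation matrix: y_t = x_t + noise.
        y = [xi + rng.gauss(0.0, obs_sd) for xi in x]
        states.append(x)
        observations.append(y)
    return states, observations

# Shared coefficients A; the two conditions differ only in their binary masks.
A = [[0.5, 0.2], [0.0, 0.9]]
E_normal = [[1, 1], [0, 1]]    # regulation from gene 2 to gene 1 is present
E_treated = [[1, 0], [0, 1]]   # the same regulation is absent after treatment
states_normal, obs_normal = simulate(A, E_normal, [1.0, 1.0], steps=5)
states_treated, obs_treated = simulate(A, E_treated, [1.0, 1.0], steps=5)
```

Because both conditions share A and differ only in the mask, a regulation present under both conditions is informed by both time series, while a condition-specific regulation affects only the corresponding series.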
From the parameters of Q(X), the expectations of x with respect to Q(X) that are required in the other steps are calculated as follows:

Variational M-step
Let $A_i$ be the vector given by $(A_{i1}, \ldots, A_{ip})$. From the design of the proposed model, $Q(A_i \mid h_i)$, $Q(h_i)$, $Q(r_i)$, and $Q(z_{ij})$ are given in the following form:
Here, $T_{A_i}$ is a matrix given by a sum over the conditions $c = 1, 2$. The remaining parameters, including $l_i$, are given as follows:
By using these parameters, we consider the following expectations required for the calculation of the variational E-step:

Variational A-step
For the calculation of Q(E), we assume the factorization $Q(E) = \prod_{i,j} Q(E_{ij})$ in order to make the computation tractable. We first consider the likelihood with respect to $E_{ij}$. As preparation, we calculate the required expectations; $Q(E_{ij})$ is then calculated iteratively by using these expectations as well as the expectations of E. Without loss of generality, we consider the calculation of $e^{(c)}_{ij}$ for c = 1. The update for $e^{(1)}_{ij}$ is given as follows: By using the obtained Q(E), we consider the expectations of E required in the other steps.
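The iterative, factorized update can be illustrated with a toy coordinate-wise scheme for a single pair $(e^{(1)}_{ij}, e^{(2)}_{ij})$. The sketch rests on assumptions that are not taken from the model above: `g1` and `g2` stand for hypothetical data-driven log-odds under each condition, and `w` is a hypothetical coupling weight that rewards agreement between the two conditions, standing in for the potential function on the binary variables.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def update_pair(g1, g2, w, tau, sweeps=50):
    """Alternate mean-field updates of Q(e1 = 1) and Q(e2 = 1) at temperature tau.

    Each update uses the current expectation of the other variable, mapped to
    the range [-1, 1], so that agreement between conditions is favored when w > 0.
    """
    q1 = q2 = 0.5
    for _ in range(sweeps):
        q1 = sigmoid((g1 + w * (2.0 * q2 - 1.0)) / tau)
        q2 = sigmoid((g2 + w * (2.0 * q1 - 1.0)) / tau)
    return q1, q2

# Strong evidence for the regulation under condition 1, weak evidence against
# it under condition 2 (hypothetical values).
q1_coupled, q2_coupled = update_pair(g1=2.0, g2=-0.5, w=1.0, tau=1.0)
q1_free, q2_free = update_pair(g1=2.0, g2=-0.5, w=0.0, tau=1.0)
```

With the coupling switched off, the weakly supported regulation under condition 2 is more likely absent than present; with the coupling on, the strong evidence from condition 1 pulls it toward being present, which is the kind of similarity between the two networks that the potential function is meant to encode.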