 Proceedings
 Open Access
 Published:
Identifying regulational alterations in gene regulatory networks by state space representation of vector autoregressive models and variational annealing
BMC Genomicsvolume 13, Article number: S6 (2012)
Abstract
Background
In the analysis of effects by cell treatment such as drug dosing, identifying changes on gene network structures between normal and treated cells is a key task. A possible way for identifying the changes is to compare structures of networks estimated from data on normal and treated cells separately. However, this approach usually fails to estimate accurate gene networks due to the limited length of time series data and measurement noise. Thus, approaches that identify changes on regulations by using time series data on both conditions in an efficient manner are demanded.
Methods
We propose a new statistical approach that is based on the state space representation of the vector autoregressive model and estimates gene networks on two different conditions in order to identify changes on regulations between the conditions. In the mathematical model of our approach, hidden binary variables are newly introduced to indicate the presence of regulations on each condition. The use of the hidden binary variables enables an efficient data usage; data on both conditions are used for commonly existing regulations, while for condition specific regulations corresponding data are only applied. Also, the similarity of networks on two conditions is automatically considered from the design of the potential function for the hidden binary variables. For the estimation of the hidden binary variables, we derive a new variational annealing method that searches the configuration of the binary variables maximizing the marginal likelihood.
Results
For the performance evaluation, we use time series data from two topologically similar synthetic networks, and confirm that our proposed approach estimates commonly existing regulations as well as changes on regulations with higher coverage and precision than other existing approaches in almost all the experimental settings. For a real data application, our proposed approach is applied to time series data from normal Human lung cells and Human lung cells treated by stimulating EGFreceptors and dosing an anticancer drug termed Gefitinib. In the treated lung cells, a cancer cell condition is simulated by the stimulation of EGFreceptors, but the effect would be counteracted due to the selective inhibition of EGFreceptors by Gefitinib. However, gene expression profiles are actually different between the conditions, and the genes related to the identified changes are considered as possible offtargets of Gefitinib.
Conclusions
From the synthetically generated time series data, our proposed approach can identify changes on regulations more accurately than existing methods. By applying the proposed approach to the time series data on normal and treated Human lung cells, candidates of offtarget genes of Gefitinib are found. According to the published clinical information, one of the genes can be related to a factor of interstitial pneumonia, which is known as a side effect of Gefitinib.
Background
Gene network estimation from time series gene expression data is a key task for elucidating cellular systems. Thus far, wide variety of approaches have been proposed based on the vector autoregressive (VAR) model [1, 3], the state space model [4–6], and the dynamic Bayesian network [7, 8]. Recently, time series gene expression data on multiple conditions aiming at analyzing effects of cell treatment such as drug dosing and heat shock are available. We here assume that some gene regulations are disrupted but many of the gene regulations do not change due to some treatment of interest, and try to find a small number of changes on regulations as keys for elucidating effects by the treatment.
A possible way for finding changes on regulations is to estimate networks from two data sets separately and then compare their structures. However, due to the limited length of time series data (usually less than 10 time points) and unignorable measurement noise, networks are estimated with high error rates and the estimation errors cause the serious failure on identifying changes on regulations. Thus, approaches using two time series data in an efficient manner are strongly demanded. Also, widely used statistical methods such as the VAR model and dynamic Bayesian network assume equally spaced time points in time series data. However, observed time points on usually available time series data are not equally spaced [5, 6, 9], and approaches that can handle unequally spaced time series data in a theoretically correct way should be considered.
We propose a new statistical model that estimates gene networks on two different conditions in order to identify changes on regulations between the conditions. As the basis of the proposed model, we employ the state space representation for VAR model (VARSSM), in which observation noise is considered between the measured or observed gene expressions and the true gene expressions in observation model and gene regulations between true gene expressions are considered in the system model [10]. The VARSSM can handle unequally spaced time series data by ignoring observation model on the nonobserved time points. For considering the changes on regulations, we introduce hidden variables to the VARSSM in order to indicate the presence of regulations in each condition. If hidden binary variables on two conditions indicating the presence of a regulation are both estimated as one, the regulation is considered as a commonly existing regulation. On the other hand, if only one of the hidden binary variables for the regulation is estimated as one, the regulation is considered as a condition specific regulation. We also introduce a potential function between the hidden binary variables that is designed to take high probability if the hidden binary variables on two conditions take the same value. From the design of the potential function, the similarity of networks on two conditions is automatically considered. Since the time series data on both conditions are used for estimating commonly existing regulation due to the use of the hidden binary variables, an efficient data assignment is achieved. In addition, from the more accurate estimation of commonly existing regulations by the efficient data assignment, accurate identification of changes on regulations is induced.
The hidden binary variables are estimated by searching the configuration of binary variables that maximizes the marginal likelihood of the model. However, searching the optimal configuration is computationally intractable. Thus, as an alternative approach, we derive a new variational annealing method based on [11] in order to estimate the hidden binary variables. We also give a proof for the effectiveness of the variational annealing compared to other candidate alternatives, the variational annealing and the EM algorithm, in order to show the validity of using the variational annealing.
For the performance evaluation, we generate two regulatory networks in such a way that most of the regulations commonly exist and some exist only on one of the networks. We then apply our proposed approach and existing var model based and dynamic Bayesian network based approaches to two equally spaced time series data drawn separately from the generated networks. From the comparisons of true positive rates and false positive rates of these approaches, we confirm the effectiveness of our approach. We also generate unequally spaced time series data from these networks, and show that our approach works correctly on unequally spaced time series data while the performance of the existing approaches assuming equally spaced time points is drastically worsened.
Our proposed approach is used to analyze changes on regulations in gene networks between normal Human lung cells and Human lung cells treated by stimulating EGFreceptors and dosing an anticancer drug termed Gefitinib. A lung cancer condition is simulated by the stimulation of EGFreceptors in the treated cells. Since Gefitinib is known as a selective inhibitor of EGFreceptors, the stimulation of EGFreceptors would be counteracted by Gefitinib, and hence the treated cells are expected to be the same condition as normal cells. However, gene expression profiles from normal and treated cells are actually different, and offtargets of Gefitinib causing unexpected positive or negative effects are implied. We focus on genes with changes on regulations between the networks estimated by our approach and find possible offtarget genes of Gefitinib. According to the published clinical information, one of the possible offtarget genes is suggested as one of factors of interstitial pneumonia, which is known as a side effect of Gefitinib.
Methods
Vector autoregressive model and its state space representation
Vector autoregressive model
Given gene expression profile vectors of p genes during T time points {y_{1}, ..., y_{ T }}, the first order vector autoregressive (VAR(1)) model at time point t is given by
where A is a p × p autoregressive coefficient matrix, and ε_{ t }is observation noise at time t and follows $\mathcal{N}\left(0,\mathsf{\text{diag}}\left[{\sigma}_{1}^{2},...,{\sigma}_{p}^{2}\right]\right)$, a normal distribution with mean 0 and variance $\mathsf{\text{diag}}\left[{\sigma}_{1}^{2},...,{\sigma}_{p}^{2}\right]$. The (i, j)th element of A, A_{ ij }, indicates a temporal regulation from the j th gene to the i th gene, and if A_{ ij }≠ 0, regulation from the j th gene to the i th gene is considered. By examining whether A_{ ij }is zero or not for all i and j, a gene network is constructed. Since equally space time points are assumed in the VAR model, it has difficulty on handling unequally spaced time series data.
State space representation of VAR model (VARSSM)
Let $\mathcal{T}$ be the set of equally spaced entire T time points and ${\mathcal{T}}_{obs}$ the set of time points where gene expressions are observed. Note that ${\mathcal{T}}_{obs}\subseteq \mathcal{T}$ holds. VARSSM is comprised of two models: system model and observation model. Let x_{ t }be hidden variable vector representing true gene expression at time t. The system model is given as the VAR model of x_{ t }:
where η_{ t }is the system noise normally distributed with mean 0 and variance H = diag[h_{1},...,h_{ p }]. The observation model represents measurement error of observed gene expression y_{ t }and true gene expression x_{ t }at observed time point $t\in {\mathcal{T}}_{obs}$:
where ρ_{ t }is the observation noise normally distributed with mean 0 and variance R = diag[r_{1},..., r_{ p }]. Unequally spaced time series data are handled by ignoring observation model at nonobserved time points.
Joint model of VARSSM for two time series data
Let ${\left\{{\mathit{y}}_{t}^{\left(c\right)}\right\}}_{t\in {\mathcal{T}}_{obs}^{\left(c\right)}}$ be time series gene expression data on cell condition c, where ${\mathcal{T}}_{obs}^{\left(c\right)}$ is the set of observed time points on cell condition c. We also let ${\mathcal{T}}^{\left(c\right)}$ be the set of time points from 1 to T^{(c)}, where ${T}^{\left(c\right)}=max\left\{t\in \underset{obs}{\overset{\left(c\right)}{\mathcal{T}}}\right\}$. Given time series data on two types of cell conditions c = 1 and 2, we propose a new VARSSM model to estimate gene networks in the two conditions as well as identify changes on regulations between them. The model is comprised of the following two equations:
where ∘ denotes the Hadamard product, E^{(c)}is a p × p binary matrix, and ${\mathit{\eta}}_{t}^{\left(c\right)}$ and ${\mathit{\rho}}_{t}^{\left(c\right)}$ are respectively system and observation noises from $\mathcal{N}\left(0,H\right)$ and $\mathcal{N}\left(0,R\right)$. In this model, the (i, j)th element ${E}_{ij}^{\left(c\right)}$ takes one if regulation from gene j to gene i exists on condition c and zero otherwise, i.e., the presence of regulations is controlled by E^{(c)}and the AR coefficient matrix A is commonly used in conditions 1 and 2. Changes on regulations are identified when regulations exist only in a condition.
The complete likelihood of our model, P(Y, X, Θ, E), where Y, X, Θ, and E are respectively the sets of ${\mathit{y}}_{t}^{\left(c\right)},{\mathit{x}}_{t}^{\left(c\right)}$, parameters, and E^{(c)}, is given by the following equation:
where the prior distribution P(Θ, E) is assumed to be factorized as
Here, z_{ ij }is a parameter for a potential function of ${E}_{ij}^{\left(1\right)}$ and ${E}_{ij}^{\left(2\right)}$ defined later. The prior distributions of A_{ ij }is given by
where α_{0} and α_{1} are parameters controlling the shrinkage of coefficients A and F_{ ij }is a binary variable that takes 1 if ${E}_{ij}^{\left(1\right)}$ or ${E}_{ij}^{\left(2\right)}$ takes 1 and 0 otherwise, i.e., F_{ ij }is given by $1{\prod}_{c=1}^{2}\left(1{E}_{ij}^{c}\right)$. α_{1} is set to a large value, while α_{0} is set to smaller than α_{1}. From the design of the prior for A_{ ij }, if ${E}_{ij}^{\left(1\right)}$ or ${E}_{ij}^{\left(2\right)}$ takes one, i.e., there exists regulation from gene j to i in condition 1 or 2, weaker prior $\mathcal{N}\left({A}_{ij};0,{h}_{i}\cdot {\alpha}_{1}\right)$ is selected and the shrinkage of the coefficients are avoided. Otherwise stronger prior $\mathcal{N}\left({A}_{ij};0,{h}_{i}\cdot {\alpha}_{0}\right)$ is selected and the sparsity of the network structures is promoted. The prior distributions of h_{ i }, and r_{ i }are given by
where $\mathcal{I}\mathcal{G}$ represents the density function of inverse gamma distribution, u_{0} and v_{0} are the shape parameters, and k_{0} and l_{0} are the inverse scaling parameters. Under the assumption that a small number of regulations change between two conditions, we design the prior distribution for ${E}_{ij}^{\left(1\right)}$ and ${E}_{ij}^{\left(2\right)},P\left({E}_{ij}^{\left(1\right)},{E}_{ij}^{\left(2\right)},{z}_{ij}\right)$ by using the following potential function between ${E}_{ij}^{\left(c\right)}$ for c = 1 or 2:
In this setting, if z_{ ij }is small, ${E}_{ij}^{\left(1\right)}$ and ${E}_{ij}^{\left(2\right)}$ tend to take the same value and thus most of the regulations exist in both two conditions. We also introduce a prior distribution for z_{ ij }by beta distribution with parameters ζ_{i 0}and ζ_{i 1}:
where B(·) is the beta function. Thus, the prior distribution of ${E}_{ij}^{\left(c\right)}$ is given by
Figure 1 shows a graphical representation of the proposed model, where dependency of the parameters and variables are indicated. The hyperparameters are omitted and the observed data y_{ t }is represented by gray nodes. In the observed data ${y}_{t}^{\left(1\right)}$ and ${y}_{t}^{\left(2\right)}$ are propagated mainly via hidden variables ${E}_{ij}^{\left(1\right)}$ and ${E}_{ij}^{\left(2\right)}$. Due to the data propagation, more accurate estimation is expected in the proposed model than the approaches considering data on two conditions independently.
For the parameter estimation, we search the configuration of E maximizing the following marginal likelihood:
Finding the optimal configuration of E is computationally intractable, and heuristics approaches such as the EM algorithm and the variational method are used in practice. Here, we use the variational annealing, an extension of the deterministic annealing for discrete variables [11]. In the next section, we give a small explanation of the variational annealing and show its effectiveness compared to the EM algorithm and the variational method.
Parameter estimation by variational annealing
In the deterministic annealing, optimization problem is solved while gradually changing temperature in a some schedule, and maximum likelihood estimator is obtained like the EM algorithm [12–14]. Yoshida and West proposed to use the deterministic annealing to find the configuration of the binary variables that maximizes the likelihood of factor models with sparseness priors [11]. We derive a new variational annealing method by extending Yoshida and West's approach to find the configuration of the binary variables on marginal likelihood function, which can be applied for searching E that maximizes Equation (1).
Let E, X, and Θ be p dimensional binary variables, unobserved variables, and parameters, respectively, and consider to search E maximizing the following marginal likelihood:
The maximum of the marginal likelihood on E is bounded by the following formula:
where τ is called temperature and the equality holds for τ → +0. Hereafter, the integral range of E is omitted if no confusion occurs. Let Q(E) be a normalized nonnegative function, i.e., Q(E) ≥ 0 and ∫ Q(E)dE = 1.
From the Gibbs inequality, the right side of Equation (3) is also bounded:
Here, Q(E) is considered as an approximation function of $\frac{P{\left(E\right)}^{1\u2215\tau}}{\int P{\left({E}^{\prime}\right)}^{1\u2215\tau}d{E}^{\prime}}$. Under the assumption that P(X,Θ,E) ∝ Q(X)Q(Θ), where Q(X) and Q(Θ) are normalized nonnegative functions, we have the following inequality
Thus, as an approximation of E maximizing Equation (2), we try to find $\xca=arg\underset{E}{max}Q\left(E\right)$, where Q(E) is the function maximizing the lower bound in Equation (4). Since higher values on P(E) are weighed more in the approximation by Q(E) for τ < 1, the better approximation is expected for the higher values. This property on the limited case is shown in Proposition 1. As is in the variational method and the EM algorithm, the maximum of the lower bound in Equation (4) is searched by a hill climbing from high temperature τ > 1.
In the hill climbing, Q(X), Q(Θ), and Q(E) are alternately updated from the following equations:
Gradually converging τ to 0, local optimum of the lower bound and corresponding Q(E), Q(X), and Q(Θ) are obtained.
Effectiveness of variational annealing
As alternatives of the variational annealing, we may consider the variational method and the EM algorithm where X is the set of hidden variables and E is handled as the set of parameters to be maximized. We show the effectiveness of variational annealing compared to the variational method and the EM algorithm under the following conditions: P(X, Θ, E) is factorized into P(X, E)P(Θ, E), and P(E_{ i }X, Θ, E\{E_{ i }}) is given as a binomial distribution, where E_{ i }is the i th element of E. If factorization of P(X, Θ, E) = Q(X)Q(Θ)Q(E) is assumed in the calculation of the variational method, arg max_{ E }Q(E) is not the optimal solution of Equation (2) in general. In the EM algorithm, by allowing E to move around a pdimensional continuous space ${\left(0,1\right)}^{p},{\xca}_{\mathsf{\text{EM}}}=arg\underset{E\in {\left(0,1\right)}^{m}}{max}P\left(E\right)$ can be calculated. Let ${\xca}_{\mathsf{\text{EM}},i}$ be the i th element of ${\xca}_{\mathsf{\text{EM}}}$. Usually, ${\xca}_{\mathsf{\text{EM}}}\mathsf{\text{,}}i$ is mapped to 1 if ${\xca}_{\mathsf{\text{EM,}}\phantom{\rule{2.77695pt}{0ex}}\mathsf{\text{i}}}>0.5$ and 0 otherwise for discretizing ${\xca}_{\mathsf{\text{EM}}}$ to the pdimensional binary space {0,1}^{p}, but such a mapping is not guaranteed to provide $arg\underset{E\in {\left\{0,1\right\}}^{m}}{max}P\left(E\right)$. Although other mappings can be considered, to the best of our knowledge, no mapping is guaranteed to provide the optimal solution in polynomial time of p.
In the following, we prove a proposition in order to show that the variational annealing possibly give the optimal solution of Equation (2) even if the factorization of $Q\left(E\right)={\prod}_{i=1}^{p}Q\left({E}_{i}\right)$ is additionally considered.
Proposition 1. P(X,Θ,E) is factorized into P(Θ,E)P(X,E), and P(E_{ i }X,Θ,E\{E_{ i }}) is given as a binomial distribution. Let $\widehat{Q}\left({E}_{i}\right)$be Q(E_{ i }) maximizing the lower bound of the variational annealing for τ → +0 given by
Then, the set of E_{ i }∈ {0,1} maximizing ${\xca}_{\mathsf{\text{EM}}}$is $arg\underset{E\in {\left\{0,1\right\}}^{p}}{max}P\left(E\right)$.
For the proof of the proposition, see Section 1 in Additional file 1. From Proposition 1, if the factorization P(X, Θ, E) = P(Θ, E)P(X, E) is satisfied and optimal Q functions are found, the variational annealing is guaranteed to provide the optimal solution of Equation (2) while the variational method and the EM algorithm are not. Although the factorization is not a generally satisfied property, the factorization is often assumed in approaches based on the variational method, and the assumption usually works as good approximations. Thus, the variational annealing is expected to provide the better performance than the variational method and the EM algorithm even if the factorization is not satisfied exactly.
Procedures of variational annealing on proposed model
In the variational annealing on the proposed model, we calculate Q functions for hidden variables X, parameters Θ, and binary variables E iteratively while cooling temperature τ to zero gradually at each iteration cycle. In the following, we show the calculation procedures of Q(X), Q(Θ), and Q(E) on the proposed model as variational Estep, variational Mstep, and variational Astep, respectively. More details of the procedures are given in Additional file 1. For the notational brevity, we denote the expectation of a value x with a probability distribution Q(y) as 〈x〉_{Q(y)}.
Variational Estep
Parameters of Q(X) are mean of x_{ t }, variance of x_{ t }, and cross time variance of x_{t1}and x_{ t }. These parameters can be calculated via variational Kalman filter by using following terms expected with Q(Θ)Q(E): 〈E^{(c)}〉_{Q(Θ)Q(E)}, 〈A〉_{Q(Θ)Q(E)}, 〈H^{1} A ○ E^{(c)}〉_{Q(Θ)Q(E)}, and 〈(A ○ E^{(c)})'H^{1}A ○ E^{(c)}〉_{Q(Θ)Q(E)}. For the details of variational Kalman filter, see [4]. From the parameters of Q(X), expectations of the following terms with Q(X) required in other steps are calculated: ${\u3008{\mathit{x}}_{t}^{\left(c\right)}\u3009}_{Q\left(X\right)},{\u3008{\mathit{x}}_{t}^{\left(c\right)}{\left({\mathit{x}}_{t}^{\left(c\right)}\right)}^{\prime}\u3009}_{Q\left(X\right)}$, and ${\u3008{\mathit{x}}_{t+1}^{\left(c\right)}{\left({\mathit{x}}_{t}^{\left(c\right)}\right)}^{\prime}\u3009}_{Q\left(X\right)}$.
Variational Mstep
Q(Θ) is factorized into ∏_{ i }Q(A_{ i }h_{ i }) Q(h_{ i })Q(r_{ i }) ∏_{ j }Q(z_{ ij }), where A_{ i }is a vector given by (A_{ i }_{1}, ..., A_{ ip })'. From the design of the proposed model, Q(A_{ i }h_{ i }), Q(h_{ i }), Q(r_{ i }), and Q(z_{ ij }) are given in the following form:
Parameters for the above functions are calculated by using ${\u3008{\mathit{x}}_{t}^{\left(c\right)}\u3009}_{Q\left(X\right)},{\u3008{\mathit{x}}_{t}^{\left(c\right)}{\left({x}_{t}^{\left(c\right)}\right)}^{\prime}\u3009}_{Q\left(X\right)},{\u3008{\mathit{x}}_{t+1}^{\left(c\right)}{\left({\mathit{x}}_{t}^{\left(c\right)}\right)}^{\prime}\u3009}_{Q\left(X\right)},$, and ${\u3008{E}_{ij}^{\left(c\right)}\u3009}_{Q\left(E\right)}$.
Variational Astep
For the calculation of Q(E), we assume the factorization of Q(E) to ${\prod}_{c}{\prod}_{ij}Q\left({E}_{ij}^{\left(c\right)}\right)$ in order to make the computation tractable. $Q\left({E}_{ij}^{\left(c\right)}\right)$ follows a binomial distribution that takes one with probability ${e}_{ij}^{\left(c\right)}$ and zero with probability $1{e}_{ij}^{\left(c\right)}$, and thus the expectation of ${E}_{ij}^{\left(c\right)}$ with Q(E) is given by ${e}_{ij}^{\left(c\right)}$. For the preparation, we calculate ${\u3008A\u3009}_{Q\left(\mathrm{\Theta}\right)},{\u3008{H}^{1}A\u3009}_{Q\left(\mathrm{\Theta}\right)},{\u3008{A}^{\prime}{H}^{1}A\u3009}_{Q\left(\mathrm{\Theta}\right)},{\u3008{\mathit{x}}_{t}^{\left(c\right)}\u3009}_{Q\left(X\right)},{\u3008{\mathit{x}}_{t}^{\left(c\right)}{\left({\mathit{x}}_{t}^{\left(c\right)}\right)}^{\prime}\u3009}_{Q\left(X\right)}$, and ${\u3008{\mathit{x}}_{t+1}^{\left(c\right)}{\left({\mathit{x}}_{t}^{\left(c\right)}\right)}^{\prime}\u3009}_{Q\left(X\right)}$. $Q\left({E}_{ij}^{\left(c\right)}\right)$ is then iteratively calculated by using these expected terms as well as ${\u3008{E}_{ik}^{\left(c\right)}\u3009}_{Q\left(E\right)}$ for k ≠ j. A few iterations are enough for the convergence.
Update and selection of hyperparameters
The proposed model contains u_{0}, k_{0}, v_{0}, l_{0}, ζ_{i 0}, ζ_{i 1}, α_{0}, and α_{1} as hyperparameters. α_{0} and α_{1} should be α_{0} <α_{1} as in the model setting, but this condition can be violated in the update step of the variational method. Thus, we select α_{0} and α_{1} by cross validation, and update other hyperparameters as in the variational method.
We first consider update of hyperparameters u_{0}, k_{0}, v_{0}, l_{0}, ζ_{i 0}, and ζ_{i 1}to increase the lower bound of marginal probability. u_{0} and k_{0} are updated by maximizing the following equation:
${\xfb}_{0}$ and ${\widehat{k}}_{0}$ are obtained by numerical optimization methods such as the NewtonRaphson method. v_{0} and l_{0} are also updated in a similar manner to u_{0} and k_{0}. ζ_{i 0}and ζ_{i 1}are updated by solving the following equation:
${\widehat{\zeta}}_{i0}$ and ${\widehat{\zeta}}_{i1}$ can also be obtained by the NewtonRaphson method.
For the selection of α_{0} and α_{1}, we set α_{1} to some large value and select α_{0} by a leave one out cross validation procedure. For condition c ∈ {1, 2} and time point $t\in {\mathcal{T}}_{obs}^{\left(c\right)}$, we remove ${\mathit{y}}_{t}^{\left(c\right)}$ from data set y and use the data set to train the model. We calculate square sum of residues ${r}_{t,c}^{2}$ between ${\mathit{y}}_{t}^{\left(c\right)}$ and the prediction of ${\mathit{x}}_{t}^{\left(c\right)}$ estimated from the variational Kalman filter on the trained model given by ${r}_{t,c}^{2}={\left({\mathit{y}}_{t}^{\left(c\right)}{\mathit{x}}_{t}^{\left(c\right)}\right)}^{\prime}\left({\mathit{y}}_{t}^{\left(c\right)}{\mathit{x}}_{t}^{\left(c\right)}\right)$. By grid search on parameter space of α_{0}, we select α_{0} that minimizes ${\sum}_{c=1}^{2}{\sum}_{t\in {\mathcal{T}}_{obs}^{\left(c\right)}}{r}_{t,c}^{2}$.
Summary of procedures
The procedures for estimating parameters in the proposed model are summarized as follows:

1.
Set τ to some large value. Also set α_{0} to a small value and α_{1} to a large value satisfying that α_{0} <α_{1}.

2.
Initialize other hyperparameters and hidden variables.

3.
Perform the following procedures:

(a)
Calculate variational Mstep.

(b)
Update hyperparameters.

(c)
Calculate variational Estep.

(d)
Calculate variational Astep.

(e)
Go back to step (a) until some convergence criterion is satisfied.

(a)

4.
Divide τ by some value > 1 such as 1.05.

5.
Go back to step 3 if τ is larger than some very small value > 0.
In our setting, α_{1} is set to 1,000. For the initialization of τ and other hyperparameters, we use the following settings: $\tau =2.5,{E}_{ij}^{\left(c\right)}=0.5,{u}_{i}=1,{k}_{i}=1,{v}_{i}=1,{l}_{i}=1,{\zeta}_{i0}=10,\text{and}{\zeta}_{i1}=10$. If ${\mathit{y}}_{t}^{\left(c\right)}$ is observed, we initialize ${\mathit{x}}_{t}^{\left(c\right)}$ with ${\mathit{y}}_{t}^{\left(c\right)}$. Otherwise, we use the linearly interpolated one.
Results and discussion
Performance evaluation by Monte Carlo experiments
For the evaluation of the proposed approach, we generate two linear regulatory network models with similar topological structures G_{1} and G_{2} based on a linear regulatory network model G_{0}. G_{0} is prepared in the following manner: (i) a scale free network of 100 nodes and 150 edges is generated; (ii) edge directions are assigned randomly; (iii) autoloop edges are added to root nodes of the directed network; and (iv) AR coefficients for the directed edges are chosen randomly from {0.9, 0.8, 0.7, 0.6, 0.5, 0.5, 0.6, 0.7, 0.8, 0.9}. We then generate G_{1} and G_{2} from G_{0} as follows: (i) autoloop edges and 70% of nonautoloop edges in G_{0} are used for commonly existing edges in G_{1} and G_{2}; and (ii) the other 30% of nonautoloop edges are randomly assigned as either G_{1} or G_{2} specific edges. Note that AR coefficients on edges of G_{1} and G_{2} are preserved, i.e., if a regulation from gene j to gene i exists in G_{1} or G_{2}, then its coefficient is the same as that of the regulation from gene j to gene i in G_{0}.
Figure 2 gives graph structure of G_{1} and G_{2}, where commonly regulations, G_{1} specific regulations, and G_{2} specific regulations are represented with black, red, and green arrows, respectively.
From each of G_{1} and G_{2}, we obtain two equally spaced time series data of 25 time points and 50 time points. For system noise and observation noise, normally distributed values with mean 0 and standard deviation 1 and mean 0 and standard deviation 0.1 and 1 are used, respectively. The signalnoise ratio for system noise with standard deviation 1 and observation noise with standard deviation 0.1 is 0.03dB, and the signal is bit stronger than the noise. On the other hand, for observation noise with standard deviation 1, the signalnoise ratio is 0.26dB. In the condition, the noise is stronger than the signal, and the noise level is quite high.
Comparison between variational annealing and EM algorithm
We first compare the performances of the proposed approach and the approach that is based on the proposed approach but uses the EM algorithm instead of the variation annealing using the equally spaced time series data of 50 and 25 time points on the system noise with standard deviation 1 and observation noise with standard deviation 0.1. From the comparison, we verify the effectiveness of the variational annealing, compared to the EM algorithm. Table 1 summarizes the results of the proposed approach and the EM algorithm based approach. For the EM algorithm, the regulation from j to i on condition c is considered to exist if the estimated ${E}_{ij}^{\left(c\right)}$ is more than 0.5.
From the comparison, the results of the proposed approach contain more true positives than those of the EM algorithm based approach except for identifying changes on regulations for time points 50. For identifying changes on regulations, the EM algorithm based approach estimates bit more true positives than the proposed approach, but the difference is so small that it can be ignored. On the other hand, the results of the EM algorithm based approach contain more false positives than those of the proposed approach, and hence the precision of the results by the EM algorithm is worse than that of the proposed approach. Therefore, the effectiveness of the variational annealing is confirmed in the computational experiment as well.
Comparison between proposed approach and existing approaches
We employ the elastic net based VAR model approach [2] and the dynamic Bayesian network based approach termed G1DBN [8] as existing approaches for estimating networks from time series data. For the experiments, two versions of these approaches are considered: ENet1 and ENet2 from the elastic net based VAR model approach, and G1DBN1 and G1DBN2 from G1DBN. These approaches are different in the following point: ENet1 and G1DBN1 estimate G_{1} and G_{2} independently, i.e., for the estimation of G_{1}, only time series data from G_{1} is used, while ENet2 and G1DBN2 assume that G_{1} and G_{2} have the same network structure and estimate a network by using two time series data. Thus, ENet2 and G1DBN2 are considered to more use data sample than ENet1 and G1DBN1 for network estimation, but changes on regulations between G_{1} and G_{2} are not considered. For selection of hyperparameters in ENet1 and ENet2, AICc is used [2], and for hyperparameters α_{1} and α_{2} of G1DBN1 and G1DBN2, a setting of α_{1} = 0.1 and α_{2} = 0.0059 considered in [8] is used.
For the comparison of these approaches, we focus on the following two points: the number of correctly estimated regulations and the number of correctly estimated changes on regulations. The former is usually considered for evaluating the performance of gene network estimation methods. The numbers of true positives and false negatives of the estimated regulations are summarized in Table 2(a). The precisions of the results given by
are also provided. The results are averaged on ten data sets. The number of regulations in the true network models of G_{1} and G_{2} are in total 305. For the latter point, we consider the estimated regulations existing only in one of two estimated networks as changed regulations, and check if they correctly exist only in the corresponding true network. The numbers of true positives and false negatives of the estimated changes on regulations and the precisions are summarized in Table 2(b). The results are also averaged on ten data sets. The number of true changes on regulations between in G_{1} and G_{2} are 47, i.e., the number of true positives on this case is at most 47.
For the estimation of the regulations in Table 2(a), the proposed approach outperforms other approaches in terms of true positives. The proposed approach contains more false positives than ENet1, G1DBN1, and G1DBN2 on the data of 25 time points. We further consider these cases in terms of the precision. The precisions of the proposed approach, ENet1, G1DBN1, and G1DBN2 are given by 0.77, 0.62, 0.68, and 0.74, respectively. From this analysis, the proposed approach shows better performance than ENet2, G1DBN1, and G1DBN2 on the whole. More true positives are estimated by ENet2 than ENet1, and the precisions of ENet2 tend to be better than those of ENet1. This type of relationship is also observed between G1DBN1 and G1DBN2. Also, the elastic net based VAR model approaches estimate more true positives than the approaches from G1DBN. However, in the precision, the approaches from G1DBN are better than the elastic net based VAR model approaches. From the results in Table 2(b), we see that the proposed approach estimates more true changes on regulations than ENet1 and G1DBN1 on both data of 25 and 50 time points. In addition, the results of the proposed approach contain less false positives than those of ENet1 and G1DBN1. No changes on regulations are detected in ENet2 and G1DBN2 as topological differences are ignored in these approaches. Since the proposed approach considers differences on network structures as well as uses two time series data efficiently, it can provide better results than other approaches on both estimating regulations and identifying changes on regulations.
One may think it is strange that false positives in ENet1 and G1DBN1 in Table 2(b) is more than those in Table 2(a), but this case can occur from the following reason. If a regulation exists in both of the true network models of G_{1} and G_{2}, but is estimated only for G_{1}, then the case is not counted as a false positive in Table 2(a) while it is counted as a false positive in Table 2(b). Thus, the number of false positives in Table 2(b) can be greater than those in Table 2(a).
In order to show the performance in unequally spaced time series data, we generate unequally spaced time series data of 25 and 50 observed time points. For time series data of 25 observed time points, we first generate equally spaced time series data of 40 time points and divide it into three blocks: 15 time points, 10 time points, and 15 time points. We then remove time points in the following manner: no time point is removed in the first block; one of every two time points are removed in the second block; and two of every three time points are removed in the third block. Figure 3 shows the time point schedule of time series data obtained in this process. For time series data of 50 observed time points, we first generate equally space time series data of 80 time points, divide it into three blocks: 30 time points, 20 time points, and 30 time points. Then, some time points are removed in a similar manner. We apply the proposed approach, ENet1, ENet2, G1DBN1, and G1DBN2 to the unequally spaced time series data. Results for the dataset are also summarized in Tables 2(a) and 2(b). From the comparison of results on equally and spaced time series data, the results of the proposed approach from unequally spaced time series data are worse than those from equally spaced one even with the same number of observed time points. However, results of ENet1, ENet2, G1DBN1, and G1DBN2 are worsened more than those of the proposed approach. This is probably because unequally spaced time points break their assumption, and their estimation process is misled.
We also consider the time series data with the high level noise: system noise with standard deviation 1 and observation noise with standard deviation 1. The results for the case are summarized in Tables 3(a) and 3(b). The proposed approach shows the better performance on the number of true positives and precisions than other approaches except for the identification of the changes on regulations from the equally spaced time series data of 50 time points. For equally spaced time series data of 50 time points, the number of true positives on the changes on regulations estimated by the proposed approach is more than that of G1DBN1, but less than that of ENet1. However, the precisions on the estimated changes by both ENet1 and G1DBN are much worse than the proposed approach. Thus, overall, the proposed approach is more effective than other methods. Although the proposed approach provides the better performance than other methods, the results of all the approaches are worsened due to the high level noise, and the differences on the performance among the approaches get smaller, compared to the case of observation noise with standard deviation 0.1.
Analysis of time series microarray data from Human small airway epithelial cells
We apply the proposed approach to two time series microarray gene expression data from normal Human small airway epithelial cells (SAECs) and SAECs treated by stimulating EGFreceptors and dosing an anticancer drug termed Gefitinib. EGFreceptors are often overexpressed in lung cancer cells such as tumoral SAECs, and a lung cancer condition is simulated in the treated SAECs by stimulating EGFreceptors. Since Gefitinib is known as a selective inhibiter of EGFreceptors, the stimulation of EGFreceptors would be counteracted by Gefitinib, and the condition of treated SACEs should be the same as that of normal SAECs in theory. However, since some gene expression patterns are different between the two conditions in practice, some unknown effects by Gefitinib may be involved in the phenomenon. Thus, we focus on changed regulations between the gene networks estimated from gene expression data in these two conditions in order to find some insights on the unknown effects of Gefitinib.
For gene set selection, we first screen 500 genes from the ranking of the gene list sorted by coefficient variation [5]. We then select 100 genes with highly varied expression profiles between normal SAECs and treated SAECs. The time series gene expression data from both types of cells are comprised of spaced 14 time points in 48 hours. The time schedule of 14 time points are {0 h, 6 h, 9 h, 12 h, 15 h, 18 h, 21 h, 24 h, 27 h, 30 h, 33 h, 36 h, 39 h, 48 h}. For the analysis on the proposed approach, we set interval on system model to three hours.
The estimated networks from time series gene expression data in normal SAECs and treated SAECs are summarized and given in Figure 4. Black arrows, red arrows, and green arrows indicate regulations in both conditions, only in normal SAECs, and only in treated SAECs, respectively. Table 4 gives a list of the estimated regulations only in normal SAECs or in treated SAEC.
From Table 4, we see that Prss22 is involved in most of the regulations only in treated SAECs. Prss22 is a tryptase, one of serine proteases, and its relationship with the airways is suggested by a report about its expression in the airways in a developmentally regulated manner. Tryptase is a potent mitogen of fibroblast [15], and it is reported that the increase and activation of fibroblast are promoted in lung cells under the condition of interstitial pneumonia. Interstitial pneumonia is known as a side effect of Gefitinib, and these findings suggest that Prss22 is an offtarget of Gefitinib and is possibly related to the side effect of Gefitinib. Also, FILIP1L, a gene estimated as one of targets of Prss22 in treated SAECs, is reported as an inhibitor of cell proliferation of fibroblast [16]. This observation supports the relation between Prss22 and fibroblast as well.
We also focus on other several genes related to changes on regulations in normal and treated SAECs. LIF, leukemia inhibitory factor, is known to affect cell growth and development. Gefitinib is also known to be effective for acute myelogenous leukemia via Sky, which is an offtarget gene of Gefitinib [17]. In addition, Wang et al. reported that LIF prolongs the cell cycle of stem cells on acute myelogenous leukemia lines [18]. Although, to the best of our knowledge, no direct influence from Gefitinib to LIF is reported, the above facts suggest some relation between Gefitinib and LIF, and support changes on regulations of LIF between normal and treated SAECs.
Heikema et al. reported that Human Siglecs, Siglecs14, 15, and 16 interact with transmembrain adaptor proteins containing the immunoreceptor tyrosinebased activation motif such as DAP12 [19], and therefore they potentially mediate the activation of intracellular signaling. Gefitinib is a selective inhibitor of tyrosine kinase, and the inhibition of tyrosine kinase is considered to affect the regulations around Siglec15.
Although the stimulation of EGFreceptors in the treated SAECs is considered to be counteracted by Gefitinib, the expressions of some genes may be affected by the stimulation in practical conditions. HAS3 is related to synthesis of the unbranched glycosaminoglycan hyaluronic acid and is reported to be upregulated by EGF [20]. The stimulation of EGFreceptors affects the amount of EGF taken into cells, and hence the stimulation is expected to cause the changes on the regulations around HAS3. Foxn2 is a member of family of Fox proteins. Fox proteins are known to play important roles on controlling the expressions of genes related to cell growth, proliferation, and differentiation. Some members of Fox family are related to EFGreceptors, e.g., the expression of Foxn1 is suppressed by EGFreceptor signaling [21] although no direct relation between EGFreceptors and Foxn2 is found. FOXA2 is also a member of family of Fox proteins. EGFreceptor signaling is known to decrease the expression of FOXA, which prevents the mucus production [22]. [23] reported the relation between EGFreceptors and FOXA2 in the airways of asthmatic patients. Thus, it appears that the change on the regulation related to FOXA2 is caused by the stimulation of EGFreceptors.
Conclusions
We proposed the new computational model that is based on VARSSM and estimates gene networks from time series data on normal and treated conditions as well as identifies changes regulations by the treatment. Unlike many of existing gene network estimation approaches assuming equally spaced time points, our approach can handle unequally spaced time series data. The efficient use of time series data is achieved by representing the presence of regulations on each condition with hidden binary variables. Since finding the optimal configuration of the hidden binary variables on the proposed model is computationally in tractable, we derive the extended variational annealing method in order to address the problem as the alternative method.
In the Monte Carlo experiments, we use equally and spaced time series data from synthetically generated two regulatory networks whose structures are different in several regulations, and verified the effectiveness of the proposed model in both estimation of regulations and changes on regulations between the two conditions, compared to existing methods.
As the real data application, we use the proposed approach to analyze two time series data from normal SAECs and SAECs treated by stimulating EGFreceptors and dosing Gefitinib. From genes related to changes on regulations by the treatment, we find possible offtarget genes of Gefitinib, and one of these genes is suggested to be related to a factor of interstitial pneumonia, which is known as a side effect of Gefitinib. In this study, we consider changes on regulations in two conditions, but the proposed approach can be extended to identifying changes among more than two conditions.
References
 1.
OpgenRhein R, Strimmer K: From correlation to causation networks: a simple approximate learning algorithm and its application to highdimensional plant gene expression data. BMC Syst Biol. 2007, 1: 3710.1186/17520509137.
 2.
Shimamura T, Imoto S, Yamaguchi R, Fujita A, Nagasaki M, Miyano S: Recursive regularization for inferring gene networks from timecourse gene expression profiles. BMC Syst Biol. 2009, 3: 4110.1186/17520509341.
 3.
Fujita A, Sato JR, Kojima K, Gomes LR, Nagasaki M, Sogayar MC, Miyano S: Identification of Granger causality between gene sets. Journal of Bioinformatics and Computational Biology. 2010, 8: 679701. 10.1142/S0219720010004860.
 4.
Beal MJ: Variational algorithms for approximate Bayesian inference. PhD thesis. 2003, University College London, Gatsby Computational Neuroscience Unit
 5.
Yamaguchi R, Imoto S, Yamauchi M, Nagasaki M, Shimamura T, Hatanaka Y, Ueno K, Higuchi T, Gotoh N, Miyano S: Predicting differences in gene regulatory systems by state space models. Genome Inform. 2008, 21: 101113.
 6.
Shiraishi Y, Kimura S, Okada M: Inferring clusterbased networks from differently stimulated multiple timecourse gene expression data. Bioinformatics. 2010, 26 (8): 10731081. 10.1093/bioinformatics/btq094.
 7.
Perrin BE, Ralaivola L, Mazurie A, Bottani S, Mallet J, d'Alché Buc F: Gene networks inference using dynamic Bayesian networks. Bioinformatics. 2003, 19 (Suppl 2): ii138ii148. 10.1093/bioinformatics/btg1071.
 8.
Lèbre S: Inferring dynamic genetic networks with low order independencies. Stat Appl Genet Mol Biol. 2009, 8: Article 9
 9.
BarJoseph Z, Gerber G, Simon I, Gifford DK, Jaakkola TS: Comparing the continuous representation of timeseries expression profiles to identify differentially expressed genes. Proc Natl Acad Sci U S A. 2003, 100 (18): 1014610151. 10.1073/pnas.1732547100.
 10.
Kojima K, Yamaguchi R, Imoto S, Yamauchi M, Nagasaki M, Shimamura T, Ueno K, Higuchi T, Gotoh N, Miyano S: A state space representation of VAR models with sparse learning for dynamic gene networks. Genome Inform. 2010, 22: 5668.
 11.
Yoshida R, West M: Bayesian learning in sparse graphical factor models via variational meanfield annealing. J Mach Learn Res. 2010, 11: 17711798.
 12.
Rose K: Deterministic annealing for clustering, compression, classication, regression, and related optimization problems. Proceedings of the IEEE. 1998, 86 (11): 22102239. 10.1109/5.726788.
 13.
Ueda N: Deterministic annealing EM algorithm. Neural Networks. 1998, 11 (2): 271282. 10.1016/S08936080(97)001330.
 14.
Itaya Y, Zen H, Nankaku Y, Miyajima C, Tokuda K, Kitamura T: Deterministic annealing EM algorithm in parameter estimation for acoustic model. The 8th International Conference on Spoken Language Processing. 2004, 433436.
 15.
Ruoss SJ, Hartmann T, Caughey GH: Mast cell tryptase is a mitogen for cultured fibroblasts. J Clin Invest. 1991, 88 (2): 493499. 10.1172/JCI115330.
 16.
Kwon M, Hanna E, Lorang D, Quick JS, He M, Adem A, Stevenson C, Chung J, Hewitt SM, Zudaire E, Esposito D, Cuttitta F, Libutti SK: Functional characterization of filamin A interacting protein 1like, a novel candidate for antivascular cancer therapy. Cancer Res. 2008, 68 (18): 73327341. 10.1158/00085472.CAN081087.
 17.
Downing JR: Can treating the SYK cell cure leukemia?. Cancer Cell. 2009, 16 (4): 270271. 10.1016/j.ccr.2009.09.020.
 18.
Wang C, Lishner M, Minden MD, McCulloch EA: The effects of leukemia inhibitory factor (LIF) on the blast stem cells of acute myeloblastic leukemia. Leukemia. 1990, 4 (8): 548552.
 19.
Heikema AP, Bergman MP, Crocker PR, Gilbert M, Samsom JN, van Wamel WJ, Endtz HP, van Belkum A: Characterization of the specific interaction between sialoadhesin and sialylated Campylobacter jejuni lipooligosaccharides. Infect Immun. 2010, 78 (7): 32373246. 10.1128/IAI.0127309.
 20.
PasonenSeppänen S, Karvinen S, Törrönen K, Hyttinen JM, Jokela T, Lammi MJ, Tammi MI, Tammi R: EGF upregulates, whereas TGFbeta downregulates, the hyaluronan synthases Has2 and Has3 in organotypic keratinocyte cultures: correlations with epidermal proliferation and differentiation. J Invest Dermatol. 2003, 120 (6): 10381044. 10.1046/j.15231747.2003.12249.x.
 21.
Mandinova A, Kolev V, Neel V, Hu B, Stonely W, Lieb J, Wu X, Colli C, Han R, Pazin MJ, Ostano P, Dummer R, Brissette JL, Dotto GP: A positive FGFR3/FOXN1 feedback loop underlies benign skin keratosis versus squamous cell carcinoma formation in humans. J Clin Invest. 2009, 119 (10): 31273137. 10.1172/JCI38543.
 22.
Zhen G, Park SW, Nguyenvu LT, Rodriguez MW, Barbeau R, Paquet AC, Erle DJ: IL13 and epidermal growth factor receptor have critical but distinct roles in epithelial cell mucin production. Am J Respir Cell Mol Biol. 2007, 36 (2): 244253.
 23.
Lai HY, Rogers DF: Mucus hypersecretion in asthma: intracellular signaling pathways as targets for pharmacotherapy. Curr Opin Allergy Clin Immunol. 2010, 10: 6776. 10.1097/ACI.0b013e328334643a.
Acknowledgements
We thank the anonymous reviewers for their constructive comments and suggestions, which improved the quality of this publication.
This article has been published as part of BMC Genomics Volume 13 Supplement 1, 2012: Selected articles from the Tenth Asia Pacific Bioinformatics Conference (APBC 2012). The full contents of the supplement are available online at http://www.biomedcentral.com/14712164/13?issue=S1.
Author information
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
KK, SI, RY, and SM designed the approach to identify the changes on regulations by the cell treatment to SAECs. KK, SI, and AF contributed to the statistical modeling for the approach, and devised the details of methodologies for estimating the proposed model. MY and NG carried out the microarray experiment for measuring time series the gene expression data on normal and treated SAECs.
Rights and permissions
About this article
Published
DOI
Keywords
 Gefitinib
 Time Series Data
 Variational Annealing
 Marginal Likelihood
 Dynamic Bayesian Network