Anomaly detection in gene expression via stochastic models of gene regulatory networks
 Haseong Kim^{1} and
 Erol Gelenbe^{1}Email author
https://doi.org/10.1186/1471216410S3S26
© Kim and Gelenbe; licensee BioMed Central Ltd. 2009
Published: 3 December 2009
Abstract
Background
The steadystate behaviour of gene regulatory networks (GRNs) can provide crucial evidence for detecting diseasecausing genes. However, monitoring the dynamics of GRNs is particularly difficult because biological data only reflects a snapshot of the dynamical behaviour of the living organism. Also most GRN data and methods are used to provide limited structural inferences.
Results
In this study, the theory of stochastic GRNs, derived from GNetworks, is applied to GRNs in order to monitor their steadystate behaviours. This approach is applied to a simulation dataset which is generated by using the stochastic gene expression model, and observe that the GNetwork properly detects the abnormally expressed genes in the simulation study. In the analysis of real data concerning the cell cycle microarray of budding yeast, our approach finds that the steadystate probability of CLB2 is lower than that of other agents, while most of the genes have similar steadystate probabilities. These results lead to the conclusion that the key regulatory genes of the cell cycle can be expressed in the absence of CLB type cyclines, which was also the conclusion of the original microarray experiment study.
Conclusion
Gnetworks provide an efficient way to monitor steadystate of GRNs. Our method produces more reliable results then the conventional ttest in detecting differentially expressed genes. Also Gnetworks are successfully applied to the yeast GRNs. This study will be the base of further GRN dynamics studies cooperated with conventional GRN inference algorithms.
Keywords
Background
Identifying the key features and dynamics of gene regulatory networks (GRNs) is an important step towards understanding behaviours of biological systems. Thanks to the development of highthroughput technology, the amount of microarray gene expression data has greatly increased, and numerous mathematical models attempt to explain gene regulations using gene networks [1, 2]. Once a network structure is inferred, its dynamics needs to be considered. However, most methods focus on the inference of network structure which only provides a snapshot of a given dataset. Probabilistic Boolean Networks (PBNs) represent the dynamics of GRNs [3], but PBNs are limited by the computational complexity of the related algorithms [4].
In [5], a new approach to the steadystate analysis of GRNs based on GNetwork theory [6, 7] is proposed, while GNetworks were firstly applied to GRNs with simplifying assumptions concerning gene expression in [8]. However, the GNetwork approach also exhibits specific difficulties because of the large number of parameters that are needed to compute their steadystate solution. Thus, in this study we reduce the number of model parameters on the basis of biological assumptions and focus on estimating two parameters in particular: the total input rate and steadystate probability of a gene.
A GNetwork is a probabilistic queuing network having special customers which include positive and negative "customers", signals and triggers [6, 7]. It was originally developed also as a model of stochastic neuronal networks [9] with "negative and positive signals or spikes" which represent inhibition and excitation. In terms of GRNs, a queue is a "place" in which mRNAs are stored, and an mRNA can be considered to be a "customer" of the GNetwork. The positive and negative signals are interpreted as the protein activities such as transcription factors, inducers and repressors. Note that the customers or signals of the GNetwork can be any biological molecules. However, in our study, we focus on behaviours of mRNAs because the available GRN data are usually mRNA expressions. Each queue has an input and service rates which represent a transcription and degradation processes, respectively. Our interest is to estimate the steadystate probability that a queue is busy, which corresponds to the probability that an mRNA is present, and we are also interested in the total mRNA input rate of each queue. To evaluation the accuracy of the proposed method, we generated a simple simulation dataset by using the stochastic gene expression models processed with the widely accepted Gillespie algorithm [10, 11]. We also examine a real biological dataset obtained from the cell cycle of the budding yeast [12].
Although queueing theory is a common computational tool, GNetworks are an essential departure from queueing theory; in particular conventional queues could not be possibly applied to GRNs because the notion of inhibition does not exist in queueing theory but was introduced by GNetwork theory. There are two other essential novelties in our work. First, our approach enables us to obtain the steadystate of GRNs with only polynomial computational complexity due to the product form solution of GNetworks; the computational cost due to large memory space and nonpolynomial computational complexity are basic limitations in conventional methods such as PBN. Also our method can provide more reliable measures to detect differentially expressed genes in microarray analysis (as shown in our simulation study).
Gnetworks and gene regulatory networks
where Λ_{ i }is the total input rate (sum of transcription rate, λ_{ i }and increment rate of mRNAs come from outside of system, I_{ i }), μ_{ i }is the service rate (e.g. Degradation rate of mRNAs). o(Δt) → 0 as t → 0. Let r_{ i }is representing the activity (signal process) rate of each gene i. Then 1/r_{ i }is the average time between successive interactions of gene i with other genes. If the i th gene interacts with other genes, the following events occur:

With probability P^{+} (i, j), gene i activates gene j; when this happens, K_{ i }(t) is depleted by 1 and K_{ j }(t) is increased by 1

With probability P^{} (i, j), gene i inhibits gene j; when this happens, both K_{ i }(t) and K_{ j }(t) are depleted by 1

With probability Q(i, j, l) gene i joins with gene j to act upon gene l in excitatory mode, as a result of which both K_{ i }(t) and K_{ j }(t) are reduced by 1, while K_{ l }(t) is increased by 1
the signal of gene i exits the system so K_{ i }(t) is depleted by 1
where for any subset I ⊂ 1, ..., n such that q_{ m }< 1 for each m ∈ I, and I{m_{1}, ..., m_{I}}.
Results and discussion
Simple gene regulatory networks using stochastic gene expression model
Parameters of stochastic gene expression model
Parameters  Values  References  

Transcription initiation  λ _{2}  0.0025 sec^{1}  
Translation initiation  λ _{3}  0.0612 sec^{1}  
mRNA degradation  γ _{2}  0.00578 sec^{1}  [16] 
Monomer degradation  γ _{3}  0.00077 sec^{1}  
Dimer degradation  γ _{4}  0.00057 sec^{1}  
Dimer association  k _{a 1}  0.1  [17] 
Dimer dissociation  k _{d 1}  0.01  [17] 
DNAprotein association  k _{a 2}  0.189  [18] 
DNAprotein dissociation  k _{d 2}  0.0157  [18] 
Burst size  b  10  
Accumulation time of proteins  Δt  0.1  [11] 
The transcription process in (5) follows an exponential distribution with transcription initiation rate λ_{2} [16]. The translation processes are given in (6) and include direct competition between the ribosome binding site and the RNAseE binding site which degrade the mRNAs. Thus the translation process follows a geometric distribution with probability p and busting size b = p(1  p) [13, 16]. T_{ D }is the average time interval between successive competitions, and the number of surviving mRNAs n_{2} in the population after transcription is blocked with n_{2} = n_{2,0} . This is equal to T_{ half }= (log(2)/log(p))T_{ D }[13]. Thus the translation initiation rate, λ_{3} = 1/T_{ D }, can be computed. The protein dimer association and disassociation rates are k_{a 2}and k_{d 2}, respectively, as shown in (7) and (8) [17]. We also consider the DNAprotein association and disassociation rates (k_{a 1}and k_{d 2}in (9) and (10), respectively) [18]. The degradation rate of mRNA and of proteins are obtained by using the halflife of each molecule (11) [16, 17].
Steadystate probability and total income rate of dataset showing significant pvalue of G_{ A }
Normal  Case  ttest  

q  Λ  q  Λ  q_{ C }/ q_{ N }  Λ_{ C }/Λ_{ N }  p value  
Dataset 1 50 Samples  G _{ A }  0.512  0.765  0.296  0.465  0.57  0.60  0.000 
G _{ B }  0.517  0.785  0.595  0.775  1.15  0.98  0.123  
G _{ C }  0.502  0.765  0.546  0.875  1.08  1.14  0.311  
G _{ D }  0.487  0.735  0.563  0.875  1.15  1.19  0.127  
Dataset 2 50 Samples  G _{ A }  0.445  0.675  0.369  0.565  0.82  0.83  0.202 
G _{ B }  0.423  0.615  0.556  0.765  1.31  1.24  0.016  
G _{ C }  0.472  0.675  0.432  0.675  0.91  1.00  0.439  
G _{ D }  0.510  0.755  0.525  0.755  1.02  1.00  0.661  
Dataset 3 500 Samples  G _{ A }  0.474  0.725  0.319  0.495  0.67  0.68  0.000 
G _{ B }  0.503  0.745  0.584  0.775  1.16  1.04  0.000  
G _{ C }  0.460  0.695  0.443  0.705  0.96  1.01  0.304  
G _{ D }  0.521  0.765  0.541  0.785  1.03  1.02  0.122 
Table 2 summarizes the results of the three datasets. In the case groups of Datasets 1 and 2, both the q_{ A }and Λ_{ A }have the lowest values among the four nodes while the ttest of the G_{ A }expression in Dataset 2 shows that it is not significant (pvalue = 0.202). In the small sample results (Datasets 1 and 2), our method provides consistent results with large sample analysis (Dataset 3). The ratios (case/normal) also show that the q_{ A }and Λ_{ A }, in the case group, are smaller than one while the other ratios stay around one. In Dataset 3, the pvalue of G_{ B }is significant along with that of G_{ A }because the expression of G_{ A }directly affects the expression of G_{ B }. However, G_{ B }is not the causal gene in this study. Our GNetwork analysis reveals that only G_{ A }has lower q and Λ values than other nodes including G_{ B }. All these results concur with the simulation data generated with one half of the normal transcription rate.
Modeling cell cycle gene regulatory networks in budding yeast
The activity of cyclindependent kinases (CDKs) plays an important role in controlling periodic events during cell cycle. Some studies of cell cycle with highthroughput technologies have suggested alternative regulation models of periodic transcription [20]. D. Olando et., al. [12] measured the transcription levels of cell cycle related genes with the use of Yeast 2.0 oligonucleotide array and determined the manner in which transcription factor networks contribute to CDKs and to global regulation of the cellcycle transcription process. This microarray dataset is used in our study with the cell cycle network structure of Figure 4; it consists of two groups: one group is obtained from wildtype (WT) cells and the other is from cyclinmutant (CM) cells which are disrupted for all Sphase and mitotic cyclins (mutate clb1, 2, 3, 4, 5, and 6).
Steadystate probability of the 13 genes in cell cycle GRNs
State  Cells  CLN3  WHI5  SWI4  MBP1  CLB2  YOX1  YHP1  HCM1  FKH2  NDD1  SWI5  ACE2  SIC1 

S1  WT  0.880  0.813  0.829  0.839  0.784  0.99  0.803  0.843  0.855  0.836  0.799  0.99  0.99 
CM  0.878  0.814  0.818  0.848  0.770  0.99  0.802  0.842  0.864  0.839  0.787  0.99  0.99  
C/W  0.998  1.001  0.987  1.011  0.981  1.00  0.999  0.999  1.011  1.004  0.986  1.00  1.00  
S2  WT  0.882  0.845  0.845  0.840  0.847  0.99  0.850  0.870  0.863  0.863  0.825  0.99  0.99 
CM  0.876  0.837  0.846  0.847  0.769  0.99  0.853  0.873  0.865  0.861  0.807  0.99  0.99  
C/W  0.994  0.990  1.000  1.008  0.909  1.00  1.004  1.004  1.002  0.998  0.978  1.00  1.00  
S3  WT  0.890  0.840  0.826  0.846  0.886  0.99  0.844  0.855  0.863  0.854  0.871  0.99  0.99 
CM  0.880  0.846  0.820  0.849  0.751  0.99  0.863  0.863  0.869  0.870  0.840  0.99  0.99  
C/W  0.989  1.008  0.993  1.003  0.847  1.00  1.022  1.010  1.007  1.019  0.964  1.00  1.00  
S4  WT  0.890  0.841  0.837  0.845  0.866  0.99  0.839  0.870  0.862  0.853  0.857  0.99  0.99 
CM  0.879  0.835  0.821  0.849  0.757  0.99  0.864  0.864  0.859  0.863  0.845  0.99  0.99  
C/W  0.988  0.993  0.982  1.005  0.874  1.00  1.029  0.994  0.996  1.012  0.986  1.00  1.00  
S5  WT  0.891  0.850  0.837  0.846  0.877  0.99  0.839  0.869  0.862  0.856  0.865  0.99  0.99 
CM  0.869  0.830  0.823  0.842  0.756  0.99  0.862  0.862  0.857  0.861  0.845  0.99  0.99  
C/W  0.976  0.977  0.983  0.995  0.862  1.00  1.027  0.991  0.994  1.006  0.976  1.00  1.00 
Estimated total input rate of the 13 genes in cell cycle GRNs
State  Cells  CLN3  WHI5  SWI4  MBP1  CLB2  YOX1  YHP1  HCM1  FKH2  NDD1  SWI5  ACE2  SIC1 

S1  WT  4.127  2.248  5.309  0.763  1.278  2.006  0.914  0.995  0.884  1.015  1.783  1.006  1.006 
CM  4.127  2.248  5.238  0.793  1.217  2.006  0.914  0.995  0.914  1.036  1.702  1.006  1.006  
C/W  1.000  1.000  0.987  1.040  0.953  1.000  1.000  1.000  1.034  1.020  0.955  1.000  1.000  
S2  WT  4.187  2.339  5.521  0.763  1.430  2.006  0.995  1.036  0.854  0.995  1.945  1.006  1.006 
CM  4.187  2.309  5.521  0.793  1.187  2.006  0.995  1.036  0.854  1.056  1.743  1.006  1.006  
C/W  1.000  0.987  1.000  1.040  0.830  1.000  1.000  1.000  1.000  1.061  0.896  1.000  1.000  
S3  WT  4.187  2.339  5.379  0.763  1.551  2.006  0.995  1.015  0.884  0.955  2.187  1.006  1.006 
CM  4.187  2.339  5.379  0.793  1.127  2.006  1.036  1.036  0.884  1.096  1.824  1.006  1.006  
C/W  1.000  1.000  1.000  1.040  0.726  1.000  1.041  1.020  1.000  1.148  0.834  1.000  1.000  
S4  WT  4.187  2.339  5.450  0.763  1.490  2.006  0.975  1.036  0.854  0.955  2.106  1.006  1.006 
CM  4.187  2.309  5.379  0.793  1.157  2.006  1.036  1.036  0.854  1.076  1.864  1.006  1.006  
C/W  1.000  0.987  0.987  1.040  0.776  1.000  1.062  1.000  1.000  1.127  0.885  1.000  1.000  
S5  WT  4.187  2.369  5.450  0.763  1.521  2.006  0.975  1.036  0.854  0.955  2.147  1.006  1.006 
CM  4.127  2.278  5.379  0.793  1.157  2.006  1.036  1.036  0.854  1.076  1.864  1.006  1.006  
C/W  0.986  0.962  0.987  1.040  0.761  1.000  1.062  1.000  1.000  1.127  0.868  1.000  1.000 
Conclusion
This paper has used the GNetwork approach [5–8] to model GRNs. Two model parameters, the steadystate probability, q_{ i }, and the total input rate, Λ_{ I }, are estimated by determining the boundary of Λ_{ i }and using a grid search. We first use simulated gene expression data generated on the basis of a stochastic gene expression model. Two groups (normal and case) of expression data are examined. These two groups are exactly the same except for one parameter, the transcription initiation rate. We have observed that the GNetwork based method is able to detect the abnormally expressed genes, while the ttest produces false positives. Then, using real data, we have observed that the steadystate probability of CLB2 is lower than that of other agents and concluded that the key genes of cell cycle regulation can be expressed without the clb cyclines; this result is consistent with the original experimental study.
However, the unchanged steadystate probabilities in all the five states may need to be considered, because the cell cycle has four phases (G1, S, G2, M) and expressions of genes involved with a specific phase are expected to be different from those in other phases. Also the small decrease rate and relatively large total input rates of CLB2 may require a more careful analysis of the GNetwork approach in relation to cell cycle GRN structure. The manner in which we have used GNetwork models in this paper did not currently include simultaneous interactions with three or more nodes. However this is not really a limiting effect of the model, since it suffices to include chain representations of dependencies in the GNetwork model as has been done for neuronal networks [9] to cover excitatory and inhibitory effects that involve three or more nodes, and in fact random chains of nodes of any length. Although in this study the probabilities that gene i affect gene j, P^{+} (i, j) and P^{} (i, j) in (3), are fixed at the value one, we think that the conventional reverse engineering GRN methods using the "Ensemble" method [21] can provide these probabilities more accurately for an improved steadystate analysis of GRNs.
In conclusion, our study has illustrated the use of GNetworks as a new approach for the steadystate analysis of GRNs, and has shown their usefulness in obtaining quantities such as the effective transcription rate and the steadystate probabilities, using them to detect differentially expressed genes, thus introducing a new approach which differs from more conventional microarray analysis methods. Future research will investigate the ensemble approaches to GRNs [21] based on the GNetwork methodology in [5], which will allow to infer GRN structures, and also to monitor their steadystate behaviour.
Methods
In (12), the Λ_{ i }and R_{ i }is the total input (Λ_{ i }= λ_{ i }+ I_{ i }) and total output rates (R_{ i }= r_{ i }+ μ_{ i }), respectively. is a function of activation probabilities of genes which affect to gene i positively and is a function of activation probabilities of genes which affect to gene i negatively. We fix the r_{ i }as the number of out degrees of gene i and the degradation rate of mRNA, i, as a constant (Table 1) because the total output rate, R_{ i }is not our interest. Therefore, we need to estimate two parameters, the total input rate, Λ_{ i }, and the steadystate probability, q_{ i }.
where the probabilities and are one.
where x_{ ij }is the observed expression level (number of mRNAs) of i th gene at the j th observation and max(x_{ ij }) is the maximum value among all observed values of i th gene. Let Λ_{ iu }is a value of total input rate between the lower bound and the upper bound of Λ_{ i }( ). Then the steadystate probability q_{ i }can be obtained numerically by solving (12) with the and the Λ_{ iu }. Once the steadystate probability is determined, the loglikelihood of the given model can be computed by using (4) which is the same form of the likelihood of geometric distribution. It is known that the loglikelihood of geometric distribution is convex so we choose appropriate value Λ_{ i }which maximizes the loglikelihood function.
Note that the q_{ iu }is a numerical solution of (12) with initial value, , and total input rate, Λ_{ iu }. In order to compute , the initial value, in (13) is substituted by which is a numerical solution of (12) with initial value, , and total input rate, . Then the steadystate probability of gene i, q_{ i }, can be obtained by updating its value iteratively until the d^{(t)}<δ where d^{(t)}is the difference between and at t th iteration. In this study, δ is 0.0001.
Note
Other papers from the meeting have been published as part of BMC Bioinformatics Volume 10 Supplement 15, 2009: Eighth International Conference on Bioinformatics (InCoB2009): Bioinformatics, available online at http://www.biomedcentral.com/14712105/10?issue=S15.
Declarations
Acknowledgements
Some of this research has been supported by the EU FP7 DIESIS Project.
This article has been published as part of BMC Genomics Volume 10 Supplement 3, 2009: Eighth International Conference on Bioinformatics (InCoB2009): Computational Biology. The full contents of the supplement are available online at http://www.biomedcentral.com/14712164/10?issue=S3.
Authors’ Affiliations
References
 Friedman N, Linial M, Nachman I, Pe'er D: Using Bayesian networks to analyze expression data. Journal of computational biology. 2000, 7 (34): 601620. 10.1089/106652700750050961.View ArticlePubMedGoogle Scholar
 Schafer J, Strimmer K: An empirical Bayes approach to inferring largescale gene association networks. Bioinformatics. 2005, 21 (6): 754764. 10.1093/bioinformatics/bti062.View ArticlePubMedGoogle Scholar
 Shmulevich I, Gluhovsky I, Hashimoto R, Dougherty E, Zhang W: Steadystate analysis of genetic regulatory networks modelled by probabilistic Boolean networks. Comparative and Functional Genomics. 2003, 4 (6): 601608. 10.1002/cfg.342.PubMed CentralView ArticlePubMedGoogle Scholar
 Ching W, Zhang S, Ng M, Akutsu T: An approximation method for solving the steadystate probability distribution of probabilistic Boolean networks. Bioinformatics. 2007, 23 (12): 151110.1093/bioinformatics/btm142.View ArticlePubMedGoogle Scholar
 Gelenbe E: Steadystate solution of probabilistic gene regulatory networks. J Theor Biol Phys Rev E. 2007, 76: 03190310.1103/PhysRevE.76.031903.View ArticleGoogle Scholar
 Gelenbe E: Productform queueing networks with negative and positive customers. Journal of Applied Probability. 1991, 656663. 10.2307/3214499.Google Scholar
 Gelenbe E: Gnetworks with triggered customer movement. Journal of Applied Probability. 1993, 742748. 10.2307/3214781.Google Scholar
 Arazi A, BenJacob E, Yechiali U: Bridging genetic networks and queueing theory. Physica A: Statistical Mechanics and its Applications. 2004, 332: 585616. 10.1016/j.physa.2003.07.009.View ArticleGoogle Scholar
 Gelenbe E, Timotheou S: Random neural networks with synchronized interactions. Neural Computation. 2008, 20 (9): 23082324. 10.1162/neco.2008.0407509.View ArticlePubMedGoogle Scholar
 Gillespie D, et al: Exact stochastic simulation of coupled chemical reactions. The journal of physical chemistry. 1977, 81 (25): 23402361. 10.1021/j100540a008.View ArticleGoogle Scholar
 Bratsun D, Volfson D, Tsimring L, Hasty J: Delayinduced stochastic oscillations in gene regulation. Proc Natl Acad Sci U S A. 2005, 102 (41): 1459314598. 10.1073/pnas.0503858102.PubMed CentralView ArticlePubMedGoogle Scholar
 Orlando D, Lin C, Bernard A, Wang J, Socolar J, Iversen E, Hartemink A, Haase S: Global control of cellcycle transcription by coupled CDK and network oscillators. Nature. 2008, 453 (7197): 944947. 10.1038/nature06955.PubMed CentralView ArticlePubMedGoogle Scholar
 McAdams H, Arkin A: Stochastic mechanisms in gene expression. 1997Google Scholar
 Paulsson J: Models of stochastic gene expression. Physics of life reviews. 2005, 2 (2): 157175. 10.1016/j.plrev.2005.03.003.View ArticleGoogle Scholar
 Ribeiro A, Zhu R, Kauffman S: A general modeling strategy for gene regulatory networks with stochastic dynamics. Journal of Computational Biology. 2006, 13 (9): 16301639. 10.1089/cmb.2006.13.1630.View ArticlePubMedGoogle Scholar
 Thattai M, van Oudenaarden A: Intrinsic noise in gene regulatory networks. Proc Natl Acad Sci U S A. 2001, 98 (15): 86148619. 10.1073/pnas.151588598.PubMed CentralView ArticlePubMedGoogle Scholar
 Buchler N, Gerland U, Hwa T: Nonlinear protein degradation and the function of genetic circuits. Proc Natl Acad Sci U S A. 2005, 102 (27): 95599564. 10.1073/pnas.0409553102.PubMed CentralView ArticlePubMedGoogle Scholar
 Goeddel D, Yansura D, Caruthers M: Binding of synthetic lactose operator DNAs to lactose repressors. Proc Natl Acad Sci U S A. 1977, 74 (8): 32923296. 10.1073/pnas.74.8.3292.PubMed CentralView ArticlePubMedGoogle Scholar
 Bloom J, Cross F: Multiple levels of cyclin specificity in cellcycle control. Nature Reviews Molecular Cell Biology. 2007, 8 (2): 149160. 10.1038/nrm2105.View ArticlePubMedGoogle Scholar
 Fink G, Spellman P, Sherlock G, Zhang M, Iyer V, Anders K, Eisen M, Brown P, Botstein D, Futcher B: Comprehensive identification of cell cycleregulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Molecular biology of the cell. 1998, 9 (12): 32733297.View ArticleGoogle Scholar
 Kim H, Lee J, Park T: Inference of Large Scale Gene Regulatory Networks Using Regressionbased Approach. Journal of Bioinformatics and Computational Biology. 2009,Google Scholar
 Golding I, Paulsson J, Zawilski S, Cox E: Realtime kinetics of gene activity in individual bacteria. Cell. 2005, 123 (6): 10251036. 10.1016/j.cell.2005.09.031.View ArticlePubMedGoogle Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.