Inferring gene regulatory networks by thermodynamic modeling
© Chen and Zhong. 2008
Published: 16 September 2008
Skip to main content
© Chen and Zhong. 2008
Published: 16 September 2008
To date, the reconstruction of gene regulatory networks from gene expression data has primarily relied on the correlation between the expression of transcription regulators and that of target genes.
We developed a network reconstruction method based on quantities that are closely related to the biophysical properties of TF-TF interaction, TF-DNA binding and transcriptional activation and repression. The Network-Identifier method utilized a thermodynamic model for gene regulation to infer regulatory relationships from multiple time course gene expression datasets. Applied to five datasets of differentiating embryonic stem cells, Network-Identifier identified a gene regulatory network among 87 transcription regulator genes. This network suggests that Oct4, Sox2 and Klf4 indirectly repress lineage specific differentiation genes by activating transcriptional repressors of Ctbp2, Rest and Mtf2.
Transcriptional control is a key regulatory mechanism for cells to direct their destinies. A large number of transcription factors (TFs) could simultaneously bind to a regulatory sequence. With the constellation of TFs bound, the expression level of a target gene is usually determined by the combinatorial control of a number of TFs. The interactions among regulatory proteins and their regulatory sequences collectively form a regulatory network. A major challenge in the study of gene regulation is to identify the interaction relationships within a regulatory network.
A number of analytical methods have been proposed to reconstruct gene regulatory networks from gene expression and protein-DNA binding data. Association rule mining , Boolean Network , temporal models [3, 4], ARACNE  and Bayesian networks [6–8] are among the most popular routes. For example, the Module Networks approach built a probabilistic model for the gene expression correlations between regulators and target genes and iteratively searched for the most compatible partition of targets genes to their respective regulators . The correlation of gene expression patterns of regulators and the target genes is often the essential piece of information utilized by the current procedures. It is widely recognized that the statistical correlation of the regulators and the targets is often an inaccurate representation of the regulator-target relationship [10, 11]. This is because the quantity of a TF's mRNA does not necessarily correlate to its active protein concentration, and even the active protein concentration does not necessarily correlate to its transcriptional efficiency on every target gene. Using correlation, or some transformed version of correlation measure as the basis for reconstructing regulatory networks is an approximation made for convenience of modeling and analysis, with a sacrifice of making spurious findings (see examples in ). A network reconstruction method based on quantities that closely represent the biophysical properties of TF-DNA binding, transcription activation and repression is still missing.
Thermodynamic models of TF-DNA and TF-RNA polymerase (RNAP) interactions were pioneered by Buchler et al.  in prokaryotes. These models brought the stochastic interactions of TFs, regulatory sequences and RNAP into a statistical mechanics framework, and enabled a quantitative model for the transcription rate. Recently, Segal et al.  and our team  attempted to employ thermodynamic models in the study of eukaryotic gene regulation. Under a fix time point in drosophila development, Segal et al. demonstrated a thermodynamic model could predict the spatial expression patterns of segmentation genes in Drosophila . In differentiating embryonic stem cells (ESCs), we showed that the interaction types of the TFs could be predicted from the temporal response of the target gene . These successes made it tempting to experiment novel methods for reconstructing regulatory networks based on more biophysically appropriate metrics than correlation.
We describe here a computational framework, called Network-Identifier, for inferring regulatory networks from time course gene expression data. The gene expression values at each time point are supposed to be at an equilibrium state, which is a general setting for most of time course data available. Applying to the analysis of five datasets of differentiation of murine ESCs, we identified a transcription network composed of 34 TF-TF interactions and 185 TF-target relationships. Data from RNAi  and chromatin immunoprecipitation coupled with microarray (ChIP-chip) data [16, 17] independently validated a statistically highly significant fraction of these regulatory relationships.
ESCs are derived from early mammalian embryos. ESCs possess two important characteristics that distinguish their importance in scientific and medical fields. First, they are capable of self-renewal through apparently unlimited, undifferentiated proliferation in cultured cell lines [18–20]; second, they have remarkable pluripotency potentials  to give rise to many different cell types in the body, which may contribute to the study of body development and regenerative medicine.
Validation by ChIP-chip data
# of target genes
# of genes verified in ChIP
Validation by RNA interference data
# of target genes
# of genes verified in RNAi
87 regulators and target genes were reported in the ESC transcription network (Figure 1). In particular, the mutual regulation of Klf2 and Klf4 were recently shown to be an important module for maintaining the undifferentiated state of ESCs . Utf1 and Myc are known to be key ESC transcription factors. The result that they are under the control of Oct4 and Klf4 underscores the importance of Klf4 in promoting self-renewal. Mtf2 has only recently been implied to inhibit differentiation by recruiting the polycomb group of transcription repressors . This analysis indicates that Klf4 and Sox2 could synergistically activate Mtf2 in ESCs. The regulatory relationships for a number of genes involved in lineage specific differentiation were also identified. These include Gata6, Gata3, Sox17 and FoxA2. Inhibiting these lineage specific differentiation genes in ESCs is critical to maintain an undifferentiated state. Among the predicted network, there were a number of transcription repressors, including Ctpb2 and Rest. Ctpb2 was predicted to be activated by Oct4. Rest was predicted to be jointly regulated by Oct4 and Sox2. These results suggest that Oct4 and Sox2 could indirectly inhibit differentiation genes by activating transcription repressors such as Ctpb2 and Rest.
Network-Identifier is proposed to reconstruct transcription network based on biophysical models of transcription regulation. Multiple temporal gene expression datasets are used as inputs to Network-Identifier. ChIP-chip and RNAi data can also be utilized by Network-Identifier as independent validation datasets to further improve the predicted networks. Moreover, Network-Identifier has great flexibility in incorporating independent datasets other than ChIP-chip or RNAi data to reinforce the strength of validation.
It should be recognized that there are still a number of simplifications made in the modeling of the biophysical properties of gene regulation. A number of molecular events are not included in the model. These include: 1) the interactions of more than two TFs, 2) long range interaction of enhancer binding TFs and RNAP, 3) DNA methylation and 4) chromatin structure and state. Future work that takes these molecular features and events into account will potentially improve the accuracy of network reconstruction.
Thermodynamic models are based on the assumption that the level of gene expression is proportional to the equilibrium probability that RNAP binds to the promoter of interest; and these probabilities can be computed in a statistical mechanics framework. The Interaction-Identifier method follows recent efforts [12, 25, 26] to translate TF concentrations into the equilibrium probability of RNAP binding using thermodynamic models.
A TF can serve as either an activator or a repressor, or simply it does not interact with the RNAP, represented by different w TFp . These effects can be simulated by choosing appropriate w TFp . If w is set to 1, it represents that RNAP and the TF independently bind to the promoter. If w is set to 10~100, it represents that the TF helps to recruit RNAP to the promoter. The larger w is the higher the synergism is. If w is set to 0 or close to 0, it represents that the TF blocks the RNAP binding, and thus the TF is a repressor.
Under the statistical mechanics framework, similar expressions can be derived for genes with two regulatory TFs capable of binding to a promoter together with RNAP. By adjusting the interaction factors w, we can obtain an analytical form for the probability of RNAP binding under different forms of interactions among RNAP and the two TFs.
The equilibrium concentration of the transcripts of a target gene is governed by its synthesis and degradation rates. Empirical data show that mRNA synthesis rate is proportional to the equilibrium probability of RNAP binding to promoters [12, 25, 26]. Interaction-Identifier further assumes that mRNA degradation rate is proportional to its concentration, which seems to be a reasonable assumption based on recent studies [27, 28]. However the authors can always relax this assumption into more general forms at the model evaluation and model improvement stages. Empirical data on eukaryotic mRNA synthesis and degradation are available to estimate these rates [27–29]. Ordinary equations are used to implement this kinetic model.
This work is supported by National Center for Supercomputing Applications (SZ).
This article has been published as part of BMC Genomics Volume 9 Supplement 2, 2008: IEEE 7th International Conference on Bioinformatics and Bioengineering at Harvard Medical School. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2164/9?issue=S2
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.