Volume 11 Supplement 4
Ninth International Conference on Bioinformatics (InCoB2010): Computational Biology
Functional data analysis for identifying nonlinear models of gene regulatory networks
 Georg Summer^{1, 2} and
 Theodore J Perkins^{1, 3}Email author
DOI: 10.1186/1471216411S4S18
© Summer and Perkins; licensee BioMed Central Ltd. 2010
Published: 2 December 2010
Abstract
Background
A key problem in systems biology is estimating dynamical models of gene regulatory networks. Traditionally, this has been done using regression or other adhoc methods when the model is linear. More detailed, realistic modeling studies usually employ nonlinear dynamical models, which lead to computationally difficult parameter estimation problems. Functional data analysis methods, however, offer a means to simplify fitting by transforming the problem from one of matching modeled and observed dynamics to one of matching modeled and observed time derivatives–a regression problem, albeit a nonlinear one.
Results
We formulate a functional data analysis approach for estimating the parameters of nonlinear dynamical models and evaluate this approach on data from two real systems, the gap gene system of Drosophila melanogaster and the synthetic IRMA network, which was created expressly as a test case for genetic network inference. We also evaluate the approach on simulated data sets generated by the GeneNetWeaver program, the basis for the annual DREAM reverse engineering challenge. We assess the accuracy with which the correct regulatory relationships within the networks are extracted, and consider alternative methods of regularization for the purpose of overfitting avoidance. We also show that the computational efficiency of the functional data analysis approach, and the decomposability of the resulting regression problem, allow us to explicitly enumerate and evaluate all possible regulator combinations for every gene. This gives deeper insight into the the relevance of different regulators or regulator combinations, and lets one check for alternative regulatory explanations.
Conclusions
Functional data analysis is a powerful approach for estimating detailed nonlinear models of gene expression dynamics, allowing efficient and accurate estimation of regulatory architecture.
Background
A key problem in systems biology is estimating dynamical models of gene regulatory networks. The mathematical modeling of expression dynamics, combined with model parameter estimation, has been crucial to unraveling complex regulatory programs [1], to recognizing the robustness of the regulatory architecture of the segment polarity genes to variations in initial conditions and parametric variation [2–4], to studying mechanisms of robustness and evolution of the control of the cell cycle in yeast [5, 6], to identifying surprising shifts in the expression domains of the gap genes and the regulatory interactions responsible [7, 8], and to numerous other studies (e.g., [9–16]).
where x is the vector of expression levels of the N genes, and f produces a vector of time derivatives of expression depending on the current expression levels and on some adjustable parameters θ. These parameters typically encode such features as kinetic rates for the production and decay of gene products, and regulatory influences between the genes. The regulatory architecture of the system–that is, which genes' expression derivatives depend on which other genes' expression–may be made explicit in the function f (e.g., [2]), or it may be implicit in the parameters (e.g., [7, 8]), in which case optimizing the parameters implicitly determines network architecture. Extensions of our work to modeling both mRNA and protein levels of expression, for example, are straightforward, as would be extensions to functions f that depend on time or to delay differential equations, where the derivatives depend on the state of the system in the past. We will also assume that the expression data is collected from the wild type network, though initial conditions may vary. Knockout or overexpression data has also proven useful in genetic network inference, both in theory [17] and in practice [18]. However, wildtype data is far more common and easier to generate than genetic perturbation data.
where x(t_{ i }) denotes the solution to the ODE (Eq. 1) with parameters θ. The initial conditions are often taken from the observed data, x(t_{0}) = y(t_{0}), or they may be part of the parameters θ. Even when the ODE model is linear, so that for some matrix A, this optimization is not trivial. The solution to such an ODE is given by the matrix exponential x(t) = e^{ At }x(t_{0}), so that the dependence of the error on the parameters (A) is not straightforward. Still, linear differential equation models have been fit efficiently to expression data by various means, most prominently by recasting the problem into other more convenient forms [9, 19, 20]. When f is nonlinear, as is typically the case when trying to make more detailed models of network dynamics, then solving the minimization (Eq. 2) is all the more difficult.
There is another major approach to fitting ODE models, however, via functional data analysis (FDA) [21]. The fundamental idea of FDA is to transform a data series (e.g., the time series y(t_{ i })) into a continuous function (ŷ(t)). This transformation often involves "denoising" the data, using smoothing splines or some other basis function approximation. Various estimation problems can then be solved in terms of these functions. FDA approaches have made some inroads in the literature on gene expression analysis. It has particularly appeared in papers on dimensionality reduction, clustering or classification for microarray expression timeseries data (e.g., [22–24]). More relevantly to the present paper, several works have proposed estimating linear dynamical models from (microarray) expression data [25, 26]. As will be explained in greater detail below, this approach to dynamical modeling allows the estimation problem to be reduced to one of regression, which carries both statistical and computational advantages.
Here, denotes the element of the vector corresponding to gene g, and similarly f_{ g } denotes the element of f pertaining to gene g. Thus, if there are N genes and, say, K free parameters per gene, the fitting problem is reduced from a single nonlinear N × Kdimensional problem to a set of N independent Kdimensional optimization problems. Such a reduction in dimensionality is typically very favorable when solving nonlinear optimization problems. Finally, although the derivativebased error criterion is still a nonlinear optimization problem for arbitrary dynamics functions f, informally, the optimization tends to be "less nonlinear" than for the trajectorybased error. In part, this is because the error involves only the evaluation of the dynamics function rather than solutions to the dynamics equation. Typically, f is not taken to be anything more complicated than a generalized linear model [28], so that minimizing derivatedbased error is a generalizedlinear least squares regression problem–a type of problem routinely solved in statistical analyses.
Despite the potential advantages of the FDA approach, we believe it has not been seriously evaluated on the problem of estimating nonlinear models of gene expression dynamics. In particular, neither its efficiency nor its ability to correctly estimate regulatory network architecture have been evaluated. Here, we formulate and test FDA approaches on data from two different real networks, the gap gene system of Drosophila melanogaster [7, 8] and the synthetic IRMA network [29], and on simulated data generated by the GeneNetWeaver program [30]. We show that the FDA approach is extremely efficient at fitting nonlinear dynamical models of these data sets. In fact, it is so efficient that we can explicitly enumerate and test all possible regulatory architectures, which, to our knowledge, has never been achievable before for this type of modeling. These enumerations clarify the key regulatory factors, as well as interactions between factors, that explain the observed expression dynamics. We also assess the accuracy with which regulatory relationships are correctly extracted from the data, and compare it to other stateoftheart fitting approaches. In general, the approach seems as successful as any other at determining which genes regulate which, and is very successful at discriminating the types of regulatory interactions–activation or repression.
Results and discussion
Systems and data
We apply FDA methods for fitting differential equation models of data from two real gene networks and simulated data from a set of in silico systems. Here we briefly describe these systems and the expression data upon which our fits are based.
The trunk gap gene system of Drosophila melanogaster
Reinitz and colleagues have made detailed measurements of the protein expression of these seven genes during development of the embryo [31] by confocal imaging of fluorescent antibodylabelled preparations. We use a data set that includes measurements at 8 different times spanning approximately one hour of actual time, and covering the trunk region of the embryo at a resolution of 1% of embryo length. This includes 7 genes × 8 times × 58 space points = 3248 total expression measurements. The data for the Hb gene are shown in Figure 1B.
A synthetic gene network in yeast
Cantone et al.[29] reported on the construction and mRNA expression measurement of a synthetic gene network created in yeast called IRMA (for in vivo benchmarking of reverseengineering and modeling approaches). The network was constructed from five genes: SWI5, CBF1, GAL4, GAL80 and ASH1. Each gene was given a known and wellcharacterized promoter responsive to one or more of the other genes in the network, as shown in Figure 1C. In their paper, Cantone et al. describe GAL80 as repressing GAL4, but this is via their natural protein interaction properties. At the mRNA level, GAL80 does not affect GAL4, and so the effect of GAL80 is seen only at the target of GAL4 protein, which is the promoter of SWI5. Hence, our canonical model has GAL80 repressing SWI5. The endogenous transcription factors were deleted from the organism, to limit the impact of external factors. The dependence of GAL80/GAL4 binding on galactose is used as an on/off switch for the network. In the presence of galactose, the SWI5 gene is activated to subsequently trigger expression in the rest of the network. We use five different timeseries that they generated by switching the network on using galactose and measuring the mRNA levels every 20 min over a five hour interval by quantitative realtime RTPCR (see Figure 1D).
Test problems generated by GeneNetWeaver
GeneNetWeaver [30] can be used to generate in silico datasets of the expression dynamics of gene networks, and has been the basis for part of the DREAM network reverse engineering challenge for several years running [32]. The tool allows one to generate data from estimated yeast or E. coli networks, or subnetworks thereof. The program generates a kinetic model of gene expression, and can output timeseries or steady state data for wildtype and genetic perturbation conditions. We used GeneNetWeaver to generate four in silico networks, two of which are sparsely connected like the IRMA network, which we denote S1 and S2 (see Figure 1E for an example), and two of which are more densely connected like the gap gene network, which we denote D1 and D2. We generated 20 wildtype expression time series for each network as the basis for model estimation (see Figure 1F).
Unconstrained modelfitting by FDA
We smoothed and transformed the time series into continuous functions of time using the cubic spline functions built into the Matlab programming language (see Methods for details). For each of the data sets, this results in a set of functions ŷ^{ i }(t), where the superscript i indicates it is the i^{ th } such time series–one of 58 for the Drosophila data, corresponding to each space point, one of 5 for the Cantone data, and one of 20 for each GeneNet Weaver network. With the cubic spline representation, the temporal derivatives can be directly obtained from the spline coefficients, so that is readily computed.
where R_{ g } is the maximum rate of production of gene g's protein or mRNA, is a sigmoidal function ranging between zero and one, T_{ gg }′ is the regulatory weight describing the effect of gene g′ on the production of g's protein or mRNA, h_{ g } is a bias term, and λ_{g} is the decay rate.
where E_{0} is the original error function of Equation 6 and c is a parameter determining the relative import of fitting the data accurately and using "small" weights. The L_{1} penalty is often used in an attempt to eliminate excess parameters in regression problems. If one is only concerned about prediction accuracy, and if one has statistically independent data points, then crossvalidation can be used to choose a value of c that appropriately trades off model complexity and model accuracy on the training data. In our case, the data come from time series, so derivative estimates at different times are certainly not statistically independent. Nor is our primary concern the accuracy of the regression model. This is only a conduit to determining regulatory architecture. Thus, we experimented with a range of c values, as described in more detail below. Regulatory weights that remain nonzero for large values of c are the most important for explaining derivatives, and we give these the highest "confidence".
For the IRMA and GeneNetWeaver data sets, we fit models without autoregulatory links, as these systems do not include autoregulation. For the gap gene system, however, where autoregulation is believed to occur, we allowed autoregulatory links in the model. Three of the Drosophila genes and some of the genes in the GeneNet Weaver networks do not have any regulatory inputs–at least, not among the genes considered. We did not model these genes, restricting our modeling efforts (and accuracy assessments) to those genes that are regulated.
Results on the Drosophila data
The gap gene network is densely connected, with 22 of the 28 possible links present in the gold standard model. Adding regularization to the optimization criterion risks eliminating true positives. Nevertheless, we tried optimizing the L_{1}regularized error function E_{1} for regularization constant c ranging from 0 to 10 in increments of 0.1. The results are summarized in Figure 2C. As one would hope, increasing c increases the number of true negatives from 1 (at c = 0) to 4 (at c = 10). At the same time, however, the number of true positives, and total correct links, drops drastically. The positive predictive value does not improve with c, as both true positives as well as false positives are dropped from the model (data not shown).
Results on the IRMA data
Figure 2D shows the network obtained by optimization of the E_{0} criterion for the IRMA data. The IRMA network is sparse compared to the gap gene network, having only seven links among the five genes. The optimization correctly identifies six of those links, including their correct sign. It misses only the activation of GAL4 by CBF1, perhaps because the model also has the true regulators of CBF1 connected to GAL4a case of mistaking direct versus indirect regulation. Without regularization, however, there are many false positive links in the estimated regulatory architecture. Figure 2E compares the performance of the E_{0} optimization against the TSNI algorithm, which fits a linear differential equation model that is limited to at most two inputs per gene. This algorithm performed the best of several alternatives tested by Cantone et al.[29]. The TSNI algorithm detected four of the seven true links in the network, attributing the correct sign to three of those links. Its overall fraction of correct links is much higher than that for the FDA fit, perhaps in part because the limitation to two inputs per gene ensures many true negatives–links absent in both the gold standard and the estimated model. (In the next section, we will see how FDA performs when limited to two inputs per gene.) Because of the large number of false positives in the FDA fit, it also has significantly lower PPV than TSNI. However, the FDA fit enjoys greater sensitivity and correct sign fraction.
Because the unregularized fit includes a large number of false positives, we hoped that adding the L_{1}regularization would improve the accuracy of the estimated network architecture. Figure 2F shows the results for regularization constant c ranging between 0 and 10. Regularization was partly successful. For c ranging from roughly 3 to 5, one of the true positives was lost, but five false positives were also trimmed away, approximately halving their number. This still left seven false positives, however, which could only be eliminated by losing most of the true positives. From both our experience and the results of Cantone et al.[29], the IRMA dataset appears much more challenging than the gap gene data, perhaps due to the much smaller number of time series (only 5, compared to 58) or to noise in the data (the gap gene data incorporates significant smoothing and averaging across embryos, to eliminate observation noise and other sources of variability).
Results on the GeneNetWeaver data
Broadly speaking, our results on the two sparse GeneNetWeaver networks mimicked our results on IRMA, and our results on the two dense GeneNetWeaver networks mimicked our results on the gap gene network. Figure 2GJ show the estimated network structures. For the sparse networks all (S2) or nearly all (S1) true links are detected and all are correctly signed. However, there are significant numbers of false positives–albeit less than in the IRMA fit. Conversely, the estimates for the dense networks include no (D1) or few (D2) false positives, but miss out identifying some true links. One particularly interesting case is gene G1 in network D1. This gene has five regulators, all of which act positively, and only two of which are identified by the FDA fit. The other regulators are difficult to detect because the gene is nearly always being activated, and so intuitively it appears almost as if it is unregulated–it is rare to observe the gene in a state that reveals anything about its regulation. The basic trends in true and false positives are reflected in the summary statistics shown in Figure 2K. PPV is moderate for the sparse network and high for the dense networks, while sensitivity is the opposite. One difference in comparision with the gap gene and IRMA results is that FDA obtains higher fractions of correctly signed links (activation, repression or no effect) on the sparse networks than on the dense networks. In all cases performance is significantly better than chance, which would only be right 1/3 of the time.
For the two sparse networks, where false positives were a concern, we evaluated the L_{1}regularization approach to improving accuracy. The results are shown in Figure 2L. For both networks, regularization was able to eliminate the majority of the false positives, with little loss in true positives. For S1, the number of correct links (Corr) reached as high as 33 out of 36 links for regularization constant c around 4 or 5. For S2, the number of correct links was as high as 24 out of 30.
Explicit enumeration of possible network structures
As mentioned above, the FDA approach to model fitting is computationally efficient. Part of its speed is due simply to the greater ease of evaluating the derivativebased error (Eq. 3) as opposed to the more traditional trajectorybased error (Eq. 2). We tested this in Matlab, comparing our implementation of the derivativebased error against a trajectorybased error function that uses the builtin ode45 function to solve the dynamics equation. Over a range of testing conditions, we found that the derivativebased error could be computed 300 ±40 times faster than trajectorybased error.
One of the advantages of the speed with which the FDA fits can be done is that we do not need to limit ourselves to unconstrained network architectures. We can explicitly test alternative architectures and, in fact, we are able to enumerate them all if the number of genes in the network is not too large. For the gap gene network, where all seven of the measured genes can act as input to any of the gap genes, there are 2^{7} = 128 possible input combinations for any gene. Because each gene's model is fit independently, we can test all possible regulatory architectures with a total of 4 × 2^{7} = 512 fits. This begins to be a significant computation, but on a 32core computing cluster, it amounted to an overnight job. By enumerating all possible inputs for every gene, we are able to explicitly assess which regulators or combinations of regulators are most important for explaining each gene's observed expression. Enumeration also gives us another way to regularize the fit, by limiting the number of inputs per gene.
Figure 3DI give summary statistics for the accuracy of the regulatory networks when limited to the best singleinput, twoinput, etc. models of each gene. By necessity, the fraction of correct links (CF) is low for the dense networks when only one or a few inputs are allowed, but increases as more inputs are allowed. The reverse happened for the IRMA network, where limiting to one or two inputs gave much better CF and PPV scores, though still not quite as high as achieved by TSNI. Interestingly, for the sparse networks S1 and S2, CF did not vary significantly with the number of regulators allowed. When extra regulators were allowed, the optimization could often drive superfluous weights to near zero, so that they fell below our significance threshold and were counted as zero links.
Conclusions
Our computational studies show that functional data analysis is a powerful approach to estimating nonlinear models of gene expression dynamics, and in particular, to estimating the regulatory relationships between genes. The accuracy of FDA was comparable to state of the art approaches on both the gap gene [7, 8] and IRMA [29] data sets, which have previously been analyzed by a number of methods, and it performed well on synthetic data sets generated by the GeneNetWeaver program [30]. The FDA approach is computationally efficient because it transforms the estimation problem into a decomposable multiple regression problem. This efficiency enables indepth analysis of the influences of different factors, as well as explicit exploration of all possible regulatory input combinations.
As with any estimation problem, overfittingavoidance is an important consideration. We explored L_{1}regularization as well as explicitly limiting the number of regulators allowed for each gene.
L_{1}regularization was partly successful on the sparse IRMA network, and much more successful on the sparse GeneNetWeaver networks. L_{1}regularization requires a constant, c, which determines the relative importance of accuracy of fit to the data versus model complexity (in the sense of summed absolute values of the regulatory weights). Testing a range of values for c allows us to identify links that are most important for accounting for the data. Large c means regulatory weights are highly penalized. Weights that remain significantly different from zero at large c are the best predictors, and thus represent the links in which we have the most confidence. Although there is no standard procedure for choosing a "best" value of c for this sort of data and fitting problem, empirically a value around c = 4 or c = 5 resulted in the highest accuracy of network reconstruction. That a common value worked for all networks is, no doubt, partly due to the similar scales of the expression data (after normalization) and the similar numbers of candidate regulators per gene. Still, it is surprising that common values of c emerged despite quite different numbers of time series for each network and different densities of regulatory links in the networks.
Explicitly evaluating all possible combinations of regulators allows one to see which combinations are the best predictors. In particular, this allows one to identify the best 1input model of each gene, the best 2input model, and so on. So, it provides another means for determining which candidate regulators are most important. At the same time, it reveals whether there are alternative solutions of nearly equal quality, and generally gives a more in depth view of the contributions of different regulators, especially when used in conjunction with visualizations methods, as shown in the Results section.
The approach that we have described for using FDA to estimate nonlinear differential equation models of gene expression dynamics can be extended in various ways. One important extension would be to accommodate genetic perturbation data, such as knockouts, knockdowns or overexpression conditions. In the case of a complete knockout, this is readily handled by hardwiring expression of the knockedout gene to zero in the model and otherwise fitting the data as usual. However, for partial knockdowns or overexpressions of unknown or timevarying magnitudes, more sophisticated procedures are needed. Another relevant extension would be to allow for delays in the differential equations. Cantone et al., for example, suggest that delays may be relevant to modeling their system [29]. For known delays, the FDA framework extends trivially to accommodate delay differential equations, but when delays are unknown, more elaborate extensions are needed [21]. Finally, another natural generalization to explore would be to Bayesian parameter estimation frameworks. Because FDA reduces the parameter estimation problem to one of nonlinear regression, standard methods for approximate computation of posteriors over parameters in nonlinear regression could be applied [34]. Alternatively, if one is interested in Markovchain Monte Carlo [35] or reversiblejump Markovchain Monte Carlo [36] approaches to Bayesian parameter and/or network structure estimation for genetic networks, then the efficiency of evaluating the data likelihood under an FDA model, and the decomposition of the problem into separate genes, should be of great advantage.
Methods
Data smoothing
To obtain the temporal derivatives of the time series data, it is necessary to obtain a functional representation of the data. We constructed continuoustime series by interpolating the data with cubic splines, as implemented in the Matlab Spline Toolbox. This toolbox also includes a function to compute the derivatives from the spline. Cubic splines are not wholly defined by the data, but also depend on assumptions at or near the boundaries–in our case, the start of the time series and the end of the time series. The default approach taken by the Matlab's spline function is to use the "notaknot" assumption, which states that the third derivative of the spline function should be continuous at the second knot point and the nexttolast knot point [37]. Matlab offers other approaches for completing cubic splines. In pilot studies, we tried the default (notaknot) approach, natural cubic splines (which have second derivatives equal to zero at the endpoints; Matlab calls this the "variational" approach), and Matlab's "complete" approach (which sets first derivatives at the endpoints based on an estimate from the function values at the nearest four knots). We found that these different methods for completing the cubic splines had only small effects on the interpolated curves and negligible effects on parameter estimates for our models. So, throughout this paper we used the default notaknot approach.
For the IRMA and GeneNetWeaver data sets, we also experimented with smoothing the data first, using the smooth function of Matlab, but this did not affect results significantly. For the Drosophila data, we smoothed/simplified the data by eliminating certain time and space points before interpolating with cubic splines. The space and time points that we eliminated were selected by an evolutionary strategy that sought to minimize a criterion that combined the squared error between the original data and the cubicsplined interpolation at the same point and a penalty for small fluctuations in the derivative.
Fitting details
Minimization of the E_{0} or E_{1} criteria was done by the Matlab function fmincon. For each optimization, we did 1000 runs from different randomized starting conditions, initializing parameters uniformly within their allowed intervals. For Drosophila the expression data ranged between 0 and 255, regulatory weights were constrained to [–0.1, 0.1], production rates to [0, 25], and decay rates to [0,10]. The bias term was fixed at –3.5, following previous work [8]. For the IRMA and GeneNetWeaver networks, the expression data was multiplied by 100, so that it fell in the range [0, 100]. Weights were constrained to [–0.2,0.2], production terms to [0, 25], decay terms to [0,10] and bias terms to [–25, 25]. Of the 1000 runs, the one resulting in the lowest error was reported as the solution. (Typically many runs found solutions with nearly the same weights and nearly the same error. We never observed two clearly distinct solutions of equal or near equal quality.) Weights less than 0.006 in magnitude were considered zero, and were otherwise considered positive or negative, depending on their sign. For the enumerations, we used the same fitting procedure except that links that were not part of the optimization were contrained to be zero.
GeneNetWeaver
For the GeneNetWeaver [30] experiments, we used the program 30 times to generate networks of seven genes. From these, we chose the two sparse networks and the two dense networks to generate data. The data were produced according to the rules of the DREAM5 contest, except without noise. Numerous papers have addressed the removal of noise from this data as part of inference (e.g., [32, 38]). The program generates time series by applying a perturbation from the steady state for a period of time T, and then removing the perturbation and letting the network relax back towards the steady state for an equal period of time. We used the second half of the time series, as it describes the wildtype (unperturbed) behavior of the network.
List of abbreviations used
 ODE:

= Ordinary differential equation
 FDA:

= Functional data analysis
 CSF:

= Correct sign fraction
 PPV:

= Positive predictive value
 IRMA:

= A synthetic gene network created in yeast and reported by Cantone et al.[29]
 TSNI:

= A fitting algorithm identified as the bestperforming among several alternatives investigated by Cantone et al.[29] on the IRMA data
 DREAM:

= An annual contest on reverse engineering
Declarations
Acknowledgements
We thank Gareth Palidwor for computing and Matlab support. This work was funded in part by grants from the National Sciences and Engineering Research Council of Canada, and the Ottawa Hospital Research Institute. GS was supported in part by a training grant from the Ontario Ministry of Research and Innovation, through its Ontario Research Fund  Research Excellence program.
This article has been published as part of BMC Genomics Volume 11 Supplement 4, 2010: Ninth International Conference on Bioinformatics (InCoB2010): Computational Biology. The full contents of the supplement are available online at http://www.biomedcentral.com/14712164/11?issue=S4.
Authors’ Affiliations
References
 Yuh CH, Bolouri H, Davidson EH: Genomic cisregulatory logic: experimental and computational analysis of a sea urchin gene. Science. 1998, 279: 18961902. 10.1126/science.279.5358.1896.View ArticlePubMed
 Von Dassow G, Meir E, Munro EM, Odell GM: The segment polarity network is a robust developmental module. Nature. 2000, 406: 188192. 10.1038/35018085.View ArticlePubMed
 Albert R, Othmer HG: The topology of the regulatory interactions predicts the expression pattern of the segment polarity genes in Drosophila melanogaster. Journal of Theoretical Biology. 2003, 223: 118. 10.1016/S00225193(03)000353.View ArticlePubMed
 Ma W, Lai L, Ouyang Q, Tang C: Robustness and modular design of the Drosophila segment polarity network. Molecular Systems Biology. 2006, 2: 10.1038/msb4100111.
 Li F, Long T, Lu Y, Ouyang Q, Tang C: The yeast cellcycle network is robustly designed. Proc. Natl. Acad. Sci. U.S.A. 2004, 101 (14): 478110.1073/pnas.0305937101.PubMed CentralView ArticlePubMed
 Fauré A, Thieffry D: Logical modelling of cell cycle control in eukaryotes: a comparative study. Mol. BioSyst. 2009, 5: 15691581. 10.1039/b907562n.View ArticlePubMed
 Jaeger J, Surkova S, Blagov M, Janssens H, Kosman D, Kozlov KN, Manu , Myasnikova E, VanarioAlonso CE, Samsonova M, Sharp DH, Reinitz J: Dynamic control of positional information in the early Drosophila embryo. Nature. 2004, 430: 368371. 10.1038/nature02678.View ArticlePubMed
 Jaeger J, Blagov M, Kosman D, Kozlov KN, Manu , Myasnikova E, Surkova S, VanarioAlonso CE, Samsonova M, Sharp DH, Reinitz J: Dynamical analysis of regulatory interactions in the gap gene system of Drosophila melanogaster. Genetics. 2004, 167: 17211737. 10.1534/genetics.104.027334.PubMed CentralView ArticlePubMed
 D'Haeseleer P, Wen X, Fuhrman S, Somogyi R: Linear Modeling of mRNA Expression Levels During CNS Development and Injury. In Proceedings of the Pacific Symposium on Biocomputing. 1999, 4152.
 De Hoon M, Imoto S, Kobayashi K, Ogasawara N, Miyano S: Inferring gene regulatory networks from timeordered gene expression data of Bacillus subtilis using differential equations. Pacific Symposium on Biocomputing 2003: Kauai, Hawaii, 37 January 2003. 2002, World Scientific Pub Co Inc, 17full_text.View Article
 Janssens H, Hou S, Jaeger J, Kim A, Myasnikova E, Sharp D, Reinitz J: Quantitative and predictive model of transcriptional control of the Drosophila melanogaster even skipped gene. Nature Genetics. 2006, 36: 11591165. 10.1038/ng1886.View Article
 Jaeger J, Sharp DH, Reinitz J: Known maternal gradients are not sufficient for the establishment of gap domains in Drosophila melanogaster. Mechanisms of Development. 2007, 124: 108128. 10.1016/j.mod.2006.11.001.View ArticlePubMed
 Nahmad M, Glass L, Abouheif E: The dynamics of developmental system drift in the gene network underlying wing polyphenism in ants: a mathematical model. Evolution & Development. 2008, 10 (3): 360374. 10.1111/j.1525142X.2008.00244.x.View Article
 Mendoza L, AlvarezBuylla ER: Dynamics of the Genetic Regulatory Network for Arabidopsis thaliana Flower Morphogenesis. Journal of Theoretical Biology. 1998, 193: 307319. 10.1006/jtbi.1998.0701.View ArticlePubMed
 Huang S, Eichler G, BarYam Y, Ingber D: Cell fates as highdimensional attractor states of a complex gene regulatory network. Physical review letters. 2005, 94 (12): 12870110.1103/PhysRevLett.94.128701.View ArticlePubMed
 Huang S, Guo Y, May G, Enver T: Bifurcation dynamics in lineagecommitment in bipotent progenitor cells. Developmental Biology. 2007, 305 (2): 695713. 10.1016/j.ydbio.2007.02.036.View ArticlePubMed
 Akutsu T, Kuhara S, Maruyama O, Miyano S: Identification of Gene Regulatory Networks by Strategic Gene Disruptions and Gene Overexpressions. In Proceedings of the Ninth A CMSIAM Symposium on Discrete Algorithms. 1998, 695702.
 Avery L, Wasserman S: Ordering gene function: the interpretation of epistasis in regulatory hierarchies. Trends in Genetics. 1992, 8 (9): 312316. 10.1016/01689525(92)902634.PubMed CentralView ArticlePubMed
 Yeung MKS, Tegner J, Collins JJ: Reverse engineering gene networks using singular value decomposition and robust regression. Proceedings of the National Academy of Sciences of the USA. 2002, 99: 61636168. 10.1073/pnas.092576199.PubMed CentralView ArticlePubMed
 Bansal M, di Bernardo D: Inference of gene networks from temporal gene expression profiles. IET Syst. Biol. 2007, 1 (5): 306312. 10.1049/ietsyb:20060079.View ArticlePubMed
 Ramsay JO, Silverman BW: Functional Data Analysis. 1997, SpringerView Article
 Yao F, Lee T: Penalized spline models for functional principal component analysis. Royal Statistical Society Series B Statistical Methodology. 2006, 68: 310.1111/j.14679868.2005.00530.x.View Article
 Leng X, Muller H: Classification using functional data analysis for temporal gene expression data. Bioinformatics. 2006, 22: 6810.1093/bioinformatics/bti742.View ArticlePubMed
 Song J, Lee H, Morris J, Kang S: Clustering of timecourse gene expression data using functional data analysis. Computational biology and chemistry. 2007, 31 (4): 265274. 10.1016/j.compbiolchem.2007.05.006.PubMed CentralView ArticlePubMed
 Ando T, Imoto S, Miyano S: Functional data analysis of the dynamics of gene regulatory networks. 2005, Springer, 6983.
 OpgenRhein R, Strimmer K: Inferring gene dependency networks from genomic longitudinal data: a functional data approach. REVSTATStatistical Journal. 2006, 4: 5365.
 Ramsay J, Hooker G, Campbell D, Cao J: Parameter estimation for differential equations: a generalized smoothing approach. Journal of the Royal Statistical SocietySeries B. 2007, 69 (5): 741796. 10.1111/j.14679868.2007.00610.x.View Article
 Hastie TJ, Tibshirani RJ: Generalized Additive Models. 1990, Chapman & Hall/CRC
 Cantone I, Marucci L, Iorio F, Ricci MA, Belcastro V, Bansal M, Santini S, di Bernardo M, di Bernardo D, Cosma MP: A Yeast Synthetic Network for In Vivo Assessment of ReverseEngineering and Modeling Approaches. Cell. 2009, 137: 172181. 10.1016/j.cell.2009.01.055.View ArticlePubMed
 Marbach D, Schaffter T, Mattiussi C, Floreano D: Generating realistic in silico gene networks for performance assessment of reverse engineering methods. Journal of Computational Biology. 2009, 16 (2): 229239. 10.1089/cmb.2008.09TT.View ArticlePubMed
 Poustelnikova E, Pisarev A, Blagov M, Samsonova M, Reinitz J: A database for management of gene expression data in situ. Bioinformatics. 2004, 20: 22122221. 10.1093/bioinformatics/bth222.View ArticlePubMed
 Marbach D, Prill R, Schaffter T, Mattiussi C, Floreano D, Stolovitzky G: Revealing strengths and weaknesses of methods for gene network inference. Proceedings of the National Academy of Sciences. 2010, 107 (14): 628610.1073/pnas.0913357107.View Article
 Perkins TJ, Jaeger J, Reinitz J, Glass L: Reverse Engineering the Gap Gene Network of Drosophila melanogaster. PLoS Computational Biology. 2006, 2 (5): e5110.1371/journal.pcbi.0020051.PubMed CentralView ArticlePubMed
 Bishop CM: Pattern Recognition and Machine Learning. 2007, Springer
 Doucet A, de Freitas N, Gordon NJ: Sequential Monte Carlo Methods in Practice. 2001, SpringerVerlagView Article
 Green P: Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika. 1998, 82: 711732. 10.1093/biomet/82.4.711.View Article
 De Boor C: A practical guide to splines. 2001, Springer Verlag
 Yip K, Alexander R, Yan K, Gerstein M: Improved reconstruction of in silico gene regulatory networks by integrating knockout and perturbation data. PLoS ONE. 2010
Copyright
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.