Finding gene regulations is an important objective of systems biology [1, 2]. Causal gene regulatory interactions are widely described using gene regulatory networks. Estimating gene regulatory networks can help reveal complicated regulations.

Recently, microarray [3, 4] has rapidly produced a wealth of information about gene expression activities. The volume of data necessitates computational methods to identify and analyze the underlying gene regulatory networks [5]. A number of analytical methods have been proposed to estimate gene regulatory networks from gene expression profiles. Boolean networks, graphical Gaussian models (GGM), differential equation models, and Bayesian networks (BNs) are widely used models.

A Boolean network is a discrete dynamical network [6, 7]. In a Boolean network, the state of a gene is represented by a Boolean variable (ON or OFF) and interactions between the genes are represented by Boolean functions that determine the state of a gene on the basis of the states of certain other genes. Hence, continuous gene expression data must be transformed into binary data before a Boolean network can be estimated, and much information is lost in this binary encoding. As gene expression cannot be described adequately by only two states, Boolean networks are limited by their definition.

A GGM is an undirected probabilistic graphical model [8]. This model allows the identification of conditional independence relations among the nodes under the assumption of a multivariate Gaussian distribution of the data. In a GGM, regulations between genes are estimated by calculating the correlation between pairs of variables. Therefore, the GGM does not identify the direction of regulatory relationships between two genes, but rather only calculates the correlations between their gene expression data.

A differential equation model describes gene expression changes as a function of the expression of other genes and environmental factors [9–11]. Their flexibility allows the complex relations among components to be described. In a differential equation model, a gene regulation is described as the function of several gene expression levels. When the input data includes experimental noise, this model cannot estimate the gene regulatory network accurately. Also, if there is not sufficient data input, overfitting occurs.

BN is a graphical model for representing probabilistic relationships among a set of random variables [12–16]. These relationships are encoded in the structure of a directed acyclic graph whose nodes are the random variables. The relationships between the variables are described by a joint probability distribution. In a BN, causal interactions between more than three genes can be estimated. BN has advantages over the above models in applications where BN deals better with the experimental noise.

Using a BN, it is hard to estimate a large-scale network because the search space grows exponentially as the number of genes increases. Therefore, overcoming this problem has been the focus of much research. The proposed solutions to this problem can be divided into three types. The first type limits the number of estimated genes. Even when estimating a large-scale network, part of the network is often attracted. The second type parallelizes the estimation by supercomputer or other high-performance computer. Effective parallelizing makes it possible to estimate large-scale networks. The third type improve the algorithm itself. These methods reduce computational time and estimate the network by a heuristic.

An example of the first type of solution is proposed by Peña *et al.* [17]. This method overcomes the problem of the user having to decide in advance which genes are included in or excluded from the learning process. The method receives a seed gene *S* and a positive integer *R* from the user, and returns a BN. It starts the BN from *S* genes, then adds the parents and children of all the genes in the BN *R + 1* times, and prunes some genes. In this way, the user avoids deciding in advance which genes to include.

A solution of the second type proposed by Tamada *et al.* [18] can estimate gene regulatory networks consisting of more than 20,000 genes from gene expression data. The method uses a supercomputer, and it is massively parallelized. It repeatedly estimates subnetworks by hill climbing in parallel for genes selected by *neighbor node sampling*. The method high-handedly overcomes the problem of the BN by using the supercomputer. Even if a supercomputer can effectively provide a large-scale network, an estimation method designed to run on a workstation is also required.

A solution of the third type for estimating gene regulatory networks was implemented by Bøttcher *et al.* [19]: the greedy hill climbing (GHC) method. By comparing networks that differ only by a single directed edge, either added, removed, or reversed, a GHC method can estimate networks of larger scale than a search of all possible networks and do so on a workstation rather than a supercomputer, thus overcoming two problems at once. However, the estimation accuracy of this method is not high, because the method tends to produce only local optimal solutions.

In this paper, we present a novel BN-based deterministic method with reduced computational time to overcome the above-mentioned problems. The proposed method can estimate a network as large-scale as those estimated by the GHC method, run on a workstation, and estimate more accurately than the GHC method. We take another approach to estimate more accurately than the GHC method. First, our method generates all the combinational subsets with three genes. Then, we estimate all possible networks for each subset using the BN method and unite the networks into a single network including all genes. This approach enables us to estimate more accurately for the same computational time than the GHC method.

In order to verify the effectiveness of the proposed method, we perform two experiments, to evaluate scalability and accuracy: i.e., one to verify the proposed method can estimate networks as large-scale as those estimated by the GHC method, and one to verify it can estimate more accurately than the GHC method. These experiments are performed using randomly sampled genes. In addition, we conduct a third experiment to confirm that our method outperforms the GHC method using real data.