 Research
 Open Access
 Published:
A novel algorithm for finding optimal driver nodes to target control complex networks and its applications for drug targets identification
BMC Genomics volume 19, Article number: 924 (2018)
Abstract
Background
The advances in target control of complex networks not only can offer new insights into the general control dynamics of complex systems, but also be useful for the practical application in systems biology, such as discovering new therapeutic targets for disease intervention. In many cases, e.g. drug target identification in biological networks, we usually require a target control on a subset of nodes (i.e., diseaseassociated genes) with minimum cost, and we further expect that more driver nodes consistent with a certain wellselected network nodes (i.e., priorknown drugtarget genes).
Results
Therefore, motivated by this fact, we pose and address a new and practical problem called as target control problem with objectivesguided optimization (TCO): how could we control the interested variables (or targets) of a system with the optional driver nodes by minimizing the total quantity of drivers and meantime maximizing the quantity of constrained nodes among those drivers. Here, we design an efficient algorithm (TCOA) to find the optional driver nodes for controlling targets in complex networks. We apply our TCOA to several realworld networks, and the results support that our TCOA can identify more precise driver nodes than the existing controlfucus approaches. Furthermore, we have applied TCOA to two bimolecular expertcurate networks. Source code for our TCOA is freely available from http://sysbio.sibcb.ac.cn/cb/chenlab/software.htm or https://github.com/WilfongGuo/guoweifeng.
Conclusions
In the previous theoretical research for the full control, there exists an observation and conclusion that the driver nodes tend to be lowdegree nodes. However, for target control the biological networks, we find interestingly that the driver nodes tend to be highdegree nodes, which is more consistent with the biological experimental observations. Furthermore, our results supply the novel insights into how we can efficiently target control a complex system, and especially many evidences on the practical strategic utility of TCOA to incorporate prior drug information into potential drugtarget forecasts. Thus applicably, our method paves a novel and efficient way to identify the drug targets for leading the phenotype transitions of underlying biological networks.
Backgroud
One final goal of our efforts is to control the complex systems in our daily life. In the past decades, plenty of attentions [1,2,3,4,5] have been paid into the study of the structures and dynamics of complex networked systems, especially biological systems. A frontier area of the research in network science and engineering is controlling complex networks, such as biological molecule networks. Since the law of some systems is hidden, it is difficult to study the controllability of the nonlinear systems directly, especially for the large scale biological systems. However, it is possible to obtain local analytical results for the controllability of nonlinear systems by developing control schemes of linear dynamic systems. Nearly decades of efforts on the controllability of linear dynamic networks, not only review a sufficient condition for “local controllability” of a nonlinear system about a trim point but also result in tremendous advances in our understanding of the problem of controlling complex networked dynamical systems [5,6,7,8,9]. In a recent breakthrough, an efficient algorithm with low polynomial time was provided for computing the minimal quantity of input nodes needed to control any given largescale directed network [6]. But, it was also shown that in the case of sparse inhomogeneous networks, such as most of the networks emerging from biochemical and biomedical applications, controlling the entire system is expensive. On the other hand, in terms of practical applications in many cases, it is enough to control only a certain wellselected portion of the network’s nodes, such as the set of essential genes, in order to impose a certain overall behaviour over the system. Thus, an interesting question, known as target control problem of complex networks, is posed that how can we chose the driver variables from the system to control a subset of the whole nodes (or a subsystem) about a trim point [8].
However, the traditional framework of network control can only be applicable for the simple networks, and it can not address the target control problem of the large scale of networks. To solve the problem, Wu et al. has proposed a method to solve the target control problem by constructing a weighted bipartite network [10]. But this method may fail when there does not exist a perfect matching in most cases. Meanwhile, Gao et al. proposed another method which offers an approximation on the minimum set of input nodes for target controlling the networks [8]. However, the above researches only focus on controlling the system through any minimum drivernode set and ignore the existence of multiple candidate drivernode sets for control a targeted subset of the network. When we actually expect to control the system with objectives optimization, the different drivernode sets may not participate in target control equally. This consideration prompts us to study how to find the desired solution for target controlling complex networks with objectives optimization. A practice of this consideration can come from our aim for combinatorial drug target identification: we not only consider how to control the diseaseassociated genes with the minimum driver nodes, but also expect that more driver nodes can be consistent with the set of wellknown drugtarget genes. Here, we pose a new target control problem with the objectivesguided for finding the optimal driver nodes that minimize the total quantity of drivers and also maximize the quantity of constrained nodes within the drivers.
In this paper, we develop a novel algorithm (TCOA) to identify the drivers for efficiently controlling targets in complex networks. Our algorithm consists of three steps: We first construct the target control tree of the network by finding the maximum matching in the constructed iterated bipartite graph or “linking and dynamic graph” and identify the controllable targets of each node by obtaining its reachable target nodes in the control tree; Then we find the set of optional driver nodes by using the integer linear programming to optimize a regulated factor, which is introduced to balance the quantity of driver nodes and the quantity of driver nodes within the set of constrained (or preselected) nodes; Finally we define the maximum matching of the constructed iterated bipartite graph or “linking and dynamic graph” as a Markov chain and use a Markov chain Monte Carlo (MCMC) approach to sample from the sets of all possible maximum matching. We have evaluated TCOA on several realworld networks, and the experiment results support that TCOA outperform existing controlfocus approaches. Especially, we have also applied TCOA algorithm to analyze the PPI signaling transduction networks in pancreatic cancer, and Inflammatory bowel disease network from KEGG. The results further illustrate that our TCOA can efficiently identify the driver nodes with more optional property to guarantee the system target controllable, compared to several controlfocus approaches. In addition, the experiment results on the two biological cases can also supply an efficient bioinformatics tool to identify the drug targets for leading the phenotype transitions of underlying biological networks.
Methods
Problem formulation
Since the law of the some network dynamics, such as the biological networks is hidden, it is difficult to directly study the controllability of the nonlinear networks. Most complex systems are characterized by nonlinear interactions between the components, and usually local properties can be verified [11, 12]. Thus, it is possible to obtain local analytical results for the controllability of nonlinear systems [13, 14]. Here, we review a sufficient condition for “local controllability” of a nonlinear system about a trim point. A system is “locally controllable” if there exists a neighborhood in the state space such that all initial conditions in that neighborhood are controllable to all other elements in the neighborhood with locally bounded trajectories [15]. Considering a dynamic system governed by a set of ordinary differential equations,
where the function f(x) denotes the system’s dynamics. x ∈ R^{N} and y ∈ R^{NO} represent nodes state and target (output) nodes state; The element B_{ ij } in B∈ R^{N*NC} represents whether the node v_{ i } among V = {v_{1},v_{2},…,v_{ N }} is inputed by the jth signal [16,17,18]. C∈R^{NO*N} represents the output matrix.
We are interested in how to find proper matrix B to gurantee the system (1) locally target (or output) controllable through the input u = [u(1),u(2),..,u(NC)]. Let x_{0} be defined as, f(x_{ 0 }) = 0,A(x_{ 0 })=∂f(x_{ 0 })/∂x and G(x_{ 0 }) = [CB CAB …CA^{N1}B], where f(x_{ 0 }) = 0 provides the system’s steady state, A(x_{ 0 }) represents the system’s local (linear) dynamics around a trim point x_{0} and G(x_{ 0 }) guarantees the system local target controllable. The dynamics in (1) are locally target controllable around x_{ 0 } if rank (G(x_{ 0 })) = NO [13, 14]. Therefore the local target controllability analysis of (1) about a trim point therefore reduces to the linear target controllability analysis of (2),
The dynamics in Eq. 1 are deemed “locally structurally target controllable” if the linearized dynamics in Eq. 2 are structurally target controllable. And the linearized dynamics in Eq. 2 is structurally target controllable if the follow equation is satisfied when we can choose the nonzero values in A and B,
In a given directed network with nodes V = {v_{1},v_{2},…,v_{ N }}, let O and D be the set of target nodes and the driver nodes, assuming that we expect more driver nodes could be constrained in a set Q, where both O and D and Q are the subset of V. The output matrix and the input matrix can be set as C = [I(1); I(2);…I(NO)] and B = [I(b_{1}); I(b_{2});…I(b_{ d })]; I(i) represents ith row of N*N unit matrix I, {b_{1},b_{2},…,b_{ k }} is the index of identified driver nodes set D. For the purposes of this work, the adjacency matrix of the network is used to find the structure of A in Eq. 2. Given the constrained nodes set Q, we focus on how to find a suitable driver nodes set D such that
where D denotes the quantity of nodes in the identified driver nodes set D and ‖Q ∩ D‖ represents the quantity of drivers in the constrained or preselected nodes set Q; the objective functions f_{1} and f_{2} aim to find the optional drivers with minimum quantity of drivers D and maximum quantity of drivers in the preselected nodes set Q respectively.
However, there are no existing methods to efficiently solve the problem. For example, in Fig. 1, for simple network 1 we want to control the target nodes O = {v_{3},v_{4}, v_{6},v_{7}} with the minimum the quantity of driver nodes and also expect to maximum the identified driver nodes within the constrained nodes Q = {v_{2},v_{4}}. But Liu’s approach [6] and Gao’s approach [8] fail to find the optional driver nodes set for target controlling the O = {v_{3},v_{4},v_{6},v_{7}} (see Fig. 1).To overcome the limitations of the existed approaches for the target control problem in complex networks, we develop a novel objective optimization algorithm (TCOA). The key consideration of our TCOA to solve the problem (4) is that 1) find the controllable targets of each network node without destroying target controllable of the whloe system; 2) to extract the optimal driver nodes of the network by using objectivesguided optimization with integer linear programming and Markov chain Monte Carlo (MCMC). The former is to guarantee the identified subset satisfying the constrained condition. And the latter is to guarantee the subset to be optimal for the two objectives.
The framework of our TCOA
Our TCOA presents an algorithm for detecting driver nodes that can best control a network. We adopt the paradigm of local (linear) controllability. Different from the related works, our goal is not only to minimize the quantity of drivers, but also to maximize the quantity of drivers within a given preselected subset. The consideration for our TCOA is in the adding of different and more efficient strategies to find the optimal driver nodes of the graph. The TCOA algorithm consists of three steps: i) Identifying the controllable subsystem by constructing target control tree; ii) Finding the optional drivers with the Integer Linear Programming (ILP); iii) Further optimizing the driver set by using MCMC samplings. The details of TCOA are introduced in bellows.
Step 1 Identifying target controllable subspace by constructing target control tree
Definition 1 The target control tree is defined as a multilayers network where the nodes in the bottom layer represent the target nodes and the nodes in the other layers consists of the upstream control nodes.
Note that in our previous reseach [19] we have defined the upstream control nodes as the nodes which can control the target nodes in network G(V,E).To efficiently obtain the target control tree for controlling targets in a complex network, a algorithm (greedy algorithm) is designed to construct the target control tree as is listed in Table 1. Note that in our greedy algorithm, we use the “linking and dynamic graph” to represent the iterated bipartite graph. The relationship between maximum matching in the “linking and dynamic graph” and “the target control tree” can be explained as follow: at each iterated bipartite graph among the “linking and dynamic graph”, we can obtain the maximum matching, which result in a subgraph; in the subgraph, the maximum matching determines which paired nodes can be connected in “the target control tree”. The maximum matching of a given general bipartite graph can be efficiently obtained by using HopcroftKarp algorithm [20, 21]. Since HopcroftKarp algorithm runs in O(\( \sqrt{\mid \left\mathrm{V}\right\mid } \)*E) time in the network G(V,E), the whole maximum complexity of the greedy algorithm is in O(r*\( \sqrt{\mid \left\mathrm{V}\right\mid } \)*E), where r is the iteration times when we obtain the updated bipartite graph.
Theorem (Target controllable subsystem identification theorem) In the target control tree, the target node v _{ j } among the bottom layer could be controlled by the node v _{ i } among the up layer if node v _{ j } is accessible from node v _{ i } .
This result of our theorem (the details of proof are in supplementary note 1 of Additional file 1), can allow us to find the controllable targets of node v_{ i }, denoted by TCS(v_{ i }). The term “control” means that when we act control signals on the node v_{ i }, the nodes state in TCS(v_{ i }) can be changed to any stable or unstable targte state from any initial state at a finite time.Following on our theorem, in the target control tree, we can apply the Breadth First Search algorithm (BFS) [22, 23] to obtain the controllable targets of node v_{ i },TCS(v_{ i }).The procedure of BFS is shown in Table 2, Obviously, the maximum complexity of the BFS algorithm for identifying the controllable targets of all nodes in the target control tree TCT ≡ (V_{ TCT }, E_{ TCT }), where V_{TCT} are the nodes and E_{ TCT } are the edges in TCT, is in the order of O(V_{ TCT }*E_{ TCT }) [22, 23].
On the other hand, Liu et al. determined the controllable subsystem of any node in a network via linear programming [24], while Wang et al. propose a concept called control range to identify the controllable subsystem [25]. However, the existing two methods are still not efficient to identify the target controllable subsystem. In Figs. 2 and 3 we give an intuitive explanation to explain how we find the controllable targets of each network node. As shown in Fig. 2, we want to control the state transition of {v_{3}, v_{4}, v_{6}}. By using Liu’s approach [24] and Wang’s approach [25], they both identify the controllable targets of v_{1} is {v_{3}, v_{4} }. However, by using our method, the identified target controllable subsystem of node v_{1} is {v_{3}, v_{4}, v_{6}} (see Figs. 2 and 3).
Step 2 Identifying the optional driver nodes by using integer linear programming
We first introduce an outlier measurement on a set of driver nodes that quantifies the quantity of driver nodes outside the preselected nodeset,
where x_{ i } = 1 when a node v_{ i } belongs to the driver set and ∑x_{ i } denotes the quantity of driver nodes; y_{ j } = 1 when node v_{ j } belongs to the preselected set D and ∑y_{ j } denotes the quantity of driver nodes in the preselected nodeset. To take into account both the quantity of driver nodes and the quantity of driver nodes in the preselected set, we define the weight
Note that W(M) is only one candidate measurement of the tradeoff between the quantity of driver nodes and the quantity of driver nodes in the preselected set. After obtaining the controllable targets of each network node and the weight of the driver nodeset, the optional driver nodes guaranteeing the constrained target controllable can be approximately determined by the following Integer Programming model,
In the problem (5), the function (5a) is to get the optional driver nodeset with the minimum quantity of driver nodes and the maximum quantity of driver nodes in the preselected nodes set. The constraint (5b) aims that at least one driver node could control a target node. The constraint (5c) points that the value of y_{ j } is equal to that of x_{ j } if node v_{ j } belongs to the preselected set Q. In fact, the problem (5) could be efficiently solved for thousands of variables with a LPbased classic branch and bound method [26, 27].
Step 3 Optimizing the driver nodes by MCMC samplings
Here we define the maximum matching of the “linking and dynamic graph” as a Markov chain. The state space of the Markov chain is the set of all the possible maximum matching of the “linking and dynamic graph”. In our greedy algorithm, for a given node different sets of controllable targets could be found when we obtain different maximum matchings (e.g. the red edges in the Fig. 3). That is, for different Markov chains, the different sets of driver nodes with different weight W(M) could be obtained. The optimal different Markov chains need to be sampled from the state space, so that, a Markov Chain Mont Carlo method (MCMC) [28] is used. The MCMC approach samples sets of maximum matching, with the probability of sampling a set M proportional to the weight W(M) of the set. Thus, the frequencies of the maximum matching sets in the MCMC method provides a ranking of maximum matching sets, in which the sets are ordered by decreasing sampling frequency. The advantages will prove useful in analysis of real network data below.
The basic idea of MCMC, implying on our objective optimization, is to build a Markov chain whose states are the collections of k adjoin paths connecting to the target nodes in the “linking and dynamic graph” and to define transitions between states that differ by one target node. With the MetropolisHastings algorithm [29], we sample sets of maximum matching M⊆G of k adjoin paths in the iterated bipartite graph with a stationary distribution that is proportional to exp.(c*W(M)) for some c > 0, which gives a desired stationary distribution on the state space.The advantage will prove useful in analysis of real mutation data below. The MCMC method is described as follows:
Initialization: By using greedy algorithm, obtain the initial Markov chain M_{ 0 };
Iteration: For t = 1, 2,…, obtain M_{t + 1} from M_{ t } as follows,

Choosing a path w uniformly at random in M_{t}. Then choose randomly an edge e_{k0} inside the path. Delete the edge e_{k0} in the chosen bipartite graph B_{k0} among the “linking and dynamic graph”. For the augmenting path, when we alternate unmatched and matched links (Supplementary note 2 in Additional file 1), we can obtain a new matched edge e’_{k0} in the chosen iteration k_{ 0 }, which result in a new matched path v among the “linking and dynamic graph”. Then a new Markov Chain M_{t+1} = M_{t} \ {w} ∪ {v} has been obtained.

Accepting the new Markov chain M_{t + 1} with the probability p(M_{ t },M_{ t + 1 }) = min[1,exp.(c*W(M_{ t + 1 })c*W(M_{ t })), else reject it.
We will terminate the procedure of the MCMC sampling when the value of W(M) converge within 100 iteration time units. Otherwise, the search process is terminated if the iteration time exceeds the fixed default value N_{max} = 1000. We have explored different values of c, and also use c = 10 in numerical experiments, which we found empirically to give the best tradeoff between the exploration of different sets and the convergence to sets with high weight W(M) on the simulated data and the pancreatic cancer data. The effect of parameter on the convergence of weight W(M) on the simulated datasets and two biological networks are provided in Supplementary note 3 and Supplementary note 4 of Additional file 1.
The complexity analysis of TCOA
The TCOA method contains three parts:
(i) For constructing the target control tree for the network G(V,E), we apply the developed greedy algorithm, to find the maximum matching of the “linking and dynamic graph”. In fact, the developed greedy algorithm runs in the order of \( \mathrm{O}\left({\mathrm{r}}^{\ast }{\sqrt{\left\Vert \mathrm{V}\right\Vert}}^{\ast}\left\Vert \mathrm{E}\right\Vert \right) \), where r denotes the iteration times in the iterated bipartite graph.
(ii) In phase of finding the controllable targets of all network nodes, we apply the BFS algorithm to the constructed target control tree. Therefore the maximum complexity of the BFS algorithm for finding the controllable targets of all nodes is in the order of O(V_{ TCT }*E_{ TCT }), where V_{ TCT } denotes the nodes and E_{ TCT } represents the edges in target control tree TCT (V_{ TCT } = V, E_{ TCT } < =E) .
(iii) For the phase of finding the optional driver nodes set, the optional driver nodes set is obtained by using the integer linear programming. Specifically, we used a branchandbound algorithm, an automatic method with a greedy O(log(V)) of solving discrete programming problems, as implemented by function intlinprog of the MATLAB programming language to solve our binary integerprogramming problem [30]. To find more optional solutions, MCMC sampling method has been adopted, resulting in the overall complexity of our TCOA approach \( \mathrm{O}\Big({\mathrm{m}}^{\ast}\left({\mathrm{r}}^{\ast }{\sqrt{\left\Vert \mathrm{V}\right\Vert}}^{\ast}\left\Vert \mathrm{E}\right\Vert +{\left\Vert \mathrm{V}\right\Vert}^{\ast}\left\Vert {\mathrm{E}}_{\mathrm{TCT}}\right\Vert +\log \left\Vert \mathrm{V}\right\Vert \right) \), which can be approximately considered as O(m^{∗}‖V‖^{∗}‖E_{TCT}‖) where m is sampling number, and E_{ TCT } is the edges set of the target control tree.
Results
Experiment results of realworld networks
To evaluate the target control efficiency on an arbitrary network, we first introduce two factors αand β, which represent the ratio of the target nodes and constrained (or preselected) nodes to the whole network nodes respectively; To target control the target nodes O(α), our TCOA can identify the optional nodes set D(α, β) with the minimum driver nodes and the maximum quantity of driver nodes contained in a given constrained set Q(β).‖D(α, β)‖/‖O(α)‖ denotes the ratio of the quantity of identified drivers to the quatity of targets, and ‖D(α, β) ∩ Q‖/‖D(α, β)‖ denotes the ratio of the quantity of identified drivers among constrained (or preselected) nodes to the quantity of all the identified drivers.
Then we introduce two target controllability parameters. One is the average ratio of drivers to targets,
which reflects the cost of controlling targets in the complex network. And another parameter is the average ratio of constrained (or preselected) nodes in the drivers to all the drivers,
which reflects the verifiability of identified driver nodes in target controlling the network.
Note that whenα = 1, β = 1, both of E_{1} and E_{2} can be reduced to the fraction of drivers to control the full networks. When 0 < α < 1, β = 1, E_{1} is reduced to the fraction of driver nodes to control the target nodes, and E_{2} = 1. However in our paper, we focus on the problem that when 0 < α < 1, 0 < β < 1, how to identify optional driver nodes to minimize the measure E_{1} and to maximize the measure E_{1}. In fact, we have selected α = 0.1,0.2,…,1 andβ = 0.1,0.2,…,1, and applied our TCOA to calculate the two target controllability parameters E_{1} and E_{2}. We have obtained the data of the realworld networks from [7, 8], and for the convenience we provide the data description in the (Additional file 2: Table S1). The results on these real networks are listed in the Table 3.
Obviously, we can conclude that TCOA can efficiently identify the driver nodes with optional property to guarantee the system target controllable, compared to the existing method. However with increasing size of Q, our TCOA is receiving more and more guidance, and is expected to outperform Gao’s method, which does not take a constrained set as input. From the Table 3, we find our TCOA can not only find more driver nodes contained in the constrained nodes set Q but also detects the less quantity of driver nodes.
The novelty of our TCOA is the proposed analysis framework consisting of target control tree, ILP model and MCMC sampling for improving efficiency. In addition to the algorithm comparision between TCOA and other existing methods, we have also carried on more comparisons to investigate the contribution of ILP and MCMC in TCOA.To evaluate the advantage of the ILP and MCMC sampling, we list the result of our TCOA without MCMC sampling (only with ILP) and the result of our TCOA. From Table 3, we find that our TCOA can perform better than the TCOA only with ILP but without MCMC sampling, supporting strongly the efficiency of the MCMC sampling. We also find that our TCOA can achieve better results than Gao’s method even without MCMC sampling.
Case studies on PPI signaling transduction networks in pancreatic cancer
As further evidences of the applicability of TCOA, we have carried TCOA on PPI signaling transduction networks in pancreatic cancer. The main cause of cancer is genetic and epigenetic alterations, which allow normal cells to overproliferate as tumor cells [31]. To comprehensively understand the specificity in signaling networks, we have to understand how distinct pathways communicate with each other and how proteins of one pathway make interactions with related signaling components. Here, to understand the various informationprocessing abilities employed during the molecular alteration of the cancerous cells [32], we obtain directed PPI network of 1569 interactions from 991 nodes in pancreatic cancer. The directed PPI cancer data, uses SIGNOR (SIGnaling Network Open Resource) database [33], which outputs binary matrix representations for the usedprovided protein lists and allows us to create directed graphs between signaling entities. The networks are available in Network Controlability Project [34] or seen (Additional file 3: Table S2). In our paper, in total 1507 approved proteins (or genes) by the Food and Drug Administration (FDA) have been selected as the constrained (or preselected) nodes in the directed PPI network which will have a priorknown molecule mechanism, see (Additional file 4: Table S3). As is well known, only a subset of these alterations called essential proteins, from the hundreds of genomic alterations in various biological pathways [35, 36], are driving the disease initiation and progression. In the Ref. [34], researchers collected essential gene data for all cancer from the COLTCancer database [37]. In particular, they considered the HPAFII cell lines for pancreatic cancer, and follow the GARP (Gene Activity Rank Profile) and GARPP value of corresponding proteins mentioned in the database. Since previous studies showed that proteins with lower GARP score are more essential and directly associated with oncogenesis [38], they selected only those essential proteins whose GARP value is in the negative range, and moreover, whose GARPP value is less than 0.05 (p < = 0.05). Following the above criteria, they identified 168 essential proteins available in the SIGNOR PPI network database in pancreatic cancer are selected as the targets to be controlled by the input signals.The essential proteins data can be seen in (Additional file 5: Table S4).
Our TCOA focus on how to identify the optional driver proteins with the minimum quantity of drivers and the maximum of the constrained FDAapproved proteins, to control the essential target proteins. We have also applied Liu’s method [6] to control the full network and apply Gao’s method [8] and our method to obtain the driver nodes to target control the network. The results seen in Table 4, indicate that we can identify less quantity of driver nodes by using TCOA compared to Liu’s method [6] and Gao’s method [8]. Furthermore, among the driver nodes, we can also obtain more drug targetable nodes.
Furthermore, in Supplementary note 5 of Additional file 1., we also give the capacity [39] and the corresponding clinical information of the identified driver proteins by using our algorithm TCOA. In the Supplementary note 5 of Additional file 1, we have analyzed the drugtargetable proteins identified by TCOA as part of the strategies to control the cancer essential proteins, and we found that most of the TOP20 proteins could be a direct target in cancer therapy. We also look for anticancer drugs for the drug target proteins identified by TCOA, whose results are also listed in Supplementary note 5 of Additional file 1. We find that in some cases they have been used in current cancer typespecific drugs and drugtherapies. Among the 42 identified driver genes, 34 of them have not been previously reported as the drug targets. This suggests that our TCOA will be very useful in identifying potential drug targets.
Case studies on inflammatory bowel disease network from KEGG
The causes of the common forms of idiopathic Inflammatory bowel disease (IBD) remain unclear though considerable progress [40]. Here, we utilize the network, in KEGG [41] as is listed in (Additional file 6: Table S5). The network consits of 4798 nodes and 105,606 directed and undirected edges (or bidirected edges). To identify the drug targets, in total 702 approved proteins (or genes) by the Food and Drug Administration (FDA) have been selected in the network from the Drug Bank database [42], see (Additional file 7: Table S6) as the constrained nodes set in our TCOA.
In this study, we consider the genes in the Inflammatory bowel disease (IBD) pathways which is listed in (Additional file 6: Table S5) as the target nodes in our TCOA. We apply both Gao’s algorithm [8] and our algorithm to analyze the target controllability of the network related with Inflammatory bowel disease (IBD). The results are shown in Table 4, and we find that the driver nodes for control the whole network is more than the target nodes and is not necessary to control the full network by using Liu’s algorithm [6]. Furthermore the quantity of driven nodes needed for the control of target genes is actually much smaller than that of Gao’s method [8] according to our method analysis. Our TCOA also found sets of driver nodes containing more drug targetable nodes, meanwhile Liu’s method [6] and Gao’s method [8] cannot detect drug targetable nodes as drivers, which indicate the applicability of TCOA. In addition, we calculate the frequency that each network node acts as a driver in the phase of MCMC sampling (or the control capacity [39]) as shown in Fig. 4. As is seen, STAT3, IL22, MAF and TLR5 has higher probability to be potential drivers to change the states of diseaserelated genes. Furthermore the existed researches have reported that IL22 and TLR5 can be a therapy target for IBD [43]. These results suggest STAT3 and MAF can be future drug targets for IBD therapy.
General topological properties of driver nodes in inflammatory pancreatic cancer network and bowel disease network
We have analyzed several topological properties of the drugtarget proteins included by TCOA in the set of driven nodes. We calculated the average degree, the average betweenness centrality of these drugtarget proteins by using our target control scheme, which are compared with Liu’s full control scheme [6] and the average values over the entire networks. We find that, in the disease networks, the drugtarget driver nodes would have higher average degree than the average values over the entire networks as shown in Fig. 5a. This shows that the driver nodes tend to be highdegree nodes for target control the networks; In addition to the summarized results on the two biological networks, we have illustrated the network information of the validated results in (supplementary note 6 of Additional file 1: Figure S5). From Additional file 1: Figure S5, we also found that the nodes with higher capacity have higher node degrees in the network and it also proved our statistic results in Fig. 5.
By using the full control scheme in the previous theoretical research [6], there exists an observation and conclusion that the driver nodes tend to be lowdegree nodes as shown in Fig. 5a. However, for target control the biological networks, we interestingly find that the driver nodes tend to be highdegree nodes as shown in Fig. 5a, which is more consistent with the biological experimental results. The drugtarget driven nodes also tend to have a higher average betweeness centrality as shown in Fig. 5b, which indicates that driver nodes would act as highlytraversed bridges in networks. By contrast, driver nodes display different weights on closeness centrality on particular disease network, as shown in Fig. 5c, which would mean the modularity around driver nodes would have significant rewiring in different conditions [32].
Discussions
In fact, in our previous research, we studied another target control problem, called constrained target control (CTC) problem [19], which focuses on how to choose minimal drivers only within the set of constrained control nodes to change the states of the maximal targets. Different from CTC, we do not require that all the selected driver nodes must be in the constrained nodes set and our consideration for TCO has a double optimization to minimize the total quantity of driven nodes (on which a subsequent intervention is needed) and to maximize the percentage of constrained nodes among them (on which the findings are consistent with priorknown knowledge). Our results supply the novel insights into how we can efficiently target control a complex system, and especially many evidences on the practical strategic utility of TCOA to incorporate prior drug information into potential drugtarget forecasts. However, this study is limited to focus on how to obtain the state transitions of the linear networks. It is more practical and necessary to target control the system with nonlinear dynamic in the future.
Conclusions
It is rather difficult to study how to control a complex network, because we often do not know the true functional form of the underlying dynamics, such as biological networks. However, most systems operate near homeostasis, so in this study, we pose target control problem with objectivesguided optimization (TCO) and also provide a novel algorithm (TCOA) to study the local structural control of inherent nonlinear networks, which is more practical to target control the complex networks than the existing methods. In this work, the application of our new control tool TCOA provides more precision predictions compared to the existing methods on the study of structural target control of networks. Particularly, our work supports that the target control tools actually provide an efficient way to control a network through known drugtarget nodes, in the cases of diseaseassociated networks. In addition, this work supplies a better understanding of the diseaseassociated biochemical networks and opens a new way to recover the drugtarget based control mechanisms. This in turn could advance the future studies of various disease diagnostic techniques based on network, e.g., network biomarkers [44,45,46,47] and dynamic network biomarkers [48,49,50,51,52], efficient therapeutic approaches and personalized medicine [53].
Abbreviations
 CTC:

Constrained target control
 IBD:

Inflammatory bowel disease
 ILP:

Integer linear programming
 KEGG:

Kyoto encyclopedia of genes and genomes
 MCMC:

Markov chain Monte Carlo
 TCO:

Target control problem with objectivesguided optimization
References
Newman M, Barabasi AL, Watts DJ. The structure and dynamics of networks. Princeton: Princeton University Press; 2011.
Newman ME. The structure and function of complex networks. SIAM Rev. 2003;45(2):167–256.
Barabási A. The network takeover. Nat Phys. 2011;8(8):14–6.
Watts DJ, Strogatz SH. Collective dynamics of ‘smallworld’ networks. Nature. 1998;393(6684):440.
Guo WF, Zhang SW. A general method of community detection by identifying community centers with affinity propagation. Physica A. 2016;447:508–19.
Liu YY, Slotine JJ, Barabási AL. Controllability of complex networks. Nature. 2011;473(7346):167–73.
Wu FX, Wu L, Wang J, Liu J, Chen L. Transittability of complex networks and its applications to regulatory biomolecular networks. Sci Rep. 2014;4:4819. https://doi.org/10.1038/srep04819.
Gao J, Liu YY, D'Souza RM, Barabási AL. Target control of complex networks. Nat Commun. 2014;5:5415. https://doi.org/10.1038/ncomms6415.
Ruths J, Ruths D. Control profiles of complex networks. Science. 2014;343(6177):1373–6.
Wu L, Shen Y, Li M, Wu FX. Network output controllabilitybased method for drug target identification. IEEE Trans Nanobioscience. 2015;14(2):184–91.
Hermann R, Krener AJ. Nonlinear controllability and observability. IEEE Trans Autom Control. 1977;22(5):728–40.
Slotine JJE, Li W. Applied nonlinear control, vol. 199. Englewood Cliffs: Prenticehall; 1991.
Sontag ED, Boyd S. Mathematical control theory: deterministic finitedimensional. vol. 6. Germany: Springer Science & Business Media; 2013.
Sastry SS. Nonlinear systems: analysis, stability, and control, vol. 10. Germany: Springer Science & Business Media; 2013.
Vinayagam A, Gibson TE, Lee HJ, Yilmazel B, Roesel C, Hu Y, Kwon Y, Sharma A, Liu YY, Perrimon N. Controllability analysis of the directed human protein interaction network identifies disease genes and drug targets. Proc Natl Acad Sci. 2016;113(18):4976–81.
Takahashi K, Yamanaka S. Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell. 2006;126(4):663–76.
Kim JB, Sebastiano V, Wu G, AraúzoBravo MJ, Sasse P, Gentile L, Ko K, Ruau D, Ehrich M, van den Boom D. Oct4induced pluripotency in adult neural stem cells. Cell. 2009;136(3):411–9.
Vierbuchen T, Ostermeier A, Pang ZP, Kokubu Y, Südhof TC, Wernig M. Direct conversion of fibroblasts to functional neurons by defined factors. Nature. 2010;463(7284):1035–41.
Guo WF, Zhang SW, Wei ZG, Zeng T, Liu F, Zhang J, Wu FX, Chen L. Constrained target controllability of complex networks. J Stat Mech Theory Exp. 2017;2017(6):063402.
Hopcroft JE, Karp RM. An n^5/2 algorithm for maximum matchings in bipartite graphs. SIAM J Comput. 1973;2(4):225–31.
Kuhn HW. The Hungarian method for the assignment problem. Nav Res Logist Q. 1955;2(1–2):83–97.
Zhu Y, Cheung TY. A new distributed breadthfirstsearch algorithm. Inf Process Lett. 1987;25(5):329–34.
Park J, Tokura N, Masuzawa T, Hagihara KI. An efficient distributed algorithm for constructing a breadthfirst search tree. Syst Comput Japan. 1989;20(10):15–30.
Liu YY, Slotine JJ, Barabási AL. Control centrality and hierarchical structure in complex networks. PLoS One. 2012;7(9):e44459.
Wang B, Gao L, Gao Y. Control range: a controllabilitybased index for node significance in directed networks. J Stat Mech Theory Exp. 2012;2012(04):P04011.
Wolsey LA, Nemhauser GL. Integer and combinatorial optimization. USA: Wiley; 2014.
Wolsey LA. Integer programming, vol. 42. New York: Wiley; 1998.
Vandin F, Upfal E, Raphael BJ. De novo discovery of mutated driver pathways in cancer. Genome Res. 2012;22(2):375–85.
Chib S, Greenberg E. Understanding the metropolishastings algorithm. Am Stat. 1995;49(4):327–35.
Lund C, Yannakakis M. On the hardness of approximating minimization problems. J ACM. 1994;41(5):960–81.
Sever R, Brugge JS. Signal transduction in cancer. Cold Spring Harb Perspect Med. 2015;5(4):a006098.
Zeng T, Wang DC, Wang X, Xu F, Chen L. Prediction of dynamical drug sensitivity and resistance by module network rewiringanalysis based on transcriptional profiling. Drug Resist Updat. 2014;17(3):64–76.
Perfetto L, Briganti L, Calderone A, Perpetuini AC, Iannuccelli M, Langone F, Licata L, Marinkovic M, Mattioni A, Pavlidou T. SIGNOR: a database of causal relationships between biological entities. Nucleic Acids Res. 2015;44(Database issue):D548–54.
Kanhaiya K, Czeizler E, Gratie C, Petre I. Controlling directed protein interaction networks in cancer. Sci Rep. 2017;7(1):10327.
Kolch W, Halasz M, Granovskaya M, Kholodenko BN. The dynamic control of signal transduction networks in cancer cells. Nat Rev Cancer. 2015;15(9):515–27.
Zañudo JG, Albert R. Cell fate reprogramming by control of intracellular network dynamics. PLoS Comput Biol. 2014;11(4):e1004193.
Koh JLY, Brown KR, Sayad A, Kasimer D, Ketela T, Moffat J. COLTcancer: functional genetic screening resource for essential genes in human cancer cell lines. Nucleic Acids Res. 2012;40(Database issue):957–63.
Marcotte R, Brown KR, Suarez F, Sayad A, Karamboulas K, Krzyzanowski PM, Sircoulomb F, Medrano M, Fedyshyn Y, Koh JL. Essential gene profiles in breast, pancreatic, and ovarian cancer cells. Cancer Discov. 2012;2(2):172.
Jia T, Barabási AL. Control capacity and a random sampling method in exploring controllability of complex networks. Sci Rep. 2013;3:2354.
Podolsky DK. Inflammatory bowel disease. N Engl J Med. 1991;325(13):928–37.
Kanehisa M, Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000;28(1):27–30.
Law V, Knox C, Djoumbou Y, Jewison T, Guo AC, Liu Y, Maciejewski A, Arndt D, Wilson M, Neveu V. DrugBank 4.0: shedding new light on drug metabolism. Nucleic Acids Res. 2014;42(D1):D1091–7.
Melmed GY, Targan SR. Future biologic targets for IBD: potentials and pitfalls. Nat Rev Gastroenterol Hepatol. 2010;7(2):110–7.
Zhang W, Zeng T, Liu X, Chen L. Diagnosing phenotypes of singlesample individuals by edge biomarkers. J Mol Cell Biol. 2015;7(3):231.
Liu X, Wang Y, Ji H, Aihara K, Chen L. Personalized characterization of diseases using samplespecific networks. Nucleic Acids Res. 2016;44(22):e164.
Liu X, Liu ZP, Zhao XM, Chen L. Identifying disease genes and module biomarkers by differential interactions. J Am Med Inform Assoc. 2012;19(2):241–8.
Zhao J, Zhou Y, Zhang X, Chen L. Part mutual information for quantifying direct associations in networks. Proc Natl Acad Sci U S A. 2016;113(18):5130.
Chen L, Liu R, Liu ZP, Li M, Aihara K. Detecting earlywarning signals for sudden deterioration of complex diseases by dynamical network biomarkers. Sci Rep. 2012;2(7391):342.
Liu R, Wang X, Aihara K, Chen L. Early diagnosis of complex diseases by molecular biomarkers, network biomarkers, and dynamical network biomarkers. Med Res Rev. 2014;34(3):455.
Liu R, Chen P, Aihara K, Chen L. Identifying earlywarning signals of critical transitions with strong noise by dynamical network markers. Sci Rep. 2015;5:17501.
Liu X, Chang X, Liu R, Yu X, Chen L, Aihara K. Quantifying critical states of complex diseases using singlesample dynamic network biomarkers. PLoS Comput Biol. 2017;13(7):e1005633.
Li M, Li C, Liu WX, Liu C, Cui J, Li Q, Ni H, Yang Y, Wu C, Chen C. Dysfunction of PLA2G6 and CYP2C44associated network signals imminent carcinogenesis from chronic inflammation to hepatocellular carcinoma. J Mol Cell Biol. 2017; https://doi.org/10.1093/jmcb/mjx021.
Zeng T, Zhang W, Yu X, Liu X, Li M, Chen L. Bigdatabased edge biomarkers: study on dynamical drug sensitivity and resistance in individuals. Brief Bioinform. 2016;17(4):576–92.
Acknowledgements
We thank Professor Wu FangXiang from University of Saskatchewan, Saskatoon, Canada for giving our useful and valuable comments.
Funding
The publication costs of this article was funded by the National key research and development program of China (No. 2017YFA0505500), the Strategic Priority Research Program of the Chinese Academy of Sciences (CAS) (No. XDB13040700), the National Key R&D Program  Special Project on Precision Medicine (2016YFC0903400), the National Natural Science Foundation of China (91,430,111, 61,473,232, 91,439,103, 91,529,303, 31,200,987, 81,471,047 and 61,170,134), and the Natural Science Foundation of Shanghai (17ZR1446100).
Availability of data and materials
All data generated or analyzed during this study are included in this published article and its supplementary information files.
About this supplement
This article has been published as part of BMC Genomics Volume 19 Supplement 1, 2018: 16th International Conference on Bioinformatics (InCoB 2017): Genomics. The full contents of the supplement are available online at https://bmcgenomics.biomedcentral.com/articles/supplements/volume19supplement1.
Author information
Affiliations
Contributions
WFG, TZ and LNC developed the methodology. WFG and QQS and CMZ executed the experiments, WFG and TZ wrote this paper. WFG, SWZ, TZ and LNC revised the manuscript. SWZ and LNC supervised the work, made critical revisions of the paper, and approved the submission of the manuscript. All authors read and approved the final manuscript.
Corresponding authors
Ethics declarations
Ethics approval and consent to participate
Not applicable
Consent for publication
Not applicable
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional files
Additional file 1:
Supplementary material of A novel algorithm for finding optimal driver nodes to target control complex networks and its applications for drug targets identification. (DOC 2101 kb)
Additional file 2:
Descriptions of real world networks. (XLSX 12 kb)
Additional file 3:
FDA genes in directed PPI network. (XLSX 27 kb)
Additional file 4:
Directed PPI interaction informations. (XLS 37 kb)
Additional file 5:
Targets genes related with pancreatic. (XLSX 12 kb)
Additional file 6:
KEGG 244pathways network related with IBD disease. (XLSX 2411 kb)
Additional file 7:
FDA genes in pathways network. (XLSX 18 kb)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Guo, WF., Zhang, SW., Shi, QQ. et al. A novel algorithm for finding optimal driver nodes to target control complex networks and its applications for drug targets identification. BMC Genomics 19, 924 (2018). https://doi.org/10.1186/s128640174332z
Published:
DOI: https://doi.org/10.1186/s128640174332z
Keywords
 Target control
 Objectivesguided optimization
 Drug targets
 Dynamics of complex networks