A novel algorithm for finding optimal driver nodes to target control complex networks and its applications for drug targets identification

Background The advances in target control of complex networks not only can offer new insights into the general control dynamics of complex systems, but also be useful for the practical application in systems biology, such as discovering new therapeutic targets for disease intervention. In many cases, e.g. drug target identification in biological networks, we usually require a target control on a subset of nodes (i.e., disease-associated genes) with minimum cost, and we further expect that more driver nodes consistent with a certain well-selected network nodes (i.e., prior-known drug-target genes). Results Therefore, motivated by this fact, we pose and address a new and practical problem called as target control problem with objectives-guided optimization (TCO): how could we control the interested variables (or targets) of a system with the optional driver nodes by minimizing the total quantity of drivers and meantime maximizing the quantity of constrained nodes among those drivers. Here, we design an efficient algorithm (TCOA) to find the optional driver nodes for controlling targets in complex networks. We apply our TCOA to several real-world networks, and the results support that our TCOA can identify more precise driver nodes than the existing control-fucus approaches. Furthermore, we have applied TCOA to two bimolecular expert-curate networks. Source code for our TCOA is freely available from http://sysbio.sibcb.ac.cn/cb/chenlab/software.htm or https://github.com/WilfongGuo/guoweifeng. Conclusions In the previous theoretical research for the full control, there exists an observation and conclusion that the driver nodes tend to be low-degree nodes. However, for target control the biological networks, we find interestingly that the driver nodes tend to be high-degree nodes, which is more consistent with the biological experimental observations. Furthermore, our results supply the novel insights into how we can efficiently target control a complex system, and especially many evidences on the practical strategic utility of TCOA to incorporate prior drug information into potential drug-target forecasts. Thus applicably, our method paves a novel and efficient way to identify the drug targets for leading the phenotype transitions of underlying biological networks. Electronic supplementary material The online version of this article (doi: 10.1186/s12864-017-4332-z) contains supplementary material, which is available to authorized users.


Backgroud
One final goal of our efforts is to control the complex systems in our daily life. In the past decades, plenty of attentions [1][2][3][4][5] have been paid into the study of the structures and dynamics of complex networked systems, especially biological systems. A frontier area of the research in network science and engineering is controlling complex networks, such as biological molecule networks. Since the law of some systems is hidden, it is difficult to study the controllability of the nonlinear systems directly, especially for the large scale biological systems. However, it is possible to obtain local analytical results for the controllability of nonlinear systems by developing control schemes of linear dynamic systems. Nearly decades of efforts on the controllability of linear dynamic networks, not only review a sufficient condition for "local controllability" of a nonlinear system about a trim point but also result in tremendous advances in our understanding of the problem of controlling complex networked dynamical systems [5][6][7][8][9]. In a recent breakthrough, an efficient algorithm with low polynomial time was provided for computing the minimal quantity of input nodes needed to control any given large-scale directed network [6]. But, it was also shown that in the case of sparse inhomogeneous networks, such as most of the networks emerging from biochemical and biomedical applications, controlling the entire system is expensive. On the other hand, in terms of practical applications in many cases, it is enough to control only a certain wellselected portion of the network's nodes, such as the set of essential genes, in order to impose a certain overall behaviour over the system. Thus, an interesting question, known as target control problem of complex networks, is posed that how can we chose the driver variables from the system to control a subset of the whole nodes (or a subsystem) about a trim point [8].
However, the traditional framework of network control can only be applicable for the simple networks, and it can not address the target control problem of the large scale of networks. To solve the problem, Wu et al. has proposed a method to solve the target control problem by constructing a weighted bipartite network [10]. But this method may fail when there does not exist a perfect matching in most cases. Meanwhile, Gao et al. proposed another method which offers an approximation on the minimum set of input nodes for target controlling the networks [8]. However, the above researches only focus on controlling the system through any minimum driver-node set and ignore the existence of multiple candidate drivernode sets for control a targeted subset of the network. When we actually expect to control the system with objectives optimization, the different driver-node sets may not participate in target control equally. This consideration prompts us to study how to find the desired solution for target controlling complex networks with objectives optimization. A practice of this consideration can come from our aim for combinatorial drug target identification: we not only consider how to control the diseaseassociated genes with the minimum driver nodes, but also expect that more driver nodes can be consistent with the set of well-known drug-target genes. Here, we pose a new target control problem with the objectives-guided for finding the optimal driver nodes that minimize the total quantity of drivers and also maximize the quantity of constrained nodes within the drivers.
In this paper, we develop a novel algorithm (TCOA) to identify the drivers for efficiently controlling targets in complex networks. Our algorithm consists of three steps: We first construct the target control tree of the network by finding the maximum matching in the constructed iterated bipartite graph or "linking and dynamic graph" and identify the controllable targets of each node by obtaining its reachable target nodes in the control tree; Then we find the set of optional driver nodes by using the integer linear programming to optimize a regulated factor, which is introduced to balance the quantity of driver nodes and the quantity of driver nodes within the set of constrained (or pre-selected) nodes; Finally we define the maximum matching of the constructed iterated bipartite graph or "linking and dynamic graph" as a Markov chain and use a Markov chain Monte Carlo (MCMC) approach to sample from the sets of all possible maximum matching. We have evaluated TCOA on several real-world networks, and the experiment results support that TCOA outperform existing control-focus approaches. Especially, we have also applied TCOA algorithm to analyze the PPI signaling transduction networks in pancreatic cancer, and Inflammatory bowel disease network from KEGG. The results further illustrate that our TCOA can efficiently identify the driver nodes with more optional property to guarantee the system target controllable, compared to several control-focus approaches. In addition, the experiment results on the two biological cases can also supply an efficient bioinformatics tool to identify the drug targets for leading the phenotype transitions of underlying biological networks.

Problem formulation
Since the law of the some network dynamics, such as the biological networks is hidden, it is difficult to directly study the controllability of the nonlinear networks. Most complex systems are characterized by nonlinear interactions between the components, and usually local properties can be verified [11,12]. Thus, it is possible to obtain local analytical results for the controllability of nonlinear systems [13,14]. Here, we review a sufficient condition for "local controllability" of a nonlinear system about a trim point. A system is "locally controllable" if there exists a neighborhood in the state space such that all initial conditions in that neighborhood are controllable to all other elements in the neighborhood with locally bounded trajectories [15]. Considering a dynamic system governed by a set of ordinary differential equations, where the function f(x) denotes the system's dynamics.
x ∈ R N and y ∈ R NO represent nodes state and target (output) nodes state; The element B ij in B∈ R N*NC represents whether the node v i among V = {v 1 ,v 2 ,…,v N } is inputed by the j-th signal [16][17][18]. C∈R NO*N represents the output matrix.
We are interested in how to find proper matrix B to gurantee the system (1) locally target (or output) controllable through the input u = [u(1),u(2),..,u(NC)]. Let x 0 be defined as, where f(x 0 ) = 0 provides the system's steady state, A(x 0 ) represents the system's local (linear) dynamics around a trim point x 0 and G(x 0 ) guarantees the system local target controllable. The dynamics in (1) are locally target controllable around x 0 if rank (G(x 0 )) = NO [13,14]. Therefore the local target controllability analysis of (1) about a trim point therefore reduces to the linear target controllability analysis of (2), The dynamics in Eq. 1 are deemed "locally structurally target controllable" if the linearized dynamics in Eq. 2 are structurally target controllable. And the linearized dynamics in Eq. 2 is structurally target controllable if the follow equation is satisfied when we can choose the non-zero values in A and B, max rank CB; CAB; CA 2 B; …; In a given directed network with nodes V = {v 1 ,v 2 ,…,v N }, let O and D be the set of target nodes and the driver nodes, assuming that we expect more driver nodes could For the purposes of this work, the adjacency matrix of the network is used to find the structure of A in Eq. 2. Given the constrained nodes set Q, we focus on how to find a suitable driver nodes set D such that where ||D|| denotes the quantity of nodes in the identified driver nodes set D and ‖Q ∩ D‖ represents the quantity of drivers in the constrained or pre-selected nodes set Q; the objective functions f 1 and f 2 aim to find the optional drivers with minimum quantity of drivers D and maximum quantity of drivers in the pre-selected nodes set Q respectively. However, there are no existing methods to efficiently solve the problem. For example, in Fig. 1, for simple network 1 we want to control the target nodes O = {v 3 ,v 4 , v 6 ,v 7 } with the minimum the quantity of driver nodes and also expect to maximum the identified driver nodes within the constrained nodes Q = {v 2 ,v 4 }. But Liu's approach [6] and Gao's approach [8] fail to find the optional driver nodes set for target controlling the O = {v 3 ,v 4 ,v 6 ,v 7 } (see Fig. 1).To overcome the limitations of the existed approaches for the target control problem in complex networks, we develop a novel objective optimization algorithm (TCOA). The key consideration of our TCOA to solve the problem (4) is that 1) find the controllable targets of each network node without destroying target controllable of the whloe system; 2) to extract the optimal driver nodes of the network by using objectives-guided optimization with integer linear programming and Markov chain Monte Carlo (MCMC). The former is to guarantee the identified subset satisfying the constrained condition. And the latter is to guarantee the subset to be optimal for the two objectives.
The framework of our TCOA Our TCOA presents an algorithm for detecting driver nodes that can best control a network. We adopt the paradigm of local (linear) controllability. Different from the related works, our goal is not only to minimize the quantity of drivers, but also to maximize the quantity of drivers within a given pre-selected subset. The consideration for our TCOA is in the adding of different and more efficient strategies to find the optimal driver nodes of the graph. The TCOA algorithm consists of three steps: i) Identifying the controllable subsystem by constructing target control tree; ii) Finding the optional drivers with the Integer Linear Programming (ILP); iii) Further optimizing the driver set by using MCMC samplings. The details of TCOA are introduced in bellows.
Step 1 Identifying target controllable subspace by constructing target control tree Definition 1 The target control tree is defined as a multi-layers network where the nodes in the bottom layer represent the target nodes and the nodes in the other layers consists of the upstream control nodes.
Note that in our previous reseach [19] we have defined the upstream control nodes as the nodes which can control the target nodes in network G(V,E).To efficiently obtain the target control tree for controlling targets in a complex network, a algorithm (greedy algorithm) is designed to construct the target control tree as is listed in Table 1. Note that in our greedy algorithm, we use the "linking and dynamic graph" to represent the iterated bipartite graph. The relationship between maximum matching in the "linking and dynamic graph" and "the target control tree" can be explained as follow: at each iterated bipartite graph among the "linking and dynamic graph", we can obtain the maximum matching, which result in a sub-graph; in the sub-graph, the maximum matching determines which paired nodes can be connected in "the target control tree". The maximum matching of a given general bipartite graph can be efficiently obtained by using Hopcroft-Karp algorithm [20,21]. Since where r is the iteration times when we obtain the updated bipartite graph.
Theorem (Target controllable subsystem identification theorem) In the target control tree, the target node v j among the bottom layer could be controlled by the node v i among the up layer if node v j is accessible from node v i .
This result of our theorem (the details of proof are in supplementary note 1 of Additional file 1), can allow us Demonstration of the limitations of the existed methods for the target control problem with objectives-guided optimization. a Two simple networks. In the two networks, the target set is {v 3 ,v 4 , v 6 ,v 7 } and {v 3 ,v 4 , v 6 } respectively (highlighted in green) and the constrained nodes set is {v 2 ,v 4 } and {v 1 } respectively (shape in hexagon). Here we want to minimum the quantity of driver nodes to control the target nodes set {v 3 ,v 4 , v 6 ,v 7 } and {v 3 ,v 4 , v 6 } (i.e., disease-associated genes) and maximum the identified driver nodes within the constrained nodes (i.e., practical constraints as prior known drug targets). b By applying full control of Liu's method to the two networks, we can identify the unmatched nodes {v 1 ,v 3 ,v 5 } and {v 1 ,v 2 } (nodes within the blue circle) in the right side of the bipartite graph transferred from the directed network, as the driver nodes(more details seen in ref. [6]). c By using target control of Gao's method, they first obtain the updated bipartite graph by choosing the nodes in the left side in the previous matching (highlighted in grape) as the nodes in the right side of the current matching (highlighted in green) and then calculate a maximum matching in the updated bipartite graph. Finally they add unmatched nodes (nodes within the blue circle) in right side of the updated bipartite graph to the set of driver nodes (more details seen in ref. [8]), which identify the set of driver nodes of the two networks as {v 1 ,v 3 ,v 4 } and {v 1 ,v 2 }. In the simple example 1, according to the k-walk theory in Re. [8], it is easy to know that node v 2 can control v 3 and v 6 and v 4 can control v 4 and v 6 . For the simple example 2, based on the fact that when we remove a link it will not decrease the quantity of driver nodes. For example, when we remove the link from node v 4 to node v 3 in Fig. 2, according to the k-walk theory in Re. [8], it is easy to know that node v 1 can control v 3 , v 4 and v 6 . Therefore, the nodes {v 2 ,v 4 } and {v 1 } as the optional driver nodes are ignored by the existed methods for the two networks to find the controllable targets of node v i , denoted by TCS(v i ). The term "control" means that when we act control signals on the node v i , the nodes state in TCS(v i ) can be changed to any stable or unstable targte state from any initial state at a finite time.Following on our theorem, in the target control tree, we can apply the Breadth First Search algorithm (BFS) [22,23] to obtain the controllable targets of node v i ,TCS(v i ).The procedure of BFS is shown in Table 2, Obviously, the maximum complexity of the BFS algorithm for identifying the controllable targets of all nodes in the target control tree TCT ≡ (V TCT , E TCT ), where V TCT are the nodes and E TCT are the edges in TCT, is in the order of O(||V TCT ||*||E TCT ||) [22,23].
On the other hand, Liu et al. determined the controllable subsystem of any node in a network via linear programming [24], while Wang et al. propose a concept called control range to identify the controllable subsystem [25]. However, the existing two methods are still not efficient to identify the target controllable subsystem. In Figs. 2 and 3 we give an intuitive explanation to explain how we find the controllable targets of each network node. As shown in Fig. 2, we want to control the state transition of {v 3 , v 4 , v 6 }. By using Liu's approach [24] and Wang's approach [25], they both identify the controllable targets of v 1 is {v 3 , v 4 }. However, by using our method, the identified target controllable subsystem of node v 1 is {v 3 , v 4 , v 6 } (see Figs. 2 and 3).
Step 2 Identifying the optional driver nodes by using integer linear programming We first introduce an outlier measurement on a set of driver nodes that quantifies the quantity of driver nodes outside the pre-selected node-set, where x i = 1 when a node v i belongs to the driver set and ∑x i denotes the quantity of driver nodes; y j = 1 when node v j belongs to the pre-selected set D and ∑y j denotes the quantity of driver nodes in the pre-selected node-set.
To take into account both the quantity of driver nodes and the quantity of driver nodes in the pre-selected set, we define the weight Note that W(M) is only one candidate measurement of the trade-off between the quantity of driver nodes and the quantity of driver nodes in the pre-selected set. After obtaining the controllable targets of each network node and the weight of the driver node-set, the optional driver nodes guaranteeing the constrained target controllable can be approximately determined by the following Integer Programming model, s:t: In the problem (5), the function (5a) is to get the optional driver node-set with the minimum quantity of driver nodes and the maximum quantity of driver nodes in the pre-selected nodes set. The constraint (5b) aims that at least one driver node could control a target node. The constraint (5c) points that the value of y j is equal to that of x j if node v j belongs to the pre-selected set Q. In Table 1 Our greedy algorithm for constructing target control tree for network G(V, E) Input: Network G(V, E), target nodes O Initialize: B 0 ←(R 0 , L 0 ) //the right side R 0 is made of target nodes O, and the left side L 0 is made of nodes from which the targes could be reachable. m 0 ←Matching(B 0 ) //Find maximum matching While L n ≠ ∅ (n ≥ 1) do R n ← L n-1 //Let the set of nodes in L n-1 to be the new R n set B n ←(R n , L n ) //get a new bipartite graph B n . m n ←Matching(B n ) //Calculate a maximum matching m n in B n V n = V n-   fact, the problem (5) could be efficiently solved for thousands of variables with a LP-based classic branch and bound method [26,27].
Step 3 Optimizing the driver nodes by MCMC samplings Here we define the maximum matching of the "linking and dynamic graph" as a Markov chain. The state space of the Markov chain is the set of all the possible maximum matching of the "linking and dynamic graph". In our greedy algorithm, for a given node different sets of controllable targets could be found when we obtain different maximum matchings (e.g. the red edges in the Fig. 3). That is, for different Markov chains, the different sets of driver nodes with different weight W(M) could be obtained. The optimal different Markov chains need to be sampled from the state space, so that, a Markov Chain Mont Carlo method (MCMC) [28] is used. The MCMC approach samples sets of maximum matching, with the probability of sampling a set M proportional to the weight W(M) of the set. Thus, the frequencies of the maximum matching sets in the MCMC method provides a ranking of maximum matching sets, in which the sets are ordered by decreasing sampling frequency. The advantages will prove useful in analysis of real network data below.
The basic idea of MCMC, implying on our objective optimization, is to build a Markov chain whose states are the collections of k adjoin paths connecting to the target nodes in the "linking and dynamic graph" and to define transitions between states that differ by one target node. With the Metropolis-Hastings algorithm [29], we sample sets of maximum matching M G of k adjoin paths in the iterated bipartite graph with a stationary distribution that is proportional to exp.(c*W(M)) for some c > 0, which gives a desired stationary distribution on the state space.The advantage will prove useful in analysis of real mutation data below. The MCMC method is described as follows: Initialization: By using greedy algorithm, obtain the initial Markov chain M 0 ;

The complexity analysis of TCOA
The TCOA method contains three parts: (i) For constructing the target control tree for the network G(V,E), we apply the developed greedy algorithm, to find the maximum matching of the "linking and dynamic graph". In fact, the developed greedy algorithm runs in the order of O r Ã ffiffiffiffiffiffiffiffi ffi V k k p Ã E k k , where r denotes the iteration times in the iterated bipartite graph.
(ii) In phase of finding the controllable targets of all network nodes, we apply the BFS algorithm to the constructed target control tree. Therefore the maximum complexity of the BFS algorithm for finding the controllable targets of all nodes is in the order of O(||V TCT ||*||E TCT ||), where V TCT denotes the nodes and E TCT represents the edges in target control tree TCT (V TCT = V, ||E TCT || < =||E||) . (iii) For the phase of finding the optional driver nodes set, the optional driver nodes set is obtained by using the integer linear programming. Specifically, we used a branch-and-bound algorithm, an automatic method with a greedy O(log(||V||)) of solving discrete programming problems, as implemented by function intlinprog of the MATLAB programming language to solve our binary integer-programming problem [30]. To find more optional solutions, MCMC sampling method has been adopted, resulting in the overall complexity of our TCOA approach can be approximately considered as O(m * ‖V‖ * ‖E TCT ‖) where m is sampling number, and E TCT is the edges set of the target control tree.

Experiment results of real-world networks
To evaluate the target control efficiency on an arbitrary network, we first introduce two factors αand β, which represent the ratio of the target nodes and constrained (or pre-selected) nodes to the whole network nodes respectively; To target control the target nodes O(α), our TCOA can identify the optional nodes set D(α, β) with the minimum driver nodes and the maximum quantity of driver nodes contained in a given constrained set Q(β).‖D(α, β)‖/‖O(α)‖ denotes the ratio of the quantity of identified drivers to the quatity of targets, and ‖D(α, β) ∩ Q‖/‖D(α, β)‖ denotes the ratio of the quantity of identified drivers among constrained (or pre-selected) nodes to the quantity of all the identified drivers. Then we introduce two target controllability parameters. One is the average ratio of drivers to targets, which reflects the cost of controlling targets in the complex network. And another parameter is the average ratio of constrained (or pre-selected) nodes in the drivers to all the drivers, which reflects the verifiability of identified driver nodes in target controlling the network. Note that whenα = 1, β = 1, both of E 1 and E 2 can be reduced to the fraction of drivers to control the full networks. When 0 < α < 1, β = 1, E 1 is reduced to the fraction of driver nodes to control the target nodes, and E 2 = 1. However in our paper, we focus on the problem that when 0 < α < 1, 0 < β < 1, how to identify optional driver nodes to minimize the measure E 1 and to maximize the measure E 1 . In fact, we have selected α = 0.1,0.2,…,1 andβ = 0.1,0.2,…,1, and applied our TCOA to calculate the two target controllability parameters E 1 and E 2 . We have obtained the data of the real-world networks from [7,8], and for the convenience we provide the data description in the (Additional file 2: Table S1). The results on these real networks are listed in the Table 3.
Obviously, we can conclude that TCOA can efficiently identify the driver nodes with optional property to guarantee the system target controllable, compared to the existing method. However with increasing size of Q, our TCOA is receiving more and more guidance, and is expected to outperform Gao's method, which does not take a constrained set as input. From the Table 3, we find our TCOA can not only find more driver nodes contained in the constrained nodes set Q but also detects the less quantity of driver nodes.
The novelty of our TCOA is the proposed analysis framework consisting of target control tree, ILP model and MCMC sampling for improving efficiency. In addition to the algorithm comparision between TCOA and other existing methods, we have also carried on more comparisons to investigate the contribution of ILP and MCMC in TCOA.To evaluate the advantage of the ILP and MCMC sampling, we list the result of our TCOA without MCMC sampling (only with ILP) and the result of our TCOA. From Table 3, we find that our TCOA can perform better than the TCOA only with ILP but without MCMC sampling, supporting strongly the efficiency of the MCMC sampling. We also find that our TCOA can achieve better results than Gao's method even without MCMC sampling.

Case studies on PPI signaling transduction networks in pancreatic cancer
As further evidences of the applicability of TCOA, we have carried TCOA on PPI signaling transduction networks in pancreatic cancer. The main cause of cancer is genetic and epigenetic alterations, which allow normal cells to over-proliferate as tumor cells [31]. To comprehensively understand the specificity in signaling networks, we have to understand how distinct pathways communicate with each other and how proteins of one pathway make interactions with related signaling components. Here, to understand the various informationprocessing abilities employed during the molecular alteration of the cancerous cells [32], we obtain directed PPI network of 1569 interactions from 991 nodes in pancreatic cancer. The directed PPI cancer data, uses SIGNOR (SIGnaling Network Open Resource) database [33], which outputs binary matrix representations for the used-provided protein lists and allows us to create directed graphs between signaling entities. The networks are available in Network Controlability Project [34] or seen (Additional file 3: Table S2). In our paper, in total 1507 approved proteins (or genes) by the Food and Drug Administration (FDA) have been selected as the constrained (or preselected) nodes in the directed PPI network which will have a prior-known molecule mechanism, see (Additional file 4: Table S3). As is well known, only a subset of these alterations called essential proteins, from the hundreds of genomic alterations in various biological pathways [35,36], are driving the disease initiation and progression. In the Ref. [34], researchers collected essential gene data for all cancer from the COLT-Cancer database [37]. In particular, they considered the HPAF-II cell lines for pancreatic cancer, and follow the GARP (Gene Activity Rank Profile) and GARP-P value of corresponding proteins mentioned in the database. Since previous studies showed that proteins with lower GARP score are more essential and directly associated with oncogenesis [38], they selected only those essential proteins whose GARP value is in the negative range, and moreover, whose GARP-P value is less than 0.05 (p < = 0.05). Following the above criteria, they identified 168 essential proteins available in the SIGNOR PPI network database in pancreatic cancer are selected as the targets to be controlled by the input signals.The essential proteins data can be seen in (Additional file 5: Table S4).
Our TCOA focus on how to identify the optional driver proteins with the minimum quantity of drivers and the maximum of the constrained FDA-approved proteins, to control the essential target proteins. We have also applied Liu's method [6] to control the full network and apply Gao's method [8] and our method to obtain the driver nodes to target control the network. The results seen in Table 4, indicate that we can identify less quantity of driver nodes by using TCOA compared to Liu's method [6] and Gao's method [8]. Furthermore, among the driver nodes, we can also obtain more drug targetable nodes.
Furthermore, in Supplementary note 5 of Additional file 1., we also give the capacity [39] and the corresponding clinical information of the identified driver proteins by using Here, we list the network types, network name, quantity of nodes in network (N), quantity of edges in network (L), the average degree of network <k>, the average ratio of drives to the targets by using Gao's method (E 1 1 ) and our TCOA without MCMC samplings (E 2 1 ) and our TCOA (E 3 1 ) respectively and average ratio of drivers within the set of constrained (or pre-selected) nodes to all the identified drivers by using Gao's method (E 1 2 ) and our TCOA without MCMC samplings (E 2 2 ) and our TCOA (E 3 2 ) respectively. The more detail descriptions of the real-world networks including the network types, names and references, quantity of nodes and edges and brief description and the downloaded websites, are shown in Additional file 2: Table S1 our algorithm TCOA. In the Supplementary note 5 of Additional file 1, we have analyzed the drug-targetable proteins identified by TCOA as part of the strategies to control the cancer essential proteins, and we found that most of the TOP-20 proteins could be a direct target in cancer therapy. We also look for anti-cancer drugs for the drug target proteins identified by TCOA, whose results are also listed in Supplementary note 5 of Additional file 1. We find that in some cases they have been used in current cancer typespecific drugs and drug-therapies. Among the 42 identified driver genes, 34 of them have not been previously reported as the drug targets. This suggests that our TCOA will be very useful in identifying potential drug targets.

Case studies on inflammatory bowel disease network from KEGG
The causes of the common forms of idiopathic Inflammatory bowel disease (IBD) remain unclear though considerable progress [40]. Here, we utilize the network, in KEGG [41] as is listed in (Additional file 6: Table S5). The network consits of 4798 nodes and 105,606 directed and undirected edges (or bi-directed edges). To identify the drug targets, in total 702 approved proteins (or genes) by the Food and Drug Administration (FDA) have been selected in the network from the Drug Bank database [42], see (Additional file 7: Table S6) as the constrained nodes set in our TCOA.
In this study, we consider the genes in the Inflammatory bowel disease (IBD) pathways which is listed in (Additional file 6: Table S5) as the target nodes in our TCOA. We apply both Gao's algorithm [8] and our algorithm to analyze the target controllability of the network related with Inflammatory bowel disease (IBD). The results are shown in Table 4, and we find that the driver nodes for control the whole network is more than the target nodes and is not necessary to control the full network by using Liu's algorithm [6]. Furthermore the quantity of driven nodes needed for the control of target genes is actually much smaller than that of Gao's method [8] according to our method analysis. Our TCOA also found sets of driver nodes containing more drug targetable nodes, meanwhile Liu's method [6] and Gao's method [8] cannot detect drug targetable nodes as drivers, which indicate the applicability of TCOA. In addition, we calculate the frequency that each network node acts as a driver in the phase of MCMC sampling (or the control capacity [39]) as shown in Fig. 4. As is seen, STAT3, IL22, MAF and TLR5 has higher probability to be potential drivers to change the states of disease-related genes. Furthermore the existed researches have reported that IL-22 and TLR5 can be a therapy target for IBD [43]. These results suggest STAT3 and MAF can be future drug targets for IBD therapy.
General topological properties of driver nodes in inflammatory pancreatic cancer network and bowel disease network We have analyzed several topological properties of the drug-target proteins included by TCOA in the set of driven nodes. We calculated the average degree, the average betweenness centrality of these drug-target proteins by using our target control scheme, which are compared with Liu's full control scheme [6] and the average values over the entire networks. We find that, in the disease networks, the drug-target driver nodes would have higher average degree than the average values over the entire networks as shown in Fig. 5a. This shows that the driver nodes tend to be high-degree nodes for target control the networks; In addition to the summarized results on the two biological networks, we have illustrated the network information of the validated results in (supplementary note 6 of Additional file 1: Figure S5). From Additional file 1: Figure  S5, we also found that the nodes with higher capacity have higher node degrees in the network and it also proved our statistic results in Fig. 5. The columns represent the following information per disease network: Different methods, the fraction of the quantity of drivers vs. the target nodes (f1), the fraction of the driver nodes within drug target nodes in FDA vs. the quantity of driver nodes (f2) Fig. 4 The frequency (fd) of the identified driver nodes within the constrained nodes set for MCMC samplings in IBD disease network By using the full control scheme in the previous theoretical research [6], there exists an observation and conclusion that the driver nodes tend to be low-degree nodes as shown in Fig. 5a. However, for target control the biological networks, we interestingly find that the driver nodes tend to be high-degree nodes as shown in Fig. 5a, which is more consistent with the biological experimental results. The drug-target driven nodes also tend to have a higher average betweeness centrality as shown in Fig. 5b, which indicates that driver nodes would act as highly-traversed bridges in networks. By contrast, driver nodes display different weights on closeness centrality on particular disease network, as shown in Fig. 5c, which would mean the modularity around driver nodes would have significant rewiring in different conditions [32].

Discussions
In fact, in our previous research, we studied another target control problem, called constrained target control (CTC) problem [19], which focuses on how to choose minimal drivers only within the set of constrained control nodes to change the states of the maximal targets. Different from CTC, we do not require that all the selected driver nodes must be in the constrained nodes set and our consideration for TCO has a double optimization to minimize the total quantity of driven nodes (on which a subsequent intervention is needed) and to maximize the percentage of constrained nodes among them (on which the findings are consistent with prior-known knowledge). Our results supply the novel insights into how we can efficiently target control a complex system, and especially many evidences on the practical strategic utility of TCOA to incorporate prior drug information into potential drug-target forecasts. However, this study is limited to focus on how to obtain the state transitions of the linear networks. It is more practical and necessary to target control the system with nonlinear dynamic in the future.