# Robust mutant strain design by pessimistic optimization

- Meltem Apaydin
^{1}, - Liang Xu
^{2}, - Bo Zeng
^{2}and - Xiaoning Qian
^{1}Email author

**Published: **3 October 2017

## Abstract

### Background

Flux Balance Analysis (FBA) based mathematical modeling enables *in silico* prediction of systems behavior for genome-scale metabolic networks. Computational methods have been derived in the FBA framework to solve bi-level optimization for deriving “optimal” mutant microbial strains with targeted biochemical overproduction. The common inherent assumption of these methods is that the surviving mutants will always cooperate with the engineering objective by overproducing the maximum desired biochemicals. However, it has been shown that this optimistic assumption may not be valid in practice.

### Methods

We study the validity and robustness of existing bi-level methods for strain optimization under uncertainty and non-cooperative environment. More importantly, we propose new pessimistic optimization formulations: P-ROOM and P-OptKnock, aiming to derive robust mutants with the desired overproduction under two different mutant cell survival models: (1) ROOM assuming mutants have the minimum changes in reaction fluxes from wild-type flux values, and (2) the one considered by OptKnock maximizing the biomass production yield. When optimizing for desired overproduction, our pessimistic formulations derive more robust mutant strains by considering the uncertainty of the cell survival models at the inner level and the cooperation between the outer- and inner-level decision makers. For both P-ROOM and P-OptKnock, by converting multi-level formulations into single-level Mixed Integer Programming (MIP) problems based on the strong duality theorem, we can derive exact optimal solutions that are highly scalable with large networks.

### Results

Our robust formulations P-ROOM and P-OptKnock are tested with a small *E. coli* core metabolic network and a large-scale *E. coli* iAF1260 network. We demonstrate that the original bi-level formulations (ROOM and OptKnock) derive mutants that may not achieve the predicted overproduction under uncertainty and non-cooperative environment. The knockouts obtained by the proposed pessimistic formulations yield higher chemical production rates than those by the optimistic formulations. Moreover, with higher uncertainty levels, both cellular models under pessimistic approaches produce the same mutant strains.

### Conclusions

In this paper, we propose a new pessimistic optimization framework for mutant strain design. Our pessimistic strain optimization methods produce more robust solutions regardless of the inner-level mutant survival models, which is desired as the models for cell survival are often approximate to real-world systems. Such robust and reliable knockout strategies obtained by the pessimistic formulations would provide confidence for in-vivo experimental design of microbial mutants of interest.

## Keywords

## Introduction

Whole-genome high-throughput profiling techniques have enabled the systematic study of biological systems at the genome scale [1, 2]. In particular, systems models and computational methods for analyzing and controlling genome-scale metabolic networks have greatly advanced the field of metabolic engineering [3]. With the better understanding of the systems behavior of microbial metabolism, metabolic engineering based on the metabolic network models can help predict metabolic phenotypes [4, 5] and derive engineering strategies for strain design by manipulating the native microbial pathways to produce chemicals of commercial and biomedical benefits [6–11].

Mathematical modeling of metabolism often focuses on steady-state behavior, especially when long-term metabolic dynamics is of interest, as accurate reconstruction of genome-scale kinetic models is challenging when considering the large model space and parameter uncertainties. Constraint-based approaches based on the reaction stochiometry, notably Flux Balance Analysis (FBA) by Linear Programming (LP) formulations, study genome-scale dynamics by mass-balance equations at steady states to understand and predict macro-level microbial behavior in the presence of perturbation, for example caused by mutations or environmental changes [12–15]. By adding thermodynamic and flux capacity constraints, FBA often models steady-state behavior by assuming that the cell growth flux needs to be maximized based on biomass composition [16].

Many computational approaches have been proposed in this computational framework for *in silico* prediction of potentially feasible metabolic phenotypes and evaluation of theoretical limits of metabolic performance after knocking out certain genes or reactions for strain design [14, 16]. Metabolic engineering with such computationally derived knockout strategies has shown to be effective to achieve overproduction of biochemicals of interest [17]. Those strain optimization methods are generally modeled as bi-level optimization problems that seek for maximum overproduction of a desired biochemical at the outer level under the condition that mutant cells are still alive, modeled as the inner-level optimization problem. For example, OptKnock [17], ROOM (Regulatory On/Off Minimization) [18], and MOMAKnock [19] all fall under this category with different mutant cell survival models at the inner level detailed in the “Background” section.

These existing bi-level formulations for microbial strain design have inherent assumptions that nature will always cooperate with the human desire to select the mutants that serve the outer-level engineering objective the best, namely they assume that a cooperative environment exists between the outer-level (human) and inner-level (microbial strain) decision makers. However, in practice, surviving mutant strains may not always fit the best with the engineering goal. In addition, the model assumptions for the cell survival at the inner level are often the approximations to the real-world scenarios, which may result in the derived knockout strategies not overproducing the predicted amount of desirable biochemicals. Therefore, the robustness and viability of these predicted knockout strategies may need to be re-evaluated. For example, if the perturbed strains do not cooperate with the knockout implementer, they may do very opposite to the engineering objective and these optimistic bi-level strategies may not work in practice. In [20], we have shown that OptKnock-derived knockout strategies may produce the desired chemicals at levels much lower than the expected ones, when the aforementioned two assumptions are relaxed.

*E. coli*metabolic network and a genome-scale iAF1260 network [22] show that our pessimistic formulations can generate more robust knockout strategies compared to the existing bi-level optimization methods for strain design. Regardless of the choice of inner-level surrogate functions for cell survival, P-ROOM and P-OptKnock produce consistent and stable knockout strategies for the targeted biochemical overproduction as the formulations are specifically designed to take care of the model uncertainty.

## Background

Before we present pessimistic optimization formulations to derive robust knockout strategies for strain design, we briefly summarize the background and important mathematical notations for FBA and the bi-level optimization models for deriving knockouts, including OptKnock [17] and ROOM [18].

### FBA

**v**. The LP formulation of FBA can be expressed as:

Under the steady-state assumption, mass-balance constraints constitute a system of linear equations where the weighted sum of fluxes, based on stoichiometric coefficients given in a matrix form **S**, is 0. Thermodynamic and capacity constraints are defined as lower and upper bounds on reaction fluxes. In the FBA framework, maximization of biomass growth is often adopted as the objective function for modeling cell survival, where **c** becomes a vector with all values of 1 for the reactions corresponding to the biomass formation.

### OptKnock

where *I* and *J* are the sets of metabolites and reactions, respectively. In this model, *v*
_{
j
} represents the flux of reaction *j* in *J*. Each element of the matrix **S** is the stoichiometric coefficient *S*
_{
ij
} of metabolite *i* in reaction *j*. The outer-level binary decision variable *z*
_{
j
} is 1 if the corresponding reaction flux *v*
_{
j
} is active and 0 otherwise. The overproduction of a chemical of interest is maximized at the outer level allowing *K* possible knockout reactions. The inner-level cell survival model is based on steady-state analysis with FBA constraints. Depending on the availability of nutrients or the maximal fluxes that can be supported by enzymatic pathways [13], \(v_{j}^{min}\) and \(v_{j}^{max}\) are the lowest and highest possible reaction fluxes for the reaction *j*, respectively. The glucose consumption rate *v*
_{
glc
} is often set to a fixed value as \(v_{glc\_uptake}\), and \(v_{biomass}^{target}\) is the minimum level of biomass production. As the biomass growth is a linear objective function of metabolic reaction fluxes, strong duality of the inner-level optimization helps to convert the original bi-level optimization problem into a Mixed Integer Linear Programming (MILP) problem, which can be solved efficiently for large-scale metabolic networks [17, 23].

### ROOM

Observed experimental flux values reported in [24] showed that maximizing biomass reaction flux as a surrogate function for the most probable physiological state of the metabolic model is indeed effective for wild-type strains as they have been exposed to long-term evolutionary pressure. However, it may not be valid for the engineered mutants without such a long-term progress. Therefore, it calls for other realistic cell survival models for mutants. ROOM [18] is one of such models, in which binary variables *y*
_{
j
}’s are introduced to capture significant (up/down regulated) or insignificant reaction flux changes. Instead of maximizing the biomass growth, ROOM [18] aims to minimize the number of significant flux changes. Relaxing the binary variables *y*
_{
j
}’s in the original MILP formulation leads to the LP variant of ROOM:

in which *w*
_{
j
} is the wild-type flux values that can be solved by FBA. In this bi-level optimization model (2), the inner-level cellular objective is to minimize the flux changes. The inequalities with *y*
_{
j
}’s constrain the flux values not to deviate from the wild-type flux values. We again can employ the strong duality condition to convert the bi-level optimization problem into a MILP to solve (2).

## Methods

- 1.
*How robust are the derived knockout strategies based on these optimistic bi-level optimization formulations?* - 2.
*Does the robustness depend on how the inner-level optimization problem approximates mutant cell objectives?* - 3.
*Can we formulate new optimization problems that derive more robust solutions?*

We answer the first two questions by allowing the inner-level problem to take non-optimistic solutions as we have done in [20] to evaluate the knockout strategies derived by OptKnock and ROOM. More importantly, we propose a novel pessimistic bi-level optimization framework in this paper to derive robust knockout strategies, which consider the uncertainty introduced by the inner-level models from the aforementioned non-cooperative and non-unique issues.

We now present our pessimistic optimization formulations for deriving robust mutant strains and provide the detailed derivation of the optimization solutions to our pessimistic models, in which we maximize the desired overproduction for the worst-case scenario due to uncertainty and non-cooperative environment with different inner-level mutant cell survival models.

### Pessimistic bi-level optimization

where **x** and **y** are the outer-level and inner-level decision variables with the corresponding feasible sets \(\mathbb {X}\) and \(\mathbb {Y}\). In the above formulation, **F** represents the outer-level objective function, \(\mathbf {F}:\mathbb {X}\times \mathbb {Y}\rightarrow \mathbb {R}\), and **f** represents inner-level objective function, \(\mathbf {F}:\mathbb {X}\times \mathbb {Y}\rightarrow \mathbb {R}\). **G** and **g** represent the inequality constraint functions at the outer and inner levels, respectively. *Y*(**x**) denotes the set of optimal solutions to the inner-level problem for a given **x**, which may contain multiple elements. Solving such a pessimistic bi-level optimization problem is computationally very difficult. Even a linear optimistic bi-level problem, where both the outer-level and the inner-level problems are linear programs, is theoretically NP-hard [25]. In our study, to solve the more difficult pessimistic bi-level optimization problem, we first employ the LP duality theory based on the fact that the inner-most level problem is a linear program. It converts the pessimistic bi-level model into a standard bi-level problem, which enables us to fully make use of existing bi-level optimization algorithms. For example, we transform the standard bi-level model into a single level mixed integer programming (MIP) problem using the strong duality theory, which can be solved through the commercial MIP solver CPLEX. As those operations are rather simple, we make a challenging pessimistic bi-level problem practically solvable even for large-scale cases.

We mention that for P-ROOM (4), there are |*J*| binary variables in the outer level, 2|*J*| continuous variables and *O*(|*I*|+5|*J*|) constraints in the inner level. By taking the dual of the inner-most problem, we convert (4) into a standard bi-level problem (7), whose inner-level problem has *O*(|*I*|+7|*J*|) variables and *O*(|*I*|+7|*J*|) constraints. Comparing to the traditional bi-level ROOM model (2), which has the identical outer-level problem, and 2|*J*| continuous variables and *O*(|*I*|+5|*J*|) constraints in the inner level, (7) does not have a much larger or more complicated structure. Hence, it can be expected that the computational complexity of (7) will not be drastically more than that of the traditional ROOM model. Moreover, our numerical study shows that the single level MIP reformulation of (7) for a larger-scale network can be computed by CPLEX in a reasonable time, which indicates the efficiency of our proposed solution strategy.

Inspired by this pessimistic optimization framework, we propose pessimistic models for mutant strain optimization under model uncertainty. We note that the notations appearing in the following pessimistic formulations have the same definitions as in the “Background” section.

### Pessimistic strain optimization I: P-ROOM

*minimax*flavor as in (3):

Originally introduced in [21], the *ε*-approximation extension of the above pessimistic formulation is flexible when model uncertainty needs to be considered for bi-level decision making. Specifically, the *ε*-approximation of the pessimistic problem is to allow a proportional gap of *ε* for the inner-level objective function value from the optimal value that the actual knockout solutions would take. When *ε*=0, we have the original pessimistic formulations assuming that the inner-level models are faithful, but inner-level decision variables may act against outer-level engineering objectives. Higher *ε* values reflect that the inner-level mutant strains have a higher tolerance level for inner-level model uncertainty when a non-cooperative decision is taken against the outer-level engineering objectives. Such an *ε*-approximation of the pessimistic problem can be written as follows:

*φ*(

**v**

^{∗},

**y**

^{∗}) is the inner-level function value changing with respect to the inner-level decision variables

**v**

^{∗}and

**y**

^{∗}. As the inner-level problem is LP, we can convert the problem into its dual representation by the strong duality theorem as in [17, 26], which is:

where *u*
_{
i
} denotes the dual variable associated with the mass-balance constraint, \(\sum _{j} S_{ij}v_{j} = 0\) for metabolite *i*, *glc* is the dual variable associated with the glucose uptake constraint, *μ*
_{
biom
} is the dual variable associated with the minimum biomass threshold constraint, and *μ*
_{
m
i
n,j
} and *μ*
_{
m
a
x,j
} are the dual variables for the two directions of the inequality constraint \(v_{j}^{min}z_{j} \leqslant v_{j} \leqslant v_{j}^{max}z_{j}\) for each reaction *j*. Finally, *μ*
_{
m
i
n2,j
} and *μ*
_{
m
a
x2,j
} are the dual variables associated with the constraints \(v_{j}-y_{j}\left (v_{j}^{min}-w_{j}\right) \geq w_{j}\) and \(v_{j}-y_{j}\left (v_{j}^{max}-w_{j}\right)\leq w_{j}\), denoting the flux changes with respect to wild-type flux values, respectively, and *a*
_{
j
} is the dual variable associated with the upper bound constraint on *y*
_{
j
}. Note that the constraint \(v_{j}^{min}z_{j} \leqslant v_{j} \leqslant v_{j}^{max}z_{j}\) links the outer-most level binary decision variables *z*
_{
j
} and inner-level decision variables *v*
_{
j
}.

*ε*-approximation, the final pessimistic formulation P-ROOM aims to solve the following max-min problem;

Since some knockout strategies *z*
_{
j
} may not have corresponding feasible solutions to the inner problem of (7), causing the corresponding dual solutions to be unbounded, new continuous decision variables *s*
_{
j
} and *r*
_{
j
} are added here in order to enforce the feasibility of the inner primal problem of (7) to avoid such degenerated cases. The bilinear terms in the objective function and the constraints in (8) can be further linearized by using the commonly adopted big-M method. This single-level MILP problem can be solved efficiently for large-scale problems by available solvers.

### Pessimistic strain optimization II: P-OptKnock

*ε*-approximation to the pessimistic problem in (9) to incorporate the inner-level model uncertainty, we have the P-OptKnock formulation as:

where *φ*(**v**
^{∗}) is the inner-most level function value of a given inner-level decision variable **v**
^{∗}. Similar to the approach we have employed for P-ROOM, we can convert this optimization problem (10) to a single-level MIP. We note that the single-level MIP equivalent of the pessimistic formulation P-OptKnock is indeed similar to the work in [26] when the tolerance level *ε* is chosen as 0. However, we emphasize that our proposed pessimistic framework is more general and can be extended to different bi-level strain optimization formulations that employ various cell-survival models.

## Results and discussion

In this section, we first evaluate the knockout solutions by optimistic bi-level optimization methods on a core *E. coli* metabolic network [27] when the outer- and inner-level decision makers are not cooperative. We then test our robust strain optimization methods, namely P-ROOM and P-OptKnock, derived in the “Methods” section, on the core network and a large *E. coli* metabolic network, iAF1260 [22], for overproduction of Succinate. The models are optimized using the MILP solver CPLEX 12.6.3 [28].

### Succinate overproduction on AntCore metabolic network

We derive knockout solutions for a core *E. coli* metabolic network model [27] with 74 chemicals and 75 reactions for Succinate overproduction. The network model is from the OptKnock package [17], in which the glucose uptake is set at a fixed value of 100mmol/gDW.hr and the minimum biomass is set as 5 mmol/gDW.hr. Since the glucose-uptake rate is fixed, the Succinate yield is equivalent to the corresponding flux value considering steady-state stoichiometry constraint. The wild-type flux distribution is computed by maximizing the biomass in the FBA framework. In this set of experiments, we set the allowable knockout numbers *K*=3, 4, and 5. All the reported experiments are based on the aerobic condition.

We first evaluate the robustness of two optimistic bi-level programs OptKnock (1) and ROOM (2) by computing the performance of least favorable Succinate overproduction under uncertainty for the derived knockouts **z**
^{∗} by solving (1) and (2). As discussed in the “Methods” section, we introduce a parameter *ε* as the model tolerance level to capture how faithful the inner-level mutant survival model is and whether the mutants will cooperate with the outer-level overproduction objectives. By allowing the gap *ε* between the realistic responses and the optimal inner-level objective function value, the increase in *ε* reflects the higher model tolerance by approximating the inner-level model uncertainty and non-cooperativeness. The pessimistic Succinate overproduction rate evaluations of optimistic knockout solutions are derived by finding the worst-case solutions of reaction flux values in *ε*-approximation sets.

*ε*values (0≤

*ε*≤0.4) are plotted in cyan and red lines in Fig. 2 with different numbers of allowed knockouts. It is clear that with higher model uncertainty (larger

*ε*values), both ROOM and OptKnock strategies lead to smaller Succinate overproduction compared to the predicted optimistic overproduction rates. We also note that the Succinate rates drop very quickly even with small

*ε*, which implies that these optimistic knockout strategies are not robust. Finally, when the inner-level mutant survival models are not as faithful as desired, the derived knockout strategies may not be effective at all. For example, when

*K*=3 and 4, the knockouts suggested by OptKnock are too “optimistic” as the corresponding evaluation Succinate rates go to 0 as shown in Fig. 2a and b. Compared with the results from ROOM, this validates that the minimum flux change model may be a better mutant cell survival model than the maximum biomass growth model adopted in OptKnock. These experiments bring out that the cellular and engineering objectives may behave in opposite directions and the optimistic knockout strategies may not work in practice.

*ε*-approximation model uncertainty, we now compare the corresponding derived knockout strategies based on our proposed pessimistic models: P-OptKnock and P-ROOM. Figure 2 illustrates the

*ε*-optimal Succinate overproduction rates of the derived knockout solutions by P-OptKnock and P-ROOM in blue and black lines, respectively. Obviously, our pessimistic strategies derive more robust solutions, for which the worst-case or least favorable Succinate rates for different

*ε*values are consistently on top of the rates from optimistic strategies. The least favorable Succinate rates decrease monotonically with increasing

*ε*corresponding to higher model uncertainty. More interestingly, as

*ε*increases, both P-OptKnock and P-ROOM converge to the stable Succinate overproduction rates. In fact, they derive the same knockout reactions as provided in Table 1. This clearly shows that our pessimistic formulations can produce consistent and robust mutant strains with respect to the inner-level mutant survival model uncertainty. As a side note, when

*K*=5, the pessimistic Succinate values produced by P-ROOM and its optimistic counterpart are almost the same for different tolerance levels, probably because the surviving mutants are more restricted with the increasing number of knockout reactions. This also shows that the minimum flux change is a better mutant survival model.

Knockout strains derived by pessimistic and optimistic models on the core *E.coli* metabolic network

| Knockouts | Succinate production | |
---|---|---|---|

P-ROOM | 3 | 6pg →ru5p + co2 + nadph, oac + accoa →cit, ac →ac(ext) | 76.97 |

4 | g6p →6pg + nadph, oac + accoa →cit, suc ⇔fum + fadh2, ac →ac(ext) | 82.36 | |

5 | g6p →6pg + nadph, mal →pyr + co2 + nadph, 3pg + glu →ser + akg + nadh, fadh2 + 0.5o2 →2atp, nadh ⇔nadph | 107.07 | |

P-OptKnock | 3 | 6pg →ru5p + co2 + nadph, oac + accoa → cit, ac →ac(ext) | 76.97 |

4 | g6p →6pg + nadph, oac + accoa →cit, suc ⇔fum + fadh2, ac →ac(ext) | 82.36 | |

5 | g6p →6pg + nadph, mal →pyr + co2 + nadph, 3pg + glu →ser + akg + nadh, fadh2 + 0.5o2 →2atp, nadh ⇔nadph | 107.07 | |

ROOM | 3 | dhap ⇔gap, g6p →6pg + nadph, fadh2 + 0.5o2 →2atp | 93.92 |

4 | dhap ⇔gap, g6p →6pg + nadph, fadh2 + 0.5o2 →2atp, glyc(ext) → | 108.22 | |

5 | g6p →6pg + nadph, mal →pyr + co2 + nadph, 3pg + glu →ser + akg + nadh, fadh2 + 0.5o2 →2atp, nadh ⇔nadph | 115.58 | |

OptKnock | 3 | g6p →6pg + nadph, mal →pyr + co2 + nadph, nadh ⇔nadph | 110.179 |

4 | g6p →6pg + nadph, mal →pyr + co2 + nadph, 3pg + glu →ser + akg + nadh, nadh ⇔ nadph | 123.314 | |

5 | g6p →6pg + nadph, 3pg + glu →ser + akg + nadh, nadh ⇔nadph, glyc ⇔glyc(ext), ac(ext) → | 129.786 |

In Table 1, the succinate overproduction rates are given by the pessimistic formulations P-ROOM and P-OptKnock for the tolerance values *ε* in which both models became stable (*ε*=0.4) as shown in Fig. 2. Although the pessimistic models provide different Succinate values for different *ε* values in Fig. 2, the knockout strategies and the Succinate values for both P-ROOM and P-OptKnock are identical when they reach stability. When *K*=3, one of the knockouts suggested by P-ROOM and P-OptKnock involves competing byproduct metabolism pathways for succinate such as 6-Phospho-D-gluconate (6pg) and Ribulose 5-phosphate (ru5p). With *K*=4, the reaction decomposing succinate (suc) is also eliminated. As more knockouts are allowed, the resulting succinate production can be further increased. We should also note that the removal of an important reaction for Trans-hyrogenation (nadh ⇔nadph) may cause significant reduction in biomass flux value. For the knockout strategies suggested by ROOM and OptKnock, when assuming the optimistic environment, the succinate rates are larger than the values suggested by pessimistic knockouts as expected since optimistic formulation provides an upper bound for the pessimistic formulation. From all the experiments with different *K*’s, we demonstrate that, when we formulate the mutant optimization problem as pessimistic optimization by incorporating the model uncertainty, both P-OptKnock and P-ROOM can suggest consistent and robust knockouts, making the mutant strain optimization problem more reliable, accurate, and independent from the choice of inner-level surrogate functions for mutant cell objectives.

### Succinate overproduction on iAF1260 network

We have shown that the derived knockout strategies based on optimistic methods may not be robust and often “over-optimistic” by the experiments on the core *E. coli* metabolic network, while our proposed methods P-ROOM and P-OptKnock based on pessimistic optimization for mutant strain design have achieved robust and more practical knockout strategies. We further test P-ROOM and P-OptKnock to derive mutant strains for the overproduction of Succinate on a large *E. coli* metabolic network model, iAF1260 [22], which has 1658 metabolites and 2936 reactions including the pseudo reactions required for the computational model, also from the OptKnock package [17]. The glucose uptake rate is fixed at 100 mmol/gDW.hr, and the minimum biomass value is also set to 5 mmol/gDW.hr. The reported experiments are based on the anaerobic environment. We allow *K*=3 knockout reactions on this large network.

*ε*-optimal Succinate flux rates, which are the outer-level objective function values based on two pessimistic models of P-ROOM and P-OptKnock at different

*ε*values. When there is no model uncertainty (

*ε*=0), P-ROOM clearly provides higher Succinate rate than the value P-OptKnock suggests. This again suggests that modeling mutant cell survival by minimization of flux changes plays a role in favor of engineering objectives with improved targeted productions. At higher

*ε*values, P-ROOM and P-OptKnock reach stability, and they derive the same knockout reactions as provided in Table 2 as we also observed in the experiments on the

*E. coli*core metabolic network. If we consider the fact that the inner-level cell survival models are approximate to the real-world systems, incorporating this model uncertainty with

*ε*-approximation in our formulated pessimistic mutant strain optimization methods, P-ROOM and P-OptKnock, has a key role to achieve robust knockout solutions that are less affected by these approximate cell survival models.

Knockout strains derived by pessimistic and optimistic models on the iAF1260 *E.coli* metabolic network

Knockouts | min v | max v | ||
---|---|---|---|---|

ROOM | - | 0 | 0 | |

FBA | - | 0 | 0.372 | |

P-ROOM | ∙ mlthf + nadp ⇔methf + nadph |
| 8.09 | 8.09 |

∙ coa + pyr →accoa + for |
| 5.69 | 40.08 | |

∙ q8 + succ →fum + q8h2 | ||||

P-OptKnock | ∙ mlthf + nadp ⇔methf + nadph |
| 8.23 | 8.23 |

∙ coa + pyr →accoa + for |
| 4.78 | 80.52 | |

∙ q8 + succ →fum + q8h2 | ||||

ROOM | ∙ g6p + nadp ⇔6pgl + h + nadph |
| 31.36 | 31.36 |

∙ h2o + pser-L →pi + ser-L |
| 0 | 76.143 | |

∙ q8 + succ →fum + q8h2 | ||||

OptKnock | ∙ mlthf + nadp ⇔methf + nadph |
| 8.23 | 8.23 |

∙ coa + pyr →accoa + for |
| 4.78 | 80.52 | |

∙ q8 + succ →fum + q8h2 |

Table 2 provides the predicted knockouts by P-ROOM and P-OptKnock when *K*=3. The minimum and maximum Succinate flux rates are also given for the mutant and also wild-type strains. The Succinate flux rates of pessimistic models are given for two different *ε* values: *ε*=1 denoting the highest tolerance level for the inner-level model uncertainty, and *ε*=0 denoting 0 tolerance level. In Table 2, both P-ROOM and P-OptKnock predict three identical knockouts that yield a minimum production rate of the Succinate that is higher than the predicted production rate in the wild-type strains. It validates that our pessimistic mutant strain optimization methods guarantee a higher Succinate production rate than that of wild-type strains even in the worst-case scenario. Furthermore, minimum and maximum Succinate rates given the predicted knockouts define a larger range for the OptKnock model, which may indicate the inner-level mutant cell survival model by biomass maximization is less faithful than ROOM’s minimum flux change model.

We have also included the derived knockouts by ROOM and OptKnock and corresponding minimum and maximum succinate rates in Table 2. The mutant strain with the knockouts obtained by ROOM produces 0 minimal succinate flux value with the highest inner-level model uncertainty. However, P-ROOM succeeds in getting higher production rates even when *ε*=1. The minimal succinate rate when *ε*=0 in ROOM is greater than the minimal succinate rate for P-ROOM which is 8.09. This is because the reported pessimistic knockouts are identified when the model uncertainty is taken as *ε*=1. We note that the mutant with the pessimistic knockouts identified by P-ROOM when *ε*=0 actually produces the minimal succinate rate as 31.36. The knockouts captured by OptKnock are same as the ones identified by pessimistic formulations, leading to the same mutant with the same succinate production rates as P-OptKnock. Pessimistic formulations identified the succinate dehydrogenase reaction (SUCDi) as one of the suggested knockouts, which directly consumes succinate (succ), and the reaction pyruvate formate lyase (PFL). These have been reported as effective knockout strategies in [17, 29].

As demonstrated by the results from both the small core *E. coli* metabolic network and large iAF1260 network, our pessimistic formulations for strain optimization produce robust and stable knockout solution strategies under model uncertainty. Especially when comparing the results from P-ROOM and P-OptKnock, both models derive the same stable knockout solutions regardless of the inner-level surrogate functions for mutant cell survival. Such consistency is critical when considering the translation of computationally derived results into in vivo experiments in practical metabolic engineering.

## Conclusions

In this paper, we have proposed a new pessimistic optimization framework to identify optimal knockout strategies for maximum targeted bio-production under model uncertainty. We specifically have investigated two cell survival models and presented two corresponding pessimistic models to derive robust knockout reactions to achieve maximum biochemical overproduction: (1) P-ROOM under the minimum flux change condition; and (2) P-OptKnock under the maximum biomass condition. The experiments on both the core *E. coli* metabolic network [27] and large-scale iAF1260 network [22] have demonstrated that the pessimistic models for strain optimization derive robust and stable metabolic strain perturbation strategies through genome-scale steady-state stoichiometric modeling. Formulating the strain optimization problem based on pessimistic optimization considering the model uncertainty leads to consistent and reliable solutions regardless of the inner-level mutant survival models, which can provide high confidence for future in vivo experimental design of promising microbial mutants that benefit the human society.

## Declarations

### Acknowledgments

This work was partially supported by CCF-1553281 and CMMI-1642514 from the National Science Foundation.

### Funding

The publication costs of this article was funded by Award CCF-1553281 from the National Science Foundation.

### Availability of data and materials

The C++ source code and relevant network data are available upon request.

### About this supplement

This article has been published as part of *BMC Genomics* Volume 18 Supplement 6, 2017: Selected articles from the International Conference on Intelligent Biology and Medicine (ICIBM) 2016: genomics. The full contents of the supplement are available online at https://bmcgenomics.biomedcentral.com/articles/supplements/volume-18-supplement-6.

### Authors’ contributions

Conceived and designed the experiments: MA, LX, BZ, XQ. Designed and Implemented the algorithm: MA, LX, BZ, XQ. Performed the experiments: MA. Analyzed the results: MA, LX, BZ, XQ. Wrote the paper: MA, LX, BZ, XQ. Supervised the study: BZ, XQ. All authors read and approved the final manuscript.

### Ethics approval and consent to participate

Not applicable.

### Consent for publication

Not applicable.

### Competing interests

The authors declare that they have no competing interests.

### Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

## Authors’ Affiliations

## References

- Barrett CL, Kim TY, Kim HU, Palsson BØ, Lee SY. Systems biology as a foundation for genome-scale synthetic biology. Curr Opin Biotechnol. 2006; 17(5):488–92.View ArticleGoogle Scholar
- Esvelt KM, Wang HH. Genome-scale engineering for systems and synthetic biology. Mol Syst Biol. 2013; 9(1):641.Google Scholar
- Palsson BØ. Systems Biology: Constraint-based Reconstruction and Analysis. Cambridge: Cambridge University Press; 2015.View ArticleGoogle Scholar
- McCloskey D, Palsson BØ, Feist AM. Basic and applied uses of genome-scale metabolic network reconstructions of Escherichia coli. Mol Syst Biol. 2013; 9(1):661.Google Scholar
- Terzer M, Maynard ND, Covert MW, Stelling J. Genome-scale metabolic networks. Wiley Interdiscip Rev Syst Biol Med. 2009; 1(3):285–97.View ArticleGoogle Scholar
- Chotani G, Dodge T, Hsu A, Kumar M, LaDuca R, Trimbur D, Weyler W, Sanford K. The commercial production of chemicals using pathway engineering. Biochim Biophys Acta Protein Struct Mol Enzymol. 2000; 1543(2):434–55.View ArticleGoogle Scholar
- Lee JW, Na D, Park JM, Lee J, Choi S, Lee SY. Systems metabolic engineering of microorganisms for natural and non-natural chemicals. Nat Chem Biol. 2012; 8(6):536–46.View ArticleGoogle Scholar
- Broa C, Regenberga B, Förster J, Nielsen J.
*In silico*aided metabolic engineering of*saccharomyces cerevisiae*for improved bioethanol production. Metab Eng. 2006; 8:102–11.View ArticleGoogle Scholar - Lu J, Sheahan C, Fu P. Metabolic engineering of algae for fourth generation biofuels production. Energy Environ Sci. 2011; 4:2451–66.View ArticleGoogle Scholar
- Luengo JM, Garcia B, Sandoval A, Naharro G, Olivera ER. Bioplastics from microorganisms. Curr Opin Microbiol. 2003; 6:251–60.View ArticlePubMedGoogle Scholar
- Ohta K, Beall DS, Mejia JP, Shanmugam KT, Ingram LO. Metabolic engineering of
*klebsiella oxytoca m5a1*for ethanol production from xylose and glucose. Appl Environ Microbiol. 1991; 57:2810–1815.PubMedPubMed CentralGoogle Scholar - Burgard AP, Maranas CD. Probing the performance limits of the Escherichia coli metabolic network subject to gene additions or deletions. Biotech Bioeng. 2001; 74(5):364–75.View ArticleGoogle Scholar
- Segre D, Vitkup D, Church GM. Analysis of optimality in natural and perturbed metabolic networks. Proc Natl Acad Sci. 2002; 99(23):15112–7.View ArticleGoogle Scholar
- Varma A, Palsson BØ. Metabolic flux balancing: Basic concepts, scientific and practical use. Nat Biotechnol. 1994; 12:994–8.View ArticleGoogle Scholar
- Lee JM, Gianchandani EP, Papin JA. Flux balance analysis in the era of metabolomics. Brief Bioinform. 2006; 7(2):140–50.Google Scholar
- Edwards JS, Palsson BØ. The escherichia coli MG1655 in silico metabolic genotype: its definition, characteristics, and capabilities. Proc Natl Acad Sci. 2000; 97(10):5528–33.View ArticlePubMed CentralGoogle Scholar
- Burgard AP, Pharkya P, Maranas CD. Optknock: a bilevel programming framework for identifying gene knockout strategies for microbial strain optimization. Biotech Bioeng. 2003; 84(6):647–57.View ArticleGoogle Scholar
- Shlomi T, Berkman O, Ruppin E. Regulatory on/off minimization of metabolic flux changes after genetic perturbations. Proc Natl Acad Sci USA. 2005; 102(21):7695–700.View ArticlePubMed CentralGoogle Scholar
- Ren S, Zeng B, Qian X. Adaptive bi-level programming for optimal gene knockouts for targeted overproduction under phenotypic constraints. BMC Bioinforma. 2013; 14(2):1.View ArticleGoogle Scholar
- Apaydin M, Zeng B, Qian X. A reliable alternative of OptKnock for desirable mutant microbial strains. In: IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI).2016. p. 573–576. doi:10.1109/BHI.2016.7455962.
- Zeng B. Easier than we thought-a practical scheme to compute pessimistic bilevel optimization problem. 2015. Available at SSRN: https://ssrn.com/abstract=2658342 or http://dx.doi.org/10.2139/ssrn.2658342.
- Feist AM, Henry CS, Reed JL, Krummenacker M, Joyce AR, Karp PD, Broadbelt LJ, Hatzimanikatis V, Palsson BØ. A genome-scale metabolic reconstruction for Escherichia coli K-12 MG1655 that accounts for 1260 ORFs and thermodynamic information. Mol Syst Biol. 2007; 3(1):121.PubMed CentralGoogle Scholar
- Kim J, Reed JL, Maravelias CT. Large-scale bi-level strain design approaches and mixed-integer programming solution techniques. PLoS ONE. 2011; 6(9):24162.View ArticleGoogle Scholar
- Burgard AP, Maranas CD. Optimization-based framework for inferring and testing hypothesized metabolic objective functions. Biotech Bioeng. 2003; 82(6):670–7.View ArticleGoogle Scholar
- Deng X. Complexity issues in bilevel linear programming. In: Multilevel Optimization: Algorithms and Applications. US: Springer: 1998. p. 149–64.Google Scholar
- Tepper N, Shlomi T. Predicting metabolic engineering knockout strategies for chemical production: accounting for competing pathways. Bioinformatics. 2010; 26(4):536–43.View ArticleGoogle Scholar
- Antoniewicz MR, Kraynie DF, Laffend LA, Gonzalez-Lergier J, Kelleher JK, Stephanopoulos G. Metabolic flux analysis in a nonstationary system: fed-batch fermentation of a high yielding strain of E. coli producing 1, 3-propanediol. Metab Eng. 2007; 9(3):277–92.View ArticlePubMed CentralGoogle Scholar
- IBM ILOG CPLEX Optimizer 12.6.3. 2015, IBM Corporation.Google Scholar
- Stols L, Donnelly MI. Production of succinic acid through overexpression of NAD (+)-dependent malic enzyme in an Escherichia coli mutant. Appl Environ Microbiol. 1997; 63(7):2695–701.Google Scholar