The above optimistic bi-level optimization methods effectively model the interacting objectives of the outer-level knockout implementer and the inner-level microbial cells. However, the effectiveness of these optimistic formulations depends on the inherent assumption that the outer-level engineering objective and inner-level cellular fitness function behave cooperatively by selecting the inner-level solutions in favor of the outer-level optimization problem. In practice, when the inner-level problem either does not faithfully reflect cellular fitness objectives or has non-unique solutions, there is no guarantee that the cell response will be in cooperation with the engineering objective to maximize the desirable biochemical product formation as we recently found in [20]. The main questions that we would like to address here are:
-
1.
How robust are the derived knockout strategies based on these optimistic bi-level optimization formulations?
-
2.
Does the robustness depend on how the inner-level optimization problem approximates mutant cell objectives?
-
3.
Can we formulate new optimization problems that derive more robust solutions?
We answer the first two questions by allowing the inner-level problem to take non-optimistic solutions as we have done in [20] to evaluate the knockout strategies derived by OptKnock and ROOM. More importantly, we propose a novel pessimistic bi-level optimization framework in this paper to derive robust knockout strategies, which consider the uncertainty introduced by the inner-level models from the aforementioned non-cooperative and non-unique issues.
We now present our pessimistic optimization formulations for deriving robust mutant strains and provide the detailed derivation of the optimization solutions to our pessimistic models, in which we maximize the desired overproduction for the worst-case scenario due to uncertainty and non-cooperative environment with different inner-level mutant cell survival models.
Pessimistic bi-level optimization
As discussed, due to the inner-level model uncertainty and non-cooperative issues, the existing optimistic bi-level strain optimization methods may not be able to produce reliable results when there exist non-unique solutions to the inner-level problem. In order to overcome this limitation, we propose a pessimistic optimization framework to optimize for the worst-case scenario under uncertainty and non-cooperative environment. The mathematical programming formulation of pessimistic bi-level optimization can be represented in the following general form [21]:
$$ {}\begin{aligned} & \underset{\mathbf{x}\in\mathbb{X}}{\text{max}} \quad \underset{\mathbf{y}\in Y(\mathbf{x})}{\text{min}} \quad \mathbf{F}(\mathbf{x},\mathbf{y})\\ & \text{s.t.} \quad \mathbf{G}(\mathbf{x})\leq 0 \\ & \quad \quad~ Y(\mathbf{x}) = \text{arg min(max)}_{\mathbf{y}} \lbrace \mathbf{f}(\mathbf{x},\mathbf{y}) : \mathbf{g}(\mathbf{x},\mathbf{y})\leq0, \mathbf{y}\in\mathbb{Y}\rbrace, \end{aligned} $$
(3)
where x and y are the outer-level and inner-level decision variables with the corresponding feasible sets \(\mathbb {X}\) and \(\mathbb {Y}\). In the above formulation, F represents the outer-level objective function, \(\mathbf {F}:\mathbb {X}\times \mathbb {Y}\rightarrow \mathbb {R}\), and f represents inner-level objective function, \(\mathbf {F}:\mathbb {X}\times \mathbb {Y}\rightarrow \mathbb {R}\). G and g represent the inequality constraint functions at the outer and inner levels, respectively. Y(x) denotes the set of optimal solutions to the inner-level problem for a given x, which may contain multiple elements. Solving such a pessimistic bi-level optimization problem is computationally very difficult. Even a linear optimistic bi-level problem, where both the outer-level and the inner-level problems are linear programs, is theoretically NP-hard [25]. In our study, to solve the more difficult pessimistic bi-level optimization problem, we first employ the LP duality theory based on the fact that the inner-most level problem is a linear program. It converts the pessimistic bi-level model into a standard bi-level problem, which enables us to fully make use of existing bi-level optimization algorithms. For example, we transform the standard bi-level model into a single level mixed integer programming (MIP) problem using the strong duality theory, which can be solved through the commercial MIP solver CPLEX. As those operations are rather simple, we make a challenging pessimistic bi-level problem practically solvable even for large-scale cases.
We mention that for P-ROOM (4), there are |J| binary variables in the outer level, 2|J| continuous variables and O(|I|+5|J|) constraints in the inner level. By taking the dual of the inner-most problem, we convert (4) into a standard bi-level problem (7), whose inner-level problem has O(|I|+7|J|) variables and O(|I|+7|J|) constraints. Comparing to the traditional bi-level ROOM model (2), which has the identical outer-level problem, and 2|J| continuous variables and O(|I|+5|J|) constraints in the inner level, (7) does not have a much larger or more complicated structure. Hence, it can be expected that the computational complexity of (7) will not be drastically more than that of the traditional ROOM model. Moreover, our numerical study shows that the single level MIP reformulation of (7) for a larger-scale network can be computed by CPLEX in a reasonable time, which indicates the efficiency of our proposed solution strategy.
Inspired by this pessimistic optimization framework, we propose pessimistic models for mutant strain optimization under model uncertainty. We note that the notations appearing in the following pessimistic formulations have the same definitions as in the “Background” section.
Pessimistic strain optimization I: P-ROOM
In the following, we propose the pessimistic formulation based on the original ROOM model (2) in the same minimax flavor as in (3):
$$ {\begin{aligned} &\underset{z_{j},\forall j\in J}{\text{max}}~\underset{v_{j},y_{j}\in Y(z_{j}),\forall j\in J}{\text{min}}~v_{chemical} \\ &\text{s.t.}~\sum_{j\in J}\left(1- z_{j}\right) \leqslant K,~z_{j} \in \left\lbrace 0, 1\right\rbrace, \forall j\in J \\ &\quad Y(z_{j})=\text{argmin}_{v_{j},y_{j}} \left\lbrace \sum_{j\in J} y_{j} : \sum_{j\in J} S_{ij}v_{j} \,=\, 0, \forall i\in I, v_{glc} \,=\, v_{glc\_uptake}, \right.\\ &\qquad v_{biom}\!\geqslant\! v_{biomass}^{target},~v_{j}^{min}z_{j} \!\leqslant\!\! v_{j} \leqslant\! v_{j}^{max}z_{j},~v_{j}\,-\,y_{j}\left(v_{j}^{max}-w_{j}\right) \leq w_{j},\\ &\qquad \qquad~\left.v_{j}-y_{j}\left(v_{j}^{min}-w_{j}\right) \geq w_{j},~0\leq y_{j} \leq1, \forall j\in J {\vphantom{\sum_{j\in J}}}\right\rbrace. \end{aligned}} $$
(4)
Originally introduced in [21], the ε-approximation extension of the above pessimistic formulation is flexible when model uncertainty needs to be considered for bi-level decision making. Specifically, the ε-approximation of the pessimistic problem is to allow a proportional gap of ε for the inner-level objective function value from the optimal value that the actual knockout solutions would take. When ε=0, we have the original pessimistic formulations assuming that the inner-level models are faithful, but inner-level decision variables may act against outer-level engineering objectives. Higher ε values reflect that the inner-level mutant strains have a higher tolerance level for inner-level model uncertainty when a non-cooperative decision is taken against the outer-level engineering objectives. Such an ε-approximation of the pessimistic problem can be written as follows:
$$ \begin{aligned} &\underset{z_{j}:z_{j}\in \left\lbrace 0,1\right\rbrace,\forall j\in J, \sum_{j\in J}(1- z_{j}) \leqslant K}{\text{max}}~\underset{v_{j},y_{j},\forall j \in J}{\text{min}}~v_{chemical} \\ &\qquad \quad~\text{s.t.}~\sum_{j\in J} S_{ij}v_{j} = 0, \forall i\in I, v_{glc} = v_{glc\_uptake}, v_{biom}\geqslant v_{biom}^{target}\\ &\qquad \qquad\quad~v_{j}^{min}z_{j} \!\!\leqslant\! \!v_{j} \leqslant v_{j}^{max}z_{j},~v_{j}-y_{j}\left(v_{j}^{max}-w_{j}\right) \leq w_{j}, \forall j\in J \\ &\qquad \qquad\quad~v_{j}-y_{j}\left(v_{j}^{min}-w_{j}\right) \geq w_{j},~0\leq y_{j} \leq1, \forall j\in J \\ &\qquad \qquad\quad~\sum_{j\in J}{y_{j}}\leq \varphi(\mathbf{v}^{*},\mathbf{y}^{*})(1+\epsilon), \end{aligned} $$
(5)
where φ(v
∗,y
∗) is the inner-level function value changing with respect to the inner-level decision variables v
∗ and y
∗. As the inner-level problem is LP, we can convert the problem into its dual representation by the strong duality theorem as in [17, 26], which is:
$$ \begin{aligned} &\underset{glc,u_{i},\forall i\in I,\mu,a_{j},\forall j\in J} {\text{max}}~glc. v_{glc\_uptake} + \mu_{biom} v_{biomass}^{target} + \sum_{j\in J} v_{j}^{min} \mu_{min,j}z_{j} -\\ & \quad \quad\sum_{j\in J}v_{j}^{max} \mu_{max,j}z_{j} - \sum_{j\in J}\mu_{max2,j} w_{j} + \sum_{j\in J}\mu_{min2,j} w_{j} - \sum_{j\in J}a_{j} \\ & \text{s.t.}~\sum_{i\in I}S_{i,glc}u_{i} + glc + \mu_{min,glc} \,-\,\mu_{max,glc} - \mu_{max2,glc} + \mu_{min2,glc} = 0\\ & \quad~~\sum_{i\in I}S_{i,biom}u_{i} + \mu_{biom} + \mu_{min,biom} -\mu_{max,biom} - \mu_{max2,biom}\\ &\qquad\quad+ \mu_{min2,biom} = 0\\ & \quad~~\sum_{i\in I}S_{i,j}u_{i} + \mu_{min,j} -\mu_{max,j} - \mu_{max2,j} + \mu_{min2,j} = 0, \forall {j}\in J, j\\ &\qquad\quad\neq \text{biom, glc}\\ & \quad~~\mu_{max2,j}\left(v_{j}^{max}-w_{j}\right) - \mu_{min2,j}\left(v_{j}^{min}-w_{j}\right) - a_{j} \leq 1, \forall {j}\in J\\ & \quad~~glc,u_{i}\in\mathbb{R}, \forall {i}\in I \quad \mu_{biom},\mu_{min,j},\mu_{max,j},\mu_{max2,j}, \mu_{min2,j}, a_{j}\\ &\qquad\quad\geq0, \forall {j}\in J \end{aligned} $$
(6)
where u
i
denotes the dual variable associated with the mass-balance constraint, \(\sum _{j} S_{ij}v_{j} = 0\) for metabolite i, glc is the dual variable associated with the glucose uptake constraint, μ
biom
is the dual variable associated with the minimum biomass threshold constraint, and μ
m
i
n,j
and μ
m
a
x,j
are the dual variables for the two directions of the inequality constraint \(v_{j}^{min}z_{j} \leqslant v_{j} \leqslant v_{j}^{max}z_{j}\) for each reaction j. Finally, μ
m
i
n2,j
and μ
m
a
x2,j
are the dual variables associated with the constraints \(v_{j}-y_{j}\left (v_{j}^{min}-w_{j}\right) \geq w_{j}\) and \(v_{j}-y_{j}\left (v_{j}^{max}-w_{j}\right)\leq w_{j}\), denoting the flux changes with respect to wild-type flux values, respectively, and a
j
is the dual variable associated with the upper bound constraint on y
j
. Note that the constraint \(v_{j}^{min}z_{j} \leqslant v_{j} \leqslant v_{j}^{max}z_{j}\) links the outer-most level binary decision variables z
j
and inner-level decision variables v
j
.
By aggregating the inner-level constraints to the outer level with those introduced by its dual in (6) with the adopted ε-approximation, the final pessimistic formulation P-ROOM aims to solve the following max-min problem;
$$ \begin{aligned} & \underset{z_{j}:z_{j}\in \left\lbrace 0, 1\right\rbrace,\forall j\in J,\sum_{j\in J}(1- z_{j}) \leqslant K}{\text{max}} \quad \underset{v_{j}, y_{j},glc, u_{i}, a_{j}, \mu,\forall i\in I,\forall j\in J}{\text{min}} \quad v_{chemical}\\[-2pt] & \qquad \quad \text{s.t.} \quad \sum_{j\in J} S_{ij}v_{j} = 0, \forall {i}\in I\\[-2pt] & \qquad \qquad \quad v_{glc} = v_{glc\_uptake}, v_{biom} \geqslant v_{biomass}^{target},v_{j}^{min}z_{j} \leqslant v_{j}\\[-2pt] & \qquad \qquad \quad \quad\ \leqslant v_{j}^{max}z_{j}, \forall j\in J\\[-2pt] & \qquad\qquad \quad v_{j}-y_{j}\left(v_{j}^{max}-w_{j}\right) \leq w_{j}, v_{j}-y_{j}\left(v_{j}^{min}-w_{j}\right) \geq w_{j}, 0\\[-2pt] & \qquad \qquad \quad \quad\ \leq y_{j}\leq1, \forall {j}\in J\\[-2pt] & \qquad \qquad \quad \left(glc. v_{glc\_uptake} + \mu_{biom} v_{biomass}^{target} + \sum_{j\in J} v_{j}^{min} \mu_{min,j}z_{j}\right.\\[-2pt] & \qquad \qquad \qquad~-\sum_{j\in J}v_{j}^{max} \mu_{max,j}z_{j}\\[-2pt] & \left.\qquad \qquad \qquad~-\sum_{j\in J}\mu_{max2,j} w_{j} + \sum_{j\in J}\mu_{min2,j} w_{j} - \sum_{j\in J}a_{j} \right)(1+\epsilon) \\ & \qquad \qquad \qquad~- \sum_{j\in J}y_{j} \geq 0\\ & \qquad \qquad \quad \sum_{i\in I}S_{i,glc}u_{i} + glc + \mu_{min,glc} -\mu_{max,glc} - \mu_{max2,glc}\\ &\qquad \qquad \quad \quad + \mu_{min2,glc} = 0 \\ &\qquad \qquad \quad \sum_{i\in I}S_{i,biom}u_{i} + \mu_{biom} + \mu_{min,biom} -\mu_{max,biom}\\ &\qquad \qquad \quad \quad- \mu_{max2,biom} + \mu_{min2,biom} = 0\\ & \qquad \qquad \quad \sum_{i\in I}S_{i,j}u_{i} + \mu_{min,j} -\mu_{max,j} - \mu_{max2,j} + \mu_{min2,j}\\ &\qquad \qquad \quad \quad= 0, \forall j\in J, j\neq \text{biom, glc}\\ & \qquad \qquad \quad \mu_{max2,j}\left(v_{j}^{max} \!\,-\, \!w_{j}\right) \!\!- \! \mu_{min2,j}\left(v_{j}^{min} \!- \!w_{j}\right) \,-\, a_{j} \leq 1, \forall j\in J\\ & \qquad \qquad \quad v_{j}, glc,u_{i} \in \mathbb{R},\forall j\in J, \forall i\in I, \\ & \qquad \qquad \quad \mu_{biom},\mu_{min,j},\mu_{max,j},\mu_{max2,j}, \mu_{min2,j}, a_{j},y_{j}\geq 0,\forall j\in J \end{aligned} $$
(7)
In order to solve the max-min problem above, we use the strong duality theorem again to transform the minimization problem into its dual maximization representation, resulting in its equivalent single-level MIP as:
$$ \begin{aligned} & \underset{z_{j},\gamma, p_{j},q_{j},x,c_{j},t,s_{j},r_{j}}{\text{max}} \quad \gamma_{glc}v_{glc\_uptake} + \gamma_{biom}v_{biomass}^{target} + \sum_{j\in J}v_{j}^{min}x_{min,j}z_{j} - \\ &\qquad \qquad \quad \sum_{j\in J}v_{j}^{max}x_{max,j}z_{j} - \sum_{j\in J}x_{max2,j}w_{j} + \sum_{j\in J}x_{min2,j}w_{j}\\[-2pt] &\qquad\qquad\qquad-\sum_{j\in J}c_{j} - \sum_{j\in J}q_{j}\\[-2pt] & \text{s.t.}\qquad\sum_{j\in J}(1- z_{j}) \leqslant K, \\[-2pt] & \quad \qquad\sum_{j\in J}S_{i,j} s_{j} = 0, \forall i\in I,~s_{glc} = v_{glc\_uptake} \\[-2pt] & \quad \qquad s_{biom} \geqslant v_{biomass}^{target},~v_{j}^{min}z_{j} \leqslant s_{j} \leqslant v_{j}^{max}z_{j}, \forall j\in J \\[-2pt] & \quad \qquad s_{j}\,-\,r_{j}\left(v_{j}^{max}\,-\,w_{j}\right)\! \!\leq\! w_{j}, ~s_{j}\!\,-\,r_{j}\left(v_{j}^{min}\,-\,w_{j}\right)\! \!\geq\! w_{j},~0\!\!\leq\!\! r_{j} \!\leq\! 1, \forall {j}\in J\\[-2pt] & \quad \qquad \sum_{i\in I}S_{i,chem}\gamma_{i} + x_{min,chem} - x_{max,chem} - x_{max2,chem}\\[-2pt] & \quad \qquad\quad + x_{min2,chem} = 1\\[-2pt] & \quad \qquad \sum_{i\in I}S_{i,glc}\gamma_{i} + \gamma_{glc} + x_{min,glc} - x_{max,glc} - x_{max2,glc}\\[-2pt] & \quad \qquad \quad + x_{min2,glc} = 0\\[-2pt] & \quad \qquad \sum_{i\in I}S_{i,biom}\gamma_{i} + \gamma_{biom} + x_{min,biom} - x_{max,biom} - x_{max2,biom}\\[-2pt] & \quad \qquad \quad + x_{min2,biom} = 0\\[-2pt] & \quad \qquad \sum_{i\in I}S_{i,j}\gamma_{i} + x_{min,j} -x_{max,j} - x_{max2,j} + x_{min2,j} = 0, \forall {j}\in J, j\\[-2pt] & \quad \qquad \quad \neq \text{biom,glc,chem} \\[-2pt] & \quad \qquad x_{max2,j}\left(v_{j}^{max}\,-\,w_{j}\right) \,-\, x_{min2,j}\left(v_{j}^{min}-w_{j}\right) - c_{j} - t \leq 0, \forall {j}\in J\\[-2pt] & \quad \qquad \sum_{j\in J} S_{ij}p_{j} = 0, \forall {i}\in I \\[-2pt] & \quad \qquad (1+\epsilon)v_{glc\_uptake}t + p_{glc} = 0 \\[-2pt] & \quad \qquad -v_{j}^{max}(1+\epsilon)tz_{j} - p_{j} \leq 0, \forall j\in J \\[-2pt] & \quad \qquad v_{j}^{min}(1+\epsilon)tz_{j} + p_{j} \leq 0, \forall j \in J\\[-2pt] & \quad \qquad (1+\epsilon)v_{biomass}^{target}t + p_{biom} \leq 0 \\[-2pt] & \quad \qquad -w_{j}(1+\epsilon)t - p_{j} - \left(v_{j}^{max}-w_{j}\right)q_{j} \leq 0, \forall j\in J \\[-2pt] & \quad \qquad w_{j}(1+\epsilon)t + p_{j} + \left(v_{j}^{min}-w_{j}\right)q_{j} \leq 0, \forall j\in J \\[-2pt] & \quad \qquad q_{j} - (1 + \epsilon)t \leq 0, \forall j\in J\\[-2pt] & \quad \qquad s_{j}, \gamma_{glc}, \gamma_{i}, p_{j} \in \mathbb{R},\forall i\in I,\forall j\in J\\[-2pt] & \quad \qquad\gamma_{biom}, x_{min,j}, x_{max,j}, x_{max2,j}, x_{min2,j}, c_{j}, t, q_{j}, r_{j} \geq 0, z_{j} \\[-2pt] & \quad \qquad \quad \in\lbrace0,1\rbrace, \forall j\in J. \end{aligned} $$
(8)
Since some knockout strategies z
j
may not have corresponding feasible solutions to the inner problem of (7), causing the corresponding dual solutions to be unbounded, new continuous decision variables s
j
and r
j
are added here in order to enforce the feasibility of the inner primal problem of (7) to avoid such degenerated cases. The bilinear terms in the objective function and the constraints in (8) can be further linearized by using the commonly adopted big-M method. This single-level MILP problem can be solved efficiently for large-scale problems by available solvers.
Pessimistic strain optimization II: P-OptKnock
In OptKnock [17], the inner-level mutant survival model is approximated by the maximum biomass growth condition. In order to optimize for the overproduction of the target biochemicals in worse-case scenario, we can similarly write the corresponding pessimistic formulation as what we have derived for P-ROOM:
$$ \begin{aligned} & \underset{z_{j},\forall j\in J}{\text{max}} \quad \underset{v_{j}\in Y(z_{j})}{\text{min}} \quad v_{chemical} \\ & \text{s.t.} \qquad \sum_{j\in J}(1- z_{j}) \leqslant K \\ & \quad \qquad z_{j} \in \left\lbrace 0, 1\right\rbrace, \forall j\in J\\ & \quad \qquad Y(z_{j})=\text{argmax} \left\lbrace v_{biom}: \sum_{j\in J} S_{ij}v_{j} = 0, \forall {i} \in I,\right. \\ &\qquad \qquad \qquad v_{glc} = v_{glc\_uptake},\\ & \qquad \qquad \qquad \left.{\vphantom{\sum_{j\in J}}}v_{biom} \geqslant v_{biomass}^{target}, \quad v_{j}^{min}z_{j} \leqslant v_{j} \leqslant v_{j}^{max}z_{j}, \forall {j}\in J \right\rbrace. \end{aligned} $$
(9)
By adopting the same ε-approximation to the pessimistic problem in (9) to incorporate the inner-level model uncertainty, we have the P-OptKnock formulation as:
$$ {}\begin{aligned} &\underset{z_{j}: z_{j} \in \left\lbrace 0,1\right\rbrace,\forall j\in J,\sum_{j\in J}(1- z_{j}) \leqslant K}{\text{max}}~\underset{v_{j}, \forall j\in J}{\text{min}}~v_{chemical} \\ &\qquad \quad~\text{s.t.}~\sum_{j\in J} S_{ij}v_{j} = 0, \forall i\in I, v_{glc} = v_{glc\_uptake}, \\ &\qquad \qquad \quad\,\, v_{biom}\geqslant v_{biomass}^{target}\\ &\qquad \qquad\quad~v_{j}^{min}z_{j} \leqslant v_{j} \leqslant v_{j}^{max}z_{j}, \forall j\in J \\ &\qquad \qquad\quad~v_{biom}\geq \varphi(\mathbf{v}^{*})(1-\epsilon), \end{aligned} $$
(10)
where φ(v
∗) is the inner-most level function value of a given inner-level decision variable v
∗. Similar to the approach we have employed for P-ROOM, we can convert this optimization problem (10) to a single-level MIP. We note that the single-level MIP equivalent of the pessimistic formulation P-OptKnock is indeed similar to the work in [26] when the tolerance level ε is chosen as 0. However, we emphasize that our proposed pessimistic framework is more general and can be extended to different bi-level strain optimization formulations that employ various cell-survival models.