### Combinatorial drug optimization problem

Suppose we have *N* different types of drugs, where each drug can take one of pre-specified concentrations. Our goal is to predict the optimal drug cocktail, by mixing the available drugs, that maximizes the overall therapeutic effect. Let *x*_{
n
} be the concentration of the *n*-th drug, where *x*_{
n
} can take one of *M*_{
n
} distinct concentrations in the set {\mathcal{C}}_{n}=\left\{{c}_{n}^{1}\mathsf{\text{,}}{c}_{n}^{2}\mathsf{\text{,}}{c}_{n}^{3},\phantom{\rule{0.3em}{0ex}}\phantom{\rule{0.3em}{0ex}}.\phantom{\rule{0.3em}{0ex}}\phantom{\rule{0.3em}{0ex}}.\phantom{\rule{0.3em}{0ex}}\phantom{\rule{0.3em}{0ex}}.\phantom{\rule{0.3em}{0ex}}\phantom{\rule{0.3em}{0ex}},\phantom{\rule{0.3em}{0ex}}{c}_{n}^{{M}_{n}}\right\}, where {c}_{n}^{k}<{c}_{n}^{k+1} for all *k*. The drug cocktail is represented by an *N*-dimensional vector **x** = (*x*_{1}, *x*_{2}, . . . , *x*_{
N
}), which consists of the *N* drug concentrations. Let *f*(**x**) be the normalized drug response that quantitatively measures the effectiveness of a given drug combination **x**. We assume the response has been normalized such that 0\le f\left(\mathbf{x}\right)\le 1\phantom{\rule{0.3em}{0ex}}\phantom{\rule{0.3em}{0ex}}\mathsf{\text{for}}\phantom{\rule{0.3em}{0ex}}\phantom{\rule{0.3em}{0ex}}\mathbf{x}\in \mathcal{X}, where \mathcal{X}={\mathcal{C}}_{1}\phantom{\rule{0.3em}{0ex}}\times \phantom{\rule{0.3em}{0ex}}{\mathcal{C}}_{2}\phantom{\rule{0.3em}{0ex}}\times \phantom{\rule{0.3em}{0ex}}\phantom{\rule{0.3em}{0ex}}\cdot \cdot \cdot \phantom{\rule{0.3em}{0ex}}\phantom{\rule{0.3em}{0ex}}\times \phantom{\rule{0.3em}{0ex}}{\mathcal{C}}_{N} is the set of all possible drug combinations. We denote the number of all possible drug combinations as M=\phantom{\rule{0.3em}{0ex}}\left|\mathcal{X}\right|\phantom{\rule{0.3em}{0ex}}={\prod}_{n=1}^{N}{M}_{n}.f\left(\mathbf{x}\right)=0 implies that the drug cocktail **x** is completely ineffective, and larger *f* (**x**) implies higher efficacy. In practical applications, *f* (**x**) may be obtained by monitoring the cell response to a drug intervention using fluorescent imaging, microarrays, or sequencing techniques. Under this setting, we aim to find the optimal drug combination **x*** that maximizes the normalized drug response:

\underset{\mathbf{x}\in \mathcal{X}}{{\mathbf{x}}^{*}=\text{arg}\text{max}\text{f}\left(\mathbf{x}\right)}.

As we can see, this is a combinatorial optimization problem, in which we have to find the optimal drug combination out of *M*_{1} *M*_{2} . . . *M*_{
N
} possible combinations. The total number of distinct drug combinations quickly grows as the number of drugs increases. Considering the practical cost of experimentally measuring the normalized drug response function *f* (**x**), it is apparent that we cannot test all drug combinations to find the most effective one.

### Stochastic search algorithms

Stochastic search algorithms [9, 11] aim to efficiently identify the potent drug combinations without exploring the entire combinatorial solution space. The basic idea is to randomly search through the solution space by iteratively updating the drug combination until an effective combination emerges. At each step, the current drug combination is incrementally updated towards the direction that is likely to improve the overall drug response. The updating decision is made in a stochastic manner, which allows the search to proceed towards directions that are deemed to be less likely to improve the response. This is an important characteristic of stochastic search algorithms, which is critical for keeping the search from being trapped in local maxima. Since a stochastic search algorithm tries to arrive at the optimal solution (i.e., the most effective drug combination) by performing iterative local searches, its overall performance depends on how it chooses the next solution state (i.e., an updated drug combination in \mathcal{X}, the set of all possible combinations) from a given state (i.e., the current drug combination). The two performance metrics of interest are: (i) the effectiveness of the predicted drug combination, in terms of how close its response is to the optimal response, and (ii) the number of search steps that the algorithm needs to take until an effective combination is found. Basically, we want to predict a potent drug cocktail by testing minimal number of drug combinations to minimize the actual experimental cost for measuring the cell response to combinatorial drugs. When choosing the next state, the search algorithm has to be as parsimonious as possible, in terms of the number of function evaluations, so that the overall experimental cost for identifying the optimal drug combination can be minimized. This has been one of the main design considerations of existing stochastic search algorithms that have been developed for combinatorial drug optimization [9, 11].

For example, the Gur Game algorithm adopted in [9] determines how to update the drug combination solely based on the current drug response. Suppose {\mathbf{x}}^{c}=\left({x}_{1}^{c},\phantom{\rule{0.3em}{0ex}}{x}_{2}^{c},\cdots ,{x}_{N}^{c}\right) is the current drug combination with a normalized drug response of *f* (**x**^{c}). The algorithm generates *N* random numbers *r*_{
n
} ∈ [0,1], one for each drug. Each *r*_{
n
} is compared against the current drug response {\mathbf{x}}^{c}=\left({x}_{1}^{c},\phantom{\rule{0.3em}{0ex}}{x}_{2}^{c},\cdots ,{x}_{N}^{c}\right), which is used to either "reward" or "penalize" the *n*-th drug. For example, the drug is rewarded if *f* (**x**^{c}) >*r*_{
n
}. Otherwise, it is penalized. This update process is repeated for each of the *N* drugs. According to this scheme, drug combinations with higher response are more likely to be rewarded, while combinations with lower response are more likely to be penalized. In the long run, the algorithm is expected to drive the concentration of each drug towards the one that maximizes the response. Now, an important relevant question is what is the right way of rewarding (or penalizing) the current concentration of a given drug. The Gur Game algorithm used in [9] uses a predetermined finite state automaton (FSA) for this purpose. For example, let {x}^{c}={c}_{n}^{k}\in {\mathcal{C}}_{n} be the current concentration of the *n*-th drug. Suppose we have *f* (**x**^{c}) >*r*_{
n
}, hence the current concentration {x}_{n}^{c} of the *n*-th drug should be rewarded. The FSA in [9] is designed such that the drug concentration is increased further if the current concentration {x}_{n}^{c} is larger than a reference concentration {c}_{n}^{\mathsf{\text{ref}}}, and decreased further if {x}_{n}^{c} is smaller than {c}_{n}^{\mathsf{\text{ref}}}. More specifically, if {x}_{n}^{c}={c}_{n}^{k}and *f*(**x**^{c}) >*r*_{
n
}, then the drug concentration is updated as follows.

{x}_{n}^{c}=\left\{\begin{array}{c}\hfill {c}_{n}^{k+1},\phantom{\rule{1em}{0ex}}\mathsf{\text{if}}\phantom{\rule{2.77695pt}{0ex}}{x}_{n}^{c}{c}_{n}^{\mathsf{\text{ref}}}\mathsf{\text{and}}k{M}_{n}\hfill \\ \hfill {c}_{n}^{k},\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}\mathsf{\text{if}}\phantom{\rule{2.77695pt}{0ex}}{x}_{n}^{c}{c}_{n}^{\mathsf{\text{ref}}}\mathsf{\text{and}}k={M}_{n}\hfill \\ \hfill {c}_{n}^{k-1},\phantom{\rule{1em}{0ex}}\mathsf{\text{if}}\phantom{\rule{2.77695pt}{0ex}}{x}_{n}^{c}{c}_{n}^{\mathsf{\text{ref}}}\mathsf{\text{and}}k1\hfill \\ \hfill {c}_{n}^{k},\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}\mathsf{\text{if}}\phantom{\rule{2.77695pt}{0ex}}{x}_{n}^{c}{c}_{n}^{\mathsf{\text{ref}}}\mathsf{\text{and}}k=1\hfill \end{array}\right.

(1)

The reference concentration {c}_{n}^{\mathsf{\text{ref}}} is typically chosen as the median of the set {\mathcal{C}}_{n}. As shown in (1), the drug concentration remains unchanged if it cannot be increased (or decreased) further. Now, suppose that *f* (**x**^{c}) <*r*_{
n
}, and therefore the current concentration {x}_{n}^{c}={c}_{n}^{k} should be penalized. In this case, the concentration is updated in the opposite direction:

{x}_{n}^{c}=\left\{\begin{array}{c}\hfill {c}_{n}^{k-1},\phantom{\rule{1em}{0ex}}\mathsf{\text{if}}\phantom{\rule{0.3em}{0ex}}{x}_{n}^{c}{c}_{n}^{\mathsf{\text{ref}}}\hfill \\ \hfill {c}_{n}^{k+1},\phantom{\rule{1em}{0ex}}\mathsf{\text{if}}\phantom{\rule{0.3em}{0ex}}{x}_{n}^{c}{c}_{n}^{\mathsf{\text{ref}}}\hfill \end{array}\right.

(2)

Note that penalization moves the current drug concentration closer to the reference concentration {c}_{n}^{\mathsf{\text{ref}}}. As previously discussed in [11], one of the weaknesses of the Gur Game algorithm is that it uses a predetermined FSA for updating (i.e., rewarding/penalizing) the drug concentrations and does not adapt to the drug response function at hand, which is not known in advance. As a result, the algorithm may perform poorly unless the drug response function *f*(**x**) is properly normalized and the reference concentration {c}_{n}^{\mathsf{\text{ref}}} is chosen adequately for each drug. For example, consider the one-dimensional drug response *f*(*x*) shown in Figure 1A. As we can see, the drug response has been over-normalized, hence *f*(*x*) < 0.5 for any allowed concentration *x* ∈ [*c*_{min}, *c*_{max}]. Since *f*(*x*) < 0.5, a uniformly distributed random number *r* ∈ [0,1] is more likely to be larger than *f*(*x*). This implies that the Gur Game algorithm always tends to penalize the current drug concentration (no matter what its value is), which will probabilistically drive the concentration towards *c*_{ref} although it is clearly not optimal. Figure 1B shows another drug response, for which the Gur Game algorithm will not work properly. In this example, we have *f*(*x*) > 0.5, hence the Gur Game algorithm is always more likely to reward the current drug concentration, which tends to drive the concentration away from the reference concentration *c*_{ref}. This will push the concentration either towards *c*_{min} or *c*_{max}, both of which are suboptimal.

The enhanced stochastic algorithm proposed in [11] addresses this problem by making the search algorithm adapt to a given drug response. The basic idea of this algorithm is to make an "informed-guess" about how to beneficially update a given drug concentration, instead of following a predetermined update rule. Unlike the Gur Game algorithm in [9], where all *N* drugs are simultaneously updated based on the (same) current drug response *f*(**x**^{c}), the enhanced algorithm updates the concentration of one drug at a time. As an example, suppose during the last update of the *n*-th drug, the drug combination has been updated as

\mathbf{x}=\left({x}_{1},\cdots \phantom{\rule{0.3em}{0ex}},\cdots \phantom{\rule{0.3em}{0ex}},{x}_{N}\right)\Rightarrow {\mathbf{x}}^{\prime}\phantom{\rule{0.3em}{0ex}}\mathsf{\text{=}}\phantom{\rule{0.3em}{0ex}}\mathsf{\text{(}}{x}_{1}^{\prime},\cdots \phantom{\rule{0.3em}{0ex}},\cdots \phantom{\rule{0.3em}{0ex}},{x}_{N}^{\prime}\mathsf{\text{),}}

where **x** and {\mathbf{x}}^{\prime} are identical except for the concentration of the *n*-th drug, hence {x}_{i}={x}_{i}^{\prime}\left(i\ne n\right)and {x}_{i}\ne {x}_{i}^{\prime}\left(i=n\right). We assume that *x*_{
n
} and {x}_{n}^{\prime} differ only by a single concentration level, so that {x}_{n}={c}_{n}^{k}and{x}_{n}^{\prime}={c}_{n}^{k+1}, or {x}_{n}={c}_{n}^{k+1}and {x}_{n}^{\prime}={c}_{n}^{k}, for some *k*. The algorithm compares the two drug responses *f* (**x**) and f\left({\mathbf{x}}^{\prime}\right), thereby determine wether it would be more beneficial to further increase or decrease the concentration of the *n*-th drug. For example, we may have the following four cases.

\begin{array}{c}\left(\mathsf{\text{Case-1}}\right)\phantom{\rule{2.77695pt}{0ex}}{x}_{n}{{x}^{\prime}}_{n}\phantom{\rule{2.77695pt}{0ex}}\mathsf{\text{and}}\phantom{\rule{2.77695pt}{0ex}}f\left(\mathbf{x}\right)f\left({\mathbf{x}}^{\prime}\right):\mathbf{i}\mathbf{n}\mathbf{c}\mathbf{r}\mathbf{e}\mathbf{a}\mathbf{s}\mathbf{i}\mathbf{n}\mathbf{g}\mathsf{\text{theconcentrationismorebeneficial}}\\ \left(\mathsf{\text{Case-}}2\right)\phantom{\rule{2.77695pt}{0ex}}{x}_{n}{{x}^{\prime}}_{n}\phantom{\rule{2.77695pt}{0ex}}\mathsf{\text{and}}f\left(\mathbf{x}\right)f\left({\mathbf{x}}^{\prime}\right):\mathbf{i}\mathbf{n}\mathbf{c}\mathbf{r}\mathbf{e}\mathbf{a}\mathbf{s}\mathbf{i}\mathbf{n}\mathbf{g}\mathsf{\text{theconcentrationismorebeneficial}}\\ \left(\mathsf{\text{Case-}}3\right)\phantom{\rule{2.77695pt}{0ex}}{x}_{n}{{x}^{\prime}}_{n}\phantom{\rule{2.77695pt}{0ex}}\mathsf{\text{and}}f\left(\mathbf{x}\right)f\left({\mathbf{x}}^{\prime}\right):\mathbf{d}\mathbf{e}\mathbf{c}\mathbf{r}\mathbf{e}\mathbf{a}\mathbf{s}\mathbf{i}\mathbf{n}\mathbf{g}\mathsf{\text{theconcentrationismorebeneficial}}\\ \left(\mathsf{\text{Case-}}4\right)\phantom{\rule{2.77695pt}{0ex}}{x}_{n}{{x}^{\prime}}_{n}\mathsf{\text{and}}f\left(\mathbf{x}\right)f\left({\mathbf{x}}^{\prime}\right):\mathbf{d}\mathbf{e}\mathbf{c}\mathbf{r}\mathbf{e}\mathbf{a}\mathbf{s}\mathbf{i}\mathbf{n}\mathbf{g}\text{theconcentrationismorebeneficial}.\end{array}

(3)

The above rules allow the algorithm to adaptively determine *how* to reward (or penalize) a given drug concentration based on the observed drug response values. However, the decision *whether* to reward or penalize the current drug is made in a probabilistic manner. For this purpose, we evaluate the following function

g\left(\mathbf{x},\phantom{\rule{2.77695pt}{0ex}}{\mathbf{x}}^{\prime}\right)=\frac{1}{2}\left\{1+\alpha \cdot \text{max}\left[f\left(\mathbf{x}\right),f\left({\mathbf{x}}^{\prime}\right)\right]\right\},

(4)

where *α* ∈ [0, 1] is a control parameter that adjusts the randomness of the algorithm [11]. This *g*(**x**, **x'**) is compared with a uniformly distributed random number *r*_{
n
} ∈ [0, 1]. If *g*(**x**, **x'**) *> r*_{
n
}, the *n*-th drug is rewarded, i.e., updated in such a way that appears to be more beneficial for enhancing the drug response according to the rules shown in (3). Otherwise, the *n*-th drug is penalized, i.e., updated in a way that appears to be less beneficial based on the past observations. It is not difficult to see that this algorithm is always more likely to reward, or beneficially update, a given drug. Since the algorithm proposed in [11] adaptively determines how to update the drug concentration based on previous observations, it can also effectively deal with drug response functions shown in Figures 1A and 1B, for which the Gur Game algorithm does not perform well. Despite its merits, this algorithm also has its own shortcomings. For example, as the update rule for a given drug is determined only based on the two observations that correspond to its latest update, not on a longer-range trend, the algorithm may be sensitive to small variations in the drug response. As a result, it may not show satisfactory search performance for drug response functions that are similar to the one in Figure 1C. Furthermore, considering that *f*(**x**) has to be experimentally estimated, where a certain level of measurement noise and small variations due to a number of practical factors may not be avoidable, such sensitivity may adversely affect the overall performance of the algorithm. Another weakness of the algorithm is that it only utilizes a very small part of the past observations without fully utilizing them. In the following section, we introduce a novel stochastic search algorithm that can effectively address the aforementioned issues.

### The adaptive reference update (ARU) stochastic search algorithm

In order to make the search algorithm robust to small variations in the observed drug response, the update rules have to be decided based on the general trend of the drug response change over a wide range of drug concentration, not just based on the response change resulting from a *single-level* concentration change. Based on this motivation, we propose a novel algorithm called the **adaptive reference update (ARU) algorithm**. In this algorithm, we compare the current drug response *f*(**x**^{c}) with the response *f*(**x**^{ref}) of a *reference drug combination* **x**^{ref}, which is adaptively updated based on past observations. In the beginning, **x**^{ref} is set to the initial drug concentration, where we start the search process. During the search, when the algorithm encounters a local maximum, the current reference combination is replaced by the corresponding drug combination. As an example, let us consider the one-dimensional drug response function *f*(*x*) in Figure 2. For illustration, we consider the following hypothetical search process, where the drug concentration is constantly updated from left to right, starting from the lowest concentration *c*_{min} to the highest concentration *c*_{max}. Suppose the search begins at the concentration *x* = *c*_{min}. Initially the reference concentration is also set to this initial drug concentration *x*^{ref} ← *c*_{min}. As the search reaches the first local maximum, the reference concentration is updated to this local maximum solution *x*^{ref} ← *x*^{ref1}. As the search continues to the right, this reference concentration is used until we reach the next local maximum. After passing the second local maximum solution *x*^{ref2}, the reference point is updated to *x*^{ref} ← *x*^{ref2}. In a similar manner, as the search continues further to the right, the reference point is updated to *x*^{ref} ← *x*^{ref3} after passing the third local maximum point.

At each iteration, the current drug response *f*(**x**^{c}) is compared to the response *f*(**x**^{ref}) of the reference drug combination, based on which the drug update rule is determined. For example, let {\mathbf{x}}^{c}=\left(\cdots ,{x}_{n}^{c},\phantom{\rule{2.77695pt}{0ex}}\cdots \phantom{\rule{0.3em}{0ex}}\right) and {\mathbf{x}}^{\mathsf{\text{ref}}}=\left(\cdots ,{x}_{n}^{\mathsf{\text{ref}}},\phantom{\rule{2.77695pt}{0ex}}\cdots \phantom{\rule{0.3em}{0ex}}\right), and assume that we want to update the concentration of the *n*-th drug by comparing the two drug response values *f*(**x**^{c}) and *f*(**x**^{ref}). As before, we have the following four possible cases.

\begin{array}{c}\left(\mathsf{\text{Case}}-1\right)\phantom{\rule{2.77695pt}{0ex}}{x}_{n}^{c}<{x}_{n}^{\mathsf{\text{ref}}}\phantom{\rule{2.77695pt}{0ex}}\mathsf{\text{and}}f\left({\mathbf{x}}^{c}\right)f\left({\mathbf{x}}^{\mathsf{\text{ref}}}\right):increasing\mathsf{\text{theconcentrationismorebeneficial}}\\ \left(\mathsf{\text{Case}}-2\right)\phantom{\rule{2.77695pt}{0ex}}{x}_{n}^{c}{x}_{n}^{\mathsf{\text{ref}}}\phantom{\rule{2.77695pt}{0ex}}\mathsf{\text{and}}f\left({\mathbf{x}}^{c}\right)f\left({\mathbf{x}}^{\mathsf{\text{ref}}}\right):increasing\mathsf{\text{theconcentrationismorebeneficial}}\\ \left(\mathsf{\text{Case}}-3\right)\phantom{\rule{2.77695pt}{0ex}}{x}_{n}^{c}{x}_{n}^{\mathsf{\text{ref}}}\phantom{\rule{2.77695pt}{0ex}}\mathsf{\text{and}}\phantom{\rule{0.3em}{0ex}}f\left({\mathbf{x}}^{c}\right)f\left({\mathbf{x}}^{\mathsf{\text{ref}}}\right):decreasing\mathsf{\text{theconcentrationismorebeneficial}}\\ \left(\mathsf{\text{Case}}-4\right)\phantom{\rule{2.77695pt}{0ex}}{x}_{n}^{c}{x}_{n}^{\mathsf{\text{ref}}}\phantom{\rule{2.77695pt}{0ex}}\mathsf{\text{and}}\phantom{\rule{2.77695pt}{0ex}}f\left({\mathbf{x}}^{c}\right)f\left({\mathbf{x}}^{\mathsf{\text{ref}}}\right):decreasing\mathsf{\text{theconcentrationismorebeneficial}}.\end{array}

(5)

Conceptually, we can view the above as estimating the "virtual" slope between two points \left({x}_{n}^{c},\phantom{\rule{2.77695pt}{0ex}}f\left({\mathbf{x}}^{c}\right)\right) and \left({x}_{n}^{\mathsf{\text{ref}}},\phantom{\rule{2.77695pt}{0ex}}f\left({\mathbf{x}}^{\mathsf{\text{ref}}}\right)\right) as follows

\frac{f\left({\mathbf{x}}^{\mathsf{\text{ref}}}\right)-f\left({\mathbf{x}}^{c}\right)}{{x}_{n}^{\mathsf{\text{ref}}}-{x}_{n}^{c}},

(6)

based on which we determine how to update the concentration *x*_{
n
} of the *n*-th drug to increase the drug response *f*(\mathbf{x}). Given the update rules in (5), the actual update decision is made by evaluating the following function

g\left({\mathbf{x}}^{c},\phantom{\rule{2.77695pt}{0ex}}{\mathbf{x}}^{\mathsf{\text{ref}}}\right)=\frac{1}{2}\left\{1+\alpha \cdot \text{max}\left[f\left({\mathbf{x}}^{c}\right),f\left({\mathbf{x}}^{\mathsf{\text{ref}}}\right)\right]\right\}

(7)

and comparing it with a random number *r*_{
n
} ∈ [0, 1]. If *g*(**x**^{c}, **x**^{ref}) >*r*_{
n
}, the drug concentration *x*_{
n
} is updated by a single level, following the beneficial update direction predicted by (5). Otherwise, the concentration is updated in the opposite direction. As briefly mentioned before, the parameter *α* ∈ [0, 1] controls the randomness of the algorithm. For example, *α* = 0 will make the search process completely random, regardless of the observed drug responses. Using a larger *α* means that we are giving a larger weight to the past observations when deciding how to update the drug concentrations. The value of this control parameter is limited to *α* ≤ 1 such that *g*(**x**^{c}, **x**^{ref}) ≤ 1. Also note that we always have *g*(**x**^{c}, **x**^{ref}) ≥ 0.5, which implies that at any drug update step, the update is always more likely to take place in accordance with the rules in (5), which have been derived based on past observations of the drug response. In other words, the ARU algorithm tries to effectively utilize the past response data to beneficially update the drug concentrations, and ultimately, to identify a potent drug combination, while keeping the search still stochastic. For illustration, let us again consider the drug response function in Figure 2, where the hypothetical search process proceeds from the lowest drug concentration to the highest concentration. The black solid arrows below the graph shows the drug update direction that gets higher probability according to the new algorithm, described above. For example, in region-A (*c*_{min} *< × < x*^{ref1}), the algorithm tends to increase the drug concentration *x* further, as the response *f*(*x*) is larger than *f*(*c*_{min}) of the initial reference concentration (i.e., *c*_{min}). As *x* continues to increase and passes the first local maximum point *x*^{ref1}, the reference is updated to *x*^{ref} ← *x*^{ref1}. In region-B (*x*^{ref1} *< × < x*^{ref2}), the search algorithm tends to drive the concentration towards *x*^{ref1} by decreasing the concentration. Suppose the search continues to increase the drug concentration *x* beyond *x*^{ref2}, the second local maximum point, despite the tendency of the algorithm to decrease *x* back to *x*^{ref1}. After passing *x*^{ref2}, the reference is updated to *x*^{ref} ← *x*^{ref2}. In region-C, the search algorithms assigns higher probability to the update rule that tries to bring the concentration down to *x*^{ref2}, since *f*(*x*) *< f*(*x*^{ref2}) in the given region. However, once *x* enters region-D, where *f*(*x*) *> f*(*x*^{ref2}), the algorithm begins to drive the drug concentration *x* further to the right until it passes the third local maximum point *x*^{ref3}. The reference concentration is updated to *x*^{ref} ← *x*^{ref3}, once the search continues to the right and the drug concentration *x* gets larger than *x*^{ref3}. Since *f*(*x*^{ref3}) is larger than *f*(*x*) in region-D (*x*^{ref3} *< × < c*_{max}), the search algorithm will tend to bring the concentration down to the current reference concentration, namely, *x*^{ref} = *x*^{ref3}.

Choosing a local maximum solution as a reference combination has a number of practical advantages. First of all, it allows the algorithm to adjust the drug update rules based on a long-range trend of the given drug response function, which makes the algorithm robust to small variations in the observed response. Another advantage of using a long-range trend is that the search process will become also less sensitive to random fluctuations that may exist in the observed drug response. Considering that the drug response function *f*(**x**) has to be experimentally estimated through actual biological experiments, where random factors (e.g., measurement noise) that affect the estimation results cannot be completely ruled out, such robustness is critical for the algorithm to be used in practical drug optimization applications. It is also beneficial to use the drug combination that corresponds to the most recent local maximum response, instead of the drug combination that has yielded the highest response among all past combinations, as the reference point. This prevents the search process from dwelling too much on past observations, while keeping it robust to variations and random fluctuations.

### Drug response functions

In order to evaluate the overall performance of the ARU algorithm, we used the algorithm to search for the optimal drug cocktail for several different drug response functions.

**Two-dimensional drug response functions** For performance assessment, we first used the four two-dimensional drug response functions that are shown in Figure 3. The first drug response function *f*_{2a}(*x*_{1}, *x*_{2}) shown in Figure 3A is the normalized HIV inhibitor response obtained from [17], where *x*_{1} ∈ {0, 0.01, 0.03, 0.09, 0.27, 0.82, 2.47, 7.41, 22.22, 66.67}(*nM*) was considered for Maraviroc and *x*_{2} ∈ {0, 0.09, 0.27, 0.8, 2.41, 7.22, 21.67, 65}(*nM*) for ROAb14. The second drug response *f*_{2b}(*x*_{1}, *x*_{2}) shown in Figure 3B is the normalized second De Jong function (Rosenbrock's saddle) [18]. We considered *x*_{1}, *x*_{2} ∈ {*c*_{0}, *c*_{1}, ... , *c*_{20}}, where *c*_{
k
} = 4(*k/* 20 - 0.5), obtained by evenly dividing the range [-2, 2] into 21 distinct values. The third drug response function *f*_{2c}(*x*_{1}, *x*_{2}) in Figure 3C is the normalized lung cancer inhibition response obtained from [1], where *x*_{1} ∈ {0, 1, 2, 4, 6, 8, 12, 16, 20, 22}(*μM* ) was considered for Chlorpromazine and *x*_{2} ∈ {0, 0.25, 0.4, 0.6, 0.8, 1, 1.5, 2, 4, 6.8}(*μM* ) for Pentamidine. Finally, Figure 3D shows the fourth response function *f*_{2d}(*x*_{1}, *x*_{2}), the normalized bacterial (*S. aureus*) inhibition response reported in [6]. *x*_{1} ∈ {0, 0.08, 0.16, 0.32, 0.63, 1.25, 2.5, 5, 10} was considered for Trimethoprim and *x*_{2} ∈ {0, 0.31, 0.62, 1.25, 2.5, 5, 10, 20, 40} was considered for Sulfamethoxazole. All four drug response functions were normalized to span the entire range [0, 1], such that the minimum response is 0 and the maximum response is 1.

**Multi-dimensional drug response functions** To evaluate the performance for optimizing multi-drug cocktails, we defined several hypothetical drug response functions with up to six drugs. First, we defined two 3-dimensional drug functions *f*_{3a}(*x*_{1}, *x*_{2}, *x*_{3}) and *f*_{3b}(*x*_{1}, *x*_{2}, *x*_{3}). The first function is defined as

{f}_{3a}\left({x}_{1},\phantom{\rule{2.77695pt}{0ex}}{x}_{2},\phantom{\rule{2.77695pt}{0ex}}{x}_{3}\right)={x}_{1}^{2}\cdot {\text{sin}}^{2}\left({x}_{2}\right)\cdot {\text{cos}}^{2}\left({x}_{3}\right),

(8)

where each of *x*_{1}, *x*_{2}, *x*_{3} can take one of the 11 discrete concentrations that evenly divide the range [-2.5, 2.5]. The second function is defined as follows

{f}_{3b}\left({x}_{1},\phantom{\rule{2.77695pt}{0ex}}{x}_{2},\phantom{\rule{2.77695pt}{0ex}}{x}_{3}\right)=peaks\left({x}_{1},\phantom{\rule{2.77695pt}{0ex}}{x}_{2}\right)\cdot {x}_{3},

(9)

using the Matlab peaks(*x*_{1}, *x*_{2}) function. We assume that each drug can take one of the 11 discrete values that evenly divide [-3, 3]. Next, we defined two 4-dimensional drug functions

{f}_{4a}\left({x}_{1},\phantom{\rule{2.77695pt}{0ex}}{x}_{2},\phantom{\rule{2.77695pt}{0ex}}{x}_{3},\phantom{\rule{2.77695pt}{0ex}}{x}_{4}\right)={x}_{1}\cdot {e}^{-\left({x}_{1}^{2}+{x}_{2}^{2}+{x}_{3}^{2}+{x}_{4}^{2}\right)}

(10)

and

{f}_{4b}\left({x}_{1},\phantom{\rule{2.77695pt}{0ex}}{x}_{2},\phantom{\rule{2.77695pt}{0ex}}{x}_{3},\phantom{\rule{2.77695pt}{0ex}}{x}_{4}\right)={\text{cos}}^{2}\left(0.3{x}_{1}\right)\cdot \text{sin}\left(0.3{x}_{2}\right)\cdot \text{tan}\left(0.1{x}_{3}\right)\cdot {x}_{4}.

(11)

We assume that *x*_{1} and *x*_{2} in the first function *f*_{4a}(*x*_{1}, *x*_{2}, *x*_{3}, *x*_{4}) can take one of the 11 discrete values that evenly divide the range [-2, 2], while *x*_{3} and *x*_{4} can take one of the 11 discrete values that evenly divide the range [-3, 3]. For the second drug response function *f*_{4b}(*x*_{1}, *x*_{2}, *x*_{3}, *x*_{4}), we assume that each drug can take one of the 11 distinct values that evenly divide [-3, 3]. In addition, we also defined the following 5-dimensional drug response functions

{f}_{5a}\left({x}_{1},\phantom{\rule{2.77695pt}{0ex}}{x}_{2},\phantom{\rule{2.77695pt}{0ex}}{x}_{3},\phantom{\rule{2.77695pt}{0ex}}{x}_{4},\phantom{\rule{2.77695pt}{0ex}}{x}_{5}\right)={e}^{-{x}_{1}}\cdot {\text{cos}}^{2}\left({x}_{2}\right)\cdot {x}_{3}^{2}\cdot \left[{e}^{-{\left({x}_{4}+2\right)}^{2}-{\left({x}_{5}+3\right)}^{2}}+{e}^{-{\left({x}_{4}-2\right)}^{2}-{\left({x}_{5}-3\right)}^{2}}\right]

(12)

and

{f}_{5b}\left({x}_{1},\phantom{\rule{2.77695pt}{0ex}}{x}_{2},\phantom{\rule{2.77695pt}{0ex}}{x}_{3},\phantom{\rule{2.77695pt}{0ex}}{x}_{4},\phantom{\rule{2.77695pt}{0ex}}{x}_{5}\right)=\frac{1}{2}\mathsf{\text{peaks}}\left({x}_{1},\phantom{\rule{2.77695pt}{0ex}}{x}_{2}\right)\cdot \text{cos}\left(0.5{x}_{3}\right)\cdot \text{sin}\left(0.5{x}_{4}\right)\cdot {x}_{5}^{2}.

(13)

For the first function *f*_{5a}(*x*_{1}, *x*_{2}, *x*_{3}, *x*_{4}, *x*_{5}), *x*_{1} and *x*_{2} are allowed to take any value from the set of values obtained by evenly dividing the range [-2, 2] into 11 discrete concentrations. The remaining drug concentrations (*x*_{3}, *x*_{4}, and *x*_{5}) can take any of the 11 concentrations that evenly divide [-4.5, 4.5]. For the second drug response function *f*_{5b}(*x*_{1}, *x*_{2}, *x*_{3}, *x*_{4}, *x*_{5}), we assume that each drug can take one of the 11 discrete values that evenly divide [-3, 3]. Finally, we also defined two 6-dimensional drug response functions

{f}_{6a}\left({x}_{1},\phantom{\rule{2.77695pt}{0ex}}{x}_{2},\phantom{\rule{2.77695pt}{0ex}}{x}_{3},\phantom{\rule{2.77695pt}{0ex}}{x}_{4},\phantom{\rule{2.77695pt}{0ex}}{x}_{5},\phantom{\rule{2.77695pt}{0ex}}{x}_{6}\right)={e}^{-0.75\left({x}_{1}\right)}\cdot \left({\text{sin}}^{2}\left({x}_{2}\right)+\text{cos}\left({x}_{3}\right)\right)\cdot {e}^{-0.75\left({x}_{4}^{2}+{x}_{5}^{2}\right)}\cdot {x}_{6}

(14)

and

{f}_{6b}\left({x}_{1},\phantom{\rule{2.77695pt}{0ex}}{x}_{2},\phantom{\rule{2.77695pt}{0ex}}{x}_{3},\phantom{\rule{2.77695pt}{0ex}}{x}_{4},\phantom{\rule{2.77695pt}{0ex}}{x}_{5},\phantom{\rule{2.77695pt}{0ex}}{x}_{6}\right)={e}^{-0.1\left({x}_{1}^{2}-{x}_{2}^{2}\right)-0.1\left({x}_{3}^{2}+{x}_{4}^{2}\right)}\cdot {\text{cos}}^{2}\left(0.2{x}_{5}^{3}\right)\cdot \text{sin}\left(0.2{x}_{6}^{3}\right),

(15)

where every drug concentration can take its value from one of the 11 discrete concentrations that evenly divide the range [-2.5, 2.5].