### Methodology

The current version of OSAT provides two algorithms for assigning samples to batches. Both are based on the principle of block randomization, an effective approach for controlling variability from nuisance variables such as batches and their interactions with the variables of primary interest [6–8, 13]. Both algorithms consist of a block randomization step and an optimization step. The default algorithm (implemented in function *optimal.shuffle*) first blocks on all variables considered to generate a single initial assignment setup, then shuffles that setup to identify the one that minimizes the objective function (*i.e.*, the one with the most homogeneous cross-batch strata distribution). The alternative algorithm (implemented in function *optimal.block*) first blocks on specified variables only (*e.g.*, a list of variables of primary interest) to generate a pool of assignment setups, then selects the one that minimizes the objective function based on all variables considered (including those not included in the block randomization step). A detailed description is provided below.

By combining the variables of interest, we can create a unified variable whose levels are all possible combinations of the levels of the variables involved. Assume there are a total of $s$ levels in the unified variable (referred to as optimization strata in this package) with $S_j$ samples in stratum $j$, $j = 1, \dots, s$, and that we have $m$ batches with $B_i$ wells available in batch $i$, $i = 1, \dots, m$. In an ideal balanced RCBD experiment, we have equal sample size in each stratum, $S_1 = \dots = S_s = S$, and each batch includes the same number of available wells, $B_1 = \dots = B_m = B$, with an equal number of samples from each stratum.
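The construction of optimization strata can be illustrated with a minimal Python sketch. The sample annotations, the variable names (`status`, `sex`), and the helper functions are hypothetical and are not part of OSAT, which is an R package. The sketch counts only the strata that actually occur; a combination with no samples simply has $S_j = 0$.

```python
from collections import Counter

# Hypothetical sample annotations; the variables and levels are illustrative only.
samples = [
    {"id": "s1", "status": "case", "sex": "F"},
    {"id": "s2", "status": "case", "sex": "M"},
    {"id": "s3", "status": "control", "sex": "F"},
    {"id": "s4", "status": "control", "sex": "F"},
    {"id": "s5", "status": "case", "sex": "F"},
]


def stratum_of(sample, variables):
    """Level of the unified variable: the tuple of levels of each variable."""
    return tuple(sample[v] for v in variables)


def stratum_sizes(samples, variables):
    """Count S_j, the number of samples in each observed stratum."""
    return Counter(stratum_of(sample, variables) for sample in samples)


sizes = stratum_sizes(samples, ["status", "sex"])
for stratum, size in sorted(sizes.items()):
    print(stratum, size)
```

Here three of the four possible combinations are observed, so the example has three non-empty strata.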

The expected number of samples from stratum $j$ in batch $i$ is denoted as

$$E_{ij} = \frac{B_i S_j}{\sum_{k=1}^{m} B_k}.$$

One can split it into its integer part and fractional part as

$$E_{ij} = \lfloor E_{ij} \rfloor + \delta_{ij},$$

where $\lfloor E_{ij} \rfloor$ is the integer part of the expected number and $\delta_{ij}$ is the fractional part. In the case of equal batch size, it reduces to $E_{ij} = S_j / m$. When we have RCBD, all $\delta_{ij}$ are zero.
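As a small worked example, the sketch below computes the expected counts and splits them into integer and fractional parts. It assumes proportional allocation, i.e. that the expected count for stratum $j$ in batch $i$ is $B_i S_j$ divided by the total number of wells; the function names and the batch and stratum sizes are hypothetical. Exact fractions are used so that the fractional parts are not blurred by floating-point rounding.

```python
from fractions import Fraction
from math import floor


def expected_counts(batch_sizes, stratum_sizes):
    """E[i][j] = B_i * S_j / sum(B), assuming proportional allocation."""
    total_wells = sum(batch_sizes)
    return [
        [Fraction(b * s, total_wells) for s in stratum_sizes]
        for b in batch_sizes
    ]


def split_parts(expected):
    """Split each expected count into its integer part and fractional part."""
    floors = [[floor(e) for e in row] for row in expected]
    fracs = [[e - floor(e) for e in row] for row in expected]
    return floors, fracs


batch_sizes = [4, 4, 4]    # B_i: hypothetical wells per batch
stratum_sizes = [6, 4, 2]  # S_j: hypothetical samples per stratum

E = expected_counts(batch_sizes, stratum_sizes)
floors, fracs = split_parts(E)
for i, row in enumerate(E):
    print(i, [str(e) for e in row], floors[i], [str(d) for d in fracs[i]])
```

With equal batch sizes every row is the same: the first stratum divides evenly (fractional part zero), while the other two leave fractional parts of 1/3 and 2/3.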

For an actual sample assignment, we have the counts

$$\left(n_{ij}\right), \quad i = 1, \dots, m, \; j = 1, \dots, s,$$

where $n_{ij}$ is the number of samples from optimization stratum $j$ that are actually assigned to batch $i$. Our goal is, through a block randomization step and an optimization step, to minimize the difference between the expected sample size $E_{ij}$ and the actual sample size $n_{ij}$.

The block randomization step creates one or more initial setups of randomized sample assignment, based on the strata formed by combining the blocking variables considered. The blocking variables include all variables of interest in the default algorithm, but only a specified subset of variables in the alternative algorithm.

In this step, for each stratum $j$ we draw $m$ sets of samples from its $S_j$ samples, with set $i$ of size $\lfloor E_{ij} \rfloor$; likewise, for each batch $i$ we draw $s$ sets of wells from its $B_i$ wells, with set $j$ of size $\lfloor E_{ij} \rfloor$. The two selections are linked by the subgroup $ij$, and samples are randomized to wells within each subgroup. The remaining samples, $r_j = S_j - \sum_i \lfloor E_{ij} \rfloor$ in stratum $j$, are assigned to the remaining available wells, $w_i = B_i - \sum_j \lfloor E_{ij} \rfloor$ in batch $i$. The probability that a remaining sample from stratum $j$ is assigned to a well in batch $i$ is proportional to the fractional part of the expected sample size, $\delta_{ij}$. For an RCBD, each batch will have an equal number of samples with the same characteristics and there is no need for further optimization. However, in other instances where the collection of samples is unbalanced and/or incomplete, an optimization step is needed to create a more optimal setup of sample assignment.
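The block randomization step described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the OSAT implementation: the function `block_randomize` is hypothetical, expected counts are taken as $B_i S_j$ divided by the total number of wells, and only the batch (not the individual well) of each sample is tracked. One further assumption is a fallback: when every batch with a positive fractional part is already full, the leftover sample goes to any batch that still has a free well.

```python
import random
from collections import defaultdict
from fractions import Fraction
from math import floor


def block_randomize(strata, batch_sizes, rng):
    """Map each sample id to a batch index (strata: sample id -> stratum label)."""
    if len(strata) > sum(batch_sizes):
        raise ValueError("more samples than available wells")
    total_wells = sum(batch_sizes)
    by_stratum = defaultdict(list)
    for sample_id, label in strata.items():
        by_stratum[label].append(sample_id)

    free = list(batch_sizes)  # free wells remaining in each batch
    assignment = {}
    leftovers = []            # (sample id, fractional parts per batch)
    for members in by_stratum.values():
        rng.shuffle(members)
        expected = [Fraction(b * len(members), total_wells) for b in batch_sizes]
        # Place floor(E_ij) randomly chosen samples of this stratum in batch i.
        for i, e in enumerate(expected):
            for _ in range(floor(e)):
                assignment[members.pop()] = i
                free[i] -= 1
        deltas = [e - floor(e) for e in expected]
        leftovers.extend((sample_id, deltas) for sample_id in members)

    # Place the remaining samples with probability proportional to delta_ij.
    rng.shuffle(leftovers)
    for sample_id, deltas in leftovers:
        weights = [float(d) if free[i] > 0 else 0.0 for i, d in enumerate(deltas)]
        if sum(weights) == 0:  # assumed fallback: any batch with a free well
            weights = [1.0 if f > 0 else 0.0 for f in free]
        i = rng.choices(range(len(free)), weights=weights)[0]
        assignment[sample_id] = i
        free[i] -= 1
    return assignment


# Hypothetical example: 12 samples in three strata, three batches of four wells.
strata = {f"a{n}": "A" for n in range(6)}
strata.update({f"b{n}": "B" for n in range(4)})
strata.update({f"c{n}": "C" for n in range(2)})
assignment = block_randomize(strata, batch_sizes=[4, 4, 4], rng=random.Random(1))
print(sorted(assignment.items()))
```

In this example stratum A divides evenly, so each batch always receives exactly two of its samples; the placement of the leftover samples from strata B and C varies with the random seed.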

The optimization step aims to identify an optimal setup of sample assignment from multiple candidates. To select the optimal sample assignment, we need to measure the variation of sample characteristics between batches. In this package, we define the optimal design as the sample assignment setup that minimizes our objective function, which is based on the principle of least squares [13]. The objective function can be defined as

$$V = \sum_{i=1}^{m} \sum_{j=1}^{s} \left(n_{ij} - E_{ij}\right)^2,$$

where $E_{ij}$ and $n_{ij}$ were defined previously.
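A minimal sketch of such an objective function is given below. It assumes the least-squares form, i.e. the sum over all batches and strata of the squared difference between $n_{ij}$ and $E_{ij}$, with $E_{ij}$ taken as $B_i S_j$ divided by the total number of wells. The function name and the toy data are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def objective(assignment, strata, batch_sizes):
    """Sum over batches i and strata j of (n_ij - E_ij)^2.

    assignment: sample id -> batch index; strata: sample id -> stratum label.
    """
    total_wells = sum(batch_sizes)
    stratum_sizes = Counter(strata.values())                            # S_j
    observed = Counter((assignment[s], strata[s]) for s in strata)      # n_ij
    total = Fraction(0)
    for i, b in enumerate(batch_sizes):
        for label, size in stratum_sizes.items():
            expected = Fraction(b * size, total_wells)                  # E_ij
            total += (observed[(i, label)] - expected) ** 2
    return total


# Hypothetical example: two strata of two samples, two batches of two wells.
strata = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
balanced = {"a1": 0, "b1": 0, "a2": 1, "b2": 1}
confounded = {"a1": 0, "a2": 0, "b1": 1, "b2": 1}
print(objective(balanced, strata, [2, 2]))    # 0
print(objective(confounded, strata, [2, 2]))  # 4
```

The balanced setup matches the expected count of one sample per stratum per batch and scores 0, whereas the setup that confounds stratum with batch is off by one in each of the four cells and scores 4.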

In the default algorithm implemented in OSAT, optimization is conducted by shuffling the initial setup obtained in the block randomization step. Specifically, after the initial setup is created, we randomly select *k* samples from different batches and shuffle them between batches to create a new sample assignment. The value of the objective function is calculated for the new setup and compared to that of the original one. If the new value is smaller, the new assignment replaces the previous one. This procedure continues until a pre-set number of attempts is reached (5000 by default).
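The shuffling procedure can be sketched as a simple greedy search. The code below is an illustration under assumptions rather than the code of *optimal.shuffle*: the function `shuffle_optimize` is hypothetical, one sample is drawn from each of *k* distinct batches and their batch labels are rotated (which keeps batch sizes unchanged), and the objective is assumed to be the sum of squared differences between actual and expected counts.

```python
import random
from collections import Counter


def objective(assignment, strata, batch_sizes):
    """Sum of (n_ij - E_ij)^2 with E_ij = B_i * S_j / total wells (assumed form)."""
    total_wells = sum(batch_sizes)
    stratum_sizes = Counter(strata.values())
    observed = Counter((assignment[s], strata[s]) for s in strata)
    return sum(
        (observed[(i, label)] - b * size / total_wells) ** 2
        for i, b in enumerate(batch_sizes)
        for label, size in stratum_sizes.items()
    )


def shuffle_optimize(assignment, strata, batch_sizes, k=2, n_attempts=5000, seed=0):
    """Greedy search: keep a shuffled setup only if it lowers the objective."""
    rng = random.Random(seed)
    best = dict(assignment)
    best_value = objective(best, strata, batch_sizes)
    batches = sorted(set(best.values()))
    for _ in range(n_attempts):
        # Pick one sample from each of k distinct batches.
        chosen = rng.sample(batches, k)
        picked = [rng.choice([s for s in best if best[s] == b]) for b in chosen]
        # Rotate the batch labels among the picked samples.
        candidate = dict(best)
        for sample_id, new_batch in zip(picked, chosen[1:] + chosen[:1]):
            candidate[sample_id] = new_batch
        value = objective(candidate, strata, batch_sizes)
        if value < best_value:
            best, best_value = candidate, value
    return best, best_value


# Hypothetical worst-case start: stratum fully confounded with batch.
strata = {f"a{n}": "A" for n in range(4)}
strata.update({f"b{n}": "B" for n in range(4)})
start = {s: (0 if label == "A" else 1) for s, label in strata.items()}
best, best_value = shuffle_optimize(start, strata, batch_sizes=[4, 4], n_attempts=500)
print(objective(start, strata, [4, 4]), best_value)
```

Because a new setup is accepted only when it strictly improves the objective, the search never gets worse, but it can stop at a local optimum on harder inputs.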

In the alternative algorithm, multiple sample assignment setups (typically thousands or more) are first generated by the procedure described in the block randomization step above, based only on the list of specified blocking variable(s). The optimal setup is then chosen from this pool as the one that minimizes the value of the objective function based on all variables considered. This algorithm guarantees that the selected setup conforms to the blocking requirement for the specified blocking variables, while attempting to minimize the between-batch variation of the other variables considered.
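The pool-and-select idea can be sketched as follows. To keep the example short it makes simplifying assumptions that OSAT itself does not require: batches are of equal size and every blocking stratum divides evenly among them, so block randomization reduces to shuffling each blocking stratum and dealing its samples to the batches in turn. The function names, the annotations, and the least-squares form of the objective are assumptions for illustration only.

```python
import random
from collections import Counter, defaultdict


def deal_by_block(samples, block_vars, n_batches, rng):
    """One candidate setup: shuffle each blocking stratum, then deal round-robin."""
    groups = defaultdict(list)
    for sample_id, annot in samples.items():
        groups[tuple(annot[v] for v in block_vars)].append(sample_id)
    assignment = {}
    for members in groups.values():
        rng.shuffle(members)
        for position, sample_id in enumerate(members):
            assignment[sample_id] = position % n_batches
    return assignment


def objective(assignment, samples, variables, n_batches):
    """Sum of (n_ij - S_j / m)^2 over batches and the strata of `variables`."""
    label = {s: tuple(a[v] for v in variables) for s, a in samples.items()}
    sizes = Counter(label.values())
    observed = Counter((assignment[s], label[s]) for s in samples)
    return sum(
        (observed[(i, lab)] - size / n_batches) ** 2
        for i in range(n_batches)
        for lab, size in sizes.items()
    )


def select_from_pool(samples, block_vars, all_vars, n_batches, n_candidates=1000, seed=0):
    """Generate a pool blocked on block_vars; keep the best setup under all_vars."""
    rng = random.Random(seed)
    pool = [deal_by_block(samples, block_vars, n_batches, rng) for _ in range(n_candidates)]
    return min(pool, key=lambda a: objective(a, samples, all_vars, n_batches))


# Hypothetical annotations: block on status, also balance sex where possible.
samples = {
    f"s{n}": {"status": status, "sex": sex}
    for n, (status, sex) in enumerate(
        [(st, sx) for st in ("case", "control") for sx in ("F", "F", "M", "M")]
    )
}
best = select_from_pool(samples, ["status"], ["status", "sex"], n_batches=2, n_candidates=200)
print(objective(best, samples, ["status"], 2), objective(best, samples, ["status", "sex"], 2))
```

Every candidate in the pool is balanced on the blocking variable by construction; the selection step only decides which of them is also best balanced on the remaining variables.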