The proposed approach is an integration of experiment and theory to investigate regulatory process dynamics by combining multiple complementary disciplines, including: (i) using fluorescent reporters in molecular technology to study cells' transcriptional activities under drug perturbation; (ii) these being captured by an automatic epifluorescent microscope over a time course; and (iii) such data being processed by large-scale image processing for dynamic analysis. A truly multi-dimensional dynamics of tumor cell response to drugs can be characterized through systematic perturbations to test different combinations of cell types, reporters, and drugs/dosages, augmented by iterative systematic theoretical analysis. This methodology differs from high-throughput technique like RNA expression profiling with microarrays, which provide a snapshot of an aspect of the system at one time point.
Experimental methodology
Understanding cell response to a drug requires experimental designs that ask very specific questions about what is happening in a cell in the absence of a drug and how the cell activities change when the drug is present. The objective of the experimental protocol is to efficiently capture cell process dynamics in response to drugs and thereby obtain a deeper understanding of the genetic regulatory mechanisms, the point being to make preclinical research more predictive. Fluorescent reporters have long been used in molecular technology to study cells' transcriptional activities or the cellular localization of components, either in a population of cells or a single cell [40–42]. In this study, we track the transcriptional activities of particular genes. A fluorescent reporter to serve this purpose can be constructed by fusing the promoter region of a gene of interest with the coding sequence of a fluorescent protein, most commonly a green fluorescent protein (GFP). By delivering a single cassette bearing the promoter/GFP reporter into the genome of each cell in a population of cells, any change in the expression levels of the native coding sequence driven by that promoter will be reflected in the transcriptional activity of the cassette. This allows the estimation of the total fluorescence of the reporter in the cell, captured by imaging with an epifluorescent microscope, which is then used as a relative measure of the transcriptional activity of the native gene. Because this procedure is non-invasive to the cell, it allows tracking of the same cell population for an extended period of time by imaging the same site repeatedly. The recent introduction of automated digital microscopes allows researchers to use multi-well microtiter plates and sequentially capture the transcriptional activities in all wells. In our experimental protocol, a single assay is carried out by epifluorescent imaging of a site at the bottom of each well in a 384 well plate, producing an image of the cells in that region (~200-400 cells) bearing fluorescent reporters. The imaging speed of automated systems easily accommodates sampling an entire 384 well plate at hourly intervals. If needed, the experiment can be extended to multiple plates to cover a wider range of cell types and reporters.
In this experimental set-up, using different wells to test different combinations of cell type, GFP reporter and experimental condition allows this approach to provide a multi-dimensional examination of the cells' responses to a variety of stimuli. Not only can it follow multiple genes simultaneously, but it can also compare cellular activities under various conditions. Furthermore, it captures the dynamics of transcriptional regulation. This produces data on ~200-400 individual cells per well that can be analyzed both individually, as a distribution, or in aggregate, as an average. Fluorescent intensity data can be extracted from these images using specialized image analysis tools developed for this application [43]. This image processing procedures include finding cells, identifying individual cells, and quantifying the fluorescence associated with each cell. The objective is to extract gene expression levels from the fluorescent image and track them over the time course. We approach this goal through morphology-based image processing methods.
Image processing
Typical fluorescent images are shown in Figure 2 (left panels), where nuclei are detected in the blue channel and promoter reporters to study cells' transcriptional activities are detected in the green channel. With a 384-well plate there will be at least 384 videos for evaluation and the number can be much higher if the experiment requires multiple plates to cover all experimental conditions. Visual evaluation is unreliable when one needs to quantitatively compare different conditions and the high-throughput nature of the green fluorescent protein reporter approach calls for a more automatic and quantitative solution to efficiently extract gene-expression levels from the fluorescent images and track them over the time course.
To facilitate automatic processing of the experiment results, the transcriptional levels of the fluorescent images need be properly extracted, quantized, and saved and the image processing algorithm should be fast with good balance between performance and robustness [43]. An algorithm based on morphological image processing [44], in particular, the watershed transformation [45] is currently adopted in our study. Overall, the image processing breaks down into three major components: (i) nuclei channel segmentation, (ii) reporter channel segmentation, and (iii) measurement of cell-by-cell promoter activity levels. Figure 3 shows the segmentation results of a typical fluorescent image pair, where only a portion of the full image is shown in order to show the segmentation details. Once the individual cells are identified, the transcriptional activity represented by the reporter is extracted for every cell by summing up the background subtracted pixel intensity of the whole cell area and taking a log2 transform before being exported.
Experimental set-up for the dosing study
The dosing study is carried out on the colon cancer cell-line HCT116 with a reporter for the MKI67 gene, a nuclear antigen tightly correlated with proliferation [46, 47], with responses to lapatinib treatment with 6 dosages (1 to 32 µ M). First we infect the HCT116 cell lines with the desired packaged reporter (packaged as lentiviral particles). Then plate cells/reporter pair in a media containing a live-cell nuclear stain. The cells are allowed to attach to the plate and grow overnight. Drugs are added to the appropriate wells (we have 6 wells [biological duplicates] for each dosage). In order to remove environmental effects, such as growth factor depletion, there are 6 control wells for each dosage (no drug added, total 36 wells). We image the plate once an hour for 48 hours to characterize the response of each cell/reporter pair to the drug over time. Note that the fluorescence intensity of cells without a GFP reporter expressed is not zero, since cells have numerous small molecules which fluoresce in the same wavelengths as GFP when excited with 488 nm light. This defines the minimum fluorescence, which is approximately 214. One of the time courses from experiment (dosage = 8µ M) is shown in Figure 2. The left panels of Figure 2 show two fluorescent images sampled for the same site in a 48 hour lapatinib treatment for 8 µ M dosage. The right panels of Figure 2 show the log2(GFP) intensity histogram for each time point.
Since MKI67 is turned on during proliferation and off when the cells are not cycling, it is expected to show a binary, switch-like histogram of cell intensities, rather than a graded transition. This behavior is observed in Figure 2. We have the readout of the GFP intensity level for each individual cell/dosage pair with 48 time points. These can be compared with a threshold value to determine whether that cell is shifted or not [37, 43]. Such a reporter assay allows one to determine the dynamics of drug responses for different dosages. Consequently, we propose a time-varying model for the cell shifting process where the drug effect coefficient is assumed changing with time. This is in contrast to many existing approaches where the drug effect coefficient is treated as a constant and the experiment just provides one reading rather than time-series characterization.
Mathematical model formulation
The experimental results provide information on the percentage of cells shifted as a consequence of the drug activity. The measurements facilitate asking important questions in drug development. For instance, does dosing alter the extent of response, the timing of response, or both? In addition to qualitative questions, we are interested in modeling the drug effect quantitatively, which requires a novel mathematical model that is biologically sound and fits the experimental setup. Our experiments and the proposed modeling has two important features: (i) Our experiment is based on the readout of the intensity level of each individual cell, which is compared with a threshold value to determine whether that cell is shifted or not. Although we count the number of shifted cells at each sampling time point, the proposed model is not a population model merely giving the average readout of all the cells. (ii) Our experiment collects time-series data under drug perturbation for 48 hours, with one sample per hour. A time-varying model is proposed for the cell shifting process, where the drug effect coefficient is assumed changing with time.
Because there are different numbers of cells in different wells (the range is about ~200-400 cells per well), we perform normalization to calculate the percentage of cells shifted. Since there are many factors including drug effect that contribute to the cell shifting, calibration is performed by comparing to the control group to exclude other contributing factors. The notations used in this work are listed below
-
N: total number of cells
-
N1(t): number of shifted cells at time t after applying drug
-
ρ1(t) = N1(t)/N: percentage of cells shifted at time t after applying drug
-
N
c
: total number of cells in the control group (no drug applied)
-
N1c(t): number of shifted cells at time t in the control group
-
ρ1c(t) = N1c(t)/N
c
: percentage of cells shifted at time t in the control group
-
ρ(t) = ρ1(t) − ρ1c(t): calibrated percentage of cells shifted at time t after applying drug
-
ρ
av
(t) = E[ρ(t)]: mean of the calibrated percentage of cells shifted at time t after applying drug
-
X
i
(t): state of cell i at time t after applying drug (either shift-ready or not)
We justify N1(t) being modeled as a Gaussian process when the number of cells per well is sufficiently large. Then a model is proposed for the cell shifting process, where the calibrated percentage of shifted cells follows a Gaussian process.
N1(t) is a Gaussian process when the number of cells per well is large enough
In general, N is a random variable since N may be different from well to well in the experiment; however, N can be treated as a known constant for each specific well, as can N
c
. At any given time point t
j
in the experiment, X
i
(t
j
) can be considered as either shift-ready or not. Thus, the experiment of drug effect on each cell can be treated as a Bernoulli trial and X
i
(t
j
) can be modeled as a Bernoulli random variable, i.e., the Probability Mass Function (PMF) of X
i
(t
j
) is given by
(1)
where 0 ≤ p ≤ 1 and t
j
is dropped for simplicity of presentation. Under this definition, . Assuming that all cell states are independent, N1 has the binomial PMF given by
(2)
When the number of cells per well is large, say N > 100, the PMF of N1 at any given time instant can be accurately approximated by the Gaussian distribution due to the central limit theorem. Next we show that N1(t) is a Gaussian process.
Proposition 1. The random process N1(t) is approximately Gaussian when the number of cells per well is large.
Proof. At the beginning of the experiment, t0, N1(t0) is a Gaussian random variable. For any sampling point, at time t
j
, N1(tj) can be expressed as
(3)
where N1(t
j
−1) is the total number of shifted cells at time t
j
−1, and the additional number of shifted cells in the time interval [t
j
−1, t
j
] is given by
(4)
If N − N1(t
j
−1) is sufficiently large, N − N1(t
j
−1) > 32, then ΔN1(t
j
) is well approximated by a Gaussian random variable. Since N1(t0) is Gaussian, N1(t
j
) is Gaussian as well by mathematical induction. □
Modeling the cell shifting process
From our previous experimental observation, the cell shifting process on colon cancer cell-line HCT116 with a reporter for the MKI67 gene under lapatinib treatment shows a binary shifting characteristic. It is assumed that the number of shifting cells is related to: (i) the drug effect corresponding to different dosages; and (ii) the number of proliferating cells (non-shifted cells, N − N1). Since N1(t) is Gaussian process when the number of cells per well is large and N is a constant, the percentage of cells shifted at time t after applying drug, ρ1(t) = N1(t)/N, is a Gaussian process normalized to 0[1]. Similarly, for the control group, ρ1c(t) = N1c(t)/N
c
, is also a Gaussian process normalized to 0[1]. Then ρ(t) = ρ1(t) − ρ1c(t), the calibrated percentage of cells shifted at time t after applying drug, is a Gaussian process too. We are interested in the distribution of ρ(t), specifically, how the mean value of ρ(t), ρ
av
(t), changes along time under different dosage. Based on the above discussions, we propose the following model for cell shifting:
(5)
where is the drug effective coefficient depending on the dosage d, and β > 0 is a balancing factor. ρ
av
(t) changes along time since the corresponding random process ρ(t) is non-stationary, thus its mean changes with time. Specifically, the change of ρ
av
(t) follows a linear differential equation (Eq.(5)) that reflects the fact that the change would be positively affected by the product of drug effectiveness and the percentage of cells not shifted (1st term in Eq.(5)), and negatively affected by the percentage of cells already shifted (2nd term in Eq.(5)), thus the term "balancing factor" for β since more shifted cells mean less non-shifted cells that the drug may affect.
In this model, we assume that both and β change along time, thus the proposed model is a time-varying system. It is also assumed that the number of non-shifted cells, N − N1, decreases exponentially with the factor . and ν are independent Gaussian white noise processes. µ represents the process noise. Its covariance matrix is
ν is the measurement noise. Its covariance matrix is
The noise terms account for the various uncertainties introduced by the experiment. For instance, the cells may not be at the same cell cycle during the experiments, and thus may not be affected by the drug if some of the cells are actually dormant. This kind of uncertainties are modeled by process noise µ. There also exists another type of uncertainty due to measurement procedures, such as the imperfect photographic device and the image processing software. This type of uncertainty is modeled by measurement noise ν.
To observe the relationship between the drug effect coefficient and the dosage d, we need to estimate for each dosage. Since this is a time-varying model, changes with time.
System identification from time-series data using Kalman filter
Kalman filtering [48] provides minimum-mean-square-error estimation of the state of a stochastic linear system disturbed by Gaussian white noise. In our proposed scheme, a Kalman filter is applied to estimate the coefficients, and β, of the proposed cell shifting model. The corresponding state and measurement equations are
(6)
(7)
where the 2-dimensional state vector (containing the parameters to be estimated) is . δ can be calculated as . .
The implementation of the Kalman filter is given by the following equations [48]:
(8)
(9)
(10)
(11)
(12)
where K(n) is the Kalman filter gain and P is the covariance matrix of the error. The superscripts - and + indicate the a priori and a posteriori values of the variables, respectively. and are the prior and posterior estimates, respectively. Q and R are the covariance matrices of the parameter noise and external noise, respectively. The initial conditions are and .
In general, a Kalman filter may be interpreted as a one-step predictor with an appropriate gain calculator [49]. Specifically, Eq.(10) is the one-step predictor, Eq.(11) calculates the Kalman filter gain, and Eq.(12) solves the corresponding Riccati equation.
Convergence of the Kalman filter is an important issue [48]. The rate of convergence is defined as the number of iterations to obtain the optimum estimates. The convergence of the Kalman filter includes the convergence of the estimates and the convergence of the estimation error e(n). Convergence will be studied in detail in the simulations.
In practice, noise statistics (such as the covariance matrices) may not be known and need to be estimated. The Kalman filter is sensitive to the estimation error of noise statistics. Poor estimates of the noise covariance can result in filter divergence. An alternative would be using an H
∞
filter [50, 51].