As described by Fan et al [10], MCF7 human breast cancer cells (American Type Culture Collection, Manassas, VA) and MCF7-OTH cells were cultured and treated with E2 (10^{-8} mol/L) for three hours. Then, cells were cross-linked with 1% formaldehyde and chromatin immunoprecipitation was done as previously described [11]. The antibodies against Pol II were purchased from Santa Cruz Biotechnology (Santa Cruz, sc-899 X and sc-8005 X). After immunoprecipitation and purification, ChIP DNA sample was run in 12% PAGE and the 100–300 bp DNA fraction was excised and eluted from the gel slice. Then, Illumina library was constructed and sequenced with Illumina/Solexa Genome Analyzer.

Most DNA binding proteins, such as transcription factors, bind to *cis*-acting DNA element with specific sequence features usually described by a position weight matrix (PWM). A hypothetical distribution of ChIP-seq-derived DNA fragments corresponding to transcription factors and RNA polymerase II (Pol II) is shown in Figs. 1A and 1B, respectively. For transcription factors, ChIP-seq detects a set of fragments that cluster and center around distinct biological binding sites, forming a "peak" around the binding locus (Fig. 1A). In contrast, Pol II binds throughout promoter regions, the 5'- and 3'- untranslated regions, the open reading frame (ORF), and downstream regions of the activated gene (Fig. 1B). Although Pol II can form a distinct peak around the transcription start site under certain circumstances, commonly-used peak-finding algorithms are not able to identify Pol II-enriched regions in gene transcript region derived from ChIP-seq experiments.

### A *Poisson* mixture model to identify transcripts with different Pol II binding quantity

Based on the assumption that the number of Pol II-binding fragments detected within gene transcript region, including 5'- and 3'-untranslated regions and open reading frames, follows a *Poisson* distribution, we developed a mixture model to identify differences in Pol II binding under two conditions, control *vs.* treatment. Denote *y*_{
ij
}as the Pol II quantity for gene *i* (i = 1,..., n) under condition *j* (*j* = 1, 2). In this application, *j* = 1, 2 indicates two biological conditions, MCF7 control (vehicle-treated) and MCF7 treated (3 hour E2 treatment), respectively. Marginally, *y*_{
ij
}follows a *Poisson* distribution,

{y}_{ij}~\frac{{e}^{-{\lambda}_{ij}}{\lambda}_{ij}^{{y}_{ij}}}{{y}_{ij}!},{\lambda}_{ij}\ge 0

(1)

where *λ*_{
ij
}denotes the expected quantity of Pol II binding for gene *i* under condition *j* (*j* = 1, 2). For the *i*-th gene, if Pol II binding quantities within the gene transcript region are statistically different between the two biological conditions, *y*_{
ij
}follows marginal distributions with different parameters *λ*_{i 1}and *λ*_{i 2}; conversely, if the number of Pol II binding does not demonstrate a significant difference, *y*_{
ij
}follows the same distribution with a unified *λ*. A mixture model is proposed to estimate the posterior probability of differential Pol II binding quantity.

For unified Pol II binding quantity between two conditions, *λ*_{i 1}and *λ*_{i 2}follow the same *Gamma* distribution (2). The selection of *Gamma* distribution is based on two considerations. Firstly, the two parameter *Gamma* distribution is a very flexible function which can be used to describe a wide range of distribution shapes. Secondly, the *Gamma* distribution is a conjugate distribution of *Poisson* for the *λ* parameters. Hence, the follow-up expectation step in an E-M algorithm has a close form solution.

{\lambda}_{i1}={\lambda}_{i2}={\lambda}_{i}~Gamma({\alpha}_{1},{\beta}_{1})=\frac{{\lambda}_{i}^{{\alpha}_{1}-1}{e}^{-{\lambda}_{i}/{\beta}_{1}}}{\Gamma ({\alpha}_{1}){\beta}_{1}^{{\alpha}_{1}}}.

(2)

Otherwise, (*λ*_{i 1}, *λ*_{i 2}) are distributed independently.

\begin{array}{c}{\lambda}_{i1}~Gamma({\alpha}_{1},{\beta}_{1})=\frac{{\lambda}_{i1}^{{\alpha}_{1}-1}{e}^{-{\lambda}_{i1}/{\beta}_{1}}}{\Gamma ({\alpha}_{1}){\beta}_{1}^{{\alpha}_{1}}}\\ {\lambda}_{i2}~Gamma({\alpha}_{2},{\beta}_{2})=\frac{{\lambda}_{i2}^{{\alpha}_{2}-1}{e}^{-{\lambda}_{i2}/{\beta}_{2}}}{\Gamma ({\alpha}_{2}){\beta}_{2}^{{\alpha}_{2}}}\end{array}

(3)

Denote *Z*_{
i
}as the *Bernoulli* random variable with probability *p*, *i.e*., it is equal to 1 if (*λ*_{i 1}, *λ*_{i 2}) are independently distributed (and equal to 0 if not). Therefore, the joint distribution of Pol II binding quantity is modeled by a mixture of uniform binding events (*p* = 0) and differential binding events (*p* = 1).

\begin{array}{l}\mathrm{Pr}(y,\lambda ,Z|{\alpha}_{1},{\beta}_{1},{\alpha}_{2},{\beta}_{2},P)\hfill \\ ={\displaystyle \prod _{i=1}^{n}\mathrm{Pr}({y}_{i1},{y}_{i2},{\lambda}_{i},{\lambda}_{i1},{\lambda}_{i2}|{Z}_{i},{\alpha}_{1},{\beta}_{1},{\alpha}_{2},{\beta}_{2})\mathrm{Pr}({Z}_{i}|p)}\hfill \\ ={\displaystyle \prod _{i=1}^{n}{(1-P)}^{(1-{Z}_{i})}{\left[\frac{{e}^{-{\lambda}_{i}}{\lambda}_{i}^{{y}_{i1}+{y}_{i2}}{e}^{-{\lambda}_{i}/{\beta}_{1}}{\lambda}_{i}^{{\alpha}_{1}-1}}{{y}_{i1}!{y}_{i2}!\Gamma ({\alpha}_{1}){\beta}_{1}^{{\alpha}_{1}}}\right]}^{1-{Z}_{i}}\times}\hfill \\ {P}^{{Z}_{i}}\left[\frac{{e}^{-2{\lambda}_{i1}}{\lambda}_{i1}^{{y}_{i1}}{e}^{-{\lambda}_{i1}/{\beta}_{1}}{\lambda}_{i1}^{{\alpha}_{1}-1}}{{y}_{i1}!\Gamma ({\alpha}_{1}){\beta}_{1}^{{\alpha}_{1}}}\times \frac{{e}^{-{\lambda}_{i2}}{\lambda}_{i2}^{{y}_{i2}}{e}^{-{\lambda}_{i2}/{\beta}_{2}}{\lambda}_{i2}^{{\alpha}_{2}-1}}{{y}_{i2}!\Gamma ({\alpha}_{2}){\beta}_{2}^{{\alpha}_{2}}}\right]\hfill \end{array}

(4)

Based on equation (4), we implement E-M algorithm by treating *Z*_{
i
}as missing data.

The **E-step** of the algorithm is specified as:

\begin{array}{l}{\widehat{\lambda}}_{i}=E({\lambda}_{i}|{Z}_{i})=({y}_{i1}+{y}_{i2}+{\alpha}_{1})\times (1/(2+1/{\beta}_{1}))\hfill \\ {\widehat{\lambda}}_{i1}=E({\lambda}_{i1}|{Z}_{i})=({y}_{i1}+{\alpha}_{1})\times (1/(1+1/{\beta}_{1}))\hfill \\ {\widehat{\lambda}}_{i2}=E({\lambda}_{i2}|{Z}_{i})=({y}_{i2}+{\alpha}_{2})\times (1/(1+1/{\beta}_{2}))\hfill \\ {\widehat{Z}}_{i}=\frac{p\left[\frac{{e}^{-(1+1/{\beta}_{1}){\widehat{\lambda}}_{i1}}{\widehat{\lambda}}_{i1}^{{y}_{i1}+{\alpha}_{1}-1}{e}^{-(1+1/{\beta}_{2}){\widehat{\lambda}}_{i2}}{\widehat{\lambda}}_{i2}^{{y}_{i2}+{\alpha}_{2}-1}}{{y}_{i1}!\Gamma ({\alpha}_{1}){\beta}_{1}^{{\alpha}_{1}}{y}_{i2}!\Gamma ({\alpha}_{2}){\beta}_{2}^{{\alpha}_{2}}}\right]}{p\left[\frac{{e}^{-(1+1/{\beta}_{1}){\widehat{\lambda}}_{i1}}{\widehat{\lambda}}_{i1}^{{y}_{i1}+{\alpha}_{1}-1}{e}^{-(1+1/{\beta}_{2}){\widehat{\lambda}}_{i2}}{\widehat{\lambda}}_{i2}^{{y}_{i2}+{\alpha}_{2}-1}}{{y}_{i1}!\Gamma ({\alpha}_{1}){\beta}_{1}^{{\alpha}_{1}}{y}_{i2}!\Gamma ({\alpha}_{2}){\beta}_{2}^{{\alpha}_{2}}}\right]+(1-p)\frac{{e}^{-(2+1/{\beta}_{1}){\widehat{\lambda}}_{i}}{\widehat{\lambda}}_{i}^{{y}_{i1}+{y}_{i2}+{\alpha}_{1}-1}}{{y}_{i1}!{y}_{i2}!\Gamma ({\alpha}_{1}){\beta}_{1}^{{\alpha}_{1}}}}\hfill \end{array}

(5)

In the **M-step**, the parameters in the *Gamma* distribution (*α*_{1}, *β*_{1}, *α*_{2}, *β*_{2}) are estimated by maximizing the two likelihood functions in Equation 6.

\begin{array}{l}L({\alpha}_{1},{\beta}_{1};\lambda ,Z)\propto {\displaystyle \prod _{i=1}^{n}\left\{{\left[\frac{{e}^{-{\lambda}_{i}/{\beta}_{1}}{\lambda}_{i}^{{\alpha}_{1}-1}}{\Gamma ({\alpha}_{1}){\beta}_{1}^{{\alpha}_{1}}}\right]}^{1-{Z}_{i}}{\left[\frac{{e}^{-{\lambda}_{i1}/{\beta}_{1}}{\lambda}_{i1}^{{\alpha}_{1}-1}}{\Gamma ({\alpha}_{1}){\beta}_{1}^{{\alpha}_{1}}}\right]}^{{Z}_{i}}\right\}}\hfill \\ L({\alpha}_{2},{\beta}_{2};\lambda ,Z)\propto {\displaystyle \prod _{i=1}^{n}\left\{{\left[\frac{{e}^{-{\lambda}_{i2}/{\beta}_{2}}{\lambda}_{i2}^{{\alpha}_{2}-1}}{\Gamma ({\alpha}_{2}){\beta}_{2}^{{\alpha}_{2}}}\right]}^{{Z}_{i}}\right\}}\hfill \\ p=\frac{{\displaystyle \sum _{i=1,2,\mathrm{...},n}{Z}_{i}}}{n}\hfill \end{array}

(6)

In the M-step, the optimization procedure is challenging, because searching for the optimal solutions for *Gamma* parameters can be trapped into local optimum, causing either slow convergence or failure to converge on the global optimal solution. In order to overcome these difficulties, we utilize particle swarm optimization (PSO), an artificial intelligence approach that mimics a behavior of swarm-forming agents, providing a good balance between global optimum searching and computation efficiency [12].

Because the likelihood functions of (*α*_{1}, *β*_{1}) and (*α*_{2}, *β*_{2}) are factorized in equation 6, the PSO optimization procedures are conducted independently using the following four steps.

Step 1: 100 particles (potential solution) were initially randomly distributed in 2-dimensional parameter spaces (*α*_{1}, *β*_{1}) or (*α*_{2}, *β*_{2}).

Step 2: the likelihood of each of the 100 particles are calculated by following Equation 6.

Step 3: the velocity vector of the particle, serving as the guide to search for the optimal solution, was calculated using Equation 7.

{V}_{k}^{\text{'}}={c}_{0}{V}_{k}+{c}_{1}({P}_{global}-{P}_{k})+{c}_{2}({P}_{k-local}-{P}_{k})

(7)

where **P**_{
global
}is the global optimal solution achieved so far; **P**_{k-local}is the local optimal solution achieved by particle *k*; and *C*_{0}, *C*_{1}, *C*_{2} are adjustable weight factors used to control searching speed.

Step 4: in the solution space, all the particles are re-positioned based on their current positions and movement velocities calculated in Equation 8.

{P}_{k}^{\text{'}}={P}_{k}+{V}_{k}^{\text{'}}

(8)

Steps 1–4 will be iterated until further particle movement cannot result in higher likelihood (defined in Equation 6).

At the convergence, *Z*_{
i
}can be interpreted as the probability of differential Pol II binding between two conditions. Although the model derivation is based on ChIP-seq data from MCF7 cells before and after treatment with vehicle or E2 for 3 hours, it can also be equally applied to OHT-MCF7 (+/- E2 treatment). In practice, the solution of the E-M algorithm converges in only 5 to 6 cycles.