In this section, we first describe the proposed generative model to define m^{6}A peak clusters and then derive the Expectation-Maximization algorithm for the inference. In the end, we discuss a Bayesian Information Criterion (BIC) [18] for selecting the optimal number of m^{6}A peak clusters.

### The proposed graphical model for clustering RNA methylation peaks

The proposed graphical model for clustering of m^{6}A peaks in MeRIP-seq data is shown in Fig. 1. Suppose we have identified a set of *N* m^{6}A peaks, by using peak-calling software such as exomePeak or HEPeak. The goal is to cluster these peaks according to their methylation degree, which is defined as IP reads count divided by the total count of IP and control reads. For the *n*
_{
th
} m^{6}A peak, let *Z*
_{
n
} ∈ {1, 2,.., *K*} denote the index of the particular methylation cluster that *n*-th peak belongs to, with *K* representing the total number of clusters, then *Z*
_{
n
} follows a discrete distribution

$$ P\left({Z}_n\Big|\boldsymbol{\uppi} \right)={\displaystyle \prod_{k=1}^K{\pi_k}^{I\left({Z}_n=k\right)}} $$

(1)

where *π*
_{
k
} is the unknown probability that an m^{6}A peak belongs to cluster *k*, where ∑_{
K
}
*π*
_{
k
} = 1 and *I*(⋅) is the indicator function. Also, let the observed reads count in the *n*
_{
th
} peak of the *m*
_{
th
} IP replicate sample be *X*
_{
m,n
} and that of the *m*
_{
th
} input replicate denote as *Y*
_{
mn
}. Under the assumption that reads count follows a Poisson distribution, the reads count *X*
_{
mn
} given the total reads account *T*
_{
mn
} = *X*
_{
mn
} + *Y*
_{
mn
} can be shown to follow a Binomial distribution

$$ P\left({X_m}_n\Big|{p}_n,{Z}_n\right)={{\displaystyle \prod_{k=1}^K\left(\left(\begin{array}{c}\hfill {T}_{mn}\hfill \\ {}\hfill {X}_{mn}\hfill \end{array}\right){p_n}^{X_{mn}}{\left(1-{p}_n\right)}^{Y_{mn}}\right)}}^{I\left({Z}_n=k\right)} $$

(2)

where is *p*
_{
n
} represents unknown methylation degree at the *n*
_{
th
} Peak of the *m*
_{
th
} replicate. In order to model the variance of the replicates for the *n*
_{
th
} peak, given cluster assignment *Z*
_{
n
}, *p*
_{
n
} is assumed to follow the Beta distribution

$$ P\left(p\Big|{Z}_n\right)={\displaystyle \prod_{k=1}^K Beta{\left({\alpha}_k,{\beta}_k\right)}^{I\left({Z}_n=k\right)}} $$

(3)

Therefore, after integrating the variable \( {p}_n \), \( {X}_{mn} \) follows a mixture of Beta-binomial distribution

$$ \begin{array}{l}P\left({X}_{mn}\Big|{Z}_n;\boldsymbol{\upalpha}, \boldsymbol{\upbeta} \right)={\displaystyle \sum_{p_n}P\left({X_m}_n\Big|{p}_n,{Z}_n\right)P\left({p}_n\Big|\boldsymbol{\upalpha}, \boldsymbol{\upbeta} \right)}\\ {}={{\displaystyle \prod_{k=1}^K\left(C\bullet \frac{\varGamma \left({X}_{mn}+{\alpha}_k\right)\varGamma \left({Y}_{mn}+{\beta}_k\right)}{\varGamma \left({T}_{mn}+{\alpha}_k+{\beta}_k\right)}\frac{\varGamma \left({\alpha}_k+{\beta}_k\right)}{\varGamma \left({\alpha}_k\right)\varGamma \left({\beta}_k\right)}\right)}}^{I\left\{{Z}_n=k\right\}}\end{array} $$

(4)

where *α* = [*α*
_{1}, *α*
_{2},.., *α*
_{
K
}]^{T}, *β* = [*β*
_{1}, *β*
_{2},.., *β*
_{
K
}]^{T} are the unknown parameters of Beta distribution and *C* is the normalization constant. Thus, by considering the *N* m^{6}A peaks in *M* replicates, the joint distribution is

$$ P\left(\mathbf{X},\mathbf{Z}\Big|\boldsymbol{\upalpha}, \boldsymbol{\upbeta}, \boldsymbol{\uppi} \right)={\displaystyle \prod_{n=1}^N{\displaystyle \prod_{m=1}^M{\displaystyle \prod_{k=1}^K{\left({\pi}_kBB\left({X}_{mn}\Big|{Z}_n\right)\right)}^{I\left\{{Z}_n=k\right\}}}}} $$

(5)

where *BB*(*X*
_{
mn
}|*Z*
_{
n
}) represents formula (3). Then, the log-likelihood of the observed data can be expressed as

$$ \begin{array}{l}l= \lg P\left(\mathbf{X}\Big|\boldsymbol{\upalpha}, \boldsymbol{\upbeta}, \boldsymbol{\uppi} \right)= \lg {\displaystyle \sum_{\mathbf{Z}}P\left(\mathbf{X},\mathbf{Z}\Big|\boldsymbol{\upalpha}, \boldsymbol{\upbeta}, \boldsymbol{\uppi} \right)}\\ {}={\displaystyle \sum_{n=1}^N{\displaystyle \sum_{m=1}^M \lg {\displaystyle \sum_{k=1}^K{\pi}_kBB\left({X}_{mn}\Big|{Z}_n;\boldsymbol{\upalpha}, \boldsymbol{\upbeta} \right)}}}\end{array} $$

(6)

where *Z* = [*Z*
_{1}, *Z*
_{2}, …, *Z*
_{
N
}]^{T}, **X** = [*X*
^{T}_{
1,}
*X*
^{T}_{
2,…,}
*X*
^{T}_{
N
}
]^{T} and \( {\boldsymbol{X}}_{\boldsymbol{n}}={\left[{X}_{1n},{X}_{2n},\dots, {X}_{Mn}\right]}^T \). The goal of inference is to predict the cluster index \( {Z}_n \) for all the peaks and estimate the unknown model parameters \( \boldsymbol{\theta} =\left[\boldsymbol{\alpha}, \boldsymbol{\beta}, \boldsymbol{\pi} \right] \). Next, we first discuss the maximum likelihood solution for parameter inference, based which an EM algorithm is introduced afterwards to perform model parameters inference and cluster assignment jointly.

### Parameter inference by the Newton’s method

Given that the cluster indices are known, the model parameters can be inferred by the maximum likelihood criterion as

$$ {\widehat{\uptheta}}_{ML}=\underset{\boldsymbol{\uptheta}}{ \arg \max (l)}. $$

(7)

Given (5–6), the log-likelihood \( l \) can be rewritten as

$$ \begin{array}{l}l={\displaystyle \sum_{n=1}^N{\displaystyle \sum_{m=1}^M \lg {\displaystyle \sum_{k=1}^Kq\left({Z}_{nk}\right)\frac{\pi_kBB\left({X}_{mn}\Big|{Z}_n\right)}{q\left({Z}_{nk}\right)}}}}\\ {}\ge {\displaystyle \sum_{n=1}^N{\displaystyle \sum_{m=1}^M{\displaystyle \sum_{k=1}^Kq\left({Z}_{nk}\right)\left[ \lg {\pi}_k+ \lg BB\left({X}_{mn}\Big|{Z}_n\right)- \lg q\left({Z}_n\right)\right]}}}\\ {}={\displaystyle \sum_{n=1}^N{\displaystyle \sum_{k=1}^KM\cdotp q\left({Z}_{nk}\right) \lg {\pi}_k}}-{\displaystyle \sum_{n=1}^N{\displaystyle \sum_{k=1}^KM\cdotp q\left({Z}_{nk}\right) \lg q\left({Z}_{nk}\right)}}\\ {}+{\displaystyle \sum_{n=1}^N{\displaystyle \sum_{m=1}^M{\displaystyle \sum_{k=1}^Kq\left({Z}_{nk}\right)\left[\begin{array}{l}\varPhi \left({\alpha}_k+{\beta}_k\right)-\varPhi \left({T}_{mn}+{\alpha}_k+{\beta}_k\right)+\varPhi \left({X}_{mn}+{\alpha}_k\right)\\ {}+\varPhi \left({Y}_{mn}+{\beta}_k\right)-\varPhi \left({\alpha}_k\right)-\varPhi \left({\beta}_k\right)\end{array}\right]}}}\end{array} $$

(8)

where \( \Phi = \lg \Gamma \left(\cdotp \right) \) and \( q\left({Z}_n\right)=P\left({Z}_n=k|\boldsymbol{X}\right) \). Here, given that \( q\left({Z}_n\right) \) is a complex simplex, according to the Jensen’s inequality, the lower bound of \( l \) is achieved when \( q\left({Z}_n\right)=P\left({Z}_n|\boldsymbol{X}\right) \). With a little abuse of notation, \( l \) denotes the lower bound of (7). Given the equality constrain \( {\sum}_K{\pi}_k=1 \), the parameters of \( \boldsymbol{\pi} \) can be computed by maximizing \( l \) and its dual problem with Lagrange multiplier \( \lambda \) is

$$ \max \kern1em g\left(\boldsymbol{\uppi}, \lambda \right)={\displaystyle \sum_n{\displaystyle \sum_m{\displaystyle \sum_kq\left({Z}_n\right) \lg {\pi}_k}+\lambda \left({\displaystyle \sum_k{\pi}_k-1}\right)}} $$

(9)

then \( \lambda \) and \( \pi \) can be calculated as

$$ \begin{array}{l}\lambda =-N\cdotp M\\ {}{\pi}_k=\frac{1}{N}{\displaystyle \sum_{n=1}^NP\left({Z}_n=k\Big|{\mathbf{X}}_{\mathbf{n}}\right)}\end{array} $$

(10)

Due to lack of analytical solution for the derivatives of \( l \) with respect to \( \boldsymbol{\alpha} \) and \( \boldsymbol{\beta} \), a Newton’s method is applied and the the gradient can be computed as

$$ {J}^k=\left[\begin{array}{c}\hfill {\displaystyle \sum_{n=1}^Nq\left({Z}_{nk}\right)\left[\begin{array}{l}\varPhi \left({\alpha}_k+{\beta}_k\right)\cdotp M-{\displaystyle \sum_{m=1}^M\varPhi \left({T}_{mn}+{\alpha}_k+{\beta}_k\right)}\\ {}-\varPhi \left({\alpha}_k\right)\cdotp M+{\displaystyle \sum_{m=1}^M\varPhi \left({X}_{mn}+{\alpha}_k\right)}\end{array}\right]}\hfill \\ {}\hfill {\displaystyle \sum_{n=1}^Nq\left({Z}_{nk}\right)\left[\begin{array}{l}\varPhi \left({\alpha}_k+{\beta}_k\right)\cdotp M-{\displaystyle \sum_{m=1}^M\varPhi \left({T}_{mn}+{\alpha}_k+{\beta}_k\right)}\\ {}-\varPhi \left({\beta}_k\right)\cdotp M+{\displaystyle \sum_{m=1}^M\varPhi \left({Y}_{mn}+{\beta}_k\right)}\end{array}\right]}\hfill \end{array}\right] $$

(11)

and the Hessian is

$$ \begin{array}{l}{H^k}_{1,1}={\displaystyle \sum_{n=1}^Nq\left({Z}_{nk}\right)\left[\begin{array}{l}{\varPhi}^{\hbox{'}}\left({\alpha}_k+{\beta}_k\right)\cdotp M-{\displaystyle \sum_{m=1}^M{\varPhi}^{\hbox{'}}\left({T}_{mn}+{\alpha}_k+{\beta}_k\right)}\\ {}-{\varPhi}^{\hbox{'}}\left({\alpha}_k\right)\cdotp M+{\displaystyle \sum_{m=1}^M{\varPhi}^{\hbox{'}}\left({X}_{mn}+{\alpha}_k\right)}\end{array}\right]}\\ {}{H^k}_{2,2}={\displaystyle \sum_{n=1}^Nq\left({Z}_{nk}\right)\left[\begin{array}{l}{\varPhi}^{\hbox{'}}\left({\alpha}_k+{\beta}_k\right)\cdotp M-{\displaystyle \sum_{m=1}^M{\varPhi}^{\hbox{'}}\left({T}_{mn}+{\alpha}_k+{\beta}_k\right)}\\ {}-{\varPhi}^{\hbox{'}}\left({\beta}_k\right)\cdotp M+{\displaystyle \sum_{m=1}^M{\varPhi}^{\hbox{'}}\left({Y}_{mn}+{\beta}_k\right)}\end{array}\right]}\\ {}{H^k}_{1,2}={H^k}_{2,1}\\ {}={\displaystyle \sum_{n=1}^Nq\left({Z}_{nk}\right)\left[{\phi}^{\hbox{'}}\left({\alpha}_k+{\beta}_k\right)\cdotp M-{\displaystyle \sum_{m=1}^M{\phi}^{\hbox{'}}\left({T}_{mn}+{\alpha}_k+{\beta}_k\right)}\right]}.\end{array} $$

(12)

Then, the parameters for the \( {k}_{th} \) cluster can be updated iteratively as

$$ \left[\begin{array}{c}\hfill {\alpha^k}_{new}\hfill \\ {}\hfill {\beta^k}_{new}\hfill \end{array}\right]=\left[\begin{array}{c}\hfill {\alpha^k}_{old}\hfill \\ {}\hfill {\beta^k}_{old}\hfill \end{array}\right]-{\left({H}^k\right)}^{-1}{J}^k $$

(13)

### m^{6}A peak cluster assignment

Assigning m^{6}A peak to a cluster amounts to inferring cluster index \( {Z}_n \), whose posterior probability given \( \boldsymbol{\theta} \) can be written as

$$ \begin{array}{l}P\left({Z}_n=k\Big|{\mathbf{X}}_{\mathbf{n}},\boldsymbol{\uptheta} \right)=\frac{P\left({Z}_n=k,{\mathbf{X}}_{\mathbf{n}}\Big|\boldsymbol{\uptheta} \right)}{{\displaystyle \sum_{k=1}^KP\left({Z}_n=k,{\mathbf{X}}_{\mathbf{n}}\Big|\boldsymbol{\uptheta} \right)}}\\ {}=\frac{\pi_k\cdotp {\displaystyle \prod_{m=1}^MBB\left({X}_{mn}\Big|{Z}_n=k,\boldsymbol{\uptheta} \right)}}{{\displaystyle \sum_{k=1}^K{\pi}_k{\displaystyle \prod_{m=1}^MBB\left({X}_{mn}\Big|{Z}_n=k,\boldsymbol{\uptheta} \right)}}}.\end{array} $$

(14)

However, \( P\left({Z}_n=k|{\boldsymbol{X}}_{\boldsymbol{n}},\boldsymbol{\theta} \right) \) cannot be computed directly, because parameter \( \boldsymbol{\theta} \) is also unknown. To circumvent the difficulty, we developed an EM [19] algorithm to infer \( {Z}_n \) and estimate the model parameters \( \boldsymbol{\theta} \) in an iterative fashion. The steps of the proposed EM algorithm are described in the following

Repeat until convergence achieved:

E-step: use the previous computed parameters \( {\boldsymbol{\theta}}_{\boldsymbol{old}} \) to update the posterior probability of the hidden states \( P\left({Z}_n=k|{\boldsymbol{X}}_{\boldsymbol{n}},\boldsymbol{\theta} \right) \) according to (13).

M-step: maximize the lower bound \( l \) in (7) and estimate parameters \( {\boldsymbol{\theta}}_{\boldsymbol{new}} \) according to (12).

### Selection of the number of states by Bayesian information criterion (BIC)

Note that the total number of states *K* is also unknown. In order to determine *K*, the BIC is applied search in the range of 2 to 15. The best number of states is selected by the lowest BIC, which is denoted as

$$ BIC=-2\widehat{l}+2K \lg N $$

(15)

where \( \widehat{l} \) is the estimated log-likelihood when the EM algorithm converges.