In *RNAMotifModeler*, the consensus of each binding motif is defined by the following components: 1) the reference motif, a *k*-base RNA sequence to which the protein preferentially binds; 2) the retained binding affinities, which describe how much binding affinity is retained when the sequence of a binding site deviates from the reference motif by one nucleotide. For each *k*-base motif, there are 3*k* retained binding affinities that describe all the possible single-nucleotide deviations from the reference motif. For instance, if the *i*-th bases of the reference motif and of a specific binding site are *m*_{i} and *f*_{i}, respectively, the relevant retained binding affinity is the one defined for the deviation from *m*_{i} to *f*_{i}; 3) a vector **θ** = (*θ*_{i}) that denotes the optimal base pairing probabilities of the *k* bases in the motif; and 4) the penalty *α* for deviation from the optimal base pairing probability. All these parameters are optimized iteratively. A matching score describing the similarity between an RNA fragment (*F*) and a reference motif (*R*) is defined from these components.
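As an illustration of the 3*k* count mentioned above, the following minimal sketch enumerates every single-nucleotide deviation from a reference motif. The function name `single_base_deviations` and the example motif `UGCAUG` are hypothetical and chosen only for illustration; they are not taken from RNAMotifModeler.

```python
BASES = "AGCU"  # RNA alphabet used for the motif search space


def single_base_deviations(reference):
    """Return (position, reference_base, observed_base, sequence) for every
    sequence that differs from `reference` at exactly one position."""
    deviations = []
    for i, ref_base in enumerate(reference):
        for alt_base in BASES:
            if alt_base != ref_base:
                variant = reference[:i] + alt_base + reference[i + 1:]
                deviations.append((i, ref_base, alt_base, variant))
    return deviations


# Illustrative 6-base motif: 6 positions x 3 alternative bases = 18 deviations.
example = single_base_deviations("UGCAUG")
print(len(example))  # 18
```

Each tuple corresponds to one (*m*_{i}, *f*_{i}) pair at one position, which is why a *k*-base motif needs 3*k* retained binding affinities.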

#### Identify the optimal reference motif from CLIP-seq data

We adopted an iterative approach to identify the optimal reference motif and its associated parameters, using a Quantum Particle Swarm Optimization (QPSO) algorithm [28]. The iterative strategy includes the selection of the reference motif *R* and the optimization of the parameters associated with the reference motif, **λ**_{R}. The overall procedure includes the following steps:

1. Randomly select a motif candidate *R*_{init} from the motif search space **M** = {*b*_{1}*b*_{2}...*b*_{k} : *b*_{1}, *b*_{2}, ..., *b*_{k} ∈ {A, G, C, U}} as the reference motif.

2. Optimize the parameters for the reference motif by maximizing its ability to characterize the CLIP-seq-derived RNA fragments.

Step 2.1. Parameter initialization. We first create *M* particles in the parameter space by randomly drawing numbers from U(0,1).

Step 2.2. Particle evaluation. For each particle (a set of parameters), we evaluate its capability to distinguish the CLIP-seq-derived RNA fragments from background sequences. We plot an ROC (Receiver Operating Characteristic) curve by varying the threshold on the matching score calculated in Eq. (2). The quality of the parameters is evaluated by the AUC (area under the curve) of the ROC plot.
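The AUC used to rate a particle can be sketched as follows. This is a minimal illustration, not the authors' implementation: it computes the AUC through the equivalent rank formulation (the fraction of positive/background score pairs in which the positive scores higher, with ties counted as one half) rather than by tracing the ROC curve explicitly. The function name `roc_auc` and the example scores are hypothetical; the matching score of Eq. (2) itself is not implemented here.

```python
def roc_auc(positive_scores, background_scores):
    """Area under the ROC curve obtained by sweeping a score threshold.

    Equivalent to the probability that a randomly chosen positive scores
    higher than a randomly chosen background sequence (ties count 1/2).
    """
    wins = 0.0
    for p in positive_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(positive_scores) * len(background_scores))


# Hypothetical matching scores for two CLIP-seq fragments and two
# background sequences: 3 of the 4 pairs are ranked correctly.
print(roc_auc([0.9, 0.4], [0.5, 0.1]))  # 0.75
```

An AUC of 1.0 means the parameters separate fragments from background perfectly, while 0.5 corresponds to random ranking.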

Step 2.3. Particle update. At the *t*-th iteration, consider the best position that each individual particle has met and the best position that the whole population of particles has met, λ^{globalbest}(*t*). To guarantee convergence, each particle must converge to its local attractor [28]. The local attractor of each particle and the mean of the best positions of all particles are computed from these quantities, where *φ*_{1} and *φ*_{2} are random variables following U(0,1). QPSO then employs a Monte Carlo method to update the parameters, where *β* is the contraction-expansion coefficient controlling the convergence speed of QPSO, and *u* and *k* are random variables that also follow U(0,1).

Repeat Steps 2.2 and 2.3 until |λ^{globalbest}(*t*+1) − λ^{globalbest}(*t*)| < *ε* holds repeatedly, where *ε* is a tolerance used as the criterion for the algorithm to terminate.
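Step 2 as a whole can be sketched as below. The sketch assumes the commonly published QPSO update: the local attractor is a random convex combination of a particle's own best position and the global best (weights *φ*_{1} and *φ*_{2}), and the new position is the attractor plus or minus *β* · |mean best − current position| · ln(1/*u*), with the sign chosen by *k*. Whether this matches the exact equations of [28] is an assumption. The function name `qpso_maximize`, the `patience` and `max_iter` safeguards, and the toy objective are hypothetical; in the procedure above the objective would be the AUC of Step 2.2.

```python
import math
import random


def qpso_maximize(objective, dim, n_particles=20, beta=0.75, eps=1e-6,
                  patience=5, max_iter=500, seed=0):
    """Maximize `objective` over `dim` parameters with a basic QPSO loop."""
    rng = random.Random(seed)
    # Step 2.1: draw every coordinate of every particle from U(0,1).
    positions = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    # Step 2.2: evaluate particles; each starts as its own best position.
    local_best = [p[:] for p in positions]
    local_score = [objective(p) for p in positions]
    g = max(range(n_particles), key=local_score.__getitem__)
    global_best, global_score = local_best[g][:], local_score[g]
    stable = 0
    for _ in range(max_iter):
        # Mean of the best positions of all particles.
        mean_best = [sum(lb[d] for lb in local_best) / n_particles
                     for d in range(dim)]
        for i in range(n_particles):
            for d in range(dim):
                # 1 - random() lies in (0, 1], avoiding division by zero
                # and log(1/0).
                phi1, phi2 = 1.0 - rng.random(), 1.0 - rng.random()
                attractor = ((phi1 * local_best[i][d] + phi2 * global_best[d])
                             / (phi1 + phi2))
                u, k_rand = 1.0 - rng.random(), rng.random()
                step = (beta * abs(mean_best[d] - positions[i][d])
                        * math.log(1.0 / u))
                positions[i][d] = (attractor + step if k_rand > 0.5
                                   else attractor - step)
            score = objective(positions[i])
            if score > local_score[i]:
                local_best[i], local_score[i] = positions[i][:], score
        previous = global_best
        g = max(range(n_particles), key=local_score.__getitem__)
        if local_score[g] > global_score:
            global_best, global_score = local_best[g][:], local_score[g]
        # Stop once the global best has moved by less than eps for
        # `patience` consecutive iterations.
        shift = max(abs(a - b) for a, b in zip(global_best, previous))
        stable = stable + 1 if shift < eps else 0
        if stable >= patience:
            break
    return global_best, global_score


def toy_objective(params):
    # Hypothetical stand-in for the AUC: peaks at 0 when all params equal 0.3.
    return -sum((x - 0.3) ** 2 for x in params)


best_params, best_score = qpso_maximize(toy_objective, dim=2)
print(best_params, best_score)
```

The `patience` counter is one way to read the requirement that the tolerance condition hold repeatedly; a single small change in the global best is not treated as convergence.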

3. Based on the final parameter vector λ^{globalbest}, the maximal binding affinity of motif candidate *K* in positive gold standard sequence *F* is the maximum of *a*_{K,F,σ} over all binding sites *σ* ∈ *Ω*_{K,F}, where *Ω*_{K,F} denotes the set of all binding sites for motif *K* in sequence *F*, and *a*_{K,F,σ} is computed by Eq. (3).

Let *n*_{s} and *n*_{m} be the number of positive gold standard sequences and the number of motif candidates, respectively. Consider the maximal binding affinity computed using the optimized parameters for the initial reference motif *R*_{init} in sequence *F*. Although *R*_{init} is the reference motif, this maximal affinity is not necessarily contributed by a binding site instance of *R*_{init}. In contrast, the 'real' reference motif is always expected to contribute more to the binding affinities. Thus, to evaluate the contributions of all motif candidates to the binding affinities of the positive gold standard sequences, we define a motif contribution score matrix and, from it, a motif contribution score vector **v**.

We denote the motif associated with the maximum score in **v** as *R*_{max}. If *R*_{max} = *R*_{init}, meaning that the initial reference motif accounts for the largest contribution to the retained binding affinities, we stop the iteration; otherwise, we let *R*_{max} be the next *R*_{init} and repeat steps 2 and 3 until convergence.
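The outer iteration over reference motifs can be sketched as follows. This is a structural outline only: `optimize_parameters` (step 2) and `contribution_scores` (step 3, returning one score per motif candidate, i.e. the vector **v**) are hypothetical callables supplied by the caller, and `max_rounds` is a safeguard added here that is not part of the described procedure.

```python
import random
from itertools import product


def find_reference_motif(k, optimize_parameters, contribution_scores,
                         max_rounds=50, seed=0):
    """Iterate steps 1-3 until the reference motif stops changing."""
    rng = random.Random(seed)
    # Step 1: pick R_init at random from all 4^k sequences over {A, G, C, U}.
    search_space = ["".join(bases) for bases in product("AGCU", repeat=k)]
    r_init = rng.choice(search_space)
    params = None
    for _ in range(max_rounds):
        params = optimize_parameters(r_init)          # step 2 (e.g. QPSO)
        scores = contribution_scores(r_init, params)  # step 3: motif -> score
        r_max = max(scores, key=scores.get)
        if r_max == r_init:
            break  # R_init already has the largest contribution score
        r_init = r_max
    return r_init, params


# Toy stand-ins so the loop can be exercised without real CLIP-seq data.
def toy_optimize(motif):
    return {"motif": motif}


def toy_scores(motif, params):
    return {"AC": 1.0, "GU": 2.0, "UU": 0.1}


print(find_reference_motif(2, toy_optimize, toy_scores)[0])  # GU
```

With the toy scores, the loop moves to `GU` after at most one round and then stops, because `GU` is also the top-scoring motif when it is the reference.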