LncRNA-miRNA interaction prediction through sequence-derived linear neighborhood propagation method with information combination
BMC Genomics volume 20, Article number: 946 (2019)
Abstract
Background
Researchers have discovered that lncRNAs can act as decoys or sponges to regulate the behavior of miRNAs. Identification of lncRNA-miRNA interactions helps to understand the functions of lncRNAs, especially their roles in complicated diseases. Computational methods can save time and reduce cost in identifying lncRNA-miRNA interactions, but only a few computational methods have been proposed.
Results
In this paper, we propose a sequence-derived linear neighborhood propagation method (SLNPM) to predict lncRNA-miRNA interactions. First, we calculate the integrated lncRNA-lncRNA similarity and the integrated miRNA-miRNA similarity by combining known lncRNA-miRNA interactions, lncRNA sequences and miRNA sequences. We consider two similarity calculation strategies, namely similarity-based information combination (SC) and interaction profile-based information combination (PC). Second, we construct an integrated lncRNA similarity-based graph and an integrated miRNA similarity-based graph, and implement label propagation on each graph to score lncRNA-miRNA pairs. Finally, the weighted average of the two outputs is adopted as the final prediction. In this way, we construct two editions of SLNPM: the sequence-derived linear neighborhood propagation method based on similarity information combination (SLNPM-SC) and the sequence-derived linear neighborhood propagation method based on interaction profile information combination (SLNPM-PC). The experimental results show that SLNPM-SC and SLNPM-PC predict lncRNA-miRNA interactions with higher accuracy than other state-of-the-art methods. The case studies demonstrate that SLNPM-SC and SLNPM-PC help to find novel lncRNA-miRNA interactions for given lncRNAs or miRNAs.
Conclusion
The study reveals that known interactions provide the most important information for lncRNA-miRNA interaction prediction, and that sequences of lncRNAs (miRNAs) also provide useful information. In conclusion, SLNPM-SC and SLNPM-PC are promising for lncRNA-miRNA interaction prediction.
Background
Non-coding RNAs (ncRNAs) are a class of RNAs that are not translated into functional proteins [1]. NcRNAs can be classified into many types, e.g. long non-coding RNA, circular RNA, snRNA, etc. Long non-coding RNAs (lncRNAs) are ncRNAs longer than 200 nucleotides [2]. Studies [3, 4] show that a great number of lncRNAs are involved in many biological processes, such as cell proliferation, chromatin remodeling, gene imprinting and immune response. More importantly, researchers have discovered that lncRNAs are associated with severe diseases such as prostate cancer and gastric cancer [5,6,7,8,9,10].
LncRNAs play functional roles by interacting with other biological molecules (DNAs, RNAs and proteins), and studies on lncRNA-biomolecule interactions help to characterize the functions of lncRNAs. For example, lncRNA loc285194 can interact with the p53 gene and act as a tumor suppressor [11]; lncRNA PVT1 interacts with the FOXM1 protein and promotes gastric cancer progression [12]. For a long time, researchers have focused on lncRNA-DNA interactions [13, 14] or lncRNA-protein interactions [15, 16]. Recently, researchers have discovered [17] that lncRNAs can act as decoys or sponges to regulate the behavior of miRNAs. For example, the lncRNA H19 has been found to modulate the let-7 family of miRNAs [18]. Therefore, exploring lncRNA-miRNA interactions contributes to understanding the complicated functions of lncRNAs.
Previous studies have conducted wet-lab experiments to identify lncRNA-miRNA interactions. For example, Amanda et al. [18] carried out in vivo crosslinking combined with affinity purification experiments to explore the interaction between lncRNA H19 and miRNA let-7. Based on crosslinking and real-time PCR (RT-qPCR) experiments, their results demonstrated that lncRNA H19 can physically interact with let-7 in vivo. Zhang et al. [19] studied the function of the miRNA miR-7 in breast cancer stem cells (BCSCs) and its associated lncRNA. By implementing ChIP-PCR and double-luciferase reporter assays, they found that the downregulation of miR-7 in BCSCs might be indirectly attributed to lncRNA HOTAIR. Wet-lab methods are time-consuming and labor-intensive; thus, in silico prediction is important for refining the candidate list for further validation experiments.
Recently, researchers have introduced machine learning techniques into lncRNA-biomolecule interaction prediction, especially lncRNA-protein interaction prediction [20,21,22,23,24,25]. However, only a few lncRNA-miRNA interaction prediction methods have been proposed. Huang et al. [26] proposed a method named EPLMI, which relies on the assumption that lncRNAs with similar expression profiles tend to associate with a cluster of miRNAs that have similar expression profiles. Huang et al. [27] developed a group preference Bayesian collaborative filtering model called GBCF, which produces a top-k probability ranking list for an individual miRNA or lncRNA based on the known miRNA-lncRNA interaction network. Hu et al. [28] predicted lncRNA-miRNA interactions by integrating the expression similarity network and the sequence similarity network, in a method named INLMI. Nevertheless, these methods have several limitations, which motivate us to develop better models. Firstly, existing methods rely on several features of lncRNAs and miRNAs, such as sequences, expression profiles and target genes, but expression profiles and target genes are not available for all lncRNAs (or miRNAs). Secondly, many lncRNAs and miRNAs do not have any known interaction, yet a desirable model should be able to predict their interactions.
In this paper, we propose a sequence-derived linear neighborhood propagation method (SLNPM) to predict lncRNA-miRNA interactions. First, we calculate the integrated lncRNA-lncRNA similarity and the integrated miRNA-miRNA similarity by combining known lncRNA-miRNA interactions, lncRNA sequences and miRNA sequences. As an extension of our previous work [29], we consider two integrated similarity calculation strategies, namely similarity-based information combination (SC) and interaction profile-based information combination (PC). Second, we construct the integrated lncRNA similarity-based graph and the integrated miRNA similarity-based graph, and implement label propagation on each graph to score lncRNA-miRNA pairs. Finally, the weighted average of the two outputs is adopted as the final prediction. In this way, we construct two editions of SLNPM: one based on similarity information combination (SLNPM-SC) and one based on interaction profile information combination (SLNPM-PC). The experimental results show that SLNPM-SC and SLNPM-PC predict lncRNA-miRNA interactions with higher accuracy than other state-of-the-art methods. We also analyze the ability of SLNPM-SC and SLNPM-PC to make predictions for lncRNAs (or miRNAs) that do not have any known interaction, and the case studies demonstrate that SLNPM-SC and SLNPM-PC help to find novel interactions that are not in our dataset.
This paper makes the following contributions: (1) the proposed SLNPM models make use of diverse information to achieve high-accuracy performance; (2) the proposed SLNPM models can handle lncRNAs (or miRNAs) that do not have any known interaction.
Datasets and methods
Datasets
There are several databases of lncRNAs, miRNAs and lncRNA-miRNA interactions, such as lncRNASNP [17], NONCODE [30], miRBase [31] and miRmine [32]. LncRNASNP [17] contains experimentally validated lncRNA-related SNPs and lncRNA-miRNA interactions, which facilitates the study of lncRNA functions. NONCODE [30] is an integrated knowledge database of non-coding RNAs (ncRNAs). The ncRNA sequences and related information (e.g. function, cellular role, cellular location, chromosomal information, etc.) in NONCODE have been confirmed manually by consulting the relevant literature. MiRBase [31] is a comprehensive database of miRNAs, containing published miRNA sequences and annotations. The database miRmine [32] provides high-quality human miRNA-Seq data and miRNA expression profiles.
To compile our datasets, we first download data from lncRNASNP and obtain 8091 experimentally verified lncRNA-miRNA interactions. After removing duplicated associations, 5118 interactions between 780 lncRNAs and 275 miRNAs remain. Then, we collect lncRNA sequences from NONCODE and miRNA sequences from miRBase; sequences are available for 642 lncRNAs and 275 miRNAs. Next, we obtain expression profiles of lncRNAs in 24 human tissues from NONCODE, and expression profiles of miRNAs in 16 human tissue types and 24 cell types from miRmine. Expression profiles are available for 417 lncRNAs and 265 miRNAs. We then compile a dataset named SLNPM-S by removing lncRNAs and miRNAs whose sequences or expression profiles are unavailable. Similarly, we compile a dataset named SLNPM-L by removing lncRNAs and miRNAs whose sequences are unavailable. SLNPM-S serves as the main dataset for model training and performance evaluation, and SLNPM-L is used for the case studies. Table 1 summarizes the details of the two datasets.
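The deduplication and filtering steps above can be sketched as follows. This is a minimal illustration that assumes the raw interactions, sequences and expression profiles have already been parsed into Python tuples and dicts keyed by identifiers; it does not parse the lncRNASNP, NONCODE, miRBase or miRmine file formats, and `compile_dataset` is a hypothetical helper name.

```python
def compile_dataset(raw_pairs, lnc_seq, mi_seq, lnc_expr=None, mi_expr=None):
    """Deduplicate (lncRNA, miRNA) pairs and keep those whose endpoints have the required data.

    lnc_seq / mi_seq: dicts id -> sequence (required for both datasets).
    lnc_expr / mi_expr: optional dicts id -> expression profile; when given,
    endpoints must also have expression data (an SLNPM-S-style filter).
    """
    pairs = sorted(set(raw_pairs))                 # remove duplicated associations

    def keep(lnc, mi):
        if lnc not in lnc_seq or mi not in mi_seq:
            return False
        if lnc_expr is not None and lnc not in lnc_expr:
            return False
        if mi_expr is not None and mi not in mi_expr:
            return False
        return True

    kept = [(lnc, mi) for lnc, mi in pairs if keep(lnc, mi)]
    lncs = sorted({lnc for lnc, _ in kept})        # lncRNAs that remain in the dataset
    mis = sorted({mi for _, mi in kept})           # miRNAs that remain in the dataset
    return kept, lncs, mis
```

Calling it with the expression dicts gives an SLNPM-S-style pair list, and calling it without them gives an SLNPM-L-style list.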
Linear neighborhood similarity measure
In previous work [33, 34], we proposed a novel similarity measure named linear neighborhood similarity (LNS), and successfully solved several problems in bioinformatics [24, 35,36,37]. In this paper, we adopt the linear neighborhood similarity measure (LNS) to calculate lncRNAlncRNA similarity and miRNAmiRNA similarity. Here we first introduce the detailed process of LNS.
Given n-dimensional feature vectors x_{1}, x_{2}, ⋯, x_{m}, we treat these feature vectors as data points in the feature space and stack them row by row to obtain the m × n matrix X, where x_{i} is the i-th row of X. It is assumed that each data point can be reconstructed by a linear weighted sum of its neighboring data points. Nearest neighbors based on the Euclidean distance are selected for each data point x_{i}, and the ratio of the number of selected nearest neighbors to the number of all other data points is called the neighborhood ratio, denoted by K. N(x_{i}) is the set of selected nearest neighbors of x_{i}. By minimizing the reconstruction errors for all data points, we formulate the following optimization problem:
where C is an indicator matrix: C(i, j) = 1 if x_{j} ∈ N(x_{i}), otherwise C(i, j) = 0, and C(i, i) = 0. ‖∙‖_{F} is the Frobenius norm, e = (1, 1, …, 1)^{T}, and ⊙ is the Hadamard product. μ is the trade-off parameter. W is an m × m weight matrix whose i-th row gives the contributions of the data points to the reconstruction of x_{i}.
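The display for problem (1) is missing from this copy of the text. For reference, one objective consistent with the definitions of C, e, ⊙, μ and W given here, and with the later step λ_i ≈ μ, is shown below together with its row-wise form and the resulting multiplier. Treat all three lines as a reconstruction under that assumption, not a quotation of the original equations.

```latex
% Assumed form of (1):
\min_{W}\ \varepsilon=\frac{1}{2}\left\|X-(C\odot W)X\right\|_F^2+\frac{\mu}{2}\left\|(C\odot W)e\right\|_F^2
\quad\text{s.t.}\quad (C\odot W)e=e,\quad W\ge 0

% Row i, with \omega_i the weights on N(x_i) and G^i_{jk}=(x_i-x_{i_j})(x_i-x_{i_k})^T:
\min_{\omega_i}\ \frac{1}{2}\omega_i^T G^i\omega_i+\frac{\mu}{2}\left(e^T\omega_i\right)^2
\quad\text{s.t.}\quad e^T\omega_i=1,\quad \omega_i\ge 0

% KKT: (G^i\omega_i+\mu(e^T\omega_i)e-\lambda_i e)\odot\omega_i=0; multiply by \omega_i^T, use e^T\omega_i=1:
\lambda_i=\omega_i^T G^i\omega_i+\mu\approx\mu
```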
To solve the objective function (1), we introduce the Lagrange function:
where Φ is the Lagrange multiplier. Differentiating L with respect to W, we have:
By the complementary slackness condition, we obtain:
So W_{ij} can be written as:
However, λ still appears in (3). Problem (2) has the equivalent form:
where G^{i} is the Gram matrix whose (j, k) entry is \( \left({x}_i-{x}_{i_j}\right){\left({x}_i-{x}_{i_k}\right)}^T \). The Lagrange function of (4) is:
By Karush–Kuhn–Tucker (KKT) conditions, we obtain:
Then, it can be inferred that:
So:
The reconstruction error \( \frac{1}{2}{\omega}_i^T{G}^i{\omega}_i\approx 0 \). If ω_{i} is the optimal solution for (5), e^{T}ω_{i} − 1 = 0. So λ_{i} ≈ μ. Let λ = μe. Then we obtain:
Weight matrix W is updated according to Eq. (6) until convergence.
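A minimal NumPy sketch of the LNS weight computation follows. It assumes the per-point problem min ½ω_i^T G^i ω_i + (μ/2)(e^T ω_i)² subject to e^T ω_i = 1 and ω_i ≥ 0 (our reading of the derivation, with λ_i = μ), and iterates a multiplicative update whose fixed points satisfy G^i ω_i + μ(e^T ω_i)e = μe. Because G^i can have negative entries, the sketch splits it into positive and negative parts so every factor stays non-negative; this split, the fixed iteration count and the final row rescaling are our assumptions, not details taken from the paper, and `lns_weights` is a hypothetical name.

```python
import numpy as np

def lns_weights(X, ratio=0.8, mu=1.0, n_iter=300):
    """Sketch of linear neighborhood similarity (LNS) weights for the rows of X."""
    X = np.asarray(X, dtype=float)
    m = X.shape[0]
    k = max(1, int(ratio * (m - 1)))                  # neighborhood ratio K -> neighbor count
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared Euclidean distances
    np.fill_diagonal(dist, np.inf)                    # a point is never its own neighbor
    W = np.zeros((m, m))
    for i in range(m):
        nbrs = np.argsort(dist[i])[:k]                # N(x_i)
        diff = X[i] - X[nbrs]                         # rows are x_i - x_{i_j}
        G = diff @ diff.T                             # Gram matrix G^i
        G_pos, G_neg = np.maximum(G, 0.0), np.maximum(-G, 0.0)
        w = np.full(k, 1.0 / k)
        for _ in range(n_iter):
            # fixed point: (G_pos - G_neg) w + mu * sum(w) = mu on every positive entry
            w = w * (mu + G_neg @ w) / (G_pos @ w + mu * w.sum())
        W[i, nbrs] = w / w.sum()                      # assumption: rescale row to sum to 1
    return W
```

For example, `lns_weights(F, ratio=0.8)` on a matrix `F` of 5-mer vectors would give a directed (non-symmetric) similarity matrix such as S_LSF. The per-row loop is quadratic in the neighbor count, so it is meant to show the computation, not to be fast.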
Sequence similarity and interaction profile similarity
In this section, we introduce the mathematical notation for lncRNA (and miRNA) interaction profiles, sequence similarities and interaction profile similarities. Given lncRNAs L_{1}, …, L_{i}, …, L_{l} and miRNAs M_{1}, …, M_{j}, …, M_{m}, their pairwise interactions are represented by an l × m interaction matrix Y, where Y_{ij} = 1 if the lncRNA L_{i} interacts with the miRNA M_{j}, and Y_{ij} = 0 otherwise. Using Y, we define the interaction profiles of lncRNAs and miRNAs. The interaction profile of lncRNA L_{i} is a binary vector specifying the absence or presence of its interactions with every miRNA, and corresponds to the i-th row of Y, namely Y(i, :). The interaction profile of miRNA M_{j} is a binary vector encoding the absence or presence of its interactions with every lncRNA, and corresponds to the j-th column of Y, namely Y(:, j).
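A short sketch of how Y and the two kinds of interaction profiles relate, assuming interactions are given as (lncRNA, miRNA) name pairs; `interaction_matrix` is a hypothetical helper name.

```python
import numpy as np

def interaction_matrix(pairs, lncs, mis):
    """Binary l x m matrix Y with Y[i, j] = 1 iff lncRNA lncs[i] interacts with miRNA mis[j]."""
    row = {name: i for i, name in enumerate(lncs)}
    col = {name: j for j, name in enumerate(mis)}
    Y = np.zeros((len(lncs), len(mis)), dtype=int)
    for lnc, mi in pairs:
        Y[row[lnc], col[mi]] = 1
    return Y
```

`Y[i, :]` is then the interaction profile of L_i and `Y[:, j]` that of M_j; an all-zero row or column marks an entity with no known interaction.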
LncRNA sequences and miRNA sequences provide important information for exploring their functions, and the k-mer [38] is a popular sequence-derived feature that describes repeated patterns in sequences. There are four types of nucleotides, i.e. A, C, G and T/U, in both lncRNA and miRNA sequences. For the k-mer feature, we count the frequencies of the 4^{k} types of k-length contiguous subsequences along lncRNA (miRNA) sequences. More specifically, for a lncRNA (or miRNA) sequence x, the k-mer feature of the sequence is defined as \( {f}_k(x)=\left({d}_1,{d}_2,\dots, {d}_{4^k}\right) \), where d_{i} is the occurrence frequency of the corresponding k-length contiguous subsequence. In this work, we set k = 5 and represent lncRNAs and miRNAs by their corresponding k-mer vectors. Then, we calculate sequence similarities for the l lncRNAs, denoted as an l × l matrix S_{LSF}, by using the linear neighborhood similarity measure (LNS). Similarly, we use LNS to calculate sequence similarities for the m miRNAs, denoted as an m × m matrix S_{MSF}.
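The k-mer feature takes only a few lines of Python. This sketch assumes that "occurrence frequency" means the count of each k-mer divided by the total number of k-mers in the sequence, and that k-mers containing symbols other than A, C, G and T/U are skipped; the paper may normalize differently. `kmer_vector` is a hypothetical name.

```python
from functools import lru_cache
from itertools import product

import numpy as np

@lru_cache(maxsize=None)
def _kmer_index(k):
    # lexicographic order over A, C, G, T: AAAAA -> 0, AAAAC -> 1, ...
    return {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}

def kmer_vector(seq, k=5):
    """Relative frequencies of the 4**k contiguous k-mers of an RNA/DNA sequence."""
    index = _kmer_index(k)
    seq = seq.upper().replace("U", "T")       # treat U and T as the same nucleotide
    counts = np.zeros(4 ** k)
    for start in range(len(seq) - k + 1):
        j = index.get(seq[start:start + k])
        if j is not None:                     # skip k-mers containing N or other symbols
            counts[j] += 1
    total = counts.sum()
    return counts / total if total > 0 else counts
```

With k = 5 each sequence maps to a 1024-dimensional vector; stacking these vectors row by row gives the feature matrix on which LNS is computed. A typical 22-nt miRNA contributes only 18 k-mers, so its vector is very sparse.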
Related studies [39,40,41] use the interaction profiles of biological molecules in prediction models and achieve high-accuracy performance. These studies reveal the importance of interaction profiles in predicting unknown associations. Based on the interaction matrix Y, lncRNAs L_{1}, …, L_{i}, …, L_{l} are represented by interaction profiles Y(1, :), …, Y(i, :), …, Y(l, :), and miRNAs M_{1}, …, M_{j}, …, M_{m} are represented by interaction profiles Y(:, 1), …, Y(:, j), …, Y(:, m). Then, using the linear neighborhood similarity measure, we calculate interaction profile similarities for the l lncRNAs, denoted as an l × l matrix S_{LIP}, and interaction profile similarities for the m miRNAs, denoted as an m × m matrix S_{MIP}.
Sequence-derived linear neighborhood propagation method
Since we have sequence features and interaction profiles for lncRNAs (miRNAs), we integrate diverse information about lncRNAs (or miRNAs) to develop prediction models. On the one hand, information integration can improve performance. On the other hand, some lncRNAs (miRNAs) have no known interaction with any miRNA (lncRNA), so their interaction profiles are unavailable; information integration allows predictions for such lncRNAs (miRNAs). Here, we propose a sequence-derived linear neighborhood propagation method (SLNPM) and consider two strategies, similarity-based information combination (SC) and interaction profile-based information combination (PC), to integrate diverse features and address the above-mentioned problems. Thus, we present two editions of SLNPM: the sequence-derived linear neighborhood propagation method based on similarity information combination (SLNPM-SC) and the sequence-derived linear neighborhood propagation method based on interaction profile information combination (SLNPM-PC). The flowchart of the two prediction models is shown in Fig. 1.
Similarity-based information combination
In this section, we propose the similarity-based information combination strategy to build the sequence-derived linear neighborhood propagation model, abbreviated as SLNPM-SC.
For a lncRNA L_{i} (miRNA M_{j}) that has no interaction with any miRNA (lncRNA), the interaction profile is an all-zero vector, so interaction profile similarities cannot be calculated for it. Therefore, all entries in the i-th (j-th) row and the i-th (j-th) column of the lncRNA (miRNA) interaction profile similarity matrix S_{LIP} (S_{MIP}) are zeros. The similarity-based information combination strategy is described below.
First, we calculate the sequence similarity S_{LSF} for all lncRNAs, and calculate the interaction profile similarity S_{LIP} for lncRNAs with interaction information. Then, we calculate the integrated similarity S_{LIS} for lncRNAs by:
Similarly, we calculate the sequence similarity S_{MSF} for all miRNAs, and calculate the interaction profile similarity S_{MIP} for miRNAs with interaction information. Then, we calculate the integrated similarity S_{MIS} for miRNAs by:
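The equations for S_{LIS} and S_{MIS} are not reproduced in this copy of the text. Purely as an illustration of the idea, the sketch below uses a hypothetical combination rule, not the paper's equation: average the sequence and interaction profile similarities where both entities have known interactions, and fall back to sequence similarity elsewhere, since S_{LIP} (S_{MIP}) is all zeros in those rows and columns. `integrate_similarity` is a hypothetical name.

```python
import numpy as np

def integrate_similarity(S_sf, S_ip, has_interaction):
    """Hypothetical SC-style combination (an illustrative assumption, not the paper's formula).

    Entry (i, j) averages sequence and interaction-profile similarity when both
    entities have at least one known interaction; otherwise it keeps the
    sequence similarity, because S_ip is all zeros in those rows/columns.
    """
    S_sf = np.asarray(S_sf, dtype=float)
    S_ip = np.asarray(S_ip, dtype=float)
    h = np.asarray(has_interaction, dtype=bool)
    both = np.outer(h, h)                     # True where i and j both have interactions
    return np.where(both, (S_sf + S_ip) / 2.0, S_sf)
```

Under this assumption, `integrate_similarity(S_LSF, S_LIP, Y.sum(axis=1) > 0)` would give an S_{LIS}-like matrix and `integrate_similarity(S_MSF, S_MIP, Y.sum(axis=0) > 0)` an S_{MIS}-like one.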
Then, we construct a directed graph based on the integrated lncRNA similarity matrix S_{LIS}, and another directed graph based on the integrated miRNA similarity matrix S_{MIS}. For miRNA M_{j}, the j-th column of the interaction matrix Y is regarded as the initial labels of all nodes (lncRNAs) in the integrated lncRNA similarity-based graph. The label information is iteratively propagated in the graph until convergence (see [42] for details of label propagation), which yields the l × m prediction matrix P^{l}. Similarly, for lncRNA L_{i}, the i-th row of Y is regarded as the initial labels of all nodes (miRNAs) in the integrated miRNA similarity-based graph, which yields the l × m prediction matrix P^{m}. Finally, the prediction result of the SLNPM-SC model is produced by:
\[ {P}_{SLNPM-SC}=\beta {P}^l+\left(1-\beta \right){P}^m \]
where 0 ≤ β ≤ 1 is the weighting coefficient.
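A minimal sketch of the propagation and the weighted combination follows. It assumes the common label propagation update F ← αWF + (1 − α)Y. The paper defers the details to [42], so the exact role of α may differ; here α weights the propagated term. It also assumes W is a row-normalized LNS similarity matrix and that the final score is βP^l + (1 − β)P^m, which matches the parameter analysis where β = 1 uses only lncRNA similarity. `propagate` and `slnpm_predict` are hypothetical names.

```python
import numpy as np

def propagate(W, Y, alpha=0.4, tol=1e-10, max_iter=1000):
    """Iterate F <- alpha * W F + (1 - alpha) * Y until the largest change is below tol."""
    Y = np.asarray(Y, dtype=float)
    F = Y.copy()
    for _ in range(max_iter):
        F_next = alpha * (W @ F) + (1.0 - alpha) * Y
        if np.max(np.abs(F_next - F)) < tol:
            return F_next
        F = F_next
    return F

def slnpm_predict(W_lnc, W_mi, Y, alpha=0.4, beta=0.25):
    """Weighted average of lncRNA-graph and miRNA-graph propagation scores (l x m)."""
    Y = np.asarray(Y, dtype=float)
    P_l = propagate(W_lnc, Y, alpha)          # columns of Y as labels on the lncRNA graph
    P_m = propagate(W_mi, Y.T, alpha).T       # rows of Y as labels on the miRNA graph
    return beta * P_l + (1.0 - beta) * P_m
```

For a row-stochastic W and 0 ≤ α < 1 the iteration is a contraction, so it converges to the closed form (1 − α)(I − αW)^{-1}Y, which `np.linalg.solve` can also compute directly for graphs of a few hundred nodes.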
Interaction profile-based information combination
In this section, we propose the interaction profile-based information combination strategy to build a sequence-derived linear neighborhood propagation model, abbreviated as SLNPM-PC.
The interaction profiles of lncRNAs (miRNAs) without any known interaction are unavailable, and the corresponding rows (columns) of the interaction matrix are all zeros. The interaction profile-based information combination strategy is described below.
For a lncRNA L_{i} that does not have any interaction, its interaction profile is complemented by sequence information:
\[ Y\left(i,:\right)=\frac{1}{Q_i}{\sum}_{i_k\in N\left({L}_i\right)}{S}_{LSF}\left(i,{i}_k\right)Y\left({i}_k,:\right) \]
where N(L_{i}) is the set of the k lncRNAs most similar to L_{i} according to the lncRNA sequence similarity S_{LSF}, each of which has at least one interaction with miRNAs. Q_{i} is the sum of the similarities between L_{i} and its k most similar lncRNAs, \( {Q}_i={\sum}_{i_k\in N\left({L}_i\right)}{S}_{LSF}\left(i,{i}_k\right) \).
Similarly, for a miRNA M_{j} that does not have any interaction, its interaction profile is complemented by sequence information:
\[ Y\left(:,j\right)=\frac{1}{Q_j}{\sum}_{j_k\in N\left({M}_j\right)}{S}_{MSF}\left(j,{j}_k\right)Y\left(:,{j}_k\right) \]
where N(M_{j}) is the set of the k miRNAs most similar to M_{j} according to the miRNA sequence similarity S_{MSF}, each of which has at least one interaction with lncRNAs. Q_{j} is the sum of the similarities between M_{j} and its k most similar miRNAs, \( {Q}_j={\sum}_{j_k\in N\left({M}_j\right)}{S}_{MSF}\left(j,{j}_k\right) \).
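The complementing rule can be sketched for lncRNAs (rows of Y) as follows. It assumes that each empty profile becomes the similarity-weighted average of its k most sequence-similar neighbors with interactions, normalized by Q_i, and that S_seq is the sequence similarity matrix (S_{LSF} or S_{MSF}). Applying the same function to Y.T handles miRNAs. `complement_rows` is a hypothetical name, and ties in similarity are broken by argsort order.

```python
import numpy as np

def complement_rows(Y, S_seq, k=3):
    """Fill all-zero rows of Y from the k most sequence-similar rows that have interactions.

    Y[i, :] = sum_{t in N(i)} S_seq[i, t] * Y[t, :] / Q_i,  Q_i = sum_{t in N(i)} S_seq[i, t]
    For miRNAs (columns of Y) use complement_rows(Y.T, S_mi, k).T.
    """
    Y = np.asarray(Y, dtype=float)
    S_seq = np.asarray(S_seq, dtype=float)
    out = Y.copy()
    has = Y.sum(axis=1) > 0
    candidates = np.flatnonzero(has)                      # only entities with known interactions
    for i in np.flatnonzero(~has):
        nbrs = candidates[np.argsort(-S_seq[i, candidates])][:k]
        Q = S_seq[i, nbrs].sum()
        if Q > 0:                                         # leave the row empty if no similar neighbor
            out[i] = S_seq[i, nbrs] @ Y[nbrs] / Q
    return out
```

Neighbors are drawn only from entities with observed interactions, so complemented profiles never feed into one another.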
After complementing the interaction profiles with lncRNA (miRNA) sequence similarities, we calculate the interaction profile similarity matrices for lncRNAs and miRNAs, respectively. Then, we construct prediction models on the lncRNA-lncRNA similarity graph and the miRNA-miRNA similarity graph by using label propagation, and these models produce the prediction matrices P^{l} and P^{m}. The final prediction matrix P_{SLNPM − PC} is produced by a weighted average of the two prediction matrices,
\[ {P}_{SLNPM-PC}=\beta {P}^l+\left(1-\beta \right){P}^m \]
where 0 ≤ β ≤ 1 is the weighting coefficient.
Results and discussion
Evaluation metrics
Here, we adopt 5-fold cross-validation (5CV) to evaluate prediction models. Specifically, we randomly split the known lncRNA-miRNA interactions into five subsets. In each fold, one subset is held out as the testing set and the remaining four are used as the training set. All prediction models are built on the interactions in the training set and then make predictions for all other lncRNA-miRNA pairs. The predictions and the real labels (interaction or not) of these pairs are used to calculate the evaluation metrics.
We adopt the area under the precision-recall curve (AUPR) and the area under the receiver operating characteristic curve (AUC) as the main metrics, since they evaluate prediction models independently of any decision threshold. We also report binary classification metrics: recall (REC, also called sensitivity), specificity (SP), precision (PR), accuracy (ACC) and F1-measure (F1). In the experiments, we perform 20 runs of 5CV for each model and report the averages.
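One run of the 5CV protocol can be sketched as follows. The code assumes Y is a 0/1 NumPy array and that `predict` is any hypothetical callable mapping a training interaction matrix to an l × m score matrix; it computes AUC with the rank-sum (Mann-Whitney) formula in pure NumPy. AUPR and the threshold-based metrics would be computed on the same held-out pairs; scikit-learn's `roc_auc_score` and `average_precision_score` are common library alternatives. `auc_score` and `cross_validate_auc` are hypothetical names.

```python
import numpy as np

def auc_score(labels, scores):
    """ROC AUC via the rank-sum formula, averaging the ranks of tied scores."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores, kind="mergesort")
    _, inverse, counts = np.unique(scores[order], return_inverse=True, return_counts=True)
    avg_rank = np.cumsum(counts) - (counts - 1) / 2.0   # 1-based mean rank of each tie group
    ranks = np.empty(len(scores))
    ranks[order] = avg_rank[inverse]
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

def cross_validate_auc(Y, predict, n_folds=5, seed=0):
    """Mean AUC over the folds of one k-fold CV run on known interactions.

    Evaluated pairs are the held-out interactions plus every pair that is zero in Y.
    """
    Y = np.asarray(Y)
    rng = np.random.default_rng(seed)
    positives = np.argwhere(Y == 1)
    rng.shuffle(positives)                          # shuffle the (row, col) index pairs
    aucs = []
    for test in np.array_split(positives, n_folds):
        Y_train = Y.copy()
        Y_train[test[:, 0], test[:, 1]] = 0         # hide this fold's interactions
        scores = np.asarray(predict(Y_train), dtype=float)
        candidates = Y_train == 0                   # pairs not used as training positives
        aucs.append(auc_score(Y[candidates] == 1, scores[candidates]))
    return float(np.mean(aucs))
```

Repeating the call with 20 different seeds and averaging mirrors the 20 runs of 5CV.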
Parameter settings
In this study, both SLNPM-SC and SLNPM-PC have two major components: the linear neighborhood similarity calculation and the similarity-based label propagation. The linear neighborhood similarity has one parameter, the neighborhood ratio K, and the label propagation has one parameter, the absorbing probability α. β is a trade-off parameter in the final prediction phase. We consider all combinations of the following values: {10%, 20%, 30%, …, 90%} of the number of data points for K, {0.1, 0.2, 0.3, …, 0.9} for α and {0, 0.05, 0.1, …, 0.95, 1} for β to build the SLNPM-SC and SLNPM-PC models, and then evaluate the influence of the parameters. All experiments are conducted with 5-fold cross-validation on the SLNPM-S dataset. SLNPM-SC achieves its best AUPR score of 0.6033 when K = 80%, α = 0.4 and β = 0.25, and SLNPM-PC produces its best AUPR score of 0.5996 when K = 90%, α = 0.4 and β = 0.25.
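The grid above contains 9 × 9 × 21 = 1701 parameter combinations, each requiring a full 5CV run, which is why tuning is costly. A minimal exhaustive search is sketched below; `evaluate` is a hypothetical callable that would return the mean AUPR of one (K, α, β) setting.

```python
import itertools

K_VALUES = [r / 10 for r in range(1, 10)]       # 10%, 20%, ..., 90% of the data points
ALPHA_VALUES = [a / 10 for a in range(1, 10)]   # 0.1, 0.2, ..., 0.9
BETA_VALUES = [b / 20 for b in range(0, 21)]    # 0, 0.05, ..., 1

def grid_search(evaluate):
    """Return (best_score, (K, alpha, beta)) maximizing evaluate(K, alpha, beta)."""
    best = None
    for params in itertools.product(K_VALUES, ALPHA_VALUES, BETA_VALUES):
        score = evaluate(*params)
        if best is None or score > best[0]:
            best = (score, params)
    return best
```

Since β only reweights two already-computed prediction matrices, a practical variant would reuse P^l and P^m across all 21 β values for each (K, α) pair.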
For simplicity, we use the SLNPM-SC model for the parameter analysis. First, we set β = 0.25 and evaluate the influence of K and α on the performance of SLNPM-SC. The AUPR scores of SLNPM-SC models with different combinations of K and α are visualized in Fig. 2 (a). The figure indicates that α has a great impact on the performance of SLNPM-SC: as α increases, the performance first increases and then decreases after a peak. In addition, performance improves as the neighborhood ratio K increases, possibly because more neighbors' information is considered when calculating similarity. Then, we fix K = 80% and α = 0.4 and evaluate the influence of the parameter β. Note that β is a trade-off parameter between the lncRNA-based prediction and the miRNA-based prediction: β = 1 means that SLNPM-SC uses only lncRNA-lncRNA similarity information for lncRNA-miRNA interaction prediction, and conversely, SLNPM-SC uses only miRNA-miRNA similarity information when β = 0. The results, shown in Fig. 2 (b), indicate that the prediction model produces the best result when β = 0.25. This suggests that SLNPM-SC depends more on the miRNA information-based component than on the lncRNA information-based component (0.75 vs. 0.25).
Therefore, we adopt K = 80%, α = 0.4 and β = 0.25 for the SLNPM-SC model and K = 90%, α = 0.4 and β = 0.25 for the SLNPM-PC model in the following sections.
Results of SLNPM-SC and SLNPM-PC
SLNPM-SC integrates sequence similarity and interaction profile similarity into combined similarities and then makes predictions based on them; SLNPM-PC uses sequence similarities to complement the interaction profiles and then calculates interaction profile similarities to make predictions.
To demonstrate the advantages of SLNPM-SC and SLNPM-PC, we build several similar models using individual features or other similarity measures. First, we build sequence-derived linear neighborhood propagation (SLNPM) models based on either interaction profile similarities or sequence similarities alone. Since existing work [43] has used the expression profiles of lncRNAs and miRNAs to predict lncRNA-miRNA interactions, we also calculate expression profile similarity with the linear neighborhood similarity measure (LNS) and build an SLNPM model on it. In addition, we calculate sequence similarity with the Smith-Waterman algorithm (SW) [44] and build an SLNPM model on it. The performances of these models are evaluated on the SLNPM-S dataset by 5CV, and the results are shown in Table 2. SLNPM-SC and SLNPM-PC produce better results than the other SLNPM models, indicating the effectiveness of the two information combination strategies. The SLNPM model using LNS-based sequence similarity outperforms the SLNPM model using SW-based sequence similarity, suggesting that LNS measures lncRNA-lncRNA and miRNA-miRNA similarity better than SW does. Moreover, the SLNPM models that use interaction profile similarities outperform the SLNPM models based on other individual feature similarities, revealing the importance of interaction profiles.
Previous studies [26, 29] and our experimental results demonstrate that interaction profiles are critical for predicting lncRNA-miRNA associations. However, the interaction profiles of some lncRNAs (miRNAs) are unavailable, so models that rely mainly on interaction profiles cannot make predictions for them. We address this problem with the proposed information combination strategies, which use a biological feature: lncRNA (miRNA) sequences. Expression profiles can also describe lncRNAs (miRNAs), and a relevant study [28] shows that expression profiles play a crucial role in lncRNA-miRNA interactions. To compare the effectiveness of different information sources in the combination strategies, we build SLNPM-SC and SLNPM-PC using either sequences or expression profiles. Specifically, we calculate lncRNA and miRNA expression profile similarities with the linear neighborhood similarity measure and build an SLNPM-SC model (M2) and an SLNPM-PC model (M4) on them; our original sequence-based SLNPM-SC and SLNPM-PC models are denoted by M1 and M3, respectively. The performances of these models are evaluated by 5CV, and the detailed results are displayed in Table 3. The SLNPM models based on sequence similarity perform much better than those based on expression profile similarity.
Since we perform 20 runs of 5CV for each model, we obtain 20 AUPR scores and 20 AUC scores for each model. We test the statistical difference between the SLNPM-SC models (M1 and M2) and between the SLNPM-PC models (M3 and M4) by using the paired t-test. For the SLNPM-SC models (M1 vs. M2), the P-values are 7.97E-27 and 1.07E-10 in terms of AUPR and AUC, respectively. For the SLNPM-PC models (M3 vs. M4), the P-values are 1.24E-22 and 1.63E-04 in terms of AUPR and AUC, respectively. These results show that the two editions of the sequence-derived linear neighborhood propagation method (M1 and M3) statistically significantly outperform the SLNPM models based on expression information (M2 and M4) in terms of AUPR and AUC (P-value < 0.05).
Comparison with state-of-the-art methods
To the best of our knowledge, there are only a few machine-learning-based methods for lncRNA-miRNA interaction prediction. Here, we adopt EPLMI [26] and INLMI [28] as benchmark methods. EPLMI is a two-way diffusion model that uses the bipartite graph of known lncRNA-miRNA interactions together with expression profiles to predict lncRNA-miRNA interactions. We implement EPLMI using its publicly available source code. INLMI [28] integrates the expression similarity network and the sequence similarity network to predict lncRNA-miRNA interactions, and we implement this model according to the descriptions in [28]. Since predicting lncRNA-miRNA interactions can be considered a link prediction task, we also adopt two network link inference methods as baselines: the collaborative filtering method (CF) [45] and the resource allocation algorithm (RA) [46]. CF treats known lncRNA-miRNA interactions as a bipartite graph and uses external information, such as expression profiles, to calculate lncRNA-lncRNA and miRNA-miRNA similarities. It then finds neighbors for each lncRNA and each miRNA, predicts unknown interactions as a weighted average of the neighbors' interacting miRNAs (lncRNAs), and combines the lncRNA neighbor-based prediction and the miRNA neighbor-based prediction with a trade-off parameter. RA also formulates lncRNAs and miRNAs as nodes and lncRNA-miRNA interactions as links in a bipartite graph. Interaction information is propagated from miRNAs to their linked lncRNAs and then allocated from lncRNAs back to miRNAs. After a finite number of iterations, the final resources on the miRNAs are taken as the probabilities that the lncRNA interacts with these miRNAs. EPLMI and RA have no parameters. INLMI has one parameter, the dimension of the latent variables in its non-negative matrix factorization, and CF has a trade-off parameter between the lncRNA neighbor-based prediction and the miRNA neighbor-based prediction. We tuned the parameters of INLMI and CF and adopted the values that produced the best results.
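For illustration, one round of resource diffusion on the bipartite graph (lncRNAs as rows, miRNAs as columns) can be written with two matrix products. This follows the common mass-diffusion formulation used in network-based recommendation and is an assumption about how the RA baseline [46] works; the reference may differ in details such as the number of rounds or the normalization. `resource_allocation` is a hypothetical name.

```python
import numpy as np

def resource_allocation(Y):
    """Two-step resource diffusion on the lncRNA-miRNA bipartite graph (sketch).

    Row i scores every miRNA for lncRNA i: resource starts on the miRNAs linked
    to lncRNA i, is split evenly over each miRNA's lncRNAs, then is split evenly
    back over those lncRNAs' miRNAs.
    """
    Y = np.asarray(Y, dtype=float)
    k_lnc = Y.sum(axis=1)                                  # lncRNA degrees
    k_mi = Y.sum(axis=0)                                   # miRNA degrees
    inv_lnc = np.divide(1.0, k_lnc, out=np.zeros_like(k_lnc), where=k_lnc > 0)
    inv_mi = np.divide(1.0, k_mi, out=np.zeros_like(k_mi), where=k_mi > 0)
    R_lnc = (Y * inv_mi) @ Y.T                             # step 1: miRNAs -> lncRNAs
    return R_lnc @ (Y * inv_lnc[:, None])                  # step 2: lncRNAs -> miRNAs
```

Each row conserves the initial resource, so row i sums to the degree of L_i, and an lncRNA with no known interaction receives an all-zero row, which is exactly the limitation the information combination strategies address.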
All models are evaluated on the SLNPM-S dataset by 5CV. As shown in Table 4, SLNPM-SC achieves an AUPR score of 0.6033 and an AUC score of 0.9115, and SLNPM-PC produces an AUPR score of 0.5996 and an AUC score of 0.9006. The proposed models perform substantially better than EPLMI (AUPR 0.0706, AUC 0.8494), INLMI (AUPR 0.0723, AUC 0.8477), RA (AUPR 0.5078, AUC 0.8637) and CF (AUPR 0.2363, AUC 0.8610). We attribute the strong performance of SLNPM-SC and SLNPM-PC to two factors. On the one hand, the linear neighborhood similarity measure effectively captures lncRNA-lncRNA and miRNA-miRNA similarities. On the other hand, the integrated similarities and the complemented interaction profiles make use of diverse information.
In computational prediction, the top-ranked predictions are especially important and reflect the practical performance of models. Here, we examine the top-ranked predictions, from the top 100 to the top 1000, and count how many real interactions they contain. As shown in Fig. 3, SLNPM-SC and SLNPM-PC perform better than the other four methods on the top-ranked predictions. In the top 100 predictions, EPLMI, INLMI, RA, CF, SLNPM-SC and SLNPM-PC find 18, 19, 87, 33, 91 and 91 real interactions, respectively. Notably, SLNPM-SC and SLNPM-PC recover 71% and 70% of the interactions, respectively, when only the top 1000 predictions are verified.
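Counting real interactions among the top-ranked predictions is a simple ranking query. A sketch, assuming `scores` and `labels` are flat arrays over the evaluated pairs and `hits_in_top` is a hypothetical helper:

```python
import numpy as np

def hits_in_top(scores, labels, n):
    """Count true interactions among the n highest-scoring candidate pairs."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")[:n]
    return int(np.asarray(labels)[order].sum())
```

Dividing the count for n = 1000 by the number of held-out interactions gives the recovered fraction; the stable sort keeps ties in their original order, so results on tied scores depend on pair ordering.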
Case studies
In this section, we conduct experiments on the SLNPM-L dataset to demonstrate the practical capability of SLNPM-SC and SLNPM-PC for lncRNA-miRNA interaction prediction.
First, we analyze the performance of SLNPM-SC and SLNPM-PC in predicting the lncRNAs (miRNAs) that interact with a specific miRNA (lncRNA). In this experiment, we remove all interactions of a specific lncRNA or a specific miRNA from the dataset, and build the SLNPM-SC and SLNPM-PC models to predict the removed interactions. For every lncRNA or miRNA, we use the prediction scores and real labels (interaction or non-interaction) to calculate an AUC score. We summarize the results for lncRNAs and miRNAs statistically and draw boxplots. As shown in Fig. 4, the median AUC scores for both lncRNAs and miRNAs are larger than 0.65, indicating that SLNPM-SC and SLNPM-PC produce satisfactory results in predicting lncRNA-interacting miRNAs and miRNA-interacting lncRNAs.
Further, we build the SLNPMSC model and SLNPMPC model based on SLNPML dataset to predict novel lncRNAmiRNA interactions, which are not included in the SLNPML dataset. Since the SLNPML dataset is compiled from lncRNASNP [17], the predictions are validated by other databases and publicly available literature. We take the lncRNA “MALAT1” and the miRNA “hsamiR175p” as examples, and respectively build prediction models (SLNPMSC and SLNPMPC) to predict miRNAs interacting with “MALAT1” and lncRNAs interacting with “hsamiR175p”. The lncRNA MALAT1(metastasisassociated lung adenocarcinoma transcript 1), a bona fide long noncoding RNA, is reported to be closely related with lung cancer and is one of the first discovered cancerassociated lncRNAs [47, 48]. The miRNA hasmiR175p, also known as miR17, is identified as a member of solid cancer miRNA signature [49], and also acts as both an oncogene and a tumor suppressor in different cellular contexts [50, 51].
The top 10 predictions for the lncRNA “MALAT1” and the miRNA “hsa-miR-17-5p” are shown in Table 5. Both SLNPM-SC and SLNPM-PC correctly predict that hsa-miR-1 interacts with the lncRNA “MALAT1”. The study [60] reported that MALAT1 was identified as a target of hsa-miR-1, that MALAT1 could bind directly to hsa-miR-1, and that the level of hsa-miR-1 was negatively associated with that of MALAT1 in breast cancer tissues. Overall, among its top 10 predictions, SLNPM-SC successfully identifies 5 miRNAs interacting with the lncRNA “MALAT1” and 4 lncRNAs interacting with the miRNA “hsa-miR-17-5p”, while SLNPM-PC identifies 8 miRNAs interacting with “MALAT1” and 4 lncRNAs interacting with “hsa-miR-17-5p”. Therefore, both SLNPM-SC and SLNPM-PC can predict novel lncRNA-miRNA interactions with high accuracy.
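Producing a case-study list such as Table 5 amounts to ranking, for one query lncRNA, the miRNAs not already known to interact with it, keeping the top 10, and checking them against external evidence. The Python sketch below assumes a row of model scores for the query is already available; the helper names, miRNA names, scores and evidence set are hypothetical placeholders, not values from Table 5.

```python
from typing import List, Sequence, Set


def top_novel_candidates(scores: Sequence[float], known: Sequence[int],
                         names: Sequence[str], k: int = 10) -> List[str]:
    """Rank partners not yet known to interact with the query, highest score first."""
    # Drop pairs already in the training data (known == 1); only novel pairs remain.
    candidates = [(s, n) for s, y, n in zip(scores, known, names) if y == 0]
    # Stable sort on score only, so equal scores keep their input order.
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in candidates[:k]]


def count_supported(predicted: Sequence[str], evidence: Set[str]) -> int:
    """Count predictions confirmed by an external database or the literature."""
    return sum(1 for name in predicted if name in evidence)
```

For instance, with placeholder partners `["miR-a", "miR-b", "miR-c", "miR-d", "miR-e"]`, scores `[0.9, 0.8, 0.4, 0.7, 0.2]`, only `"miR-a"` already known, and `k=2`, the candidates are `["miR-b", "miR-d"]`; against an evidence set `{"miR-d", "miR-e"}`, one of them is supported.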
Conclusions
LncRNA-miRNA interactions are critical to many biological events, and exploring these interactions contributes to understanding lncRNA functions. In this work, we propose a computational method named the sequence-derived linear neighborhood propagation method (SLNPM). SLNPM makes full use of lncRNA sequences, miRNA sequences and known lncRNA-miRNA interactions to predict novel lncRNA-miRNA interactions. To deal with miRNAs (or lncRNAs) that lack interaction information, we introduce two information combination strategies, similarity-based information combination and interaction profile-based information combination, and accordingly develop two editions of SLNPM: SLNPM-SC and SLNPM-PC. The proposed models are compared with benchmark methods and baseline methods. The experimental results show that the interaction profiles are essential for the high accuracy of SLNPM-SC and SLNPM-PC, and that the information combination strategies further improve performance. The prediction capabilities of the proposed models are also tested in case studies, and several of the predicted lncRNAs (miRNAs) for a given miRNA (lncRNA) are confirmed by existing databases and literature. In conclusion, SLNPM-SC and SLNPM-PC are promising for lncRNA-miRNA interaction prediction. However, SLNPM has several parameters, and determining their optimal values is time-consuming. How to tune the parameters of SLNPM efficiently will be the subject of our future work.
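The propagation step of SLNPM, as outlined in the Abstract (label propagation on an integrated lncRNA similarity graph and on an integrated miRNA similarity graph, followed by a weighted average of the two outputs), can be sketched with the generic label-propagation update F ← αWF + (1 − α)Y. This is an illustrative Python sketch, not SLNPM itself: it assumes the integrated similarity matrices `W_lnc` and `W_mi` have already been computed and row-normalized (the linear neighborhood reconstruction and the SC/PC combination steps are not reproduced), and `alpha`, `beta` and the iteration count are hypothetical values rather than the paper's tuned parameters.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def propagate(W: Matrix, Y: Matrix, alpha: float = 0.5, iters: int = 100) -> Matrix:
    """Iterate F <- alpha * W F + (1 - alpha) * Y, starting from F = Y.

    Converges when W is non-negative and row-normalized and 0 <= alpha < 1.
    """
    F = [list(row) for row in Y]
    for _ in range(iters):
        WF = matmul(W, F)
        F = [[alpha * wf + (1 - alpha) * y for wf, y in zip(wf_row, y_row)]
             for wf_row, y_row in zip(WF, Y)]
    return F


def two_graph_scores(W_lnc: Matrix, W_mi: Matrix, Y: Matrix,
                     alpha: float = 0.5, beta: float = 0.5) -> Matrix:
    """Weighted average of propagation over the lncRNA graph and the miRNA graph.

    Y is an lncRNA x miRNA interaction matrix; beta is a hypothetical weight.
    """
    F_lnc = propagate(W_lnc, Y, alpha)                      # spreads along lncRNA similarity
    F_mi = transpose(propagate(W_mi, transpose(Y), alpha))  # spreads along miRNA similarity
    return [[beta * a + (1 - beta) * b for a, b in zip(r1, r2)]
            for r1, r2 in zip(F_lnc, F_mi)]
```

With two lncRNAs and two miRNAs that are each other's only neighbor (`W = [[0, 1], [1, 0]]` for both graphs) and a single known interaction `Y = [[1, 0], [0, 0]]`, the scores converge to `[[2/3, 1/6], [1/6, 0]]`: each unobserved pair that is similar to the known one on exactly one side receives 1/6.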
Availability of data and materials
Not applicable.
Abbreviations
5CV: 5-fold cross-validation
AUC: Area under the ROC curve
AUPR: Area under the precision-recall curve
IP: Interaction profile
SLNPM-PC: Sequence-derived linear neighborhood propagation method based on interaction profile information combination
SLNPM-SC: Sequence-derived linear neighborhood propagation method based on similarity information combination
References
1. Mercer TR, Dinger ME, Mattick JS. Long noncoding RNAs: insights into functions. Nat Rev Genet. 2009;10(3):155–9.
2. Hung T, Chang HY. Long noncoding RNA in genome regulation: prospects and mechanisms. RNA Biol. 2010;7(5):582–5.
3. Fatica A, Bozzoni I. Long noncoding RNAs: new players in cell differentiation and development. Nat Rev Genet. 2014;15(1):7–21.
4. Turner M, Galloway A, Vigorito E. Noncoding RNA and its associated proteins as regulatory elements of the immune system. Nat Immunol. 2014;15(6):484–91.
5. Chakravarty D, Sboner A, Nair SS, Giannopoulou E, Li RH, Hennig S, Mosquera JM, Pauwels J, Park K, Kossai M, et al. The oestrogen receptor alpha-regulated lncRNA NEAT1 is a critical modulator of prostate cancer. Nat Commun. 2014;5:1–3.
6. Xia T, Liao Q, Jiang X, Shao Y, Xiao B, Xi Y, Guo J. Long noncoding RNA associated-competing endogenous RNAs in gastric cancer. Sci Rep. 2014;4:6088.
7. Quagliata L, Matter MS, Piscuoglio S, Arabi L, Ruiz C, Procino A, Kovac M, Moretti F, Makowska Z, Boldanova T. lncRNA HOTTIP/HOXA13 expression is associated with disease progression and predicts outcome in hepatocellular carcinoma patients. Hepatology. 2014;59(3):911.
8. Zheng HT, Shi DB, Wang YW, Li XX, Xu Y, Tripathi P, Gu WL, Cai GX, Cai SJ. High expression of lncRNA MALAT1 suggests a biomarker of poor prognosis in colorectal cancer. Int J Clin Exp Pathol. 2014;7(6):3174.
9. Fang JS, Li YJ, Liu R, Pang XC, Li C, Yang RY, He YY, Lian WW, Liu AL, Du GH. Discovery of multitarget-directed ligands against Alzheimer's disease through systematic prediction of chemical protein interactions. J Chem Inf Model. 2015;55(1):149–64.
10. Sun H, Wang G, Peng Y, Zeng Y, Zhu QN, Li TL, Cai JQ, Zhou HH, Zhu YS. H19 lncRNA mediates 17β-estradiol-induced cell proliferation in MCF-7 breast cancer cells. Oncol Rep. 2015;33(6):3045–52.
11. Qian L, Jianguo H, Nanjiang Z, Ziqiang Z, Ali Z, Zhaohui L, Fangting W, Yin-Yuan M. LncRNA loc285194 is a p53-regulated tumor suppressor. Nucleic Acids Res. 2013;41(9):4976–87.
12. Xu MD, Wang Y, Weng W, Wei P, Qi P, Zhang Q, Tan C, Ni SJ, Dong L, Yang Y. A positive feedback loop of lncRNA-PVT1 and FOXM1 facilitates gastric cancer growth and invasion. Clin Cancer Res. 2016;23(8):2071.
13. Simon MD. Capture hybridization analysis of RNA targets (CHART). Curr Protoc Mol Biol. 2013;21(21 25):1–6.
14. Berghoff EG, Clark MF, Chen S, Cajigas I, Leib DE, Kohtz JD. Evf2 (Dlx6as) lncRNA regulates ultraconserved enhancer methylation and the differential transcriptional control of adjacent genes. Development. 2013;140(21):4407–16.
15. Hao YJ, Wu W, Li H, Yuan J, Luo JJ, Zhao Y, Chen RS. NPInter v3.0: an upgraded database of noncoding RNA-associated interactions. Database (Oxford). 2016:1–5. https://doi.org/10.1093/database/baw057
16. Wang TJ, Xie HW. Drug target proteins prediction with network topological indices. Res J Biotechnol. 2014;9(12):76–81.
17. Gong J, Liu W, Zhang J, Miao X, Guo AY. lncRNASNP: a database of SNPs in lncRNAs and their potential functions in human and mouse. Nucleic Acids Res. 2015;43(Database issue):D181–6.
18. Kallen AN, Xiao-Bo Z, Jie X, Chong Q, Jing M, Lei Y, Lingeng L, Chaochun L, Jae-Sung Y, Haifeng Z. The imprinted H19 lncRNA antagonizes let-7 microRNAs. Mol Cell. 2013;52(1):101–12.
19. Hongyi Z, Kai C, Jing W, Xiaoying W, Kai C, Fangfang S, Longwei J, Yunxia Z, Jun D. MiR-7, inhibited indirectly by lincRNA HOTAIR, directly inhibits SETDB1 and reverses the EMT of breast cancer stem cells by downregulating the STAT3 pathway. Stem Cells. 2015;32(11):2858–68.
20. Zhang W, Qu QL, Zhang YQ, Wang W. The linear neighborhood propagation method for predicting long noncoding RNA–protein interactions. Neurocomputing. 2018;273:526–34.
21. Ao L, Zang Q, Sun D, Wang M. A text feature-based approach for literature mining of lncRNA–protein interactions. Neurocomputing. 2016;206:73–80.
22. Hu H, Zhu C, Ai H, Zhang L, Zhao J, Zhao Q, Liu H. LPI-ETSLP: lncRNA-protein interaction prediction using eigenvalue transformation-based semi-supervised link prediction. Mol BioSyst. 2017;13(9):1781–7.
23. Zheng X, Yang W, Kai T, Zhou J, Guan J, Luo L, Zhou S. Fusing multiple protein-protein similarity networks to effectively predict lncRNA-protein interactions. BMC Bioinformatics. 2017;18(Suppl 12):420.
24. Zhang W, Yue X, Tang G, Wu W, Huang F, Zhang X. SFPEL-LPI: sequence-based feature projection ensemble learning for predicting LncRNA-protein interactions. PLoS Comput Biol. 2018;14(12):e1006616.
25. Zhang T, Wang M, Xi J, Ao L. LPGNMF: Predicting Long Noncoding RNA and Protein Interaction Using Graph Regularized Nonnegative Matrix Factorization. IEEE/ACM Trans Comput Biol Bioinform. 2018;PP(99):1–1.
26. Huang YA, Chan K, You ZH. Constructing Prediction Models from Expression Profiles for Large Scale lncRNA-miRNA Interaction Profiling. Bioinformatics. 2017;34(5):812–9.
27. Huang ZA, Huang YA, You ZH, Zhu Z, Sun Y. Novel link prediction for large-scale miRNA-lncRNA interaction network in a bipartite graph. BMC Med Genet. 2018;11(6):113.
28. Hu P, Huang YA, Chan KCC, You ZH. Discovering an Integrated Network in Heterogeneous Data for Predicting lncRNA-miRNA Interactions. Cham: Springer; 2018. p. 539–45.
29. Zhang W, Tang G, Wang S, Chen Y, Zhou S, Li X. Sequence-derived linear neighborhood propagation method for predicting lncRNA-miRNA interactions. In: 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM); 2018.
30. Fang S, Zhang L, Guo J, Niu Y, Wu Y, Li H, Zhao L, Li X, Teng X, Sun X, et al. NONCODEV5: a comprehensive annotation database for long noncoding RNAs. Nucleic Acids Res. 2018;46(D1):D308–14.
31. Kozomara A, Griffiths-Jones S. miRBase: annotating high confidence microRNAs using deep sequencing data. Nucleic Acids Res. 2014;42(D1):D68–73.
32. Panwar B, Omenn GS, Guan YF. miRmine: a database of human miRNA expression profiles. Bioinformatics. 2017;33(10):1554–60.
33. Zhang W, Chen Y, Tu S, Liu F, Qu Q. Drug side effect prediction through linear neighborhoods and multiple data source integration. In: 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Shenzhen; 2016. p. 427–34.
34. Zhang W, Yue X, Chen YL, Lin WR, Li BL, Liu F, Li XH. Predicting drug-disease associations based on the known association bipartite network. In: 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM); 2017. p. 503–9.
35. Zhang W, Yue X, Huang F, Liu R, Chen Y, Ruan C. Predicting drug-disease associations and their therapeutic function based on the drug-disease association bipartite network. Methods. 2018;145:51–9.
36. Zhang W, Jing K, Huang F, Chen Y, Li B, Li J, Gong J. SFLLN: a sparse feature learning ensemble method with linear neighborhood regularization for predicting drug–drug interactions. Inf Sci. 2019;497:189–201.
37. Zhang W, Li Z, Guo W, Yang W, Huang F. A fast linear neighborhood similarity-based network link inference method to predict microRNA-disease associations. IEEE/ACM Trans Comput Biol Bioinform. Early Access. https://doi.org/10.1109/TCBB.2019.2931546.
38. Li DF, Luo LQ, Zhang W, Liu F, Luo F. A genetic algorithm-based weighted ensemble method for predicting transposon-derived piRNAs. BMC Bioinformatics. 2016;17:329.
39. Zhang W, Chen YL, Li DF. Drug-Target Interaction Prediction through Label Propagation with Linear Neighborhood Information. Molecules. 2017;22(12):2056.
40. Zhang W, Chen YL, Liu F, Luo F, Tian G, Li XH. Predicting potential drug-drug interactions by integrating chemical, biological, phenotypic and network data. BMC Bioinformatics. 2017;18:18.
41. Zhang W, Yue X, Liu F, Chen YL, Tu SK, Zhang XN. A unified frame of predicting side effects of drugs by using linear neighborhood similarity. BMC Syst Biol. 2017;11:101.
42. Wen Z, Weitai Y, Xiaoting L, Feng H, Fei L. The bidirection similarity integration method for predicting microbe-disease associations. IEEE Access. 2018;6:38052–61.
43. Huang YA, Chan KCC, You ZH. Constructing prediction models from expression profiles for large scale lncRNA-miRNA interaction profiling. Bioinformatics. 2018;34(5):812–9.
44. Smith TF, Waterman MS, Burks C. The statistical distribution of nucleic acid similarities. Nucleic Acids Res. 1985;13(2):645–56.
45. Schafer JB, Frankowski D, Herlocker J, Sen S. Collaborative filtering recommender systems. ACM Trans Inf Syst. 2004;22(1):5–53.
46. Zhou T, Kuscsik Z, Liu JG, Medo M, Wakeling JR, Zhang YC. Solving the apparent diversity-accuracy dilemma of recommender systems. Proc Natl Acad Sci U S A. 2010;107(10):4511–5.
47. Gutschner T, Hämmerle M, Diederichs S. MALAT1 – a paradigm for long noncoding RNA function in cancer. J Mol Med. 2013;91(7):791–801.
48. Tony G, Monika HM, Moritz E, Jeff H, Youngsoo K, Alexey R, Gayatri A, Marion S, Matthias G. The noncoding RNA MALAT1 is a critical regulator of the metastasis phenotype of lung cancer cells. Cancer Res. 2013;73(3):1180–9.
49. Volinia S, Calin G, Liu CG, Ambs S, Cimmino A, Petrocca F, Visone R, Iorio M, Roldo C, Ferracin M, et al. A microRNA expression signature of human solid tumors define cancer gene targets. Proc Natl Acad Sci U S A. 2006;103:2257–61.
50. Cloonan N, Brown MK, Steptoe AL, Wani S, Chan WL, Forrest AR, Kolle G, Gabrielli B, Grimmond SM. The miR-17-5p microRNA is a key regulator of the G1/S phase cell cycle transition. Genome Biol. 2008;9(8):R127.
51. Li H, Bian C, Liao L, Li J, Zhao RC. miR-17-5p promotes human breast cancer cell migration and invasion through suppression of HBP1. Breast Cancer Res Treat. 2011;126(3):565–75.
52. Jin C, Yan B, Lu Q, Lin Y, Ma L. Reciprocal regulation of Hsa-miR-1 and long noncoding RNA MALAT1 promotes triple-negative breast cancer development. Tumour Biol. 2015;37(6):7383–94.
53. Wang H, Li W, Zhang G, Lu C, Chu H, Rui Y, Zhao G. MALAT1/miR-101-3p/MCL1 axis mediates cisplatin resistance in lung cancer. Oncotarget. 2018;9(7):7501–12.
54. Wang SH, Zhang WJ, Wu XC, Zhang MD, Weng MZ, Zhou D, Wang JD, Quan ZW. Long noncoding RNA Malat1 promotes gallbladder cancer development by acting as a molecular sponge to regulate miR-206. Oncotarget. 2016;7(25):37857–67.
55. Jun-Hao L, Shun L, Hui Z, Liang-Hu Q, Jian-Hua Y. starBase v2.0: decoding miRNA-ceRNA, miRNA-ncRNA and protein-RNA interaction networks from large-scale CLIP-Seq data. Nucleic Acids Res. 2014;42(Database issue):D92.
56. Xia C, Liang S, He Z, Zhu X, Chen R, Chen J. Metformin, a first-line drug for type 2 diabetes mellitus, disrupts the MALAT1/miR-142-3p sponge to decrease invasion and migration in cervical cancer cells. Eur J Pharmacol. 2018;830:59–67.
57. Zhang Y, Tang X, Shi M, Wen C, Shen B. MiR-216a decreases MALAT1 expression, induces G2/M arrest and apoptosis in pancreatic cancer cells. Biochem Biophys Res Commun. 2017;483(2):816–22.
58. Wang P, Li J, Zhao W, Shang C, Jiang X, Wang Y, Zhou B, Bao F, Qiao H. A novel LncRNA-miRNA-mRNA triple network identifies LncRNA RP11-363E7.4 as an important regulator of miRNA and gene expression in gastric cancer. Cell Physiol Biochem. 2018;47(3):1025–41.
59. Li L, Yang Z, Wang Y, Zhang Y, Zhou Y, Wang W, Lin L, Su W. Long noncoding RNA MALAT1 promote triple-negative breast cancer progression by regulating miR-204 expression. Biosci Rep. 2016;9:969–77.
60. Liu R, Li J, Lai Y, Liao Y, Liu R, Qiu W. Hsa-miR-1 suppresses breast cancer development by downregulating Kras and long noncoding RNA MALAT1. Int J Biol Macromol. 2015;81:491–7.
Acknowledgements
Not applicable.
About this supplement
This article has been published as part of BMC Genomics Volume 20 Supplement 11, 2019: Selected articles from the IEEE BIBM International Conference on Bioinformatics & Biomedicine (BIBM) 2018: genomics. The full contents of the supplement are available online at https://bmcgenomics.biomedcentral.com/articles/supplements/volume-20-supplement-11.
Funding
Publication costs were funded by the National Key Research and Development Program (2018YFC0407904), the National Natural Science Foundation of China (61772381, 61572368) and the Huazhong Agricultural University Scientific & Technological Self-innovation Foundation. The funders had no role in the design of the study; in the collection, analysis, and interpretation of data; or in writing the manuscript.
Author information
Affiliations
Contributions
WZ designed the study, implemented the algorithm and drafted the manuscript. GT implemented the algorithm and drafted the manuscript. SZ and YN helped prepare the data and draft the manuscript. All authors read and approved the final manuscript.
Corresponding authors
Correspondence to Wen Zhang or Yanqing Niu.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Zhang, W., Tang, G., Zhou, S. et al. LncRNA-miRNA interaction prediction through sequence-derived linear neighborhood propagation method with information combination. BMC Genomics 20, 946 (2019). https://doi.org/10.1186/s12864-019-6284-y
Keywords
 lncRNA-miRNA interactions
 Integrated similarity
 Label propagation