FKL-Spa-LapRLS: an accurate method for identifying human microRNA-disease association

Abstract

Background

MicroRNAs (miRNAs) act at the post-transcriptional level and are closely related to various complex human diseases. Traditional experimental verification of miRNA-disease associations is time-consuming and expensive, so it is especially important to design computational methods for detecting potential associations. To address the limitations of previous computational methods for predicting potential miRNA-disease associations, we develop the FKL-Spa-LapRLS model (Fast Kernel Learning Sparse kernel Laplacian Regularized Least Squares).

Results

First, we extract three miRNA similarity kernels and three disease similarity kernels. Then, we combine these kernels into a single kernel through the Fast Kernel Learning (FKL) model, and use a sparse kernel (Spa) to eliminate noise in the integrated similarity kernel. Finally, we find the associations via Laplacian Regularized Least Squares (LapRLS). Under three evaluation methods, global and local leave-one-out cross validation (LOOCV) and 5-fold cross validation, our method achieves AUCs of 0.9563, 0.8398 and 0.9535, respectively, which indicates that it is reliable. Then, we use case studies of eight neoplasms to further analyze the performance of our method. We find that most of the predicted miRNA-disease associations are confirmed by previous experiments, and that some miRNAs deserve more attention because they uncover more associations with various neoplasms than other miRNAs do.

Conclusions

Our proposed model can reveal miRNA-disease associations and improve the accuracy of association prediction for various diseases. Our method can also be easily extended with more similarity kernels.

Background

MicroRNAs (miRNAs) are a class of non-coding RNAs of about 20 to 25 nucleotides [1]. At the post-transcriptional level, miRNAs bind to messenger RNA (mRNA) sequences and affect protein synthesis [2–4]. Previous studies have shown that miRNAs are related to various diseases, including cancers. For example, hsa-mir-21 has been associated with more than 125 diseases, such as Alzheimer Disease, Diabetes Mellitus and Lymphoma. Thus, research on miRNAs is helpful for the diagnosis and treatment of diseases [5]. Traditional experiments for detecting associations between miRNAs and diseases are time-consuming and expensive [6]. Therefore, it is especially important to find potential miRNA-disease associations by computational methods [7]. Previous studies have identified a large number of miRNA-disease associations through traditional experiments, and several databases have been constructed to collect them. The Human MicroRNA Disease Database (HMDD) [8] collects 572 miRNAs, 378 diseases and 10368 miRNA-disease associations. miR2Disease [9] includes 349 miRNAs, 163 diseases and 3273 miRNA-disease associations. dbDEMC contains 2224 miRNAs, 36 cancer types and 20037 miRNA-disease associations obtained through high-throughput methods. These resources promote the development of computational methods.

Computational methods have so far achieved excellent performance in finding potential miRNA-disease associations [10–14]. Most of these methods are based on the assumption that miRNAs with high similarity tend to be related to similar diseases, and vice versa [15, 16]. Xuan et al. [17] proposed HDMP, which scores a miRNA by weighting its k most similar neighbors; a larger score indicates a higher possibility of association with a specific disease. However, HDMP cannot work for a new disease without known related miRNAs. Jiang et al. [18] devised a hypergeometric distribution-based model to calculate the score of each miRNA for a specific disease, where a miRNA with a larger score is more likely to be associated with the disease. The scores of these two methods are based on miRNA neighbor information and ignore the global information of the miRNA similarity network. Many models find miRNA-disease associations based on similarity networks [19–23]. Chen et al. developed the RWRMDA model [24], which uses the miRNA functional similarity network and the known miRNA-disease association network, and applies a random walk to find potential miRNA-disease associations. However, RWRMDA faces the same problem as HDMP, because it requires a nonzero initial vector. Therefore, Chen et al. [25] proposed WBSMDA, which finds potential associations by integrating the miRNA functional similarity network, disease semantic similarity and the known miRNA-disease association network. To measure the similarity between two miRNAs or two diseases, WBSMDA integrates Gaussian Interaction Profile (GIP) kernel similarity for miRNAs and diseases, and calculates the association probability of a miRNA-disease pair using the Within-Score and Between-Score of the disease and the miRNA. Gu et al. [26] developed NCPMDA, which constructs novel similarity kernels for miRNAs and diseases via matrix operations and calculates space projection scores for miRNAs and diseases. The final score between a miRNA and a disease is calculated by combining the two space projection scores. The predictive performance of NCPMDA is superior to that of previous methods when working with a disease without any known related miRNAs [13].

Many previous models define a cost function and minimize it. Chen et al. [27] developed RLSMDA, a semi-supervised method that minimizes a Regularized Least Squares cost function to uncover potential miRNAs associated with various diseases. Later, Chen et al. [28] proposed LRSSLMDA to reveal potential associations between miRNAs and diseases. LRSSLMDA constructs comprehensive statistical features and graph theoretic features by combining the miRNA and disease similarity kernels, and adds a Laplacian regularization term to the objective function. Experimental results demonstrate that LRSSLMDA is a valuable computational model. In addition, many previous methods are based on machine learning algorithms [29, 30], matrix completion [31–33] and graph theory [34]. For example, Shen et al. [35] proposed CMFMDA, which uses WKNKN to estimate association probabilities for unknown miRNA-disease pairs and Collaborative Matrix Factorization to uncover potential associations. You et al. [36] developed PBMDA, which constructs a heterogeneous graph by integrating five networks, scores all paths for a miRNA-disease pair, and calculates the association possibility as the sum of all path scores. PBMDA achieves remarkable performance in finding potential miRNA-disease associations.

All of the above methods have achieved remarkable results, but they still have limitations. For example, most existing methods rely on the assumption that miRNAs with high similarity tend to be related to similar diseases. When constructing miRNA and disease similarity kernels, most studies use functional similarity and GIP kernel similarity for miRNAs, and semantic similarity and GIP kernel similarity for diseases. To integrate two similarity kernels, many works simply sum or average them [29, 37, 38]. Therefore, there is an urgent need for an effective method of integrating multiple miRNA and disease similarity kernels [39].

In this paper, we first extract the functional similarity, sequence similarity and GIP kernel similarity for miRNAs, and the semantic similarity, functional similarity and GIP kernel similarity for diseases. Then, we use the Fast Kernel Learning method to construct one miRNA similarity kernel and one disease similarity kernel. Finally, we propose a novel Sparse Laplacian Regularized Least Squares method to uncover miRNA-disease associations. Three evaluation methods are used to assess performance: global Leave-One-Out Cross Validation (global LOOCV), local Leave-One-Out Cross Validation (local LOOCV) and 5-fold cross validation (5-fold CV). Under these three evaluation methods, our method achieves AUCs of 0.9563, 0.8398 and 0.9535, respectively, in comparison with nine other models. We also use case studies of eight neoplasms to further analyze the performance of our method. We find that 47 of the top 50 candidates are confirmed to be associated with Lymphoma in global verification, and all top 50 candidates are confirmed to be associated with Breast and Colorectal Neoplasms in local verification. Moreover, we find that some miRNAs deserve more attention because they uncover more associations with various neoplasms, including hsa-mir-106b, hsa-mir-19b, hsa-mir-29c, hsa-mir-1 and hsa-mir-29a.

Methods

We first extract three miRNA similarity kernels and three disease similarity kernels for uncovering potential miRNA-disease associations. Then, we combine these kernels into one miRNA similarity kernel and one disease similarity kernel using Fast Kernel Learning, and sparsify the two combined kernels. Finally, we use Laplacian Regularized Least Squares to construct a loss function and obtain a predicted association matrix from the miRNA space and from the disease space, respectively. Figure 1 shows the flowchart of our method.

Fig. 1

The flowchart of our method, FKL-Spa-LapRLS, for the miRNA-disease association prediction

Human miRNA-disease associations dataset

In this paper, the set of miRNAs is denoted by \(M=\left \{m_{i}\right \}_{i=1}^{m}\), and the set of diseases is denoted by \(D=\left \{d_{j}\right \}_{j=1}^{n}\), where m and n are the numbers of miRNAs and diseases, respectively. The associations between miRNAs and diseases are downloaded from the HMDD database and include 5430 associations between 495 miRNAs and 383 diseases. The associations are represented by a binary matrix \(Y\in R^{m\times n}\) with entries \(y_{i,j}\in \{0,1\}\): if miRNA \(m_{i}\) is associated with disease \(d_{j}\), \(y_{i,j}\) is set to 1; otherwise, \(y_{i,j}\) is set to 0.

MiRNA similarity

Based on the assumption that miRNAs with high similarity tend to be associated with the same disease, we extract three classes of miRNA similarity: functional similarity, sequence similarity and Gaussian Interaction Profile (GIP) kernel similarity.

MiRNA functional similarity

In previous work, the MISIM method [40] proposed by Cui et al. was used to calculate miRNA functional similarity scores. We extract the functional similarity scores of the 495 miRNAs through MISIM and construct the kernel \(K_{1}^{m} \in R^{m\times m}\) to represent the miRNA functional similarity network, in which \(K_{1}^{m}(m_{i},m_{j})\) is the functional similarity score between miRNAs \(m_{i}\) and \(m_{j}\).

MiRNA sequence similarity

All 495 miRNA sequences are downloaded from the miRBase database [41]. We compute miRNA sequence similarity using the Needleman-Wunsch algorithm and obtain the kernel \(K_{2}^{m}\in R^{m\times m}\) to represent the miRNA sequence similarity network, in which \(K_{2}^{m}(m_{i},m_{j})\) is the sequence similarity score between miRNAs \(m_{i}\) and \(m_{j}\).

GIP kernel similarity for miRNAs

The GIP kernel similarity [29, 38, 42] between miRNAs \(m_{i}\) and \(m_{j}\) is denoted as \(K_{3}^{m}\in R^{m\times m}\) and is calculated by Eq. (1).

$$\begin{array}{@{}rcl@{}} K_{3}^{m}(m_{i},m_{j})=exp\left(-\gamma_{m} \parallel IP(m_{i})-IP(m_{j})\parallel^{2}\right) \end{array} $$
(1)

where \(IP(m_{i})\in R^{1\times n}\) denotes the interaction profile of miRNA \(m_{i}\), which records whether \(m_{i}\) is associated with each disease, that is, the i-th row of the association matrix Y; \(\gamma_{m}\) controls the kernel bandwidth and is set to \(-1\) in this paper.
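As an illustration of Eq. (1), the following minimal sketch computes a GIP kernel from a toy association matrix with NumPy. The function name `gip_kernel`, the toy matrix and the value `gamma=1.0` are assumptions made for this example only; they are not the authors' implementation or their reported parameter setting.

```python
import numpy as np

def gip_kernel(profiles, gamma):
    """Gaussian interaction profile kernel, following Eq. (1) literally.

    profiles: 2-D array with one interaction profile per row.
    gamma: bandwidth parameter; the exponent is -gamma * squared distance.
    """
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = np.sum(profiles ** 2, axis=1)
    # Squared Euclidean distance between every pair of rows.
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against tiny negative round-off
    return np.exp(-gamma * sq_dist)

# Toy association matrix (hypothetical): 3 miRNAs (rows) x 4 diseases (columns).
Y = np.array([[1, 0, 1, 0],
              [1, 0, 0, 0],
              [0, 1, 0, 1]])

K3_m = gip_kernel(Y, gamma=1.0)    # miRNA kernel built from the rows of Y
K3_d = gip_kernel(Y.T, gamma=1.0)  # disease kernel built from the columns of Y
```

The same function serves both views because the disease profiles are simply the columns of Y. With a positive `gamma`, all entries lie in (0, 1] and the diagonal equals 1.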

Disease similarity

We extract three classes of disease similarity, including semantic similarity, functional similarity and GIP kernel similarity.

Disease semantic similarity

In previous research [37, 40], a disease \(d_{i}\) is described as a node in a Directed Acyclic Graph (DAG) based on the MeSH [43] database (https://www.nlm.nih.gov/bsd/disted/meshtutorial/themeshdatabase/), and denoted as \({DAG}_{d_{i}}=(d_{i},T_{d_{i}},E_{d_{i}})\), in which \(T_{d_{i}}\) is the set of all ancestor nodes of \(d_{i}\) including \(d_{i}\) itself, and \(E_{d_{i}}\) is the set of corresponding links. The semantic score of each disease \(t \in T_{d_{i}}\) is calculated by Eq. (2).

$$ {}D_{d_{i}}(t) = \left\{ \begin{array}{lc} 1 & if ~t = d_{i} \\ max\left\{\Delta * D_{d_{i}}(t^{\prime}) |t^{\prime} \in children ~of ~t\right\} & if ~t \neq d_{i} \end{array} \right. $$
(2)

where Δ is the semantic contribution factor, which is set to 0.5 in this paper.

Then, we define the semantic score of disease \(d_{i}\) by Eq. (3).

$$\begin{array}{@{}rcl@{}} DV(d_{i}) = {\sum}_{t \in T_{d_{i}}} D_{d_{i}}(t) \end{array} $$
(3)

Therefore, we denote the disease semantic similarity as \(K_{1}^{d}\in R^{n\times n}\), and the disease semantic similarity value between \(d_{i}\) and \(d_{j}\) is calculated by Eq. (4).

$$\begin{array}{@{}rcl@{}} K_{1}^{d}\left(d_{i},d_{j}\right) = \frac{{\sum}_{t \in T_{d_{i}} \cap T_{d_{j}}} \left(D_{d_{i}}(t)+D_{d_{j}}(t)\right)}{DV(d_{i})+DV(d_{j})} \end{array} $$
(4)
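Equations (2) to (4) can be sketched on a toy hierarchy as follows. The `parents` dictionary is a hypothetical four-node DAG, not MeSH data, and the function names are illustrative assumptions.

```python
def semantic_contributions(disease, parents, delta=0.5):
    """Return {t: D_disease(t)} for the disease and all of its ancestors (Eq. 2)."""
    contrib = {disease: 1.0}
    frontier = [disease]
    while frontier:
        node = frontier.pop()
        for parent in parents.get(node, ()):
            value = delta * contrib[node]
            # Keep the maximum contribution over all paths to this ancestor.
            if value > contrib.get(parent, 0.0):
                contrib[parent] = value
                frontier.append(parent)
    return contrib

def semantic_similarity(d_i, d_j, parents, delta=0.5):
    """Disease semantic similarity of Eq. (4), with DV from Eq. (3)."""
    c_i = semantic_contributions(d_i, parents, delta)
    c_j = semantic_contributions(d_j, parents, delta)
    shared = set(c_i) & set(c_j)
    numerator = sum(c_i[t] + c_j[t] for t in shared)
    return numerator / (sum(c_i.values()) + sum(c_j.values()))

# Hypothetical toy DAG: child -> list of parents.
parents = {"A": ["root"], "B": ["root"], "C": ["A"]}

sim_ab = semantic_similarity("A", "B", parents)  # siblings sharing only the root
sim_ca = semantic_similarity("C", "A", parents)  # child and its parent
```

In this toy DAG, the siblings A and B share only the root, so their similarity is lower than that of C and its parent A, which share two ancestors.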

Disease functional similarity

Disease-gene and gene-gene associations are widely used to understand disease similarity [44]. From the HumanNet [45] database, we download gene interactions, each of which has a log-likelihood score (LLS) that measures the probability of a functional linkage between the two genes. The LLS values are normalized by Eq. (5).

$$\begin{array}{@{}rcl@{}} LLS^{*}(g_{i},g_{j})=\frac{LLS(g_{i},g_{j})-{LLS}_{min}}{LLS_{max}-{LLS}_{min}} \end{array} $$
(5)

where \(LLS(g_{i},g_{j})\) represents the LLS between the i-th and j-th genes; \(LLS^{*}(g_{i},g_{j})\) represents the LLS after normalization; \(LLS_{min}\) and \(LLS_{max}\) indicate the minimum and maximum LLS values in HumanNet, respectively.

The functional similarity score between two genes is defined as Eq. (6)

$$ {}FS(g_{i},g_{j}) = \left\{ \begin{array}{lc} 1 & if ~i = j \\ LLS^{*}(g_{i},g_{j}) & if ~i \neq j ~and ~e(i,j) \in S_{HumanNET} \\ 0 & if ~i\neq j ~and ~e(i,j) \notin S_{HumanNET} \end{array} \right. $$
(6)

where \(S_{HumanNET}\) indicates the set of gene-gene associations in the HumanNet database, and e(i,j) indicates the association between the i-th and j-th genes.

Then, the functional similarity score between a gene g and a gene set G is defined as Eq. (7).

$$\begin{array}{@{}rcl@{}} F_{G}(g)= \max_{g_{i}\in G}(FS(g,g_{i})) \end{array} $$
(7)

In many cases, a disease \(d_{i}\) is related to many genes, which form the gene set \(G_{i}\). The associations between diseases and genes are downloaded from SIDD [46]. The disease functional similarity score is defined as Eq. (8).

$$\begin{array}{@{}rcl@{}} K_{2}^{d}\left(d_{i},d_{j}\right)=\frac{\sum_{g_{k} \in G_{j}}F_{G_{i}}(g_{k})+\sum_{g_{s} \in G_{i}}F_{G_{j}}(g_{s})}{|G_{j}|+|G_{i}|} \end{array} $$
(8)
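The chain from Eq. (5) to Eq. (8) can be traced with a small worked example. The gene names, LLS values and gene sets below are invented for illustration, and the helper names are assumptions rather than part of any published code.

```python
def normalize_lls(lls):
    """Min-max normalise a {(gene_a, gene_b): LLS} mapping (Eq. 5).

    Assumes at least two distinct LLS values, so the denominator is non-zero.
    """
    lo, hi = min(lls.values()), max(lls.values())
    return {pair: (score - lo) / (hi - lo) for pair, score in lls.items()}

def gene_similarity(g_a, g_b, lls_norm):
    """Functional similarity between two genes (Eq. 6)."""
    if g_a == g_b:
        return 1.0
    # Edges are undirected, so look the pair up in both orders; 0 if absent.
    return lls_norm.get((g_a, g_b), lls_norm.get((g_b, g_a), 0.0))

def gene_to_set(gene, gene_set, lls_norm):
    """Similarity between one gene and a gene set (Eq. 7)."""
    return max(gene_similarity(gene, other, lls_norm) for other in gene_set)

def disease_functional_similarity(G_i, G_j, lls_norm):
    """Disease functional similarity from two gene sets (Eq. 8)."""
    total = sum(gene_to_set(g, G_i, lls_norm) for g in G_j)
    total += sum(gene_to_set(g, G_j, lls_norm) for g in G_i)
    return total / (len(G_i) + len(G_j))

# Hypothetical gene network and disease gene sets.
lls = {("g1", "g2"): 1.0, ("g2", "g3"): 3.0, ("g1", "g3"): 2.0}
lls_norm = normalize_lls(lls)
G_i, G_j = {"g1", "g2"}, {"g3"}

k2_d = disease_functional_similarity(G_i, G_j, lls_norm)
```

Note that min-max normalisation maps the weakest recorded edge to 0, so under Eq. (6) it becomes indistinguishable from an absent edge.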

GIP kernel similarity for diseases

Similar to calculation of GIP kernel similarity for miRNA, GIP kernel similarity for disease is denoted as \(K_{3}^{d}\in R^{n\times n}\), calculated as Eq. (9).

$$\begin{array}{@{}rcl@{}} K_{3}^{d}\left(d_{i},d_{j}\right)=exp\left(-\gamma_{d} \parallel IP(d_{i})-IP(d_{j})\parallel^{2}\right) \end{array} $$
(9)

where \(IP(d_{i})\in R^{m\times 1}\) denotes the interaction profile of disease \(d_{i}\), which records whether \(d_{i}\) is associated with each miRNA, that is, the i-th column of the association matrix Y; \(\gamma_{d}\) controls the kernel bandwidth and is set to \(-1\) in this paper.

Fast kernel learning

Considering that a single similarity kernel cannot cover all information about miRNAs, we integrate \(K_{1}^{m}\), \(K_{2}^{m}\) and \(K_{3}^{m}\) into a new integrated similarity kernel \(K^{m}\in R^{m\times m}\) using the method of Fast Kernel Learning (FKL) [47]. We define \(K^{m}\) as Eq. (10).

$$\begin{array}{@{}rcl@{}} K^{m}=\sum_{j=1}^{3} \mu_{j}^{m} K_{j}^{m} \end{array} $$
(10)

It is assumed that \(K^{m}\) should be close to the association matrix Y. We define the miRNA association similarity as Eq. (11).

$$\begin{array}{@{}rcl@{}} Y^{m}=YY^{T} \end{array} $$
(11)

Therefore, we seek the weight vector \(\mu^{m}\in R^{3\times 1}\) that minimizes the distance between \(K^{m}\) and \(Y^{m}\), as in Eq. (12).

$$\begin{array}{@{}rcl@{}} \min \limits_{\mu^{m}} ||K^{m}-Y^{m}||_{F}^{2} \end{array} $$
(12)

where \(||K^{m}-Y^{m}||_{F}^{2} = \sum _{i}\sum _{j}\left (K_{i,j}^{m}-Y_{i,j}^{m}\right)^{2}\).

To avoid overfitting in the learning procedure, a regularization term is added to the objective, as in Eq. (13).

$$\begin{array}{@{}rcl@{}} \begin{aligned} \min \limits_{\mu^{m}} &~~~||K^{m}-Y^{m}||_{F}^{2} + \lambda^{m}||\mu^{m}||^{2} \\ s.t. &~~~\mu_{j}^{m} \ge 0, j=1,2,3 \\ ~ &~~~\sum_{j=1}^{3} \mu_{j}^{m}=1 \end{aligned} \end{array} $$
(13)

where \(\lambda^{m}\) is set to 200 in this paper.

We use the CVX toolbox in MATLAB R2017a to solve this optimization problem and obtain the weight vector \(\mathcal {\mu }^{m} \in R^{1 \times 3} \) for miRNA functional similarity, miRNA sequence similarity and GIP kernel similarity. The integrated miRNA similarity kernel is then defined as Eq. (14).

$$\begin{array}{@{}rcl@{}} K^{m}=\sum_{j=1}^{3} \mathcal{\mu}_{j}^{m} K_{j}^{m} \end{array} $$
(14)

Similarly, we obtain the weight vector \(\mathcal {\mu }^{d} \in R^{1 \times 3} \) for disease semantic similarity, disease functional similarity and GIP kernel similarity by FKL, and the integrated disease similarity kernel is defined as Eq. (15).

$$\begin{array}{@{}rcl@{}} K^{d}=\sum_{j=1}^{3} \mathcal{\mu}_{j}^{d} K_{j}^{d} \end{array} $$
(15)
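The constrained problem in Eq. (13) is a small quadratic program over the probability simplex. The text solves it with CVX; the sketch below instead uses projected gradient descent in NumPy, which is a swapped-in technique chosen so that the example is self-contained. The toy kernels, the target matrix and all function names are assumptions for illustration.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    theta = css[rho] / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def fkl_weights(kernels, target, lam, n_iter=5000):
    """Projected-gradient solver for the objective and constraints of Eq. (13)."""
    A = np.stack([K.ravel() for K in kernels], axis=1)  # one vectorised kernel per column
    H = A.T @ A + lam * np.eye(len(kernels))             # quadratic term
    b = A.T @ target.ravel()                             # linear term
    step = 1.0 / (2.0 * np.linalg.eigvalsh(H)[-1])       # 1 / Lipschitz constant of the gradient
    mu = np.full(len(kernels), 1.0 / len(kernels))       # start from the average kernel
    for _ in range(n_iter):
        grad = 2.0 * (H @ mu - b)
        mu = project_to_simplex(mu - step * grad)
    return mu

# Hypothetical toy kernels; in the text the target would be Y @ Y.T (Eq. 11).
K1 = np.eye(3)
K2 = np.ones((3, 3))
K3 = np.array([[1.0, 0.5, 0.0], [0.5, 1.0, 0.5], [0.0, 0.5, 1.0]])
kernels = [K1, K2, K3]
target = 0.7 * K1 + 0.3 * K2

mu = fkl_weights(kernels, target, lam=0.0)
K_combined = sum(w * K for w, K in zip(mu, kernels))  # Eq. (14)
```

Because the toy target is an exact convex combination of the first two kernels, the unregularised solution recovers weights close to (0.7, 0.3, 0). A very large `lam` pushes the weights toward the uniform average instead.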

Laplacian regularized least squares

Given the similarity kernels of miRNAs and diseases, we use Sparse Laplacian Regularized Least Squares (Spa-LapRLS) to obtain a new association matrix and find potential miRNA-disease associations. It consists of a sparse kernel model and a LapRLS model.

Sparse kernel model

We use a Top-k Neighbor model to reduce noise in the integrated similarity kernel. For the miRNA subspace, we construct a weight matrix \(w_{m}\in R^{m\times m}\) for \(K^{m}\) by the Top-k Neighbor method, whose elements are defined as Eq. (16).

$$ w_{m}(i,j) = \left\{\!\! \begin{array}{lc} 1 & if~K^{m}(i,j) > \max(T(k,i),T(k,j)) \\ 0.5 &\! if~K^{m}(i,j)\! \in [\min(T(k,i),T(k,j)),\max(T(k,i),T(k,j))] \\ 0 & if~K^{m}(i,j) < \min(T(k,i),T(k,j)) \end{array}\right. $$
(16)

where k satisfies 0<k<m; T(k,i) represents the k-th largest element of the i-th row of \(K^{m}\), and T(k,j) represents the k-th largest element of the j-th column of \(K^{m}\).

Therefore, we record the denoised miRNA similarity kernel as Eq. (17)

$$\begin{array}{@{}rcl@{}} K_{m}^{*}=w_{m} \circ K^{m} \end{array} $$
(17)

where \(\circ\) denotes the element-wise product. Similarly, we also calculate the denoised disease similarity kernel \(K_{d}^{*} \in R^{n \times n}\).
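The Top-k weighting of Eqs. (16) and (17) can be illustrated on a 3 × 3 toy kernel. The sketch follows the three cases of Eq. (16) literally; the function names and the toy values are assumptions for this example.

```python
import numpy as np

def topk_weights(K, k):
    """Weight matrix of Eq. (16): 1, 0.5 or 0 depending on the top-k thresholds."""
    K = np.asarray(K, dtype=float)
    row_thr = np.sort(K, axis=1)[:, -k]  # T(k, i): k-th largest value in row i
    col_thr = np.sort(K, axis=0)[-k, :]  # T(k, j): k-th largest value in column j
    hi = np.maximum(row_thr[:, None], col_thr[None, :])
    lo = np.minimum(row_thr[:, None], col_thr[None, :])
    return np.where(K > hi, 1.0, np.where(K >= lo, 0.5, 0.0))

def sparsify(K, k):
    """Denoised kernel of Eq. (17): element-wise product of weights and kernel."""
    return topk_weights(K, k) * np.asarray(K, dtype=float)

# Hypothetical toy similarity kernel.
K = np.array([[1.0, 0.8, 0.1],
              [0.8, 1.0, 0.3],
              [0.1, 0.3, 1.0]])

W = topk_weights(K, k=2)
K_star = sparsify(K, k=2)
```

With k = 2, the weakest similarity (0.1) falls below both thresholds and is zeroed, while entries lying between the row and column thresholds are halved.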

LapRLS for miRNA-disease interaction prediction

Given the denoised similarity kernels \(K_{m}^{*}\) for miRNAs and \(K_{d}^{*}\) for diseases, we first apply Least Squares in each of the two subspaces and add a Laplacian Regularization term to avoid overfitting. For the miRNA subspace, the objective function of LapRLS [48] is defined as Eq. (18).

$$\begin{array}{@{}rcl@{}} \min\limits_{F_{m}} ~||Y-F_{m}||_{F}^{2} + \beta_{m}||F_{m}^{T}L_{m}F_{m}||_{F}^{2} \end{array} $$
(18)

where \(F_{m}=K_{m}^{*} \alpha _{m} \in R^{m \times n}\) is the predicted association matrix from the miRNA view; \(L_{m} = D_{m}^{-\frac {1}{2}}\left (D_{m} -K_{m}^{*}\right)D_{m}^{-\frac {1}{2}} \), in which \(D_{m}\) is the diagonal degree matrix of \(K_{m}^{*}\) with \(D_{m}(i,i)=\sum _{j=1}^{m}K_{m}^{*}(i,j)\); \(\beta_{m}\) is the regularization coefficient, which is set to \(2^{-5}\) in this paper; \(\alpha_{m}\) is obtained from Eq. (19), following [48].

$$ {}\alpha_{m}\,=\,\arg\!\! ~\min\limits_{\alpha_{m}\in R^{m \times n}} ~\!\! \left\{||Y\,-\,K_{m}^{*} \alpha_{m}\! ||_{F}^{2}\! +\! \beta_{m}||\alpha_{m}^{T}K_{m}^{*} L_{m}K_{m}^{*} \alpha_{m}||_{F}^{2}\right\} $$
(19)

The derivation of the optimization algorithm is presented in [48].

In this way, the predicted association matrix for all miRNA-disease pairs from the view of miRNAs is obtained as Eq. (20).

$$ F_{m} = K_{m}^{*}\left(K_{m}^{*}+\beta_{m}L_{m}K_{m}^{*}\right)^{-1}Y $$
(20)

Similarly, we can get the predicted association matrix for all miRNA-disease pairs from the view of diseases as Eq. (21).

$$ F_{d} = K_{d}^{*}\left(K_{d}^{*}+\beta_{d}L_{d}K_{d}^{*}\right)^{-1}Y^{T} $$
(21)

where \(F_{d}=K_{d}^{*} \alpha _{d} \in R^{n \times m}\); \(\beta_{d}\) is the regularization coefficient, which is set to \(2^{-5}\) in this paper.

In the end, the final predicted association matrix, which combines the miRNA and disease views, is defined as Eq. (22).

$$ F^{*} = \frac{F_{m}+F_{d}^{T}}{2} $$
(22)

where \(F^{*}\in R^{m\times n}\).
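The closed forms in Eqs. (20) to (22) translate directly into a few lines of NumPy. The sketch below assumes every row of each kernel has a positive sum (so the degree matrix is invertible) and that the matrix inside the inverse is non-singular. The toy kernels and function names are assumptions for illustration.

```python
import numpy as np

def normalized_laplacian(K):
    """L = D^{-1/2} (D - K) D^{-1/2}, where D holds the row sums of K."""
    d_inv_sqrt = 1.0 / np.sqrt(K.sum(axis=1))
    return np.eye(len(K)) - d_inv_sqrt[:, None] * K * d_inv_sqrt[None, :]

def laprls_predict(K, Y, beta):
    """Closed form of Eq. (20) / Eq. (21): K (K + beta L K)^{-1} Y."""
    L = normalized_laplacian(K)
    return K @ np.linalg.solve(K + beta * L @ K, Y)

def combine_views(K_m, K_d, Y, beta_m=2.0 ** -5, beta_d=2.0 ** -5):
    """Average the miRNA-view and disease-view predictions (Eq. 22)."""
    F_m = laprls_predict(K_m, Y, beta_m)    # m x n
    F_d = laprls_predict(K_d, Y.T, beta_d)  # n x m
    return (F_m + F_d.T) / 2.0

# Hypothetical toy inputs: 3 miRNAs, 2 diseases.
K_m = np.array([[1.0, 0.5, 0.2], [0.5, 1.0, 0.3], [0.2, 0.3, 1.0]])
K_d = np.array([[1.0, 0.4], [0.4, 1.0]])
Y = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])

F_star = combine_views(K_m, K_d, Y)
```

Using `np.linalg.solve` avoids forming the explicit inverse. When the kernel itself is invertible, the expression reduces algebraically to \((I+\beta L)^{-1}Y\), which gives a convenient sanity check.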

Results and discussion

In this section, we study the performance of our method in predicting unknown miRNA-disease associations from different aspects. First, we establish three evaluation methods and two assessment indicators to evaluate the accuracy of our method. Second, we analyze the performance of our method with different parameters by using 10-fold CV and local LOOCV. Third, we employ 10-fold CV and local LOOCV to analyze the performance of the FKL model. Fourth, we compare the performance of LapRLS with several matrix factorization methods. Fifth, we compare the performance of FKL-Spa-LapRLS with nine outstanding methods. Finally, for further validation, we implement global and local verification on eight neoplasms as case studies.

Evaluation criteria

In this paper, we implement 10-fold CV, global LOOCV and local LOOCV to evaluate the prediction accuracy of our method. In the 10-fold CV, all known miRNA-disease associations are randomly divided into ten disjoint groups; each group is used in turn as the test set, and the other nine groups are used as the training set. In the global LOOCV, all 5430 verified miRNA-disease associations are considered, and each association is left out in turn as a test sample while the other known associations serve as training samples. In the local LOOCV, only the miRNAs of a specific disease are considered: for disease \(d_{i}\), each miRNA related to \(d_{i}\) is left out in turn as the test set, and the other associations are regarded as the training set. All miRNA-disease associations in the test set are reset to 0 in the association matrix Y.
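The masking step of k-fold cross validation can be sketched as follows. The generator name, the toy matrix and the choice of five folds are assumptions for this example; the text uses ten folds on the HMDD associations.

```python
import numpy as np

def kfold_masked_matrices(Y, n_folds=10, seed=0):
    """Yield (training matrix, held-out positions) for each fold.

    Known associations are shuffled and split into disjoint groups; the
    held-out group is reset to 0 in a copy of Y.
    """
    rng = np.random.default_rng(seed)
    positives = np.argwhere(Y == 1)
    positives = positives[rng.permutation(len(positives))]
    for fold in np.array_split(positives, n_folds):
        Y_train = Y.copy()
        Y_train[fold[:, 0], fold[:, 1]] = 0
        yield Y_train, fold

# Hypothetical toy matrix with 10 known associations.
Y = (np.arange(20).reshape(4, 5) % 2 == 0).astype(int)
folds = list(kfold_masked_matrices(Y, n_folds=5, seed=0))
```

Global LOOCV corresponds to the special case where the number of folds equals the number of known associations.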

In our study, we use the Area Under the ROC Curve (AUC) and the Area Under the Precision-Recall curve (AUPR) as assessment criteria. AUC is the area under the receiver operating characteristic (ROC) curve, created by plotting the true positive rate against the false positive rate at various threshold settings. An AUC value of 1 indicates perfect performance and an AUC of 0.5 indicates random performance. AUPR is the area under the curve created by plotting precision against recall at various threshold settings. The greater the value of AUPR, the better the performance of the model.
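Both indicators can be computed from a list of labels and prediction scores. In the sketch below, AUC is computed from its pairwise-ranking interpretation, and AUPR is approximated by average precision, which is one common estimator and an assumption here since the text does not state how the area is integrated. The labels and scores are invented.

```python
import numpy as np

def auc_score(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    labels = np.asarray(labels).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive sample."""
    labels = np.asarray(labels).astype(bool)
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    hits = labels[order]
    precision_at_rank = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return precision_at_rank[hits].mean()

# Hypothetical labels (1 = held-out known association) and predicted scores.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.1]

auc = auc_score(labels, scores)
aupr = average_precision(labels, scores)
```

In this toy case three of the four positive-negative pairs are ordered correctly, so the AUC is 0.75.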

Parameter selection

In this section, we use 10-fold CV and local LOOCV to analyze several parameters, including \(\gamma_{m}\), \(\gamma_{d}\), \(\lambda^{m}\), \(\lambda^{d}\), \(\beta_{m}\), \(\beta_{d}\) and k.

\(\gamma_{m}\) and \(\gamma_{d}\) are the parameters used in constructing the GIP kernel similarity for miRNAs and diseases, respectively. We use only the GIP kernel similarity to predict potential miRNA-disease associations and use 10-fold CV to evaluate the performance of the GIP kernel with different parameters. We vary \(\gamma_{m}\) and \(\gamma_{d}\) from \(-10\) to 10 with step 1 and calculate the AUCs. The results are shown in Fig. 2a. The performance of the GIP similarity kernel is sensitive to \(\gamma_{m}\) and \(\gamma_{d}\), and the optimal AUC is obtained when both equal 0. However, according to Eqs. (1) and (9), \(K_{3}^{m}\) and \(K_{3}^{d}\) become all-ones matrices when the two parameters equal 0. Therefore, we adopt the suboptimal values \(\gamma_{m}=-1\) and \(\gamma_{d}=-1\) in this paper. Since most elements of the GIP similarity kernel are then greater than 1, we normalize the GIP similarity kernel before integrating multiple kernels.

Fig. 2

The AUCs of parameters by the 10-fold CV. Blue line and red line represent the AUCs of using single kernel. Green line represents the AUCs of using two kernels. a The AUCs of GIP with different γ. b The AUCs of FKL with different λ. c The AUCs of LapRLS with different β

\(\lambda^{m}\) and \(\lambda^{d}\) are the regularization coefficients of FKL. We use different values of \(\lambda^{m}\) and \(\lambda^{d}\) to integrate the three miRNA similarity kernels and the three disease similarity kernels, respectively. Then we use the integrated similarity kernels and LapRLS to uncover potential associations, and use 10-fold CV to evaluate the performance of FKL with different parameters. \(\lambda^{m}\) and \(\lambda^{d}\) are varied from 0 to 15000 with step 100 in order to find the best value. The results are shown in Fig. 2b. The AUC fluctuates only slightly over the range from 0 to 15000, which demonstrates that FKL is insensitive to the regularization coefficient. We therefore set \(\lambda^{m}\) and \(\lambda^{d}\) to 200 in this paper.

\(\beta_{m}\) and \(\beta_{d}\) are the regularization coefficients of LapRLS. We vary \(\beta_{m}\) and \(\beta_{d}\) from \(2^{-10}\) to \(2^{10}\) and adopt 10-fold CV to evaluate the performance of LapRLS with different parameters. The results are shown in Fig. 2c. The AUC fluctuates only slightly in the range from \(2^{-10}\) to \(2^{-2}\), and changes markedly when \(\beta_{m}\) and \(\beta_{d}\) are greater than \(2^{-2}\). We select the optimal \(\beta_{m}\) and \(\beta_{d}\) by the highest AUC value and set both to \(2^{-5}\) in this paper.

Meanwhile, the value of k used in the sparse kernel step is an important parameter in this paper. We use 10-fold CV and local LOOCV to analyze k. The value of k is varied from 20 to 250 with step 5, and the results are shown in Fig. 3. It can be clearly seen that the sparse kernel step has a positive effect on the discovery of potential miRNA-disease associations. In this study, k is set to 20 in the 10-fold CV and global LOOCV, and to 40 in the local LOOCV.

Fig. 3

The results of our method with different k values. a The AUCs of LapRLS with different k by the 10-fold CV. b The AUCs of LapRLS with different k by the local LOOCV

FKL performance analysis

In this section, we analyze the performance of FKL. First, we compare FKL with the single kernels and the average kernel by 10-fold CV and local LOOCV. Then, we compare FKL with two other multiple kernel learning methods by 10-fold CV and local LOOCV.

Comparison with single kernel and average kernel

We compare the prediction performance of FKL with three single similarity kernels and an average similarity kernel by using 10-fold CV and local LOOCV. The experiments are labeled as follows.

$$ {} \left\{ \begin{array}{lc} K_{1}^{m} ~~\& ~~K_{1}^{d} &K_{1}\\ K_{2}^{m} ~~\& ~~K_{2}^{d} &K_{2}\\ K_{3}^{m} ~~\& ~~K_{3}^{d} &K_{3}\\ avg\left(K_{1}^{m},K_{2}^{m},K_{3}^{m}\right) ~~\& ~~avg\left(K_{1}^{d},K_{2}^{d},K_{3}^{d}\right) &AVG\\ K_{m}^{*} ~~\& ~~K_{d}^{*} & FKL \end{array} \right. $$
(23)

The comparison results obtained by the 10-fold CV and local LOOCV are shown in Fig. 4.

Fig. 4

The AUCs and AUPRs of five models by the 10-fold CV and local LOOCV. a The AUCs of five models by the 10-fold CV. b The AUPRs of five models by the 10-fold CV. c The AUCs of five models by the local LOOCV. d The AUPRs of five models by the local LOOCV

In the 10-fold CV, the AUC of FKL is the highest among the five curves; the AUC difference between the FKL model and K1 is slight, but the difference in AUPR is obvious. Local LOOCV is a measure that reflects model performance well when we handle a new disease that has no known associations with miRNAs. In Fig. 4, under local LOOCV, the AUC of the average kernel is greater than that of the FKL kernel. In the process of FKL, we need to find an optimized μ to weight the kernels. Here, we get \(\mathcal {\mu }^{m}=\left (0.6610,0.3390,1.1562\times 10^{-9}\right)\) and \(\mathcal {\mu }^{d}=\left (1,9.1453\times 10^{-10},7.3854\times 10^{-10}\right)\), that is, the miRNA functional similarity kernel and the miRNA sequence similarity kernel are more important than the GIP kernel similarity, and the disease semantic similarity kernel is the most important of the three disease kernels. The model therefore loses part of the information in the weighting process. However, a new disease without any known association with miRNAs needs more detailed information from different aspects, and the average kernel method satisfies this requirement. That is why the AUC of the FKL model is lower than that of the average kernel, while the AUPR of the FKL model is higher. Moreover, AUPR evaluates classifier performance better when dealing with unbalanced datasets. Therefore, these results suggest that the FKL model performs best overall among the compared models.

Comparison with other multiple kernel learning methods

Several multiple kernel learning methods have been proposed to predict miRNA-disease associations, including Kronecker regularized least squares (KRLS) [39, 49] and kernelized Bayesian matrix factorization (KBMF) [32, 50]. We compare FKL with these two methods by using each of them to integrate the similarity kernels and predict potential associations. Then, we use 10-fold CV and local LOOCV to evaluate the performance of the three methods. The comparison results are shown in Fig. 5. In the 10-fold CV, the best AUC of 0.9584 and the best AUPR of 0.6431 are obtained by FKL. Compared with KRLS, FKL improves AUC by 0.0162 (0.9584 over 0.9422) and AUPR by 0.1201 (0.6431 over 0.5230). Compared with KBMF, FKL improves AUC by 0.0598 (0.9584 over 0.8986) and AUPR by 0.2005 (0.6431 over 0.4426). In local LOOCV, the best AUC of 0.8398 and the best AUPR of 0.2480 are also obtained by FKL. These results show that FKL is effective at uncovering associations between miRNAs and diseases.

Fig. 5

The AUCs and AUPRs of three multiple kernel learning methods by the 10-fold CV and local LOOCV. a The AUCs of three models by the 10-fold CV. b The AUPRs of three models by the 10-fold CV. c The AUCs of three models by the local LOOCV. d The AUPRs of three models by the local LOOCV

Comparison with matrix factorization

Matrix factorization (MF) methods are widely used in bioinformatics applications, including protein-protein interaction (PPI) prediction, drug-target interaction (DTI) prediction and drug response prediction. Therefore, we compare sparse LapRLS with four MF methods: Similarity-Regularized Matrix Factorization (SRMF) [51], Collaborative Matrix Factorization (CMF) [52], Neighborhood Regularized Logistic Matrix Factorization (NRLMF) [53] and Graph Regularized Matrix Factorization (GRMF) [54]. We use the same integrated similarity kernels with these five methods to predict potential associations, and adopt 10-fold CV and local LOOCV to evaluate the performance of the different methods. The results are shown in Fig. 6. In 10-fold CV, the best AUC of 0.9584 and the best AUPR of 0.6431 are obtained by sparse LapRLS. In local LOOCV, the best AUC of 0.8398 and the best AUPR of 0.2480 are also obtained by sparse LapRLS. This demonstrates that sparse LapRLS is reliable for predicting miRNA-disease associations.

Fig. 6

The comparison results between our method and other matrix factorization models by the 10-fold CV and local LOOCV. a The AUCs of five models by the 10-fold CV. b The AUPRs of five models by the 10-fold CV. c The AUCs of five models by the local LOOCV. d The AUPRs of five models by the local LOOCV

Comparison with other methods

We further compare the performance of FKL-Spa-LapRLS with nine computational prediction models (i.e., PBMDA [36], MCMDA [31], MaxFlow, NCPMDA [26], WBSMDA [25], HDMP [17], RLSMDA [27], LRSSLMDA [28], HGIMDA [55]), and the comparisons are shown in Table 1. In the local LOOCV, FKL-Spa-LapRLS obtains an AUC of 0.8398, which is slightly below NCPMDA (0.8584) and LRSSLMDA (0.8418). However, in the global LOOCV, our method obtains an AUC of 0.9563, which is superior to the results of the other methods. In the 5-fold CV, FKL-Spa-LapRLS obtains an AUC of 0.9535, which also outperforms the other methods. Therefore, FKL-Spa-LapRLS improves the prediction of miRNA-disease associations under different evaluation measures.

Table 1 The comparison results between our method and other nine computational models
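
The difference between the two LOOCV settings can be illustrated with a short sketch. It assumes the convention common in this literature (the definition is not repeated in this section): in global LOOCV the held-out association is ranked against the unknown pairs of all diseases, whereas in local LOOCV it is ranked only against the unknown miRNAs of the same disease. The function `loocv_rank` and the toy matrices are hypothetical, and the scores are assumed to come from a model trained with the held-out entry masked.

```python
def loocv_rank(scores, known, held_out, scope="global"):
    """Rank of one held-out known association among the unknown candidate pairs.

    scores   : miRNA x disease matrix of predicted scores (list of lists)
    known    : matrix of the same shape, 1 for a known association, else 0
    held_out : (miRNA index, disease index) of the left-out known association
    scope    : "global" ranks against all unknown pairs,
               "local" ranks against unknown pairs of the same disease only
    """
    i, j = held_out
    target = scores[i][j]
    n_mirna, n_disease = len(scores), len(scores[0])
    if scope == "global":
        candidates = [(r, c) for r in range(n_mirna) for c in range(n_disease)
                      if known[r][c] == 0]
    elif scope == "local":
        candidates = [(r, j) for r in range(n_mirna) if known[r][j] == 0]
    else:
        raise ValueError("scope must be 'global' or 'local'")
    # Rank 1 means that no unknown candidate scores higher than the held-out pair.
    return 1 + sum(1 for r, c in candidates if scores[r][c] > target)


# Toy data: 3 miRNAs x 2 diseases.
scores = [[0.5, 0.2],
          [0.7, 0.8],
          [0.1, 0.6]]
known = [[1, 0],
         [0, 1],
         [0, 0]]
print(loocv_rank(scores, known, (0, 0), "global"))  # 3
print(loocv_rank(scores, known, (0, 0), "local"))   # 2
```

Repeating this for every known association and sweeping a rank threshold yields the ROC curve from which each AUC is computed.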

Case studies

In this section, we study several important diseases to further validate the predictive power of our method. We use the known miRNA-disease associations included in HMDD to find potential miRNA-disease associations not included in HMDD, and verify the predicted results through two independent databases (dbDEMC [56] and miR2Disease [9]). dbDEMC and miR2Disease are commonly used as benchmark datasets for many models, such as PBMDA and LRSSLMDA. The dbDEMC database includes 2224 miRNAs, 36 cancer types and 20037 miRNA-disease associations obtained by high-throughput methods, and our model predicts the top five diseases: Colon Neoplasms, Gastric Neoplasms, Pancreatic Neoplasms, Colorectal Neoplasms and Esophageal Neoplasms. Furthermore, in previous work, Kidney Neoplasms, Breast Neoplasms and Lymphoma were used to infer their underlying associated miRNAs. Therefore, we use case studies of these eight diseases to analyze the performance of FKL-Spa-LapRLS.

We use two methods, global validation and local validation, to evaluate the predictive performance of our method in the case studies. In global validation, the 5430 known miRNA-disease associations in HMDD are used as the training set to discover potential associations. For each disease, we extract the top 50 candidate associations that are not covered by the training set, which gives 400 candidate associations in total; these are checked against the dbDEMC and miR2Disease databases. In local validation, all known associations of a specific disease are reset to unknown, and the remaining known associations are used as the training set. We again extract the top 50 candidate associations for this disease, which gives 400 candidate associations in total; these are checked against the HMDD, miR2Disease and dbDEMC databases.

The verification results for the eight diseases are listed in Table 2. The global verification value is the number of the top 50 candidate miRNAs confirmed by dbDEMC and miR2Disease, and the local verification value is the number confirmed by HMDD, dbDEMC and miR2Disease. Global verification confirms 47 of the top 50 candidates for Lymphoma, and local verification confirms all top 50 candidates for Breast Neoplasms and Colorectal Neoplasms.
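
The candidate selection and verification steps can be sketched as follows. This is an illustrative outline rather than the authors' implementation: `top_candidates` and `count_confirmed` are hypothetical names, the miRNA identifiers are placeholders, and the evidence sets stand in for the miRNAs that dbDEMC, miR2Disease or HMDD report for one disease.

```python
def top_candidates(scores, training_known, k=50):
    """Return the k highest-scoring miRNAs that are not in the training set.

    scores         : dict mapping miRNA name -> predicted score for one disease
    training_known : set of miRNAs already associated with the disease in training
    """
    unseen = [m for m in scores if m not in training_known]
    # Sort by descending score; break ties by name so the result is deterministic.
    unseen.sort(key=lambda m: (-scores[m], m))
    return unseen[:k]


def count_confirmed(candidates, *evidence_sets):
    """Count the candidates that appear in at least one evidence database."""
    confirmed = set().union(*evidence_sets)
    return sum(1 for m in candidates if m in confirmed)


# Toy data with placeholder names (not real predictions).
scores = {"mir-A": 0.95, "mir-B": 0.90, "mir-C": 0.70, "mir-D": 0.60}
training_known = {"mir-A"}          # global validation: excluded from candidates
db_one = {"mir-B"}                  # stands in for one verification database
db_two = {"mir-D"}                  # stands in for a second verification database

candidates = top_candidates(scores, training_known, k=2)
print(candidates)                                   # ['mir-B', 'mir-C']
print(count_confirmed(candidates, db_one, db_two))  # 1
```

For local validation, `training_known` would be empty for the disease under study, because all of its known associations are reset to unknown, and the held-out HMDD associations would be passed as an additional evidence set.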

Table 2 The verification results about eight neoplasms types

The results of the case studies and some notable miRNAs are shown in Figs. 7 and 8 (detailed results are given in Additional files 1, 2, 3, 4, 5, 6, 7 and 8). The green lines are the confirmed miRNA-disease associations, the red lines are the unconfirmed miRNA-disease associations, the black nodes are the eight neoplasms, and the brown nodes are the predicted miRNAs associated with the diseases. There are 400 associations in Fig. 7, and most of them are confirmed by global verification. In addition, many miRNAs are related only to Breast Neoplasms and to none of the other diseases, and nine associations are unconfirmed. The reason is that, of the 495 miRNAs in the training set, 202 have already been linked to Breast Neoplasms, so there is a high probability that the remaining miRNAs have no association with it. Similarly, 11 miRNAs predicted for Esophageal Neoplasms are not confirmed, because 74 miRNAs are already associated with Esophageal Neoplasms in the training set. By contrast, only a few unconfirmed miRNAs are associated with the other six diseases. In Fig. 7, hsa-mir-106b, hsa-mir-19b and hsa-mir-29c are associated with six of the eight diseases, and these miRNAs deserve more attention because they may reveal further associations. Moreover, hsa-mir-1 and hsa-mir-29a are predicted to be associated with five of the eight diseases, but these associations have not yet been verified experimentally.

In Fig. 8, most miRNAs act on several diseases. For a specific disease with no known miRNA associations, our method can reveal the miRNAs associated with it, and only 26 of the 400 associations cannot be confirmed by known experiments. These unconfirmed associations deserve further attention. In particular, hsa-let-7a, hsa-let-7b, hsa-mir-125b, hsa-mir-126, hsa-mir-145, hsa-mir-155, hsa-mir-181b, hsa-mir-20a, hsa-mir-21, hsa-mir-34a and hsa-mir-92a are associated with all eight diseases, and the miRNAs related to the eight neoplasms are highly similar. Therefore, it is very important to find more diseases related to these 11 miRNAs.
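
Counting how many diseases each candidate miRNA appears in, as done in the discussion above, is a simple tally over the per-disease top lists. The sketch below is illustrative only; `disease_counts` and `shared_mirnas` are hypothetical helpers, and the names in the toy data are placeholders rather than predictions from the paper.

```python
from collections import Counter


def disease_counts(top_lists):
    """Count, for every miRNA, the number of diseases whose top list contains it.

    top_lists : dict mapping disease name -> list of candidate miRNA names
    """
    counts = Counter()
    for candidates in top_lists.values():
        counts.update(set(candidates))  # set() so a miRNA counts once per disease
    return counts


def shared_mirnas(top_lists, min_diseases):
    """miRNAs that appear in the top lists of at least min_diseases diseases."""
    counts = disease_counts(top_lists)
    return sorted(m for m, c in counts.items() if c >= min_diseases)


# Toy data with placeholder names.
top_lists = {
    "disease-1": ["mir-A", "mir-B"],
    "disease-2": ["mir-A", "mir-C"],
    "disease-3": ["mir-A", "mir-B"],
}
print(shared_mirnas(top_lists, 2))  # ['mir-A', 'mir-B']
print(shared_mirnas(top_lists, 3))  # ['mir-A']
```

With the eight top-50 lists from the case studies as input, a threshold of six or eight would select the miRNAs highlighted in the text.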

Fig. 7

The case studies by global verification. The green lines are the confirmed candidate associations. The red lines are the unconfirmed candidate associations. The black nodes are the diseases. The brown nodes are the candidate miRNAs. First-class miRNAs are associated with multiple diseases. Second-class miRNAs are associated with one disease. Third-class miRNAs are important miRNAs associated with more than six diseases

Fig. 8

The case studies by local verification. The green lines are the confirmed candidate associations. The red lines are the unconfirmed candidate associations. The black nodes are the diseases. The brown nodes are the candidate miRNAs. First-class miRNAs are associated with multiple diseases. Second-class miRNAs are associated with one disease. Third-class miRNAs are important miRNAs associated with more than six diseases

Conclusions

In this paper, we propose the FKL-Spa-LapRLS model to uncover potential miRNA-disease associations. Using 10-fold CV and local LOOCV, we demonstrate that the FKL model is more effective than the average kernel method, and that the sparse kernel step has a positive effect on noise elimination in the similarity network. The LapRLS method contributes to the accuracy of finding potential miRNA-disease associations.

FKL-Spa-LapRLS has been compared with nine prediction methods that have achieved excellent performance in predicting miRNA-disease associations: PBMDA, MCMDA, MaxFlow, NCPMDA, WBSMDA, HDMP, RLSMDA, LRSSLMDA and HGIMDA. FKL-Spa-LapRLS has the highest AUC in 5-fold CV and global LOOCV, although its AUC is slightly lower than those of NCPMDA and LRSSLMDA in local LOOCV. To further analyze the performance of FKL-Spa-LapRLS, we carry out case studies of eight neoplasms. We find that 47 of the top 50 candidates are confirmed to be associated with Lymphoma in global verification and that all of the top 50 candidates are confirmed to be associated with Breast Neoplasms and Colorectal Neoplasms in local verification. Some miRNAs also deserve more attention.

Of course, FKL-Spa-LapRLS also has some limitations that need to be addressed in the future. For example, our method needs more similarity kernels constructed from additional information about gene-disease, disease-disease and miRNA-miRNA relationships, and it loses some detailed information in the FKL step when handling a new disease that has no known associations with miRNAs.

Abbreviations

CMF:

Collaborative matrix factorization

CV:

Cross validation

FKL:

Fast kernel learning

GIP:

Gaussian interaction profile

GRMF:

Graph regularized matrix factorization

HMDD:

Human microRNA disease database

KBMF:

Kernelized Bayesian matrix factorization

KRLS:

Kronecker regularized least squares

LapRLS:

Laplacian regularized least squares

LLS:

Log likelihood score

LOOCV:

Leave-one-out cross validation

NRLMF:

Neighborhood regularized logistic matrix factorization

SRMF:

Similarity-regularized matrix factorization

References

  1. Shi H, Zhang G, Zhou M, Cheng L, Yang H, Wang J, et al. Integration of Multiple Genomic and Phenotype Data to Infer Novel miRNA-Disease Associations. Plos ONE. 2016; 11(2):e0148521.

  2. Zou Q, Li J, Hong Q, Lin Z, Wu Y, Shi H, et al. Prediction of MicroRNA-Disease Associations Based on Social Network Analysis Methods. Biomed Res Int. 2015; 2015(10):810514.

  3. Yuan D, Cui X, Wang Y, Zhao Y, Li H, Hu S, et al. Enrichment Analysis Identifies Functional MicroRNA-Disease Associations in Humans. Plos ONE. 2015; 10(8):e0136285.

  4. Zou Q, Li J, Song L, Zeng X, Wang G. Similarity computation strategies in the microRNA-disease network: a survey. Brief Funct Genom. 2016; 15(1):55.

  5. Zeng X, Liu L, Lu L, Zou Q. Prediction of potential disease-associated microRNAs using structural perturbation method. Bioinformatics. 2018; 34:2425–32.

  6. Zeng X, Zhang X, Zou Q. Integrative approaches for predicting microRNA function and prioritizing disease-related microRNA using biological interaction networks. Brief Bioinform. 2016; 17(2):193.

  7. Mørk S, Pletscher-Frankild S, Palleja CA, Gorodkin J, Jensen LJ. Protein-driven inference of miRNA-disease associations. Bioinformatics. 2014; 30(3):392.

  8. Li Y, Qiu C, Tu J, Geng B, Yang J, Jiang T, et al. HMDD v2.0: a database for experimentally supported human microRNA and disease associations. Nucleic Acids Res. 2014; 42(Database issue):D1070.

  9. Jiang Q, Wang Y, Hao Y, Juan L, Teng M, Zhang X, et al. miR2Disease: a manually curated database for microRNA deregulation in human disease. Nucleic Acids Res. 2009; 37(1):D98–104.

  10. Peng L, Peng M, Liao B, Huang G, Liang W, Li K. Improved low-rank matrix recovery method for predicting miRNA-disease association. Sci Rep. 2017; 7(1):6007.

  11. Luo J, Ding P, Liang C, Chen X. Semi-supervised prediction of human miRNA-disease association based on graph regularization framework in heterogeneous networks. Neurocomputing. 2018; 294:29–38.

  12. Zhao Q, Xie D, Liu H, Wang F, Yan GY, Chen X. SSCMDA: spy and super cluster strategy for MiRNA-disease association prediction. Oncotarget. 2018; 9(2):1826–42.

  13. Liu Y, Zeng X, He Z, Quan Z. Inferring microRNA-disease associations by random walk on a heterogeneous network with multiple data sources. IEEE/ACM Trans Comput Biol Bioinform. 2016; PP(99):1–1.

  14. Shi H, Xu J, Zhang G, Xu L, Li C, Wang L, et al. Walking the interactome to identify human miRNA-disease associations through the functional link between miRNA targets and disease genes. BMC Syst Biol. 2013; 7(1):1–12.

  15. Luo J, Xiao Q. A novel approach for predicting microRNA-disease associations by unbalanced bi-random walk on heterogeneous network. J Biomed Inform. 2017; 66:194–203.

  16. Lan W, Wang J, Li M, Liu J, Wu FX, Pan Y. Predicting microRNA-disease associations based on improved microRNA and disease similarities. IEEE/ACM Trans Comput Biol Bioinform. 2016; PP(99):1–1.

  17. Xuan P, Han K, Guo M, Guo Y, Li J, Ding J, et al. Correction: Prediction of microRNAs Associated with Human Diseases Based on Weighted k Most Similar Neighbors. Plos ONE. 2013; 8(9):e70204.

  18. Jiang Q, Hao Y, Wang G, Juan L, Zhang T, Teng M, et al. Prioritization of disease microRNAs through a human phenome-microRNAome network. BMC Syst Biol. 2010; 4(S1):S2.

  19. Pasquier C, Gardès J. Prediction of miRNA-disease associations with a vector space model. Sci Rep. 2016; 6:27036.

  20. Yu Q, Zhang H, Cheng L, Xiao D. KATZMDA: Prediction of miRNA-disease associations based on KATZ model. IEEE Access. 2017; PP(99):1–1.

  21. Nalluri JJ, Kamapantula BK, Barh D, Jain N, Bhattacharya A, Almeida SSD, et al. DISMIRA: Prioritization of disease candidates in miRNA-disease associations based on maximum weighted matching inference model and motif-based analysis. BMC Genom. 2015; 16 Suppl 5(S5):S12.

  22. Liao B, Ding S, Chen H, Li Z, Cai L. Identifying human microRNA–disease associations by a new diffusion-based method. J Bioinform Comput Biol. 2015; 13(04):1550014.

  23. Zeng X, Liao Y, Liu Y, Zou Q. Prediction and Validation of Disease Genes Using HeteSim Scores. IEEE/ACM Trans Comput Biol Bioinform. 2016; 99:1–1.

  24. Chen X, Liu MX, Yan GY. RWRMDA: predicting novel human microRNA–disease associations. Mol BioSyst. 2012; 8(10):2792.

  25. Chen X, Yan CC, Zhang X, You ZH, Deng L, Liu Y, et al. WBSMDA: Within and Between Score for MiRNA-Disease Association prediction. Sci Rep. 2016; 6:21106.

  26. Gu C, Bo L, Li X, Li K. Network Consistency Projection for Human miRNA-Disease Associations Inference. Sci Rep. 2016; 6:36054.

  27. Chen X, Yan GY. Semi-supervised learning for potential human microRNA-disease associations inference. Sci Rep. 2014; 4:5501.

  28. Chen X, Huang L. LRSSLMDA: Laplacian Regularized Sparse Subspace Learning for MiRNA-Disease Association prediction. Plos Comput Biol. 2017; 13(12):e1005912.

  29. Fu L, Peng Q. A deep ensemble model to predict miRNA-disease association. Sci Rep. 2017; 7(1):14482.

  30. Jiang Q, Wang G, Zhang T, Wang Y. Predicting human microRNA-disease associations based on support vector machine. Int J Data Min Bioinform. 2011; 8(3):282–93.

  31. Li JQ, Rong ZH, Chen X, Yan GY, You ZH. MCMDA: Matrix completion for MiRNA-disease association prediction. Oncotarget. 2017; 8(13):21187.

  32. Lan W, Wang J, Li M, Liu J, Pan Y. Predicting microRNA-disease associations by integrating multiple biological information. In: IEEE International Conference on Bioinformatics and Biomedicine. Bioinformatics and Biomedicine: 2015. p. 183–8.

  33. Zeng X, Ding N, Rodríguez-Patón A, Quan Z. Probability-based collaborative filtering model for predicting gene–disease associations. BMC Med Genomics. 2017; 10(5):76.

  34. Chen X, Guan NN, Li JQ, Yan GY. GIMDA: Graphlet interaction-based MiRNA-disease association prediction. J Cell Mol Med. 2018; 22(3):1548–61.

  35. Shen Z, Zhang YH, Han K, Nandi AK, Honig B, Huang DS. miRNA-Disease Association Prediction with Collaborative Matrix Factorization. Complexity. 2017; 2017(9):1–9.

  36. You ZH, Huang ZA, Zhu Z, Yan GY, Li ZW, Wen Z, et al. PBMDA: A novel and effective path-based computational model for miRNA-disease association prediction. Plos Comput Biol. 2017; 13(3):e1005455.

  37. You ZH, Wang LP, Chen X, Zhang S, Li XF, Yan GY, et al. PRMDA: personalized recommendation-based MiRNA-disease association prediction. Oncotarget. 2017; 8(49):85568–83.

  38. Peng L, Chen Y, Ma N, Chen X. NARRMDA: negative-aware and rating-based recommendation algorithm for miRNA-disease association prediction. Mol BioSyst. 2017; 13:2650–59.

  39. Chen X, Niu YW, Wang GH, Yan GY. MKRMDA: multiple kernel learning-based Kronecker regularized least squares for MiRNA–disease association prediction. J Transl Med. 2017; 15(1):251.

  40. Wang D, Wang J, Lu M, Song F, Cui Q. Inferring the human microRNA functional similarity and functional network based on microRNA-associated diseases. Bioinformatics. 2010; 26(13):1644–50.

  41. Kozomara A, Griffiths-Jones S. miRBase: annotating high confidence microRNAs using deep sequencing data. Nucleic Acids Res. 2014; 42(Database issue):D68.

  42. Chen X, Niu YW, Wang GH, Yan GY. HAMDA: Hybrid Approach for MiRNA-Disease Association prediction. J Biomed Inform. 2017; 76:50–58.

  43. Lowe HJ, Barnett GO. Understanding and using the medical subject headings (MeSH) vocabulary to perform literature searches. Jama. 1994; 271(14):1103–8.

  44. Luo J, Xiao Q, Liang C, Ding P. Predicting MicroRNA-Disease Associations Using Kronecker Regularized Least Squares Based on Heterogeneous Omics Data. IEEE Access. 2017; 5(99):2503–13.

  45. Lee I, Blom UM, Wang PI, Shim JE, Marcotte EM. Prioritizing candidate disease genes by network-based boosting of genome-wide association data. Genome Res. 2011; 21(7):1109.

  46. Liang C, Wang G, Li J, Zhang T, Xu P, Wang Y. SIDD: A Semantically Integrated Database towards a Global View of Human Disease. Plos ONE. 2013; 8(10):e75504.

  47. He J, Chang SF, Xie L. Fast kernel learning for spatial pyramid matching: Computer Vision and Pattern Recognition; 2008, pp. 1–7.

  48. Xia Z, Zhou X, Sun Y, Wu LY. Semi-supervised Drug-Protein Interaction Prediction from Heterogeneous Spaces, Vol. 4; 2010. p. S6.

  49. Nascimento ACA, Prudencio RBC, Costa IG. A multiple kernel learning algorithm for drug-target interaction prediction. BMC Bioinformatics. 2016; 17(1):46.

  50. Gonen M, Kaski S. Kernelized Bayesian Matrix Factorization. IEEE Trans Pattern Anal Mach Intell. 2014; 36(10):2047–60.

  51. Wang L, Li X, Zhang L, Gao Q. Improved anticancer drug response prediction in cell lines using matrix factorization with similarity regularization. BMC Cancer. 2017; 17(1):513.

  52. Zheng X, Ding H, Mamitsuka H, Zhu S. Collaborative matrix factorization with multiple similarities for predicting drug-target interactions. In: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining: 2013. p. 1025–33.

  53. Liu Y, Wu M, Miao C, Zhao P, Li X. Neighborhood Regularized Logistic Matrix Factorization for Drug-Target Interaction Prediction. PLoS Comput Biol. 2016; 12(2):e1004760.

  54. Ezzat A, Zhao P, Wu M, Li X, Kwoh CK. Drug-Target Interaction Prediction with Graph Regularized Matrix Factorization. IEEE/ACM Trans Comput Biol Bioinform. 2017; 14(3):646–56.

  55. Chen X, Yan CC, Zhang X, You ZH, Huang YA, Yan GY. HGIMDA: Heterogeneous graph inference for miRNA-disease association prediction. Oncotarget. 2016; 7(40):65257–69.

  56. Yang Z, Ren F, Liu C, He S, Sun G, Gao Q, et al. dbDEMC: a database of differentially expressed miRNAs in human cancers. BMC Genomics. 2010; 11(Suppl 4):1–8.

Acknowledgements

Authors would like to thank the reviewers for their helpful comments on the original manuscript. Authors are grateful to the conference committee of The 29th International Conference on Genome Informatics (GIW 2018).

Funding

This work is supported by a grant from the National Science Foundation of China (NSFC 61772362) and the Tianjin Research Program of Application Foundation and Advanced Technology (16JCQNJC00200). Publication costs are funded by the NSFC 61772362.

Availability of data and materials

The code and all supporting data files are available from https://github.com/guofei-tju/FKL-Spa-LapRLS.

About this supplement

This article has been published as part of BMC Genomics Volume 19 Supplement 10, 2018: Proceedings of the 29th International Conference on Genome Informatics (GIW 2018): genomics. The full contents of the supplement are available online at https://bmcgenomics.biomedcentral.com/articles/supplements/volume-19-supplement-9.

Author information

Contributions

FG, YD and LJ conceived and designed the experiments; LJ performed the experiments and analyzed the data; YX wrote the paper. FG and JT supervised the experiments and reviewed the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Fei Guo.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no conflict of interest.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional files

Additional file 1

Table S1. The top 50 predicted miRNAs related to Colon Neoplasms. (XLSX 11 kb)

Additional file 2

Table S2. The top 50 predicted miRNAs related to Gastric Neoplasms. (XLSX 10 kb)

Additional file 3

Table S3. The top 50 predicted miRNAs related to Pancreatic Neoplasms. (XLSX 11 kb)

Additional file 4

Table S4. The top 50 predicted miRNAs related to Colorectal Neoplasms. (XLSX 10 kb)

Additional file 5

Table S5. The top 50 predicted miRNAs related to Esophageal Neoplasms. (XLSX 11 kb)

Additional file 6

Table S6. The top 50 predicted miRNAs related to Kidney Neoplasms. (XLSX 10 kb)

Additional file 7

Table S7. The top 50 predicted miRNAs related to Breast Neoplasms. (XLSX 11 kb)

Additional file 8

Table S8. The top 50 predicted miRNAs related to Lymphoma. (XLSX 11 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Jiang, L., Xiao, Y., Ding, Y. et al. FKL-Spa-LapRLS: an accurate method for identifying human microRNA-disease association. BMC Genomics 19 (Suppl 10), 911 (2018). https://doi.org/10.1186/s12864-018-5273-x

  • DOI: https://doi.org/10.1186/s12864-018-5273-x

Keywords