FKL-Spa-LapRLS: an accurate method for identifying human microRNA-disease association

Background: In the post-transcriptional process, microRNAs (miRNAs) are closely related to various complex human diseases. Traditional experimental verification of miRNA-disease associations is time-consuming and expensive, so it is especially important to design computational methods for detecting potential associations. To overcome the limitations of previous computational methods for predicting potential miRNA-disease associations, we develop the FKL-Spa-LapRLS model (Fast Kernel Learning Sparse kernel Laplacian Regularized Least Squares).
Results: First, we extract three miRNA similarity kernels and three disease similarity kernels. Then, we combine these kernels into a single kernel through the Fast Kernel Learning (FKL) model, and use a sparse kernel (Spa) to eliminate noise in the integrated similarity kernel. Finally, we find the associations via Laplacian Regularized Least Squares (LapRLS). Under three evaluation methods, global and local leave-one-out cross validation (LOOCV) and 5-fold cross validation, our method achieves AUCs of 0.9563, 0.8398 and 0.9535, respectively, which indicates that it is reliable. We then use case studies of eight neoplasms to further analyze the performance of our method. Most of the predicted miRNA-disease associations are confirmed by previous experimental studies, and some miRNAs that are associated with more neoplasms than other miRNAs deserve more attention.
Conclusions: Our proposed model can reveal miRNA-disease associations and improve the accuracy of association prediction for various diseases. Our method can also be easily extended with more similarity kernels.
Electronic supplementary material: The online version of this article (10.1186/s12864-018-5273-x) contains supplementary material, which is available to authorized users.


Background
MicroRNAs (miRNAs) are a class of non-coding RNAs with 20 to 25 nucleotides [1]. In the process of post-transcription, miRNAs are a part of messenger RNA (mRNA) sequences and affect protein synthesis [2][3][4]. Previous studies have shown that miRNAs are related to various diseases, including cancers. For example, the expression level of hsa-mir-21 leads [...] The Human MicroRNA Disease Database (HMDD) [8] collects 572 miRNAs, 378 diseases and 10368 miRNA-disease associations. The miR2Disease database [9] includes 349 miRNAs, 163 diseases and 3273 miRNA-disease associations. The dbDEMC database contains 2224 miRNAs, 36 cancer types and 20037 miRNA-disease associations obtained through high-throughput methods. These curated associations promote the development of computational methods.
Computational methods have achieved excellent performance in finding potential miRNA-disease associations [10][11][12][13][14]. Most of these methods are based on the assumption that miRNAs with high similarity tend to be related to similar diseases and vice versa [15,16]. Xuan et al. [17] proposed HDMP, which scores a miRNA by weighting its k most similar neighbors; a larger score indicates a higher possibility of association with a specific disease. However, HDMP cannot work for a new disease without known related miRNAs. Jiang et al. [18] devised a hypergeometric distribution-based model to calculate the score of each miRNA for a specific disease, where a miRNA with a larger score is more likely to be associated with this disease. The scores of these two methods are based on miRNA neighbor information, which ignores the global information of the miRNA similarity network. Many models find miRNA-disease associations based on similarity networks [19][20][21][22][23]. Chen et al. developed the RWRMDA model [24], which uses the miRNA functional similarity network and the known miRNA-disease association network, and applies a random walk model to find potential miRNA-disease associations. However, RWRMDA faces the same problem as HDMP, because of the initial nonzero vector. Therefore, Chen et al. [25] proposed WBSMDA to find potential associations by integrating the miRNA functional similarity network, disease semantic similarity and the known miRNA-disease association network. For the similarity between two miRNAs or two diseases, WBSMDA integrates the Gaussian Interaction Profile (GIP) kernel similarity for miRNAs and diseases, and calculates the association probability of a miRNA-disease pair using the Within-Score and Between-Score of the disease and the miRNA. Gu et al. [26] developed NCPMDA by constructing novel similarity kernels for miRNAs and diseases via matrix operations and calculating the space projection scores of miRNA and disease. The final score between a miRNA and a disease is calculated by combining the two space projection scores. The predictive performance of NCPMDA is superior to that of previous methods for a disease without any known related miRNAs [13].
Many previous models are based on defining and minimizing a cost function. Chen et al. [27] developed RLSMDA, a semi-supervised method that minimizes a Regularized Least Squares cost function to uncover potential miRNAs associated with various diseases. After that, Chen et al. [28] proposed LRSSLMDA to reveal potential associations between miRNAs and diseases. LRSSLMDA constructs comprehensive statistical features and graph theoretic features by combining the miRNA and disease similarity kernels. Then, a Laplacian regularization term is added to the objective function. Experimental results demonstrate that LRSSLMDA is a valuable computational model. In addition, many previous methods are based on machine learning algorithms [29,30], matrix completion [31][32][33] and graph theory [34]. For example, Shen et al. [35] proposed CMFMDA, which uses WKNKN to estimate the association probability for unknown miRNA-disease pairs and uses Collaborative Matrix Factorization to uncover potential associations. You et al. [36] developed PBMDA, which constructs a heterogeneous graph by integrating five networks, scores all paths for a miRNA-disease pair, and calculates the association possibility as the sum of all path scores. PBMDA achieves remarkable performance in finding potential miRNA-disease associations.
All of the above methods have achieved remarkable results, but they still have limitations. For example, most existing methods are based on the assumption that miRNAs with high similarity tend to be related to similar diseases. When constructing miRNA and disease similarity kernels, most studies use the functional similarity and GIP kernel similarity for miRNAs, and the semantic similarity and GIP kernel similarity for diseases. To integrate two similarity kernels, many works simply sum or average them [29,37,38]. Therefore, there is an urgent need for an effective method for integrating multiple miRNA and disease similarity kernels [39].
In this paper, we first extract the miRNA functional similarity, the miRNA sequence similarity and the GIP kernel similarity for miRNAs, and the disease semantic similarity, the disease functional similarity and the GIP kernel similarity for diseases. Then, we use the Fast Kernel Learning method to construct one miRNA similarity kernel and one disease similarity kernel. Finally, we propose a novel Sparse Laplacian Regularized Least Squares method to uncover miRNA-disease associations. Three evaluation methods are used to assess performance: global Leave-One-Out Cross Validation (global LOOCV), local Leave-One-Out Cross Validation (local LOOCV) and 5-fold cross validation (5-fold CV). Under these three evaluation methods, our method obtains remarkable performance (AUCs of 0.9563, 0.8398 and 0.9535, respectively) compared with nine other models. We also use case studies of eight neoplasms to further analyze the performance of our method. We find that 47 of the top 50 candidates are confirmed to be associated with Lymphoma in global verification, and all top 50 candidates are confirmed to be associated with Breast and Colorectal Neoplasms in local verification. Moreover, we find that some miRNAs deserve more attention for uncovering more associations with various neoplasms, including hsa-mir-106b, hsa-mir-19b, hsa-mir-29c, hsa-mir-1, hsa-mir-29a and so on.

Methods
We first use three miRNA similarity kernels and three disease similarity kernels to uncover potential miRNA-disease associations. Then, we combine these similarity kernels into one miRNA similarity kernel and one disease similarity kernel using Fast Kernel Learning, and sparsify the two combined kernels. Finally, we use Laplacian Regularized Least Squares to construct a loss function and obtain predicted association matrices from the miRNA space and the disease space, respectively. Figure 1 is the flow chart of our method.

Human miRNA-disease associations dataset
In this paper, the set of miRNAs is denoted by $M = \{m_i\}_{i=1}^{m}$, and the set of diseases is denoted by $D = \{d_j\}_{j=1}^{n}$, where $m$ and $n$ are the numbers of miRNAs and diseases, respectively. The associations between miRNAs and diseases are downloaded from the HMDD database and include 5430 associations between 495 miRNAs and 383 diseases. The associations are represented by a binary matrix $Y \in \mathbb{R}^{m \times n}$ with $y_{i,j} \in \{0, 1\}$: if miRNA $m_i$ is associated with disease $d_j$, $y_{i,j}$ is set to 1; otherwise, $y_{i,j}$ is set to 0.

MiRNA similarity
Based on the assumption that miRNAs with high similarity tend to be associated with the same disease, we extract three classes of miRNA similarity, including functional similarity, sequence similarity and GIP kernel similarity.

MiRNA functional similarity
In previous work, the MISIM method [40] proposed by Cui et al. calculated miRNA functional similarity scores. We extract the functional similarity scores of the 495 miRNAs through MISIM and construct the kernel $K_{m,1} \in \mathbb{R}^{m \times m}$ to represent the miRNA functional similarity network, in which $K_{m,1}(m_i, m_j)$ is the functional similarity score between miRNAs $m_i$ and $m_j$.

MiRNA sequence similarity
All 495 miRNA sequences are downloaded from the miRBase database [41]. We extract the miRNA sequence similarity using the Needleman-Wunsch algorithm and obtain the kernel $K_{m,2} \in \mathbb{R}^{m \times m}$ to represent the miRNA sequence similarity network, in which $K_{m,2}(m_i, m_j)$ is the sequence similarity score between miRNAs $m_i$ and $m_j$.
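The global alignment step can be illustrated with a minimal sketch of the Needleman-Wunsch score. The scoring values (match = 1, mismatch = −1, linear gap = −1) and the toy sequences are assumptions for illustration only; the scoring scheme and any normalization of the raw score into a similarity value are not specified in this section.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of sequences a and b with a linear gap penalty.

    The scoring parameters are illustrative assumptions, not the paper's settings.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


# Toy RNA-like strings (hypothetical, not taken from miRBase)
score_same = needleman_wunsch_score("ACGU", "ACGU")
score_gap = needleman_wunsch_score("ACGU", "ACU")
```

Only one row of the dynamic programming table is kept at a time, which is sufficient when the score, and not the alignment itself, is needed.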

GIP kernel similarity for miRNAs
The GIP kernel similarity [29,38,42] between miRNAs $m_i$ and $m_j$ is denoted as $K_{m,3} \in \mathbb{R}^{m \times m}$ and is calculated by Eq. (1), where $IP(m_i) \in \mathbb{R}^{1 \times n}$ denotes the interaction profile of miRNA $m_i$, which records whether $m_i$ is associated with each disease, that is, the i-th row of the association matrix $Y$; $\gamma_m$ controls the kernel bandwidth and is set to −1 in this paper.
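As an illustration of the interaction-profile construction, the following sketch computes a Gaussian kernel from the rows (or columns) of a toy association matrix. It assumes the common form $K(i,j) = \exp(-\gamma \lVert IP_i - IP_j \rVert^2)$; this sign convention is an assumption, chosen because it agrees with the remarks in the parameter study that the kernel becomes all ones at $\gamma = 0$ and has entries larger than 1 at $\gamma = -1$. The function name and the toy matrix are hypothetical.

```python
import numpy as np


def gip_kernel(profiles, gamma):
    """Gaussian interaction profile kernel over the rows of `profiles`.

    Assumed form: K[i, j] = exp(-gamma * ||IP_i - IP_j||^2).
    """
    sq_norms = np.sum(profiles ** 2, axis=1)
    # Pairwise squared Euclidean distances between interaction profiles
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against tiny negative round-off
    return np.exp(-gamma * sq_dist)


# Toy association matrix: 3 miRNAs (rows) x 3 diseases (columns)
Y = np.array([[1, 0, 1],
              [1, 0, 0],
              [0, 1, 0]], dtype=float)

K_m3 = gip_kernel(Y, gamma=-1.0)    # miRNA kernel from the rows of Y
K_d3 = gip_kernel(Y.T, gamma=-1.0)  # disease kernel from the columns of Y
```

With a negative bandwidth the entries grow with the distance between profiles, so a normalization step before kernel integration is needed, as noted in the parameter selection discussion.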

Disease similarity
We extract three classes of disease similarity, including semantic similarity, functional similarity and GIP kernel similarity.

Disease semantic similarity
In previous research [37,40], a disease $d_i$ is described as a node in a Directed Acyclic Graph (DAG) based on the MeSH [43] database (https://www.nlm.nih.gov/bsd/disted/meshtutorial/themeshdatabase/). The semantic contribution factor used in this description is set to 0.5 in this paper.
Then, we define the semantic score of disease $d_i$ by Eq. (3).
Therefore, we denote the disease semantic similarity as $K_{d,1} \in \mathbb{R}^{n \times n}$, and the disease semantic similarity value between $d_i$ and $d_j$ is calculated by Eq. (4).
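A minimal sketch of a DAG-based semantic similarity is given below. It assumes the formulation commonly used with MeSH DAGs: a disease contributes 1 to its own semantics, each ancestor receives the largest contribution of its children multiplied by the factor 0.5, the semantic score of a disease is the sum of these contributions, and the similarity of two diseases is the shared contribution divided by the sum of the two semantic scores. This formulation, the function names and the toy DAG are assumptions and may differ in detail from Eqs. (3) and (4).

```python
def semantic_contributions(disease, parents, delta=0.5):
    """Contribution of `disease` and each of its ancestors to the semantics of `disease`."""
    contrib = {disease: 1.0}
    frontier = [disease]
    while frontier:
        node = frontier.pop()
        for parent in parents.get(node, ()):
            value = delta * contrib[node]
            # Keep the largest contribution reachable through any child
            if value > contrib.get(parent, 0.0):
                contrib[parent] = value
                frontier.append(parent)
    return contrib


def semantic_similarity(d_i, d_j, parents, delta=0.5):
    """Shared semantic contribution divided by the sum of both semantic scores."""
    c_i = semantic_contributions(d_i, parents, delta)
    c_j = semantic_contributions(d_j, parents, delta)
    shared = set(c_i) & set(c_j)
    numerator = sum(c_i[t] + c_j[t] for t in shared)
    return numerator / (sum(c_i.values()) + sum(c_j.values()))


# Hypothetical toy DAG (child -> list of parents); not real MeSH entries
toy_parents = {"d1": ["p"], "d2": ["p"], "p": ["root"]}
sim_12 = semantic_similarity("d1", "d2", toy_parents)
sim_11 = semantic_similarity("d1", "d1", toy_parents)
```

In the toy DAG, the two sibling diseases share the ancestors "p" and "root", which gives a similarity of 3/7, while a disease compared with itself scores 1.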

Disease functional similarity
Disease-gene and gene-gene associations are widely used to understand disease similarity [44]. From the HumanNet [45] database, we download the gene interactions; each interaction has a log-likelihood score (LLS) that measures the probability of a functional linkage between two genes. The LLS scores are normalized by Eq. (5), where $LLS(g_i, g_j)$ represents the LLS between the i-th and j-th genes; $LLS^{*}(g_i, g_j)$ represents the LLS after normalization; $LLS_{min}$ and $LLS_{max}$ indicate the minimum and maximum LLS in HumanNet, respectively. The functional similarity score between two genes is defined as Eq. (6), where $S_{HumanNet}$ indicates the gene-gene associations in the HumanNet database and $e(i, j)$ indicates the association between the i-th and j-th genes. Then, the functional similarity score between a gene $g$ and a gene set $G$ is defined as Eq. (7).
In many cases, a disease $d_i$ is related to many genes, which are defined as the gene set $G_i$; the associations between diseases and genes are downloaded from SIDD [46]. The disease functional similarity score is defined as Eq. (8).
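The chain from gene-gene scores to disease functional similarity can be sketched as follows. The sketch assumes min-max normalization of the LLS values, a gene similarity of 1 for identical genes and 0 for unlinked genes, the maximum over a gene set for the gene-to-set score, and a symmetric average over both gene sets for the disease score. These choices follow the description in the text but are assumptions about the exact form of Eqs. (5) to (8); all names and toy values are hypothetical.

```python
def normalize_lls(lls):
    """Min-max normalize log-likelihood scores keyed by frozenset({gene_a, gene_b})."""
    lo, hi = min(lls.values()), max(lls.values())
    return {pair: (score - lo) / (hi - lo) for pair, score in lls.items()}


def gene_similarity(g_i, g_j, lls_norm):
    """1 for identical genes, normalized LLS for linked genes, 0 otherwise."""
    if g_i == g_j:
        return 1.0
    return lls_norm.get(frozenset((g_i, g_j)), 0.0)


def gene_to_set_similarity(g, gene_set, lls_norm):
    """Best match between gene g and any gene of the set."""
    return max(gene_similarity(g, other, lls_norm) for other in gene_set)


def disease_functional_similarity(G_i, G_j, lls_norm):
    """Symmetric average of gene-to-set scores over both gene sets."""
    total = sum(gene_to_set_similarity(g, G_j, lls_norm) for g in G_i)
    total += sum(gene_to_set_similarity(g, G_i, lls_norm) for g in G_j)
    return total / (len(G_i) + len(G_j))


# Hypothetical toy network and disease gene sets
toy_lls = {frozenset(("g1", "g2")): 1.0,
           frozenset(("g2", "g3")): 3.0,
           frozenset(("g1", "g3")): 2.0}
toy_norm = normalize_lls(toy_lls)
G_a, G_b = {"g1", "g2"}, {"g3"}
sim_ab = disease_functional_similarity(G_a, G_b, toy_norm)
```

Note that under min-max normalization the weakest link in the network receives a score of 0, which makes it indistinguishable from an absent link.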

GIP kernel similarity for diseases
Similar to the GIP kernel similarity for miRNAs, the GIP kernel similarity for diseases is denoted as $K_{d,3} \in \mathbb{R}^{n \times n}$ and is calculated by Eq. (9), where $IP(d_i) \in \mathbb{R}^{m \times 1}$ denotes the interaction profile of disease $d_i$, which records whether $d_i$ is associated with each miRNA, that is, the i-th column of the association matrix $Y$; $\gamma_d$ controls the kernel bandwidth and is set to −1 in this paper.

Fast kernel learning
Considering that a single similarity kernel cannot cover all information between miRNAs, we integrate $K_{m,1}$, $K_{m,2}$ and $K_{m,3}$ into a new integrated similarity kernel $K_m \in \mathbb{R}^{m \times m}$ using the Fast Kernel Learning (FKL) method [47]. We define $K_m$ as Eq. (10).
$K_m$ should be close to the association matrix $Y$. We define the miRNA association similarity $Y_m$ as Eq. (11).
Therefore, we find $\mu_m \in \mathbb{R}^{1 \times 3}$ using Eq. (12), which minimizes the distance between $K_m$ and $Y_m$.
To avoid overfitting in the learning procedure, a regularization term is added to the objective as in Eq. (13), where $\lambda_m$ is set to 200 in this paper. We use CVX in MATLAB R2017a to solve this optimization problem and obtain the integration weights $\mu_m \in \mathbb{R}^{1 \times 3}$ for the miRNA functional similarity, miRNA sequence similarity and GIP kernel similarity. Therefore, the integrated miRNA similarity kernel is defined as Eq. (14).
Similarly, we obtain the integration weights $\mu_d \in \mathbb{R}^{1 \times 3}$ for the disease semantic similarity, disease functional similarity and GIP kernel similarity by FKL, and the integrated disease similarity kernel is defined as Eq. (15).
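The weight-learning step can be sketched as a small constrained least-squares problem. The sketch assumes that the target is $Y_m = Y Y^{T}$, that the objective is $\lVert \sum_i \mu_i K_i - Y_m \rVert_F^2 + \lambda \lVert \mu \rVert^2$, and that the weights are non-negative and sum to one (consistent with the weights reported later in the paper); these are assumptions about Eqs. (10) to (13). In place of CVX, it uses projected gradient descent, which is a swapped-in solver and not the authors' implementation.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {mu : mu >= 0, sum(mu) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    theta = css[rho] / (rho + 1.0)
    return np.maximum(v - theta, 0.0)


def fast_kernel_learning(kernels, Y, lam=200.0, n_iter=5000):
    """Learn kernel weights mu by projected gradient descent (assumed objective)."""
    target = (Y @ Y.T).ravel()                          # assumed target Y Y^T
    A = np.stack([K.ravel() for K in kernels], axis=1)  # one column per kernel
    G = A.T @ A + lam * np.eye(len(kernels))
    b = A.T @ target
    step = 1.0 / (2.0 * np.linalg.eigvalsh(G)[-1])      # 1 / Lipschitz constant
    mu = np.full(len(kernels), 1.0 / len(kernels))
    for _ in range(n_iter):
        gradient = 2.0 * (G @ mu - b)
        mu = project_to_simplex(mu - step * gradient)
    return mu


# Toy example: the first kernel matches Y Y^T exactly, the second does not
Y = np.eye(2)
K1 = np.eye(2)
K2 = np.array([[0.0, 1.0], [1.0, 0.0]])
mu_unregularized = fast_kernel_learning([K1, K2], Y, lam=0.0)
mu_regularized = fast_kernel_learning([K1, K2], Y, lam=200.0)
```

In this toy case the unregularized weights concentrate on the matching kernel, while a large regularization coefficient pulls the weights toward uniform; how strongly this effect appears depends on the scale of the kernels relative to the coefficient.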

Laplacian regularized least squares
Given the similarity kernels of miRNAs and diseases, we use Sparse Laplacian Regularized Least Squares (Spa-LapRLS) to obtain a new association matrix and find potential miRNA-disease associations. Spa-LapRLS consists of a sparse kernel model and a LapRLS model.

Sparse kernel model
We use a Top-k Neighbor model to reduce noise in the integrated similarity kernel. For the miRNA subspace, we construct a weight matrix $w_m \in \mathbb{R}^{m \times m}$ for $K_m$ by the Top-k Neighbor method, whose elements are defined as Eq. (16), where $k$ satisfies $0 < k < m$; $T(k, i)$ represents the k-th largest element of the i-th row of $K_m$ and $T(k, j)$ represents the k-th largest element of the j-th column of $K_m$. We then record the denoised miRNA similarity kernel $K^{*}_m \in \mathbb{R}^{m \times m}$ as Eq. (17). Similarly, we calculate the denoised disease similarity kernel $K^{*}_d \in \mathbb{R}^{n \times n}$.
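The Top-k Neighbor weighting can be sketched as below. Because Eq. (16) is not reproduced here, the three-level weighting is an assumption: an entry receives weight 1 if it is among the k largest values of both its row and its column, 0.5 if it is among the k largest of only one of them, and 0 otherwise, and the denoised kernel is the element-wise product of the weights and the kernel. The function name and the toy kernel are hypothetical.

```python
import numpy as np


def sparsify_kernel(K, k):
    """Top-k neighbor weighting followed by element-wise reweighting (assumed form)."""
    row_thresh = np.sort(K, axis=1)[:, -k]  # k-th largest value in each row
    col_thresh = np.sort(K, axis=0)[-k, :]  # k-th largest value in each column
    in_row_top_k = K >= row_thresh[:, None]
    in_col_top_k = K >= col_thresh[None, :]
    # 1 if in both top-k sets, 0.5 if in exactly one, 0 if in neither
    W = 0.5 * (in_row_top_k.astype(float) + in_col_top_k.astype(float))
    return W * K


K = np.array([[1.0, 0.8, 0.1],
              [0.8, 1.0, 0.3],
              [0.1, 0.3, 1.0]])
K_star = sparsify_kernel(K, k=2)
```

For a symmetric input kernel the result remains symmetric, since the row and column thresholds coincide.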

LapRLS for miRNA-disease interaction prediction
Given a pair of similarity kernels, $K^{*}_m$ for miRNAs and $K^{*}_d$ for diseases, we first apply least squares on the two subspaces and add a Laplacian regularization term to avoid overfitting. For the miRNA subspace, the objective function of LapRLS [48] is defined as Eq. (18), where $F_m = K^{*}_m \alpha_m \in \mathbb{R}^{m \times n}$ is the predicted association matrix from the miRNA space; $\beta_m$ is the regularization coefficient, which is set to $2^{-5}$ in this paper; $\alpha_m$ is updated by Eq. (19), following [48].
The derivation of the optimization algorithm is presented in [48]. In this way, the predicted association matrix for all miRNA-disease pairs from the view of miRNAs is obtained as Eq. (20).
Similarly, we obtain the predicted association matrix for all miRNA-disease pairs from the view of diseases as Eq. (21), where $F_d = K^{*}_d \alpha_d \in \mathbb{R}^{n \times m}$; $\beta_d$ is the regularization coefficient, which is also set to $2^{-5}$ in this paper.
In the end, the predicted association matrix combining the views of miRNAs and diseases is defined as Eq. (22), where $F^{*} \in \mathbb{R}^{m \times n}$.
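The prediction step can be illustrated with a minimal sketch of a standard LapRLS formulation. It assumes the objective $\lVert Y - K\alpha \rVert_F^2 + \beta\,\mathrm{tr}(\alpha^{T} K L K \alpha)$ with the symmetric normalized Laplacian $L = I - D^{-1/2} K D^{-1/2}$, whose stationary point is $\alpha = (K + \beta L K)^{-1} Y$, and it assumes that the two views are combined by averaging $F_m$ with the transpose of $F_d$. These are assumptions about Eqs. (18) to (22); the function names and toy matrices are hypothetical.

```python
import numpy as np


def normalized_laplacian(K):
    """Symmetric normalized graph Laplacian I - D^{-1/2} K D^{-1/2}."""
    d_inv_sqrt = 1.0 / np.sqrt(K.sum(axis=1))
    return np.eye(len(K)) - d_inv_sqrt[:, None] * K * d_inv_sqrt[None, :]


def laprls_predict(K, Y, beta=2.0 ** -5):
    """Closed-form LapRLS prediction F = K (K + beta * L K)^{-1} Y (assumed form)."""
    L = normalized_laplacian(K)
    alpha = np.linalg.solve(K + beta * L @ K, Y)
    return K @ alpha


def combined_prediction(K_m, K_d, Y, beta_m=2.0 ** -5, beta_d=2.0 ** -5):
    """Average the miRNA-view and disease-view predictions (assumed combination)."""
    F_m = laprls_predict(K_m, Y, beta_m)    # shape m x n
    F_d = laprls_predict(K_d, Y.T, beta_d)  # shape n x m
    return 0.5 * (F_m + F_d.T)


# Toy kernels and associations: 2 miRNAs, 3 diseases
K_m = np.array([[1.0, 0.5], [0.5, 1.0]])
K_d = np.array([[1.0, 0.2, 0.0], [0.2, 1.0, 0.3], [0.0, 0.3, 1.0]])
Y = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
F_star = combined_prediction(K_m, K_d, Y)
```

With the regularization coefficient set to zero and an invertible kernel, the prediction reproduces the training matrix exactly, which shows that the Laplacian term is what smooths scores across similar miRNAs or diseases.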

Results and discussion
In this section, we study the performance of our method in predicting unknown miRNA-disease associations from different aspects. First, we establish three evaluation methods and two assessment indicators to evaluate the accuracy of our method. Second, we analyze the performance of our method with different parameters by using 10-fold CV and local LOOCV. Third, we employ 10-fold CV and local LOOCV to analyze the performance of the FKL model. Fourth, we compare the performance of LapRLS with multiple matrix factorization methods. Fifth, we compare the performance of FKL-Spa-LapRLS with nine outstanding methods. Finally, for further validation, we implement global and local verifications on eight neoplasms as case studies.

Evaluation criteria
In this paper, we implement 10-fold CV, global LOOCV and local LOOCV to evaluate the prediction accuracy of our method. In the 10-fold CV, all miRNA-disease associations are randomly divided into ten disjoint groups; each group in turn is used as the test set and the other nine groups are used as the training set. In the global LOOCV, all 5430 verified miRNA-disease associations are considered, and each association in turn is left out as the test sample while the other known associations are used as training samples. In the local LOOCV, only the miRNAs for a specific disease are considered: for disease $d_i$, each miRNA related to $d_i$ is left out as the test set, and the other associations are used as the training set. All miRNA-disease associations in the test set are reset to 0 in the association matrix $Y$.
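The masking of test associations in k-fold CV can be sketched as follows. The function name, the random seed and the toy matrix are assumptions for illustration; the sketch only shows how known associations are split into disjoint folds and reset to 0 in the training matrix.

```python
import numpy as np


def k_fold_masks(Y, n_folds=10, seed=0):
    """Yield (Y_train, test_pairs) with the test associations reset to 0."""
    rng = np.random.default_rng(seed)
    known = np.argwhere(Y == 1)  # (row, column) indices of known associations
    rng.shuffle(known)           # shuffle the pairs in place
    for fold in np.array_split(known, n_folds):
        Y_train = Y.copy()
        Y_train[fold[:, 0], fold[:, 1]] = 0
        yield Y_train, fold


# Toy association matrix with 6 known associations
Y = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 1, 0],
              [0, 0, 1]])
folds = list(k_fold_masks(Y, n_folds=3))
```

Each yielded training matrix can then be passed to the prediction model, and the scores of the held-out pairs are ranked against the scores of the unknown pairs.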
In our study, we use the Area Under the ROC Curve (AUC) and the Area Under the Precision-Recall curve (AUPR) as assessment criteria. AUC is the area under the receiver operating characteristic (ROC) curve, created by plotting the true positive rate against the false positive rate at various threshold settings. An AUC of 1 indicates perfect performance and an AUC of 0.5 indicates random performance. AUPR is the area under the curve created by plotting precision against recall at various threshold settings. The greater the AUPR, the better the performance of the model.
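Both criteria can be computed directly from labels and scores, as in the sketch below. AUC is computed through its rank interpretation (the fraction of positive-negative pairs that are ordered correctly, with ties counted as one half), and AUPR is approximated by average precision, which is a step-wise summary of the precision-recall curve; this approximation is an assumption and may differ slightly from an interpolated area. The function names and toy values are hypothetical.

```python
def auc_score(labels, scores):
    """AUC as the fraction of correctly ordered positive-negative pairs."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive sample."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits


toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
toy_auc = auc_score(toy_labels, toy_scores)
toy_aupr = average_precision(toy_labels, toy_scores)
```

In the toy example three of the four positive-negative pairs are ordered correctly, so the AUC is 0.75, and the two positives are found at ranks 1 and 3, so the average precision is (1 + 2/3) / 2.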

Parameter selection
In this section, we use 10-fold CV and local LOOCV to analyze several parameters, including $\gamma_m$, $\gamma_d$, $\lambda_m$, $\lambda_d$, $\beta_m$, $\beta_d$ and $k$.
$\gamma_m$ and $\gamma_d$ are the parameters used in constructing the GIP kernel similarity for miRNAs and diseases, respectively. We use only the GIP kernel similarity to predict potential miRNA-disease associations and use 10-fold CV to evaluate the performance of the GIP kernel with different parameters. We vary $\gamma_m$ and $\gamma_d$ from −10 to 10 with step 1 and calculate the corresponding AUCs. The results are shown in Fig. 2a. The performance of the GIP similarity kernel is sensitive to $\gamma_m$ and $\gamma_d$, and the optimal AUC is obtained when $\gamma_m$ and $\gamma_d$ are equal to 0. However, according to Eqs. (1) and (9), $K_{m,3}$ and $K_{d,3}$ are matrices with ones in all elements when the two parameters are equal to 0. Therefore, we adopt the suboptimal values $\gamma_m = -1$ and $\gamma_d = -1$ in this paper. Since most of the elements in the GIP similarity kernel are greater than 1, we need to normalize the GIP similarity kernel before integrating multiple kernels.
$\lambda_m$ and $\lambda_d$ are the regularization coefficients of FKL. We use different values of $\lambda_m$ and $\lambda_d$ to integrate the three miRNA similarity kernels and the three disease similarity kernels, respectively. Then we use the integrated similarity kernels and LapRLS to uncover potential associations and use 10-fold CV to evaluate the performance of FKL with different parameters. $\lambda_m$ and $\lambda_d$ are varied from 0 to 15000 with step 100 in order to find the best value. The results are shown in Fig. 2b. The AUC fluctuates only slightly in the range from 0 to 15000, which demonstrates that FKL is insensitive to the regularization coefficient. Therefore, $\lambda_m$ and $\lambda_d$ are set to 200 in this paper.
$\beta_m$ and $\beta_d$ are the regularization coefficients of LapRLS. We vary $\beta_m$ and $\beta_d$ from $2^{-10}$ to $2^{10}$ and adopt 10-fold CV to evaluate the performance of LapRLS with different parameters. The results are shown in Fig. 2c. The AUC fluctuates only slightly in the range from $2^{-10}$ to $2^{-2}$, and changes obviously when $\beta_m$ and $\beta_d$ are greater than $2^{-2}$. We select the optimal $\beta_m$ and $\beta_d$ by the highest AUC value and set both to $2^{-5}$ in this paper.
Meanwhile, the value of $k$ in the sparse kernel process is an important parameter. We use 10-fold CV and local LOOCV to analyze $k$, which is varied from 20 to 250 with step 5; the results are shown in Fig. 3. It can be clearly seen that the sparse kernel process has a positive effect on the discovery of potential miRNA-disease associations. In this study, $k$ is set to 20 in the 10-fold CV and global LOOCV, and to 40 in the local LOOCV.

FKL performance analysis
In this section, we analyze the performance of FKL. First, we compare FKL with single kernels and the average kernel by 10-fold CV and local LOOCV. Then, we compare FKL with two multiple kernel learning methods by 10-fold CV and local LOOCV.

Comparison with single kernel and average kernel
We compare the prediction performance of FKL with three single similarity kernels and an average similarity kernel by using 10-fold CV and local LOOCV. The experiments are labeled as follows.
The comparison results obtained by the 10-fold CV and local LOOCV are shown in Fig. 4.
In the 10-fold CV, the AUC of FKL is the highest among the five curves; the AUC difference between the FKL model and $K_1$ is slight, but the difference in AUPR is obvious. Local LOOCV measures model performance for a new disease that has no known associations with miRNAs. In Fig. 4, the AUC of the average kernel is greater than that of the FKL kernel. In the process of FKL, we need to find an optimized $\mu$ to weight the kernels. Here, we get $\mu_m = (0.6610, 0.3390, 1.1562 \times 10^{-9})$ and $\mu_d = (1, 9.1453 \times 10^{-10}, 7.3854 \times 10^{-10})$; that is to say, the miRNA functional similarity kernel and the miRNA sequence similarity kernel are more important than the GIP kernel similarity, and the disease semantic similarity kernel is the most important of the three disease kernels. The model therefore loses part of the information in the weighting process. However, a new disease without any known association with miRNAs needs more detailed information from different aspects, and the average kernel method satisfies this requirement. That is why the AUC of the FKL model is lower than that of the average kernel, while the AUPR of the FKL model is higher. Moreover, AUPR evaluates classifier performance better when dealing with an unbalanced dataset. Therefore, the FKL model performs best overall among the compared kernels.

Comparison with other multiple kernel learning methods
Several multiple kernel learning methods have been proposed to predict microRNA-disease associations, including Kronecker regularized least squares (KRLS) [39,49] and kernelized Bayesian matrix factorization (KBMF) [32,50]. We use FKL and each of these two methods to integrate the similarity kernels and predict potential associations. Then, we use 10-fold CV and local LOOCV to evaluate the performance of the three methods. The comparison results are shown in Fig. 5. In local LOOCV, the best AUC of 0.8398 and the best AUPR of 0.2480 are also obtained by FKL. This shows that FKL is effective at uncovering associations between miRNAs and diseases.

Comparison with matrix factorization
Matrix factorization (MF) methods are widely used in bioinformatics applications, including protein-protein interaction (PPI) prediction, drug-target interaction (DTI) prediction, drug response prediction, and so on. Therefore, we compare sparse LapRLS with four MF methods: Similarity-Regularized Matrix Factorization (SRMF) [51], Collaborative Matrix Factorization (CMF) [52], Neighborhood Regularized Logistic Matrix Factorization (NRLMF) [53] and Graph Regularized Matrix Factorization (GRMF) [54]. We use the same integrated similarity kernels with these five methods to predict potential associations, and adopt 10-fold CV and local LOOCV to evaluate their performance. The results are shown in Fig. 6. In 10-fold CV, the best AUC of 0.9584 and the best AUPR of 0.6431 are obtained by sparse LapRLS. In local LOOCV, the best AUC of 0.8398 and the best AUPR of 0.2480 are also obtained by sparse LapRLS. This demonstrates that sparse LapRLS is reliable for predicting miRNA-disease associations.

Case studies
In this section, we study several important diseases to further validate the predictive power of our method. We utilize the known miRNA-disease associations included in HMDD to find potential miRNA-disease associations not included in HMDD, and verify the predicted results through two independent databases (dbDEMC [56] and miR2Disease [9]). In fact, dbDEMC and miR2Disease are commonly used as benchmark datasets for many models, such as PBMDA and LRSSLMDA. The dbDEMC database includes 2224 miRNAs, 36 cancer types and 20037 miRNA-disease associations obtained by high-throughput methods, and our model predicts the top five diseases, including Colon Neoplasms, Gastric Neoplasms, Pancreatic Neoplasms, Colorectal Neoplasms and Esophageal Neoplasms. Furthermore, in previous work, Kidney Neoplasms, Breast Neoplasms and Lymphoma were used to infer their underlying associated miRNAs. Therefore, we use case studies of eight diseases to analyze the performance of FKL-Spa-LapRLS in this section. We implement two methods, global validation and local validation, to evaluate the predictive performance of our method in the case studies. In global verification, the 5430 known miRNA-disease associations in HMDD are used as the training set to discover potential associations. For each disease, we extract the top 50 candidate associations that are not covered by the training set, and check all 400 candidate associations against the dbDEMC and miR2Disease databases. In the local validation, all known associations related to a specific disease are reset to unknown, and the other known associations are used as the training set to discover potential associations. We also extract the top 50 candidate associations for this specific disease, and check all 400 candidate associations against the HMDD, miR2Disease and dbDEMC databases.
Fig. 5 The AUCs and AUPRs of three multiple kernel learning methods by the 10-fold CV. a The AUCs of three models by the 10-fold CV. b The AUPRs of three models by the 10-fold CV. c The AUCs of three models by the local LOOCV. d The AUPRs of three models by the local LOOCV
The verification results of the eight diseases are listed in Table 2. In Table 2, the global verification is the number of associations among the top 50 miRNAs confirmed by dbDEMC and miR2Disease, and the local verification is the number of associations confirmed by HMDD, dbDEMC and miR2Disease. Table 2 shows that 47 of the top 50 candidates are confirmed to be associated with Lymphoma by global verification, and local verification confirms that all top 50 candidates are associated with Breast and Colorectal Neoplasms.
The results of case studies and some special miRNAs are shown in Figs. 7 and 8 (detailed results in Additional files 1, 2, 3, 4, 5, 6, 7 and 8). The green lines are the confirmed miRNA-disease associations, the red lines are the unconfirmed miRNA-disease associations, the black nodes are the eight neoplasms, and the brown nodes are the predicted miRNAs associated with diseases. There are 400 associations in Fig. 7, and most of the miRNA-disease associations are confirmed by the global verification. In addition, there are many miRNAs that are only related to Breast Neoplasms but they have [...] In Fig. 7, we can see that hsa-mir-106b, hsa-mir-19b and hsa-mir-29c are associated with six out of eight diseases, and these miRNAs should be paid more attention to reveal more associations. Moreover, hsa-mir-1 and hsa-mir-29a are expected to be associated with five out of eight diseases, but these associations have not yet been verified by experiments. In Fig. 8, we can find that most miRNAs act on various diseases. For a specific disease with unknown associations with miRNAs, our method can reveal the miRNAs associated with it, and only 26 associations out of 400 cannot be confirmed by known experiments. These unconfirmed associations deserve more attention. In particular, hsa-let-7a, hsa-let-7b, hsa-mir-125b, hsa-mir-126, hsa-mir-145, hsa-mir-155, hsa-mir-181b, hsa-mir-20a, hsa-mir-21, hsa-mir-34a and hsa-mir-92a are associated with all eight diseases. We also find that the related miRNAs among the eight neoplasms are highly similar. Therefore, it is very important to find more diseases related to these 11 miRNAs.
Fig. 6 The comparison results between our method and other matrix factorization models by the 10-fold CV and local LOOCV. a The AUCs of five models by the 10-fold CV. b The AUPRs of five models by the 10-fold CV. c The AUCs of five models by the local LOOCV. d The AUPRs of five models by the local LOOCV

Conclusions
In this paper, we propose the FKL-Spa-LapRLS model to uncover potential miRNA-disease associations. Using 10-fold CV and local LOOCV, we demonstrate that the FKL model is more effective than the average kernel method, and that the sparse kernel process has a positive effect on noise elimination in the similarity network. The LapRLS method contributes to the accuracy of finding potential miRNA-disease associations. FKL-Spa-LapRLS has been compared with nine prediction methods that have achieved excellent performance for the prediction of miRNA-disease associations, including PBMDA, MCMDA, MaxFlow, NCPMDA, WBSMDA, HDMP, RLSMDA, LRSSLMDA and HGIMDA. FKL-Spa-LapRLS has the highest accuracy in 5-fold CV and global LOOCV, although it is slightly lower than NCPMDA and LRSSLMDA in local LOOCV. To further analyze the performance of FKL-Spa-LapRLS, we implement case studies of eight neoplasms. We find that 47 of the top 50 candidates are confirmed to be associated with Lymphoma in global verification and all top 50 candidates are confirmed to be associated with Breast and Colorectal Neoplasms in local verification, and that some miRNAs deserve more attention.
Fig. 8 The case studies by local verification. The green lines are the confirmed candidate associations. The red lines are the unconfirmed candidate associations. The black nodes are the diseases. The brown nodes are the candidate miRNAs. First class miRNA represents miRNA associated with multiple diseases. Second class miRNA represents miRNA associated with one disease. Third class miRNA represents important miRNA associated with more than six diseases
FKL-Spa-LapRLS also has some limitations that need to be addressed in the future. For example, our method needs more similarity kernels constructed from additional gene-disease, disease-disease and miRNA-miRNA information, and it loses some detailed information in the FKL process when handling a new disease without known associations with miRNAs.