Identification of miRNA-mRNA regulatory modules by exploring collective group relationships

Background microRNAs (miRNAs) play an essential role in the post-transcriptional gene regulation in plants and animals. They regulate a wide range of biological processes by targeting messenger RNAs (mRNAs). Evidence suggests that miRNAs and mRNAs interact collectively in gene regulatory networks. The collective relationships between groups of miRNAs and groups of mRNAs may be more readily interpreted than those between individual miRNAs and mRNAs, and thus are useful for gaining insight into gene regulation and cell functions. Several computational approaches have been developed to discover miRNA-mRNA regulatory modules (MMRMs) with a common aim to elucidate miRNA-mRNA regulatory relationships. However, most existing methods do not consider the collective relationships between a group of miRNAs and the group of targeted mRNAs in the process of discovering MMRMs. Our aim is to develop a framework to discover MMRMs and reveal miRNA-mRNA regulatory relationships from the heterogeneous expression data based on the collective relationships. Results We propose DIscovering COllective group RElationships (DICORE), an effective computational framework for revealing miRNA-mRNA regulatory relationships. We utilize the notation of collective group relationships to build the computational framework. The method computes the collaboration scores of the miRNAs and mRNAs on the basis of their interactions with mRNAs and miRNAs, respectively. Then it determines the groups of miRNAs and groups of mRNAs separately based on their respective collaboration scores. Next, it calculates the strength of the collective relationship between each pair of miRNA group and mRNA group using canonical correlation analysis, and the group pairs with significant canonical correlations are considered as the MMRMs. We applied this method to three gene expression datasets, and validated the computational discoveries. Conclusions Analysis of the results demonstrates that a large portion of the regulatory relationships discovered by DICORE is consistent with the experimentally confirmed databases. Furthermore, it is observed that the top mRNAs that are regulated by the miRNAs in the identified MMRMs are highly relevant to the biological conditions of the given datasets. It is also shown that the MMRMs identified by DICORE are more biologically significant and functionally enriched. Electronic supplementary material The online version of this article (doi:10.1186/s12864-015-2300-z) contains supplementary material, which is available to authorized users.

There is a growing body of literature showing that multiple miRNAs are coordinated by forming cohesive groups to collectively regulate one or more pathways [16,17]. The collective relationships yielded between a group of miRNAs and a group of mRNAs due to the tendency of the group formation act as a vital force in catering similar functioning miRNAs and mRNAs together. Therefore, the collective relationships between cohesive groups of miRNAs and their targeted mRNAs may provide better understandings on robust and potent regulatory relationships of miRNA-mRNA regulatory modules (MMRMs).
Several algorithms have been proposed to identify MMRMs from expression data using different approaches including Bayesian network learning [18], rule induction [19], association rule mining [20], population-based probabilistic learning [21], probabilistic graphical model [22][23][24], matrix factorization [25], and graph mining [17,26]. Most of these existing methods do not consider the collective relationships between a group of miRNAs and the group of targeted mRNAs in the process of identifying MMRMs. In addition, many of them are either stochastic, or require prior knowledge such as number of modules to be identified, confirmed interactions, target site information [27].
Adapting a greedy overlapping neighborhood expansion clustering method, ClusterONE, which was developed to discover protein complexes from protein-protein interactions networks, Li et al. [27] proposed a clustering algorithm, Mirsynergy to detect synergistic miRNA regulatory modules. However, it requires and depends on the prior knowledge of confirmed gene-gene interactions. Recently Karim et al. [28] coined the notion of collective group relationships, and developed a method by integrating unweighted graphing mining concept and canonical correlation analysis to explore miRNA-mRNA regulatory relationships. However, it is noted that unweighted graph mining techniques are associated with limitation in representing the true interactions, and sometimes fail to capture correct regulatory relationships. Whereas weighted graph mining approaches can greatly improve the detection of the module structures [29], and hence regulatory relationships.
In this paper, we propose an effective computational framework, DIscovering COllective group RElationships (DICORE) to identify MMRMs and hence reveal miRNA-mRNA regulatory relationships from heterogeneous data. In order to extract MMRMs from the given gene expression datasets, we utilize the notion of collective group relationships, which provide MMRMs with additional quantitative strength information. The method finds a deterministic solution to the problem of discovering MMRMs from weighted bipartite graph representation of the given datasets, and rank the collective group relationships based on their strength of collective relationships. We apply DICORE to a dataset for Epithelial to Mesenchymal Transition, a breast cancer dataset, and a multi-class cancer dataset. Based on the knowledge from the literature, it is observed that the identified MMRMs exhibit enriched functionality with biological significance.

Problem statement
Consider two sets of variables X = {X 1 , . . . , X p } and Y = {Y 1 , . . . , Y q } such that X ∩ Y = ∅, representing the attributes of two different types of objects. In this paper, X and Y refer to the expression levels of a set of miRNAs and a set of mRNAs, respectively. With their given datasets, D X and D Y , having n matching miRNA and mRNA expression samples, our goal is to identify any C x ⊂ X and C y ⊂ Y, such that C x and C y are related, as a result of miRNAs in C x collaboratively interacting with mRNAs in C y and vice versa. We call (C x , C y ) a group pair, and the relationship between C x and C y a COllective group RElationship (in short, CORE). The COREs are characterized by both group pairs and the collective relationships among the two cohesive groups in group pairs. Then the group pair (C x , C y ) is an MMRM if the strength of the CORE between C x and C y is significant.
In order to discover COREs, and thus to identify MMRMs, we develop a two stages method, DIscovering CORE (DICORE). Two measures, collaboration score and canonical correlations, are employed in the two stages respectively. In the following, we firstly overview the workflow of DICORE, and then present the details of DICORE, including the definition of the collaboration score and the calculation of canonical correlations.
Overview of DICORE Figure 1 shows the workflow of DICORE. The overall workflow comprises a data pre-processing step and two main stages: (1) forming separate miRNA and mRNA groups and (2) searching for COREs. Given the inputs of miRNA and mRNA expression profiles, we first derive an expression-based interaction weights matrix W using correlation test. We then compute two collaboration score matrices S and T from W for miRNAs and mRNAs based on their functional interaction similarities with common mRNAs (or miRNAs), respectively. Using these collaboration scores as input, we separately generate groups of miRNAs and groups of mRNAs at Stage 1 by an overlapping neighborhood expansion clustering algorithm, in which miRNAs or mRNAs are greedily added to (removed from) each cluster of miRNAs or mRNAs, respectively that maximize cohesiveness score of the cluster. Next in Stage 2, we apply canonical correlation analysis on the groups of miRNAs and groups of mRNAs to obtain significant collective group relationships, which are eventually the MMRMs with strength scores In the data pre-processing step, DICORE first creates a weighted bipartite graph representation of the relationships among the individual variables of the given miRNA and mRNA expression profiles. Taking the variables as the vertices of a weighted bipartite graph G, a weighted edge is introduced between a miRNA variable and a mRNA variable to represent their interaction. Referring to Fig. 1, given p miRNAs and q mRNAs, let W denote the (p × q) miRNA-mRNA interaction weights matrix, where w ij is the interaction weight for miRNA i targeting mRNA j. To compute miRNA-mRNA interaction weights, we calculate the Pearson correlation coefficient (PCC) [25] between each pair of miRNA and mRNA using the R built-in function, cor. The obtained PCCs are within the range of [−1, 1], and the signed correlation coefficients provide two types of valuable information: the absolute values implying the strength of the miRNA-mRNA interactions (the higher the values, the stronger the interactions), and the signs indicating the directions of the associations. However, as the aim of the paper is to identify MMRMs (and thus to uncover miRNA-mRNA regulatory relationships), the collaboration score (explained in the next section) defined for discovering the modules considers the sum of the miRNA-mRNA correlations. In order to cater for both up and down miRNA regulations when calculating the total strength of the interactions, we use absolute values of the PCCs in the interaction weights matrix W, otherwise the signed PCCs or interaction weights will cancel out in Eq. (1).
Due to the higher possibility of dense interactions in the expression profile datasets, complete weighted graph mining may not be able to distinguish correct group structure. Accordingly we used a cutoff threshold η to trade off between the two extreme approaches namely complete unweighted graph mining and complete weighted graph mining.
At stage 1, we separately identify groups of miR-NAs and groups of mRNAs. Referring to Fig. 1, based on the interaction weights matrix W, we firstly calculate the collaboration score between each pair of miR-NAs and create the miRNA-miRNA collaboration matrix, S. The collaboration score between a pair of miRNAs reflects their similarity or collaboration in regulating target mRNAs (more details of collaboration scores are given in the next section). In a similar way, we compute the collaboration score between each pair of mRNAs (which implies their similarity in being regulated by miRNAs) and create the mRNA-mRNA collaboration matrix, T.
The identification of groups of miRNAs (or groups of mRNAs) is formulated as an overlapping clustering problem. Only the miRNAs (or mRNAs) that have strong collaboration between them are put in the same group, i.e. we use their collaboration scores as the similarity measure for the clustering. The clustering process is then aimed at maximizing the overall similarity of the miR-NAs (or mRNAs) within the same group. We define such overall similarity within a group as the cohesiveness of a group (details of the definition is provided in the next section). The underlying clustering algorithm adapts from ClusterONE, which was originally developed for protein protein interaction networks [29]. Adopting the idea from [25], we discard groups with fewer than 5 mRNAs (i.e. minimum size threshold for mRNAs, θ g = 5), as they usually do not provide relevant information. Similarly, we are not interested to consider groups having more than 500 mRNAs. Additionally, in order to avoid 'star-shaped' basic network structure, we choose 3 as minimum size threshold for miRNAs, θ m .
At stage 2, we use canonical correlation analysis to compute the strength of the collective relationships between groups of miRNAs and groups of mRNAs in terms of canonical correlations, and obtain COREs, which is eventually equivalent to MMRMs with additional quantitative information. We considered only the top COREs identified (i.e. the COREs with the higher canonical correlations), having minimum canonical correlation of ρ = 0.50.

Details of DICORE
In the following, we introduce the details of the collaboration score and how CCA is used to measure the strength of the collective group relationships.
The collaboration score expresses the degree of collaboration between two miRNAs (or between two mRNAs) considering their common interactions with mRNAs (or miRNAs). Given miRNA i, miRNA j ( = i) and the interaction weights matrix W, the collaboration score of the two miRNAs is calculated as follows: where l is the number of other possible components that both miRNA i and miRNA j interact with, in this case mRNAs, so l = q. Let S refer to the miRNA-miRNA Similarly, we compute the mRNA-mRNA collaboration score between mRNA i and mRNA j ( = i) by applying Eq. (1) on the transpose of the interaction weights matrix W, where l = p, the number of miRNAs. Let T refer to the mRNA-mRNA collaboration matrix of size q × q, where Notably, if W were a binary matrix, Eq. (1) became the ratio of number of target mRNAs shared between miRNA i and miRNA j over the numbers of target mRNAs possessed separately by miRNA i or miRNA j (or the ratio of number of common miRNAs regulate both mRNA i and mRNA j over the numbers of miRNAs individually regulate mRNA i or mRNA j). An miRNA (or an mRNA) i is then ranked by the total collaboration score as p k=1 s ik or q k=1 t ik .
Using collaboration scores as the similarity measures of pairs of miRNAs or pairs of mRNAs, miRNAs and mRNAs are clustered separately into cohesive groups by using a greedy strategy that maximize the cohesiveness score of groups. Similar to the cohesiveness defined in [29], we define cohesiveness score, cs(C i ) for any group C i as follows: where w int (C i ) denotes the sum of the collaboration scores of all the internal pairs of variables, i.e. each pair only contains variables within the group C i ; w ext (C i ) is the sum of the collaboration scores of all the external pairs, i.e. each pair contains one variable within the group C i and one variable outside the group C i ; and α * |C i | is a penalty term asserting the existence of unidentified interactions in the dataset, practically assuming that every component in C i has α additional interactions that are undetected due to the limitations in the experimental setting. DICORE uses canonical correlation analysis (CCA) [30] to compute the strength of the collective relationships between a group of miRNAs and a group of mRNAs in terms of the group pair's canonical correlations. CCA is commonly used for quantifying the linear association between two sets of variables. Consider A = a X, B = b Y be the corresponding linear combinations of sets of variables X and Y respectively, where a and b are coefficient vectors. Vectors a and b are chosen such that the correlation between A and B, i.e., is maximized, where XX , YY and XY are variance of X, variance of Y , and covariance between X and Y , respectively. The correlation r between the pair of linear combinations in Eq. (3) is called canonical correlation. Specifically, canonical correlation between a group of miRNAs and a group of mRNAs is computed using the R function CCA from the package PMA. The intuition behind applying CCA is twofold. Firstly CCA captures weight scores of all interactions between all miRNAs and mRNAs in both groups of a group pair, while computing the strength of the collective interactions of the group pair. As a consequence, CCA mitigates the loss of weight scores of interactions due to the application of cutoff threshold η earlier. Secondly, it also makes it possible for a group of miRNAs (or a group of mRNAs) to be included in more than one CORE i.e. one module, if the strength of collective interactions satisfies the specified threshold.

Data collection
Three real-world gene expression datasets are used to validate DICORE: an NCI60 dataset for Epithelial to Mesenchymal Transition, a breast-cancer (BR) dataset, and a multi-class cancer (MCC) dataset. The pre-processed differentially expressed gene expression datasets were collected from [31].
Epithelial to Mesenchymal Transition (EMT) is a biological process that enables cells to acquire migratory mesenchymal characteristics by losing epithelial features. The EMTs are associated with embryonic development, wound healing, organ fibrosis, and in the initiation of metastasis for cancer progression. The NCI60 dataset includes 60 cancer cell lines from the National Cancer Institute (NCI). Cell lines categorized as epithelial (11 samples) and mesenchymal (36 samples) were used for this work. As a result of the differential gene expression analysis, 1154 mRNAs and 35 miRNAs were identified to be differentially expressed at significant level (adjusted p-value < 0.05, adjusted by Benjamini-Hochberg (BH) method).
The BR dataset includes expression profiles of the 50 cell lines of breast cancer. The cell lines were categorized as luminal (27 cell lines) and basal (23 cell lines). In the dataset, 89 miRNAs (adjusted p-value < 0.02) and 1500 mRNAs (adjusted p-value < 0.0001) were identified to be differentially expressed.
The MCC dataset includes samples from multiple cancers namely bladder, breast, colon, kidney, lung, pancreas, prostate and uterus. Samples of the dataset classified as normal (21 samples) and tumor (67 samples) were used in this work. In total, 62 miRNAs and 1318 mRNAs were obtained to be differentially expressed at significant level (adjusted p-value < 0.05).
The datasets are available in Additional file 1. We used the expression data to calculate the miRNA-mRNA interaction weights matrix W. We obtained the interaction weights of W by computing the absolute values of the Pearson correlation coefficients between pairs of miRNA and mRNA.
In order to obtain the 'ground-truth' databases of experimentally confirmed miRNA-mRNA interactions, we combined the interactions from four popular interactions databases, namely DIANA-TarBase v7.0 [32], miRTarBase v4.5 [33], miRecords v2013 [34], and miRWalk v2.0 [35]. While miRWalk contains both predicted and experimentally validated miRNA-mRNA interactions, rest of the databases include high quality manually curated experimentally validated miRNA-mRNA interactions published in the literature. Recently published DIANA-TarBase v7.0 alone included more than half a million interactions utilizing cell types from 24 species. We also added a HITS-CLIP database [36], which lists the confirmed targets of two miRNAs, namely miR-200a and miR-200b. We extracted only the confirmed miRNA-mRNA interactions associated with the human miRNAs and mRNAs given in the input datasets, and removed the duplicate entries. Finally, we obtained 'ground-truth' databases of 2147, 5791, and 8733 unique miRNA-mRNA interactions for the 29 miRNAs in the NCI60 dataset (there are no confirmed interactions for the 6 miRNAs with the name prefix hsa-miRPlus-), 89 miRNAs in the BR dataset, and 62 miRNAs in the MCC dataset, respectively. Details of the 'ground-truth' databases are available in Additional file 2.

Results and discussions
We ran the experiment for all values for the cutoff threshold η in the range from 0 to 1 with a step size of 0.05. We only reported the summary and top results for each dataset. In each summary table, #C, mR, miR,r, and t denote the number of COREs identified, average number of mRNAs in COREs, average number of miRNAs in COREs, average strength of the COREs, and time taken for the execution in seconds, respectively. The group distributions and all COREs for all datasets are described in details on our website (visit [37]).

The NCI60 Dataset
The results obtained from the NCI60 dataset are summarized in Table 1. It is clear from the summary that potentially interesting results are obtained for the η values ranging from 0.60 to 0.85. By lowering the values of η, more miRNAs and mRNAs were added to these groups. For more in-depth analysis, we look more closely at some of the particular results.
We obtained the most informative result (in terms of the strength of COREs, and number of experimentally confirmed interactions covered) for η = 0.60, with 8 COREs involving 1 miRNA groups and 8 mRNA groups.  The only group of miRNAs 'm1N60' catered in total 19 miRNAs including the miR-200 family. On the other hand, we got the top mRNAs group (group having highest cohesiveness) 'g1N60' having 348 mRNAs. It included CDH1 (epithelial cadherin or in short E-cadherin, a classical member of the cadherin superfamily, which plays a vital role in EMT such that EMT is also characterized by repression of E-cadherin expression), ZEB1 (E-cadherin transcriptional repressor, which is usually targeted by miR-200 family), and TWIST1 (one of the important EMT inducers). Furthermore, another interesting result was obtained for η = 0.65. We got 6 COREs from 2 groups of miRNAs and 3 groups of mRNAs. The top miRNAs group 'm1N65' catered 14 miRNAs and is a proper subset of 'm1N60' . The second miRNAs group 'm2N65' included total 7 miRNAs including 3 miRNAs from the miR-17 miRNA gene family, namely miR-106b, miR-18a, and miR-18b.

The BR dataset
From the summary given in Table 2, it is seen that higher informative results were obtained for η values from 0.55 to 0.65 from the BR dataset. The most informative result was obtained for η = 0.60. We got 33 COREs from 2 groups of miRNAs and 17 groups of mRNAs. The top group of miRNAs included miR-221 and miR-222, both of them are known to play important regulation role in aggressive breast cancer [38].

Functional enrichment analysis of the COREs
A CORE consists of a group of miRNAs and a group of mRNAs, in which the individual interactions between miRNAs and mRNAs play a vital role. To demonstrate the effectiveness of DICORE, we identified the interactions in the obtained COREs and compared them with the experimentally confirmed interactions found in the 'ground-truth' databases. The confirmed interactions of the top COREs identified from the NCI60 dataset for η = 0.65 are summarized in Table 4. The confirmed interactions for the miRNAs in the miR-200 family included in the top CORE 'C1N65' are illustrated in Fig. 2 using an example CORE, where red nodes are miRNAs and green nodes are experimentally confirmed target mRNAs. The higher number of confirmed interactions demonstrated the effectiveness of DICORE.
We got similar higher experimentally confirmed interactions for top COREs identified from BR and MCC datasets. The experimentally confirmed interactions for top COREs identified from the three datasets are listed in Additional file 3.

Pathway analysis of the COREs
A biological pathway is a group of genes that participate in a particular biological process to perform certain functionality in a cell. To find the controlling factors of a disease, it is meaningful to study the genes by considering their pathway information.
We used the GeneCodis [39] online tool at [40] to conduct pathway enrichment analysis of the COREs with the focus on significant Kyoto Encyclopedia of Genes and Genomes (KEGG) [41] pathways (adjusted  COREs are significantly associated with the KEGG pathway: Pathways in cancer. Since the three datasets are all cancer datasets, the results demonstrate that the identified COREs are closely related to the biological conditions of their respective datasets. Again, we used GeneGo Metacore [42] from GeneGo Inc. to identify the pathways previously discovered in the literature that involve the mRNAs in the identified top  COREs. Table 8 shows the first 10 pathways as well as some other related pathways identified for another top CORE 'C1N65' from the NCI60 dataset. It confirms that 'C1N65' is highly relevant to the biological condition of the dataset. For instance, pathways number 1, 8, 11, 14 and 20 are direct pathways of the development of EMT, and others are important pathways involved in the process of EMT. Moreover, pathway number 1 includes total 12 members, of which 7 were identified in 'C1N65' . The pathway enrichment analysis has clearly justified the use of CCA in ranking the COREs, as the top ranked COREs show higher biological significance, and represent the given datasets. The detailed information of significant pathways identified from the three datasets is summarized in Additional file 4.

Implication of the COREs in cancer
Since all of the input datasets included the expression profiles of miRNAs and genes associated with cancer samples, it is expected that the COREs identified from those datasets are to be related to cancer. To verify this, we used a cancer miRNA benchmark dataset of 147 miRNAs from a review article of [43]. Each of these miRNAs was reported in the literature to be dysregulated in one or more cancers. The NCI60 dataset has 14 miRNAs from the benchmark, and except for miR-205, rest 13 are included in the top COREs. Both the top COREs 'C1N60' and 'C1N65' from the NCI60 dataset included 9 of the 14 miRNAs (namely miR-141, miR-148b, miR-200a, miR-200b, miR-200c, miR-203, miR-301a, miR-32, miR-7), which are associated with different cancers like Glioblastoma, Prostate, Lung, Bladder, Colon, Breast, Esophageal, Colorectal, Hepatocarcinoma, Ovarian, squamous cell carcinoma of tongue (SCCT), and Pancreatic.
Again, among these 147 miRNAs, 34 miRNAs are relevant to breast cancer. The BR dataset has 7 miRNAs out of these 34, of which 4 are identified in the top COREs.

Targets prediction for miRNAs of the COREs
In this section, we report a set of novel miRNA-mRNA interactions for further experiments. These miRNA-mRNA interactions identified by DICORE are the predicted targets of conserved miRNA families in TargetScan v6.2 [4,46]. Fig 3 visualizes the predicted interactions in a model interaction representation of the CORE 'C1N65' , where red nodes are miRNAs, yellow nodes are conserved target mRNAs, and white nodes are poorly conserved target mRNAs. Predicted (conserved) interactions for top COREs from the three databases are given in Additional file 5.

Comparison with other methods
We summarize here the comparison study of the result of DICORE with the results of a few recent methods Mirsynergy [27], SNMNMF [25], and PIMiM [47] reported in [27]. We obtained the same ovarian cancer (OVC) dataset processed in [25]. The original miRNA and gene expression profiles for 385 ovarian cancer samples were downloaded from [48]. The expression dataset contains measurements of 559 miRNAs and 12456 mRNAs.
In case of performing a comparison study, our initial intention was to compare the result of DICORE with the results of two other methods, Mirsynergy [27] and SNM-NMF [25] by applying them to the three cancer datasets (NCI60, BR and MCC) used for validating DICORE.
However, both Mirsynergy and SNMNMF require as their input the gene-gene interactions (GGIs) derived from protein-protein interactions and transcription factor binding sites, which, according to [25] and [27], are to be obtained from the two datasets, BioGrid [49] and TRANSFAC [50]. Unfortunately, we could only manage to get the GGIs associated with the three cancer datasets from BioGrid. As a consequence, the results we obtained from Mirsynergy using these three cancer datasets were not good. Therefore to make a fair comparison with the two methods, we apply our method to the dataset (the OVC dataset) on which Mirsynergy and SNMNMF have had their results reported in the literature.
Similar to the setting used in [27], we used the absolute values of only negative interaction weights of W and same pair of values for the density thresholds, and set 2 for the penalty value in calculating cohesiveness scores. In addition, we set = (θ m , θ g ) = 2 for both groups of miRNAs and mRNAs due to the requirement for calculating CCA, as CCA can not be applied on groups having less than 2 components. Table 9 shows a summary of the performance of the four methods. DICORE identified 56 modules with an average of 8. We report here two interesting modules. Firstly, the module or CORE 'C9O35' consists of 4 miRNAs, namely miR-29c*, miR-29a, miR-29b, and miR-29c from the same miR-29 family. The human miR-29 family of miRNAs is known to be associated with ovarian cancer [43,51]. The pathway analysis of this module also shows association with cancer (see Table 10).
The final module structure of Mirsynergy is heavily depended on the initial clustering of miRNAs and the prior knowledge of gene-gene interactions. If Mirsynergy gets c clusters of miRNAs in the first stage, finally it will produce at most c miRNA regulatory modules. On contrary, DICORE separately performs clustering of miRNAs and mRNAs based on their functional interactions with mRNAs and miRNAs, respectively. This allows two distinct groups of mRNAs functioning differently to be part  of different modules despite the fact that they are interacting with the same group of miRNAs. Furthermore, it also allows a group of miRNAs to interact with more than one group of miRNAs, which is common in biological sense.

Related works
Several computational approaches had been proposed to discover MMRMs. The concept of MMRMs was introduced by [18] to denote groups of co-expressed miR-NAs and their targets mRNAs. They drew a similarity between predicting MMRMs and mining frequent itemset by mapping the set of miRNAs and the set of target mRNAs to a frequent itemset and its cover, respectively. They proposed a prediction method adopting bipartite graphs to model binding structures of the miRNAs and mRNAs at the sequence level. However, prediction based on sequence may not be sufficient to correctly predict the complex interactions. Improved versions of this method had been proposed which also take into account coherent expression patterns between miRNAs and mRNAs, or the (anti)-correlations measured between each pair of miRNAs and mRNAs [19,21,26]. Joung et al. [21] integrated expression profiles of miRNAs and mRNAs with sequence information by using a biclustering approach. The approach reduced false discovery rate significantly. A rule based method was utilized by Tran et al. [19] based on the assumption that miRNAs and mRNAs of a module have similar expression patterns. However, these existing methods for discovering MMRMs suffered from several limitations. For example, Peng et al. [26] proposed a sequential integrative method based on enumerating maximal bicliques in a combined miRNA-gene network. Their method was sensitive to noise in the data, and produced too many star structures (one miRNA, many genes) which were not usable to explore miRNA combinatorial regulation. The functional MMRMs (FMMRMs) are associated with MMRMs with specific biological conditions. For FMMRMs discovery, [22] and [20] proposed different methods at around the same time. Joung and Fei [22] proposed an unsupervised method which applied the authortopic model [53,54] in bioinformatics. The method used the expression profiles of miRNAs and the putative miRNA target information, without considering the expression profiles of miRNAs. As the miRNA target information is predicted at the sequence level, it encountered similar difficulty of [18] in explaining regulation pattern of miRNAs in their target genes in the identified modules. On the other hand, Liu et al. [20] proposed a supervised method which utilized association rule mining method by associating the reverse expression patterns of miRNAs and genes with biological conditions. However, they only considered down-regulation patterns.
In order to discover FMMRMs, Liu et al. [23] applied another probabilistic graphical model, correspondence Latent Dirichlet Allocation (Corr-LDA) [55], that had been applied to automatic image annotation with caption words. By associating topics to functional modules, images to miRNAs, and words to mRNAs, respectively, the method was applied to a mouse model dataset for human breast cancer research. The method simultaneously identified FMMRMs using the expression profiles of both miRNAs and genes, with or without using target relationships between miRNAs and mRNAs. The Corr-LDA was extended and applied to identify functional regulatory module, and each module corresponds to a particular biological function. In the model, each function was represented as a latent topic, and the numeric values of expression data were converted to the counts of expression events, similar to the counts of words in a documents. Another similar semi-supervised method based on a probabilistic model which is closely associated with the Latent Dirichlet Allocation [56] was proposed in [24]. The idea of extracting topics with caption words to FMMRMs discovery by mapping topics to functional modules, documents to samples, and words to mRNAs, respectively.
The main drawback of these methods is that they did not consider the collective relationships in identifying the modules, which result in regulatory modules that may not quite correct modeling of the real biological systems. Recently Karim et al. [28] came up with the idea of collective group relationships, and proposed a method to explore miRNA-mRNA regulatory relationships. They integrated two complementary approaches associated with relationships of complex systems, namely graph mining and CCA to discover collective relationships with both quantitative and qualitative information. However, the proposed method considered unweighted graph, which are prone to make computational inaccuracy due to the approximation of many interaction weights to either 1 (interaction) or 0 (no interaction). Recently Li et al. [27] proposed a clustering algorithm, Mirsynergy to detect synergistic miRNA regulatory modules. They used mRNA and miRNA expression profiles, target site information and gene-gene interactions for ovarian, breast, and thyroid cancers from TCGA [57] and obtained significantly higher enrichment than existing methods. However, it partially used collective relationships in stage 1, and but in stage 2 depended on the prior knowledge of confirmed gene-gene interactions.
This paper presents a novel method that discover MMRMs by considering the collective relationships as the driving force in identifying the miRNA-mRNA regulatory relationships. Furthermore, it uses the idea of ranking the identified modules by the quantitative measure of the strength of the collective relationships between the groups in group pairs.

Conclusion
In this paper, we have used the notation of CORE, and proposed a computational framework DICORE to discover MMRMs. The central idea of DICORE is to consider the collective group relationship, and discover both the groups and collective relationships simultaneously. We have applied a greedy-based overlapping clustering approach adapted from ClusterONE [29] to group miRNAs and mRNAs separately based on their collective interactions with mRNAs and miRNAs respectively, and integrate CCA in order to enrich the identification of groups with both structural link information and strength of collective relationships. We have experimented on three real-world biological datasets. The experimental results have demonstrated that the proposed method DICORE is able to reveal correct group information with structural link information and the strength of collective relationships, and provide useful insights into the structure and functionality of the miRNA-mRNA regulatory relationships in MMRMs.
The proposed framework has also opened a few interesting research windows for further investigation. Instead of using Pearson correlation coefficient to calculate the interaction weights matrix, other approaches including statistical methods like maximal information coefficient [58], regression techniques like Lasso [59], causal inference method like IDA [31] can be applied. Considering the context of the datasets, any of the individual methods or an ensemble method [31] can be tested and reported. Furthermore, the strength of the collective interactions can be determined by applying other similar mathematical models to capture all possible association between two sets of variables. Another interesting future work will be to apply the framework to discover MMRMs from datasets obtained under different biological conditions.