 Methodology article
 Open Access
 Published:
Identification of microRNAmRNA modules using microarray data
BMC Genomics volume 12, Article number: 138 (2011)
Abstract
Background
MicroRNAs (miRNAs) are posttranscriptional regulators of mRNA expression and are involved in numerous cellular processes. Consequently, miRNAs are an important component of gene regulatory networks and an improved understanding of miRNAs will further our knowledge of these networks. There is a manytomany relationship between miRNAs and mRNAs because a single miRNA targets multiple mRNAs and a single mRNA is targeted by multiple miRNAs. However, most of the current methods for the identification of regulatory miRNAs and their target mRNAs ignore this biological observation and focus on miRNAmRNA pairs.
Results
We propose a twostep method for the identification of manytomany relationships between miRNAs and mRNAs. In the first step, we obtain miRNA and mRNA clusters using a combination of miRNAtarget mRNA prediction algorithms and microarray expression data. In the second step, we determine the associations between miRNA clusters and mRNA clusters based on changes in miRNA and mRNA expression profiles. We consider the miRNAmRNA clusters with statistically significant associations to be potentially regulatory and, therefore, of biological interest.
Conclusions
Our method reduces the interactions between several hundred miRNAs and several thousand mRNAs to a few miRNAmRNA groups, thereby facilitating a more meaningful biological analysis and a more targeted experimental validation.
Background
MicroRNAs are short (2022 nt) nonprotein coding RNA sequences that are involved in the posttranscriptional regulation of genes either by mRNA cleavage and degradation or by repressing the translation of mRNA into proteins. Though they are a relatively recent discovery, they are of immense biological interest given their regulatory role in numerous cellular processes, e.g. some miRNAs can act as oncogenes or tumorsuppressors in humans [1–3].
The identification of regulatory miRNAs and their target mRNAs is a major combinatorial challenge because a single miRNA regulates multiple mRNAs and several miRNAs coregulate a single mRNA. The manytomany relationship between miRNAs and mRNAs results in a powerful ability for miRNAs to control cellular protein output and function. Therefore, methods which are able to discover these miRNAbased regulations may provide a means for identifying the key cellular pathways that contribute to a biological event, such as the initiation of cancer. One of the widely used methods for the identification of regulatory miRNAs is based on mRNA microarray expression data. Firstly, the putative miRNAmRNA (miRmR) pairs are identified using a prediction algorithm such as TargetScanS [4], PicTar [5] or miRanda [6]. These algorithms are based on miRNA and mRNA sequence data and return thousands of putative miRmR pairs. Secondly, a gene set test (GST) (e.g. [7, 8]) is performed to determine whether, for a given miRNA, the predicted target mRNAs have expression values different from that for the remaining mRNAs. An miRNA is considered to be regulatory if the GST returns a statistically significant pvalue.
Another method was proposed by Jayaswal et al. [9] and is, henceforth, referred to as the oddsratio (OR) method. The OR method determines whether there is an association between miRNA expression and that of its computationally predicted target mRNAs. If the association is statistically significant, then the miRNA is considered to be regulatory. Unlike the GST, the OR method is based on the expression data from both miRNAs and mRNAs. While both methods identify individual miRNAs as potentially regulatory, they ignore the biological observation that a group of miRNAs may coregulate a group of mRNAs. Another limitation of the two methods is that the identification of regulatory miRNAs does not provide information about the miRmR pairs of interest and these pairs have to be identified separately. Since the experimental validation of an miRNA's regulatory role requires an observable (and statistically significant) change in the expression of its target mRNAs, the knowledge of miRmR pairs is essential.
We denote a group of miRNAs and mRNAs as a module if the miRNAs in a group regulate the mRNAs in the same group. One of the earliest methods for the identification of miRmR modules was proposed by Yoon and De Micheli [10]. This method is based on computationally predicted miRmR pairs and does not utilize miRNA or mRNA expression data. Since miRNAs that are regulatory in one experimental scenario may not be regulatory in another, e.g. miRNAs that are regulatory in breast cancer may not be regulatory in other types of cancer, this method is of limited use.
Recently, a few integrative methods (i.e. methods that combine miRNA and mRNA microarray expression data with miRmR sequence data) have been developed to obtain miRmR modules. For example, Joung and Fei [11] used a combination of putative miRmR pairs and mRNA expression data to identify miRmR modules. A limitation of their method is that the correlation between miRNA and mRNA expression data, which provides statistical evidence for a miRmR pair being regulatory, is not considered. Joung et al. [12] proposed an evolutionary algorithm based method for identifying miRmR modules. While the authors considered correlation among the miRNAs or mRNAs in a module, they ignored the correlation between miRNAs and mRNAs. Therefore, their method has the same limitation as Joung and Fei's method. Peng et al. [13] proposed a method based on the correlation between miRNA and mRNA expression data and this method is, henceforth, referred to as the PengLi method. The authors calculated the correlation coefficients for all computationally predicted miRmR pairs followed by the selection of miRmR pairs that had a statistically significant negative correlation. Next, the negatively correlated miRmR pairs were grouped to obtain modules such that each miRNA in a module targeted all the mRNAs in the module. The major limitation of the PengLi method is that the miRNAs and mRNAs in a module are not required to be enriched (closer than by chance) in terms of their predicted targets and regulators, respectively. For example, though miRNAs belonging to the same family are similar in terms of their target mRNAs (in fact, TargetScanS predicts the same target mRNAs for all the miRNAs in a family), the PengLi method does not take this into account while grouping miRNAs. Another limitation is that the identification of modules is based on a negative correlation between miRNAs and mRNAs. Since some recent studies [14] have shown that miRNAs and their target mRNAs do not always share a negative correlation, a measure of association that incorporates both directions of change is desirable.
To this end, we propose a novel miRmR module identification method that has two components  the first component is the identification of miRNA and mRNA clusters and the second component is the estimation of association between the two types of clusters. The miRNA and mRNA clusters with statistically significant associations represent the potentially regulatory miRmR modules. In the next section, we describe our method and illustrate the method's ability to identify biologically meaningful miRmR modules using a matched miRNA and mRNA microarray data set.
Results
Novel miRmR module identification method
We propose a twostep method (Figure 1) for the identification of miRmR modules. The first step involves the identification of miRNA and mRNA clusters that are enriched in terms of target mRNAs and regulators, respectively. The mRNA (or miRNA) clusters are obtained using computationally predicted miRmR pairs and mRNA (or miRNA) expression data. The second step involves the identification of miRNA clusters that have a statistically significant association to one or more of the mRNA clusters. These associations are obtained using matched miRNA and mRNA expression data.
Step1: Clustering
The clustering of miRNAs or mRNAs requires two input parameters  a measure of dissimilarity between miRNAs or mRNAs and the number of clusters, K. Once a dissimilarity measure has been obtained, clustering is performed using the partition around medoids [15] algorithm. We note that other clustering algorithms, e.g. a hierarchical clustering algorithm [16], can also be used in our module identification method. In the rest of this subsection, we focus on the generation of mRNA clusters and note that a similar approach can be used for the generation of miRNA clusters. We refer to a matrix M = (m_{sw}), where 1 ≤ s ≤ C and 1 ≤ w ≤ D, as a C × D matrix and an element m_{sw} of the matrix as [s, w].
One measure of dissimilarity is the Euclidean distance between pairs of mRNAs. We obtain this using a miRmR map matrix of the form Y × X, where Y denotes the number of mRNAs and X denotes the number of miRNAs. If an miRNA i is predicted to target an mRNA j, then the element [j, i] of the map matrix takes the value 1 and 0, otherwise. Henceforth, we refer to the clustering based on Euclidean distances as unguided clustering or Clust_{UN}.
The unguided clustering does not take into account changes in mRNA expression profiles. If only miRmR pairs are considered, then the Euclidean distance between an mRNA (say m_{1}) and two or more clusters may be identical. Rather than picking one of the clusters at random, we can use mRNA expression profiles to guide m_{1} to the cluster whose members have the expression profiles closest to m_{1}. Therefore, a measure of dissimilarity that combines mRNA expression profiles with predicted miRmR pairs is desirable. Henceforth, we refer to the clustering based on these pairwise measures of dissimilarity as guided clustering or Clust_{GD}.
Our guided clustering method (Figure 1b) is based on multivariate random forest (MRF) [17] and requires two input matrices  a Y × X miRmR map matrix and a Y × T matrix of mRNA expression values, where T denotes the number of conditions (e.g. the number of time points or patient categories). In addition to the two matrices, the following three parameters are required 
(a) Node_size: A node is defined as a collection of mRNAs and splitting a node results in child nodes. The child nodes, in turn, are split further till a terminating condition is reached. The terminating condition is that the number of mRNAs at a node should be greater than a userdefined cutoff value. We refer to this cutoff value as node_size.
(b) Number of trees (N): If splitting a node, say PN, results in two child nodes, say CN_{1} and CN_{2}, and the number of mRNAs in either of the two child nodes is less than node_size, then the node PN is not split further. A node, e.g. PN in the above example, that is not split is referred to as a terminal node. A binary tree that is obtained by splitting each node (excluding terminal nodes) into exactly two child nodes is referred to as a regression tree. If T = 1, then the regression tree is referred to as a univariate regression tree, otherwise it is referred to as a multivariate regression tree [18, 19]. We denote a collection of multivariate regression trees as a MRF and N refers to the number of trees in the forest.
(c) Num_cov: For a multivariate regression tree, the function used to obtain the child nodes is I([j, i] = 1), where [j, i] is an element of the map matrix Y × X and I([j, i] = 1) takes the value one if miRNA i targets mRNA j and the value 0, otherwise. In other words, if a column i of the Y × X matrix is used to split a node, say BN, into two child nodes, then all the mRNAs that are targeted by miRNA i constitute one child node and all the remaining mRNAs constitute the second child node. However, there are X different miRNAs and, therefore, X different ways in which the child nodes, say DN_{1} and DN_{2}, can be obtained from BN. Let h denote the miRNA that is actually used to split the node BN and let x _{l} denote the vector of expression values for the l^{th} mRNA. In other words, x _{l} corresponds to the l^{th} row of the expression matrix Y × T. Let Q_{m} denote the set of mRNAs at node θ and x _{avg} denote the vector of average expression values for the mRNAs in Q_{m}. Let d(x _{l}, x _{avg}) denotes the Euclidean distance between the vectors x _{l} and x _{avg} and let \text{S}(\theta )={\displaystyle \sum _{\text{l}\in {\text{Q}}_{\text{m}}}{\text{{d(}{\text{x}}_{\text{l}},{\text{x}}_{\text{avg}}\text{)}}}^{2}} denote a measure of node homogeneity, i.e. the closeness of mRNAs in terms of their expression profiles. Let f(BN, DN_{1}, DN_{2}) = S(BN)  S(DN_{1})  S(DN_{2}) denote the change in node homogeneity when a node BN is split into child nodes DN_{1} and DN_{2}. Since there are X miRNAs, we obtain X different values of f(BN, DN_{1}, DN_{2}) and h refers to the miRNA that returns the highest value (see additional file 1: "Supplementary Data" for an illustration). In practice, owing to time constraint, it is not feasible to test all the X miRNAs and determine h. Instead, a randomly chosen subset of the X miRNAs is used to obtain h. We denote the number of miRNAs in the subset as num_cov. We typically set num_cov equal to the square root of the number of miRNAs, as recommended by Xiao and Segal [17].
The multivariate regression tree is used to obtain a Y × Y matrix, henceforth referred to as the proximity matrix. Each element of the proximity matrix takes the value 1 or 0. If two mRNAs, say a and b, belong to the same terminal node, then [a, b] = 1, else [a, b] = 0. Though the proximity matrix obtained using a regression tree has low prediction accuracy, the use of a collection of trees greatly improves the prediction accuracy [20]. Therefore, we calculate the proximity matrix for the MRF. If the MRF has N trees, then the element [a, b] of the MRF proximity matrix is obtained using the formula {\displaystyle \sum _{\text{g}=\text{1}}^{\text{N}}{[\text{a,b}]}^{\text{g}}/\text{N}}, where [a, b]^{g} is the value for the g^{th} regression tree. While the MRF proximity matrix (Prox_{matrix}) measures the similarity between mRNAs, 1  Prox_{matrix}, where 1 denotes a Y×Y matrix with every element equal to 1, measures the dissimilarity between mRNAs. This dissimilarity matrix is provided as an input for Clust_{GD} (Figure 1b).
Once the mRNA clusters have been obtained using Clust_{UN} or Clust_{GD}, we determine the clusters where the mRNAs are closer to one another than by chance. Let ES denote the enrichment score and K denote the total number of clusters. Now, for a given cluster k = 1, ..., K, {\text{ES}}_{\text{k}}={\displaystyle \sum _{\text{o}\in \text{k}}{\text{d(o,o}}_{\text{m}}{\text{)/N}}_{\text{k}}}. Here, N_{k} denotes the number of elements in cluster k, o_{m} denotes the median value of the elements in the k^{th} cluster and d(o, o_{m}) denotes the Euclidean distance between an element o, which belongs to cluster k, and o_{m}. We note that o and o_{m} are obtained using the relevant rows of the matrix Y × X. We perform a nonparametric bootstrap test to determine whether ES_{k} can be obtained by chance. Firstly, we obtain 100 random clusters of size N_{k} using sampling without replacement, i.e. the same mRNA does not occur more than once in a random cluster. Secondly, we obtain W_{1}, the number of times the enrichment score for a random cluster of size N_{k} is smaller than or equal to ES_{k}. If W_{1}/100 < 0.05, then ES_{k} cannot be obtained by chance and the k^{th} cluster is considered to be statistically significant or enriched.
Let Pr_{enrich} denote the percentage of clusters that are enriched. Therefore, Pr_{enrich} = N_{enrich}/N_{total}, where N_{enrich} denotes the number of enriched clusters and N_{total} ≤ K denotes the total number of clusters of size greater than one. We only consider clusters of size greater than one because we are interested in mRNAs that are cotargeted and, hence, coregulated by a small group of miRNAs. The Pr_{enrich} values are used to compare the results obtained using Clust_{UN} and Clust_{GD}. The clustering method that returns a higher value of Pr_{enrich} is used to obtain the mRNA clusters that are provided as an input to Step 2 of our method.
Step 2: Identification of statistically significant modules
Let R_{mir} and R_{gene} denote the number of enriched miRNA and mRNA clusters, respectively. Now the number of possible miRmR modules is R_{mir} × R_{gene}. For each module, we obtain the number of miRmR pairs with an association. A miRmR pair is considered to have an association if the pair is computationally predicted, i.e. the Y × X map matrix has the value 1 for the relevant miRmR pair, and there is evidence of a change in miRNA expression producing a change in mRNA expression. To test the latter condition, we assume a linear model of the form u = α + βv, where u denotes the change in mRNA expression, v denotes the change in miRNA expression, α denotes the change in mRNA expression that is independent of a change in miRNA expression, and β denotes the change in mRNA expression due to a unit change in miRNA expression. We estimate the values of α and β using matched miRNA and mRNA expression profiles; the expression profiles correspond to normalized log2 foldchange values with respect to a reference. Next, we test the null hypothesis that β = 0 versus the alternate hypothesis that β ≠ 0 using a ttest. If the pvalue < 0.05, then there is evidence that a change in miRNA expression produces a change in mRNA expression.
One advantage of the linear model is that it allows for changes in miRNA and mRNA expression to be in the same direction or opposite direction. However, with a slight modification, we can test the onesided hypothesis that a change in miRNA expression produces a change in mRNA expression in the opposite direction. Another advantage is that it is computationally faster than the use of correlation coefficient as a measure of association between a miRmR pair. Typically, the statistical significance of correlation coefficient is obtained using a nonparametric bootstrap test. In contrast, the statistical significance of β is determined using a precomputed table of significance for ttest.
Let {\mathit{N}}_{\mathit{m}}^{\mathit{m}\mathit{i}\mathit{r}} denote the number of miRNAs in the m^{th} miRNA cluster and let {\mathit{N}}_{\mathit{n}}^{\mathit{g}\mathit{e}\mathit{n}\mathit{e}} denote the number of mRNAs in the n^{th} mRNA cluster. Let Assoc(m, n) denote the number of miRmR pairs in the (m, n) module with an association. We determine the statistical significance of association between the m^{th} miRNA cluster and n^{th} mRNA cluster using a nonparametric bootstrap test. Firstly, we obtain 100 randomly generated mRNA clusters of size {\mathit{N}}_{\mathit{n}}^{\mathit{g}\mathit{e}\mathit{n}\mathit{e}}. Secondly, we obtain Assoc(m, n*), where n* denotes a randomly generated cluster of size {\mathit{N}}_{\mathit{n}}^{\mathit{g}\mathit{e}\mathit{n}\mathit{e}}. Thirdly, we obtain W_{2}, the number of times Assoc(m, n*) is greater than or equal to Assoc(m, n). If W_{2}/100 < 0.05, then the association between the m^{th} miRNA cluster and n^{th} mRNA is considered to be statistically significant. We consider the (m, n) miRmR module to be potentially regulatory if (i) the association between the m^{th} miRNA cluster and n^{th} mRNA cluster is statistically significant and (ii) each miRNA in the m^{th} cluster targets a majority of mRNAs in the n^{th} cluster.
In the next subsection, we compare the results obtained using Clust_{UN} and Clust_{GD}. We tried different values for node_size and num_cov and obtained the best Pr_{enrich} values for node_size = 5 and num_cov = 15 (see additional file 1: "Supplementary Data" for an example of node_size selection). Also, we set N = 100 as a further increase in the number of trees did not alter the results.
Guided vs. unguided clustering
We used a publicly available leukemia data set [21] to compare Clust_{UN} and Clust_{GD}. This data set contained the mRNA expression profiles of healthy donors and multiple myeloma (MM) patients belonging to four categories  no cytogenetic abnormality, cytogenetic abnormality t(4;14) (with or without RB deletion), cytogenetic abnormality t(11;14) (with or without RB deletion), and RB deletion as a unique cytogenetic abnormality. We identified 3882 differentially expressed (DE) mRNAs; an mRNA was considered to be DE if the average expression profile of healthy donors was different from that of patients in one or more categories. Of the 3882 DE mRNAs, we could obtain the plausible miRmR pairs for only 1492 mRNAs using a combination of four miRmR targetprediction databases  miRBase [22], PicTar, TargetScanS and miRGen [23] (intersection of PicTar(4way) and TargetScanS). In other words, the miRmR mapping information was available for only 38.4% of the DE mRNAs. Also, the total number of miRNAs that targeted one or more of the 1492 DE mRNAs was 215.
We converted this miRmR mapping information into a 1492 × 215 map matrix such that the rows and columns of the matrix corresponded to mRNAs and miRNAs, respectively. An element [j, i] of this matrix took the value 1 or 0 depending on whether the j^{th} mRNA was targeted by the i^{th} miRNA or not. Next, we obtained the log2 foldchange values for leukemia patients with respect to healthy donors and derived the 1492 (mRNA) × 4 (patient categories) expression matrix. While the map matrix was provided as input to both Clust_{UN} and Clust_{GD}, the expression matrix was provided as input to only Clust_{GD}.
Figure 2a shows the Pr_{enrich} values for Clust_{UN} and Clust_{GD} over a range of K values  40, 50, 60, 70, 80, 90, and 100. For this range of K values, the average number of mRNAs per cluster varied from 37.3 to 14.9. Though the average number of mRNAs per cluster was greater than one, an individual cluster occasionally had just one mRNA. A cluster of size one corresponded to an mRNA that had very few regulators (i.e. miRNAs) in common with other mRNAs. Since we were interested in identifying clusters of mRNAs that were cotargeted by miRNAs, we ignored the mRNAs that belonged to clusters of size one during downstream analysis. For Clust_{GD}, the total number of clusters of size greater than one was always equal to K. In contrast, for Clust_{UN}, the number of clusters of size one increased with an increase in K.
For Clust_{GD}, the best Pr_{Enrich} value was 80% and it was obtained for two values of K 50 and 60. For Clust_{UN}, the best Pr_{Enrich} value was 76.05% and it was obtained for K = 90. This shows that an iterative approach, wherein several K values are tried, is required to obtain the best Pr_{enrich} value. Overall, the Pr_{enrich} values for Clust_{GD} were higher than that for Clust_{UN} in 71.4% of the cases.
The mRNA clusters can be obtained using PAM or a hierarchical clustering algorithm. To compare the results obtained using the two clustering algorithms, we provided the output of MRF as an input to an agglomerative hierarchical clustering algorithm [16]. We varied K from 40 to 100 in increments of 10 and compared the Pr_{enrich} values with those obtained earlier using PAM. The clusters obtained using PAM had a higher Pr_{enrich} value for each value of K (see additional file 1: "Supplementary Data").
Identification of miRmR modules
We used a second data set, a timecourse study with matched miRNA and mRNA microarray expression profiles, to illustrate our moduleidentification method. This data set corresponded to a drug study for MM patients [9] and had six time points  0, 2 hrs, 4 hrs, 8 hrs, 24 hrs, and 48 hrs. We identified 9661 DE mRNAs, i.e. mRNAs for which the changes in expression profile over the timecourse were statistically significant. We could obtain plausible miRmR pairs for only 3856 (40%) of the DE mRNAs and the total number of miRNAs that targeted one or more of the 3856 mRNAs was 223. Therefore, for this data set, we obtained a 3856 × 223 miRmR map matrix and a 3856 (mRNA) × 5 (time points) expression matrix. The expression matrix had five time points because time point 0 was used as reference to obtain the log2 foldchange values.
As for the leukemia data set, we obtained the mRNA clusters using Clust_{UN} and Clust_{GD}. We varied K from 50 to 140 in intervals of 10 and the average cluster size varied between 77.1 and 27.5. The Pr_{Enrich} values for Clust_{GD} were higher than or equal to that for Clust_{UN} in 70% of the cases (Figure 2b). This result was similar to that obtained for the previous example. While Clust_{GD} returned clusters of size greater than one for every value of K, Clust_{UN} returned a few clusters of size one for K ≥ 130. The best Pr_{Enrich} values obtained using Clust_{GD} and Clust_{UN} were 84% and 78.9%, respectively. Overall, the highest Pr_{Enrich} value was 84% corresponding to K = 50 and these 42 (0.84 × 50) mRNA clusters were used in Step 2 (Figure 1a) of the moduleidentification method.
In addition to the mRNA clusters, Step 2 requires miRNA clusters as an input. We obtained 135 DE miRNAs and of these DE miRNAs, only 70 were found to target one or more of the 3856 mRNAs. Therefore, for miRNA clustering using Clust_{GD} and Clust_{UN}, we used the 70 (miRNAs) × 3856 (mRNAs) map matrix and a 70 (miRNAs) × 5 (time points) expression matrix. We obtained Pr_{Enrich} values for K varying between 10 and 30 in intervals of 5. For this range of values, the average cluster size varied from 7 to 2.3. Both Clust_{GD} and Clust_{UN} returned the same Pr_{Enrich} value for K ≥ 20. The best value of Pr_{Enrich} was 88.9% and this was obtained for several values of K; the smallest of these values was 20. For K = 20, the clusters returned by Clust_{GD} and Clust_{UN} were similar except for the grouping of miR21. This miRNA belonged to a cluster of size one and 41, respectively, for Clust_{GD} and Clust_{UN}. The miRNA clusters obtained using K = 20 for Clust_{UN} were provided as input to Step 2 (Figure 1a) of the moduleidentification method. We chose Clust_{UN} because it is the simpler of the two clustering methods and returned the same Pr_{Enrich} value as that for Clust_{GD}. We obtained 19 miRNA clusters as potentially regulatory. These included clusters of size one because even a single miRNA can regulate multiple mRNAs.
Next, we determined the associations between miRNA clusters and mRNA clusters. While the linear model described earlier can be used to obtain positive or negative associations, we considered only negative associations because miRNAs act as negative regulators of their direct targets. For each of the 798 (19 × 42) miRmR modules, we determined the statistical significance of association using a nonparametric bootstrap test and obtained a total of nine potentially regulatory modules (Figure 3); see additional file 1: "Supplementary Data" for the list of genes in each module. A potentially regulatory miRmR module contained one or more miRNAs. However, only a small fraction of the mRNAs that were predicted as targets of these miRNAs was present in the module (see additional file 1: "Supplementary Data" for the proportion of predicted target mRNAs present in a potentially regulatory module). The five miRNA clusters that had more than one element and belonged to one or more of the nine potentially regulatory modules were  (a) miR135a and miR135b, (b) miR148a and miR148b, (c) miR15b, miR16 and miR195, (d) miR30b, miR30c and miR30 d, (e) miR204 and miR211 (see additional file 1: "Supplementary Data" for the expression profiles of miRNAs in the five modules).
Biological implication and interpretation
To examine the biological significance of the statistically significant mRNA clusters obtained using Clust_{GD}, we performed a gene ontology overrepresentation analysis using DAVID [24]. We observed that approximately 12% of the mRNA clusters were enriched for gene ontology terms (see additional file 1: "Supplementary Data" for the pvalues and a list of mRNAs). Specifically, the mRNA clusters were enriched for  apoptosis (biological process), endomembrane system (cellular component), membrane fraction (cellular component), symporter activity (molecular function), and protein phosphatase binding (molecular function). In addition, one of the clusters was enriched for genes in the endocytosis biochemical pathway. In the apoptosis cluster, RAC1 was a gene of interest as little is known about the role of RAC1 in myeloma and its progression. RAC1 signalling plays a critical role in apoptosis and tumorigenesis and has been shown to impair p53deficient lymphoma cell growth [25]. Indeed, blockage of RAC1 induces cell cycle arrest or apoptosis in breast cancer cells [26]. In the biochemical pathway cluster, CXCR4 was a gene of interest. This gene is one of the key members of endocytosis biochemical pathway and has been shown to have clinical significance. Specifically, the expression of this chemokine receptor, in addition to CCR1 and CCR2, is correlated with disease state and survival in myeloma patients [27].
Next, we focused on the potentially regulatory miRmR modules. The statistically significant miRmR module containing the miRNAs miR15b, miR16, and miR195 had 84 mRNAs; the miRmR map matrix predicted 81 of these mRNAs as targets of all three miRNAs. Even though all the miRmR pairs in this module were not observed to have a statistically significant association, grouping these mRNAs and miRNAs together reflects the predicted regulatory role of the three miRNAs visàvis the 84 mRNAs.
Although miR15a was not identified as DE over the timecourse, miR15b and miR15a shared 98% of their target mRNAs (based on the miRmR map matrix). This implies that the net effect of a change in the expression of miR15a and miR15b on cellular biology would be similar. MiR15a and miR16 have been previously identified as an oncogenic cluster that is frequently deleted in chronic lymphocytic leukemia [28]. The clustering of these two miRNAs suggests that our method is able to identify biologically relevant miRNA clusters. In addition to the identification of miR15 and miR16 cluster as important in the pathogenesis of a number of malignancies, the mechanisms explaining their role have also been described in literature. This cluster has been shown to modulate multidrug resistance [29], apoptosis [30–32] and cellular proliferation [32, 33]. The role of this cluster in controlling the proliferation of the cell is due to the targeting of several key cellcycle genes such as Cyclin D1 [34, 35] and Cyclin E1 [36]. Furthermore, miR195 was identified as belonging to the same cluster as miR15b and miR16, suggesting that miR195 may play a role in cellcycle in a similar manner to miR15b and miR16. This implied function was confirmed in a study by Xu et al. [37], where ectopic expression of miR195 blocked cell cycle progression by regulating RbE2F signaling possibly via an interaction with Cyclin D.
Our analysis may also have the ability to identify other interactions that were not previously identified, thereby, providing further information on complex gene regulatory pathways. The miRNAs miR15 and miR16 induce apoptosis by suppressing the antiapoptotic protein BCL2 [30], and both miRNAs were found to target a cluster of mRNAs containing among others, G0S2 and BFAR. G0S2 has previously been shown to interact with BCL2 and promote apoptosis by preventing the formation of BCl2/Bax heterodimers [38], and BFAR has been shown to bind to BCL2 [39].
Effect of expression data on miRmR modules
For a matched miRNA and mRNA microarray expression data set, the miRNA and mRNA clusters correspond to groups of DE miRNAs and mRNAs, respectively, which are present in the miRmR map matrix. Our timecourse data set comprised patients with MM and we tested the null hypothesis that the expression pattern of a biological entity (miRNA or mRNA) was the same at all time points. If there was evidence against the null hypothesis, then we considered the biological entity to be DE. In contrast, the MM data set of Gutierrez et al. [21] (described earlier during the comparison of guided and unguided clustering methods) comprised healthy individuals and patients belonging to four categories of cytogenetic abnormalities. We tested the null hypothesis that the expression pattern of a biological entity was the same in all categories of MM patients and in healthy individuals. If there was evidence against the null hypothesis, then we considered the biological entity to be DE. Since the null hypotheses were different for the two data sets, the resulting DE biological entities (i.e. DE miRNAs and mRNAs) were different as well. For example, while miR15b was DE in the timecourse data set, miR15a was DE in the Gutierrez et al. data set.
For the timecourse data set, the number of DE mRNAs and miRNAs that were present in the miRmR map matrix was 3856 and 70, respectively. For the Gutierrez et al. data set, the number of DE mRNAs and miRNAs that were present in the miRmR map matrix was 1492 and 168, respectively. Since the DE miRNAs and mRNAs that were provided as input for clustering were different for the two data sets, the miRmR modules obtained using the two data sets were different. For example, we identified a module comprising miR30b, miR30c, and miR30 d using the timecourse data set and a module comprising miR30a5p, miR30b, miR30c, miR30 d, and miR30e5p using the Gutierrez et al. data set (see additional file 1: "Supplementary Data" for the enriched miRmR modules obtained using the Gutierrez et al. data set). Similarly, for the timecourse data set, we identified a miRmR module comprising miR15b, miR16 and miR195. In contrast, for the Gutierrez et al. data set, we identified a miRmR module comprising miR15a, miR16 and miR195. While the miRmR module (comprising miR15, miR16 and miR195) identified using the timecourse data set had 84 mRNAs, the module identified using the Gutierrez et al. data set had 35 mRNAs. Of these, 16 mRNAs were common to both modules.
Clustering miRNAs from different families
To determine whether miRNAs from different families could be grouped together, we modified the miRmR map matrix so that it contained only one miRNA from each family. The number of cotargeted mRNAs in the modified miRmR map matrix was used to determine the distances between miRNA families. Next, these distances were used to obtain clusters of miRNA families. For the timecourse data set, the median number of mRNAs that were cotargeted by two miRNA families was one. This implied that the distance between any two miRNA families was too large for the families to be clustered together. Indeed, we observed that none of the clusters with two or more miRNAs were statistically significant.
We generated the miRmR map matrix using a combination of four targetprediction algorithms  miRanda (as implemented in miRBase), PicTar, TargetScanS and miRGen (intersection of PicTar (4way) and TargetScanS). For these algorithms, the number of cotargeted mRNAs obtained using miRNAs from the same family was higher than obtained using miRNAs from different families. Consequently, the miRNA clusters and, hence, the miRmR modules contained miRNAs from the same family. It may be possible to obtain miRmR modules comprising miRNAs from different families using a targetprediction algorithm such as TargetMiner [40]. To illustrate this, we reanalyzed the timecourse data set using a miRmR map matrix based on TargetMiner. The new miRmR map matrix corresponded to 80 DE miRNAs and 4806 DE mRNAs. The value of K, corresponding to the highest Pr_{enrich} value, for miRNAs and mRNAs was 15 and 60, respectively. While the number of enriched miRNA clusters (including clusters of size one) was 15, the number of enriched mRNA clusters was 59. Therefore, the number of possible miRmR modules was 885 (15 × 59) and nine of these were found to be statistically significant (see additional file 1: "Supplementary Data" for the enriched miRmR modules obtained using TargetMiner). We observed that two of the statistically significant miRmR modules contained miRNAs from different families  hsamiR302b, hsamiR373, and hsamiR520e.
Discussion
We present a method for identifying regulatory modules  miRNA clusters that potentially regulate mRNA clusters. While some methods, e.g. [10, 11] can be used to obtain miRNA and mRNA clusters without matched miRmR expression data, our results show that matched expression data is necessary for identifying the cluster pairs (i.e. miRmR modules) of potential biological interest. For example, the miRNAs  miR15b, miR16 and miR195 cotarget 341 mRNAs (based on the miRmR map matrix) and less than 25% of these (84 mRNAs) were included in a potentially regulatory miRmR module. The remaining cotargets belonged to other mRNA clusters and were not found to have a statistically significant association to the miRNA cluster (miR15b, miR16, and miR195).
Our method enables the identification of mRNA clusters that are coregulated by multiple miRNAs as well as those that are regulated by just one miRNA. Since the miRNA clusters are derived from the miRmR map matrix, miRNAs that share very few cotargets with other miRNAs belong to clusters of size one. If a miRNA is not statistically similar (in terms of cotargets) to other miRNAs, then the associated mRNA cluster is regulated by just one miRNA. On the other hand, if two or more miRNAs are statistically similar (in terms of cotargets), then the associated mRNA cluster is coregulated by multiple miRNAs.
Our method can be used to identify miRmR pairs that have a negative or positive association. A positive association may be indicative of an indirect relationship between the miRmR pair, e.g. Ritchie et al. [14] showed a positive association between hsamiR92 and PCNA. Therefore, our method can be used to obtain miRmR modules that represent either direct regulation or indirect regulation of mRNAs. The current measure of association can also be extended to capture other desired and more complex relationships. For example, in case of timecourse data, there may be a time lag between changes in miRNA expression and that of its target mRNAs. Since the output of our method is a network of interactions between miRNAs and mRNAs, capturing the timelag information can enable the identification of variations in modules over time. These timedependent modules could perhaps be used as an initial step in the construction of gene regulatory networks.
A potentially regulatory miRmR module has a statistically significant association between the miRNA cluster and mRNA cluster. However, every miRmR pair in the module is not required to have a statistically significant association. This may enable us to identify relationships that were hitherto hidden owing to the noise in expression data. The module comprising 84 mRNAs and three miRNAs (miR15b, miR16 and miR195) had mRNAs (e.g. G0S2 and BFAR) with a statistically significant negative association to all three miRNAs. In contrast, some mRNAs (WDTC1, PEX13, FAM54B, and PPP1R11) had a statistically significant negative association to only miR195. Since these associations were obtained using a short timecourse data set (five time points), which has the problem of noise, we recommend that all three miRNAs be considered as coregulators of the four mRNAs. Unlike our method, the PengLi method requires every miRmR pair in a module to have a statistically significant association. Therefore, the PengLi method would split the 84 mRNAs into several modules, e.g. the four mRNAs that had a negative association to only miR195 would form one module. In this regard, the PengLi method can be treated as a special case of our method.
We note that the miRmR modules will vary based on the input miRmR map matrix. For the timecourse data set, the miRmR modules obtained using a combination of miRanda, PicTar, TargetScanS and miRGen were different from that obtained using TargetMiner. For a miRmR module containing multiple miRNAs, all the miRNAs could be from the same family. This is because the miRNAs from the same family share multiple targets in the map matrix. It is possible to change the map matrix and consider miRNA families instead of individual miRNAs. This would ensure that a miRmR module with multiple miRNAs contains miRNAs from different families. However, such a map matrix should be considered only if the miRNAs in a family have similar expression profiles. Otherwise, the association between individual miRNAs in the family and mRNA would vary, resulting in a weak (and perhaps not statistically significant) overall association between the miRNA family and mRNA. This, in turn, would reduce the number of enriched miRmR modules.
Conclusions
We present a miRmR module identification method that facilitates better biological interpretation of highthroughput miRNA and mRNA microarray data. A typical analysis of microarray data returns numerous mRNAs (> 1000) and miRNAs (> 100) as DE. Since each DE miRNA potentially targets hundreds of DE mRNAs, the number of potential miRmR interactions exceeds 1,00,000. Our method converts this massive amount of information into a small number of potentially regulatory miRmR modules, thereby, enabling biologists to identify the miRNAs and mRNAs for further experimentation. For example, for the timecourse data set, our method reduced the number of potential miRmR interactions from over 1,00,000 to just nine miRmR modules.
Methods
Preprocessing of "leukemia data set"
We obtained a publicly available data set, GSE16558 [21], from the Gene Expression Omnibus [41, 42] repository. This data set corresponds to a MM study comprising healthy donors and patients belonging to four categories  no cytogenetic abnormality, cytogenetic abnormality t(4;14) (with or without RB deletion), cytogenetic abnormality t(11;14) (with or without RB deletion), and RB deletion as a unique cytogenetic abnormality. The miRNA expression profiles were obtained using TaqMan lowdensity arrays. These expression profiles were normalized using the mean of RNU44 and RNU48, as suggested by the authors [21]. The mRNA expression profiles were obtained using Affymetrix Human Gene 1.0 ST arrays and the preprocessing steps included RMA background correction [43], quantile normalization [44] and summarization of mRNA expression using the median polish algorithm.
After normalization, we obtained the log2 foldchange values for MM patients with respect to healthy donors. For each Affymetrix probeset, we tested the null hypothesis that there was no difference in log2 foldchange values over the four patient categories. Next, we adjusted the pvalues for multiple comparisons using the Benjamini and Hochberg (BH) [45] method and obtained DE probesets, i.e. probesets with adjusted pvalues less than 0.05. Since one or more probesets map to the same mRNA, we considered an mRNA to be DE if at least one of the associated probesets was DE. Similarly, we obtained the DE miRNAs.
Preprocessing of "timecourse data set"
We considered a short timecourse data set comprising matched miRNA and mRNA microarray expression data. This data set corresponded to a drug study involving MM patients [9] and had six time points  0, 2 hrs, 4 hrs, 8 hrs, 24 hrs, and 48 hrs. The miRNA expression data was obtained using Exiqon arrays (V 8.1) and the mRNA expression data was obtained using Affymetrix arrays (Human Genome U133 Plus 2.0 array). For the twocolor miRNA data, data preprocessing steps included background intensity subtraction and withinarraynormalization [46] using the global loess method. For the singlecolor mRNA data, data preprocessing steps included RMA background correction [43], quantile normalization [44] and summarization of mRNA expression using the median polish algorithm. We note that each miRNA corresponded to two probes on an Exiqon array and each mRNA corresponded to one or more probesets on the Affymetrix array. Henceforth, we use the terms probe and probeset interchangeably.
For each probeset, we tested the null hypothesis that there was no difference in expression profiles over the timecourse. The pvalues were adjusted for multiple comparisons using the BH [45] method. We considered a probeset to be DE if the adjusted pvalue was less than 0.05. Since one or more probesets map to the same miRNA or mRNA, we considered an miRNA or mRNA to be DE if at least one of the associated probesets was DE.
Generation of miRNAmRNA map matrix
We obtained the plausible miRmR pairs by combining the results from four targetprediction databases  miRBase, PicTar, TargetScanS and miRGen (intersection of PicTar(4way) and TargetScanS). Specifically, we considered a miRmR pair to be plausible if it was predicted by at least two of the four databases. Therefore, an element [j, i] of the miRmR map matrix took the value 1 if the j^{th} mRNA was predicted as a target of the i^{th} miRNA by two or more targetprediction databases. Otherwise, the element [j, i] took the value 0.
Generation of expression matrix for clustering
Since each miRNA or mRNA corresponded to one or more probesets, the median of the log2 foldchange values over the relevant probesets was considered to be the log2 foldchange value for the miRNA or mRNA of interest. If there were T conditions, e.g. T time points, then the median log2 foldchange value was calculated for each condition and a Z × T expression matrix obtained, where Z denotes the number of miRNAs or mRNAs.
Abbreviations
 BH:

Benjamini and Hochberg
 Clust_{GD}:

Guided clustering
 Clust_{UN}:

Unguided clustering
 DE:

Differentially expressed
 GST:

Gene set test
 MM:

Multiple myeloma
 MRF:

Multivariate random forest
 miRNA:

microRNA
 miRmR:

miRNAmRNA
 OR:

Oddsratio
 Pr_{Enrich}:

Percentage of clusters that are enriched.
References
Calin GA, Cimmino A, Fabbri M, Ferracin M, Wojcik SE, Shimizu M, Taccioli C, Zanesi N, Garzon R, Aqeilan RI, et al: MiR15a and miR161 cluster functions in human leukemia. Proceedings of the National Academy of Sciences of the United States of America. 2008, 105 (13): 51665171. 10.1073/pnas.0800121105.
He L, Thomson JM, Hemann MT, HernandoMonge E, Mu D, Goodson S, Powers S, CordonCardo C, Lowe SW, Hannon GJ, et al: A microRNA polycistron as a potential human oncogene. Nature. 2005, 435 (7043): 828833. 10.1038/nature03552.
Lu J, Getz G, Miska EA, AlvarezSaavedra E, Lamb J, Peck D, SweetCordero A, Ebet BL, Mak RH, Ferrando AA, et al: MicroRNA expression profiles classify human cancers. Nature. 2005, 435 (7043): 834838. 10.1038/nature03702.
Lewis BP, Burge CB, Bartel DP: Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell. 2005, 120 (1): 1520. 10.1016/j.cell.2004.12.035.
Krek A, Grun D, Poy MN, Wolf R, Rosenberg L, Epstein EJ, MacMenamin P, da Piedade I, Gunsalus KC, Stoffel M, et al: Combinatorial microRNA target predictions. Nature Genetics. 2005, 37 (5): 495500. 10.1038/ng1536.
John B, Enright AJ, Aravin A, Tuschl T, Sander C, Marks DS: Human microRNA targets (vol 2, pg 1862, 2005). Plos Biology. 2005, 3 (7): 13281328. 10.1371/journal.pbio.0030264.
Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, et al: Gene set enrichment analysis: A knowledgebased approach for interpreting genomewide expression profiles. Proceedings of the National Academy of Sciences of the United States of America. 2005, 102 (43): 1554515550. 10.1073/pnas.0506580102.
Efron B, Tibshiran R: On Testing the Significance of Sets of Genes. Annals of Applied Statistics. 2007, 1 (1): 107129. 10.1214/07AOAS101.
Jayaswal V, Lutherborrow M, Ma DDF, Yang YH: Identification of microRNAs with regulatory potential using a matched microRNAmRNA timecourse data. Nucleic Acids Research. 2009, 37 (8): 10.1093/nar/gkp153.
Yoon SR, De Micheli G: Prediction of regulatory modules comprising microRNAs and target genes. Bioinformatics. 2005, 21: 93100. 10.1093/bioinformatics/bti1116.
Joung JG, Fei ZJ: Identification of microRNA regulatory modules in Arabidopsis via a probabilistic graphical model. Bioinformatics. 2009, 25 (3): 387393. 10.1093/bioinformatics/btn626.
Joung JG, Hwang KB, Nam JW, Kim SJ, Zhang BT: Discovery of microRNAmRNA modules via populationbased probabilistic learning. Bioinformatics. 2007, 23 (9): 11411147. 10.1093/bioinformatics/btm045.
Peng XX, Li Y, Walters KA, Rosenzweig ER, Lederer SL, Aicher LD, Proll S, Katze MG: Computational identification of hepatitis C virus associated microRNAmRNA regulatory modules in human livers. Bmc Genomics. 2009, 10:
Ritchie W, Rajasekhar M, Flamant S, Rasko JEJ: Conserved Expression Patterns Predict microRNA Targets. Plos Computational Biology. 2009, 5 (9): 10.1371/journal.pcbi.1000513.
Kaufman L, Rousseeuw PJ: Finding groups in data: An introduction to cluster analysis. 1990, John Wiley & Sons
Hastie T, Tibshirani R, Friedman J: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2001, Springer
Xiao Y, Segal MR: Identification of Yeast Transcriptional Regulation Networks Using Multivariate Random Forests. Plos Computational Biology. 2009, 5 (6): 10.1371/journal.pcbi.1000414.
Segal MR: TreeStructured Methods for Longitudinal Data. Journal of the American Statistical Association. 1992, 87 (418): 407418. 10.2307/2290271.
Phuong TM, Lee D, Lee KH: Regression trees for regulatory element identification. Bioinformatics. 2004, 20 (5): 750U606. 10.1093/bioinformatics/btg480.
Breiman L: Random forests. Machine Learning. 2001, 45 (1): 532. 10.1023/A:1010933404324.
Gutierrez NC, Sarasquete ME, MisiewiczKrzeminska I, Delgado M, De Las Rivas J, Ticona FV, Ferminan E, MartinJimenez P, Chillon C, Risueno A, et al: Deregulation of microRNA expression in the different genetic subtypes of multiple myeloma and correlation with gene expression profiling. Leukemia. 2010, 24 (3): 629637. 10.1038/leu.2009.274.
GriffithsJones S, Grocock RJ, van Dongen S, Bateman A, Enright AJ: miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Research. 2006, 34: D140D144. 10.1093/nar/gkj112.
Megraw M, Sethupathy P, Corda B, Hatzigeorgiou AG: miRGen: a database for the study of animal microRNA genomic organization and function. Nucleic Acids Research. 2007, 35: D149D155. 10.1093/nar/gkl904.
Dennis G, Sherman BT, Hosack DA, Yang J, Gao W, Lane HC, Lempicki RA: DAVID: Database for annotation, visualization, and integrated discovery. Genome Biology. 2003, 4 (9): 10.1186/gb200349r60.
Bosco EE, Ni WJ, Wang L, Guo FK, Johnson JF, Zheng Y: Rac1 targeting suppresses p53 deficiencymediated lymphomagenesis. Blood. 115 (16): 33203328. 10.1182/blood200902202440.
Yoshida T, Zhang YQ, Rosado LAR, Chen JJ, Khan T, Moon SY, Zhang BL: Blockade of Rac1 Activity Induces G(1) Cell Cycle Arrest or Apoptosis in Breast Cancer Cells through Downregulation of Cyclin D1, Survivin, and XLinked Inhibitor of Apoptosis Protein. Molecular Cancer Therapeutics. 9 (6): 16571668. 10.1158/15357163.MCT090906.
de Broek , Leleu X, Schots R, Facon T, Vanderkerken K, Van Camp B, Van Riet I: Clinical significance of chemokine receptor (CCR1, CCR2 and CXCR4) expression in human myeloma cells: the association with disease activity and survival. Haematologicathe Hematology Journal. 2006, 91 (2): 200206.
Calin GA, Dumitru CD, Shimizu M, Bichi R, Zupo S, Noch E, Aldler H, Rattan S, Keating M, Rai K, et al: Frequent deletions and downregulation of microRNA genes miR15 and miR16 at 13q14 in chronic lymphocytic leukemia. Proceedings of the National Academy of Sciences of the United States of America. 2002, 99 (24): 1552415529. 10.1073/pnas.242606799.
Xia L, Zhang DX, Du R, Pan YL, Zhao LN, Sun SR, Hong L, Liu J, Fan DM: miR15b and miR16 modulate multidrug resistance by targeting BCL2 in human gastric cancer cells. International Journal of Cancer. 2008, 123 (2): 372379. 10.1002/ijc.23501.
Cimmino A, Calin GA, Fabbri M, Iorio MV, Ferracin M, Shimizu M, Wojcik SE, Aqeilan RI, Zupo S, Dono M, et al: miR15 and miR16 induce apoptosis by targeting BCL2. Proceedings of the National Academy of Sciences of the United States of America. 2005, 102 (39): 1394413949. 10.1073/pnas.0506654102.
Guo CJ, Pan Q, Li DG, Sun H, Liu BW: miR15b and miR16 are implicated in activation of the rat hepatic stellate cell: An essential role for apoptosis. Journal of Hepatology. 2009, 50 (4): 766778. 10.1016/j.jhep.2008.11.025.
Satzger I, Mattern A, Kuettler U, Weinspach D, Voelker B, Kapp A, Gutzmer R: MicroRNA15b represents an independent prognostic parameter and is correlated with tumor cell proliferation and apoptosis in malignant melanoma. Int J Cancer. 2009
Bonci D, Coppola V, Musumeci M, Addario A, Giuffrida R, Memeo L, D'Urso L, Pagliuca A, Biffoni M, Labbaye C, et al: The miR15amiR161 cluster controls prostate cancer by targeting multiple oncogenic activities. Nature Medicine. 2008, 14 (11): 12711277. 10.1038/nm.1880.
Chen RW, Bemis LT, Amato CM, Myint H, Tran H, Birks DK, Eckhardt SG, Robinson WA: Truncation in CCND1 mRNA alters miR161 regulation in mantle cell lymphoma. Blood. 2008, 112 (3): 822829. 10.1182/blood200803142182.
Liu Q, Fu HJ, Sun F, Zhang HM, Tie Y, Zhu J, Xing RY, Sun ZX, Zheng XF: miR16 family induces cell cycle arrest by regulating multiple cell cycle genes. Nucleic Acids Research. 2008, 36 (16): 53915404. 10.1093/nar/gkn522.
Wang F, Fu XD, Zhou Y, Zhang Y: Downregulation of the cyclin E1 oncogene expression by microRNA161 induces cell cycle arrest in human cancer cells. Bmb Reports. 2009, 42 (11): 725730.
Xu T, Zhu Y, Xiong Y, Ge YY, Yun JP, Zhuang SM: MicroRNA195 suppresses tumorigenicity and regulates G1/S transition of human hepatocellular carcinoma cells. Hepatology. 2009, 50 (1): 113121. 10.1002/hep.22919.
Welch C, Santra MK, ElAssaad W, Zhu XC, Huber WE, Keys RA, Teodoro JG, Green MR: Identification of a Protein, G0S2, That Lacks Bcl2 Homology Domains and Interacts with and Antagonizes Bcl2. Cancer Research. 2009, 69 (17): 67826789. 10.1158/00085472.CAN090128.
Zhang H, Xu QL, Krajewski S, Krajewska M, Xie ZH, Fuess S, Kitada S, Pawlowski K, Godzik A, Reed JC: BAR: An apoptosis regulator at the intersection of caspases and Bcl2 family proteins. Proceedings of the National Academy of Sciences of the United States of America. 2000, 97 (6): 25972602. 10.1073/pnas.97.6.2597.
Bandyopadhyay S, Mitra R: TargetMiner: microRNA target prediction with systematic identification of tissuespecific negative examples. Bioinformatics. 2009, 25 (20): 26252631. 10.1093/bioinformatics/btp503.
Edgar R, Domrachev M, Lash AE: Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Research. 2002, 30 (1): 207210. 10.1093/nar/30.1.207.
Barrett T, Edgar R, Kimmel Alan, Brian O: Gene Expression Omnibus: Microarray Data Storage, Submission, Retrieval, and Analysis. Methods in Enzymology. 2006, Academic Press, 411: 352369. 10.1016/S00766879(06)110198.
Irizarry RA, Hobbs B, Collin F, BeazerBarclay YD, Antonellis KJ, Scherf U, Speed TP: Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics. 2003, 4 (2): 249264. 10.1093/biostatistics/4.2.249.
Bolstad BM, Irizarry RA, Astrand M, Speed TP: A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics. 2003, 19 (2): 185193. 10.1093/bioinformatics/19.2.185.
Benjamini Y, Hochberg Y: Controlling the False Discovery Rate  a Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society Series BMethodological. 1995, 57 (1): 289300.
Smyth GK, Speed T: Normalization of cDNA microarray data. Methods. 2003, 31 (4): 265273. 10.1016/S10462023(03)001555.
Acknowledgements
We thank Dr Tiffany Khong and Dr Andrew Spencer of the Myeloma Research Group, Alfred Hospital, Melbourne, Australia for providing the treated cell line samples. This work was supported in part by ARC through grants DP0770395 (YY, VJ), DP1095320 (VJ), and FT0991918 (YY), Arrow Bone Marrow Transplant Foundation (DDFM, ML) and SVH Haematology Research Funds (DDFM, ML).
Author information
Authors and Affiliations
Corresponding author
Additional information
Authors' contributions
VJ and YHY conceptualized and developed the model, and performed data analysis. DDFM and ML provided biological inputs during model development and interpreted the results. All authors contributed to the writing of the manuscript.
Electronic supplementary material
12864_2010_9978_MOESM1_ESM.DOC
Additional file 1:Supplementary Data. This file contains (1) a simple illustration of the MRF method, (2) enrichment scores for different values of node_size, (3) a comparison of PAM and hierarchical clustering algorithms, (4) mRNA clusters enriched for GO terms, (5) pvalues for potentially regulatory miRmR modules, (6) list of miRNAs and mRNAs in statistically significant miRmR modules, (7) expression profiles of miRNAs in statistically significant miRmR modules, (8) proportion of predicted target mRNAs in a potentially regulatory module, (9) enriched miRmR modules for the Gutierrez et al. data set, and (10) enriched miRmR modules for the timecourse data set obtained using TargetMiner. (DOC 3 MB)
Authors’ original submitted files for images
Below are the links to the authors’ original submitted files for images.
Rights and permissions
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Jayaswal, V., Lutherborrow, M., Ma, D.D. et al. Identification of microRNAmRNA modules using microarray data. BMC Genomics 12, 138 (2011). https://doi.org/10.1186/1471216412138
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/1471216412138
Keywords
 miRNA Expression
 Child Node
 Differentially Express
 miRNA Family
 mRNA Expression Profile