- Methodology article
- Open Access
BayMiR: inferring evidence for endogenous miRNA-induced gene repression from mRNA expression profiles
BMC Genomics volume 14, Article number: 592 (2013)
Popular miRNA target prediction techniques use sequence features to determine the functional miRNA target sites. These techniques commonly ignore the cellular conditions in which miRNAs interact with their targets in vivo. Gene expression data are rich resources that can complement sequence features to take into account the context dependency of miRNAs.
We introduce BayMiR, a new computational method, that predicts the functionality of potential miRNA target sites using the activity level of the miRNAs inferred from genome-wide mRNA expression profiles. We also found that mRNA expression variation can be used as another predictor of functional miRNA targets. We benchmarked BayMiR, the expression variation, Cometa, and the TargetScan “context scores” on two tasks: predicting independently validated miRNA targets and predicting the decrease in mRNA abundance in miRNA overexpression assays. BayMiR performed better than all other methods in both benchmarks and, surprisingly, the variation index performed better than Cometa and some individual determinants of the TargetScan context scores. Furthermore, BayMiR predicted miRNA target sets are more consistently annotated with GO and KEGG terms than similar sized random subsets of genes with conserved miRNA seed regions. BayMiR gives higher scores to target sites residing near the poly(A) tail which strongly favors mRNA degradation using poly(A) shortening. Our work also suggests that modeling multiplicative interactions among miRNAs is important to predict endogenous mRNA targets.
We develop a new computational method for predicting the target mRNAs of miRNAs. BayMiR applies a large number of mRNA expression profiles and successfully identifies the mRNA targets and miRNA activities without using miRNA expression data. The BayMiR package is publicly available and can be readily applied to any mRNA expression data sets.
MicroRNAs (miRNAs) are short (21-25 nt) non-coding RNAs that repress the expression of their direct targets [1–4]. Primary miRNAs (pri-miRNAs) are transcribed from intra/intergenic genomic loci and cleaved by Drosha to form approximately 70-nt hairpin precursors (called pre-miRNAs) that are subsequently cleaved by the RNase III enzyme, Dicer, to generate miRNA duplexes. One strand of the duplex, the mature miRNA, is loaded into the RNA-induced silencing complex (RISC) and guides it to recognize mRNA targets through partial base pairing with the 3’ UTRs of targets.
The presence of target sites with perfect complementarity to the seed region of miRNAs is a strong predictor of targeting, but perfect complementarity is neither sufficient nor necessary [7–10]. Many other determinants have been proposed to specify efficient mRNA-miRNA duplexes, including: AU composition flanking target sites, thermodynamic stability of binding sites, evolutionary conservation of the seed [12–14], secondary structure accessibility [6, 15–17], target-site abundance [18, 19], seed-pairing stability, 3’ pairing contribution, a loop in positions 9-12 of miRNA-mRNA hybrids, and the binding location in the 3’ UTR [8, 17]. Due to the limited number of validated miRNA targets, the exact specificity and sensitivity of current determinants are unclear [20–23]; however, estimates of precision of these determinants, alone or together, are typically reported to be about 50% at a sensitivity of 6-12% [24, 25], suggesting that sequence-based prediction methods are not fully capturing miRNA target preferences.
In mammals, it is estimated that miRNAs primarily and dominantly repress the steady-state expression level of their targets [26–34]. Therefore, down-regulation of an mRNA’s expression when the miRNA is active is evidence of a functional target site on the gene in vivo. Although numerous methods have been introduced to incorporate mRNA and miRNA expression data into miRNA target predictions, existing methods either require paired miRNA-mRNA data [35–48], have only been tested in miRNA transfection assays [28, 29, 49], or do not consider the combinatorial impact of multiple miRNAs on mRNA expression [50, 51].
In this paper, we introduce two new mRNA-miRNA scoring schemes that incorporate genome-wide measures of mRNA expression into target prediction. Neither scoring scheme requires miRNA expression data, so both can be applied to the vast amount of publicly available mRNA expression data. The first scoring scheme identifies the impact of a miRNA in repressing an mRNA in the presence of other targeting miRNAs and cellular activities, under a wide range of endogenous conditions. This scheme (hereafter called the BayMiR score) is obtained using BayMiR, a sparse Bayesian linear regression model in which the decrease in expression levels of an mRNA across different conditions is explained in terms of the activity of miRNAs that have conserved target site matches in the 3’ UTR of the transcript. BayMiR infers the activity level of each miRNA from the expression profiles of its putative targets (predicted on the basis of conserved seed matches) and then refines these target predictions using the regression model. We also found that expression variability is significantly higher among mRNAs with more miRNA target sites and, furthermore, that it can be used to identify more likely targets. Accordingly, we used the variance of gene expression levels across a wide range of samples, including different cell types, cell lines, and diseased/healthy tissues, as a second mRNA-miRNA scoring scheme. We call this score the “gene variation” index.
We ran BayMiR on 1,539 human miRNAs and the expression levels of 13,303 genes measured in 5,372 microarray experiments. It predicts that approximately 60% of miRNA-mRNA duplexes with matched conserved target sites have a detectable down-regulation signal in gene expression. We evaluated and compared the efficacy of the proposed scores with eight TargetScan scores (a collection of the most important sequence-based features) as well as Cometa scores (an mRNA expression-based miRNA target prediction method) using miRNA over-expression experiments, validated targets, and GO and KEGG enrichment analysis. Using these benchmarks, we found that the BayMiR scores consistently outperform both the sequence and expression scores and identify to what extent down-regulated genes on a global set of microarrays are under the control of miRNAs.
BayMiR (Figure 1) calculates the degree to which mRNA down-regulation inferred from a large set of microarrays can be explained by inferred miRNA activity. BayMiR makes this prediction by integrating sequence and expression evidence. Because many targets are under the control of multiple miRNAs [20, 46, 52, 53], BayMiR applies a linear model that relates the target expression vector (measured variable) to a weighted combination of the miRNA activity vectors (regressor variables). BayMiR infers the activity vector of a given miRNA by averaging the normalized expression vectors of its predicted mRNA targets based on sequence-based prediction methods. These miRNA activity vectors are then used as regressors in a Bayesian linear regression model of the “down-regulation” expression vector of each mRNA. The resulting regression coefficients of each miRNA are interpreted as the strength of miRNA-mediated repression of the target mRNA.
We also considered the variability in gene expression of a target mRNA as a determinant to distinguish functional and non-functional targets of a given miRNA. The gene variation index for each mRNA is computed as the variance of gene expression levels across all samples.
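As a minimal sketch of the gene variation index described above, the snippet below computes the per-gene variance of expression across samples and ranks genes by it. The function name, the input layout (a dict from gene symbol to a list of expression values), the toy values, and the use of the population variance are all assumptions made for illustration; the text only states that the index is the variance across all samples.

```python
from statistics import pvariance


def gene_variation_index(expression):
    """Map each gene to the variance of its expression across samples.

    `expression` is a hypothetical dict: gene symbol -> list of (log)
    expression values, one per sample. Using the population variance is
    an assumption; the text does not say which estimator was used.
    """
    return {gene: pvariance(values) for gene, values in expression.items()}


# Toy data, made up purely for illustration.
toy = {
    "GENE_A": [5.0, 5.1, 4.9, 5.0],  # nearly constant expression
    "GENE_B": [2.0, 8.0, 3.0, 9.0],  # highly variable expression
}
index = gene_variation_index(toy)

# Genes ordered from most to least variable.
ranked = sorted(index, key=index.get, reverse=True)
```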
Each expression vector consists of the transcriptional abundance of the target in one of 392 biological samples collected from 5,372 microarray experiments. We determine the coefficients of the regression model using a penalized likelihood approach called elastic net regression (see Methods), modified to assign only positive coefficients. By using this regression model, each sequence-predicted miRNA-mRNA interaction is assigned one coefficient; this coefficient represents how much the inferred activity profile of that miRNA contributes to predicting that mRNA’s “down-regulation” profile (see Methods) when considering the activity profiles of all other miRNAs predicted to target the mRNA. We call these coefficients “BayMiR scores” and interpret a zero BayMiR score as representing a lack of evidence in the expression data for regulation of the mRNA by that miRNA.
BayMiR identifies highly repressed targets on miRNA over-expression assays
To evaluate whether the BayMiR scores reflect the strength of miRNA-mediated repression of mRNA targets, we measured the consistency between the BayMiR scores and the relative down-regulation of targets in a set of miRNA over-expression experiments. One expects high-scoring targets to be down-regulated more in miRNA over-expression experiments. We note that a similar metric has previously been used to evaluate the efficiency of TargetScan scores [8, 18], and that this set of miRNA over-expression assays was not used by BayMiR to obtain the scores; thus, we are not influencing the results of our evaluation either by selecting biased metrics or by evaluating our model on the training data. We downloaded the data collected by Khan et al., in which 23 miRNAs were transfected into seven different cell types and the log-fold changes of the expression levels of mRNAs were measured. To examine the degree to which our scores can predict the log-fold change of mRNAs in the miRNA over-expression arrays, for each score, we binned mRNAs into five bins based on their scores and computed the mean of the mRNA log-fold changes in each bin. We observed that negative log-fold repression levels decrease consistently as scores decrease for both determinants (Figure 2, top). In total, 3,867 out of 10,125 mRNAs are down-regulated in the miRNA over-expression experiments. We then asked whether our scoring schemes can detect repressed targets better than the individual components of the TargetScan context score. When comparing negative mean log-fold changes for messages whose scores were greater than the median score for the corresponding miRNA, BayMiR scores outperform all TargetScan scores, even the context+ score, which is a combination of all individual TargetScan scores (Figure 2, middle).
In addition, when we combined BayMiR scores and the TargetScan context+ score, the performance improved further (Wilcoxon-Mann-Whitney test: P < 0.001), indicating that BayMiR can augment the TargetScan scoring system. Target site conservation is another scoring scheme used by TargetScan, so we also compared BayMiR scores with conservation scores for all conserved target sites of all conserved miRNA families and found similar improvements (Figure 2, bottom). Our analysis also shows that the gene variation score was a better predictor of log-fold change than seed-pairing stability, relative location of the seed match in the 3’ UTR, and target abundance; however, it was worse than the other components of the context score on this assay (Figure 2, middle).
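The binning evaluation used in this section can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function name is hypothetical, the bins are taken to be equal-sized groups of the score-sorted mRNAs (the text says only that mRNAs were binned into five bins based on their scores), and the toy scores and log-fold changes are invented.

```python
def mean_lfc_by_score_bin(scores, log_fold_changes, n_bins=5):
    """Mean log-fold change per score bin (bin 0 = lowest scores).

    mRNAs are sorted by score and split into `n_bins` groups of nearly
    equal size; equal-sized bins are an assumption of this sketch.
    """
    if len(scores) != len(log_fold_changes):
        raise ValueError("scores and log_fold_changes must align")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n = len(order)
    means = []
    for b in range(n_bins):
        members = order[b * n // n_bins:(b + 1) * n // n_bins]
        if not members:
            means.append(float("nan"))  # fewer mRNAs than bins
            continue
        total = sum(log_fold_changes[i] for i in members)
        means.append(total / len(members))
    return means


# Invented toy data: higher scores go with stronger repression
# (more negative log-fold change after miRNA over-expression).
toy_scores = [0.1 * i for i in range(10)]
toy_lfc = [-0.05 * i for i in range(10)]
bin_means = mean_lfc_by_score_bin(toy_scores, toy_lfc)
```

If a score tracks repression, the bin means should become more negative from the lowest-scoring bin to the highest-scoring bin, which is the trend the toy data are built to show.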
High-scoring BayMiR targets are enriched for validated targets
To test whether experimentally validated targets are enriched among high-scoring BayMiR targets, we measured the significance of the overlap between the targets with scores greater than the median and the experimentally validated targets retrieved from TarBase. The hypergeometric test showed that the validated targets are enriched in the sets of high-scoring genes for both BayMiR and gene variation predicted targets (P < 10^-5 and P < 10^-4, respectively). A cumulative distribution analysis is also shown in Additional file 1: Figure S1. TarBase contains 491 human targets validated at the mRNA level; 279 of these have a conserved target site, and BayMiR predicts 203 of these conserved validated targets (72.8%). Together, these observations support the hypothesis that targets repressed under endogenous conditions are more likely to be functional targets.
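A hypergeometric enrichment test of the kind used here can be sketched with the standard library alone. The function name and the toy counts are hypothetical; the sketch computes the upper-tail probability of seeing at least the observed number of validated targets among the high-scoring set, which is the usual one-sided enrichment P value.

```python
from math import comb


def hypergeom_upper_tail(population, successes, draws, observed):
    """P(X >= observed) for a hypergeometric random variable X.

    population: number of candidate targets considered
    successes:  number of those that are experimentally validated
    draws:      number of high-scoring targets selected
    observed:   validated targets found among the selected ones
    """
    upper = min(successes, draws)
    lower = max(observed, 0)
    total = comb(population, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(lower, upper + 1)
    )
    return tail / total


# Toy counts, invented for illustration: 20 candidates, 5 validated,
# 10 selected as high scoring, 4 of the selected are validated.
p_value = hypergeom_upper_tail(20, 5, 10, 4)
```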
BayMiR predicts miRNA-induced repression better than Cometa
Next, we used the same evaluation strategy to compare BayMiR scores with an mRNA-miRNA scoring method that also uses large-scale gene expression data. Recently, Gennarino et al. showed that the targets of a miRNA tend to be co-expressed and, based on this property, proposed Cometa, a computational method that scores each sequence-based miRNA target prediction based on how correlated it is with other predicted targets of the miRNA. Examining the down-regulated targets in the miRNA over-expression assays shows that negative mean log-fold expression changes for targets selected by our scoring schemes are significantly higher than those selected by Cometa scores (P < 10^-40, Additional file 2: Figure S2). Moreover, our methods’ high-scoring targets are significantly more down-regulated than Cometa’s high-scoring targets (P < 10^-60, Figure 3) in the over-expression assays. Although Cometa targets are also enriched for validated targets, this enrichment is weaker than for BayMiR targets (P < 0.01 vs. P < 10^-5).
BayMiR target sets have more consistent GO-BP and KEGG annotations
Many miRNAs participate in the coordinated regulation of biological processes; as such, we should expect that, in general, better target prediction methods would generate miRNA target sets that have higher enrichment. To test whether BayMiR predicted targets are more consistently annotated with GO and KEGG terms than TargetScan targets, we used Fisher’s exact test with an FDR multiple test correction (see Methods) to score the enrichment of 1,233 GO-BP terms and 259 KEGG pathways within the target sets of each of 1,264 miRNA families. We found a nearly three-fold increase in enriched terms and pathways (FDR < 0.1) within BayMiR-predicted target sets compared to equally-sized random subsets of TargetScan targets (31,976 vs. 11,890, P < 10^-200).
Examination of the enriched GO-BP terms and KEGG pathways revealed a wide diversity of biological processes regulated by miRNAs (Additional file 3: Table S1, FDR < 0.1 and Additional file 4: Table S2, FDR < 0.1). We found that 35% of miRNAs that have BayMiR target sets are enriched for the GO term “regulation of expression”, suggesting that miRNAs have substantial influence in gene regulation through their control of other gene regulators.
We also searched for miRNAs with known functions among the miRNAs enriched in our pathway analysis. A list of miRNAs with experimentally supported functions among their enriched pathways is given in Additional file 5: Table S3. Notably, the miR-17 family is frequently seen in the list. This family has been extensively studied and shown to play an important role in many cancer-related processes and pathways [58, 59] (see also the references in Additional file 5: Table S3).
When we examined the mRNAs in KEGG pathways targeted by miRNAs, we found that although there is extensive co-regulation of mRNAs by multiple miRNAs, a handful of miRNAs appeared to be responsible for most of the regulation. For example, in the WNT signaling pathway, five miRNAs target 32 out of 46 genes predicted to be targeted by any of the 45 miRNAs with targets in this pathway (Figure 4). Similarly, the 106 genes in “Pathways in cancer” are targeted by 83 miRNAs, but only 10 of these miRNAs collectively target more than 75% of these genes (Additional file 6: Figure S3). Although some of this consolidation of targeting can be explained by the large variability in the number of mRNA targets per miRNA, there is significantly more consolidation than we would expect by chance (Figure 5, P < 10^-19). These observations suggest that important miRNA regulators of specific biological processes can be identified in silico through gene set enrichment analysis of BayMiR target sets.
miRNA activity and expression profiles are significantly correlated
To test whether miRNA activities obtained using the BayMiR procedure are correlated with miRNA expression profiles, we downloaded the miRNA expression data from the mimiRNA repository and computed the correlation between matched activity and expression vectors. After excluding miRNA expression data that are not consistent across multiple resources (according to P > 0.05 reported in the mimiRNA resource) and mapping the biological samples of the miRNA expression data to our biological groups, we obtained paired matches for 48 miRNAs. Interestingly, we found that 96% of the pairs (46 out of 48) have Pearson correlation coefficients greater than 0.35, compared to the 4% positive correlation obtained from a similar analysis with permuted activity vectors (P < 0.05 and Additional file 7: Table S4). This correlation analysis shows that miRNA activities inferred from the mean of the inverse expression of their targets are highly correlated with expression data for those miRNAs.
mRNAs harboring miRNA target sites near both ends of the 3’ UTR have higher endogenous down-regulation signals
To investigate any association between the endogenous target repression scores provided by BayMiR and the sequence and gene variation determinants, we measured the correlation between the scores of all pairs of determinants (Figure 6). The heat map shows that BayMiR scores correlate most highly with the position contribution scores. In addition, when we ranked all mRNA-miRNA pairs based on their BayMiR scores, the top half of the ranked list had higher position contribution scores than the bottom half (P < 10^-200, Wilcoxon-Mann-Whitney test and Additional file 8: Figure S7). The position contribution scores provide an estimate of expected repression in terms of the distance of target sites from both ends of the 3’ UTR; target sites near the ORF or the poly(A) tail are more effective and more conserved than those in the middle of the 3’ UTR. To further investigate this, we located 1,567,294 conserved target sites matched to the seed region of 1,032 miRNAs on the 3’ UTRs of 17,840 mRNAs. The start position of each target site was divided by the length of the 3’ UTR to obtain the relative position of miRNAs on the 3’ UTRs, denoted by 0 < L_miRNA < 1. We found that target sites located at either end of the 3’ UTR (L_miRNA < 0.25 or L_miRNA > 0.75) are assigned higher BayMiR scores than those in the middle (P < 10^-200, Wilcoxon-Mann-Whitney test). Furthermore, we found that target sites located at the terminus close to the poly(A) tail (L_miRNA > 0.75) are assigned higher BayMiR scores than those located at the other terminus (L_miRNA < 0.25; P < 10^-5, Wilcoxon-Mann-Whitney test). Poly(A) shortening is known as one of the mechanisms of mRNA degradation; this mechanism strongly favors miRNA target sites near the end of the 3’ UTR close to the poly(A) tail, where they can recruit mRNA deadenylase complexes. Together, these lines of evidence underline the importance of target site position in miRNA targeting.
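The comparison between inferred activity and measured miRNA expression can be sketched as a Pearson correlation per miRNA, with an optional permutation of the activity vector as a baseline. The helper names, the `seed` parameter, and the data layout (a list of activity/expression vector pairs) are assumptions of this sketch; only the 0.35 threshold and the permutation baseline come from the text.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fraction_above(pairs, threshold=0.35, permute=False, seed=0):
    """Fraction of (activity, expression) pairs with r > threshold.

    With permute=True the activity vector is shuffled first, giving a
    baseline for how often such correlations arise by chance.
    """
    rng = random.Random(seed)
    hits = 0
    for activity, expression in pairs:
        activity = list(activity)
        if permute:
            rng.shuffle(activity)
        if pearson(activity, expression) > threshold:
            hits += 1
    return hits / len(pairs)


# Invented toy pairs: one concordant, one anti-correlated.
toy_pairs = [([1, 2, 3], [1, 2, 3]), ([1, 2, 3], [3, 2, 1])]
observed_fraction = fraction_above(toy_pairs)
```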
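The relative-position calculation in this section is simple enough to sketch directly. The function names and region labels are hypothetical, and the sketch assumes site positions are counted from the ORF-proximal end of the 3’ UTR, so that values near 1 lie close to the poly(A) tail, as the text implies.

```python
def relative_position(site_start, utr_length):
    """L_miRNA: start of the target site divided by the 3' UTR length."""
    if utr_length <= 0:
        raise ValueError("utr_length must be positive")
    return site_start / utr_length


def utr_region(rel_pos, edge=0.25):
    """Label a site by the 3' UTR region it falls in.

    The cut-offs 0.25 and 0.75 follow the text; the labels are ours.
    """
    if rel_pos < edge:
        return "ORF-proximal"
    if rel_pos > 1 - edge:
        return "poly(A)-proximal"
    return "middle"


# Invented example sites on a 1,000-nt 3' UTR.
regions = [utr_region(relative_position(s, 1000)) for s in (100, 500, 900)]
```

Grouping BayMiR scores by these labels would then allow the end-versus-middle and poly(A)-versus-ORF comparisons described above.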
BayMiR scores are also highly correlated with gene variation scores suggesting that mRNAs with high expression variability are under selective pressure to be miRNA targets.
Large-scale mRNA expression profiling datasets provide a rich resource to study the regulatory impact of miRNAs. Here, we showed that the impact of miRNAs on targets is detectable in normal tissue and unperturbed cell line data. Given a list of miRNAs with partial complementarity to a particular mRNA, our computational technique, BayMiR, scores the relative regulatory impact of each miRNA among the other miRNAs predicted to target that mRNA. We showed that BayMiR estimates of miRNA regulatory impact better reflect independent measures of this impact than the TargetScan context scores; furthermore, we showed that the context scores and BayMiR can be combined to generate even better estimates. We also demonstrated that the miRNA activity vectors that we infer from mRNA expression data are well correlated with the measured expression levels of these miRNAs.
BayMiR has several features that make it particularly useful for estimating the potential regulatory impact of a miRNA. BayMiR models the combinatorial effect of multiple regulatory miRNAs on a single target, which is critical, as most mRNAs are likely to be targeted by multiple miRNAs (Additional file 9: Figure S4). BayMiR is fast; its runtime is less than a minute in the current version, so it is easily applied to a subset of, or all, available gene expression data. Because BayMiR estimates the activity of miRNAs based on mRNA expression data, there is no need for matching miRNA expression profiles. As such, BayMiR predictions can be easily extended when new miRNAs are found, and the current version of BayMiR incorporates all miRNAs retrieved from the latest release of miRBase (v.19).
Combinatorial regulation by multiple miRNAs has been described for particular mRNAs [8, 62] and is likely to play a large role in the regulation of mRNA expression. Indeed, human 3’ UTRs contain conserved seed matches for, on average, 33 miRNAs (median = 16) (Additional file 9: Figure S4). This combinatorial regulation may explain the observations that inverse correlation between miRNA and mRNA expression under endogenous conditions does not provide strong and consistent evidence of targeting [60, 63] and that the impact of miRNA regulation on mRNA levels can only be seen within the context of other miRNA regulation [46, 63]. Additional file 10: Figure S5 shows a toy example in which combinatorial regulation masks the inverse correlation between miRNA regulators and their targets.
There are a large number of other methods [49–51, 63–72] that either infer miRNA activity or predict miRNA targets based on the expression levels of their sequence-predicted targets; however, no method both infers miRNA activity and predicts miRNA targets while considering the impact of other miRNAs. For example, Cometa attempts to predict miRNA targets by identifying tight, co-expressed clusters of sequence-predicted targets; however, it does not account for combinatorial regulation by multiple miRNAs and provides no estimate of miRNA activity. Other methods, such as Sylamer and a number of web-based applications [66–68], identify miRNA seed regions that are significantly enriched in the 3’ UTRs of down-regulated transcripts as a way of assessing the miRNA activity level in a tissue. However, the performance of Sylamer when applied to endogenous gene expression data is unclear. In addition, it does not take into account the multiple-targeting effect of miRNAs and has not been used to score individual miRNA-mRNA pairs. Other methods use paired miRNA-mRNA expression patterns to augment sequence-based target prediction [35–48]. These methods typically require paired miRNA and mRNA measurements in a large number of samples to generate reliable predictions. This type of paired expression data is, however, rare and unavailable for some miRNAs. In contrast, a very large amount of mRNA expression data is available for BayMiR. Two intronic miRNA target prediction methods, InMiR and Hoctar [51, 63], predict intronic miRNA targets using the expression levels of their host genes, and so can also incorporate large mRNA expression data sets. However, these methods can only be applied to intronic miRNAs, and only to those miRNAs whose host gene expression is a good surrogate for their activity. Many host gene expression levels are not good surrogates [63, 74–76].
Our analysis also reveals that mRNAs with more target sites have higher expression variation when compared to a random subset of genes, and that expression variance consistently increases as the number of target sites increases (P < 10^-33, Additional file 11: Figure S6). These observations suggest that mRNAs with highly variable expression levels are much more likely to be regulated by miRNAs; our finding is consistent with recent reports that genes regulated by miRNAs have higher expression variability among humans and between humans and other primate species.
miRNA transfection experiments have suggested that the degree of mRNA repression induced by two seeds is equivalent to the product of the repression induced by the seeds individually. We have observed a similar effect. The version of BayMiR described here implicitly assumes multiplicative interactions because it log-transforms the mRNA expression levels before performing regression. Applying BayMiR to non-transformed expression levels assumes additive interactions, and this version of BayMiR performs much worse in our benchmarks (data not shown).
In this paper, we introduced BayMiR and demonstrated its merits when compared to two state-of-the-art computational miRNA target prediction methods. BayMiR applies a more relevant biological model and uses a large collection of gene expression data to decipher the impact of miRNAs on gene expression. We measured this impact in terms of endogenous target repression scores for about half a million miRNA-mRNA duplexes. This new scoring strategy can be used alone or along with other sequence determinants to predict functional miRNA-mRNA interactions.
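The link between log-transformation and multiplicative interactions can be checked with a small numeric sketch. The function name and all numbers below are invented for illustration: if two miRNAs each leave a fixed fraction of the transcript, the combined effect is a product on the raw scale and therefore a sum on the log scale, which is the form a linear regression can capture.

```python
from math import log2


def expected_level(baseline, remaining_fractions):
    """Expression left after several miRNAs act multiplicatively."""
    level = baseline
    for fraction in remaining_fractions:
        level *= fraction
    return level


baseline = 1000.0            # hypothetical unrepressed expression level
keep_a, keep_b = 0.8, 0.5    # hypothetical fractions left by miRNA a and b

both = expected_level(baseline, [keep_a, keep_b])

# On the log scale the two effects add up.
lfc_both = log2(both / baseline)
lfc_sum = log2(keep_a) + log2(keep_b)
```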
BayMiR applies the following linear model to relate the changes in the log-transformed expression level of mRNAs to the activity level of miRNAs:
where Δy_i denotes the change in the expression level of the i-th mRNA measured across M samples and is obtained by subtracting the mean from y_i; W = [w_{m,k}]_{M×K} denotes the activity levels of K miRNAs across M samples; each element of h_i represents the contribution of the corresponding miRNA in down-regulating the expression of the i-th mRNA; and ϵ models error. In our problem, K = 1,252, M = 369, and i = 1, …, 13,000.
In this linear equation, Δy_i and W are observed; h_i is the desired unknown variable. BayMiR infers h_i by maximizing the posterior probability of h_i given Δy_i and W:
This inference problem can be written in form of a penalized linear regression optimization given by:
where the λ_i are two tuning parameters and w_{m,:} is a row vector representing the activity of the miRNAs in the m-th sample. We solved this optimization using the coordinate-descent method, in which the objective function is partially optimized with respect to each individual coefficient in an iterative manner, given by
where S(x, t) is the soft threshold operator defined as sign(x)(|x| − t)+, where (y)+ = 0 if y < 0 and (y)+ = y if y ≥ 0.
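To make the coordinate-descent step concrete, here is a minimal sketch of a non-negative elastic net solver built around the soft threshold operator S(x, t). It is not the BayMiR implementation. It assumes the standard elastic net objective (1/2)·Σ_m (d_m − w_{m,:} h)² + λ1·Σ_k |h_k| + (λ2/2)·Σ_k h_k² with h_k ≥ 0, where `d` stands for one mRNA's down-regulation profile; the exact scaling of the penalties, the function names, and the toy data are assumptions.

```python
def soft_threshold(x, t):
    """S(x, t) = sign(x) * max(|x| - t, 0)."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def nonneg_elastic_net(W, d, lam1, lam2, n_iter=200, tol=1e-10):
    """Coordinate descent for a non-negative elastic net (sketch).

    W: M x K list of lists of miRNA activities (rows are samples).
    d: length-M down-regulation profile of one mRNA.
    Returns K non-negative coefficients.
    """
    M, K = len(W), len(W[0])
    h = [0.0] * K
    residual = list(d)  # equals d - W h while h is all zeros
    col_sq = [sum(W[m][k] ** 2 for m in range(M)) for k in range(K)]
    for _ in range(n_iter):
        max_change = 0.0
        for k in range(K):
            # Correlation of column k with the residual that leaves out k.
            rho = sum(W[m][k] * residual[m] for m in range(M)) + col_sq[k] * h[k]
            denom = col_sq[k] + lam2
            # Clipping at zero enforces the positive-coefficient constraint.
            new = max(soft_threshold(rho, lam1), 0.0) / denom if denom > 0 else 0.0
            delta = new - h[k]
            if delta != 0.0:
                for m in range(M):
                    residual[m] -= W[m][k] * delta
                h[k] = new
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return h


# Invented toy problem with two orthogonal activity vectors.
W_toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
d_toy = [2.0, -1.0, 2.0, -1.0]
h_plain = nonneg_elastic_net(W_toy, d_toy, lam1=0.0, lam2=0.0)
h_penalized = nonneg_elastic_net(W_toy, d_toy, lam1=1.0, lam2=2.0)
```

In the toy problem the second miRNA would need a negative coefficient to fit the data, so the non-negativity constraint sets it to zero, which mirrors the interpretation of a zero BayMiR score as a lack of evidence for regulation.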
Since miRNA and target mRNA expression data are anti-correlated, for each miRNA, BayMiR uses the negative mean of target expression levels as an estimate of the activity level of the miRNA as follows:
and then each activity vector is normalized. As such, the activity of the miRNA will be deemed to be positive when its sequence-predicted targets are below their mean expression level. BayMiR considers a gene to be a potential target of a miRNA if it has a conserved site complementary to the seed region of the miRNA.
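The activity estimate described above can be sketched as follows. The function names and toy profiles are hypothetical, each target is mean-centered before averaging (consistent with activity being positive when targets are below their mean), and scaling the result to unit Euclidean norm is an assumption, since the text says only that each activity vector is normalized.

```python
from math import sqrt


def center(profile):
    """Subtract a gene's mean expression from each of its samples."""
    mean = sum(profile) / len(profile)
    return [x - mean for x in profile]


def mirna_activity(target_profiles):
    """Negative mean of centered target expression, one value per sample.

    target_profiles: list of expression vectors (one per predicted
    target gene of a single miRNA), all of the same length M.
    """
    centered = [center(p) for p in target_profiles]
    n_targets = len(centered)
    n_samples = len(centered[0])
    activity = [
        -sum(gene[m] for gene in centered) / n_targets
        for m in range(n_samples)
    ]
    norm = sqrt(sum(a * a for a in activity))
    # Unit-norm scaling is an assumption of this sketch.
    return [a / norm for a in activity] if norm > 0 else activity


# Invented toy targets: both are lowest in sample 0, highest in sample 2.
toy_targets = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
toy_activity = mirna_activity(toy_targets)
```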
Processing mRNA expression data
The mRNA expression data were downloaded from the EMBL-EBI repository, available at http://www.ebi.ac.uk/gxa/experiment/E-MTAB-62. The data consist of 5,372 samples profiled on the HG-U133A array platform. As previously described, the data were normalized and manually labeled into 369 biological groups covering a wide range of healthy/cancer tissues, conditions, and cell lines. We processed the retrieved expression data as follows. All probe sets with no gene symbols were excluded. The samples belonging to each biological group were averaged; the samples within one biological group are highly correlated (ρ > 0.85). Lower and upper thresholds, defined by l_th = Q2 − 1.5(Q4 − Q2) and u_th = Q4 + 1.5(Q4 − Q2), respectively, where Q2 and Q4 represent the second and fourth quartiles, were used to detect and modify extreme outliers. Outliers were then replaced with l_th or u_th. The gene symbol lists in both the expression and sequence datasets were updated based on the latest release of the HUGO Gene Nomenclature Committee (HGNC) (Feb. 2012) to have consistent gene symbols.
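The outlier replacement step can be sketched as clipping to quartile-based fences. This sketch assumes that the two quartiles named in the text correspond to the lower and upper quartile cut points (25th and 75th percentiles), as in Tukey's fences; the text does not define its quartile numbering, so this reading, the function name, and the toy values are assumptions.

```python
from statistics import quantiles


def clip_outliers(values, k=1.5):
    """Replace values beyond the lower/upper fence with the fence itself.

    Fences: lower_q - k * spread and upper_q + k * spread, where
    spread = upper_q - lower_q. Treating lower_q and upper_q as the
    25th and 75th percentiles is an assumption of this sketch.
    """
    lower_q, _, upper_q = quantiles(values, n=4, method="inclusive")
    spread = upper_q - lower_q
    low_fence = lower_q - k * spread
    high_fence = upper_q + k * spread
    return [min(max(v, low_fence), high_fence) for v in values]


# Invented toy expression values with one extreme measurement.
toy_values = [1, 2, 3, 4, 5, 6, 7, 8, 100]
clipped = clip_outliers(toy_values)
```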
MiRNA-mRNA interaction analysis
We downloaded the list of 19,055 protein-coding gene symbols from the HGNC database and the list of 1,537 miRNA IDs from miRBase v.19. We then built seven 19,055×1,532 binary connectivity matrices based on the mRNA-miRNA interactions given by TargetScan V6.1 and TarBase. All miRNAs are grouped into 1,251 miRNA families as defined by TargetScan, that is, miRNAs sharing the same seed region. Conserved target sites were also retrieved from the TargetScan repository.
Gene ontology biological process (GO-BP) annotations were downloaded from the Gene Ontology website on April 15th, 2012. The file contains 14,000 annotations for 15,000 genes. The enrichment analysis was performed using Fisher’s exact test on the BayMiR predicted targets of each miRNA family. The enrichment P values were corrected using the Benjamini-Hochberg procedure, and an FDR cutoff of 0.1 was chosen to select significantly enriched categories. The KEGG enrichment analysis was carried out in a similar manner. The list of 253 human KEGG pathways with their associated genes was downloaded from http://www.genome.jp/kegg/, and Fisher’s exact test was used to find enriched pathways for the BayMiR targets of all miRNA families.
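The multiple-testing correction applied to the enrichment P values can be sketched with a short Benjamini-Hochberg routine. The function name and the toy P values are hypothetical; the 0.1 cutoff follows the text.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Invented toy P values for four terms tested against one target set.
toy_p = [0.01, 0.04, 0.03, 0.5]
toy_q = benjamini_hochberg(toy_p)

# Terms kept at the FDR cutoff used in the text.
significant = [q < 0.1 for q in toy_q]
```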
Availability of BayMiR and supporting data
The code for BayMiR is available at http://morrislab.med.utoronto.ca/BayMiR. The package includes scripts and instructions to regenerate BayMiR scores from the “E-MTAB-62” file and sequence information; a pre-computed version of the BayMiR scores is also uploaded.
We developed BayMiR, a new computational method for predicting the target mRNAs of miRNAs. BayMiR applies a large number of mRNA expression profiles and successfully identifies mRNA targets and miRNA activities without using miRNA expression data. We also showed that gene expression variability can be used to predict miRNA targets. Our analysis revealed the importance of miRNA target sites in the 3’ UTR near the poly(A) tail. The BayMiR package is publicly available and can be applied to any mRNA expression dataset.
This research was supported by Natural Sciences and Engineering Research Council grants to QM and WW. MHR was partially supported by an Ontario Graduate Scholarship.
The authors declare that they have no competing interests.
Conceived and designed the experiments: MHR QM WW. Performed the experiments: MHR. Analyzed the data: MHR QM. Wrote the paper: MHR QM. All authors read and approved the final manuscript.
Electronic supplementary material
Additional file 1: Figure S1. Cumulative distribution of scores for the validated targets. Validated targets are assigned higher BayMiR scores and gene variation scores than the other putative targets. Shown are the cumulative distributions of BayMiR scores (left plot) and gene variation scores (right plot) for validated targets (blue) and all putative targets (red). (PDF 68 KB)
Additional file 2: Figure S2. Comparing BayMiR and Cometa. BayMiR high-scoring targets are more down-regulated in miRNA over-expression assays than Cometa high-scoring targets. Shown is the cumulative distribution of log-fold change for high-scoring mRNAs; blue, red, and black represent the curves associated with BayMiR, gene variation, and Cometa, respectively. (PDF 75 KB)
Additional file 5: Table S3. Validated KEGG pathways. List of miRNAs with proposed functions found in our enriched KEGG list; the third column gives the PubMed IDs of the references. (PDF 108 KB)
Additional file 6: Figure S3. KEGG “Pathways in cancer”. 68 targets of 10 miRNAs are involved in the pathway (red boxes); 38 genes targeted by the other miRNAs are colored in yellow; and 62 genes involved in the pathway were excluded from the BayMiR target list because their expression variability across arrays was very low (white boxes). The miRNA family IDs are: miR-17/17-5p/20ab/20b-5p/93/106ab/427/518a-3p/519d, miR-548ah/3609, miR-4729, miR-203, miR-548p, miR-3647-3p, miR-300/381/539-3p, miR-142-5p, miR-545, miR-125a-5p/125b-5p/351/670/4319. (PDF 401 KB)
Additional file 8: Blue: the position contribution scores of miRNA-mRNA pairs whose BayMiR scores are greater than the median BayMiR score. Red: the position contribution scores of miRNA-mRNA pairs whose BayMiR scores are less than the median BayMiR score. (PDF 35 KB)
Additional file 9: Figure S4. The 3' UTRs of mRNAs harbor many conserved seed matches. Shown is the cumulative distribution of the number of seed matches in the 3' UTRs of 14,816 mRNA transcripts with at least one miRNA seed match. (PDF 27 KB)
Additional file 10: Figure S5. Example of combinatorial regulation masking inverse correlation. Shown in green is the expression level of a target gene and in red the expression levels of three targeting miRNAs. The negative correlation of each individual miRNA with the target is insignificant, but when considered together the miRNAs perfectly explain the down-regulation of the target. (PDF 14 KB)
Additional file 11: Gene expression variability increases as the number of target sites increases in the 3’ UTR of genes. (top) miRNA targets have high expression variation. (bottom) Red and blue demonstrate the cumulative distributions of genes whose variance is larger than median and 75th percentile, respectively. Dark: cumulative distribution of variances corresponding to all genes. (PDF 44 KB)
About this article
Cite this article
Radfar, H., Wong, W. & Morris, Q. BayMiR: inferring evidence for endogenous miRNA-induced gene repression from mRNA expression profiles. BMC Genomics 14, 592 (2013). https://doi.org/10.1186/1471-2164-14-592
- miRNA Target
- miRNA Family
- mRNA Expression Data
- miRNA Target Site
- miRNA Activity