miRNA-target prediction based on transcriptional regulation
© Fujiwara and Yada; licensee BioMed Central Ltd. 2013
Published: 15 February 2013
microRNAs (miRNAs) are tiny endogenous RNAs that have been discovered in animals and plants, and direct the post-transcriptional regulation of target mRNAs for degradation or translational repression via binding to the 3'UTRs and the coding exons. To gain insight into the biological role of miRNAs, it is essential to identify the full repertoire of mRNA targets (target genes). A number of computer programs have been developed for miRNA-target prediction. These programs essentially focus on potential binding sites in 3'UTRs, which are recognized by miRNAs according to specific base-pairing rules.
Here, we introduce a novel method for miRNA-target prediction that is entirely independent of existing approaches. The method is based on the hypothesis that a miRNA and its target genes tend to be transcriptionally co-regulated by common transcription factors. This hypothesis predicts the frequent occurrence of common cis-elements between the promoters of a miRNA and its target genes. Accordingly, our proposed method first identifies putative cis-elements in the promoter of a given miRNA, and then identifies genes that contain common putative cis-elements in their promoters. In this paper, we show that a significant number of common cis-elements occur in ~28% of experimentally supported human miRNA-target data. Moreover, we show that the prediction of human miRNA targets based on our method is statistically significant. Further, we discuss the random incidence of common cis-elements, their consensus sequences, and the advantages and disadvantages of our method.
This is the first report indicating the prevalence of transcriptional regulation of a miRNA and its target genes by common transcription factors, and the ability to predict miRNA targets based on this property.
microRNAs (miRNAs) are tiny endogenous RNAs that occur in animals and plants and direct the post-transcriptional regulation of target mRNAs, through degradation or translational repression, by binding to the 3'UTRs and the coding exons [1–4]. More than 1,500 miRNA genes have been identified in the human genome . Computational predictions have shown that miRNAs may directly regulate 20–30% of protein-coding genes [6, 7], and, on average, each miRNA can regulate the expression of several hundred genes . Therefore, miRNAs are regarded as important regulators of cell differentiation, proliferation/growth, mobility, and apoptosis [9–11].
To gain insight into the biological role of miRNAs, it is essential to identify the full repertoire of mRNA targets (target genes). A number of computer programs have been developed for miRNA-target prediction . These programs essentially perform two steps. First, they identify potential binding sites in 3'UTRs that are recognized by the seed region of a given miRNA according to specific base-pairing rules. The seed region is defined as the consecutive stretch of 7 nucleotides starting from either the first or the second nucleotide at the 5' end of a miRNA. Note that these programs do not take potential binding sites in coding exons into consideration. Second, they evaluate the cross-species conservation of the potential binding sites, and regard mRNAs with highly conserved sites as putative target genes. This step substantially reduces false-positive predictions. However, it is increasingly evident that many non-conserved binding sites are also functional .
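The seed-matching step described above can be illustrated with a minimal Python sketch. It assumes strict Watson-Crick pairing (no G:U wobble, bulges or mismatches), RNA strings over A/C/G/U, and a 7-nt seed starting at position 1 or 2. The function names are hypothetical, and the sketch omits the conservation filter and the scoring schemes used by real prediction programs.

```python
# Hypothetical sketch of seed-based site search in a 3'UTR.
# Assumptions: strict Watson-Crick pairing only (no G:U wobble),
# RNA alphabet (A, C, G, U), 1-based seed start of 1 or 2.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed(mirna: str, start: int = 2, length: int = 7) -> str:
    """Return the 7-nt seed beginning at 1-based position `start` of the miRNA (5'->3')."""
    if start not in (1, 2):
        raise ValueError("seed start must be 1 or 2")
    return mirna[start - 1:start - 1 + length]


def site_for(seed_seq: str) -> str:
    """mRNA site (5'->3') that pairs with the seed: its reverse complement."""
    return "".join(COMPLEMENT[base] for base in reversed(seed_seq))


def find_seed_sites(mirna: str, utr: str, start: int = 2) -> list[int]:
    """0-based positions in `utr` where a perfectly seed-complementary site begins."""
    site = site_for(seed(mirna, start))
    return [i for i in range(len(utr) - len(site) + 1) if utr[i:i + len(site)] == site]


let7a = "UGAGGUAGUAGGUUGUAUAGUU"
print(seed(let7a))                              # GAGGUAG
print(find_seed_sites(let7a, "AAACUACCUCAAA"))  # [3]
```

For let-7a, the seed at positions 2-8 is GAGGUAG, so the sketch scans the 3'UTR for its reverse complement CUACCUC.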
Accordingly, several programs that do not rely on cross-species conservation have been developed. These programs employ novel features in addition to base-pairing rules in seed regions. Kim et al.  and Yousef et al.  introduced various types of features observed in regions downstream of the seed (out-seed regions), e.g. structural, thermodynamic and positional features. Robins et al.  and Kertesz et al.  incorporated mRNA secondary structure into their prediction programs as a measure of the accessibility of miRNA-target binding sites. Wang & Naqa  and Gennarino et al.  proposed integrating gene expression data into their prediction programs. Nonetheless, almost all of these programs take a region-limited view of miRNA activity; that is, they focus only on potential binding sites in the 3'UTRs of mRNAs.
In terms of genomic organization, miRNAs can be categorized into two classes, namely, intragenic and intergenic miRNAs . Intragenic miRNAs are located within other transcriptional units (host genes). Rodriguez et al.  proposed that such miRNAs are transcribed in parallel with their host genes, suggesting that they share promoters with their host genes. In contrast, intergenic miRNAs are located between other transcriptional units and therefore have their own transcriptional units and promoters. Lee et al.  verified that they are first transcribed by RNA polymerase II as long primary transcripts (pri-miRNAs), which are then processed into pre-miRNAs and mature miRNAs. Intergenic miRNAs occasionally form clusters, which can be transcribed simultaneously as a single polycistronic transcript . Short distances between consecutive intergenic miRNA loci are a hallmark of polycistronic transcription.
Here we address two questions: (1) Are there common cis-elements between the promoters of a miRNA and its target genes? (2) Is it possible to predict miRNA-target genes based on common cis-elements? First, we found a significant number of common cis-elements in ~28% of experimentally supported miRNA-target data. Second, we demonstrate the statistical significance of the predictive ability of our method. Finally, we discuss the random background incidence of common cis-elements, the consensus sequences of these elements, and the advantages and disadvantages of our method. This is the first report indicating prevalent transcriptional regulation of a miRNA and its target genes by common transcription factors, and the potential to predict miRNA targets based on this property.
For each pair in the miRNA-target data, we detected the set of common cis-elements between the promoters of the miRNA and its target gene, and evaluated its statistical significance. We observed at least one common cis-element in 73 of the 97 intragenic pairs (75.3%) and in 62 of the 110 intergenic pairs (56.4%). Of these, 32 of the 97 intragenic pairs and 25 of the 110 intergenic pairs were statistically significant. In total, we observed a statistically significant number of common cis-elements in 57 of the 207 pairs (27.5%), indicating that transcriptional regulation of a miRNA and its target gene by common transcription factors is prevalent. Although pairs of genes with common cis-elements show, on average, a higher degree of co-expression than those without, gene pairs with higher expression correlation do not have significantly greater numbers of common cis-elements . Thus, a greater fraction of the miRNA-target data than detected here may actually be co-expressed.
We applied our method to the 155 pairs of mature miRNAs and their target genes in the prepared miRNA-target data. For comparison, we also applied two existing methods, miRanda (Sep. 2008 release)  and RNAhybrid ver.2.1 , to the same data. These two methods search for potential binding sites in the 3'UTRs of mRNAs using the seed region of a given mature miRNA according to specific base-pairing rules. Like our method, they do not rely on cross-species conservation of potential binding sites. However, they still take a region-limited view of miRNA activity; that is, they do not consider potential binding sites in coding exons. We ran both programs with their default parameter sets; for RNAhybrid, hits were retained when their minimum free energy was ≤ −25.0 kcal/mol. As input, we used the collection of 3'UTR sequences from the miRNA-target prediction program TargetScan Rel.4.0 . To allow a fair comparison with our method, we used only the 3'UTR sequences corresponding to the 14,728 human protein-coding genes in DBTSS.
Although the predictive ability of our method is not particularly high, its prediction accuracy is comparable to that of miRanda and RNAhybrid (Figure 4). We evaluated the statistical significance of the predictive ability of each method using the binomial test , and obtained p-values of 5.69 × 10⁻⁸ for our method, 1.09 × 10⁻¹⁰ for miRanda, and 2.00 × 10⁻⁸ for RNAhybrid. These results demonstrate the potential to predict miRNA targets based on common cis-elements.
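The binomial test mentioned above asks how likely it is to recover at least k known targets out of n by chance. A minimal sketch, assuming a single success probability p per trial (for instance, the fraction of genes a method predicts as targets). How the original analysis defined n, k and p is not stated here, so the numbers in the usage line are purely hypothetical.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """One-sided p-value P(X >= k) for X ~ Binomial(n, p)."""
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    # Sum the binomial probability mass over k, k+1, ..., n successes.
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical example: 20 of 155 known pairs recovered, chance rate 5% per pair.
print(binomial_upper_tail(20, 155, 0.05))
```

An exact tail sum is used rather than a normal approximation because the p-values of interest lie far in the tail, where approximations are least reliable.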
Prediction accuracy of our method for miRNA-target data whose binding sites are not conserved between related species.
Lists of GO terms ranked according to success rate (%) of miRNA-target prediction by our method.
First list:
Enzyme linked receptor protein signaling pathway
Negative regulation of cellular metabolic process
Sequence-specific DNA binding
Negative regulation of metabolic process
Response to stress
Negative regulation of macromolecule metabolic process
Multicellular organismal development
Anatomical structure development
Regulation of developmental process
Second list:
Neurological system process
Cell cycle process
Cell cycle phase
Intrinsic to membrane
Integral to membrane
All of the data described in this paper are available from the authors on request. We also applied our method to all human miRNAs in miRBase rel.12.0; these results are likewise available.
We collected experimentally supported human miRNA-target data, and determined the associated promoter regions. Next, we identified potential cis-elements in each promoter based on cross-species conservation, and selected those that were common between the promoters of a particular miRNA and its target genes.
We collected a set of experimentally supported human miRNA-target data from TarBase ver.5.0 . TarBase contains ~1,100 entries of human miRNA-target data, each pairing a mature miRNA with a target gene. From this data set, we selected 166 entries that had direct experimental support, e.g. reporter gene assays. Using miRBase rel.12.0  and the UCSC Genome Browser , we identified the genomic loci of the miRNAs and the target genes in the human genome (hg18). Since miRBase consists of pre-miRNA data, we assigned the mature miRNAs in TarBase to miRBase pre-miRNAs based on their names and sequences. In some cases, a mature miRNA was assigned to multiple pre-miRNAs. We discarded mature miRNAs that were not assigned to any pre-miRNA. The resulting filtered miRNA-target data set consisted of 71 mature miRNAs, 84 pre-miRNAs and 117 target genes. The data contained 155 pairs of mature miRNAs and their target genes, and 207 pairs of pre-miRNAs and their target genes.
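The expansion from mature-miRNA pairs to pre-miRNA pairs (155 to 207 above) can be sketched as follows. This toy version assumes that a mature miRNA is assigned to every hairpin whose sequence contains it; the original assignment also used miRBase names, and all identifiers and sequences below are made up for illustration.

```python
def assign_pre(mature_seq: str, hairpins: dict[str, str]) -> list[str]:
    """Names of all pre-miRNA hairpins whose sequence contains the mature sequence."""
    return sorted(name for name, hairpin in hairpins.items() if mature_seq in hairpin)


def expand_pairs(pairs: list[tuple[str, str]],
                 mature_seqs: dict[str, str],
                 hairpins: dict[str, str]) -> list[tuple[str, str]]:
    """Turn (mature, target) pairs into (pre-miRNA, target) pairs.

    A mature miRNA found in several hairpins yields one pair per hairpin;
    a mature miRNA found in none is dropped, mirroring the filtering step.
    """
    expanded = []
    for mature, target in pairs:
        for pre in assign_pre(mature_seqs[mature], hairpins):
            expanded.append((pre, target))
    return expanded


# Toy data (hypothetical names and sequences).
hairpins = {"pre-x-1": "CCUGAGGUAGCC", "pre-x-2": "AAUGAGGUAGAA", "pre-y": "GGGGGGGG"}
mature = {"mat-x": "UGAGGUAG", "mat-z": "CCCCC"}
print(expand_pairs([("mat-x", "GENE1"), ("mat-z", "GENE2")], mature, hairpins))
# prints [('pre-x-1', 'GENE1'), ('pre-x-2', 'GENE1')]
```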
We classified the miRNAs in the miRNA-target data into intragenic and intergenic subsets to identify their promoter regions. We searched for host genes whose genomic loci overlapped with those of the miRNAs on the same strand. Genomic loci of host genes were examined using five human (hg18) gene annotation tracks (UCSC Genes, RefSeq Genes, human mRNA from GenBank, H-Invitational, and Ensembl Genes) from the UCSC Genome Browser. Where host genes were found, the corresponding miRNAs were classified as intragenic miRNAs; the remaining miRNAs were classified as intergenic miRNAs. Six miRNAs (hsa-let-7a-3, hsa-let-7b, hsa-mir-21, hsa-mir-24-2, hsa-mir-34a, hsa-mir-129-1) were classified as intergenic miRNAs despite their intersection with host genes, because the fractions of their overlap were relatively small. As a result, of the 207 pairs of pre-miRNAs and their target genes, 97 were classified as intragenic and 110 as intergenic.
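A minimal sketch of the same-strand overlap test used to separate intragenic from intergenic miRNAs. The paper gives no numeric cut-off for the "relatively small" overlaps that were still treated as intergenic, so `min_fraction` (the fraction of the miRNA covered by a host gene) and its default of 0.5 are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def overlap_bp(a: Locus, b: Locus) -> int:
    """Bases shared by two loci; 0 unless they lie on the same chromosome and strand."""
    if a.chrom != b.chrom or a.strand != b.strand:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def classify(mirna: Locus, genes: list[Locus], min_fraction: float = 0.5) -> str:
    """'intragenic' if a same-strand gene covers >= min_fraction of the miRNA
    (hypothetical threshold), otherwise 'intergenic'."""
    length = mirna.end - mirna.start + 1
    best = max((overlap_bp(mirna, g) for g in genes), default=0)
    return "intragenic" if best > 0 and best / length >= min_fraction else "intergenic"


mir = Locus("chr1", 1000, 1080, "+")
print(classify(mir, [Locus("chr1", 500, 5000, "+")]))   # intragenic
print(classify(mir, [Locus("chr1", 500, 5000, "-")]))   # intergenic (opposite strand)
```

In practice the gene list would be the union of the five annotation tracks listed above.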
Intragenic miRNA promoters were defined as the genomic region −2,000/+200 bp relative to the transcription start site (TSS) of the host gene (where +1 is the TSS). Genomic locations of TSSs were obtained from DBTSS Ver.6.0 . Where alternative TSSs were reported, we selected the TSS with the maximal 'Number of confident cDNAs'. If this number was small (≤ 3), we instead adopted the most upstream TSS provided by either RefSeq  or UCSC Genes.
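The TSS choice and the −2,000/+200 window above can be written as two small functions. This is a sketch under stated assumptions: DBTSS alternatives are given as hypothetical (position, confident-cDNA count) tuples, annotated RefSeq/UCSC TSSs as plain positions, coordinates are 1-based, "most upstream" is resolved by strand, and raising an error when no fallback annotation exists is our own choice.

```python
def choose_tss(dbtss: list[tuple[int, int]], annotated: list[int], strand: str,
               min_confident: int = 4) -> int:
    """Pick one TSS per gene.

    dbtss: (position, number_of_confident_cDNAs) for each alternative DBTSS TSS.
    annotated: TSS positions from RefSeq / UCSC Genes, used as a fallback.
    """
    if dbtss:
        position, n_cdnas = max(dbtss, key=lambda t: t[1])
        if n_cdnas >= min_confident:          # i.e. more than 3 confident cDNAs
            return position
    if not annotated:
        raise ValueError("no reliable DBTSS TSS and no annotated TSS")
    # Most upstream: smallest coordinate on '+', largest on '-'.
    return min(annotated) if strand == "+" else max(annotated)


def promoter_window(tss: int, strand: str, upstream: int = 2000,
                    downstream: int = 200) -> tuple[int, int]:
    """1-based inclusive interval covering -upstream..+downstream, with the TSS at +1."""
    if strand == "+":
        return tss - upstream, tss + downstream - 1
    return tss - downstream + 1, tss + upstream


print(promoter_window(10000, "+"))  # (8000, 10199), 2,200 bp
print(promoter_window(10000, "-"))  # (9801, 12000), 2,200 bp
```

Because +1 is the TSS and there is no position 0, the window spans 2,000 + 200 = 2,200 bp on either strand, the same length used for intergenic promoters.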
Intergenic miRNA promoters were defined as the 2,200 bp genomic region upstream of the 5' end of the intergenic pre-miRNA. Where intergenic miRNAs form a cluster, we assigned the promoter of the polycistronic transcript based on the most upstream miRNA in the cluster. We regarded intergenic miRNAs as clustered when the distances to neighboring miRNAs were ≤ 5,000 bp.
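The clustering rule and the 2,200 bp upstream promoter can be sketched as below. The distance between neighbors is assumed to be the number of bases separating consecutive pre-miRNAs on the same chromosome and strand, and the `PreMiRNA` record is a hypothetical container. A singleton cluster yields the ordinary intergenic promoter.

```python
from typing import NamedTuple


class PreMiRNA(NamedTuple):
    name: str
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def cluster_mirnas(loci: list[PreMiRNA], max_gap: int = 5000) -> list[list[PreMiRNA]]:
    """Group pre-miRNAs on the same chromosome and strand whose gap to the
    current cluster is <= max_gap bp (assumed definition of distance)."""
    ordered = sorted(loci, key=lambda m: (m.chrom, m.strand, m.start))
    clusters: list[list[PreMiRNA]] = []
    for m in ordered:
        last = clusters[-1] if clusters else None
        if (last is not None and last[0].chrom == m.chrom and last[0].strand == m.strand
                and m.start - max(x.end for x in last) - 1 <= max_gap):
            last.append(m)
        else:
            clusters.append([m])
    return clusters


def cluster_promoter(cluster: list[PreMiRNA], length: int = 2200) -> tuple[str, int, int]:
    """Region of `length` bp immediately upstream of the 5' end of the most
    upstream pre-miRNA in the cluster (1-based inclusive, strand-aware)."""
    if cluster[0].strand == "+":
        first = min(cluster, key=lambda m: m.start)
        return first.chrom, first.start - length, first.start - 1
    first = max(cluster, key=lambda m: m.end)
    return first.chrom, first.end + 1, first.end + length


mirs = [PreMiRNA("mir-c", "chr13", 20000, 20080, "+"),
        PreMiRNA("mir-a", "chr13", 10000, 10080, "+"),
        PreMiRNA("mir-b", "chr13", 10200, 10280, "+")]
groups = cluster_mirnas(mirs)
print([[m.name for m in g] for g in groups])   # [['mir-a', 'mir-b'], ['mir-c']]
print(cluster_promoter(groups[0]))             # ('chr13', 7800, 9999)
```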
We defined the promoters of miRNA target genes using the same approach as that described for host genes above. We discarded coding regions from all promoters according to annotations of UCSC Genes.
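Removing coding regions from a promoter amounts to interval subtraction. A minimal sketch, assuming 1-based inclusive intervals and coding blocks already taken from the UCSC Genes annotation; the function name is hypothetical.

```python
def subtract_intervals(region: tuple[int, int],
                       coding: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Remove coding intervals from a promoter region; all intervals 1-based inclusive."""
    pieces = [region]
    for c_start, c_end in coding:
        remaining = []
        for start, end in pieces:
            if c_end < start or c_start > end:      # no overlap: keep the piece whole
                remaining.append((start, end))
                continue
            if start < c_start:                     # non-coding part left of the block
                remaining.append((start, c_start - 1))
            if c_end < end:                         # non-coding part right of the block
                remaining.append((c_end + 1, end))
        pieces = remaining
    return pieces


print(subtract_intervals((1, 100), [(40, 60)]))  # [(1, 39), (61, 100)]
```

A promoter may therefore be split into several non-coding pieces, each of which is searched for cis-elements separately.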
We identified putative cis-elements in promoters of miRNA and target genes based on cross-species conservation. We first extracted the promoter regions from multiple sequence alignments of 28 vertebrate genomes as provided by the UCSC Genome Browser. Next, we identified ≥ 6 nt regions that were completely conserved between human, chimp, mouse, rat and dog, and defined these as putative cis-elements.
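The conservation criterion above (≥ 6 nt completely conserved in human, chimp, mouse, rat and dog) amounts to scanning alignment columns for runs of identical, gap-free bases. A minimal sketch, assuming the five aligned rows have already been extracted from the multiple alignment into equal-length strings keyed by hypothetical species labels; mapping alignment columns back to genomic coordinates is left out.

```python
def conserved_blocks(alignment: dict[str, str],
                     species: tuple[str, ...] = ("human", "chimp", "mouse", "rat", "dog"),
                     min_len: int = 6) -> list[tuple[int, int, str]]:
    """Maximal runs of >= min_len alignment columns where every species has the
    same base and no gap/N. Returns (first_col, last_col, sequence), 0-based inclusive."""
    rows = [alignment[s].upper() for s in species]
    n_cols = len(rows[0])
    if any(len(r) != n_cols for r in rows):
        raise ValueError("aligned rows must have equal length")
    blocks: list[tuple[int, int, str]] = []
    run_start = None
    for col in range(n_cols + 1):           # the extra column closes a trailing run
        bases = {r[col] for r in rows} if col < n_cols else set()
        conserved = len(bases) == 1 and not bases & {"-", "N"}
        if conserved and run_start is None:
            run_start = col
        elif not conserved and run_start is not None:
            if col - run_start >= min_len:
                blocks.append((run_start, col - 1, rows[0][run_start:col]))
            run_start = None
    return blocks


aln = {"human": "ACGTACGTTT-A", "chimp": "ACGTACGTTTGA", "mouse": "ACGTACGATT-A",
       "rat": "ACGTACGTTT-A", "dog": "ACGTACGTTT-A"}
print(conserved_blocks(aln))  # [(0, 6, 'ACGTACG')]
```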
By comparing the putative cis-elements in the promoters of a miRNA and its target gene, we searched for identical subsequences of ≥ 6 nt, and defined these as common cis-elements. To evaluate their statistical significance, we estimated the frequency distribution of common cis-elements occurring by chance alone, as follows. First, we prepared two sets of TSSs from DBTSS: one consisting of TSSs whose promoters show a cross-species conservation distribution similar to that of the miRNA promoters, and the other consisting of TSSs whose promoters show a cross-species conservation distribution similar to that of the target gene promoters. Next, we randomly selected a pair of TSSs, one from each set, and repeated this random sampling 100,000 times. For each pair of TSSs, we determined the promoter regions [−2,000, +200], detected common cis-elements according to the above procedure, and recorded their frequency for each sequence length. Finally, we summarized these counts over all pairs of TSSs to obtain, for each sequence length, the frequency distribution of common cis-elements occurring by chance. The Bonferroni method was applied to correct for multiple testing . A set of common cis-elements between the promoters of a miRNA and its target gene was considered statistically significant if its probability of occurring by chance, estimated from these distributions, was 5% or less.
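The detection and significance steps above can be sketched in Python. Several details are assumptions rather than the paper's specification: a common cis-element is taken to be a maximal shared substring of ≥ 6 nt (so shorter substrings nested in a longer shared one are not counted again), significance is judged per sequence length as the fraction of random pairs with at least as many common elements of that length, and the Bonferroni factor is the number of lengths tested. The background pools stand in for the two conservation-matched TSS sets.

```python
import random
from collections import Counter


def substrings(seq: str, min_len: int) -> set[str]:
    """All substrings of seq with length >= min_len."""
    return {seq[i:j] for i in range(len(seq)) for j in range(i + min_len, len(seq) + 1)}


def common_elements(elems_a: list[str], elems_b: list[str], min_len: int = 6) -> set[str]:
    """Maximal subsequences (>= min_len nt) shared by a cis-element of A and one of B."""
    subs_b = set().union(*(substrings(e, min_len) for e in elems_b))
    shared = set().union(*(substrings(e, min_len) & subs_b for e in elems_a))
    # Drop shared strings nested in a longer shared string so each is counted once.
    return {s for s in shared if not any(s != t and s in t for t in shared)}


def length_counts(common: set[str]) -> Counter:
    """Number of common cis-elements per sequence length."""
    return Counter(len(s) for s in common)


def null_distribution(pool_a: list[list[str]], pool_b: list[list[str]],
                      n_pairs: int = 100_000, seed: int = 0) -> list[Counter]:
    """Length counts for random promoter pairs, one drawn from each background pool."""
    rng = random.Random(seed)
    return [length_counts(common_elements(rng.choice(pool_a), rng.choice(pool_b)))
            for _ in range(n_pairs)]


def is_significant(observed: Counter, null: list[Counter], alpha: float = 0.05) -> bool:
    """Bonferroni-corrected empirical test over the sequence lengths observed."""
    m = len(observed)                      # number of lengths tested
    for length, n_obs in observed.items():
        # Fraction of random pairs with at least as many elements of this length.
        p = sum(1 for c in null if c[length] >= n_obs) / len(null)
        if p * m <= alpha:
            return True
    return False


print(common_elements(["AAACGTACGTTT"], ["GGCGTACGTGG"]))  # {'CGTACGT'}
```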
We developed a method for miRNA-target prediction as described below. Note that the method does not rely on any features of binding sites in 3'UTRs or coding exons. (1) The method assigned a given mature miRNA to a pre-miRNA and identified its genomic locus. It then determined the promoter region of the pre-miRNA and identified putative cis-elements (see the sections from 'Collecting miRNA-target data' to 'Identifying putative cis-elements' for details). (2) For all protein-coding genes of the organism from which the miRNA originates, the method determined their promoter regions and identified putative cis-elements. Since we adopted human as the model organism, the method identified putative cis-elements in the promoters of all 14,728 human protein-coding genes in DBTSS (see 'Determining promoter regions' and 'Identifying putative cis-elements' for details). (3) For each protein-coding gene, the method compared its putative cis-elements with those of the miRNA and detected common cis-elements. It then evaluated the statistical significance of the occurrence of these common cis-elements, and regarded a protein-coding gene with a significant occurrence as a target (see 'Detecting common cis-elements' for details). In step (1), a mature miRNA was sometimes assigned to multiple pre-miRNAs; in such cases, we applied the method to each of the pre-miRNAs and took the union of all predicted target genes.
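Steps (1) to (3), including the union over multiple pre-miRNAs, fit into a short driver loop. In this sketch the promoter and cis-element extraction (`elements_of`), the common-element detection (`find_common`) and the significance test (`significant`) are hypothetical callables passed in, since their details are covered in the preceding sections.

```python
from typing import Callable, Iterable


def predict_targets(pre_mirnas: Iterable[str],
                    genes: Iterable[str],
                    elements_of: Callable[[str], list[str]],
                    find_common: Callable[[list[str], list[str]], set[str]],
                    significant: Callable[[set[str]], bool]) -> set[str]:
    """Steps (1)-(3) for one mature miRNA: union of targets over its pre-miRNAs."""
    gene_elements = {g: elements_of(g) for g in genes}   # step (2), computed once
    targets: set[str] = set()
    for pre in pre_mirnas:                               # step (1) per assigned pre-miRNA
        mir_elements = elements_of(pre)
        for gene, elems in gene_elements.items():        # step (3)
            if significant(find_common(mir_elements, elems)):
                targets.add(gene)
    return targets


# Toy stand-ins (hypothetical): elements are labels, "common" is set intersection,
# and any non-empty overlap counts as significant.
elements = {"pre1": ["X", "Y"], "pre2": ["Z"], "G1": ["X"], "G2": ["W"], "G3": ["Z", "Q"]}
print(sorted(predict_targets(["pre1", "pre2"], ["G1", "G2", "G3"], elements.__getitem__,
                             lambda a, b: set(a) & set(b), lambda c: len(c) > 0)))
# prints ['G1', 'G3']
```

The gene-side cis-elements are computed once and reused, because in the human setting step (2) covers all 14,728 protein-coding genes while step (1) is repeated for each assigned pre-miRNA.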
This work was supported by KAKENHI (Grant-in-Aid for Scientific Research) No.221S0002 and No.22240032 from the Ministry of Education, Culture, Sports, Science and Technology of Japan.
The publication costs for this article were funded by the above grants.
This article has been published as part of BMC Genomics Volume 14 Supplement 2, 2013: Selected articles from ISCB-Asia 2012. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcgenomics/supplements/14/S2.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.