Genome-wide search for miRNA-target interactions in Arabidopsis thaliana with an integrated approach
© Ding et al; licensee BioMed Central Ltd. 2012
Published: 11 June 2012
Skip to main content
© Ding et al; licensee BioMed Central Ltd. 2012
Published: 11 June 2012
MiRNA are about 22nt long small noncoding RNAs that post transcriptionally regulate gene expression in animals, plants and protozoa. Confident identification of MiRNA-Target Interactions (MTI) is vital to understand their function. Currently, several integrated computational programs and databases are available for animal miRNAs, the mechanisms of which are significantly different from plant miRNAs.
Here we present an integrated MTI prediction and analysis toolkit (imiRTP) for Arabidopsis thaliana. It features two important functions: (i) combination of several effective plant miRNA target prediction methods provides a sufficiently large MTI candidate set, and (ii) different filters allow for an efficient selection of potential targets. The modularity of imiRTP enables the prediction of high quality targets on genome-wide scale. Moreover, predicted MTIs can be presented in various ways, which allows for browsing through the putative target sites as well as conducting simple and advanced analyses.
Results show that imiRTP could always find high quality candidates compared with single method by choosing appropriate filter and parameter. And we also reveal that a portion of plant miRNA could bind target genes out of coding region. Based on our results, imiRTP could facilitate the further study of Arabidopsis miRNAs in real use. All materials of imiRTP are freely available under a GNU license at (http://admis.fudan.edu.cn/projects/imiRTP.htm).
MicroRNAs (miRNAs)  are a class of 20-nt to 24-nt small non-coding RNA (sncRNA) that has emerged as a key regulator of gene activity. MiRNAs regulate virtually every aspect of biology, including developmental timing, differentiation, proliferation, antiviral defence and metabolism . In plants, miRNAs are processed from larger precursor stem-loops (pre-miRNAs) in the nucleus, mainly by DICER-LIKE1 (DCL1) which excises a double-stranded RNA consisting of a miR and its near-complementary miR* sequence from the other arm of the stem-loop. The miRNA:miRNA* duplex is methylated and translocated to the cytoplasm where it can be loaded into an RNA-induced silencing complex (RISC) that includes a member of the ARGONAUTE (AGO) family as catalytic component. The RISC can then recognize mRNAs containing sequences complementary to the loaded miRNA . In plants, cleavage of the target mRNA is the main mechanism for miRNA action, but there are also direct effects on protein accumulation [4, 5], as reported for many animal miRNAs [6, 7].
Unlike animals, plant miRNAs generally show a near-perfect complementary target mRNA which immensely facilitates computational predictions . Taking advantage of this property, several methods were developed to search for antisense hits to known miRNAs on Arabidopsis mRNAs [9, 10]. While both animals and plants rely to a different extent on RNA complementarity to define their targets, some comparable features are employed in target prediction methods/tools for both animal and plant miRNAs. Notably, RNAhybrid  was first developed to identify miRNA-Target duplexes in D. melanogaster, D. pseudoobscura, and A. gambiae, and was more recently adapted to the specific requirements of plant miRNA target prediction, with outstanding results in Arabidopsis thaliana [12, 13].
Features reported for plant miRNA-target interactions in previous studies can be divided into three categories: (i) duplex pairing, with specific consideration of the seed region (2-8nt) in particular for animals, and the central region in plants (9-11nt), (ii) evolutionary conservation of MTI sites, and (iii) MTI site accessibility.
First, the effect of different seed site matches has been evaluated by different means. In animals, this led to the definition of several canonical seed types that differ in abundance and downstream effect . Recent studies suggested that the majority of functional target sites are formed by less specific seeds of only 6 nt indicating a crucial role of this type, and they also suggest that the majority of functional sites remain uncovered by common prediction methods . Second, not all animal targets are defined by 5' seed matches, but other types of complementarity, including a small number of near-perfect target sites inducing cleavage . This feature was considered in a target prediction method for human/mouse miRNAs . Third, increasing numbers of reports emphasize the importance of multiplicity of target sites in Arabidopsis. As example, some primary and secondary trans-acting siRNAs are generated from regions of Arabidopsis genes with two or more known miRNA/siRNA complementary sites [19, 20]. Fourth, miRNA-target interaction is determined by multiple factors. Considering the extent of influence of each of these factors in recognition mechanisms is still unclear to date, and current predictive approaches are often based on only some factors . As result, our study clearly shows that current methods can give different predictions on identical miRNA/mRNA sets (see Results).
All the problems mentioned above exist in predicting miRNA-target interactions of both animal and plant miRNAs. One successful attempt to address these is by integrating diverse approaches and datasets in a comprehensive manner that may substantially improve animal MTIs prediction. To date, miRNAmap 2.0 , miRecords , miRGator 2.0 , miRGene 2.0 , miRror  provide miRNA targets by integrating extensively adopted target prediction methods. Moreover, Tarbase , miRDB , miR2Disease  and miRTarBase  are established to provide experimentally validated MTIs.
Here, we present the first integrated MTI identification toolkit--imiRTP (integrated miRNA Target Prediction) for Arabidopsis thaliana, the most studied plant model species. It integrates 4 powerful predictors based on different factors, and is evaluated on 142 experimentally validated MTIs and 25,688 MTIs predicted by CleaveLand  on several Degradome data sets. Besides integrating different prediction methods and data sources, imiRTP also offers 4 effective filters to select high quality MTIs and supports diverse outputs to facilitate further analysis.
Considering target prediction is extremely computation time consuming, imiRTP submits input miRNA and/or transcript sequences to corresponding online predictors and then collects results for local analysis to save time. Moreover, data will be processed before submission with the aim to further save time. First, all input miRNAs are searched against a pre-processed miRNA dataset, consisting of all Arabidopsis miRNAs in miRBase (Release 17). Matched miRNA will not be submitted to online predictors; instead, their MTIs will be searched from already computed prepared genome-wide results. Second, MIR genes contained in transcript sequences will be removed. Then, all online collected and locally searched MTIs are combined together. In post-processing, imiRTP provides four filters to narrow results, thus substantially improving the predictions.
We downloaded all 255 Arabidopsis thaliana miRNAs arising from 231 pre-miRNAs in the miRNA database miRBase  (http://www.mirbase.org/). Transcript sequences (CDS, 5'UTR and 3'UTR) were downloaded from the central database TAIR  (http://www.arabidopsis.org/, Release 9).
A previous study  has collected 102 experimentally validated Arabidopsis MTIs from several publications [34–36], and these MTIs have been used as benchmark in comparison [13, 37]. As the most complete online experimentally validated MTI database, miRTarBase has accumulated nearly 4,000 MTIs by manually surveying over 1,100 pertinent publications after systematic text mining to filter research articles related to functional studies of miRNAs . Sixty two experimentally validated Arabidopsis thaliana MTIs were downloaded from miRTarBase (http://mirtarbase.mbc.nctu.edu.tw/, Release 2.4). After removing duplicates, a total of 142 experimentally validated MTIs were collected.
Recently, high-throughput CLIP-Seq (HITS-CLIP , PAR-CLIP ) and Degradome-seq [34, 36] methods have been applied to identify the sites of Argonaute interaction and miRNA cleavage, respectively. 25,688 degradome sequencing supported Arabidopsis thaliana MTIs were downloaded from starBase  (http://starbase.sysu.edu.cn/, Release 2.0). These MTIs are predicted by CleaveLand (version 2.0)  with a cutoff of 7.
Summary of target prediction tools for plant miRNA
20 M/200 M
50 kb/40 M
WMD3  is based on principles of artificial miRNAs, which support the notion that extensive base pairing with targets is required for plant miRNA function. It predicts targets using previously determined parameters of target selection for natural miRNAs. The number of mismatches (cutoff 5) and hybridization energy ratio (≥ 70%) are two critical parameters when searching targets. In this method, a GU wobble pair is counted as 1 mismatch.
UEA_sRNA is included in the UEA toolkit  which identifies plant sRNA (miRNA/siRNA) targets. The rules used for target prediction are based on factors suggested in previous studies [35, 43]. Both seed region and central region are considered. Mismatches in the central region (9-11nt) are not allowed. The hybridization energy ratio is computed as the MFE (minimum free energy) of miRNA:miRNA* instead of the traditional optimal energy that is calculated by the miRNA and its perfect reverse complement. Considering the similar rules used in UEA_sRNA and Targetfinder , the latter is not integrated in imiRTP.
TAPIR offers potential plant miRNA targets using a fast (FASTA) search engine and a precise (RNAhybrid) engine. A miRNA target score is modified from a previous study . Mismatch, gap and wobble pairs inside and outside the core region (2-12nt) are counted differently. Again, the hybridization energy ratio is considered (≥ 70%). Considering a previous assessment , we chose to integrate the fast FASTA algorithm. Another Smith-Waterman-like alignment, Target-align, has been shown to perform better than TAPIR . However it is not practical for prediction of MTIs on genome-wide scale since too many parameters are considered.
psRNATarget  is designed for plant sRNA target prediction with an efficient distributed computing back-end pipeline that runs on a Linux cluster. This tool can rapidly search for potential MTIs and it is the first to provide multiplicity information and functional type (cleavage or translation) determined by the occurrence of a mismatch in central region (9-11nt). Unfortunately, the multiplicity is only reported but not considered in prediction.
In Table 1, we list the input limitations of existing online predictors. Here, the third mission of the pre-processing procedure is rearranging user-submitted files, in order to guarantee flexibility when working with imiRTP.
The secondary stage of imiRTP includes several effective filters. The user can remove predicted MTIs with one filter or various ensembles. When comparing performance among predictors, there are usually two levels: mRNA level and target-site level. Here, we use the latter. Putative target sites that overlap at least 90% will be grouped into one common MTI.
MTI predicted by single method is usually not as credible as those identified by multiple methods. Keep this in mind, we introduce the multi-method filter to help user to select multiple predicted MTIs.
Strong miRNA targets tend to have multiple target sites instead of one single site . Considering the number of putative miRNA site per mRNA can therefore significantly enhance MTI prediction. In reported plant miRNA target prediction tools, the importance of the target site multiplicity was generally underestimated. Default cutoff of multiplicity is 2.
The frequently considered free energy of interaction of a miRNA and its target is generally not a very good predictor . An effective MTI needs an open structure on the target site to begin the hybridization reaction, an issue which has been extensively explored in animals [47–49]. The RNAup program in Vienna package is used to calculate secondary structure in target site regions . The RNAup takes into account the hybridization energy and the free energy needed to open the target site, which turns out to be the real accessibility. The default threshold of accessibility is set to the highest value that observed for all 125 validated MTIs recovered by imiRTP.
Degradome-seq (also known as PARE and GMUCT) directly sequences degradome tags derived from the 5' ends of uncapped mRNAs and delivers an empirical overview of cleaved sRNA targets without computational predictions or overexpression. In this work, 25,688 Degradome-seq results were collected from , which were then used to filter computationally predicted MTIs at the mRNA level. The default cutoff 4.5 is suggested by .
The imiRTP toolkit accepts user-submitted miRNA and/or transcript sequences for analysis, i.e. (i) searching user-submitted miRNAs against included TAIR transcripts; and (ii) searching user-submitted miRNAs and user-submitted transcripts. After all files are successfully submitted, imiRTP will search targets based on selected predictors and combine all results into one group.
Once the submitted analysis is completed, imiRTP outputs details of predicted MTIs to one file and outputs statistics of every unique miRNA-target (mRNA level) predicted by which predictor to another file. A sort tool is implemented to easily browse through the results. In addition, imiRTP allows users to extract several essential columns from results related to single miRNAs or mRNAs, which greatly facilitate further analysis, i.e. motif discovery, SNP detection and regulatory network analysis.
imiRTP was implemented in C++. Online interaction programs were implemented in C# and java. Both are tested on Windows platform. Computation of the accessibility profiles in the post-processing steps is performed with the help of RNAup program. When calculating the accessibility, imiRTP extracts a maximum of 100nt flanking sequences on both upstream and downstream of a target site instead of using the whole sequence, thus greatly reduces the calculating time.
Performance of four predictors*
1) Additional prediction Percentage of predicted target-site level MTIs that do not belong to the reference set. Smaller is better.
2) True positive Percentage of MTIs that each predictor or predictor ensembles can recover from the reference set. Since all validated MTIs are on mRNA level, predicted target-site level MTIs are first collapsed to mRNA level. Larger is better.
3) Filter power Geometric mean of filter ratio and additional prediction. Filter ratio is defined as the fraction of predicted MTIs that are filtered by a given filter compared to the number of total predictions. Smaller is better.
First, we tested the four selected predictors with the reference set and compiled the results in Table 2. At default settings TAPIR_FASTA identifies the most validated MTIs (116/142), while WMD3 identifies the least (90/142). This demonstrates the specific importance of the core region (2-12nt), which covers both the seed region and central region. UEA_sRNA gets the lowest additional prediction with only 362 predicted MTIs, which is considerably smaller than other three methods. For one reason, UEA_sRNA uses stringent parameters. The other reason might be the special hybridize energy ratio computed by this method.
We next compared the effect of other filters. To achieve reliable results, we constructed a benchmark set by selecting several best combinations of predictors and/or multi-method cutoff from the different groups mentioned above (highlighted in Table 2, and in Additional file 2, Tables S1 & S2).
As for accessibility, the default cutoff for RNAup program is -15.87 kcal/mol, which is the maximum value of 119 out of all 125 validated MTIs that could be recovered by imiRTP. Thermodynamic related features are considered by all four online predictors but with different calculation methods. For example, the RNAup program has been employed by psRNATarget, while RNAcofold was set as default in WMD3, and the flanking regions around the target site are different. Here, we employ RNAup to compute the accessibility uniformly. Because of the specific cutoff (-15.87 kcal/mol), only 6 validated MTIs, whose accessibilities couldn't be calculated by RNAup, are removed. Correspondingly, it results in the highest true positives. On the other hand, the filter ratio is extremely low (Additional file 2, Table S5), which then gives rise to contain a little bit more additional predictions. When validating the effectiveness of the three local filters, results also show that the combinations of multiple methods always achieve similar or better predictions than any single method.
Plant miRNA binding sites occur typically in the coding region of target genes , whereas in animals, they are most often found within the 3'UTR . However, new findings indicate that both animal and plant miRNAs can target 5'UTR, 3'UTR and coding regions [53, 54]. We therefore run the programs specifically on Arabidopsis thaliana CDS, 5'UTR and 3'UTR sequences downloaded from TAIR (version 9).
In order to verify the effectiveness of the imiRTP toolkit, we collected 142 experimentally validated MTIs from previous studies within a reference set. We find that four online predictors integrated within imiRTP give highly different target-site level results (Additional file 2, Table S1), especially on the genome-wide scale (Additional file 2, Table S9).
Degradome-seq is a novel technology that is independent of computational methods. However, if we directly compare the results of imiRTP and CleaveLand at the target-site level, the fraction of common MTIs decreases dramatically from ~50% to ~2% (data not shown). Moreover, even the combination of all four predictors could only identify 88.03% (125/142) validated MTIs.
Besides psRNATarget, all existing target prediction methods and Degradome-seq for plant miRNA are looking for features specific to target cleavage. As a consequence, a portion of additional predictions might in fact be true positive, although many of them are indeed false positives.
All these results indicate the importance of integrating multiple methods and the introduction of translational repression related factors. Additionally, our results have indicated that proper filters can efficiently identify potential MTIs from large candidate sets.
Future work mainly includes the following parts. (i) Integrate additional existing and novel target prediction methods, (ii) support more plant species, (iii) improve and consider other efficient filters, like computing the accessibility with Raccess  and RNAplex  and selection of predicted MTIs by evolutionary conservation, (iv) maintain imiRTP with future experimentally and Degradome-seq validated MTIs.
In this work, we propose the first integrated miRNA target prediction toolkit for Arabidopsis thaliana. The imiRTP toolkit brings new features compared to existing methods. The ability to use different predictors and filters to search qualified MTIs, the rich output results, and the use pre-computed results should make imiRTP a useful and efficient resource for the plant research community.
This article has been published as part of BMC Genomics Volume 13 Supplement 3, 2012: Selected articles from the IEEE International Conference on Bioinformatics and Biomedicine 2011: Genomics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcgenomics/supplements/13/S3.
We are grateful to Molly Megraw (Duke University) for providing comments and code. JD, DL and SG were supported by Major State Basic Research and Development Program of China (973 Program) under grant no.2010CB126604. JG was also supported by the Shuguang Program of Shanghai Education Foundation.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.