- Open Access
MicroPC (μPC): A comprehensive resource for predicting and comparing plant microRNAs
BMC Genomicsvolume 10, Article number: 366 (2009)
Plant microRNA (miRNA) has an important role in controlling gene regulation in various biological processes such as cell development, signal transduction, and environmental responses. While information on plant miRNAs and their targets is widely available, accessible online plant miRNA resources are limited; most of them are intended for economically important crops or plant model organisms. With abundant sequence data of numerous plants in public databases such as NCBI and PlantGDB, the identification of their miRNAs and targets would benefit researchers as a central resource for the comparative studies of plant miRNAs.
MicroPC (μPC) is an online plant miRNA resource resulted from large-scale Expressed Sequence Tag (EST) analysis. It consists of 4,006 potential miRNA candidates in 128 families of 125 plant species and 2,995 proteins (4,953 EST sequences) potentially targeted by 78 families of miRNA candidates. In addition, it is incorporated with 1,727 previously reported plant mature miRNA sequences from miRBase. The μPC enables users to compare stored mature or precursor miRNAs and user-supplied sequences among plant species. The search utility allows users to investigate the predicted miRNAs and miRNA targets in detail via various search options such as miRNA family and plant species. To enhance the database usage, the prediction utility provides interactive steps for determining a miRNA or miRNA targets from an input nucleotide sequence and links the prediction results to their homologs in the μPC.
The μPC constitutes the first online resource that enables users to comprehensively compare and predict plant miRNAs and their targets. It imparts a basis for further research on revealing miRNA conservation, function, and evolution across plant species and classification. The μPC is available at http://www.biotec.or.th/isl/micropc.
MicroRNA (miRNA) is a class of non-coding small RNAs with a single strand molecule of 18–22 nucleotides. It plays a pivotal role in gene regulation in various species by targeting mRNAs at the post-transcriptional levels . In plants, miRNA involves in organ development and environmental responses [2–4].
Despite accumulating studies in the identification of plant miRNAs and their targets based on both computational [5–12] and experimental [13–16] approaches, existing public resources focusing on plant miRNAs and their targets are limited to economically important crops or specially implemented for plant model organisms such as Arabidopsis, rice, and maize. Among them, miRU  is a web server for plant miRNA target prediction utilizing the concepts of nearly perfect binding between a miRNA and its targets. Plant MPSS (Massively Parallel Signature Sequencing) databases  of Arabidopsis, rice, grape, and rice blast fungus (Magnaporthe grisea) were developed to provide an online resource for mRNA and small RNA analyses. ASRP (Arabidopsis Small RNA Project) [19, 20] provides a repository with tools for analysis and visualization of small RNA sequences cloned from various ecotypes and tissues of Arabidopsis. CSRDB (Cereal Small RNA Database)  consists of maize and rice small RNA sequences generated by high-throughput pyrosequencing with the provided maps to the available rice and maize genomes.
Here, we developed MicroPC (μPC) as a comprehensive online resource for predicting and comparing plant miRNAs and their targets. The μPC is resulted from applying previous steps and criteria [7, 22–25] with known plant mature miRNA sequences from miRBase [26, 27] and large-scale EST sequences from PlantGDB [28, 29]. The μPC database consists of 4,006 predicted miRNA candidates in 128 families of 125 plant species and 2,995 proteins (4,953 EST sequences) potentially targeted by 78 families of potential miRNA candidates. It characterizes the predicted miRNAs and their targets according to plant classification (e.g., monocots, dicots, mosses) and incorporated with previously discovered 1,727 plant mature miRNAs from miRBase. With stored data, it offers users a comparative and integrative view of previously reported miRNAs and our predicted miRNA candidates among various plant species. The sequence alignment feature enables users to explore the relation and evolution of either mature or precursor miRNA sequences within a family across plant species. Moreover, the search utility provides various search options for users to explore the predicted miRNA candidates and their potential targets in detail. The prediction utility helps to predict potential miRNA candidates for an input nucleotide sequence and to scan for EST sequences of a plant species that might be targeted by an input mature miRNA sequence.
Construction and content
We obtained 1,727 (1,022 with experiments and 705 without experiments) known mature miRNA sequences in 418 families of 21 plant species from miRBase release 12.0 (September 2008) [see Additional file 1]. To prepare and filter source sequences, we used UNAFold  to predict the secondary structures of the obtained non-experimental stem-loop sequences and applied the criteria in  to remove structural sequences which have (1) the number of mismatches between miRNA and miRNA* greater than 4, and/or (2) the number of asymmetric bulges greater than 1, and/or (3) the size of the asymmetric bulge greater than 2 [see excluded sequences in Additional file 2]. We then manually curated remaining 513 sequences and deployed them together with all sequences with experiments as source sequences for miRNA prediction. All 5,306,503 EST sequences of 173 plant species were downloaded from PlantGDB in November, 2008 [see Additional file 3]. We compiled 1,372 RNA families from Rfam 9.1 (January, 2009)  to filter out non-coding RNAs (ncRNAs) that are not miRNAs in the step of miRNA prediction. For EST functional annotation, the UniProtKB/Swiss-Prot and UniProtKB/TrEMBL protein data sets release 14.9 were obtained from UniProt . These data sets will be updated periodically when their new releases are available.
Prediction of potential miRNA candidates
The prediction of plant miRNA candidates was primarily inferred from the study of Zhang et al. , consisting of six main steps (Figure 1A). First, known mature miRNA sequences from miRBase were searched against EST sequences of 173 plant species from PlantGDB using BLASTN  version 2.2.18 (with E-value < 100 and window size = 4) for identifying homologs. Second, EST sequences containing homolog mature miRNA sequences with the number of mismatches fewer than 4 were searched against UniProt protein database using BLASTX (with E-value < e-20) for removing protein-coding sequences. EST sequences that did not significantly hit any proteins in UniProt from the second step would be further searched against Rfam database using BLASTN (with E-value < e-8) for removing homologs of non-coding RNAs other than miRNAs. Fourth, remaining EST sequences were predicted for the secondary structure of primary miRNA (pri-miRNA) and precursor miRNA (pre-miRNA) by UNAFold which resulted in RNA secondary structures ranked by their minimal free energies (MFEs). Next, predicted precursor miRNA sequences were searched against UniProt protein database using BLASTX (with E-value < e-20) for removing possible protein-coding sequences. An EST sequence will be determined as a miRNA gene if its pre-miRNA structures with lowest MFE satisfy the criteria proposed by Ambros et al. : (1) the mature miRNA sequence has to be within one arm of the hairpin secondary structure, (2) the number of mismatches between mature miRNA sequence and its miRNA* sequence is fewer than 6, and (3) there is no muti-loop in pre-miRNA hairpin secondary structure. The total 1,153 EST sequences were predicted as potential miRNA candidates in 128 families of 125 plant species. These sequences with their secondary structures, MFEs, source mature miRNA sequences from miRBase, and other prediction detailed results are collected in the μPC database.
Prediction of ESTs targeted by miRNA candidates
The prediction of plant miRNA targets is a straightforward process since plant mature miRNAs can bind with specific mRNA targets through perfect or nearly-perfect complementarities [13, 34]. Sequences of potential miRNA candidates were scanned for complementary sites on EST sequences that significantly hit some proteins in UniProt database based on BLASTX (E-value < e-20). An EST sequence would be considered as a potential target if it passes the criteria used in Rhoades et al. [22, 35]: (1) the number of mismatches (including G:U pair ) between sequence of potential miRNA candidate and its complementary site on the EST is fewer than 4, and (2) there is no gap in the complementary site. As specific positions of mismatches affect miRNA targeting [37, 38], we also collected these positions to enable the query of predicted targets with specific mismatch positions of interest. Total 2,995 proteins (4,953 EST sequences) potentially targeted by 78 families of miRNA candidates were assigned with Gene Ontology (GO) , categorized by GO ID (e.g., GO:0000166) and GO term (e.g., nucleotide binding), and stored in the μPC database. Figure 1B shows the procedure of potential miRNA target prediction.
Prediction accuracy and coverage
To assess the prediction accuracy and coverage, 936 experimental precursor miRNA sequences from miRBase were used as input for the miRNA prediction procedure and 761 out of 936 (81.3%) were correctly predicted as miRNA genes [see Additional file 4]. Other sequences were mostly ruled out during the secondary structure prediction. With one thousand repeats of 936 randomly generated sequences, the average number of random sequences predicted as miRNA genes is extremely low (0.00064%). To assess whether variable source mature miRNA sequences affect prediction results, we applied the same prediction steps to the 936 precursor miRNA sequences with leave-one-out of source mature miRNA sequences. The removal of common miRNAs or miRNAs with several members in a family has little or no effects on prediction results. In contrast, a removed species-specific miRNA sequence such as miRNA 1063 in moss will reduce the number of prediction coverage [see Additional files 5 and 6]. These measurements suggest that our miRNA prediction procedure could predict potential miRNA candidates with confidence and has limitation in coverage due to homology-based approach.
Utility and discussion
The μPC offers three main interactive pages for comparing, searching, and predicting plant miRNAs and their targets as described below.
The comparative utility enables users to explore the conservation of known and predicted miRNAs among plant species in integrative views. Users may select miRNAs of interest and submit for their availability among plant species. Cells with different colors in a matrix view of selected miRNAs and plants represent available miRNAs from three sources: our EST analysis, all previous reports from miRBase, or both. Figure 2A reveals the conservation of miRNA 156, miRNA 159, miRNA 172 across plant groups, while miRNA 444 seems to be monocot-specific. miRNA 1062 and miRNA 1076 appear to be moss-specific and miRNA 1082, miRNA 1091, miRNA 1094, miRNA 1095, and miRNA 1098 seem to be spike moss-specific. Besides the comparative matrix view, users may explore the multiple sequence alignment and evolution of either mature or precursor miRNA sequences from miRBase, our prediction results, and user-supplied sequences via incorporated CLUSTALW  and Jalview  (Figures 2B and 2C).
The search utility provides various ways for users to access predicted miRNA candidates and potential miRNA targets in detail. Users may search for predicted miRNAs by miRNA family (e.g., miR156), plant species (e.g., Arabidopsis thaliana), mature miRNA sequence (e.g., ugacagaagagagugagcac), and miRBase AC or ID (e.g., MIMAT0000166 or ath-MIR156a). The miRNA search result mainly includes a list of EST accession numbers potentially encoding miRNAs, their referenced mature miRNA sequences from miRBase, and a link to detailed prediction results such as the secondary structure of pre-miRNA sequence, mature miRNA sequence, positions, number of mismatches, GC content, and MFEI (minimal folding free energy index) . Users could also export the EST source sequence and its secondary structure of pre-miRNA for a specific record (Figure 3A). Users may search for ESTs potentially targeted by miRNA candidates through miRNA family, plant species, UniProt accession number (e.g., [Swiss-Prot:P93015]; SPL3 protein in Arabidopsis), and GO term (e.g., rhythmic process). Figure 3B shows an example of the search by UniProt accession number [Swiss-Prot:P93015], where all targeted ESTs annotated with [Swiss-Prot:P93015] will be returned. The "Export Result" button on both the miRNA and Target search result pages allows users to download a search result as a text file containing detailed prediction results of all records.
Tools for miRNA and miRNA target predictions
The prediction utility helps to predict whether an input nucleotide sequence (< 3000 bp) is a potential miRNA candidate, and to scan for EST sequences of a plant species potentially targeted by an input mature miRNA sequence (in range of 18–26 nt). For miRNA prediction, users could reduce the number of hit mature miRNAs from miRBase via E-value and the maximum number of alignments setting. It will return all mature miRNA sequences that satisfy setting conditions and found within the input sequence. They are also linked to their potential miRNA homologs in the μPC database. The input sequence with a specific mature miRNA positioned could be further predicted by UNAFold for its secondary structure and substructure of stem-loop with mature miRNA sequence on one arm (Figure 4). To predict miRNA targets, users may reduce the number of possible targets via numbers of allowed mismatches and G:U pairs. Its result includes targeted EST accession numbers from PlantGDB, target site alignment and position, numbers of mismatches and G:U pairs, and its functional annotation.
The μPC represents the first online resource that facilitates users with the inclusive comparing and predicting plant miRNAs and miRNA targets across plant species and classification. The comparative utility enables users to explore the conservation and evolution of stored miRNAs and user-provided sequences across plant species. Users may explore miRNAs and their targets predicted by large-scale EST analysis via search utility and predict a miRNA or its possible targets for an input sequence via prediction utility. With stored data and incorporated utilities, the μPC conveys a basis for additional research on plant miRNA functions and evolution.
Availability and requirements
MicroPC (μPC) is freely available at http://www.biotec.or.th/isl/micropc.
The comparative miRNA sequence requires Java applet to display output via Jalview.
The web browser requires a PDF viewer plug-in (e.g., Acrobat Reader) to view and print the PDF files of secondary structures.
Bartel DP: MicroRNAs. Genomics, biogenesis, mechanism, and function. Cell. 2004, 116: 281-297. 10.1016/S0092-8674(04)00045-5.
Zhang B, Pan X, Cobb GP, Anderson TA: Plant microRNA: A small regulatory molecule with big impact. Developmental Biology. 2006, 289 (1): 3-16. 10.1016/j.ydbio.2005.10.036.
Sunkar R, Chinnusamy V, Zhu J, Zhu J-K: Small RNAs as big players in plant abiotic stress responses and nutrient deprivation. Trends in Plant Science. 2007, 12 (7): 301-309. 10.1016/j.tplants.2007.05.001.
Mallory AC, Vaucheret H: Functions of microRNAs and related small RNAs in plants. Nat Genet. 2006, 38: S31-S36. 10.1038/ng1791.
Wang X-J, Reyes J, Chua N-H, Gaasterland T: Prediction and identification of Arabidopsis thaliana microRNAs and their mRNA targets. Genome Biology. 2004, 5 (9): R65-10.1186/gb-2004-5-9-r65.
Bonnet E, Wuyts J, Rouze P, Peer Van de Y: Detection of 91 potential conserved plant microRNAs in Arabidopsis thaliana and Oryza sativa identifies important target genes. Proc Natl Acad Sci U S A. 2004, 101 (31): 11511-11516. 10.1073/pnas.0404025101.
Zhang B, Pan X, Wang QL, Cobb GP, Anderson TA: Identification and characterization of new plant microRNAs using EST analysis. Cell Res. 2005, 15 (5): 336-360. 10.1038/sj.cr.7290302.
Zhang B, Pan X, Cannon CH, Cobb GP, Anderson TA: Conservation and divergence of plant microRNA genes. The Plant Journal. 2006, 46: 243-259. 10.1111/j.1365-313X.2006.02697.x.
Zhang B, Wang Q, Wang K, Pan X, Liu F, Guo T, Cobb GP, Anderson TA: Identification of cotton microRNAs and their targets. Gene. 2007, 397: 26-37. 10.1016/j.gene.2007.03.020.
Xie FL, Huang SQ, Guo K, Xiang AL, Zhu YY, Nie L, Yang ZM: Computational identification of novel microRNAs and targets in Brassica napus. FEBS Letters. 2007, 581 (7): 1464-1474. 10.1016/j.febslet.2007.02.074.
Yin Z, Li C, Han X, Shen F: Identification of conserved microRNAs and their target genes in tomato (Lycopersicon esculentum). Gene. 2008, 414: 60-66. 10.1016/j.gene.2008.02.007.
Sunkar R, Jagadeeswaran G: In silico identification of conserved microRNAs in large number of diverse plant species. BMC Plant Biology. 2008, 8 (1): 37-10.1186/1471-2229-8-37.
Reinhart BJ, Weinstein EG, Rhoades MW, Bartel B, Bartel DP: MicroRNAs in plants. Genes Dev. 2002, 16 (13): 1616-1626. 10.1101/gad.1004402.
Park W, Li J, Song R, Messing J, Chen X: CARPEL FACTORY, a Dicer Homolog, and HEN1, a Novel Protein, Act in microRNA Metabolism in Arabidopsis thaliana. Current Biology. 2002, 12 (17): 1484-1495. 10.1016/S0960-9822(02)01017-5.
Sunkar R, Zhu J-K: Novel and Stress-Regulated MicroRNAs and Other Small RNAs from Arabidopsis. Plant Cell. 2004, 16 (8): 2001-2019. 10.1105/tpc.104.022830.
Sunkar R, Girke T, Jain PK, Zhu JK: Cloning and Characterization of MicroRNAs from Rice. Plant Cell. 2005, 17 (5): 1397-1411. 10.1105/tpc.105.031682.
Zhang Y: miRU: an automated plant miRNA target prediction server. Nucl Acids Res. 2005, 33 (suppl_2): W701-704. 10.1093/nar/gki383.
Nakano M, Nobuta K, Vemaraju K, Tej SS, Skogen JW, Meyers BC: Plant MPSS databases: signature-based transcriptional resources for analyses of mRNA and small RNA. Nucl Acids Res. 2006, 34 (suppl_1): D731-735. 10.1093/nar/gkj077.
Gustafson AM, Allen E, Givan S, Smith D, Carrington JC, Kasschau KD: ASRP: the Arabidopsis Small RNA Project Database. Nucl Acids Res. 2005, 33 (suppl_1): D637-640.
Backman TWH, Sullivan CM, Cumbie JS, Miller ZA, Chapman EJ, Fahlgren N, Givan SA, Carrington JC, Kasschau KD: Update of ASRP: the Arabidopsis Small RNA Project database. Nucl Acids Res. 2008, 36 (suppl_1): D982-985.
Johnson C, Bowman L, Adai AT, Vance V, Sundaresan V: CSRDB: a small RNA integrated database and browser resource for cereals. Nucl Acids Res. 2007, 35 (suppl_1): D829-833. 10.1093/nar/gkl991.
Rhoades MW, Reinhart BJ, Lim LP, Burge CB, Bartel B, Bartel DP: Prediction of Plant MicroRNA Targets. Cell. 2002, 110 (4): 513-10.1016/S0092-8674(02)00863-2.
Ambros V, Bartel B, Bartel DP, Burge CB, Carrington JC, Chen X, Dreyfuss G, Eddy SR, Griffiths-Jones SAM, Marshall M, et al: A uniform system for microRNA annotation. RNA. 2003, 9 (3): 277-279. 10.1261/rna.2183803.
Lai E: Predicting and validating microRNA targets. Genome Biology. 2004, 5 (9): 115-10.1186/gb-2004-5-9-115.
Meyers BC, Axtell MJ, Bartel B, Bartel DP, Baulcombe D, Bowman JL, Cao X, Carrington JC, Chen X, Green PJ, et al: Criteria for Annotation of Plant MicroRNAs. Plant Cell. 2008, 20 (12): 3186-3190. 10.1105/tpc.108.064311.
Griffiths-Jones S, Grocock RJ, van Dongen S, Bateman A, Enright AJ: miRBase: microRNA sequences, targets and gene nomenclature. Nucl Acids Res. 2006, 34 (suppl_1): D140-144. 10.1093/nar/gkj112.
Griffiths-Jones S, Saini HK, van Dongen S, Enright AJ: miRBase: tools for microRNA genomics. Nucl Acids Res. 2008, 36 (suppl_1): D154-158.
Dong Q, Lawrence CJ, Schlueter SD, Wilkerson MD, Kurtz S, Lushbough C, Brendel V: Comparative Plant Genomics Resources at PlantGDB. Plant Physiol. 2005, 139 (2): 610-618. 10.1104/pp.104.059212.
Duvick J, Fu A, Muppirala U, Sabharwal M, Wilkerson MD, Lawrence CJ, Lushbough C, Brendel V: PlantGDB: a resource for comparative plant genomics. Nucl Acids Res. 2008, 36 (suppl_1): D959-965.
Markham NR, Zuker M: UNAFold: software for nucleic acid folding and hybriziation. Methods Mol Biol. Edited by: Keith JM, Totowa NJ. 2008, Humana Press, 453: 3-31.
Griffiths-Jones S, Moxon S, Marshall M, Khanna A, Eddy SR, Bateman A: Rfam: annotating non-coding RNAs in complete genomes. Nucl Acids Res. 2005, 33 (suppl_1): D121-124.
The UniProt C: The Universal Protein Resource (UniProt). Nucl Acids Res. 2008, 36 (suppl_1): D190-195.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. Journal of Molecular Biology. 1990, 215: 403-410.
Llave C, Xie Z, Kasschau KD, Carrington JC: Cleavage of Scarecrow-like mRNA Targets Directed by a Class of Arabidopsis miRNA. Science. 2002, 297 (5589): 2053-2056. 10.1126/science.1076311.
Jones-Rhoades MW, Bartel DP: Computational identification of plant microRNAs and their targets, including a stress-induced miRNA. Mol Cell. 2004, 14: 787-799. 10.1016/j.molcel.2004.05.027.
Strobel SA, Cech TR: Minor groove recognition of the conserved G.U pair at the Tetrahymena ribozyme reaction site. Science. 1995, 267: 675-679. 10.1126/science.7839142.
Schwab R, Palatnik JF, Riester M, Schommer C, Schmid M, Weigel D: Specific Effects of MicroRNAs on the Plant Transcriptome. Developmental Cell. 2005, 8 (4): 517-527. 10.1016/j.devcel.2005.01.018.
Brodersen P, Sakvarelidze-Achard L, Bruun-Rasmussen M, Dunoyer P, Yamamoto YY, Sieburth L, Voinnet O: Widespread Translational Inhibition by Plant miRNAs and siRNAs. Science. 2008, 320 (5880): 1185-1190. 10.1126/science.1159151.
Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al: Gene Ontology: tool for the unification of biology. Nature Genet. 2000, 25: 25-29. 10.1038/75556.
Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressivemultiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994, 22 (22): 4673-4680. 10.1093/nar/22.22.4673.
Clamp M, Cuff J, Searle SM, Barton GJ: The Jalview Java alignment editor. Bioinformatics. 2004, 20 (3): 426-427. 10.1093/bioinformatics/btg430.
Zhang B, Pan X, Cox S, Cobb G, Anderson T: Evidence that miRNAs are different from other RNAs. Cellular and Molecular Life Sciences (CMLS). 2006, 63 (2): 246-254. 10.1007/s00018-005-5467-7.
The authors would like to thank Drs. Anan Jongkaewwattana, Sithichoke Tangphatsornruang, and Supawadee Ingsriswang for helpful comments and discussions on the results. We thank Eakasit Pacharawongsakda and Somrak Numnark for their technical assistances in web interfaces and database construction. This work was supported by grant FC0033 B21 (SPA B1-1) from Cluster Program Management Office (CPMO), National Science and Technology Development Agency (NSTDA), Thailand.
WM performed the predictions and developed the database and its web interface. DW provided some annotation results, participated in the design of database, and drafted most of the manuscript. All co-authors read and approved the final manuscript.
Wuttichai Mhuantong and Duangdao Wichadakul contributed equally to this work.