MicroRNA-encoding long non-coding RNAs
- Shunmin He†1, 3,
- Hua Su†1, 3,
- Changning Liu†2, 3,
- Geir Skogerbø1,
- Housheng He1,
- Dandan He1,
- Xiaopeng Zhu1,
- Tao Liu1,
- Yi Zhao2 and
- Runsheng Chen1
© He et al; licensee BioMed Central Ltd. 2008
Received: 30 November 2007
Accepted: 21 May 2008
Published: 21 May 2008
Recent analysis of the mouse transcriptional data has revealed the existence of ~34,000 messenger-like non-coding RNAs (ml-ncRNAs). Whereas the functional properties of these ml-ncRNAs are beginning to be unravelled, no functional information is available for the large majority of these transcripts.
A few ml-ncRNAs have been shown to have genomic loci that overlap with microRNA loci, leading us to suspect that a fraction of ml-ncRNAs may encode microRNAs. We therefore developed an algorithm (PriMir) for specifically detecting potential microRNA-encoding transcripts in the entire set of 34,030 mouse full-length ml-ncRNAs. In combination with mouse-rat sequence conservation, this algorithm detected 97 strong miRNA-encoding candidates (80 of them novel), and for 52 of these we obtained experimental evidence for the existence of their corresponding mature microRNA by microarray and stem-loop RT-PCR. Sequence analysis of the microRNA-encoding RNAs revealed an internal motif, whose presence correlates strongly (R² = 0.9, P-value = 2.2 × 10⁻¹⁶) with the occurrence of stem-loops with characteristics of known pre-miRNAs, indicating the presence of a larger number of microRNA-encoding RNAs (from 300 up to 800) in the ml-ncRNA population.
Our work highlights a unique group of ml-ncRNAs and offers clues to their functions.
The transcriptional output from the genomes of prokaryotic or eukaryotic organisms can be divided into protein-coding mRNAs and non-protein coding RNAs (ncRNAs). Most known ncRNAs are relatively short, but longer messenger-like ncRNAs (ml-ncRNAs) are being detected in increasing numbers [1, 2]. Like mRNAs, these RNAs are the products of RNA polymerase II, and are often spliced, capped and polyadenylated. As of now, about one-third of the full-length cDNAs obtained in mice and humans appear to be ml-ncRNAs [1, 2, 4], and several of these have been found to play essential roles in vivo. For example, female mice heterozygous for an internal deletion in the Xist gene undergo primary nonrandom inactivation of the wild-type X chromosome, indicating a critical role of Xist RNA for chromosome selection in X inactivation. RNA interference knockdown of the 6.7 kb ncRNA TUG1 in the retina of newborn mice resulted in malformed or nonexistent outer segments of transfected photoreceptors, and the activity of the transcription factor NFAT is repressed by the ml-ncRNA repressor NRON. However, most ml-ncRNAs have not yet been characterized, and further elucidation of ml-ncRNA function is an important project for future research on the transcriptome.
MicroRNAs (miRNAs) are usually processed from primary transcripts (pri-miRNAs) to precursor miRNAs (pre-miRNAs) in the nucleus by the RNase III Drosha. Pre-miRNAs are about 70 nt in length and have a stem-loop structure with a 2-nt 3'-overhang [8, 9]. The pre-miRNAs are subsequently transported to the cytoplasm by Exportin-5/Ran-GTP, and are further processed by Dicer to produce a ~22 bp duplex miRNA [8, 10–14]. The duplex is unwound by an unidentified RNA helicase and one strand (the mature miRNA) is incorporated into the RNA-induced silencing complex (RISC) to guide post-transcriptional gene silencing.
Although the properties of miRNAs are rapidly being unravelled, less is known about the pri-miRNAs. Some pri-miRNAs are thought to be produced by RNA polymerase II, and are capped, polyadenylated and spliced [3, 10, 16]. The genomic loci of a few ml-ncRNAs overlap with known miRNAs, and whole-genome tiling array scans suggest that small RNA loci commonly overlap with longer transcripts, the longer RNAs possibly representing primary transcripts of the shorter mature RNAs. The possibility thus exists that a fraction of the existing ml-ncRNAs function as precursors for miRNAs. In this study of mouse ml-ncRNAs, we identified 22 ml-ncRNAs encoding known miRNAs (henceforth labelled miRNA-encoding ncRNAs or me-ncRNAs), and developed a prediction procedure, PriMir, which predicted 97 me-ncRNA candidates among the 34,030 ml-ncRNAs in the FANTOM3 data. For about half of these candidates we obtained experimental evidence for the existence of their corresponding mature miRNA, and further analyses of both the known and the candidate me-ncRNAs show that such transcripts frequently share a common motif. Our work specifies me-ncRNAs as a special class of ncRNAs, and suggests a role for these ml-ncRNAs whose functions were previously unidentified.
22 ml-ncRNAs encode known miRNAs
In the mouse genome there are 270 different pre-miRNA hairpins encoding 301 miRNAs (miRBase 8.0). In order to estimate how many of the 34,030 mouse ml-ncRNAs (FANTOM3) might encode a known miRNA, we identified the positions of all these ml-ncRNAs and pre-miRNAs in the mouse genome (mm7) using BLAT and Blastn, respectively. The result showed that 23 miRNA hairpins are located in exons of 22 ml-ncRNAs. Of these 22 ml-ncRNAs, three represent overlapping transcripts of different lengths that include the same pre-miRNA stem-loop structure, thus encoding the same miRNA [mmu-mir-22; see Table S1 in Additional file 1].
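The exon-containment check described above can be sketched as a simple interval test. This is only an illustration, not the authors' pipeline: the `Interval` type, the requirement that the hairpin lies entirely inside one exon on the same strand, and all coordinates and transcript names are assumptions made for the example.

```python
from typing import Dict, List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    strand: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def hairpin_in_exon(hairpin: Interval, exon: Interval) -> bool:
    """True when the hairpin lies entirely inside the exon on the same strand."""
    return (
        hairpin.chrom == exon.chrom
        and hairpin.strand == exon.strand
        and exon.start <= hairpin.start
        and hairpin.end <= exon.end
    )


def host_transcripts(hairpin: Interval,
                     transcripts: Dict[str, List[Interval]]) -> List[str]:
    """Names of transcripts with at least one exon containing the hairpin."""
    return [
        name
        for name, exons in transcripts.items()
        if any(hairpin_in_exon(hairpin, exon) for exon in exons)
    ]


# Hypothetical toy coordinates, not taken from the paper.
transcripts = {
    "ncRNA_A": [Interval("chr11", "+", 100, 400),
                Interval("chr11", "+", 900, 1500)],
    "ncRNA_B": [Interval("chr11", "-", 900, 1500)],
}
hairpin = Interval("chr11", "+", 1000, 1085)
hosts = host_transcripts(hairpin, transcripts)
print(hosts)
```

Several transcripts can host the same hairpin under this test, which is how three overlapping ml-ncRNAs can all be counted as encoding one miRNA.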
Computational analysis identifies strong me-ncRNA candidates
The next step was to predict the possible miRNA-encoding ml-ncRNAs. PriMir extracted about 184,000 hairpins (length >= 45 and paired bases >= 18) from the 34,030 ml-ncRNAs. To pick out the most likely pre-miRNA candidates we analyzed the conservation rate between mouse and rat for these sequences. In order to establish a threshold for the conservation filter, we aligned the 220 known mouse pre-miRNAs in the training set to the rat genome using Blastn. This resulted in 160 pre-miRNA sequences (> 70%) complying with two criteria: 1) The alignment lengths were larger than 45 nt, and 2) the identity of the alignment was 98% or higher. Therefore, we used these criteria for PriMir filtration, and obtained 4463 non-redundant conserved hairpins between mouse and rat, including 18 hairpins containing known pre-miRNAs.
Here xi is the value of feature i.
To reduce the number of false positives, a PriMir score of 7 was used as the cutoff value. This is a stringent criterion, as ROC curve analysis (see Methods for details) of the PriMir performance showed that the AUC (area under curve) is approximately 0.99, and that the false positive rate is 0 at a PriMir score of 7 (see Figure S4A in Additional file 1). We identified 84 pre-miRNA candidates with PriMir scores of 7 or higher, corresponding to 97 potential me-ncRNAs. Among these me-ncRNA candidates, 17 were included in the set of 22 known me-ncRNAs; thus, the remaining 80 represent novel me-ncRNA candidates, and altogether 102 me-ncRNAs (22 known plus 80 novel) were finally picked out.
To further evaluate the performance of the PriMir prediction software, we carried out cross-validation analysis (see Performance analysis in the Methods part), which gave AUC values between 0.971 and 0.984, suggesting that the prediction results are reliable (see Figure S4B in Additional file 1). During the course of this work, three miRNA prediction algorithms [21–23] were published that were available for use on a local computer. A comparison between PriMir and these three algorithms suggested that the PriMir method is at least equal to and may in some respects outperform these three methods (see Figure S4A in Additional file 1). Furthermore, in order to get an estimate of which of the 11 stem-loop features contributed most to the identification of the pre-miRNAs, we carried out a simplified analysis of this problem by investigating the effect of each feature when running PriMir on the positive and negative test sets (see Performance analysis in the Methods part). This identified five features with an apparent contribution: the number of paired bases in the 10-bp up- and down-stream extensions of the pre-miRNA; the total bulge size of the pre-miRNA; basepairs in the pre-miRNA; basepairs in the mature miRNA; and the minimum free energy of the pre-miRNA (see Figure S5 in Additional file 1).
Experimental validation of the predicted miRNAs
To experimentally validate the expression of the miRNAs encoded by the predicted me-ncRNAs we spotted a microarray with 168 26-nt probes corresponding to both arms of the 84 predicted pre-miRNAs, and hybridized this to size-fractionated RNA extracted from mouse tissues obtained from different developmental stages (see Methods). The microarray gave positive signals for 46 probes (see Figure S1 in Additional file 1), corresponding to 40 different pre-miRNAs. Of the 46 miRNAs, 14 had already been registered in miRBase 8.0, whereas the remaining 32 miRNAs, corresponding to 30 me-ncRNA candidates, are novel discoveries. (During the course of our work, 5 of the 32 novel miRNAs were also reported in the recent miRBase 9.2 release, thus lending further support to the validity of our predictions.) As an additional validation we carried out stem-loop RT-PCR (followed by sequencing) of the 32 novel miRNAs detected by the microarray, obtaining positive results for 26 of them (see Figure 1C, and Figure S2 in Additional file 1).
The expression levels of the investigated miRNAs appeared to be very low (Northern data, not shown). To make a comparison to the corresponding me-ncRNA expression levels we downloaded expression data for 20 different tissues for 15 of the experimentally supported me-ncRNAs from the Riken Expression Array Database (READ). The analyses showed that the average expression levels of the me-ncRNAs were similar to those of the entire ml-ncRNA set, and that a few of the me-ncRNAs have relatively high expression levels in a limited number of tissues (P-value < 0.02), such as transcript AK132542 (accession number in DDBJ) in pancreas, AK008483 in thymus and skin at neonate day 10, and AK136882 in liver and pancreas. Thus, there appears to be no strong correlation between the expression levels of the me-ncRNAs and their encoded miRNAs.
Together with the 22 me-ncRNAs corresponding to known miRNAs, we altogether obtained a set of 52 experimentally supported me-ncRNAs. Given that the miRNAs may be tissue or cell type specific, and/or only be expressed during a limited time interval or under specific physiological or environmental conditions, we regard the remaining 50 as yet unsupported me-ncRNA candidates. See Table S2 in Additional file 1 for more information on all the 102 me-ncRNAs and candidates.
Motifs of the me-ncRNAs
Positional Weight Matrix for the IM sequence
Structure and conservation of me-ncRNA loci
Splicing and conservation characteristics of the me-ncRNAs
Conservation of me-ncRNAs and their corresponding pre-microRNAs between human and mouse (table column headings: number of ES me-ncRNA/pre-miRNA; number of me-ncRNA/pre-miRNA).
Estimated numbers of me-ncRNAs
Discussion and conclusion
Based on hairpin conservation and a comprehensive list of pre-miRNA features, we have designed a computational procedure which detected 80 novel me-ncRNA candidates in the mouse genome and provided experimental support for the expression of a substantial fraction of their encoded miRNAs. Through the above analyses we have shown that the me-ncRNAs differ from other ml-ncRNAs in gene structure and sequence conservation, and that their sequence and expressional characteristics are also different from other pri-miRNAs.
The correlation between the internal motif and the PriMir score
An intriguing aspect of the analysis was the observed correlation between the presence of typical pre-miRNA characteristics (as represented by the PriMir score; PMS) and the occurrence of the internal motif (IM) within an mRNA-like ncRNA sequence. For the entire mRNA-like ncRNA collection there was a very strong correlation between the IM frequency and PMS; however, in the set of mRNA-like ncRNAs selected for hairpin sequence conservation this correlation was far weaker, despite the frequency of IM being higher in this set than in the entire mRNA-like ncRNA collection. There could be several explanations that would account for this discrepancy. The most straightforward is that the IM is associated with the miRNA-encoding function of an ml-ncRNA, and that the processing of a stem-loop hairpin depends on either its interaction with general pri- and pre-miRNA processing factors (as indicated by its PMS value), or on more specific factors (in the case of conserved hairpins). In the first case, the IM would primarily be found associated with hairpins with high PMS values, whereas in the latter case, conserved hairpins should have a relatively high frequency of IMs, irrespective of PMS value. As the majority of IM-associated hairpins are not well conserved, this might imply that heavy reliance on sequence conservation may not be a particularly useful strategy for detection of a larger subset of me-ncRNAs. The strong correlation between IM and the PMS (which is likely to exemplify the typical pre-miRNA) in the full mRNA-like ncRNA collection therefore invites further work on computational miRNA detection based on other sources than sequence conservation. However, the IM sequence is quite short (containing only 7 partially conserved nucleotides), and further analysis of me-ncRNA sequences may reveal additional elements which could increase its predictive value.
Biogenesis and function of the me-ncRNAs
Previous knowledge on miRNA biogenesis assumes that pri-miRNAs are processed into pre-miRNAs in the nucleus by the Drosha complex, and then transported to the cytoplasm where further processing by Dicer occurs, resulting in the mature miRNA. The question of the sub-cellular localization of me-ncRNAs has not yet been investigated, but a few primary miRNA transcripts have been reported to accumulate in the cytoplasm [30, 31]. The fact that me-ncRNAs are sufficiently stable to be cloned as full-length cDNAs, and that they retain several mRNA-like characteristics (splicing, capping, polyadenylation), would suggest that they may follow the path of coding mRNAs and be exported to the cytoplasm. Increasing evidence that post-transcriptional miRNA processing is subject to regulatory activity [31–34], together with the apparent differences in the expression levels of the me-ncRNAs and their encoded miRNAs found here, further allows for a hypothesis in which me-ncRNAs constitute a miRNA storage form, possibly in addition to other functional properties of the intact me-ncRNA transcript. This storage may be maintained through low transcriptional and degradation activity of the me-ncRNAs, with only low levels of mature miRNA being released under normal conditions. Upon some triggering event it could then enable a quick release of a larger amount of the mature miRNA through me-ncRNA processing without requiring transcriptional activation of the me-ncRNA locus. This in turn raises the question of whether there might exist a cytoplasmic pathway for miRNA maturation, or if the mature me-ncRNA re-enters the nucleus for processing by Drosha before the miRNA is released.
In any case, there is the possibility that me-ncRNAs may have other cellular functions in addition to that of encoding miRNAs, as found for a number of other ml-ncRNAs [35–37], and that they therefore exist in other cellular compartments and are maintained at higher steady state levels than pri-miRNAs whose only role is to generate mature miRNAs.
In fact, the phenomenon of long primary transcripts encoding shorter functional ncRNAs is not limited to ml-ncRNAs encoding miRNAs. Whole-genome tiling array scans have revealed that many small RNAs have genomic loci that overlap with longer transcripts, and the longer RNAs may represent primary transcripts for the shorter mature RNAs. It is thus not implausible that a fraction of the ml-ncRNAs may serve as vectors or storage forms for short ncRNAs, which are then released when needed to perform their cellular functions. Our finding that a considerable number of ml-ncRNAs actually encode miRNAs could suggest that serving as the primary transcript of various classes of short ncRNAs may be a common function of longer ncRNAs.
Databases and Software
Data collection: Sequences of 34,030 mouse ml-ncRNAs were downloaded from the FANTOM3 database. Known mouse miRNAs were downloaded from miRBase release 8.0. The mouse (mm7), rat (rn3) and human (hg17) genome sequences were downloaded from UCSC. Expression profiles for ml-ncRNAs were collected from the Riken Expression Array Database [26, 41].
PhastCons Scores: The conservation scores for alignments of 16 vertebrate genomes with mouse (PhastCons17Scores) were downloaded from the UCSC web site.
Sequence logos: Logos of sequences were generated by the web server at UC Berkeley.
Training and background sets
To create a training set we needed to elicit the common features of known pre-miRNAs. In miRBase release 8.0, there are altogether 270 different hairpins corresponding to 301 mouse miRNAs. First, the pre-miRNAs containing shorter mature microRNAs (<20 nt) or whose mature microRNA sequence extended into the loop region of the predicted stem-loop structure were filtered out. From this set we then removed the pre-microRNAs whose stem-loop structures could not be predicted by RNAfold (using the pre-miRNA and 200 nt flanking sequence in both directions). This left 220 of the 270 hairpins to be used as the training set.
We also needed to construct a background set of non-pre-miRNA hairpins to estimate the background noise. We predicted RNA secondary structures of the 34,030 ml-ncRNAs in FANTOM3 using RNAfold [44, 45], and extracted hairpins from them based on two conditions: 1) The length of the hairpin should be longer than 45 nt, and 2) the number of paired bases in the hairpin should be more than 28 (14 base pairs). This step resulted in about 184,000 predicted hairpins. To create a hairpin background set representing the distribution of 11 features of stem-loop structure sequences with random length, we randomly reduced the lengths of the hairpins and used all of them as the background set.
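The two hairpin conditions above can be expressed on RNAfold-style dot-bracket strings. The sketch below is illustrative only: it assumes the hairpin has already been cut out of the full transcript structure, it treats both thresholds as strict inequalities (as worded in this paragraph), and the function names are hypothetical rather than part of PriMir.

```python
def count_paired_bases(structure: str) -> int:
    """Number of paired nucleotides, i.e. twice the number of base pairs."""
    return structure.count("(") + structure.count(")")


def is_single_hairpin(structure: str) -> bool:
    """A simple stem-loop has every '(' before every ')' and is balanced."""
    last_open = structure.rfind("(")
    first_close = structure.find(")")
    return (
        last_open != -1
        and first_close != -1
        and last_open < first_close
        and structure.count("(") == structure.count(")")
    )


def passes_hairpin_filter(structure: str,
                          min_length: int = 45,
                          min_paired: int = 28) -> bool:
    """Keep hairpins longer than 45 nt with more than 28 paired bases."""
    return (
        is_single_hairpin(structure)
        and len(structure) > min_length
        and count_paired_bases(structure) > min_paired
    )


# Toy structures (hypothetical): a 48-nt stem-loop and a short one.
long_hairpin = "(" * 20 + "." * 8 + ")" * 20
short_hairpin = "(((((....)))))"
print(passes_hairpin_filter(long_hairpin), passes_hairpin_filter(short_hairpin))
```

Counting both partners of each pair matches the paragraph's equivalence of 28 paired bases with 14 base pairs.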
The eleven features used by PriMir
PriMir predicts pre-miRNAs according to the PMS matrix, which is based on eleven features found in the sequence or secondary structure of known pre-miRNAs [46, 47]. The eleven features are: 1) the total number of paired bases in the 10-bp up- and down-stream extensions of the pre-miRNA; 2) the total bulge size of the pre-miRNA, i.e. the total number of nucleotides in all bulges in the pre-miRNA; 3) the total number of paired bases in the pre-miRNA; 4) the length of the loop in the pre-miRNA; 5) the distance between the mature miRNA and the terminal loop; 6) the sequence bias of the first five bases in the mature miRNA; 7) the total number of paired bases in the mature miRNA portion of the pre-miRNA; 8) the minimum free energy (mfe) of the pre-miRNA stem-loop calculated with the RNAfold program; 9) the length of the pre-miRNA; 10) the GC content of the pre-miRNA; and 11) the GC content of the mature miRNA (see Figure S3 in Additional file 1).
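A few of the purely structural features in this list can be read directly off a sequence and its dot-bracket structure. The sketch below covers only the pre-miRNA length, GC content, paired bases and terminal loop length. The `unpaired_in_stem` value is a crude stand-in for the bulge size, since it also counts internal loops; that simplification, the function name and the toy hairpin are assumptions, not PriMir's definitions.

```python
def hairpin_features(seq: str, structure: str) -> dict:
    """Compute simple descriptors of a single stem-loop.

    `structure` is dot-bracket notation with every '(' before every ')'.
    """
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    last_open = structure.rfind("(")
    first_close = structure.find(")")
    if last_open == -1 or first_close == -1 or last_open > first_close:
        raise ValueError("structure is not a single stem-loop")

    seq = seq.upper()
    first_open = structure.find("(")
    last_close = structure.rfind(")")

    # Unpaired nucleotides enclosed by the innermost base pair.
    loop_length = first_close - last_open - 1
    # Unpaired nucleotides between the outermost pair, excluding the loop.
    stem_region = structure[first_open:last_close + 1]
    unpaired_in_stem = stem_region.count(".") - loop_length

    return {
        "length": len(seq),
        "gc_content": (seq.count("G") + seq.count("C")) / len(seq),
        "paired_bases": structure.count("(") + structure.count(")"),
        "loop_length": loop_length,
        "unpaired_in_stem": unpaired_in_stem,
    }


# Toy 18-nt hairpin (hypothetical, far shorter than a real pre-miRNA).
features = hairpin_features("GCGCAUGGAAACAAGCGC", "((((.((....)).))))")
print(features)
```

The remaining features, such as the minimum free energy or those tied to the mature miRNA position, need RNAfold output and the mature miRNA coordinates and are left out here.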
The reliability of the PriMir prediction method was evaluated by cross-validation analysis. The training and background sets used to establish the PMS matrix were divided into five equal parts. Four of these parts were selected to establish the PMS matrix, whereas the remaining part (from both the training and the background set) was used to test the performance of the PriMir method by ROC curve analysis. The above analysis was repeated five times, each time using a different portion of the data as the test data set (see Figure S4B in Additional file 1).
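The five-fold split and the AUC summary of a ROC curve can be sketched as follows. This is a generic illustration rather than the PriMir code: the scoring step that would build a PMS matrix from the four training parts is omitted, the AUC is computed with the rank-based definition (the probability that a positive outscores a negative), and all names are hypothetical.

```python
import random
from typing import List, Sequence


def five_fold_indices(n_items: int, k: int = 5, seed: int = 0) -> List[List[int]]:
    """Shuffle item indices and deal them into k near-equal folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Area under the ROC curve; ties between a positive and a negative count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# 220 training hairpins split into five parts of 44 each.
folds = five_fold_indices(220)
for test_fold in range(len(folds)):
    test_idx = folds[test_fold]
    train_idx = [i for f, fold in enumerate(folds) if f != test_fold for i in fold]
    # A real run would fit the scoring matrix on train_idx, score test_idx,
    # and pass the positive and background scores to auc().
    assert len(test_idx) + len(train_idx) == 220

print(auc([8.0, 9.5, 7.2], [1.0, 7.2, 3.3]))
```

An AUC of 1.0 means every positive scores above every negative, and 0.5 corresponds to random ranking.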
To evaluate the performance of PriMir with a ROC curve, we constructed a positive and a negative set of stem-loop hairpins. For the positive set, we aligned the 432 mouse pre-miRNAs in miRBase 10.1 to the rat genome (blastn; identity >= 98%, alignment >= 45 nt) and obtained 208 conserved pre-miRNAs. To obtain a fair appraisal of the PriMir method relative to other methods, we removed those pre-miRNAs that were included in the PriMir training set from the 208 pre-miRNAs, which left us with a positive set of 75 pre-miRNAs. To obtain a negative set, we downloaded 198,536 RefSeq exons from the mouse mm9 genome (UCSC genome browser) and predicted stem-loop hairpins (length >= 45, paired bases >= 28) with RNAfold. This gave 48,314 hairpins, which were aligned to the rat genome (blastn; identity >= 98%, alignment >= 45 nt). This yielded about 9000 conserved hairpins, from which we randomly selected 500 to constitute the negative set.
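The conservation criterion used for both sets (identity >= 98%, alignment >= 45 nt) amounts to a filter over alignment hits. The sketch below assumes the hits are available in the standard 12-column tab-separated BLAST format, where the third column is percent identity and the fourth is alignment length. The paper does not state which output format was used, and the hit lines and hairpin names are invented for the example.

```python
from typing import Iterable, List


def parse_tabular_hit(line: str) -> dict:
    """Read the fields needed from one tab-separated BLAST hit line."""
    fields = line.rstrip("\n").split("\t")
    return {
        "query": fields[0],
        "subject": fields[1],
        "identity": float(fields[2]),        # percent identity
        "alignment_length": int(fields[3]),  # aligned columns
    }


def conserved_queries(lines: Iterable[str],
                      min_identity: float = 98.0,
                      min_alignment_length: int = 45) -> List[str]:
    """Sorted query IDs with at least one hit passing both thresholds."""
    conserved = set()
    for line in lines:
        hit = parse_tabular_hit(line)
        if (hit["identity"] >= min_identity
                and hit["alignment_length"] >= min_alignment_length):
            conserved.add(hit["query"])
    return sorted(conserved)


# Hypothetical hits: only hairpin_1 is both long enough and similar enough.
blast_lines = [
    "hairpin_1\tchr10\t100.00\t60\t0\t0\t1\t60\t5000\t5059\t1e-25\t110",
    "hairpin_2\tchr3\t95.71\t70\t3\t0\t1\t70\t800\t869\t1e-20\t95",
    "hairpin_3\tchr7\t100.00\t40\t0\t0\t1\t40\t120\t159\t1e-15\t80",
]
conserved = conserved_queries(blast_lines)
print(conserved)
```

Drawing the 500-hairpin negative set from the conserved hairpins would then be a single `random.sample(conserved, 500)` call on the full list.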
The me-ncRNA internal motif
We constructed an IM Position Weight Matrix (PWM) according to the MEME analysis of 30 me-ncRNAs confirmed by stem-loop RT-PCR. Then all 52 experimentally confirmed me-ncRNAs were iteratively analyzed and the IM PWM optimized according to each new result. Table 1 shows the final converged IM PWM after 4 rounds of iterative analysis. A PWM score of 7.0 was used in this study as the cutoff value for the occurrence of IM.
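Scanning a transcript for motif occurrences with a PWM and a score cutoff can be sketched as below. The matrix here is a made-up four-column toy, not the IM PWM of Table 1. The additive per-position scoring, the fallback score for unexpected characters and the choice to count a score equal to the cutoff as a hit are all assumptions for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Column = Dict[str, float]


def pwm_score(window: str, pwm: Sequence[Column], unknown: float = -1.0) -> float:
    """Sum the per-position weights of the bases in one window."""
    return sum(column.get(base, unknown) for base, column in zip(window, pwm))


def find_motif_hits(seq: str, pwm: Sequence[Column],
                    cutoff: float = 7.0) -> List[Tuple[int, float]]:
    """Return (start, score) for every window scoring at or above the cutoff."""
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        score = pwm_score(seq[start:start + width], pwm)
        if score >= cutoff:
            hits.append((start, score))
    return hits


def toy_column(preferred: str) -> Column:
    """Hypothetical column: 2.0 for the preferred base, -1.0 otherwise."""
    return {base: (2.0 if base == preferred else -1.0) for base in "ACGU"}


# Toy PWM favouring the word ACGU; only an exact match reaches 8.0 >= 7.0.
toy_pwm = [toy_column(base) for base in "ACGU"]
hits = find_motif_hits("GGACGUCC", toy_pwm)
print(hits)
```

With a real matrix, the fraction of transcripts that have at least one hit gives the kind of IM frequency that the discussion compares against the PriMir score.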
The sequences of the 34,030 ml-ncRNAs were mapped to the mouse genome (mm7) through the following steps. Firstly, the sequences were aligned to the genome using BLAT with the options -fine -q=rna. Then, the coverage (number of matches/full length of transcript sequence) of each alignment was calculated, and the low-quality alignments with coverage of less than 70% were removed. Finally, the alignments were modified according to the positions of exons from neighboring alignments.
We mapped the sequences of hairpins, rather than those of mature miRNAs, to the mouse genome (mm7) using the Blastn program. The alignment results with an alignment length equal to the length of the hairpin and an identity of 100% were extracted. Blastn was downloaded from NCBI.
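The coverage filter in the second mapping step is a one-line calculation per alignment. The sketch below is a minimal illustration with hypothetical record fields and transcript names; it compares integers rather than floating-point ratios so that an alignment at exactly 70% coverage is kept.

```python
from typing import Dict, List


def coverage(matches: int, transcript_length: int) -> float:
    """Fraction of the transcript covered by matching bases."""
    return matches / transcript_length


def keep_alignment(matches: int, transcript_length: int,
                   min_percent: int = 70) -> bool:
    """Keep alignments whose coverage is at least min_percent."""
    return matches * 100 >= min_percent * transcript_length


def filter_alignments(alignments: List[Dict]) -> List[Dict]:
    """Drop low-quality alignments with coverage below 70%."""
    return [
        aln for aln in alignments
        if keep_alignment(aln["matches"], aln["transcript_length"])
    ]


# Hypothetical alignment records for three transcripts.
alignments = [
    {"name": "ncRNA_A", "matches": 950, "transcript_length": 1000},
    {"name": "ncRNA_B", "matches": 700, "transcript_length": 1000},
    {"name": "ncRNA_C", "matches": 699, "transcript_length": 1000},
]
kept = [aln["name"] for aln in filter_alignments(alignments)]
print(kept)
```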
Tissues and total RNA extraction
0-day neonate C57 BL/6 mice were provided by Vitalriver Laboratory Animal Technology Co., Ltd., Beijing. 15-day embryonic C57 BL/6 mice were provided by the Department of Laboratory Animals of Peking University Health Science Center. Male adult mice were provided by the Laboratory Animal Center, Institute of Genetics, Chinese Academy of Sciences. Total RNAs were extracted from (1) brain and thymus of 0-day neonate C57 BL/6 mice, (2) brain of male adult C57 BL/6 mice, and (3) whole body of 15-day embryonic C57 BL/6 mice, using the Trizol reagent (Invitrogen, Carlsbad, CA, USA) according to the manufacturer's instructions.
Detection of miRNAs by microarray
This work was carried out at CapitalBio Corp. in Beijing, China, according to their in-house technology for miRNA detection. We designed 168 26-nt oligonucleotide probes corresponding to both arms of the 84 predicted pre-miRNAs. In addition, we designed eight 19–24 nt oligonucleotides possessing no homology with any known RNA sequence and produced seven complementary oligos to simulate miRNAs by in vitro transcription. To facilitate subsequent hybridization, poly-Ts were added to the 5'-end of the probes, resulting in 42-nt oligonucleotide probes (see Table S3 in Additional file 1). Each probe was printed in triplicate using a SmartArray™ microarrayer. Low-molecular-weight RNAs (<200 nt) were isolated from total RNAs by the PEG precipitation approach, and labeled using T4 RNA ligase. Hybridization was performed using LifterSlip™. Arrays were scanned with a confocal LuxScan™ scanner and data were extracted from the TIFF images using LuxScan™ 3.0 software.
Detection of miRNAs by stem-loop RT-PCR
Stem-loop RT-PCR experiments were performed to validate the miRNAs detected by microarrays. The procedure was essentially carried out as described by Chen et al., and all primers are listed in Table S4 (see Additional file 1). Briefly, small RNAs were extracted from a mixture of total RNAs (1), (2) and (3) of C57 BL/6 mice (see "Tissues and total RNA extraction" above) using the mirVana™. Then, PCRs were performed using 1 μl of the RT products as template in a 20 μl reaction volume with Taq DNA polymerase (Invitrogen, Brazil, Cat #10966-030). The reactions were incubated at 94°C for 5 min, followed by 40 cycles of 94°C for 15 sec, 45°C for 30 sec and 60°C for 30 sec, with a final incubation at 60°C for 2 min. The elongated PCR products (about 60 bp in size) were cloned into pGEM-T (Promega A3600) and sequenced at Invitrogen.
This work was supported by the National Key Basic Research & Development Program (973) under Grant Nos. 2002CB713805 and 2003CB715907, and by the National Sciences Foundation of China under Grant Nos. 30630040, 30570393 and 30600729. All data and supplementary material can be downloaded through the eBioMed website.
- Carninci P, Kasukawa T, Katayama S, Gough J, Frith MC, Maeda N, Oyama R, Ravasi T, Lenhard B, Wells C, Kodzius R, Shimokawa K, Bajic VB, Brenner SE, Batalov S, Forrest AR, Zavolan M, Davis MJ, Wilming LG, Aidinis V, Allen JE, Ambesi-Impiombato A, Apweiler R, Aturaliya RN, Bailey TL, Bansal M, Baxter L, Beisel KW, Bersano T, Bono H, Chalk AM, Chiu KP, Choudhary V, Christoffels A, Clutterbuck DR, Crowe ML, Dalla E, Dalrymple BP, de Bono B, Della Gatta G, di Bernardo D, Down T, Engstrom P, Fagiolini M, Faulkner G, Fletcher CF, Fukushima T, Furuno M, Futaki S, Gariboldi M, Georgii-Hemming P, Gingeras TR, Gojobori T, Green RE, Gustincich S, Harbers M, Hayashi Y, Hensch TK, Hirokawa N, Hill D, Huminiecki L, Iacono M, Ikeo K, Iwama A, Ishikawa T, Jakt M, Kanapin A, Katoh M, Kawasawa Y, Kelso J, Kitamura H, Kitano H, Kollias G, Krishnan SP, Kruger A, Kummerfeld SK, Kurochkin IV, Lareau LF, Lazarevic D, Lipovich L, Liu J, Liuni S, McWilliam S, Madan Babu M, Madera M, Marchionni L, Matsuda H, Matsuzawa S, Miki H, Mignone F, Miyake S, Morris K, Mottagui-Tabar S, Mulder N, Nakano N, Nakauchi H, Ng P, Nilsson R, Nishiguchi S, Nishikawa S, Nori F, Ohara O, Okazaki Y, Orlando V, Pang KC, Pavan WJ, Pavesi G, Pesole G, Petrovsky N, Piazza S, Reed J, Reid JF, Ring BZ, Ringwald M, Rost B, Ruan Y, Salzberg SL, Sandelin A, Schneider C, Schonbach C, Sekiguchi K, Semple CA, Seno S, Sessa L, Sheng Y, Shibata Y, Shimada H, Shimada K, Silva D, Sinclair B, Sperling S, Stupka E, Sugiura K, Sultana R, Takenaka Y, Taki K, Tammoja K, Tan SL, Tang S, Taylor MS, Tegner J, Teichmann SA, Ueda HR, van Nimwegen E, Verardo R, Wei CL, Yagi K, Yamanishi H, Zabarovsky E, Zhu S, Zimmer A, Hide W, Bult C, Grimmond SM, Teasdale RD, Liu ET, Brusic V, Quackenbush J, Wahlestedt C, Mattick JS, Hume DA, Kai C, Sasaki D, Tomaru Y, Fukuda S, Kanamori-Katayama M, Suzuki M, Aoki J, Arakawa T, Iida J, Imamura K, Itoh M, Kato T, Kawaji H, Kawagashira N, Kawashima T, Kojima M, Kondo S, Konno H, Nakano K, Ninomiya N, 
Nishio T, Okada M, Plessy C, Shibata K, Shiraki T, Suzuki S, Tagami M, Waki K, Watahiki A, Okamura-Oho Y, Suzuki H, Kawai J, Hayashizaki Y: The transcriptional landscape of the mammalian genome. Science. 2005, 309 (5740): 1559-1563. 10.1126/science.1112014.
- Ota T, Suzuki Y, Nishikawa T, Otsuki T, Sugiyama T, Irie R, Wakamatsu A, Hayashi K, Sato H, Nagai K, Kimura K, Makita H, Sekine M, Obayashi M, Nishi T, Shibahara T, Tanaka T, Ishii S, Yamamoto J, Saito K, Kawai Y, Isono Y, Nakamura Y, Nagahari K, Murakami K, Yasuda T, Iwayanagi T, Wagatsuma M, Shiratori A, Sudo H, Hosoiri T, Kaku Y, Kodaira H, Kondo H, Sugawara M, Takahashi M, Kanda K, Yokoi T, Furuya T, Kikkawa E, Omura Y, Abe K, Kamihara K, Katsuta N, Sato K, Tanikawa M, Yamazaki M, Ninomiya K, Ishibashi T, Yamashita H, Murakawa K, Fujimori K, Tanai H, Kimata M, Watanabe M, Hiraoka S, Chiba Y, Ishida S, Ono Y, Takiguchi S, Watanabe S, Yosida M, Hotuta T, Kusano J, Kanehori K, Takahashi-Fujii A, Hara H, Tanase TO, Nomura Y, Togiya S, Komai F, Hara R, Takeuchi K, Arita M, Imose N, Musashino K, Yuuki H, Oshima A, Sasaki N, Aotsuka S, Yoshikawa Y, Matsunawa H, Ichihara T, Shiohata N, Sano S, Moriya S, Momiyama H, Satoh N, Takami S, Terashima Y, Suzuki O, Nakagawa S, Senoh A, Mizoguchi H, Goto Y, Shimizu F, Wakebe H, Hishigaki H, Watanabe T, Sugiyama A, Takemoto M, Kawakami B, Yamazaki M, Watanabe K, Kumagai A, Itakura S, Fukuzumi Y, Fujimori Y, Komiyama M, Tashiro H, Tanigami A, Fujiwara T, Ono T, Yamada K, Fujii Y, Ozaki K, Hirao M, Ohmori Y, Kawabata A, Hikiji T, Kobatake N, Inagaki H, Ikema Y, Okamoto S, Okitani R, Kawakami T, Noguchi S, Itoh T, Shigeta K, Senba T, Matsumura K, Nakajima Y, Mizuno T, Morinaga M, Sasaki M, Togashi T, Oyama M, Hata H, Watanabe M, Komatsu T, Mizushima-Sugano J, Satoh T, Shirai Y, Takahashi Y, Nakagawa K, Okumura K, Nagase T, Nomura N, Kikuchi H, Masuho Y, Yamashita R, Nakai K, Yada T, Nakamura Y, Ohara O, Isogai T, Sugano S: Complete sequencing and characterization of 21,243 full-length human cDNAs. Nat Genet. 2004, 36 (1): 40-45. 10.1038/ng1285.
- Erdmann VA, Szymanski M, Hochberg A, de Groot N, Barciszewski J: Collection of mRNA-like non-coding RNAs. Nucleic Acids Res. 1999, 27 (1): 192-195. 10.1093/nar/27.1.192.
- Bertone P, Stolc V, Royce TE, Rozowsky JS, Urban AE, Zhu X, Rinn JL, Tongprasit W, Samanta M, Weissman S, Gerstein M, Snyder M: Global identification of human transcribed sequences with genome tiling arrays. Science. 2004, 306 (5705): 2242-2246. 10.1126/science.1103388.
- Marahrens Y, Loring J, Jaenisch R: Role of the Xist gene in X chromosome choosing. Cell. 1998, 92 (5): 657-664. 10.1016/S0092-8674(00)81133-2.
- Young TL, Matsuda T, Cepko CL: The noncoding RNA taurine upregulated gene 1 is required for differentiation of the murine retina. Curr Biol. 2005, 15 (6): 501-512. 10.1016/j.cub.2005.02.027.
- Willingham AT, Orth AP, Batalov S, Peters EC, Wen BG, Aza-Blanc P, Hogenesch JB, Schultz PG: A strategy for probing the function of noncoding RNAs finds a repressor of NFAT. Science. 2005, 309 (5740): 1570-1573. 10.1126/science.1115901.
- Lee Y, Ahn C, Han J, Choi H, Kim J, Yim J, Lee J, Provost P, Radmark O, Kim S, Kim VN: The nuclear RNase III Drosha initiates microRNA processing. Nature. 2003, 425 (6956): 415-419. 10.1038/nature01957.
- Cullen BR: Transcription and processing of human microRNA precursors. Mol Cell. 2004, 16 (6): 861-865. 10.1016/j.molcel.2004.12.002.
- Bartel DP: MicroRNAs: genomics, biogenesis, mechanism, and function. Cell. 2004, 116 (2): 281-297. 10.1016/S0092-8674(04)00045-5.
- Bohnsack MT, Czaplinski K, Gorlich D: Exportin 5 is a RanGTP-dependent dsRNA-binding protein that mediates nuclear export of pre-miRNAs. RNA. 2004, 10 (2): 185-191. 10.1261/rna.5167604.
- Yi R, Qin Y, Macara IG, Cullen BR: Exportin-5 mediates the nuclear export of pre-microRNAs and short hairpin RNAs. Genes Dev. 2003, 17 (24): 3011-3016. 10.1101/gad.1158803.
- Saito K, Ishizuka A, Siomi H, Siomi MC: Processing of pre-microRNAs by the Dicer-1-Loquacious complex in Drosophila cells. PLoS Biol. 2005, 3 (7): e235. 10.1371/journal.pbio.0030235.
- Lee YS, Nakahara K, Pham JW, Kim K, He Z, Sontheimer EJ, Carthew RW: Distinct roles for Drosophila Dicer-1 and Dicer-2 in the siRNA/miRNA silencing pathways. Cell. 2004, 117 (1): 69-81. 10.1016/S0092-8674(04)00261-2.PubMedView ArticleGoogle Scholar
- Pham JW, Pellino JL, Lee YS, Carthew RW, Sontheimer EJ: A Dicer-2-dependent 80s complex cleaves targeted mRNAs during RNAi in Drosophila. Cell. 2004, 117 (1): 83-94. 10.1016/S0092-8674(04)00258-2.PubMedView ArticleGoogle Scholar
- Lee Y, Kim M, Han J, Yeom KH, Lee S, Baek SH, Kim VN: MicroRNA genes are transcribed by RNA polymerase II. Embo J. 2004, 23 (20): 4051-4060. 10.1038/sj.emboj.7600385.PubMedPubMed CentralView ArticleGoogle Scholar
- Rodriguez A, Griffiths-Jones S, Ashurst JL, Bradley A: Identification of mammalian microRNA host genes and transcription units. Genome Res. 2004, 14 (10A): 1902-1910. 10.1101/gr.2722704.PubMedPubMed CentralView ArticleGoogle Scholar
- Kapranov P, Cheng J, Dike S, Nix DA, Duttagupta R, Willingham AT, Stadler PF, Hertel J, Hackermueller J, Hofacker IL, Bell I, Cheung E, Drenkow J, Dumais E, Patel S, Helt G, Ganesh M, Ghosh S, Piccolboni A, Sementchenko V, Tammana H, Gingeras TR: RNA Maps Reveal New RNA Classes and a Possible Function for Pervasive Transcription. Science. 2007.
- Griffiths-Jones S, Grocock RJ, van Dongen S, Bateman A, Enright AJ: miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res. 2006, 34 (Database issue): D140-4. 10.1093/nar/gkj112.
- Lee Y, Jeon K, Lee JT, Kim S, Kim VN: MicroRNA maturation: stepwise processing and subcellular localization. EMBO J. 2002, 21 (17): 4663-4670. 10.1093/emboj/cdf476.
- Sewer A, Paul N, Landgraf P, Aravin A, Pfeffer S, Brownstein MJ, Tuschl T, van Nimwegen E, Zavolan M: Identification of clustered microRNAs using an ab initio prediction method. BMC Bioinformatics. 2005, 6: 267. 10.1186/1471-2105-6-267.
- Xue C, Li F, He T, Liu GP, Li Y, Zhang X: Classification of real and pseudo microRNA precursors using local structure-sequence features and support vector machine. BMC Bioinformatics. 2005, 6: 310. 10.1186/1471-2105-6-310.
- Jiang P, Wu H, Wang W, Ma W, Sun X, Lu Z: MiPred: classification of real and pseudo microRNA precursors using random forest prediction model with combined features. Nucleic Acids Res. 2007, 35 (Web Server issue): W339-44. 10.1093/nar/gkm368.
- Luo MY, Tian ZG, Wang YX, Xu Z, Zhang L, Cheng J: Construction and application of a microarray for profiling microRNA expression. Progress in Biochemistry and Biophysics. 2007, 34 (1): 1-11.
- Chen C, Ridzon DA, Broomer AJ, Zhou Z, Lee DH, Nguyen JT, Barbisin M, Xu NL, Mahuvakar VR, Andersen MR, Lao KQ, Livak KJ, Guegler KJ: Real-time quantification of microRNAs by stem-loop RT-PCR. Nucleic Acids Res. 2005, 33 (20): e179. 10.1093/nar/gni178.
- Bono H, Yagi K, Kasukawa T, Nikaido I, Tominaga N, Miki R, Mizuno Y, Tomaru Y, Goto H, Nitanda H, Shimizu D, Makino H, Morita T, Fujiyama J, Sakai T, Shimoji T, Hume DA, Hayashizaki Y, Okazaki Y: Systematic expression profiling of the mouse transcriptome using RIKEN cDNA microarrays. Genome Res. 2003, 13 (6B): 1318-1323. 10.1101/gr.1075103.
- Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, Weinstock GM, Wilson RK, Gibbs RA, Kent WJ, Miller W, Haussler D: Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005, 15 (8): 1034-1050. 10.1101/gr.3715005.
- Numata K, Kanai A, Saito R, Kondo S, Adachi J, Wilming LG, Hume DA, Hayashizaki Y, Tomita M: Identification of putative noncoding RNAs among the RIKEN mouse full-length cDNA collection. Genome Res. 2003, 13 (6B): 1301-1306. 10.1101/gr.1011603.
- Hirsch J, Lefort V, Vankersschaver M, Boualem A, Lucas A, Thermes C, d'Aubenton-Carafa Y, Crespi M: Characterization of 43 non-protein-coding mRNA genes in Arabidopsis, including the MIR162a-derived transcripts. Plant Physiol. 2006, 140 (4): 1192-1204. 10.1104/pp.105.073817.
- Bartolomei MS, Zemel S, Tilghman SM: Parental imprinting of the mouse H19 gene. Nature. 1991, 351 (6322): 153-155. 10.1038/351153a0.
- Obernosterer G, Leuschner PJF, Alenius M, Martinez J: Post-transcriptional regulation of microRNA expression. RNA. 2006, 12: 1161-1167. 10.1261/rna.2322506.
- Bracht J, Hunter S, Eachus R, Weeks P, Pasquinelli AE: Trans-splicing and polyadenylation of let-7 microRNA primary transcripts. RNA. 2004, 10 (10): 1586-1594. 10.1261/rna.7122604.
- Thomson JM, Parker J, Perou CM, Hammond SM: A custom microarray platform for analysis of microRNA gene expression. Nat Methods. 2004, 1 (1): 47-53. 10.1038/nmeth704.
- Wulczyn FG, Smirnova L, Rybak A, Brandt C, Kwidzinski E, Ninnemann O, Strehle M, Seiler A, Schumacher S, Nitsch R: Post-transcriptional regulation of the let-7 microRNA during neural cell specification. FASEB J. 2007, 21 (2): 415-426. 10.1096/fj.06-6130com.
- Costa FF: Non-coding RNAs: Lost in translation?. Gene. 2006.
- Willingham AT, Gingeras TR: TUF love for "junk" DNA. Cell. 2006, 125 (7): 1215-1220. 10.1016/j.cell.2006.06.009.
- Huttenhofer A, Schattner P, Polacek N: Non-coding RNAs: hope or hype?. Trends Genet. 2005, 21 (5): 289-297. 10.1016/j.tig.2005.03.007.
- FANTOM3 database. [http://fantom3.gsc.riken.jp/]
- miRBase database. [http://microrna.sanger.ac.uk/]
- UCSC. [http://genome.ucsc.edu/]
- READ database. [http://read.gsc.riken.go.jp/]
- PhastCons. [http://hgdownload.cse.ucsc.edu/goldenPath/mm7/phastCons17Scores]
- WebLogo. [http://weblogo.berkeley.edu/]
- Hofacker IL, Fontana W, Stadler PF, Bonhoeffer LS, Tacker M, Schuster P: Fast Folding and Comparison of RNA Secondary Structures. Monatshefte f Chemie. 1994, 125: 167-188. 10.1007/BF00818163.
- Zuker M, Stiegler P: Optimal computer folding of large RNA sequences using thermodynamic and auxiliary information. Nucleic Acids Res. 1981, 9: 133-148. 10.1093/nar/9.1.133.
- Lim LP, Lau NC, Weinstein EG, Abdelhakim A, Yekta S, Rhoades MW, Burge CB, Bartel DP: The microRNAs of Caenorhabditis elegans. Genes Dev. 2003, 17 (8): 991-1008. 10.1101/gad.1074403.
- Zeng Y, Cullen BR: Efficient processing of primary microRNA hairpins by Drosha requires flanking nonstructured RNA sequences. J Biol Chem. 2005, 280 (30): 27595-27603. 10.1074/jbc.M504714200.
- Bailey TL, Elkan C: The value of prior knowledge in discovering motifs with MEME. Proc Int Conf Intell Syst Mol Biol. 1995, 3: 21-29.
- NCBI. [http://www.ncbi.nlm.nih.gov/]
- Watanabe T, Takeda A, Mise K, Okuno T, Suzuki T, Minami N, Imai H: Stage-specific expression of microRNAs during Xenopus development. FEBS Lett. 2005, 579 (2): 318-324. 10.1016/j.febslet.2004.11.067.
- eBioMed. [http://www.ebiomed.org/pub/mencrna.htm]
- Crooks GE, Hon G, Chandonia JM, Brenner SE: WebLogo: a sequence logo generator. Genome Res. 2004, 14 (6): 1188-1190. 10.1101/gr.849004.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.