Alu-directed transcriptional regulation of some novel miRNAs
© Gu et al. 2009
Received: 28 July 2009
Accepted: 30 November 2009
Published: 30 November 2009
Despite many studies on the biogenesis, molecular structure and biological functions of microRNAs, little is known about the transcriptional regulatory mechanisms controlling the spatiotemporal expression pattern of human miRNA gene loci. Several lines of experimental results have indicated that both polymerase II (Pol-II) and polymerase III (Pol-III) may be involved in transcribing miRNAs. Here, we assessed the genomic evidence for Alu-directed transcriptional regulation of some novel miRNA genes in humans. Our data demonstrate that the expression of these Alu-related miRNAs may be modulated by Pol-III.
We present a comprehensive exploration of the Alu-directed transcriptional regulation of some new miRNAs. Using a new computational approach, a variety of Alu-related sequences from multiple sources were pooled and filtered to obtain a subset containing Alu elements and characterized miRNA genes for which there is clear evidence of full-length transcription (embedded in EST). We systematically demonstrated that 73 miRNAs including five known ones may be transcribed by Pol-III through Alu or MIR. Among the new miRNAs, 33 were determined by high-throughput Solexa sequencing. Real-time TaqMan PCR and Northern blotting verified that three newly identified miRNAs could be induced to co-express with their upstream Alu transcripts by heat shock or cycloheximide.
Through genomic analysis, Solexa sequencing and experimental validation, we have identified candidate sequences for Alu-related miRNAs, and have found that the transcription of these miRNAs could be governed by Pol-III. Thus, this study may elucidate the mechanisms by which the expression of a class of small RNAs may be regulated by their upstream repeat elements.
MicroRNAs (miRNAs) are a class of small non-coding RNAs (ncRNAs) about 22 nt in length. They control fundamental cellular activities such as differentiation, proliferation and apoptosis in different species by regulating gene expression [1–3]. Although miRNAs were discovered more than a decade ago, their transcription remains insufficiently understood. They are believed to be transcribed by polymerase II (Pol-II) [4–6]. However, new research on ncRNA transcription indicates that polymerase III (Pol-III) may participate in this process [7–9]. Pol-III is usually recognized as transcribing housekeeping ncRNAs and short interspersed nuclear elements (SINEs) such as tRNAs, 5S rRNAs and Alu [7, 10, 11]. In 2004, a study revealed that an exogenous Pol-III promoter can initiate miRNA transcription. Since then, several lines of evidence have shown that Pol-III can transcribe miRNAs downstream of tRNAs, Alu and other SINEs [7, 13, 14], but whether this is a common mechanism is still not clear.
In the haploid human genome of three billion base pairs, the sequences of protein-encoding genes constitute about 3%, whereas repeats and transposons constitute up to 45%. Alu elements are among the most abundant transposons, constituting 11% of the human genome. A full-length Alu is about 300 nt long and comprises left and right arms, with poly(A) sequences between them and at the end. More importantly, it affects genome recombination, RNA transcription, alternative splicing, translation, DNA replication and methylation, and other processes [16, 17]. Alu insertion may cause many diseases [18, 19]. Alu has therefore attracted increasing attention and has been extensively studied in relation to transcription. It is generally believed to be transcribed by Pol-III through internal promoters, the A box and B box [20, 21]. Because Alu does not code for a terminator, Pol-III usually reads through its sequence until it reaches a downstream terminator [22, 23]. Thus, Pol-III may transcribe sequences downstream of Alu elements.
Therefore, if miRNAs follow Alu elements closely or reside within Alu, they are likely to be transcribed through Alu by Pol-III. Moreover, it has been demonstrated that Alu can serve as a promoter for miRNA transcription. It has also been found that Pol-III transcribes small RNAs through tRNAs or tRNA-like sequences in trypanosomatid protozoa, nematodes and plants [24–26], while in murine gammaherpesvirus 68 (MHV68), Pol-III transcribes downstream miRNAs through tRNA. tRNAs differ from Alu in sequence but are similar in transcription: both have the A box and B box that are recognized and bound by Pol-III [7, 27]. It is reasonable to presume that Pol-III can transcribe other ncRNAs downstream of Alu elements or other repeats. Taking Alu as an example, we propose the hypothesis that the transcription of a class of new miRNA genes can be linked to their upstream Alu transcription, and on this basis we have conducted a group of comprehensive studies.
Of the 60 sequences, 14 were located between protein-coding genes, 19 within the antisense strands of introns and 27 within introns, as shown in Figure 2D. Twenty-seven sequences derived from introns could be mapped to 65 different introns, suggesting that their sources are multiple loci.
The locations of 24 miRNAs downstream of Alu elements in the human genome.
The same method was also used to explore other repeats. MIR, which also belongs to the SINE family, was likewise closely followed by candidate miRNA sequences. Besides the four known miRNAs, a further nine sequences downstream of MIR were verified (Additional file 6). Although the functions of MIR are not yet established, it may have some transcriptional activity. Whether MIR can direct the transcription of its downstream miRNAs awaits further study.
To confirm the involvement of Pol-III in miRNA transcription further and exclude the possibility of Pol-II-mediated miRNA generation, HeLa cells were treated with tagetitoxin at a concentration (4 U/ml) that specifically inhibits Pol-III transcription. As shown in Figure 3F, the levels of mature AluJb-641, AluJo-576611 and AluJo-135090 were dramatically enhanced in heat-treated cells, while the level of let-7a, used as a negative control for Pol-III transcription, remained unchanged. More importantly, tagetitoxin specifically inhibited Pol-III, and no mature AluJb-641, AluJo-576611 or AluJo-135090 bands could be detected in the 22 nt marker region. However, the transcription of let-7a, which is driven by Pol-II, was not affected by tagetitoxin. This suggests that the transcription of some miRNAs can be regulated by Pol-III and supports our hypothesis that Alu elements may act as promoters for nearby downstream ncRNAs.
To clarify whether such intron-derived miRNAs can be co-expressed when their host genes are transcribed, we investigated the expression profiles of AluJb-641 and its host gene. The former was found to reside within the seventh intron of ATAD3B, which encodes a mitochondrial membrane protein. This intron was 2,401 bp in full length and AluJb-641 was located at locus 192. RT-PCR showed that expression of the host gene was not enhanced by heat shock or cycloheximide stimulation (Figure 3C), which generally induce high expression of Alu and AluJb-641 (Figure 3A). Thus, AluJb-641 may not be co-expressed with its host gene, but along with the expression of its upstream Alu. Our results also showed that ATAD3B was not co-expressed with Alu.
The effect of Alu on other ncRNAs has rarely been studied. In recent years, as ncRNA research has progressed rapidly, more and more repeats have been found to be important for gene transcription. Dieci et al. comprehensively summarized the evidence that Pol-III may transcribe downstream sequences through tRNAs, repeats or ncRNAs. On the basis of previous studies, we propose the hypothesis that some short interspersed nuclear elements (SINEs) without terminator sequences may trigger the transcription of their downstream miRNAs. Guided by this hypothesis, we investigated the mechanism of miRNA transcription through Alu (Figure 2A). We not only predicted 60 Alu-related miRNAs that might be transcribed through Alu, including the previously reported miR-517a, but also identified 23 of them by Solexa sequencing. More importantly, using induction by heat shock and cycloheximide, we showed that the expression of Alu RNAs was consistent with the expression of three miRNAs downstream of those elements, providing clear-cut evidence that transcription of some miRNAs may be initiated by Alu transcription.
Recently, studies on miRNA transcription by two different groups have indicated that Pol-II or Pol-III is associated with miR-517a transcription [14, 34]. RNA Pol-I is known to be insensitive to α-amanitin, RNA Pol-II is very sensitive to it and RNA Pol-III is moderately sensitive. To avoid the ambiguous effects of α-amanitin, we used tagetitoxin as a specific Pol-III inhibitor. As expected, our results showed that tagetitoxin significantly repressed the expression of Pol-III-mediated miRNAs, but did not affect the transcription of the Pol-II-driven let-7a. Taken together, these observations suggest that the choice of polymerase inhibitor may be a key factor in the differences among research results. Although only the example of Alu RNA has been investigated in this article, we cannot exclude the possibility that other SINEs play the same role as Alu, because some known and predicted miRNAs were also found downstream of MIR (Additional file 5). Transcription of these miRNAs seems likely to depend on key sequences such as A/B boxes or other motifs embedded within Alu elements.
Other members of the SINE family, in addition to Alu, all originate from tRNAs and have a classical structure comprising three parts: first, a promoter derived from tRNA; second, a sequence specific for the SINE; third, a sequence needed for reverse transcription, similar to the 3' end of LINEs and simple repeats [8, 35]. Although the other SINE family members have mostly evolved from tRNAs and differ from Alu in structure, they are very similar in transcription. It is generally believed that in both tRNAs and SINEs, the A box and B box within the left arm are used to regulate transcription. Of these two boxes, the B box may play a decisive role [7, 20, 21, 27]. Therefore, the transcription of SINE elements is also very likely to trigger the expression of their downstream small RNAs.
However, among the three newly identified miRNAs that have transcriptional activity and are co-expressed with Alu elements, only AluJo-576611 and AluJo-135090 contain the classic B box. AluJb-641 has no B box, and nor does the upstream AluSg/x-1110454, which has been shown to be able to transcribe miR-517a. Even though AluSg/x-1110454 was mapped to the left arm of the consensus sequence of Alu in our multiple sequence alignments, its B box was severely mutated. In the B box, the critical nucleotides are the G at the first position and the T at the third position, but the third position of AluSg/x-1110454 is G. In the multiple sequence alignments, one motif, GAGGCTGAGG, was found to be highly conserved. It is present not only in these four sequences but also in the consensus sequences of all Alu family members. This motif may be of great significance for the transcription of Alu and its downstream miRNAs, or may have a function similar to that of the B box in regulating Alu transcription. However, this possibility still awaits further study.
When miRNAs were discovered, they were originally thought to have nothing to do with repeats. For example, when miRNAs were predicted, Alu-related repeats were first filtered out. Nonetheless, it has been found over the past two years that miRNAs and repeats are closely interrelated. miRNAs are very likely to originate from repeats and may interact with repeats, or with mRNAs with related repeats embedded in their 3'UTR regions [36–38]. Repeats are now known to be able to direct transcription of miRNAs. Considering these findings together, we believe that with further studies, a regulatory network constituted by miRNAs, repeats and target RNAs will be unveiled layer by layer, enhancing our understanding of the nature of life.
We identified a class of new miRNAs located immediately downstream of repeat elements, and found that Pol-III could be involved in transcribing these miRNAs through Alu. Our results shed light on the mechanisms of the Alu-directed transcriptional regulation of some miRNAs and indicate that miRNAs and repeats might be closely interrelated.
Most of our data were downloaded from the UCSC Table Browser (http://genome.ucsc.edu/) in May 2006: repeats from the Variation and Repeats group, miRNAs and exons from the Genes and Gene Prediction Tracks group, and ESTs from the mRNA and EST Tracks group. All data were saved in BED format and uploaded to Galaxy (http://main.g2.bx.psu.edu/) for analysis. Because pre-miRNAs may be located immediately downstream of SINEs, Pol-III-transcribed sequences are usually less than 500 bp in length, and most SINE elements are less than 300 bp, we extended each SINE by 200 bp downstream and used these extended regions as the initial sequences for predicting candidate miRNA genes.
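The 200 bp downstream extension can be illustrated with a minimal Python sketch. It assumes BED-style coordinates (0-based start, exclusive end) and assumes that "downstream" is interpreted relative to the strand of the SINE; the `Interval` type and the `downstream_window` function are hypothetical names introduced here for illustration and are not part of the Galaxy workflow described above.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive (BED convention)
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"


def downstream_window(sine: Interval, flank: int = 200) -> Interval:
    """Return the window of `flank` bp immediately downstream of a SINE.

    Assumption: downstream is strand-aware, i.e. higher coordinates on
    the plus strand and lower coordinates on the minus strand.
    """
    if sine.strand == "+":
        return Interval(sine.chrom, sine.end, sine.end + flank, "+")
    # Minus strand: clamp at 0 so the window never leaves the chromosome.
    start = max(0, sine.start - flank)
    return Interval(sine.chrom, start, sine.start, "-")
```

For a plus-strand SINE at chr1:1000-1300 this yields the window chr1:1300-1500, which would then be folded and scored as a candidate region.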
Human Genome (hg18) and Primate Genome (panTro2) were also downloaded from UCSC using FTP to prepare for conservation analysis.
The consensus sequences of different primate SINE families were downloaded from GIRI (http://www.girinst.org/).
To ensure that the SINEs were expressed, we compared all SINEs and their extended sequences with ESTs. Although most ESTs are transcribed by Pol-II, the discovery of non-coding RNA (ncRNA) has led to the inclusion of expressed sequences transcribed by other polymerases in the EST database. Umylny et al. found that 452 Alu ncRNAs longer than 200 bp and probably transcribed by Pol-III are also listed in dbEST. We used a similar method to filter our data, except for the differences in SINE lengths. If both a SINE and its extended sequence were found among the ESTs, we considered them potentially still active.
Thirdly, we folded the 200 bp extended sequences and extracted pre-miRNA candidates from them on the basis of the 11 features. The reliability of PriMir prediction was evaluated by cross-validation. The training and background sets used to establish the PMS Matrix were divided into five equal parts. Four parts were selected to establish the PMS Matrix, and the remaining part (from both training and background sets) was used to test the performance of PriMir by the ROC curve. This analysis was repeated five times, each time using a different portion of the data as the test set. A PriMir score of 7 was used as the cutoff value. This is a stringent criterion, as ROC curve analysis of the performance of PriMir indicates that the AUC (area under the curve) is approximately 0.99 and that the false positive rate is 0 at a PriMir score of 7. Comparisons between PriMir and three other algorithms [41–43] suggested that PriMir performed at least as well as, and in some respects better than, those three methods. In addition, we used EST evidence and conservation to filter the initial results, which made our results more convincing. The accession numbers for the 24 Alu-related miRNAs deposited in GenBank are FJ601661 to FJ601684.
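The evaluation scheme described here (five-fold cross-validation, ROC AUC and the false positive rate at a score cutoff) can be sketched in plain Python. This is an illustration only and does not reproduce PriMir or the PMS Matrix: the scores are assumed to be given, the function names are hypothetical, the folds are assigned by interleaving rather than by any split the authors used, and a candidate is assumed to pass when its score is greater than or equal to the cutoff.

```python
def five_fold_indices(n, k=5):
    """Yield (train, test) index lists for k folds.

    Assumption: items are assigned to folds by interleaving (index mod k).
    """
    for i in range(k):
        test = list(range(i, n, k))
        train = [j for j in range(n) if j % k != i]
        yield train, test


def roc_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def false_positive_rate(neg_scores, cutoff):
    """Fraction of background sequences scoring at or above the cutoff."""
    return sum(s >= cutoff for s in neg_scores) / len(neg_scores)
```

In such a scheme, a cutoff is considered stringent when `false_positive_rate` on the held-out background set is 0 while most held-out true pre-miRNAs still score above it.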
Because Alus are primate-specific repeats and there is only one primate genome, panTro2, in UCSC, we only examined conservation between Pan troglodytes and human. In order to find highly conserved miRNAs, we only considered the miRNA candidates with more than 96% similarity to those in Pan troglodytes. The lengths of these pre-miRNA candidates in human should be more than 90% of those in Pan troglodytes.
We used a similar criterion to identify conserved SINEs located upstream of pre-miRNAs. The selection criteria were that the length of the conserved repeat sequence in Pan troglodytes must be more than 90% of that in human and that the corresponding similarity must also exceed 90%. If both a SINE and the corresponding pre-miRNA were conserved and the distance between them was less than 200 nt, we retained the pre-miRNA as a candidate.
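The two conservation thresholds and the distance rule can be written as simple predicates. The sketch below assumes that the sequence identity and the lengths of the human and chimpanzee sequences have already been obtained from an alignment; the function names and the tuple layout are hypothetical and chosen only for illustration.

```python
def mirna_conserved(identity, human_len, chimp_len):
    """Pre-miRNA rule: >96% similarity, human length >90% of chimp length."""
    return identity > 0.96 and human_len > 0.90 * chimp_len


def sine_conserved(identity, human_len, chimp_len):
    """SINE rule: >90% similarity, chimp length >90% of human length."""
    return identity > 0.90 and chimp_len > 0.90 * human_len


def keep_candidate(mirna, sine, distance_nt):
    """Keep a pre-miRNA if it and its upstream SINE are both conserved
    and the two lie less than 200 nt apart.

    `mirna` and `sine` are (identity, human_len, chimp_len) tuples.
    """
    return (
        mirna_conserved(*mirna)
        and sine_conserved(*sine)
        and distance_nt < 200
    )
```

For example, `keep_candidate((0.98, 80, 82), (0.93, 300, 290), 50)` passes all three rules, whereas the same pair at a distance of 200 nt would be rejected.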
If repeats are expressed in exons of protein-coding genes, they probably do not function as pre-miRNA promoters. Few miRNAs so far known are present in exons, so we filtered out those predicted miRNAs and SINEs that overlapped with exons.
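The exon filter amounts to an interval-overlap test. A minimal sketch follows, assuming half-open BED-style coordinates and plain `(chrom, start, end)` tuples; the helper names are hypothetical, and the linear scan is chosen for clarity rather than speed.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def drop_exonic(candidates, exons):
    """Remove candidates that overlap any exon on the same chromosome.

    Both arguments are lists of (chrom, start, end) tuples.
    """
    exons_by_chrom = {}
    for chrom, start, end in exons:
        exons_by_chrom.setdefault(chrom, []).append((start, end))
    kept = []
    for chrom, start, end in candidates:
        hits = exons_by_chrom.get(chrom, [])
        if not any(overlaps(start, end, s, e) for s, e in hits):
            kept.append((chrom, start, end))
    return kept
```

For genome-scale data an interval tree or a sorted sweep would be preferable, but the overlap rule itself stays the same.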
Because a run of more than four Ts between a predicted pre-miRNA and its upstream SINE is highly likely to interrupt transcription of the longer sequence, we deleted every predicted sequence with more than four Ts between the pre-miRNA and the SINE.
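This filter can be expressed as a short regular-expression check on the spacer between the SINE and the predicted pre-miRNA. The sketch assumes that "more than four Ts" means a run of five or more consecutive T residues on the transcribed strand; that reading, and the function name, are assumptions made for illustration.

```python
import re

# Assumption: "more than four Ts" is read as >= 5 consecutive T residues.
T_RUN = re.compile(r"T{5,}")


def has_terminator_like_run(spacer):
    """True if the spacer sequence contains a run of five or more Ts."""
    return T_RUN.search(spacer.upper()) is not None


def drop_interrupted(candidates):
    """Keep (name, spacer) pairs whose spacer has no long T run."""
    return [(name, sp) for name, sp in candidates
            if not has_terminator_like_run(sp)]
```

A spacer such as `ACGTTTTTGA` would be removed, while `ACGTTTTGA`, with only four consecutive Ts, would be kept.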
HeLa and HEK293 cells were obtained from ATCC (American Type Culture Collection). The cells were cultured in DMEM (Dulbecco's Modified Eagle Medium) supplemented with 100 U/ml penicillin, 100 U/ml streptomycin and 10% FBS (fetal bovine serum) at 37°C in a humidified atmosphere containing 5% CO2. For heat shock experiments, cells were heated in 75 cm2 flasks containing 10 ml medium in a 45°C water bath. After 30 min heat shock, they were incubated at 37°C for 1 h before total RNA was extracted. For cycloheximide treatment, the cells were grown in the presence of 100 μg/ml cycloheximide. The incubation time was 3 h for HeLa cells and 6 h for HEK293 cells prior to total RNA extraction.
Tagetitoxin, a specific inhibitor of RNA Pol-III, was obtained from Epicentre Technologies (Madison, WI). HeLa cells were treated with 120 μmol/l tagetitoxin for one day and then subjected to heat shock as described above. Total RNA was extracted from tagetitoxin-treated and untreated cells. Mature miRNA was isolated and amplified by real-time PCR as described below. The effects of tagetitoxin on in vivo transcription were assessed by Northern blotting. Inhibition of RNA Pol-III was evaluated by the levels of transcription of three Alu-associated miRNAs, with let-7a as a negative control.
We employed Illumina sequencing methods as described previously. Briefly, total RNA was extracted from human cells with Trizol, and the small RNAs were size-fractionated, purified by polyacrylamide gel electrophoresis (PAGE) to enrich for molecules in the range 18-30 nt, and ligated sequentially to 5'- and 3'-end RNA oligonucleotide adapters. The samples were used as templates for cDNA synthesis. The cDNA was amplified over 15-18 PCR cycles to produce sequencing libraries using Illumina's small RNA primer set. The purified PCR products were subjected to Solexa's proprietary sequencing-by-synthesis method on the Illumina 1G Genome Analyzer. All sequencing services were provided by the Beijing Genomics Institute, Shenzhen (http://www.genomics.org.cn) and the Genome Sciences Centre, Vancouver. Raw data from the Illumina Genome Analyzer were processed using the initial stages of the Solexa software pipeline (Illumina). Low-quality reads were trimmed using a Perl script. Adaptor sequences were accurately clipped using a dynamic programming algorithm. After redundancy was removed, sequences ≥ 18 nt were mapped to the human genome (UCSC hg18) using SOAP. Small RNA sequences were mapped to human shRNA hairpins by BLAST. The summed count of all isoforms of an shRNA gene was used to measure its expression level.
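The redundancy-removal and isoform-counting steps can be sketched as follows. This is a simplified illustration, not the pipeline used here: adaptor clipping and the SOAP and BLAST mappings are assumed to have been done elsewhere, and the `read_to_hairpin` dictionary is a hypothetical stand-in for the mapping result.

```python
from collections import Counter


def collapse_reads(reads, min_len=18):
    """Collapse identical reads into counts, keeping reads >= min_len nt."""
    return Counter(r for r in reads if len(r) >= min_len)


def hairpin_expression(read_counts, read_to_hairpin):
    """Sum the counts of all isoforms assigned to the same hairpin.

    `read_to_hairpin` maps a read sequence to a hairpin identifier
    (hypothetical; in practice this would come from the alignment step).
    Reads without an assignment are ignored.
    """
    expression = Counter()
    for read, count in read_counts.items():
        hairpin = read_to_hairpin.get(read)
        if hairpin is not None:
            expression[hairpin] += count
    return expression
```

With two copies of a 22 nt read and one copy of a 21 nt isoform that both map to the same hairpin, the expression value for that hairpin is 3.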
Total cellular RNA was isolated with TRIzol reagent (Invitrogen) according to the manufacturer's instructions. Reverse transcriptase reactions contained 1 μg purified total RNA, 50 nM stem-loop RT primer (Applied Biosystems), 1× RT Buffer, 0.25 mM each dNTP, 5 U/μl M-MLV reverse transcriptase (Promega) and 0.25 U/μl RNase inhibitor (Promega). The mixtures were incubated in a thermocycler for 30 min at 16°C, 30 min at 42°C and 10 min at 85°C, then held at 4°C. All reverse transcriptase reactions, including controls, were run in triplicate.
Real-time PCR was performed using a standard TaqMan MicroRNA Assay kit protocol on a Rotor-Gene 6000 (Corbett) Sequence Detection System. The 20 μl PCR reactions included 1 μl RT product, 1× Real-time PCR Master Mix including TaqMan probe, and 0.15 μM miRNA-specific primer set. The mixtures were incubated at 95°C for 3 min, followed by 40 cycles at 95°C for 15 s and 60°C for 40 s. All reactions were run in triplicate. The threshold cycle (CT) was defined as the fractional cycle number at which the fluorescence passed the fixed threshold. TaqMan CT values were converted to absolute copy numbers using a standard curve generated from a synthetic miRNA standard.
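Converting CT values to absolute copy numbers with a standard curve can be illustrated by a least-squares fit of CT against log10 of the copy number, followed by inversion of the fitted line. The sketch assumes a log-linear standard curve; the dilution series and CT values in the usage note are invented for illustration and are not measurements from this study.

```python
import math


def fit_standard_curve(copies, cts):
    """Fit CT = slope * log10(copies) + intercept by least squares."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def ct_to_copies(ct, slope, intercept):
    """Invert the standard curve to estimate the absolute copy number."""
    return 10 ** ((ct - intercept) / slope)
```

As a usage example, a hypothetical dilution series of 1e3 to 1e6 copies with CT values 30.0, 26.7, 23.4 and 20.1 gives a slope of about -3.3 cycles per tenfold dilution, and a sample CT would then be converted with `ct_to_copies(ct, slope, intercept)`. Triplicate CT values would typically be averaged before conversion.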
Oligonucleotides used for Northern blotting
We thank Drs. Marco Marra and Ryan Morin for providing us with their deep sequencing data on human embryonic stem (hESC) and human embryoid body (hEB) cells. This work was supported by the Chinese Academy of Sciences "Hundred Talents Program", the 863 program (Grant No: 2006AA02Z131 and 2007AA02Z181) and the National Natural Science Foundation of China (No. 30671042 and 30871250).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.