miRNA arm selection and isomiR distribution in gastric cancer
© Li et al.; licensee BioMed Central Ltd. 2012
Published: 17 January 2012
MicroRNAs (miRNAs) are small non-protein-coding RNAs. miRNA genes require several biogenesis steps to form functional miRNAs. However, the precise mechanisms and biology underlying mature miRNA molecules have not been clearly elucidated. In this study, we conducted in-depth analyses of arm selection and isomiRs using a next-generation sequencing (NGS) platform.
We sequenced small RNAs from one pair of normal and gastric tumor tissues on the Solexa platform. By analyzing the NGS data, we quantified the expression profiles of miRNAs and isomiRs in gastric tissues. We then measured the expression ratios of the 5p arm to the 3p arm of the same pre-miRNAs, and used the Kolmogorov-Smirnov (KS) test to examine differences in isomiR patterns between tissues.
Our results showed that the 5p arm and 3p arm miRNAs derived from the same pre-miRNA can have different tissue expression preferences, one preferring normal tissue and the other tumor tissue. This strongly implies that mechanisms other than the known hydrogen-bonding selection rule may control mature miRNA selection. Furthermore, using the KS test, we demonstrated that some isomiR types preferentially occur in normal gastric tissue, whereas other types prefer tumor gastric tissue.
Deep-sequencing NGS data show that arm selection and isomiR patterns vary significantly in human cancer tissue. Our results open a novel research topic in the study of miRNA regulation. With further bioinformatics and molecular biology studies, more robust conclusions and deeper insight into miRNA regulation can be achieved in the near future.
MicroRNAs (miRNAs) are small non-protein-coding RNAs whose final functional products are RNA molecules rather than proteins. Although their functional products differ, miRNA genes, like many protein-coding genes, require several maturation steps to form their functional products: single-stranded RNAs approximately 22 nt in length. After a miRNA gene is transcribed, the full-length transcript (pri-miRNA) folds into a hairpin structure (the pre-miRNA) flanked by two unpaired tails, which are trimmed off by Drosha. The pre-miRNA, composed of a 5p arm, a 3p arm and a terminal loop, is further processed by Dicer, which removes the terminal loop and releases the miRNA/miRNA* duplex. The duplex is subsequently processed by RISC, which unwinds it from the end with weaker hydrogen bonding. Consequently, the strand whose 5' end lies at that weaker end is selectively loaded into RISC and serves as the mature miRNA [1, 2].
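The hydrogen-bonding selection rule can be illustrated with a toy calculation. The sketch below is a simplification assumed for illustration, not a model used by the authors: it scores each duplex end by counting hydrogen bonds (3 for G:C, 2 for A:U and G:U wobble, 0 for mismatches) over a hypothetical window of k = 4 terminal base pairs, and it ignores Dicer's 3' overhangs, stacking energies and other thermodynamic terms.

```python
# Approximate hydrogen-bond counts per base pair (Watson-Crick pairs and G:U wobble).
H_BONDS = {frozenset("GC"): 3, frozenset("AU"): 2, frozenset("GU"): 2}


def pair_bonds(pair):
    """Hydrogen bonds for one (base, base) pair of RNA letters; mismatches count as 0."""
    return H_BONDS.get(frozenset(pair), 0)


def predict_arm(pairs, k=4):
    """Predict which arm is loaded into RISC from duplex end stability.

    pairs: list of (5p-strand base, 3p-strand base) tuples, ordered from the
    5' end of the 5p strand. The first k pairs form the end carrying the 5p
    strand's 5' end; the last k pairs form the end carrying the 3p strand's
    5' end. The strand whose 5' end sits at the weaker end is predicted.
    """
    end_5p = sum(pair_bonds(p) for p in pairs[:k])
    end_3p = sum(pair_bonds(p) for p in pairs[-k:])
    if end_5p < end_3p:
        return "5p"
    if end_3p < end_5p:
        return "3p"
    return "tie"


# Hypothetical duplex: an A:U-rich end next to the 5p strand's 5' end.
duplex = [("A", "U"), ("U", "A"), ("A", "U"), ("U", "G")] + [("G", "C")] * 14 + [("C", "G")] * 4
print(predict_arm(duplex))  # 8 bonds vs 12 bonds -> 5p
```

Observations such as those reported in this study, where the dominant arm differs between tissues for the same duplex, are exactly the cases a purely sequence-based rule like this cannot explain.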
Although mature miRNAs are derived from the full-length transcripts of miRNA genes, expression of a miRNA gene does not guarantee expression of the mature miRNA. In other words, not all pri-miRNAs are processed into mature miRNAs [3–5]. This unequal maturation arises from several regulatory steps. First, Drosha and Dicer have higher affinity for pri-miRNAs and pre-miRNAs, respectively, whose terminal loops are moderate in size [6, 7]. Furthermore, Drosha prefers pri-miRNAs with a longer stem (~33 bp). Second, owing to the hydrogen-bonding selection mechanism, the 5p arm and 3p arm of the same pre-miRNA usually have unequal likelihoods of being selected as the mature miRNA.
To date, this hydrogen-bonding-based selection rule has been the prevailing view. However, recent studies have introduced new concepts that challenge the traditional model of miRNA maturation. First, previous studies showed that orthologous pre-miRNAs, although highly similar to each other, preferred the 5p arm in one species but the 3p arm in another [9, 10]. This result challenges the hydrogen-bonding selection rule, implying that other regulatory mechanisms could control 5p or 3p arm selection. Second, with the application of NGS technology, mature miRNAs have often been observed to exist as different isoforms, named isomiRs [9, 11, 12]. Further analysis has implied that different isomiRs may contribute to regulation during Drosophila development.
In this study, we conducted in-depth analyses of these issues by using NGS technology to quantify the expression profiles of miRNAs and isomiRs in human gastric tissues. By measuring the expression ratios of the 5p arm to the 3p arm in each tissue, we showed that the 5p arm and 3p arm miRNAs derived from the same pre-miRNA can have different tissue expression preferences, one preferring normal tissue and the other tumor tissue. This strongly implies that mechanisms other than the hydrogen-bonding selection rule may control arm selection. Furthermore, using the Kolmogorov-Smirnov test, we demonstrated that some isomiR types preferentially occur in normal gastric tissue, whereas other types may prefer tumor gastric tissue.
We used the Illumina (Solexa) platform for small RNA sequencing. One pair of normal gastric tissue (G1245N) and gastric tumor tissue (G1245T) was lysed with a TissueLyser (Qiagen), followed by RNA extraction with TRIzol reagent (Invitrogen) according to the manufacturer's protocol. The RNA samples were then processed and sequenced. The 3' adapter was removed from the generated sequence reads where present. Only clean reads, that is, reads in which the adapter was detected and trimmed, were used for analysis. In addition, given the length distribution of mature miRNAs, we retained only clean reads 18 to 25 nucleotides long.
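The read-cleaning rule (keep only reads whose adapter was found and trimmed, then keep inserts of 18 to 25 nt) can be sketched as follows. The adapter sequence and the 6-nt minimum partial-adapter overlap are hypothetical choices for illustration, since the paper does not state them, and the exact-match search tolerates no sequencing errors within the adapter.

```python
# Hypothetical 3' adapter; the paper does not state which adapter was used.
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"


def trim_adapter(read, adapter=ADAPTER, min_overlap=6):
    """Return the insert with the 3' adapter removed, or None if no adapter is found.

    A full adapter match anywhere in the read is tried first; otherwise an
    adapter prefix of at least min_overlap nt at the read's 3' end is accepted.
    """
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return None


def clean_reads(reads, adapter=ADAPTER, min_len=18, max_len=25):
    """Keep reads whose adapter was detected and trimmed and whose insert is 18-25 nt."""
    kept = []
    for read in reads:
        insert = trim_adapter(read, adapter)
        if insert is not None and min_len <= len(insert) <= max_len:
            kept.append(insert)
    return kept
```

Reads in which no adapter is detected are discarded rather than kept untrimmed, mirroring the "clean reads only" rule in the text.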
The normal and tumor data sets initially analyzed differ in size (19.7 million reads in G1245N and 26.0 million reads in G1245T). We therefore normalized them with a linear regression model, obtaining the equation y = 1.3004x - 0.1457, where x and y denote the expression levels of miRNAs in the G1245N and G1245T libraries, respectively. After normalization, a scatter plot showed that most data points lay near a line with a slope of 0.9874 (R² = 0.8831). This result showed that the expression levels of most miRNAs did not vary between tissues and that the miRNA expression data from the two libraries were comparable.
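A minimal sketch of the regression-based normalization, assuming an ordinary least-squares fit of tumor expression (y) on normal expression (x) and rescaling of tumor values through the inverse of the fitted line. The paper does not say whether expression values were log-transformed or exactly how the line was applied, so both are assumptions here.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept, with R^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy * sxy / (sxx * syy)  # squared Pearson correlation = R^2 for one predictor
    return slope, intercept, r2


def normalize_tumor(normal_expr, tumor_expr):
    """Map tumor values onto the normal-library scale by inverting the fitted line."""
    slope, intercept, _ = fit_line(normal_expr, tumor_expr)
    return [(v - intercept) / slope for v in tumor_expr]
```

Refitting the normalized tumor values against the normal values should give a slope close to 1; the text reports 0.9874 with R² = 0.8831 for the real data.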
Clean reads were collapsed into unique sequences, and the count of each unique read was tabulated. For higher confidence, only unique reads with a count of at least two were mapped back to human pre-miRNAs (miRBase 16). To eliminate ambiguous mapping caused by the high similarity between human paralogous mature miRNAs, such as hsa-miR-548a and hsa-miR-548b, we allowed no mismatches during mapping. Previous reports observed nucleotide additions at the 3' end of miRNAs [12, 15–18], which may cause mismatches during mapping. Therefore, following Fernandez-Valverde's strategy, we trimmed mismatched 3'-end nucleotides one at a time, requiring the remaining perfect-match read to be at least 18 nucleotides long.
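The collapsing and mapping steps can be sketched in Python. This is an illustration under stated assumptions, not the authors' pipeline: it uses exact string search instead of a dedicated aligner, searches only the given strand of each pre-miRNA, and interprets the trimming rule as removing one 3' nucleotide at a time and re-searching until an exact match of at least 18 nt is found. The pre-miRNA name used in the example is hypothetical.

```python
from collections import Counter


def collapse_reads(reads, min_count=2):
    """Collapse identical reads into unique sequences, keeping those seen >= min_count times."""
    counts = Counter(reads)
    return {seq: n for seq, n in counts.items() if n >= min_count}


def find_all(seq, sub):
    """All 0-based start positions of an exact (no-mismatch) match of sub in seq."""
    hits, start = [], seq.find(sub)
    while start != -1:
        hits.append(start)
        start = seq.find(sub, start + 1)
    return hits


def map_with_3p_trimming(read, precursors, min_len=18):
    """Map a read to pre-miRNAs with no mismatches, trimming its 3' end one nt at a time.

    Returns (mapped fragment, [(pre-miRNA name, start)], number of trimmed nt),
    or None if no perfect match of at least min_len nt is found.
    """
    for trimmed in range(len(read) - min_len + 1):
        fragment = read[: len(read) - trimmed]
        hits = [(name, pos) for name, seq in precursors.items()
                for pos in find_all(seq, fragment)]
        if hits:
            return fragment, hits, trimmed
    return None


mature = "UAGCUUAUCAGACUGAUGUUGA"
pre = {"hsa-mir-demo": "UGUCGGG" + mature + "CUGUUGAAUCUCAUGG"}  # hypothetical precursor
print(map_with_3p_trimming(mature + "AA", pre))  # two non-templated A's are trimmed
```

Returning every hit lets a caller discard reads that still map to more than one pre-miRNA, which is the ambiguity the no-mismatch rule is meant to control.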
With the application of NGS technology, miRNAs have been reported to exist as isomiRs [9, 11, 12]. As shown in Additional file 1, the isomiRs (red line alignments) are shifted in location relative to their corresponding miRBase reference miRNAs (dark and light blue bars). When sequence reads are mapped back to mature miRNAs, this shift may result in mismatches. Therefore, in addition to the perfect-match constraint, we adopted an alternative procedure with positional tolerance. To exclude random matches, the start positions of the mature miRNA and a mapped read were required to differ by no more than two nucleotides, and their end positions by no more than five nucleotides.
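The positional tolerance can be expressed as a small assignment function. This sketch assumes that reads have already been mapped to a pre-miRNA and that coordinates are 0-based and end-exclusive on that pre-miRNA; the mature-miRNA names and coordinates in the example are hypothetical. The returned signed offsets are in the spirit of the "start,end" offset notation used for isomiRs (e.g. "0,0" for the reference).

```python
def assign_to_mature(read_start, read_end, matures, max_start_diff=2, max_end_diff=5):
    """Assign a read mapped on a pre-miRNA to an annotated mature miRNA.

    matures: dict mapping mature-miRNA name -> (start, end) on the same pre-miRNA.
    Returns (name, start_offset, end_offset), or None when no mature miRNA lies
    within the tolerances (<= 2 nt at the start, <= 5 nt at the end).
    """
    for name, (m_start, m_end) in matures.items():
        start_off = read_start - m_start
        end_off = read_end - m_end
        if abs(start_off) <= max_start_diff and abs(end_off) <= max_end_diff:
            return name, start_off, end_off
    return None


matures = {"demo-5p": (5, 27), "demo-3p": (50, 71)}  # hypothetical annotation
print(assign_to_mature(51, 74, matures))  # ('demo-3p', 1, 3)
```

The asymmetric limits reflect the text: 5' ends are allowed to shift less than 3' ends, where isomiR variation and nucleotide additions are more common.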
In this study, we used stem-loop RT-PCR, as described previously, to validate the 5p arm miRNA of hsa-mir-1307. The RT primer (CTCAACTGGTGTCGTGGAGTCGGCAATTCAGTTGAGagccgg) contains a stem-loop sequence and a 6-nt overhang sequence that confers binding specificity to the mature miRNA. For each RT reaction, 1 μg of total RNA was converted into cDNA (2 nM miRNA-specific stem-loop RT primer, 500 μM dNTP and 0.5 μl SuperScript III; Invitrogen, Carlsbad, CA) under the following conditions: 16 °C for 30 min, followed by 50 cycles of 20 °C for 30 s, 42 °C for 30 s and 50 °C for 1 s. Expression of the miRNA was detected by real-time quantitative PCR (RT-qPCR) using the SYBR Green I protocol (Applied Biosystems, Foster City, CA) with 200 nM miRNA-specific forward primer (CGGCGGtcgaccggacctcgac) and 200 nM universal reverse primer. RT-qPCR was performed as follows: 95 °C for 10 min, followed by 40 cycles of 95 °C for 15 s and 63 °C for 32 s. All values were normalized against U6 RNA.
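The text states only that values were normalized against U6 RNA. One common way to do this, assumed here rather than stated by the authors, is the comparative 2^-ΔΔCt method, which presumes roughly equal, near-100% amplification efficiencies for the target miRNA and U6. The Ct values in the example are hypothetical.

```python
def relative_expression(ct_target_tumor, ct_u6_tumor, ct_target_normal, ct_u6_normal):
    """Tumor-vs-normal fold change of a miRNA by the 2^-ddCt method, with U6 as reference."""
    d_ct_tumor = ct_target_tumor - ct_u6_tumor      # normalize tumor target to U6
    d_ct_normal = ct_target_normal - ct_u6_normal   # normalize normal target to U6
    dd_ct = d_ct_tumor - d_ct_normal
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values: the target crosses threshold one cycle earlier in tumor.
print(relative_expression(25.0, 20.0, 26.0, 20.0))  # 2.0
```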
In this study, we generated sequence reads of small RNAs from normal gastric tissue (G1245N) and gastric tumor tissue (G1245T). In total, 32.1 and 32.4 million sequence reads were initially collected from the G1245N and G1245T libraries, respectively. After adapter trimming, 23.8 and 29.9 million reads remained in the G1245N and G1245T libraries, respectively, for further analysis. After further filtering by length and read count, 19.7 and 26.0 million reads in G1245N and G1245T, respectively, were finally used to quantify miRNA expression levels.
Table: Summary of analysis on miRNA reads. Rows: # detected pre-miRNAs (a); # detected miRNAs (a); # miRNAs at opposite arm (b); # isomiRs with length equal to miRNA.
We arranged the miRNA reads along the mapped pre-miRNAs. As shown in Figure 1, hsa-mir-101-1 encodes mature miRNAs on both arms. The integers in the middle column denote the read count of each isomiR, and the entries in the right column denote the location offsets relative to the reference miRNA annotated in miRBase. Thus, reads labeled "0,0" are identical to the reference miRNAs. The read counts show that the miRBase reference miRNA is not necessarily the most abundant isoform, as also observed in other studies [11, 12, 14]. The mapping results for all pre-miRNAs in the G1245N and G1245T libraries are available in Additional files 2 and 3.
As described in a previous study, additional opposite-arm miRNAs are not necessarily expressed at lower levels than the original ones. Many of the additional opposite-arm miRNAs are expressed at higher levels than the original ones (Additional files 2 and 3), which may conflict with previous nomenclature rules. In total, we detected 24 additional opposite-arm miRNAs exclusively in G1245N, 33 exclusively in G1245T, and 63 in both the G1245N and G1245T libraries. These additional opposite-arm miRNAs were discovered because we mapped the sequence reads back to pre-miRNAs rather than only to mature miRNAs. Our study thus provides a new way to interrogate miRNA and isomiR expression by carefully examining NGS data.
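The paper does not describe how a read is assigned to an arm. The sketch below assumes, for illustration only, that a read belongs to the 5p arm if its midpoint lies in the first half of the pre-miRNA, and that a (pre-miRNA, arm) pair counts as an additional opposite-arm miRNA when miRBase annotates no mature miRNA on that arm. Set operations then yield the library-exclusive and shared counts of the kind reported above (24, 33 and 63). All pre-miRNA names and coordinates in the test are hypothetical.

```python
def arm_of(read_start, read_end, precursor_len):
    """Call the arm of a read from its midpoint (0-based, end-exclusive coordinates)."""
    return "5p" if (read_start + read_end) / 2 < precursor_len / 2 else "3p"


def opposite_arm_mirnas(mapped_reads, annotated_arms):
    """Collect (pre-miRNA, arm) pairs supported by reads but absent from the annotation.

    mapped_reads: iterable of (pre_mirna, start, end, precursor_len)
    annotated_arms: dict pre_mirna -> set of annotated arms, e.g. {"5p"}
    """
    found = set()
    for pre, start, end, pre_len in mapped_reads:
        arm = arm_of(start, end, pre_len)
        if arm not in annotated_arms.get(pre, set()):
            found.add((pre, arm))
    return found


def compare_libraries(normal, tumor):
    """Count opposite-arm miRNAs exclusive to each library and shared by both."""
    return {
        "normal_only": len(normal - tumor),
        "tumor_only": len(tumor - normal),
        "both": len(normal & tumor),
    }
```

A real pipeline would also apply a minimum read count per arm before calling a new miRNA; that threshold is omitted here.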
Owing to 3' end modification [12, 15–18], altered nucleotides at the 3' end of reads may cause mismatches during mapping, so that reads that would otherwise match perfectly fail to map back to miRNAs. We therefore trimmed the terminal 3' mismatch one nucleotide at a time and analyzed the trimmed fragments. As a result, 2,766,852 reads in G1245N and 3,319,704 reads in G1245T were found to have nucleotides added at their 3' ends, accounting for 14.0% and 12.8%, respectively, of the reads used for mapping. Without this alternative mapping procedure, these reads could not have been mapped back to human pre-miRNAs, which demonstrates the effectiveness of the procedure.
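The percentages above, and a distribution of trimmed 3' addition fragments like the one summarized in the following table, can be computed as sketched below. The input format (trimmed fragment paired with its read count) is an assumption for illustration, and the 19.7 and 26.0 million denominators are the rounded read totals given in the text.

```python
from collections import Counter


def addition_fragment_distribution(trimmed):
    """Percentage abundance of each trimmed 3' addition fragment, weighted by read count.

    trimmed: non-empty iterable of (fragment, read_count), e.g. ("U", 120) for
    reads whose last nucleotide "U" had to be trimmed before they mapped.
    """
    totals = Counter()
    for fragment, count in trimmed:
        totals[fragment] += count
    n = sum(totals.values())
    return {frag: 100.0 * c / n for frag, c in totals.most_common()}


def percent_of_mapping_reads(n_with_addition, n_used_for_mapping):
    """Share of the mapping input that needed 3' trimming, in percent."""
    return 100.0 * n_with_addition / n_used_for_mapping


print(round(percent_of_mapping_reads(2_766_852, 19_700_000), 1))  # 14.0 (G1245N)
print(round(percent_of_mapping_reads(3_319_704, 26_000_000), 1))  # 12.8 (G1245T)
```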
Table: Distribution of the 3' end addition fragments in the G1245N and G1245T libraries. Columns: abundance in G1245N (%); abundance in G1245T (%).
Sequence variations at the 3' ends have often been observed in miRNA reads. Previous studies also reported several types of RNA editing that may generate such variations, such as A-to-G transitions catalyzed by adenosine deaminase and C-to-U transitions catalyzed by cytidine deaminase [12, 19]. Because isomiRs exist, it is difficult to distinguish whether nucleotide addition or nucleotide modification produces such variations. As illustrated in Additional file 4, a terminal nucleotide variation (a mismatch) could arise from a C-to-U modification at the terminus of a 22-nucleotide read, without changing the read length. Alternatively, it could arise from the addition of a U to the terminus of a 21-nucleotide read, lengthening the read by one nucleotide. Additional molecular studies would be required to elucidate the specific mechanism involved.
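The ambiguity can be made concrete: a read whose only mismatch against the precursor is its last base admits both explanations, and sequence alone cannot choose between them. The sketch assumes the read is already aligned at a known 0-based start on the precursor; the sequences in the test are hypothetical.

```python
def explain_terminal_mismatch(read, precursor, start):
    """List explanations for a read whose only mismatch to the precursor is its last base.

    read is assumed to be aligned to precursor starting at 0-based position start.
    Returns [] if the read does not fit that pattern.
    """
    n = len(read)
    ref = precursor[start:start + n]
    if n == 0 or len(ref) < n or read[:-1] != ref[:-1] or read[-1] == ref[-1]:
        return []
    return [
        f"{ref[-1]}-to-{read[-1]} modification of a {n}-nt isomiR (length unchanged)",
        f"non-templated {read[-1]} addition to a {n - 1}-nt isomiR (length +1)",
    ]
```

For a 22-nt read ending in U where the precursor carries C, the function returns both the C-to-U modification and the U-addition explanation, matching the two scenarios described above.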
Inconsistent expression ratios of 5p arm miRNA to 3p arm miRNA between tissues
In summary, our results showed that the 5p and 3p preference is not always consistent between biological samples, implying that regulatory mechanisms other than the hydrogen-bonding-based selection rule may control the selection of the 5p or 3p arm. If so, such a mechanism could play important roles in oncogenesis. This is a novel avenue for studying the relationship between miRNAs and cancer, and it merits further investigation.
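A minimal way to flag an arm-preference switch between tissues is to compare the sign of the log2 5p-to-3p ratio in each library. The pseudocount of 1 (to avoid division by zero) and the use of normalized counts are assumptions for illustration; the counts in the test are hypothetical.

```python
import math


def log2_arm_ratio(count_5p, count_3p, pseudocount=1.0):
    """log2 of the 5p-to-3p expression ratio; positive means the 5p arm dominates."""
    return math.log2((count_5p + pseudocount) / (count_3p + pseudocount))


def arm_preference_reversed(normal, tumor, pseudocount=1.0):
    """True when the dominant arm differs between tissues.

    normal, tumor: (5p count, 3p count) for one pre-miRNA in each library.
    """
    r_normal = log2_arm_ratio(*normal, pseudocount=pseudocount)
    r_tumor = log2_arm_ratio(*tumor, pseudocount=pseudocount)
    return r_normal * r_tumor < 0
```

Requiring the ratios to have opposite signs, rather than merely differing in size, restricts the flag to genuine switches of the dominant arm.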
The pre-miRNAs whose arm selection preferences are not consistent with miRBase annotation
Among the pre-miRNAs annotated in miRBase as encoding miRNAs on both arms, the annotated major arms of hsa-mir-374a, hsa-mir-500a, hsa-mir-625 and hsa-mir-136 are their 5p arms, whereas those of hsa-mir-664, hsa-mir-144, hsa-mir-493 and hsa-mir-376a-1 are their 3p arms. In our NGS expression data, however, the expression levels of their major and minor arms were reversed, contrary to the miRBase annotation. hsa-mir-374a and hsa-mir-144 are two extreme cases, in which the miRBase-annotated minor arms are expressed about 17-fold and 28-fold higher, respectively, than the annotated major arms. In summary, our results demonstrate that arm selection preference can vary. To resolve this discrepancy, more NGS expression data should be included and the phenomenon studied further.
Arm selection preferences of 5p arm and 3p arm miRNAs switch between gastric normal and gastric tumor tissue
Although derived from the same gene locus and transcribed by the same transcription factors, the 5p arm and 3p arm miRNAs may have reversed expression preferences. It is possible that such an observation arises from NGS platform-dependent biases. However, it is also possible that an unknown selection mechanism acting during maturation controls arm selection preference differently in normal and tumor tissue. Our data suggest that this hypothesis is plausible and merits further examination.
A previous report showed that different isomiR types may contribute to different regulation in different tissues. Here, we asked whether isomiR distribution patterns differ between gastric normal and gastric tumor tissue, that is, whether normal tissue prefers some specific isomiR types and tumor tissue prefers others. To address this question, we applied the Kolmogorov-Smirnov (KS) test to detect significant differences in isomiR distribution patterns. The KS test determines whether two datasets differ significantly, under the null hypothesis that the samples are drawn from the same distribution. Although the KS test has the advantage of making no assumption about the distribution of the data, it is most sensitive to differences near the median. In addition, when a miRNA is expressed at very low levels, only a few isomiR types are observed, which biases its isomiR distribution pattern. Therefore, we compared only the 169 miRNAs with more than eight isomiR types and more than 1,000 reads in both normal and tumor tissue.
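The paper does not specify how isomiR types were ordered or how reads were turned into KS samples. The sketch below assumes each read is one observation of its isomiR type, types are ordered by (start offset, end offset), and the p-value comes from the asymptotic Kolmogorov distribution. Because isomiR types are discrete, many observations are tied, and the asymptotic p-value is then conservative. A library routine such as SciPy's `scipy.stats.ks_2samp` could replace the hand-written statistic.

```python
import math


def kolmogorov_q(lam, terms=100):
    """Asymptotic Kolmogorov survival function Q(lam) = 2 * sum (-1)^(j-1) exp(-2 j^2 lam^2)."""
    if lam < 0.2:  # the series converges slowly here, and Q is ~1 anyway
        return 1.0
    s = sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam) for j in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))


def ks_isomir_test(normal, tumor):
    """Two-sample KS statistic D and asymptotic p-value for two isomiR distributions.

    normal, tumor: dict mapping isomiR type (start_offset, end_offset) -> read count.
    """
    types = sorted(set(normal) | set(tumor))
    n1, n2 = sum(normal.values()), sum(tumor.values())
    cdf1 = cdf2 = d = 0.0
    for t in types:
        cdf1 += normal.get(t, 0) / n1
        cdf2 += tumor.get(t, 0) / n2
        d = max(d, abs(cdf1 - cdf2))  # largest gap between the two empirical CDFs
    n_eff = n1 * n2 / (n1 + n2)
    return d, kolmogorov_q(math.sqrt(n_eff) * d)


def eligible(normal, tumor, min_types=8, min_reads=1000):
    """Selection rule from the text: > 8 isomiR types and > 1000 reads in both tissues."""
    return all(len(c) > min_types and sum(c.values()) > min_reads for c in (normal, tumor))
```

Applying `eligible` before `ks_isomir_test` mirrors the filtering step that reduced the comparison to 169 miRNAs.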
In the previous two cases, the expression levels in the two tissues are almost equal. In Figure 4c, the expression level of hsa-miR-21 differs between tissues, with a fold change (fc) of 2.4051. In this case, the isomiR distribution patterns differ: the isomiR peaks shift, and both the most abundant and the second most abundant isomiR types differ between tissues. Thus, whether the fc value is small or large, isomiR distribution patterns can differ between tissues. Conversely, in Figure 4d, although the fc of hsa-miR-30b reaches 12.97, its isomiR distribution patterns do not differ between tissues.
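The two quantities compared in this paragraph, expression fold change and the identity of the top isomiR types, can be computed independently, which is why they can disagree. In this sketch, fold change is assumed to be the ratio of the larger to the smaller normalized expression value (both must be positive), and the isomiR counts in the test are hypothetical.

```python
def fold_change(expr_a, expr_b):
    """Fold change as the ratio of the larger to the smaller normalized expression (> 0)."""
    hi, lo = max(expr_a, expr_b), min(expr_a, expr_b)
    return hi / lo


def top_two(isomir_counts):
    """The most and second most abundant isomiR types, by read count."""
    ranked = sorted(isomir_counts, key=isomir_counts.get, reverse=True)
    return ranked[:2]


def top_types_differ(normal, tumor):
    """True when the ranked top two isomiR types are not the same in both tissues."""
    return top_two(normal) != top_two(tumor)
```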
Although isomiRs largely overlap with each other, they can differ at the 5' end, which alters the seed region. Complementary binding between a miRNA and its target mRNAs depends mainly on pairing within the seed region. Therefore, 5' end differences between isomiRs are expected to alter their target genes. As a result, different isomiRs from the same miRNA could have different target genes involved in different activities or pathways. Hence, it is plausible that different isomiRs contribute to different regulatory pathways in different tissues, and this type of isomiR regulation could have significant biological consequences.
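The argument that 5' variation, but not 3' variation, changes the seed can be checked directly, taking the seed as the canonical nucleotides 2 to 8 from the 5' end. The miR-21-5p sequence in the test is used only as an illustrative input.

```python
def seed(mirna):
    """Canonical seed region: nucleotides 2-8 counted from the 5' end."""
    return mirna[1:8]


def same_seed(isomir_a, isomir_b):
    """True when two isomiRs share a seed and so would be predicted to share seed-matched targets."""
    return seed(isomir_a) == seed(isomir_b)
```

Trimming or extending the 3' end leaves the seed untouched, whereas a single-nucleotide shift at the 5' end moves every seed position, so seed-based target prediction would treat the two isomiRs as different miRNAs.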
In this study, we applied NGS data to quantify miRNA expression profiles in gastric normal and gastric tumor tissue. Our data showed that, although derived from the same pre-miRNA, 5p arm and 3p arm miRNAs can have reversed expression preferences, implying that other regulatory mechanisms may control 5p or 3p selection. Moreover, although derived from the same mature miRNA, isomiRs can have different expression preferences, some preferring normal tissue and others preferring tumor tissue. Although we examined only one pair of normal and tumor tissues, our results open a novel research topic in the study of miRNA regulation. Examining more tissue samples will allow more robust conclusions and deeper insight into miRNA regulation.
NGS, isomiR, gastric tumor, arm selection.
We thank Yourgene Bioscience http://www.yourgene.com.tw for providing the sequencing service. This work was supported by grants from Academia Sinica and the National Science Council of Taiwan.
This article has been published as part of BMC Genomics Volume 13 Supplement 1, 2012: Selected articles from the Tenth Asia Pacific Bioinformatics Conference (APBC 2012). The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2164/13?issue=S1.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.