Revealing editing and SNPs of microRNAs in colon tissues by analyzing high-throughput sequencing profiles of small RNAs

Background Editing and mutations in microRNAs (miRNAs) can change the stability of pre-miRNAs and/or complementarities between miRNAs and their targets. Small RNA (sRNA) high-throughput sequencing (HTS) profiles contain miRNAs that are originated from mutated DNAs or are edited during their biogenesis procedures. It is largely unknown whether miRNAs are edited in colon tissues since existing studies mainly focused their attention on the editing of miRNAs in brain tissues. Results Through comprehensive analysis of four high-throughput sequencing profiles of normal and cancerous colon tissues, we identified 548 editing and/or SNPs in miRNAs that are significant in at least one of the sequencing profiles used. Our results show that the most abundant editing events of miRNAs in colon tissues are 3'-A and 3'-U. In addition to four known A-to-I editing sites previously reported in brain tissues, four novel A-to-I editing sites are also identified in colon tissues. Conclusions This suggests that A-to-I editing of miRNAs potentially is a commonly existing mechanism in different tissues to diversify the possible functional roles of miRNAs, but only a small portion of different miRNAs are edited by the A-to-I mechanism at a significant level. Our results suggest that there are other types of editing in miRNAs through unknown mechanisms. Furthermore, several SNPs in miRNAs are also identified.

Single Nucleotide Polymorphism (SNP) is a common type of DNA sequence variation throughout the human genome. Similar to editing in miRNAs, SNPs in miRNA genes can affect the function of them by modulating the transcription of the primary transcripts, processing of pri-miRNAs and pre-miRNAs, maturation, or miRNA:target interactions [21,22]. Consequently, the SNPs in miRNAs lead to various diseases, such as chronic lymphocytic leukemia [23], papillary thyroid carcinoma [24], progressive hearing loss [25], and breast cancer [26,27]. Several studies focused on the identification of SNPs in human miRNAs [28][29][30][31][32][33][34] have found thousands of SNPs in the pre-miRNAs.
With the fast growth of high throughput sequencing (HTS) technologies, the whole transcriptomes of small RNAs (sRNAs) have become easily available. The huge number of reads from the HTS profiles of sRNAs contain miRNAs that are different from their DNA template, which may caused by either mutations in DNAs or editing on RNAs. Some studies have started to explore the miRNA editing with HTS profiles of sRNAs [5][6][7][8][10][11][12][13]18] However, the existing methods mainly focused discovery of editing sites, especially A-to-I editing, in brain tissues. The editing of miRNAs in other tissues are still awaiting more researches.
Here, we performed a comprehensive analysis for four sRNA HTS profiles of human colon normal and cancerous samples. Our results show that four editing sites previously identified in brain tissues are also identified in colon tissues. Furthermore, we identified several novel miRNA editing sites, including 4 canonical A-to-I editing sites and several editing sites of other types caused by unknown mechanisms. In addition to these editing of miRNAs, 4 SNPs in miRNAs were also identified in our analysis.

Data sets used
We used four sRNA HTS profiles of cancer and adjacent normal tissues of two stage III colorectal cancer (CRC) patients downloaded from the NCBI Gene Expression Omnibus http://www.ncbi.nlm.nih.gov/geo under series accession number GSE43550 (Shi, Zheng, Ren, Han and Wang, unpublished). One patient underwent surgical resection of CRC tumor directly and the other patient underwent Transcatheter Arterial Infusion chemotherapy (TAI) 1 week before surgical resection. Then, the tumor and adjacent normal samples were used to perform sRNA HTS sequencing using Illumina HiSeq2000 sequencer by following the corresponding protocols. The sequencing profiles of the patient without TAI are named as N_no_TAI (adjacent normal tissue) and T_no_TAI (cancerous tissue), respectively, and the profiles of the patient with TAI are named as N_with_TAI (adjacent normal tissue) and T_with_TAI (cancerous tissue), respectively.
The genome sequences of human were downloaded from UCSC Genome Browser http://genome.ucsc.edu/. The sequences and genomic coordinates of miRNA precursors and mature miRNAs were downloaded from the miRBase release 19 http://www.mirbase.org.

Preprocessing of small RNA HTS sequencing profiles
The original reads with a low-scored 3' tails were removed firstly. This step is to make sure the clear 3' ends of reads. Then, a self-written program was used to remove the 3' adapters. Then, the unique reads with at least 18 nucleotides and their counts were obtained with self-written programs. The four processed profiles consisted of 15 to 25 million reads, representing 374 to 789 thousand unique sequences.

Main computational steps
The first step is to align unique reads in sRNA profiles to the pre-miRNAs with NCBI BLASTN. The second step is to obtain the reads mapped to pre-miRNAs. Next, these mapped reads are aligned to the genome with Bowtie [35]. Then, the result of Bowtie is further process to add frequencies of reads and to remove reads with more than 100 aligned loci in the genome. Then, the processed result of Bowtie was analyzed for the possible cross mapping issues of reads with multiple loci in the genome by using the method introduced by de Hoon et al., [6].
Next, we analyzed the mutation and editing (M/E) sites in pre-miRNAs by using several inputs, including the sequences and secondary structure of pre-miRNAs (predicted by RNAfold [36]), the alignments of reads to pre-miRNAs generated by BLASTN, the reads mapped to pre-miRNAs, the alignments of reads against genome generated by Bowtie, and the results of the cross mapping method [6]. In this step, we employed the alignment of BLASTN to obtain the list of read mapped to a pre-miRNA. We then used a modified Smith-Waterman algorithm to align an sRNA read to a pre-miRNA sequence. Briefly, matched and mismatched nucleotides received rewards of +4 and -3, respectively, in alignment. The affine gap penalty, i.e., the penalty increasing linearly with the length of gap after the initial gap opening penalty, was used for gap opening (-4) and gap extension (-2). The weight of reads were retrieved from the results of the cross mapping method [6]. Reads with more than 100 loci were removed before applying the cross mapping method [6]. Thus, we used the initial alignment of reads to the genome generated with Bowtie to examine whether a read had matched loci in genome if it did not appeared in the results of the cross mapping method, or named as missed reads. If a missed read had loci with better alignment scores than its alignment between a pre-miRNA or more than 100 loci of the same alignment scores as its alignment between a pre-miRNA, it was neglected. Otherwise, a missed read was evenly divided to different mapped loci. Then, the results were compared to reported SNPs [34] and editing sites in the DARNED database [37] in miRNAs.
The targets of hsa-miR-6503-3p and edited hsa-miR-6503-3p were predicted with the HitSensor algorithm [38]. Predicted targets with at least 7 continuous Watson-Crick matches in the seed regions were maintained in the analysis.

P-values of identified mutation and editing sites
The quality of identified M/E sites were evaluated using Equation 1 to excluding the probability of being random sequencing errors.
Where p o is the observed percentage of mutated and/or edited reads, p e is the expected error rate, and n is the number of reads matched to the position of pre-miRNA. Since Z follows a standard normal distribution, P-values of the identified editing or mutation sites can be calculated. p e is related to the score of sequenced nucleotides. For example, a phred score of 20 will lead to an expected p e of 1%. Because there could be many mutation and editing sites, the obtained P-values were corrected with the Benjamini-Hochberg correction method [39].

Criteria used to identify significant mutation and editing sites
Chiang et al., [5] proposed three criteria to evaluate the editing sites of miRNAs, i.e., if (i) the relative level of editing is at least 5%; (ii) at least 10 reads support the editing site; and (iii) the editing site is not in the last two positions at the 3' end of mature miRNAs, then the miRNA position was considered to be edited. In addition to the first two criteria of Chiang et al., [5], two more criteria is used in this study, i.e., (i) the score threshold of sequencing reads is 20; and (ii) a multiple-test corrected P-value of smaller than 0.05. The third criterion of Chiang et al., [5] was not used here because our aim is to identify all editing and mutation sites, including 3' editing sites, in miRNAs.

The naming of the editing and mutations in miRNAs
All identified M/E sites are named by the names of the pre-miRNAs, positions of the sites, the nucleotides from the reference pre-miRNA sequences and the edited/ mutated nucleotide at the sites. For example, hsa-mir-376a-1_49_A_g is used to mean an A-to-I editing detected at the position 49 of the hsa-mir-376a-1 precursor, the position of the reference sequence is "A" and the edited reads have "g" at this site.

Results and discussion
The identified mutation and editing sites We totally identified 548 M/E sites of pre-miRNAs that are significant in at least one of the four sequencing profiles used (in additional file 1 Table S1), with around 420 M/E sites in each of the four profiles (see Figure 1). As shown in Figure 1A, 312 M/E sites were commonly identified in all four libraries, suggesting a good repeatability of the identified M/E sites. Based on the mechanisms of editing or mutation, we classified these significant M/E sites into eight categories as shown in Figure 1B. In addition to editing types with clearly defined mechanisms, i.e., 3'-A, 3'-U, A-to-I and Cto-U, and SNPs, a few other editing sites happened at 3' end of mature miRNAs (classified as 3'-Other in Figure  1B) and other positions of miRNAs (classified as Other) were also identified. We also found that a few editing sites were incorrectly predicted by the reads that were mapped to several loci in the genome. In our analysis, reads with multiple loci might have very small weights, less than 0.05, on loci that were unlikely to generate these reads after applying the cross-mapping correction method [11]. However, the high abundances of these multiple loci reads could still make the editing site become significant because the numbers of the true reads mapped to these loci were relatively small when compared to these incorrectly mapped reads with multiple loci. Thus, these sites were classified as pseudo sites (as Pseudo. in Figure 1B).

Identification of 3' editing events
The most abundant editing events are 3'-U and 3'-A in all libraries ( Figure 1B). In contrast to the reported predominant 3'-U in mature miRNAs from 3' arm of their hairpins [7], our results suggest that a substantial number, 65, of mature miRNAs in the 5' arm also have 5'-U addition (additional file 2 Table S2). Furthermore, 81 mature miR-NAs on the 5' arm of their hairpins have 3'-A (additional file 2 Table S2), such as most of the let-7 family members, suggesting that 3' end editing of some miRNAs might happen after the miRNA:miRNA* duplex is formed but before being loaded into the RISC, which is also noticed in literature [7]. Our results of 3'-A on the 5' arm of let-7 family members are also consistent with results in another study [13]. Some miRNAs have 3'-U and 3'-A addition on both the 5' and 3' mature miRNAs (additional file 2 Table S2 and additional file 3 Figure S1). For example, hsa-mir-143 has 3'-U and 3'-A at end of both miR-143-5p and miR-143-3p (see additional file 1 Table S1). The 3' editing on mature miRNAs of the 3' arm of their pre-miRNAs might be added after the pre-miRNA hairpins are formed but before being cut to miRNA:miRNA* duplex by Dicer [7].
Our results show that same miRNAs may have 3'-U and In summary, these results suggest that the enzymes that contributes to 3'-A (PAPD4 [7]) and 3'-U (TUT7, TUT4 and PAPD4/TUT2 [13]) may share redundant roles in adding U or A to the end of miRNAs. In fact, the same enzyme may catalyze the addition of both uridine and adenine. As reported previously, PAPD4, also known as TUT2, could introduce both 3'-A [7] and 3'-U editing [13] to miRNAs.
In addition to the 3'-U and 3'-A, our results suggest that some miRNAs also have 3'-G and 3'-C editing. For examples, there are 3'-G addition immediately after hsa-miR-145-3p and hsa-miR-10a-5p (additional file 4 Figure S2A, C and S2B,F). In the mean time, these positions also have 3'-A editing (additional file 4 Figure S2D and G). Further, there are 3'-C at the same position after hsa-miR-145-3p (additional file 4 Figure S2E). Furthermore, hsa-miR-194-2 and hsa-miR-21 also have 3'-C after their 3' mature miR-NAs (additional file 5 Figure S3). As reported previously [7,9], 3'-G and 3'-C also happens in some miRNAs in other tissues or cell lines, however their biological relevance is still unclear.
As previously noticed in other tissues, cell lines, and species [7], our results show that the 3'-U and 3'-A addition of miRNAs are also present in colon cancerous and adjacent normal tissues.

A-to-I editing of miRNAs in colon tissues
As shown in Table 1, we totally identified 14 editing sites in pre-miRNAs that satisfy a combination of criteria, i.e., at least ten edited reads, an edited reads percentage of at least 5%, and a multiple-test corrected P-value of less than 0.05, in at least one of the four sequencing profiles used. 8 out of the 14 editing sites are A-to-I editing, representing the largest category. As shown in Table 1, four of these 8 A-to-I sites had been reported previously in brain tissues [3,4].
Four newly identified A-to-I editing sites are shown in Figure 2. hsa-let-7a-a_28_A_g in Figure 2A is close to 3' end of mature let-7a-2. But as shown in Figure 2E, 84 reads with 26 to 29 nucleotides at this locus clearly demonstrate that this is unlikely to be an 3' addition site since there are as many as 5 nucleotides after this A-to-I site. As shown in Figure 2B to 2D, these three editing sites all happen in the regions of mature miRNAs. As reported previously, UAG is a preferred motif for A-to-I editing [3,4,11]. The newly identified A-to-I editing sites also have a preference for UAG motif. Three newly identified sites have a local motif of UAG, as shown in Figure 2E to 2G, and hsa-miR-378b_53_A_g has a local motif of AAG, which is also frequently modified as noticed previously [4].
Existing studies of miRNA editing mainly focused their attention on the A-to-I sites in brain tissues of mammals [5,11,12]. Here our results show that there are some editing sites, including a few A-to-I sites, in colon cancer and corresponding adjacent normal tissues. This suggests that miRNA editing is potentially a widely used mechanism to realize more diverse roles of miRNAs in different tissues and organs. However, only a limited number of miRNAs are edited by the A-to-I mechanism which is similar to previous results in brain [11]. Our results indicate that four known A-to-I editing sites on hsa-miR-376a-1, hsamiR-376a-2, hsa-miR-376c and hsa-miR-411 are also edited in normal and/or cancerous colon tissues, although existing studies mainly reported these sites in brain tissues [3,4,11].

Other types of miRNA editing in colon tissues
As shown in Table 1, we detected six non-canonical Ato-I editing sites in our data sets. Four of them are shown in Figure 3 hsa-miR-378c_30_-_g in Figure 3A is an insertion of "g" to the mature hsa-miR-378c. This Table 1 The identified editing sites in miRNAs in the sequenced stage III CRC samples. P.P. and S.P. means the position in pre-miRNA and seed region, respectively. W and ME means the original nucleotide, as in the miRBase, and edited/mutated nucleotide, respectively. N_no_TAI, T_no_TAI, N_with_TAI (adjacent normal tissue) and T_with_TAI means the sequencing profiles of adjacent normal tissue without TAI, cancer tissue without TAI, adjacent normal tissue with TAI and cancerous tissue with TAI, respectively. MER, MEP, and FDR_P means the number of edited reads, the percentage of edited reads, and the FDR-corrected P-value using method of [39], respectively. (E) to (H) are the reads supporting hsa-miR-378b_53_A_g, hsa-miR-497_83_A_g, hsa-miR-6503_59_A_g and hsa-let-7a-a_28_A_g, respectively. In part (E) to (H), the lower case nucleotides in the reads mean the mismatched nucleotides that are generated in editing or mutation events; and the upper case nucleotides in the precursors mean the mature miRNAs. The three columns after a read are the original number of this read in the library, the length of the read, and the weight of this read at this locus calculated by the cross mapping algorithm [11]. The percentage values after the name of editing site are the percentage of edited or mutated reads. The lines below the pre-miRNA sequences are the secondary structures predicted with RNAfold [36], and numbers in the parent thesis mean the minimum free energy in k-cal/mol. In all parts, the names of the sequencing profiles are shown in the parenthesis after the name of the editing sites.
editing contributes to 21.8% to 32.5% of the sequenced reads of hsa-miR-378c and is significant in all four sequencing profiles used ( Table 1), suggesting that a few miRNAs may have insertions after being transcribed. However, the mechanism of this editing still needs further researches. C-to-U editing is detected in hsa-miR-125b-1 and hsa-miR-125b-2 in all four sequencing libraries analyzed (Table 1 and Figure 3B). Previous studies also found several miRNAs, such as miR-379, miR-140*, and miR376a, may have C-to-U editing, probably occurring only in preor mature miRNAs [12].
Two U-to-G editing are detected at hsa-miR-141 and hsa-miR-429 ( Figure 3C and 3D). And hsa-miR-375 has an G-to-C editing at position 56 of its precursor (Table  1). Similar to the insertion at hsa-miR-378c, the mechanisms of these several editing sites are still unclear and need more studies. Another possible reason for the three non-canonical A-to-I editing in Figure 3F, G and 3H are SNPs. However the low percentages of M/E reads from these loci suggest that these sites are unlikely to be inherited SNPs, and at most they could be somatic SNPs.

Identification of SNPs in miRNAs in the samples used
We identifies four significant SNP sites as shown in Figure 4. hsa-miR-1304_65_C_a, corresponding to SNP rs2155248, universally appears in all four samples with 100% mutated/edited reads percentage ( Figure 4A and 4E).
hsa-miR-196a-2_78_C_u, corresponding to an SNP rs11614913, is only significant in the T_no_TAI sample ( Figure 4B and 4F). 76.9% reads covering this SNP are mutated in this sample ( Figure 4F). Previous study showed that rs11614913 in hsa-miR-196a-2 reduced the expression level of miR-196a, and was significantly associated with decreased breast cancer risk [26]. Other researches also reported that the high levels of miR-196a may promote oncogenic phenotype of colorectal cancer cells [40] and non-small cell lung cancer cell [41]. The appearance of mutated hsa-miR-196a-2 in the T_no_TAI sample does not agree with the oncogenic roles of hsa-miR-196a-2, suggesting that the tumor genesis of this patient may caused by other biological pathways.
hsa-miR-629 31 G c is caused by an SNP rs78212770 ( Figure 4C and 4G) and does not shown in two samples N_no_TAI and T_no_TAI. 31.0% and 21.1% reads in the N_with_TAI and T_with_TAI samples from this locus are mutated (see Figure 4G), suggesting it is a somatic mutation in this patient.
hsa-miR-3662a_62_G_a caused by an SNP rs66683138 appears in 100% reads from this locus in the N_with_-TAI sample ( Figure 4D and 4H). Other samples also have a few reads that support the existence of this SNP in those samples.
In addition to the four SNPs in Figure 4, our results also suggest that hsa-miR-3676_62_G_a is a potential SNP since it happens with 100% reads from this locus in all four libraries (see additional file 1 Table S1 and additional file 6 Figure S4). hsa-miR-3676 is annotated as a tRNA since release 20 of the miRBase. Thus, this result indicates that small RNA sequencing profiles potentially could be used to find SNPs in tRNAs as well. Figure 3 The details of four non-A-to-I identified editing sites in miRNAs. (A) to (D) are the schematic views of hsa-miR-3653, hsa-miR-194-2, hsa-miR-378c and hsa-miR-21, respectively. (E) to (H) are the reads supporting hsa-miR-3653_42_C_u, hsa-miR-194-2_37_A_c, hsa-miR-378c_30_-_g, and hsa-miR-21_32_G_c, respectively. Legend idem to those of Figure 2.
The targets of edited hsa-miR-6503-3p As shown in Table 1 and Figure 2C and 2G, there is an A-to-G editing site in the seed region of hsa-miR-6503-3p which will potentially change the complementarities between this miRNA and its targets. Thus, we predicted the potential targets for both original hsa-miR-6503-3p and edited hsa-miR-6503-3p. As shown in Figure 5, hsa-miR-6503-3p has 717 targets but the edited hsa-miR-6503-3p only has 293 targets, and original and edited hsa-miR-6503-3p only shared 4 targets. These results suggest that the A-to-I editing at position 7 of the seed of hsa-miR-6503-3p severely changes the complementarities between this miRNA and its targets, and changes the function of hsa-miR-6503-3p.

Conclusion
Our results suggest that some miRNAs are edited in colon tissues after analyzing four colon normal or cancerous small RNA HTS sequencing profiles. 3'-A and 3'-U are the two most common editing events in the examined tissues. Four novel A-to-I editing sites on miRNAs, as well as another four A-to-I editing sites previously reported in brain tissues, are characterized in colon normal and/or cancerous tissues. Several editing sites of other types are also identified, however their mechanisms need more researches. Our results identify four SNPs that universally appeared with M/E read percentages of 100% in all of the four samples used or with less than 100% M/E reads percentages in some of the samples used, suggesting inherited or somatic mutations respectively.

Additional material
Additional File 1: Table S1 -The mutation and editing sites of miRNAs identified in colon tissues. The meanings of columns are given in the second sheet.
Additional File 2: Table S2 -miRNAs with 3'-A and 3'-U sites that are significant in all of the four libraries used. The miRNA with * and # means sites that were also reported previously in four and three cell lines, respectively, by Burroughs et al., [7].
Additional File 3: Figure S1 -The number of pre-miRNAs with 3'-A and 3'-U sites in colon tissues. 3'-A, 3'-A CL, 3'-U and 3'-U CL means the number of pre-miRNAs that have 3'-A editing on the mature miRNA of 3' arm, 3'-A editing on the mature miRNA of 5' arm, 3'-U editing on the mature miRNA of 3' arm, and 3'-U on the 5' arm of their hairpin structures.