Prediction of constitutive A-to-I editing sites from human transcriptomes in the absence of genomic sequences
© 2013 Zhu et al.; licensee BioMed Central Ltd. 2013
Received: 26 December 2012
Accepted: 21 March 2013
Published: 27 March 2013
Adenosine-to-inosine (A-to-I) RNA editing is recognized as a cellular mechanism for generating both RNA and protein diversity. Inosine base pairs with cytidine during reverse transcription and therefore appears as guanosine during sequencing of cDNA. Current approaches of RNA editing identification largely depend on the comparison between transcriptomes and genomic DNA (gDNA) sequencing datasets from the same individuals, and it has been challenging to identify editing candidates from transcriptomes in the absence of gDNA information.
We have developed a new strategy to accurately predict constitutive RNA editing sites from publicly available human RNA-seq datasets in the absence of relevant genomic sequences. Our approach establishes new parameters to increase the ability to map mismatches and to minimize sequencing/mapping errors and unreported genome variations. We identified 695 novel constitutive A-to-I editing sites that appear in clusters (named “editing boxes”) in multiple samples and which exhibit spatial and dynamic regulation across human tissues. Some of these editing boxes are enriched in non-repetitive regions lacking inverted repeat structures and contain an extremely high conversion frequency of As to Is. We validated a number of editing boxes in multiple human cell lines and confirmed that ADAR1 is responsible for the observed promiscuous editing events in non-repetitive regions, further expanding our knowledge of the catalytic substrate of A-to-I RNA editing by ADAR enzymes.
The approach we present here provides a novel way of identifying A-to-I RNA editing events by analyzing only RNA-seq datasets. This method has allowed us to gain new insights into RNA editing and should also aid in the identification of more constitutive A-to-I editing sites from additional transcriptomes.
Keywords: RNA-seq; RNA editing; Potential SNP score; Constitutive editing; Editing box
RNA editing is a post-transcriptional modification process which not only expands the number of functions encoded by our genomes but also provides additional mechanisms of gene regulation. The predominant form of such editing in higher eukaryotes is adenosine-to-inosine (A-to-I) RNA editing, which is catalyzed by members of the ADAR enzyme family (adenosine deaminases that act on RNA) [1, 2]. The resulting inosines preferentially base pair with cytidines (C) and are therefore functionally equivalent to guanosines (G), although there is evidence that inosine can also pair with guanosine. Thus, A-to-I editing can have profound effects on downstream RNA processing and function, including recoding of open reading frames, altering the pattern of alternative splicing, interfering with microRNA function, modulating RNAi activity, and playing other roles in gene regulation [1, 2].
The pattern of A-to-I RNA editing, either site-specific or promiscuous, is likely to determine the fate of an edited RNA molecule. The majority of A-to-I editing in the human transcriptome is located within inverted-repeated Alu elements (IRAlus) positioned within introns and UTRs as revealed by the systematic comparison of cDNA or EST libraries to genomic sequences [4–7], and by genome-wide profiling of transcriptomes and genomic DNAs from the same individuals [8–10]. RNAs with extensively edited IRAlus within their 3′UTRs are retained in nuclear paraspeckles [11–13], although this retention is not always complete [12, 14]. Compared to promiscuous A-to-I RNA editing in repetitive elements, site-specific editing in coding regions provides a rich source of genetic recoding that can influence protein function. The best-characterized editing sites in mammals occur in codons of pre-mRNAs encoding glutamate receptor B (GluR-B) and serotonin receptor 2C (5-HT2CR) [15, 16]. In addition, site-specific A-to-I RNA editing outside coding sequences has been shown to interfere with miRNA pathways by affecting microprocessor or Dicer cleavage, RISC loading and mature miRNA function [17–22]. Thus, it is becoming increasingly apparent that A-to-I RNA editing plays important roles in regulating gene expression and product function.
Inosine base pairs with cytidine during reverse transcription and therefore appears as G during sequencing of cDNA. Thus, A-to-I editing sites can be inferred by the presence of G at a given position in a cDNA sequence but only A in the corresponding genomic position [1, 2]. Most recently, the application of next-generation sequencing to cDNAs (RNA-seq) and genomic DNAs from the same human individual followed by extensive computational analyses revealed an additional large number of editing sites in both Alu and non-Alu elements [8–10]. Thus, the emergence of new technologies and approaches has enabled the identification of a growing list of editing sites.
Transcriptome and genomic DNA sequencing datasets are not always available for single individuals. However, RNA-seq data are widely available through public datasets and thus represent a relatively rich source of yet unexplored RNA editing sites. Two features currently limit the application of RNA-seq data to identify A-to-I RNA editing without the relevant genomic information. On one hand, nucleotide mismatches reduce the ability to uniquely align RNA-seq reads to the genome, and therefore reduce the capability to retrieve nucleotide variants. On the other hand, true editing events are often hidden in background noise caused by sequencing errors, mapping errors and genome variations, including genomic single nucleotide polymorphisms (SNPs) and somatic mutations. Thus, it has been challenging to accurately identify editing candidates from transcriptomes in the absence of gDNA information.
To overcome the aforementioned issues, we have developed a new pipeline to accurately predict editing sites from 18 human RNA-seq datasets, even without knowledge of relevant genomic sequences from which the RNA-seq data were derived. We identified 2,245 constitutive A-to-I editing sites that occur in clusters (named “editing boxes”). Some of these are enriched in non-repetitive elements and exhibit an extremely high A-to-I conversion frequency. Importantly, editing sites located in non-repetitive editing boxes were validated in multiple human cell lines using conventional PCR and Sanger sequencing and were proven to be catalyzed by ADAR1. Finally, distinct editing ratios of RNA sites in editing boxes from different tissues/cell lines clearly suggest a spatial and dynamic regulation of A-to-I RNA editing across human tissues.
A computational flow to predict clustered A-to-I editing sites from transcriptomes only
STEP 2: a series of stringent cutoffs to reduce sequencing/mapping errors and to remove known genomic SNPs. As different samples vary in genome coverage and sequencing depth, we used the HPB value (Additional file 4) to normalize the expression level for each transcribed site across samples, and selected a relatively higher cutoff at HPB > 5 for a given site, comparable to RPKM/FPKM > 5 for a gene, to call potential editing candidates in highly expressed sites. In our calculation, 5 HPB represented 8 ~ 19 raw hits for each base in different transcriptomes (Additional file 5). The relatively high HPB in our analysis allowed us not only to locate the position of an editing site, but also to accurately calculate the editing ratio of each site.
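The HPB normalization described above can be illustrated with a minimal sketch. The exact definition is given in Additional file 4, which is not reproduced here, so the formula below (raw hits at a base, scaled to one billion mapped bases) is an assumption based on the name of the metric, and the function names are hypothetical.

```python
def hits_per_billion(raw_hits: int, total_mapped_bases: int) -> float:
    """Normalize per-base coverage to hits per billion mapped bases.

    Assumed formula: raw_hits * 1e9 / total_mapped_bases.
    """
    if total_mapped_bases <= 0:
        raise ValueError("total_mapped_bases must be positive")
    return raw_hits * 1e9 / total_mapped_bases


def min_raw_hits_for_hpb(hpb_cutoff: float, total_mapped_bases: int) -> float:
    """Raw hits needed at one base to reach a given HPB in one sample."""
    if total_mapped_bases <= 0:
        raise ValueError("total_mapped_bases must be positive")
    return hpb_cutoff * total_mapped_bases / 1e9


# Hypothetical sample sizes: under the assumed formula, samples with
# 1.6 and 3.8 billion mapped bases need 8 and 19 raw hits to reach HPB = 5.
low = min_raw_hits_for_hpb(5, 1_600_000_000)
high = min_raw_hits_for_hpb(5, 3_800_000_000)
```

Under this assumption, the reported range of 8 to 19 raw hits per base at 5 HPB would correspond to samples with roughly 1.6 to 3.8 billion mapped bases.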
Table 1. Characterization of editing prediction pipeline
(A) Labels: # of all mismatches in Alu; # of A-to-I in Alu; A-to-I ratio in all mismatches; STEP 2: w/o PSS cutoff; STEP 3: PSS cutoff; STEP 4: in editing box.
(B) Labels: Non-Alu repetitive; Editing boxes (sites); Ave. length (nt); Ave. conversion rate of As to Is.
(C) Labels: in IRAlus; within 1 kb to IRAlus; > 1 kb to IRAlus; A-to-I editing boxes (sites).
Since it is known that A-to-I editing sites are enriched in Alu elements, we calculated the enrichment of A-to-I conversion in Alu elements after each step of our computational flow. As shown in Table 1A, ~60% of mismatches in Alu elements were A-to-G/T-to-C conversions after STEP 2, compared to ~24% before STEP 2 (data not shown). Furthermore, ~83% of mismatches in Alu elements were A-to-G/T-to-C conversions after the PSS cutoff, indicating that PSS could greatly improve the identification of true editing sites. Finally, 100% of mismatches identified in Alu elements were A-to-Is after the cluster filtering, while several regions clustered with other types of nucleotide conversions failed to be validated with Sanger sequencing (Table 1B and data not shown).
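As a small illustration of the enrichment measure used here, the sketch below computes the fraction of mismatches that are A-to-G or T-to-C. Because the RNA-seq reads were unstranded and mismatches were called against the plus strand, an A-to-I event on a minus-strand transcript appears as T-to-C, so both types are counted. The function name and the input format (reference base, observed base) are assumptions made for this example.

```python
from typing import Iterable, Tuple

# Mismatch types consistent with A-to-I editing when strand is unknown.
A_TO_I_LIKE = {("A", "G"), ("T", "C")}


def a_to_i_fraction(mismatches: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (reference, observed) mismatches that look like A-to-I."""
    total = 0
    hits = 0
    for ref, obs in mismatches:
        total += 1
        if (ref.upper(), obs.upper()) in A_TO_I_LIKE:
            hits += 1
    if total == 0:
        raise ValueError("no mismatches supplied")
    return hits / total


# Hypothetical mismatch list: three of five are A-to-G or T-to-C.
example = [("A", "G"), ("T", "C"), ("C", "T"), ("G", "A"), ("A", "G")]
fraction = a_to_i_fraction(example)
```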
In total, we identified 2,245 constitutive A-to-I editing sites clustered in 266 editing boxes (Additional file 5). Although the editing boxes were largely from Alu elements, we found 7 editing boxes from non-Alu repetitive regions and 21 editing boxes from non-repetitive regions (Table 1B). The average length of non-repetitive editing boxes is 71 nt, which is shorter than that of Alu and non-Alu repetitive editing boxes (Table 1B). However, the average A-to-I nucleotide conversion rate in non-repetitive editing boxes is about 51% of all As, which is higher than in Alu and non-Alu repetitive editing boxes (Table 1B), suggesting, surprisingly, that promiscuous A-to-I editing can occur in non-repetitive regions.
Characterization of predicted constitutive A-to-I sites in editing boxes
We further examined genomic locations of 695 new editing sites in editing boxes. These new sites are located in noncoding regions, including noncoding exons, intergenic regions and introns (Figure 4C). In addition, many editing sites in intergenic regions were located within 10 kb of annotated genes, suggesting these unannotated regions could be extended 3′-UTRs of adjacent genes. Although editing box sites were largely from Alu elements, 50 and 116 editing box sites were from non-Alu repetitive or non-repetitive regions, respectively (Figure 4D). Additional analyses revealed that the majority of these editing boxes were located in or close to IRAlus (within 1 kb to IRAlus) (Table 1C), suggesting promiscuous editing in non-Alu editing boxes could be facilitated by the recruitment of ADAR enzymes to nearby duplex structures. However, 111 new editing sites in non-repetitive regions (from 172 in total, Table 1C) were further than 1 kb from the nearest IRAlus (Figure 4E), suggesting that other mechanisms may be involved in these promiscuous editing events.
Predicted constitutive A-to-I sites from non-repetitive editing boxes are catalyzed by ADAR1
It is known that the majority of A-to-I editing in the human transcriptome occurs within Alu elements [4–6, 8–10, 27]; however, it was unexpected to identify promiscuous editing sites in non-repetitive sequences. Thus, we randomly selected several such editing boxes for validation.
Table 2. Editing ratios of constitutive A-to-I sites at one editing box in 18 human samples
Labels: White Blood Cell; Li, et al. 2009 [24]; Bahn, et al. 2012-BC [9]; Bahn, et al. 2012-U87MG [9]; Peng, et al. 2012 [8]; Ramaswami, et al. 2012 [10].
Table 3. Comparison of predicted clustered and non-clustered constitutive A-to-I sites
Labels: A-to-Is in H9 and HeLa; Numbers of validated sites; Predicted A-to-I sites; Predicted A-to-I ratios.
Numbers of validated sites: 22 of 22 (clustered); 7 of 15 (non-clustered).
Per-site entries: A > G (+) in 12 rows; A > G (−) in 3 rows.
Characterization of promiscuous A-to-I RNA editing from non-repetitive editing boxes
To further test this possibility, we cloned sequences of editing boxes into the 3′UTR of egfp or into the region upstream of a single Alu or IRAlus in the 3′UTR of egfp (Figure 6C). We have previously shown that IRAlus, but not single Alus, can be extensively edited when expressed from plasmid vectors, even during transient transfection. We reasoned that if the adjacent IRAlus recruit ADARs to the nearby editing boxes, we would find more editing sites in editing boxes in vectors containing IRAlus than in those containing a single Alu or no Alu. Otherwise, if editing boxes alone are sufficient to recruit ADARs, we would observe promiscuous editing in all examined vectors. Strikingly, our analyses revealed that sequences in editing boxes in all examined vectors were extensively edited in a similar way to that observed in their endogenous loci (Figure 6C and 6D). These results demonstrated that non-repetitive editing boxes alone can be edited by ADAR1, independent of adjacent IRAlus.
Constitutive A-to-I sites in editing boxes are highly dynamic across human tissues
Taken together, we have developed an approach to quantitatively profile constitutive A-to-I RNA editing from multiple human transcriptomes in the absence of the relevant genomic information. The application of our approach has allowed us to identify a large number of clustered constitutive A-to-I sites, including 695 novel sites. Our analysis also revealed that non-repetitive editing boxes could be promiscuously edited by ADAR1, independent of their adjacent IRAlus. Finally, although functionally unknown, marked differences of editing ratios in the same sites identified in editing boxes clearly suggest a spatial and dynamic regulation of A-to-I RNA editing across human tissues.
RNA-seq datasets, widely available through public databases, are rich sources to search for A-to-I RNA editing sites. However, RNA-DNA mismatches between RNA-seq reads and the genome make the alignment of reads carrying nucleotide variations to the genome problematic. In addition, transcriptome and genomic DNA sequencing datasets are not always available for single individuals, thus making straightforward prediction of A-to-I editing sites from available transcriptomes even more challenging. In this study, we developed a new computational approach to predict RNA editing from multiple tissues in the absence of the genome information. An additional 695 novel A-to-I editing sites have been identified compared to several other recent studies [8–10, 27] and the DARNED database (Figure 4B). We expect to detect more constitutive A-to-I RNA editing sites with additional sets of human transcriptomes as inputs by obtaining a higher PSS value for each A-to-G mismatch site. In addition, discrepancies among reported editing sites could be due to the variety of cell lines/tissues used in different studies (Figure 4B) [8–10, 27].
Very recently, Ramaswami et al. also reported the identification of edited sites from transcriptome data only. Their method was reported earlier and slightly modified for identifying RNA editing sites in the absence of the related genomic DNA sequencing datasets. In our present study, the pipeline was designed to identify clustered and constitutively edited A-to-Is. In total, 2,245 such editing sites were identified, including 695 new ones. Strikingly, these new sites were still largely missed by Ramaswami et al. although much larger datasets were used. For example, they identified 181 out of 695 from 40 human lymphoblastoid cell lines, 273 out of 695 from 50 human brain samples, and 339 out of 695 from the same 16 human tissue samples.
Since we focused on clustered A-to-Is which are constitutively edited in at least three human tissues/transcriptomes, a limited number of editing sites were identified in this study. It is also noteworthy that some limitations exist in this approach, including the inability to predict more restricted tissue-specific editing, the inability to identify some true editing sites with 40-60% or >95% editing ratios, and inaccuracies in identifying non-clustered editing sites (about 47% experimental validation, Table 3). For instance, true editing sites, such as A-to-I sites in pre-mRNAs of GluR-B, were not addressed in our study. In addition, true editing sites with low expression or low editing ratios could have been missed due to stringent cutoffs in the computational flow. These true editing sites would be captured if multiple RNA-seq datasets from the same tissue (to achieve a higher PSS value) and higher depths of RNA-seq datasets from individual samples were included in future analyses. While a few non-A-to-G (noncanonical editing) sites might be expected, none could be validated as true editing sites. These noncanonical sites could be derived mostly from mis-mapping of reads to a highly similar genomic duplicate region, as suggested by Piskol et al. In the future, more stringent filters are needed for RNA editing prediction to remove this type of mapping error.
Strikingly, we found that promiscuous RNA editing is not restricted to transcribed inversely oriented repetitive elements, such as IRAlus. Our analysis revealed many predicted constitutive A-to-I editing sites that appeared in clusters and were enriched in non-repetitive editing boxes with an extremely high A-to-I conversion frequency (Table 1B). A recent study suggested that editing of non-Alu sites appeared to be dependent on nearby edited Alu sites, likely by the recruitment of ADAR enzymes to nearby duplex structures. However, we demonstrated that editing boxes alone were sufficient to be edited promiscuously by ADAR1 in expression vectors, and that adjacent IRAlus had little effect on the extent of editing (Figure 6). Although we could identify no consensus sequences in non-repetitive editing boxes, they are likely to form dsRNAs and the edited sites have similar 5′ neighbor preferences to those reported recently for other ADAR1 substrates. Thus, the new substrates predicted in this study further expand our knowledge of the catalytic pattern of A-to-I RNA editing by ADAR1.
RNA-seq datasets from 16 human tissues sequenced by Illumina HiSeq 2000 (Illumina Human Body Map 2.0 Project) and two additional cell lines sequenced by Illumina Genome Analyzer IIx (GAIIx) were retrieved from Gene Expression Omnibus (GEO:GSE30611 for tissues and GEO:GSE24399 for cell lines). About 40 ~ 80 million 75-nt single reads from each poly(A)+ RNA-seq sample were obtained and trimmed to 70 nt by removing 2 nt from the 5′ end and 3 nt from the 3′ end, to reduce the high sequencing error rate at read ends (Additional file 1).
Customized mapping strategy (STEP 1)
A two-round unique mapping strategy with Bowtie, SOAP, or BWA was applied to retrieve an increased number of mismatch calls (Figure 1). First, 70-nt reads were mapped with Bowtie (v 0.12.8) to the hg19 human genome/junction with up to three mismatches. After removal of multiple-aligned reads, unmapped 70-nt reads were split into two 35-nt fragments. The 5′ and 3′ 35-nt fragments were then used sequentially for the second round of unique mapping with up to three mismatches. The mapped 35-nt fragments were then extended to the other half with no more than 6 mismatches in total. In addition, reads with a distribution bias of mismatches indicating higher sequencing errors at read ends were also excluded from this analysis. Other aligners (like BWA) can certainly be used for analysis directly with a high mismatch allowance, but new parameters are needed to avoid/remove sequencing and mapping errors. The split scheme allowed us to retrieve more mismatches (up to six editing sites within 70 nt compared with three by default), and improved our capability to identify clustered RNA editing sites (Figure 2).
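The read handling in this step can be sketched as follows. This is not the authors' code and it does not perform alignment (Bowtie did that); it only illustrates, under stated assumptions, the trimming from 75 nt to 70 nt, the split into two 35-nt halves, and the acceptance rule of at most three mismatches in the anchoring half and at most six in the extended read. All function names are hypothetical, and the reference window is assumed to be the 70-nt genomic sequence at the position where the anchoring half mapped.

```python
from typing import Tuple


def trim_read(read: str, trim5: int = 2, trim3: int = 3) -> str:
    """Remove 2 nt from the 5' end and 3 nt from the 3' end (75 nt -> 70 nt)."""
    return read[trim5:len(read) - trim3]


def split_read(read: str) -> Tuple[str, str]:
    """Split a read into its 5' and 3' halves (35 nt each for a 70-nt read)."""
    half = len(read) // 2
    return read[:half], read[half:]


def count_mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def accept_extended(read: str, ref_window: str, anchor: str = "5p",
                    max_anchor_mm: int = 3, max_total_mm: int = 6) -> bool:
    """Check the second-round rule: the anchoring half has at most
    max_anchor_mm mismatches and the full read at most max_total_mm."""
    read5, read3 = split_read(read)
    ref5, ref3 = split_read(ref_window)
    if anchor == "5p":
        anchor_mm = count_mismatches(read5, ref5)
    elif anchor == "3p":
        anchor_mm = count_mismatches(read3, ref3)
    else:
        raise ValueError("anchor must be '5p' or '3p'")
    return anchor_mm <= max_anchor_mm and count_mismatches(read, ref_window) <= max_total_mm


# Hypothetical read: two mismatches in the 5' half, four in the 3' half.
ref_window = "A" * 70
bases = list(ref_window)
for pos in (0, 10, 40, 45, 50, 55):
    bases[pos] = "G"
read = "".join(bases)
```

In this example the read is accepted when anchored by its 5′ half (two mismatches) but not by its 3′ half (four mismatches), which is why the two halves are tried in turn.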
Removal of sequencing errors and annotated gSNPs (STEP 2)
As the strand information of these RNA-seq datasets was not available, we used the plus (“+”) strand of each chromosome as the reference for mismatch calling. In addition to trimming 75-nt reads from both ends to 70 nt, we applied the following stringent criteria for mismatch calling: (i) Each mismatch site must have a Hits Per Billion-mapped-bases (HPB) value > 5. Since multiple RNA-seq datasets with different sequencing depths were used in this study, we developed HPB to normalize the expression level for each base across samples, and selected HPB > 5 for each mismatch site (comparable to RPKM/FPKM > 5 for genes, Additional file 4) to focus on highly expressed mismatches. (ii) To improve the accuracy of editing prediction and reduce false positives, we used a mismatch ratio > 5% as a cutoff. Mismatch ratios were calculated as mismatched hits over all hits at the same site, for example G:(A + C + G + T + N) > 5% for an A-to-G mismatch at a genomic position that is A. (iii) To reduce random sequencing errors and to improve the correct assignment of sequence reads, we used an effective signal > 95% as a cutoff, for example G:(C + G + T + N) > 95% for an A-to-G mismatch. (iv) At least two individual reads with the same type of nucleotide conversion were required. (v) Finally, we filtered out gSNPs from the common SNP database (build 135, http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/snp135Common.txt.gz) and the 1000 Genome database (http://evs.gs.washington.edu/EVS/, downloaded on July 15, 2012).
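Criteria (i) to (v) can be read as a single per-site filter. The sketch below is one possible rendering, not the authors' implementation: the function name, the dictionary of per-base hit counts, and the precomputed HPB and SNP flag are assumptions made for illustration. Criterion (iv) is approximated here as at least two hits of the alternative base, although "individual reads" may have been intended more strictly (for example, reads with distinct start positions).

```python
from typing import Dict


def passes_step2(counts: Dict[str, int], ref: str, alt: str, hpb: float,
                 is_known_snp: bool = False, min_hpb: float = 5.0,
                 min_ratio: float = 0.05, min_effective: float = 0.95,
                 min_alt_reads: int = 2) -> bool:
    """Apply the STEP 2 cutoffs to one site in one sample.

    counts maps each observed base (A, C, G, T, N) to its number of hits.
    """
    if alt == ref:
        raise ValueError("alt must differ from ref")
    total = sum(counts.values())
    alt_hits = counts.get(alt, 0)
    non_ref_hits = total - counts.get(ref, 0)
    # (v) known gSNP, (i) expression cutoff, (iv) minimum supporting reads.
    if is_known_snp or hpb <= min_hpb or alt_hits < min_alt_reads:
        return False
    # (ii) mismatch ratio, e.g. G:(A + C + G + T + N) for an A-to-G site.
    mismatch_ratio = alt_hits / total
    # (iii) effective signal, e.g. G:(C + G + T + N) for an A-to-G site.
    effective_signal = alt_hits / non_ref_hits
    return mismatch_ratio > min_ratio and effective_signal > min_effective


# Hypothetical A-to-G site: 20 of 100 hits are G and no other mismatch occurs.
clean_site = {"A": 80, "C": 0, "G": 20, "T": 0, "N": 0}
# Hypothetical noisy site: G and C mismatches are equally frequent.
noisy_site = {"A": 80, "C": 10, "G": 10, "T": 0, "N": 0}
```

The effective-signal cutoff is what rejects the noisy site: its mismatch ratio (10%) passes, but only half of the non-reference hits agree on G.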
Removal of unannotated gSNPs by customized PSS (STEP 3)
PSS was set up to further reduce unknown genomic noise by taking advantage of multiple human tissue RNA-seq datasets. Notably, most mismatches showed low ratios (< 20%) across multiple human tissues, while some showed high mismatch ratios (> 60%) (Figure 3A and Additional file 12). In contrast, mismatch ratios of known gSNPs were significantly enriched in two peaks: a major peak at around 100% (homozygous) and a minor peak at around 50% (heterozygous) mismatch ratio (Figure 3B, Additional file 12). Theoretically, genomic variations would give rise to either ~50% or ~100% mismatch ratios depending on whether the variation is heterozygous (Additional file 6A) or homozygous (Additional file 6B). For a given unknown mismatch site existing in multiple tissues, a PSS was assigned in each sample to score it as either a likely genome variation (PSS = −1, with a mismatch ratio ≥ 95% or between 40% and 60%) or a likely editing event (PSS = 1, with a mismatch ratio between 5% and 40% or between 60% and 95%) (Figure 3A and Additional file 12). To optimize the parameters for the PSS cutoff by considering both the efficiency of gSNP removal and the number of nucleotide variants remaining after the removal, we permuted all possible combinations among 40% ~ 60% and 90% ~ 100%. The combination of 40% ~ 60% and ≥ 95% used in the current analysis is among the best parameter sets for our purpose (Zhu, et al., unpublished data). A final overall PSS for each mismatch site was obtained by adding up the PSSs from multiple tissues and cell lines. PSSs for known SNPs were calculated with a similar strategy and their distribution was then plotted against PSS from −18 to 18. With the cutoff at PSS < 3, over 97.5% of expressed SNPs were filtered out.
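The scoring rule can be sketched in a few lines. Several details are assumptions made for this illustration rather than statements from the text: a sample in which the site is not expressed or has a mismatch ratio of 5% or less contributes 0, the 40% and 60% boundaries are treated as SNP-like, and sites are kept when the summed PSS is at least 3 (the complement of the reported "PSS < 3" removal cutoff). Function names are hypothetical.

```python
from typing import Iterable, Optional


def sample_pss(ratio: Optional[float]) -> int:
    """Score one site in one sample from its mismatch ratio (0 to 1).

    -1: SNP-like (ratio >= 0.95, or between 0.40 and 0.60)
    +1: editing-like (ratio in 0.05-0.40 or 0.60-0.95)
     0: not expressed or ratio <= 0.05 (assumed to be uninformative)
    """
    if ratio is None or ratio <= 0.05:
        return 0
    if ratio >= 0.95 or 0.40 <= ratio <= 0.60:
        return -1
    return 1


def overall_pss(ratios: Iterable[Optional[float]]) -> int:
    """Sum the per-sample scores across all tissues and cell lines."""
    return sum(sample_pss(r) for r in ratios)


def keep_site(ratios: Iterable[Optional[float]], min_pss: int = 3) -> bool:
    """Keep a site only if its overall PSS reaches the cutoff."""
    return overall_pss(ratios) >= min_pss


# Hypothetical sites observed across a few samples.
editing_like = [0.10, 0.20, 0.30, 0.70]   # variable, intermediate ratios
snp_like = [0.50, 1.00, 0.45]             # heterozygous or homozygous pattern
```

With 18 samples, this scoring yields overall values between −18 and 18, matching the range over which the SNP distribution was plotted.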
Identification of constitutive A-to-I sites in editing box regions (STEP 4)
Mismatch sites were selected using the following criteria: (i) predicted editing sites were constitutively transcribed in at least three human tissues/cell lines; (ii) each site is no more than 50 bp away from the nearest site and the minimum transcribed genomic region is 20 bp long; (iii) each site has a greater than 20% mismatch rate in at least one tissue; (iv) at least 5 mismatch sites are clustered in one region with at least a 20% conversion rate for each type of nucleotide. We named these regions containing promiscuously edited A-to-I sites “editing boxes”.
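The distance-based part of these criteria can be sketched as a simple single-linkage grouping. The sketch assumes that the input positions are on one chromosome and have already passed criteria (i) and (iii); it implements only the 50-bp neighbor rule and the minimum of five sites, and leaves out the 20-bp transcribed-region and per-nucleotide conversion-rate requirements, whose exact computation is not spelled out in the text. The function name is hypothetical.

```python
from typing import List


def find_editing_boxes(positions: List[int], max_gap: int = 50,
                       min_sites: int = 5) -> List[List[int]]:
    """Group sites whose neighbors are at most max_gap bp apart and
    keep groups containing at least min_sites sites."""
    boxes: List[List[int]] = []
    cluster: List[int] = []
    for pos in sorted(set(positions)):
        # A gap larger than max_gap closes the current cluster.
        if cluster and pos - cluster[-1] > max_gap:
            if len(cluster) >= min_sites:
                boxes.append(cluster)
            cluster = []
        cluster.append(pos)
    if len(cluster) >= min_sites:
        boxes.append(cluster)
    return boxes


# Hypothetical coordinates: one cluster of five sites, one isolated site,
# and one cluster of four sites that is too small to form a box.
sites = [100, 110, 130, 170, 200, 400, 1000, 1010, 1020, 1030]
boxes = find_editing_boxes(sites)
```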
Characterization of constitutive A-to-I sites in editing boxes
Previously identified editing sites were retrieved from the RNA editing database (http://darned.ucc.ie/) and different studies [8–10, 27] for comparison. RefSeq Genes and annotated intron/exon boundaries were retrieved from UCSC (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/refFlat.txt.gz). Alu and non-Alu repetitive elements were retrieved from http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/rmsk.txt.gz. IRAlus were defined as any two or more inversely oriented Alu elements located within two kilobases of each other in the genome [6, 12, 34].
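The IRAlus definition can be illustrated with a short sketch. It assumes Alu annotations for a single chromosome given as (start, end, strand) tuples and measures distance as the gap between the end of one element and the start of the next; the text does not state how the two-kilobase distance was measured, so that choice, like the function name, is an assumption.

```python
from typing import List, Set, Tuple

Alu = Tuple[int, int, str]  # (start, end, strand), one chromosome


def find_iralus(alus: List[Alu], max_distance: int = 2000) -> Set[Alu]:
    """Return Alu elements that have an oppositely oriented Alu
    within max_distance bp on the same chromosome."""
    ordered = sorted(alus)
    in_inverted_repeat: Set[Alu] = set()
    for i, first in enumerate(ordered):
        _, end1, strand1 = first
        for second in ordered[i + 1:]:
            start2, _, strand2 = second
            # Elements are sorted by start, so later ones are even farther.
            if start2 - end1 > max_distance:
                break
            if strand1 != strand2:
                in_inverted_repeat.add(first)
                in_inverted_repeat.add(second)
    return in_inverted_repeat


# Hypothetical annotations: the first two elements form an inverted pair.
alus = [(100, 400, "+"), (900, 1200, "-"), (5000, 5300, "+"), (9000, 9300, "+")]
iralus = find_iralus(alus)
```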
Analyses of neighbor preferences and RNA secondary structure
Neighbor preferences were calculated based on predicted constitutive editing sites in non-repetitive or non-Alu repetitive regions, by extending 2 bases into both the upstream and downstream flanking regions. The neighbor preferences were drawn with WebLogo. The structure of two adjacent editing boxes at chr2 was predicted by RNAfold from ViennaRNA Package 2.0.7.
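The flanking-window extraction that feeds a sequence logo can be sketched as below. This is an illustrative stand-in, not the WebLogo workflow itself: it takes a plus-strand sequence and 0-based site positions (both assumptions), extracts the 5-nt window centered on each site, and tallies base counts per window position. Sites on minus-strand transcripts would need reverse complementing first, which is omitted here.

```python
from collections import Counter
from typing import List, Optional


def flanking_window(seq: str, pos: int, flank: int = 2) -> Optional[str]:
    """Return the window of `flank` bases on each side of a 0-based site,
    or None if the window would run past either end of the sequence."""
    if pos - flank < 0 or pos + flank >= len(seq):
        return None
    return seq[pos - flank:pos + flank + 1]


def position_frequencies(windows: List[str]) -> List[Counter]:
    """Count bases at each window position across equal-length windows."""
    return [Counter(column) for column in zip(*windows)]


# Hypothetical sequence with an edited A at index 3.
window = flanking_window("CCTAGGG", 3)
freqs = position_frequencies(["CTAGG", "ATAGC"])
```

In the second example, `freqs[1]` describes the base immediately 5′ of the edited site across the two windows.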
Cell culture, plasmid construction and transfection, knockdown of ADAR1, and Western blots
HeLa cells were cultured using the standard protocol provided by ATCC. Human embryonic stem cells (H9 line) were maintained as described before. Sequences of editing box regions (Additional file 13) were cloned into the pEGFP series vectors and each plasmid was transfected into HeLa cells for 24 hours prior to harvesting total RNA for editing analysis. Sense and antisense oligonucleotides were designed based on a human ADAR1 targeting sequence (5′-GTTGACTAAGTCACATGTAAA-3′) and a control scramble sequence (5′-GATGGCATTACGGCATGTTCA-3′) and cloned into the pLVTHM vector. Lentivirus particles were produced in HEK-293FT cells by co-transfection with the packaging vectors psPAX2 and pMD2.G. For infection, HeLa cells were incubated with concentrated viral particles at 37°C overnight and the medium was replaced with fresh medium the next day. Infected HeLa cells were collected 72 hours later for Western blots with goat anti-ADAR1 (Santa Cruz Biotechnology).
Total RNA isolation, RT-PCR, and Sanger sequencing validation
Total RNAs from HeLa, ADAR1 knockdown HeLa cells, transfected HeLa cells, and H9 cells were extracted with Trizol Reagent (Invitrogen) according to the manufacturer’s protocol. After treatment with DNase I (Ambion, DNA-free™ kit), cDNA was synthesized with SuperScript II (Invitrogen) using oligo (dT) or random hexamers. Genomic DNAs were purified from both cell lines with the TIANamp Genomic DNA kit (Tiangen Biotech). PCR products from cDNAs and gDNAs were amplified with primers (Additional file 13), and predicted A-to-I editing sites were validated in available cell lines by conventional Sanger sequencing. Editing ratios of validated A-to-I sites from Sanger sequencing were calculated with ImageJ (http://rsb.info.nih.gov/ij/index.html). Briefly, the areas of edited and unedited signals, taken as the signal intensities at each site, were carefully selected and measured with ImageJ. The editing ratio was then calculated by dividing the edited intensity by the total intensity at the same site. The correlation of editing ratios calculated from Sanger sequencing and RNA-seq was determined by scatter plot.
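The editing-ratio calculation from chromatogram peak areas amounts to a single division, sketched here for clarity. The function name and the example intensities are hypothetical; the inputs stand for the measured areas of the edited (G) and unedited (A) peaks at one site.

```python
def editing_ratio(edited_signal: float, unedited_signal: float) -> float:
    """Editing ratio = edited intensity / (edited + unedited intensity)."""
    if edited_signal < 0 or unedited_signal < 0:
        raise ValueError("signal intensities must be non-negative")
    total = edited_signal + unedited_signal
    if total == 0:
        raise ValueError("no signal measured at this site")
    return edited_signal / total


# Hypothetical peak areas measured at one site.
ratio = editing_ratio(30.0, 70.0)
```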
Stranded RNA-seq analysis
Strand-specific RNA-seq libraries were prepared with prereleased Directional mRNA-seq Library Kits (Illumina) with minor modifications. Briefly, after enrichment by oligo-dT selection, poly(A)+ RNAs were fragmented and treated with phosphatase and polynucleotide kinase to repair the ends. RNA adapters were sequentially ligated to the 3′ and 5′ ends of RNA fragments, which were then reverse transcribed using a primer complementary to the 3′ linker. The cDNA library was then amplified and sequenced on HiSeq2000 with 1x100 bp reads. The sequence file can be accessed from the NCBI Sequence Read Archive by GEO Accession Number GSE44450.
We present an integrative approach to quantitatively profile constitutive A-to-I RNA editing from multiple human transcriptomes in the absence of the relevant genomic information. The application of our approach has allowed us to identify a large number of clustered constitutive A-to-I sites, including 695 novel ones. We further demonstrated that non-repetitive editing boxes could be promiscuously edited by ADAR1, independent of their adjacent IRAlus. Strikingly, clear differences of editing levels in the same editing box sites but from different tissues/cell-lines were also observed, strongly indicating a spatial and dynamic regulation of A-to-I RNA editing across human tissues. Our work thus offers new insights into the catalytic pattern and complex regulation of A-to-I editing by ADAR1.
ESC: Embryonic stem cell
FPKM: Fragments per kilobase per million
HPB: Hits per billion-mapped-bases
RPKM: Reads per kilobase per million
PSS: Potential SNP score
SNPs: Single nucleotide polymorphisms.
We are grateful to Gordon Carmichael for critical reading of the manuscript, to all lab members for helpful discussion, and to Huahong Fang, Zheng Wu, and Yefen Xu for technical support. H9 cells were obtained from the WiCell Research Institute. H9 stranded RNA-seq was performed at CAS-MPG Partner Institute for Computational Biology Omics Core. This work was supported by CAS (XDA01010206), NSFC (31271390), the Hundred Talents Program of CAS (2012OHTP08), the Talents Program of SIBS (2012SSTP01) and SMSTC (11PJ1411000) to LLC and LY.
- Bass BL: RNA editing by adenosine deaminases that act on RNA. Annu Rev Biochem. 2002, 71: 817-846. 10.1146/annurev.biochem.71.110601.135501.PubMed CentralView ArticlePubMedGoogle Scholar
- Nishikura K: Functions and Regulation of RNA Editing by ADAR Deaminases. Annu Rev Biochem. 2010, 79: 321-349. 10.1146/annurev-biochem-060208-105251.PubMed CentralView ArticlePubMedGoogle Scholar
- Vendeix FA, Munoz AM, Agris PF: Free energy calculation of modified base-pair formation in explicit solvent: A predictive model. RNA. 2009, 15: 2278-2287. 10.1261/rna.1734309.PubMed CentralView ArticlePubMedGoogle Scholar
- Kim DD, Kim TT, Walsh T, Kobayashi Y, Matise TC, Buyske S, Gabriel A: Widespread RNA editing of embedded alu elements in the human transcriptome. Genome Res. 2004, 14: 1719-1725. 10.1101/gr.2855504.PubMed CentralView ArticlePubMedGoogle Scholar
- Levanon EY, Eisenberg E, Yelin R, Nemzer S, Hallegger M, Shemesh R, Fligelman ZY, Shoshan A, Pollock SR, Sztybel D: Systematic identification of abundant A-to-I editing sites in the human transcriptome. Nat Biotechnol. 2004, 22: 1001-1005. 10.1038/nbt996.View ArticlePubMedGoogle Scholar
- Athanasiadis A, Rich A, Maas S: Widespread A-to-I RNA editing of Alu-containing mRNAs in the human transcriptome. PLoS Biol. 2004, 2: e391-10.1371/journal.pbio.0020391.PubMed CentralView ArticlePubMedGoogle Scholar
- Carmi S, Borukhov I, Levanon EY: Identification of widespread ultra-edited human RNAs. PLoS Genet. 2011, 7: e1002317-10.1371/journal.pgen.1002317.PubMed CentralView ArticlePubMedGoogle Scholar
- Peng Z, Cheng Y, Tan BC-M, Kang L, Tian Z, Zhu Y, Zhang W, Liang Y, Hu X, Tan X: Comprehensive analysis of RNA-Seq data reveals extensive RNA editing in a human transcriptome. Nat Biotechnol. 2012, 30: 253-260. 10.1038/nbt.2122.View ArticlePubMedGoogle Scholar
- Bahn JH, Lee JH, Li G, Greer C, Peng G, Xiao X: Accurate identification of A-to-I RNA editing in human by transcriptome sequencing. Genome Res. 2012, 22: 142-150. 10.1101/gr.124107.111.PubMed CentralView ArticlePubMedGoogle Scholar
- Ramaswami G, Lin W, Piskol R, Tan MH, Davis C, Li JB: Accurate identification of human Alu and non-Alu RNA editing sites. Nat Methods. 2012, 9: 579-581. 10.1038/nmeth.1982.PubMed CentralView ArticlePubMedGoogle Scholar
- Prasanth KV, Prasanth SG, Xuan Z, Hearn S, Freier SM, Bennett CF, Zhang MQ, Spector DL: Regulating gene expression through RNA nuclear retention. Cell. 2005, 123: 249-263. 10.1016/j.cell.2005.08.033.View ArticlePubMedGoogle Scholar
- Chen LL, DeCerbo JN, Carmichael GG: Alu element-mediated gene silencing. EMBO J. 2008, 27: 1694-1705. 10.1038/emboj.2008.94.PubMed CentralView ArticlePubMedGoogle Scholar
- Mao YS, Sunwoo H, Zhang B, Spector DL: Direct visualization of the co-transcriptional assembly of a nuclear body by noncoding RNAs. Nat Cell Biol. 2011, 13: 95-101. 10.1038/ncb2140.PubMed CentralView ArticlePubMedGoogle Scholar
- Hundley HA, Krauchuk AA, Bass BL: C. elegans and H. sapiens mRNAs with edited 3′ UTRs are present on polysomes. RNA. 2008, 14: 2050-2060. 10.1261/rna.1165008.PubMed CentralView ArticlePubMedGoogle Scholar
- Higuchi M, Single FN, Köhler M, Sommer B, Sprengel R, Seeburg PH: RNA editing of AMPA receptor subunit GluR-B: a base-paired intron-exon structure determines position and efficiency. Cell. 1993, 75: 1361-1370. 10.1016/0092-8674(93)90622-W.View ArticlePubMedGoogle Scholar
- Burns CM, Chu H, Rueter SM, Hutchinson LK, Canton H, Sanders-Bush E, Emeson RB: Regulation of serotonin-2C receptor G-protein coupling by RNA editing. Nature. 1997, 387: 303-308. 10.1038/387303a0.
- Yang W, Chendrimada TP, Wang Q, Higuchi M, Seeburg PH, Shiekhattar R, Nishikura K: Modulation of microRNA processing and expression through RNA editing by ADAR deaminases. Nat Struct Mol Biol. 2006, 13: 13-21. 10.1038/nsmb1041.
- Yang W, Wang Q, Howell KL, Lee JT, Cho DS, Murray JM, Nishikura K: ADAR1 RNA deaminase limits short interfering RNA efficacy in mammalian cells. J Biol Chem. 2005, 280: 3946-3953.
- Kawahara Y, Zinshteyn B, Sethupathy P, Iizasa H, Hatzigeorgiou AG, Nishikura K: Redirection of silencing targets by adenosine-to-inosine editing of miRNAs. Science. 2007, 315: 1137-1140. 10.1126/science.1138050.
- Kawahara Y, Megraw M, Kreider E, Iizasa H, Valente L, Hatzigeorgiou AG, Nishikura K: Frequency and fate of microRNA editing in human brain. Nucleic Acids Res. 2008, 36: 5270-5280. 10.1093/nar/gkn479.
- Alon S, Mor E, Vigneault F, Church G, Locatelli F, Galeano F, Gallo A, Shomron N, Eisenberg E: Systematic identification of edited microRNAs in the human brain. Genome Res. 2012, 22: 1533-1540. 10.1101/gr.131573.111.
- Vesely C, Tauber S, Sedlazeck FJ, Jantsch MF, von Haeseler A: Adenosine deaminases that act on RNA induce reproducible changes in abundance and sequence of embryonic miRNAs. Genome Res. 2012, 22: 1468-1476. 10.1101/gr.133025.111.
- Langmead B, Trapnell C, Pop M, Salzberg SL: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009, 10: R25. 10.1186/gb-2009-10-3-r25.
- Trapnell C, Pachter L, Salzberg SL: TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. 2009, 25: 1105-1111. 10.1093/bioinformatics/btp120.
- Kent WJ: BLAT–the BLAST-like alignment tool. Genome Res. 2002, 12: 656-664.
- Li H, Durbin R: Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009, 25: 1754-1760. 10.1093/bioinformatics/btp324.
- Li JB, Levanon EY, Yoon JK, Aach J, Xie B, Leproust E, Zhang K, Gao Y, Church GM: Genome-wide identification of human RNA editing sites by parallel DNA capturing and sequencing. Science. 2009, 324: 1210-1213. 10.1126/science.1170995.
- Eggington JM, Greene T, Bass BL: Predicting sites of ADAR editing in double-stranded RNA. Nat Commun. 2011, 2: 319.
- Enstero M, Daniel C, Wahlstedt H, Major F, Ohman M: Recognition and coupling of A-to-I edited sites are determined by the tertiary structure of the RNA. Nucleic Acids Res. 2009, 37: 6916-6926. 10.1093/nar/gkp731.
- Ramaswami G, Zhang R, Piskol R, Keegan LP, Deng P, O’Connell MA, Li JB: Identifying RNA editing sites using RNA sequencing data alone. Nat Methods. 2013, 10: 128-132. 10.1038/nmeth.2330.
- Piskol R, Peng Z, Wang J, Li JB: Lack of evidence for existence of noncanonical RNA editing. Nat Biotechnol. 2013, 31: 19-20. 10.1038/nbt.2472.
- Yang L, Duff MO, Graveley BR, Carmichael GG, Chen L-L: Genomewide characterization of non-polyadenylated RNAs. Genome Biol. 2011, 12: R16. 10.1186/gb-2011-12-2-r16.
- Park E, Williams B, Wold BJ, Mortazavi A: RNA editing in the human ENCODE RNA-seq data. Genome Res. 2012, 22: 1626-1633. 10.1101/gr.134957.111.
- Blow M, Futreal PA, Wooster R, Stratton MR: A survey of RNA editing in human brain. Genome Res. 2004, 14: 2379-2387. 10.1101/gr.2951204.
- Crooks GE, Hon G, Chandonia JM, Brenner SE: WebLogo: a sequence logo generator. Genome Res. 2004, 14: 1188-1190. 10.1101/gr.849004.
- Lorenz R, Bernhart SH, Tafer H, Flamm C, Stadler PF, Hofacker IL, zu Siederdissen CH: ViennaRNA Package 2.0. Algorithms Mol Biol. 2011, 6: 26. 10.1186/1748-7188-6-26.
- Chen LL, Carmichael GG: Altered nuclear retention of mRNAs containing inverted repeats in human embryonic stem cells: Functional role of a nuclear noncoding RNA. Mol Cell. 2009, 35: 467-478. 10.1016/j.molcel.2009.06.027.
- Toth AM, Li Z, Cattaneo R, Samuel CE: RNA-specific adenosine deaminase ADAR1 suppresses measles virus-induced apoptosis and activation of protein kinase PKR. J Biol Chem. 2009, 284: 29350-29356. 10.1074/jbc.M109.045146.
- Yin QF, Yang L, Zhang Y, Xiang JF, Wu YW, Carmichael GG, Chen LL: Long Noncoding RNAs with snoRNA Ends. Mol Cell. 2012, 48: 219-230. 10.1016/j.molcel.2012.07.033.