Single-cell RNA sequencing reveals dynamic changes in A-to-I RNA editome during early human embryogenesis

Background A-to-I RNA-editing mediated by ADAR (adenosine deaminase acting on RNA) enzymes that converts adenosine to inosine in RNA sequence can generate mutations and alter gene regulation in metazoans. Previous studies have shown that A-to-I RNA-editing plays vital roles in mouse embryogenesis. However, the RNA-editing activities in early human embryonic development have not been investigated. Results Here, we characterized genome-wide A-to-I RNA-editing activities during human early embryogenesis by profiling 68 single cells from 29 human embryos spanning from oocyte to morula stages. We demonstrate dynamic changes in genome-wide RNA-editing during early human embryogenesis in a stage-specific fashion. In parallel with ADAR expression level changes, the genome-wide A-to-I RNA-editing levels in cells remained relatively stable until 4-cell stage, but dramatically decreased at 8-cell stage, continually decreased at morula stage. We detected 37 non-synonymously RNA-edited genes, of which 5 were frequently found in cells of multiple embryonic stages. Moreover, we found that A-to-I editings in miRNA-targeted regions of a substantial number of genes preferably occurred in one or two sequential stages. Conclusions Our single-cell analysis reveals dynamic changes in genome-wide RNA-editing during early human embryogenesis in a stage-specific fashion, and provides important insights into early human embryogenesis. Electronic supplementary material The online version of this article (doi:10.1186/s12864-016-3115-2) contains supplementary material, which is available to authorized users.


Background
A-to-I RNA-editing mediated by ADAR (adenosine deaminase acting on RNA) enzymes is the major RNAediting that post-transcriptionally modifies nucleotide sequences on RNA molecules in metazoans [1]. RNAediting can alter protein sequences, influence RNA stability and miRNA regulations in multiple biological processes including development and carcinogenesis [2]. The mammalian ADAR proteins include ADAR, ADARB1, and ADARB2 [3].
Recent studies have demonstrated that most mice with a null allele of ADAR died before E14 due to defects in the hematopoietic system [4], and most mice with editing deficient ADAR mutation knock-in died at E13.5 as a result of unedited transcripts activating interferon and dsRNA sensing pathway [5]. Shtrichman et al. [6] found that editing levels of various target genes are substantially greater in most adult tissues than corresponding fetal ones and that ADAR protein is substantially regulated in undifferentiated pluripotent hESCs. These findings suggest that RNA-editing plays important roles in embryogenesis. Although early human embryonic transcriptome profiles have been studied [7][8][9], no research on RNAediting activities before blastocyst stage during human embryogenesis has been conducted. To investigate RNAediting activities during early embryogenesis in humans, we profiled the RNA editome from 68 single cells from 29 human embryos ranging from oocyte to morula stages using published human embryonic single cell transcriptome data [8,9].

Characteristics of RNA editome during early human embryogenesis
By analyzing 68 single cells from 29 human embryos spanning from oocyte to morula stages in early embryogenesis (Additional file 1: Figure S1) using our RNA identification pipeline, we identified 14,049 candidate RNA mismatches, including 9,795 in Alu and 4,254 in non-Alu regions. Of the 9,795 mismatches in Alu regions we identified, A-to-G was the most prevalent mismatch type (account for 88.04 %), followed by T-to-C mismatches (account for 11.61 %), of which the majority were thought to be incorrect annotation of A-to-I editing because the RNA-seq libraries were not strand-specific [10]. The A-to-G and T-to-C mismatches together account for 99.65 % of the sites identified in Alu region (Fig. 1a). A typical ADAR-mediated editing is characterized by underrepresentation of G base in position −1 next to the edited site and over-representation of G base in position +1 next to the edited site [11]. Indeed, this characteristic was seen at our identified A-to-G sites and at the complementary strand of T-to-C sites (Fig. 1b). Of the 4,254 candidate RNA editing sites identified in non-Alu regions 2,247 were A-to-G mismatches (52.82 %) and 488 were T-to-C mismatches (11.47 %). The proportion of A-to-G/T-to-C mismatches in non-Alu regions is smaller than those in Alu regions (Fig. 1a). Similar findings have been reported by Fumagalli et al. [12] when they studied RNA-DNA single nucleotide differences (RDDs) in breast cancer. Considering that the proportion of validated RDDs was 90 % in Alu region and below 40 % outside Alu region in their study, and that the majority of editing events are in Alu elements in human [13], we only retain non-Alu A-to-G mismatches recorded in RADAR database [14] and Alu A-to-G mismatches as candidate A-to-I RNA-editing sites in our further analyses.
Totally, we identified 8,813 candidate A-to-I RNAediting sites (Additional file 2: Table S1), of which 97.84 % located in Alu regions and 3,253 sites were present in the RADAR database. We annotated A-to-I RNA-editing sites and found that the majority were located in 3'-UTR regions (47.12 %), followed by intronic (33.77 %), noncoding RNA (ncRNA, 17.12 %), coding (0.98 %), and 5'-UTR regions (1.01 %), respectively (Fig. 1c). We noticed that the proportion of intronic A-to-I editing sites identified in our study is much smaller than that (94.03 %) in RADAR database, which may due to the low-coverage of the single cell RNA-seq data we analyzed. We will discuss this issue later on.
On average, we detected 819 ± 609, 702 ± 64, 953 ± 377, 1,057 ± 557, and 743 ± 330 A-to-I RNA-editing sites in each cell at oocyte, pronucleus, zygote, 2-cell, and 4-cell stages, respectively, whereas in cells at 8-cell and morula stages the A-to-I RNA-editing sites sharply dropped to only 190 ± 97 and 86 ± 35 per cell, respectively (Fig. 1d). To investigate the changes in RNA-editing patterns during human early embryogenesis, we clustered single cells based on editing frequencies (defined as the variantsupporting reads divided by all reads mapped to a specific RNA-editing site), and found that cells at 8-cell and morula stages were clustered together and separated from cells at earlier stages (Fig. 1e). The heatmap shows that~15 % of A-to-I RNA-editing sites present in cells crossing from oocyte to 4-cell stages, but not in cells at 8-cell and morula stages, despite of expression of transcripts at moderate levels (Fig. 1e).
A-to-I RNA-editing levels are sharply decreased in cells at 8-cell and morula stages To profile genome-wide A-to-I RNA-editing changing patterns in the embryonic development, we defined A-to-I RNA-editing level as edited bases per million mapped bases in each cell. Under this definition, A-to-I RNA-editing levels are not affected by the number of mapped bases (Pearson's correlation test, P = 2.11E-01, r = −0.15; Fig. 2a).
We observed that on average the RNA-editing levels remained relatively stable until 4-cell stage, but dramatically decreased (68 %) from 4-cell to 8-cell stages (Wilcoxon rank sum test, P = 7.92E-6), and continually decreased (41 %) at morula stage (Wilcoxon rank sum test, P = 1.48E-02; Fig. 2b and Additional file file 1: Figure S2). We then investigated the gene expression of ADAR, ADARB1, and ADARB2 in cells at different stages, and found that ADARB1 and ADARB2 expression remained at low level (~1 Reads Per Kilobase per Million mapped reads, RPKM) across all embryonic stages investigated (Fig. 2c). The substantially lower expression of ADARB1 than ADAR is also seen in many tissues in adult humans (Table S2) [15]. Amazingly, the changes in ADAR expression levels were almost in parallel with the changes in editing levels in cells at all stages investigated ( Fig. 2b and Additional file 1: Figure S3). Correlation tests indicated that RNA-editing levels were strongly correlated with ADAR expression levels (Pearson's correlation test, P = 5.99E-13, r = 0.74; Fig. 2d). Interestingly, we noticed the largest decreases in ADAR expression level and in A-to-I RNA-editing level occurred in the cells at 8-cell stage. It is worth noting that although the ADARB1 expression levels remained low in cells of all stages investigated, we detected a moderate correlation between the ADARB1 expression levels and the A-to-I RNA-editing levels (Pearson's correlation test, P = 3.38E-4, r = 0.42; Fig. 2e).
Non-synonymous A-to-I RNA-editing events frequently occurred in cells before 8-cell stage We detected 324 A-to-I RNA editing events at 54 nonsynonymous sites on 37 genes, of which 292 nonsynonymous A-to-I RNA-editing events on 36 genes occurred in cells before 8-cell stage (Fig. 3a). Among the 54 nonsynonymous sites, 27 sites were deposited in RADAR (Additional file 2: Table S3). In addition, we observed that 7 genes were each non-synonymously edited at a specific embryonic stage. We found that AZIN1, DCAF16, SLC3A1, TTF1, and NSMCE2 were non-synonymously edited in more than 25 % of cells, almost exclusively confined to stages before 8-cell. Previous study showed that the non-synonymous RNA-editing sites in AZIN1 (A1099G, ENST00000347770) and DCAF16 (A486G, ENST00000382247) are clinically relevant in cancers [16]. We observed that the A1099G editing events on AZIN1 occurred in 3 of 6 oocytes, 3  in the extracellular topological domain of SLC3A1, which encodes a renal amino acid transporter [17]. We found non-synonymous A-to-I RNA-editing events at three sites (A2651G, A2677G, and A2678G, ENST00000334270) altering the TTF1 C-terminal amino acid sequence, which was suggested to mediate the termination of ribosomal gene transcription [18]. In addition, we found nonsynonymous A-to-I RNA-editing events frequently altered the amino acid sequence of NSMCE2 (A262G and A268G, ENST00000519712), which is required for DNA repair [19].
Editing frequencies at the exonic RNA editing sites of four genes are negatively associated with their expression We conducted Spearman's correlation analysis between gene expression and editing frequency on 163 genes which were edited in the exon regions and found that A-to-I RNA-editing frequencies were negatively associated with the expression of four frequently edited genes (adjusted P < 0.1, Benjamini-Hochberg method). For instance, the frequency of non-synonymous editing Fig. 3 Recoding RNA-editing events in cells during early embryonic stages. a Heatmap of non-synonymous RNA-editing events in cells during early embryonic stages. Each row represents a cell, and each column represents a non-synonymous A-to-I editing site. Frequently edited genes are highlighted with dashed lines. Blank regions, uniquely mapped reads < 4, not qualified for RNA-editing determination. b Changes in nonsynonymous editing frequency are negatively associated with the gene expression of AZIN1. Left, changes in AZIN1's editing frequency; middle, changes in AZIN1's expression; right, editing frequency is negatively associated with the AZIN1 expression level, each open circle represents one single cell. * P < 0.05; ** P < 0.01; *** P < 0.001 A1099G on AZIN1 dramatically dropped from 4-cell to 8-cell stages (Wilcox rank sum test, P = 2.67E-6; Fig. 3b) while the AZIN1 expression substantially increased (edgeR, P = 3.24E-17; Fig. 3b). We found that AZIN1 expression was highly negatively associated with editing frequency of A1099G (Spearman's correlation test, r = −0.60, adjusted P = 2.02E-05, Fig. 3b). This pattern was also seen in protein coding genes DCAF16 (Spearman's correlation test, r = −0.43, adjusted P = 1.99E-02, Additional file 1: Figure S4A), and in long intergenic noncoding RNAs RPL23AP53 (Spearman's correlation test, r = −0.58, adjusted P = 2.63E-03, Additional file 1: Figure S4B) and SNHG16 (Small Nucleolar RNA Host Gene 16) which was suggested to be associated with cell proliferation [20] (Spearman's correlation test, r = −0.57, adjusted P = 4.66E-05, Additional file 1: Figure S4C).
In addition to the differences in quantities of editing sites at different stages, A-to-I RNA-editing in miRNAtargeted regions on a substantial number of genes appears to be stage related. By calculating the percentage of cells with editing events in miRNA-targeted regions on each gene at each stage, we found that a group of 19 genes were more frequently edited in oocytes than in cells of other stages (Fisher's exact test, P < 0.05), whereas a different group of 13 genes were more frequently edited in zygotes (Fisher's exact test, P < 0.05), and another group of 36 genes were more frequently edited in cells at 2-cell stage (Fisher's exact test, P < 0.05). Interestingly, the number of genes that were more frequently edited in miRNA-targeted regions increased to 42 in cells at 4-cell stage but suddenly decreased to 4 in cells at 8-cell stage (Fig. 4a). In addition, we observed a small number of genes that were more frequently edited in miRNA-targeted regions in cells crossing two sequential stages than other stages. For instance, one group of 5 genes were more frequently edited at zygote and 2-cell stages, while another 9 genes were more frequently edited at 2-cell and 4-cell stages (Fig. 4a). We noticed that 16 genes involved in generic transcription pathway and 18 genes involved in cell cycle, were edited in miRNA-targeted regions at most stages ( Fig. 4b and c). Interestingly, NUP43 and MCM4 involved in cell cycle were frequently edited in miRNA-targeted regions in cells across zygote to 4-cell stages (Fig. 4b).

Discussion
In this study, we detected genome-wide A-to-I RNAediting in cells ranging from oocyte to morula stages, and defined editing-level to reflect RNA-editing activities in a cell. We showed that A-to-I RNA-editing levels dramatically decreased at 8-cell stage. By looking at the impacts of A-to-I RNA-editing on protein recoding, we found that seven genes such as AZIN1 were frequently non-synonymously edited in cells of multiple embryonic stages.
We noticed that the proportion of intronic A-to-I editing sites identified in our study (33.70 %) is much smaller than that (94.03 %) in RADAR. Because introns are retained only in premature mRNA of genes, the sequencing depths in intron regions are usually lower than those in exon region of expressed genes in RNA-seq. Because the single cell RNA-seq data we analyzed are low-coverage, thus, the depths of many intronic regions are less than 4, which do not meet our requirement for determination of RNA editing events. This leads to a substantial reduction in A-to-I RNA editing sites being identified in intron regions, consequently largely reducing the percentage of intronic A-to-I RNA editing sites among the total Ato-I RNA editing sites identified.
Human embryogenesis is a complex and genetically well-programmed developmental process that is controlled by cascades of genes. Our results indicated that A-to-I RNA-editing acted in a stage-specific fashion during human early embryogenesis. We noticed that genome-wide A-to-I editing level suddenly and dramatically dropped in cells at 8-cell stage, suggesting that this sudden drop of A-to-I RNA-editing level may have an important biological significance during human early embryogenesis. Although the biological function of this sudden drop of editing level is yet to be discovered, we consider this event particularly interesting. Previous studies suggested that the 8-cell stage is a turning point because many important biological events occur at this stage. For example, a recent study on human early embryogenesis showed that there was a dramatic change in gene expression in cells at 8-cell stage as compared to the previous stages. At 8-cell stage, the expression of 3037 genes, enriched with genes involved in regulation of transcription and regulation of RNA metabolic process, substantially up-regulated while 1941 genes, enriched with genes involved in regulation of transcription and cell cycle, substantially down-regulated [9]. The X chromosome inactivation (XCI) is an important mechanism that compensates for the difference in gene dosage between XX females and XY males in mammals. Interestingly, the XCI in humans appears to start from the 8-cell stage [23]. Moreover, embryonic left-right separation is an important event during embryogenesis of bilaterians. In a recent study, by analyzing multiple lines of molecular and cell biology evidence, Ma concluded that embryos of bilaterians are divided into left and right lateral halves at or shortly after 8-cell stage [24].
AZIN1 plays a role in cell growth and proliferation by maintaining polyamine homeostasis within cells [25]. A non-synonymous editing (A1099G) on AZIN1 was found to be conserved in human and mouse [11]. Previous study showed that the A1099G edited AZIN1 resulted in substantially enhancing cell proliferation in cultured  CHEK1  CDC6  DSN1  MCM4  SKA1  NUP43  RRM2  TP53  CASC5  CDT1  CENPL  DHFR  GINS1  MCM8  SDCCAG8  TERF2   ZNF426  ZNF14  ZNF160  ZNF595  MED6  ZNF587  ZNF641  TGS1  ZNF136  ZNF33A  ZNF425  ZNF468  ZNF528  ZNF546 ZNF552 ZNF577 Fig. 4 Genes are edited in miRNA-targeted mRNA regions in a stage-specific fashion. a Heatmaps of the percentage of cells edited at each stage (left) and P-values by Fisher's exact tests (excluding pronucleus stage for insufficient cells) on the number of edited and unedited cells between a specific stage and other stages (right). Vertical bars highlight the genes that are more frequently edited in miRNA-targeted regions at one (black) or two sequential stages (blue) than other stages. Arabic numeral denotes the number of genes in each group. b and c Heatmaps of the frequencies of cells edited on genes involved in cell cycle and in generic transcription pathway, respectively. Each column represents an embryonic stage, and each row represents a gene liver cell lines [25]. In addition, previous studies showed that the editing frequencies at this site were significantly higher in tumors than in matched nontumorous tissues in hepatocellular carcinoma and esophageal squamous cell carcinoma [25,26]. Besides, AZIN1 expression was significantly increased in esophageal tumors compared with their matched nontumor specimens [26]. We observed that A1099G edited AZIN1 are present in 31 out of 38 cells crossing from oocyte to 4-cell stages, and 4 out of 20 cells at 8-cell stages. Intriguingly, the editing frequency at this site dramatically dropped from 4-cell to 8-cell stage while the AZIN1 expression substantially increased. We found that AZIN1 expression was highly and negatively correlated with the A1099G editing frequencies. As either increase in A1099G editing or high expression of AZIN1 may promote cell proliferation, we speculate that the frequent A1099G editing in earlier stages may be a compensation of the low level of AZIN1 expression. When embryos produce more AZIN1, a decrease in edited AZIN1 could keep the stability of overall AZIN1 activity. We believe that the underlying biological significance of the frequent A1099G editing on AZIN1 and the function of the A1099G edited AZIN1 during human early embryogenesis requires further investigation. We also believe that the negative association between the AZIN1 expression and the A1099G editing frequencies during early human embryogenesis is particularly interesting, deserving more comprehensive investigation.

Conclusions
Taken together, our study indicates that human embryos undergo dynamic changes in genome-wide A-to-I RNAediting during human early embryogenesis. Our findings underscore the importance of A-to-I RNA-editing during early human embryogenesis. It is worth noting that our findings are based on the observation of 68 cells from 29 embryos across 7 embryonic stages. Therefore we believe it is necessary to conduct more comprehensive studies to verify our findings and to understand the biological significance of the dynamic changes in A-to-I editing during human early embryogenesis.

Reads mapping and pre-processing
We downloaded single cell RNA-seq data of human embryos spanning from oocyte to morula stages from two previous studies [8,9] from NCBI database. In both studies, RNA libraries were sequenced on Illumina platform but not strand-specific. Xue et al., sequenced the libraries as 90 bp-long pair-end reads, while Yan et al., obtained 100 bp-long single-end reads. We noticed that the sequencing quality near the 3'-ends of some singleend reads were not satisfactory. Therefore, we trimmed off 15 bp from the 3'-end of every single-end sequencing read to eliminate sequencing errors. We used SOAPnuke to filter out reads from both studies that contained adapters and low quality using default parameter. We downloaded the hg19 (GRCh37) genome sequences from the UCSC Genome Browser (http:// genome.ucsc.edu). We aligned the filtered reads with Tophat2 [27] using command: The transcriptomes-index is generated using ensemble gene set. The command is: Tophat2 -G Homo_sapiens.GRCh37.75.gtf -transcriptome-index = transcriptome-index hg19.

RNA-editing detection
We only selected the data of 68 single cells from 29 embryos each with over 0.5 Gb uniquely mapped bases for further analyses. We summarized the base calls of preprocessed aligned RNA-reads to the human reference in pileup format. To identify candidate RNA-editing sites, we only used sequencing bases with base-quality ≥ 20. We determined RNA-editing sites as follows: Finally, we tried to filter out private DNA-induced variants using the exon DNA sequencing data from the father of 12 embryos in Xue's study. We found no additional paternal exonic DNA variants in the RNA mismatches we identified (Additional file 1: Table S6), suggesting that the false positive rate of DNA-induced RNA variants within the RNA mismatches we identified is considerably low.

Hierarchical clustering
In cluster analysis, we created a matrix C (i, k) to store the editing frequency (0 for no edit and −1 for depth < 4) on site i in sample k. We calculated the euclidean distances among cells based on this matrix using the dist() function and performed hierarchical clustering using the hclust() function with default parameters. Both functions could be found in R package stats. The heatmap was drawn using R package pheatmap.

Gene expression analyses
We calculate the number of reads aligned to each gene using the featureCounts function in R package Rsubread [29]. We performed differential gene expression analyses using the R package edgeR [30]. The RPKM values and differential expression P-values of genes used in this study were reported by edgeR.

RNA-editing sites annotation
We used ANNOVAR [31,32] to functionally annotate RNA-editing sites. The gene set used in annotation including ensemble gene set v75 and NONCODE v4 [33]. When studying the editing in miRNA-target regions, we focused on regions that were complementary to miRNA seed within 3' untranslated regions based on the predicted miRNA targeted regions downloaded from http://www.microrna.org/microrna/getDownloads.do (August 2010 release, good mirSVR scores).

Additional files
Additional file 1: Figure S1. RNA sequencing data summary. Figure  S2. Changes in RNA-editing levels during early human embryogenesis in each study. Figure S3. Changes in ADAR expression levels during early human embryogenesis. Figure S4. Changes in exonic editing frequency Sense strand variant supporting reads Sense strand reference supporting reads Antisense strand variant supporting reads Antisense strand reference supporting reads are negatively associated with the gene expression of DCAF16, RPL23AP53 and SNHG16. Figure S5. Changes in editing sites in miRNA-targeted mRNA regions. Table S2. ADAR and ADARB1 expression in different tissues revealed by Illumina Body Map project. Table S6. exonic DNA filter efficiency of the 17 cells from 12 embryos in Xue's study. (DOC 775 kb) Additional file 2: Table S1. Candidate A-to-I RNA-editing sites. Table  S3. Editing levels of A-to-I RNA-editing events that may recode the proteins. Table S4. Editing levels of A-to-I RNA-editing events in miRNA-targeted mRNA regions. Table S5. Significant GSEA results of the genes that are preferably edited in miRNA-targeted regions in cells crossing oocyte to 4-cell stages. (XLSX 582 kb) Abbreviations GSEA: Gene set enrichment analysis; lincRNAs: Long intergenic noncoding RNAs; ncRNA: Noncoding RNA; RPKM: Reads Per Kilobase per Million mapped reads; XCI: X chromosome inactivation