We present a comprehensive characterization of genic short RNAs in the human kidney cell line HEK293. We found that only a limited number of genes give rise to short RNAs, predominantly endo-siRNAs that are associated with AGO1 and correlate with RNAPII occupancy at the 5′ ends of genes. This RNA signature scales neither with the transcription of the related gene nor with the expression level of the mRNA. Moreover, the short RNA pattern is resistant to mutagenic insults, indicating that the endo-siRNAs are the result of a controlled synthesis or the byproduct of a controlled process.
Three explanations for how these endo-siRNAs are produced are plausible and not mutually exclusive. First, RNAPII produces a variety of RNA by-products, related to pausing of the enzyme during initiation and termination of transcription or arising as a consequence of incomplete splicing. The most prominent of these short RNA species are denoted “promoter-associated small RNAs” (PASRs) and “transcription initiation RNAs” (tiRNAs). Their occurrence is linked to highly active promoters and does not appear to involve RNA interference, since they are either capped (PASRs) or do not co-precipitate with Argonaute (tiRNAs). tiRNAs are hypothesized to be generated by backtracking of the RNA polymerase and the action of an intrinsic endonuclease activity. The fact that the short RNAs identified in this screen show a very restricted expression pattern and do not scale with RNAPII occupancy genome-wide suggests that the RNA signature is not an obligatory by-product of transcription.
Second, the reads could represent degradation products of cellular mRNAs and reflect physiological mRNA turnover. Several observations, however, argue against this hypothesis. Only about 1–2% of genes produce short RNAs, whereas roughly 40–50% of mRNAs generate positive calls in expression analysis. Moreover, careful expression analysis of 10 gene clusters, each comprising a gene producing short RNAs and its neighbors (discussed below), showed no correlation between mRNA levels and short RNA reads. On the other hand, we identified genes where a very low number of randomly distributed reads could well be explained by RNA breakdown or sequencing errors; these were not studied in further detail. The reproducibility of the endo-siRNA signature after mutagenesis, however, suggests that such random events do not contribute significantly to the sequencing output.
Third, the short RNAs identified in this screen could be synthesized by components of the RNAi machinery. Nuclear localization of key components such as Dicer, Drosha and Argonaute proteins, and shuttling of related RNA-protein complexes between the cytoplasm and the nucleus, have been reported [35, 36]. Our results from the AGO1 CLASH experiments, together with the length profile of the short RNAs, suggest that the RNAseq reads consist largely of endo-siRNAs. At what stage a double-stranded RNA precursor is formed and dicing occurs is not yet known.
One particular focus of the present work was to investigate a potential link between convergent transcription, i.e. the co-expression of sense and antisense transcripts, and the formation of endo-siRNAs. Genic endo-siRNAs have been documented in several systems, predominantly in C. elegans and Drosophila, and also in mouse [18, 37–40]. In mammalian systems, however, the link between convergent transcription and RNA interference is controversial. Our results confirm the existence of endo-siRNAs in human cells, and the fact that 53% of these derive from SA loci provides circumstantial evidence that sense/antisense transcription may be involved. Our observation that genes on the X chromosome tend to avoid the possibility of convergent transcription indicates that situations of convergent transcription may indeed exist in somatic, diploid cells.
There is a substantial body of evidence that siRNAs can promote both transcriptional silencing and activation through chromatin modifications [42, 43]. The siRNAs used in those experiments were synthesized in vitro and applied at high concentrations, indicating that the process was inefficient. Our findings also suggest that the cell culture model used may not express balanced levels of all the components essential for the processing of NATs into endo-siRNAs and for transcriptional silencing. For example, HeLa cells transfected with vectors expressing both thymidylate synthase (TS) sense and antisense transcripts failed to generate TS-related siRNAs. On the other hand, highly expressed convergent transcripts from a single plasmid were recently demonstrated to produce siRNAs that induced transcriptional gene silencing in trans. Our findings support the conclusion that the production of endo-siRNAs from NATs is inefficient in HEK293 cells and probably in other cell culture models as well. Indeed, genome-wide studies suggest that NAT-linked endo-siRNA processing is highly cell-type specific, occurring for example in developing sperm cells, where antisense transcripts and endo-siRNAs are prominently found [6, 45].