When designing a gene expression experiment with the goal of measuring steady-state levels of mRNA, care should be taken to isolate RNA from the correct cellular compartment. Currently, the majority of RNA-Seq experiments sequence mature transcripts (via poly-A tail enrichment) in the total RNA fraction, which also contains immature mRNA species to some degree. Removing the ~5–10 times more complex nuclear RNA could reduce the overall complexity, enable deeper sampling of the remaining mRNA population, and thus increase sensitivity. However, whereas isolating cytoplasmic RNA instead of total RNA is feasible when working with cell cultures, total RNA is the only choice for many other biological models.
Despite the proposed advantage of sequencing only cytoplasmic RNA for cells in suspension, it is still not clear whether the cytoplasmic fraction represents the full complexity of the steady-state RNA of whole cells. One argument against using cytoplasmic RNA could be that the translation levels of certain transcripts might be regulated by their transportation rate from nucleus to cytoplasm [6, 7]. Moreover, the transportation rate of transcripts from nucleus to cytoplasm could depend on particular properties of the transcript such as length or sequence.
Here, we investigated how the representation of transcripts differs between the cytoplasmic and total RNA fractions. There were 405, 1072, and 1380 transcripts in U-2 OS, U-251MG, and A-431, respectively, that were detected at higher levels in total RNA than in cytoplasmic RNA. This indicates that a substantial proportion of the mature transcripts were retained in the nucleus and contributed to higher detection levels in the total RNA fraction, since the cytoplasmic RNA lacked these nuclear transcripts. UTR fold energies can influence post-transcriptional regulation, and it has been shown that UTR fold energies of mRNA transcripts are lower than those of random sequences of the same length with the same mononucleotide frequency [20, 21]. Interestingly, in all cell lines, most of the genes detected at higher levels in total RNA had long and structured 5’ and 3’ UTR sequences as well as longer coding sequences. Nuclear retention of such transcripts may therefore cause their RNA levels to be underestimated in the cytoplasmic fraction. Conversely, shorter genes or genes with shorter UTRs were overestimated in the cytoplasmic fraction. This mis-estimation could introduce biases and should be considered in transcriptome analyses.
Hence, our data indicate that the transportation rate of transcripts from nucleus to cytoplasm depends on the sequence features of transcripts. However, selective degradation of transcripts, for example by the exosome complex, and differences in transcript half-life cannot be ruled out as contributing factors. The comparison of microRNA targets per gene for all three cell lines shows that there is a higher number of microRNA targets per gene for genes that were detected at higher levels in the total RNA fraction than in the cytoplasmic RNA fraction. This could indicate that these genes are degraded to a higher degree upon entering the cytoplasm. However, this does not explain the higher number of genes with structured 5’ UTR sequences and longer coding sequences in the total RNA fraction. Therefore, we propose that nuclear retention and miRNA-mediated RNA degradation in the cytoplasm are together the main contributors to the differential detection of genes.
There were 512, 1203, and 1334 genes for U-2 OS, U-251MG, and A-431, respectively, that were detected at higher levels in cytoplasmic RNA than in total RNA. There is no obvious biological reason for this. However, a technical explanation can be suggested: owing to the lower representation of longer transcripts in the cytoplasmic fraction, relatively more sequencing space was available for the remaining transcripts. This could have allowed for better coverage of shorter transcripts in cytoplasmic RNA than in total RNA. Indeed, most of the genes detected at higher levels in the cytoplasmic fraction had shorter coding sequences. However, not all the differentially detected genes were the same in all cell lines. This suggests that, apart from transcript sequence and structure, there are also cell-specific factors that affect the nuclear retention of transcripts.
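The technical explanation above can be illustrated with a small numerical sketch. It is not part of our analysis pipeline: the gene names, copy numbers, and lengths are hypothetical, and it assumes that reads are sampled in proportion to copy number times transcript length and that expression is reported relative to library size.

```python
def read_shares(pool):
    """Return each transcript's expected share of all reads.

    Assumption: reads are sampled in proportion to copies * length.
    """
    weights = {name: copies * length_kb
               for name, (copies, length_kb) in pool.items()}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical whole-cell pool: (polyadenylated copies, length in kb).
total_rna = {"short_gene": (100, 1.0), "long_gene": (200, 4.0)}

# Hypothetical cytoplasmic pool: half of the long transcripts are
# retained in the nucleus; the short transcript is fully exported.
cytoplasmic_rna = {"short_gene": (100, 1.0), "long_gene": (100, 4.0)}

total_shares = read_shares(total_rna)
cyto_shares = read_shares(cytoplasmic_rna)

for gene in total_rna:
    ratio = cyto_shares[gene] / total_shares[gene]
    print(f"{gene}: total={total_shares[gene]:.3f} "
          f"cytoplasmic={cyto_shares[gene]:.3f} ratio={ratio:.2f}")
```

Under these assumptions the short transcript receives a 1.8-fold larger share of reads in the cytoplasmic library although its copy number is unchanged, whereas the long transcript's share falls to 0.9 of its value in total RNA. A purely compositional shift of this kind would be enough to make shorter genes appear more highly expressed in cytoplasmic RNA.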
Our results show that the total and cytoplasmic fractions yield different representations of steady-state RNA levels. It can be argued that cytoplasmic polyadenylated RNA might correlate better with protein abundance levels if one assumes that the contribution of polyadenylated nuclear RNA to the steady-state mRNA levels in the cytoplasm is not negligible. However, a previous study of mouse fibroblasts investigated mRNA and protein levels in relation to half-lives, transcription rates, and translational control and found that mRNA explained only around 40% of the variability in protein levels. Our data show that cytoplasmic and total RNA correlated very similarly with protein abundance levels in all cell lines, and the level of correlation is similar to what has previously been published. This indicates that neither the nucleus-to-cytoplasm transportation rate nor the miRNA-mediated degradation of transcripts affects protein abundance at a global level. However, future studies with synchronized cells and different time points would shed more light on the correlation between the RNA and protein populations in a cell. Furthermore, including all transcripts, and not only polyadenylated RNA, would give a more complete overview of the RNA population in a given cell type.