Pseudogenes are “fossil” copies of functional genes that have lost their potential as DNA templates for functional products [1–6]. Although the definition of a pseudogene remains somewhat fuzzy, most pseudogenes are defined operationally by bioinformatics criteria, e.g., genomic scans for signatures of homology to known genes. Ribosomal protein (RP) pseudogenes represent the largest class of pseudogenes in the human genome: more than 2000 RP pseudogenes have been identified by bioinformatics scans of the genomic sequence.
These pseudogenes are commonly thought to be non-functional because they lack promoters and/or carry loss-of-function mutations. Indeed, the vast majority of them either carry disabling mutations, such as in-frame stop codons, or lack proper regulatory sequences, such as promoters, mTOP signals, and first introns. Interestingly, three RP pseudogenes with 89%-95% sequence identity to their parent (progenitor) RP genes were found to be transcribed, and appear to be functional, in a bioinformatics scan of cDNA and expressed sequence tag (EST) databases that was confirmed by PCR and Northern blot. A genome-wide bioinformatics scan identified over 2000 potential pseudogenes. Moreover, the six RP pseudogenes shared at syntenic loci between the human and mouse genomes were found to be more conserved than other RP pseudogenes.
However, experimental data validating pseudogene expression have been lacking. It is unclear from the literature whether the reported cases are merely anecdotal or whether pseudogenes do play some cellular role. Progress on this question has been hindered largely by the lack of methods for identifying pseudogene transcription. The traditional method of transcriptome profiling, the gene expression microarray, is not sensitive enough to distinguish transcripts of very similar gene sequences.
Recent advances in next-generation sequencing allow direct, massive transcriptome sequencing (RNA-seq) and thus provide unprecedented insight into all transcribed sequences. For example, RNA-seq has been applied to detect complex transcriptional activities such as alternative splicing [10, 11] and allele-specific expression. More recently, RNA-seq has been applied to reveal RNA editing events. However, to the best of our knowledge, there have been no attempts to detect the transcription of pseudogenes in RNA-seq data. The main challenge for pseudogene identification in RNA-seq data is the difficulty of high-fidelity read mapping. Because the sequences of pseudogenes are highly similar to the mRNA sequences of their parent genes, specialized read mapping methods are required to detect reads generated unambiguously from pseudogenes.
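The mapping ambiguity described above can be illustrated with a small sketch. It is not the pipeline used in this study; it is a toy example, under stated assumptions, of why a read is only informative when it overlaps a position at which the pseudogene and its parent differ. The function names (`min_mismatches`, `assign_read`), the thresholds (`max_mismatches`, `min_margin`), and the two short sequences are all hypothetical and chosen only for illustration; a real analysis would rely on a dedicated read aligner rather than this brute-force scan.

```python
def min_mismatches(read: str, reference: str) -> int:
    """Smallest Hamming distance between the read and any
    same-length, ungapped window of the reference."""
    n = len(read)
    if n == 0 or n > len(reference):
        raise ValueError("read must be non-empty and no longer than the reference")
    return min(
        sum(a != b for a, b in zip(read, reference[i:i + n]))
        for i in range(len(reference) - n + 1)
    )


def assign_read(read: str, parent: str, pseudogene: str,
                max_mismatches: int = 2, min_margin: int = 1) -> str:
    """Label a read as 'parent', 'pseudogene', 'ambiguous', or 'unmapped'.

    Hypothetical rule: the read must match one sequence within
    max_mismatches, and must match it better than the other sequence
    by at least min_margin mismatches; otherwise it is not counted.
    """
    d_parent = min_mismatches(read, parent)
    d_pseudo = min_mismatches(read, pseudogene)
    if min(d_parent, d_pseudo) > max_mismatches:
        return "unmapped"
    if abs(d_parent - d_pseudo) < min_margin:
        return "ambiguous"
    return "pseudogene" if d_pseudo < d_parent else "parent"


# Toy sequences (assumed): the pseudogene differs from its parent at two positions.
PARENT = "ACGTACGTACGTACGT"
PSEUDO = "ACGTACCTACGTAAGT"

if __name__ == "__main__":
    for read in ("TACCTACG", "ACGTACGT", "ACGTA", "GGGGGGGG"):
        print(read, assign_read(read, PARENT, PSEUDO))
```

In this toy setting, the read `ACGTA` falls entirely within a region shared by both sequences and is therefore labeled ambiguous, whereas `TACCTACG` covers a pseudogene-specific base and can be attributed to the pseudogene. Raising `min_margin` makes the assignment more conservative, which is one simple way to leave room for mismatches that arise from SNPs or RNA editing rather than from the pseudogene itself.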
In this study, we conduct a bioinformatics analysis of pseudogene expression using RNA-seq data from 16 human tissues in the Illumina Human Body Map 2.0 project. We first describe our new computational pipeline for detecting pseudogene expression, which disentangles sequencing reads of pseudogenes from those of their parent genes while accounting for possible mismatches due to SNPs and RNA editing. This is followed by a description of our findings and a discussion of their implications.