Figure 1 | BMC Genomics


From: Pseudogenes transcribed in breast invasive carcinoma show subtype-specific expression and ceRNA potential


Reliable quantification of pseudogene expression. (A) Example showing that even an ideal aligner can produce uniquely misaligned reads in the presence of mutations and read errors if alignments to unmappable regions are considered trustworthy. In this example, the gene and its pseudogene differ at a single nucleotide, so the two regions are not identical. However, the copy of the gene in the genome of the subject being sequenced has undergone mutation and differs from the reference genome at three positions, and the RNA-seq reads produced from this gene carry these mutations. If the reads are then mapped back to the reference genome allowing up to two mismatches, they map only to the pseudogene rather than to the gene that produced them. The problem arises because the gene and pseudogene sequences are similar enough that unique misalignment cannot be ruled out. (B) If a read has at least two alignments, at distances $\delta_1$ and $\delta_2$ from the reference genome, then the true position of the read should be considered ambiguous unless $|\delta_1 - \delta_2| > \varepsilon$ for some integer safety margin $\varepsilon > 0$. (C) Pipeline for computing RPKUM expression levels for pseudogenes. (D) "Synthetic regions" around splice junctions are used to extend mappability to the transcriptome. A synthetic region is constructed by concatenating the last $k-1$ nucleotides of the donor exon with the first $k-1$ nucleotides of the acceptor exon on either side of a splice junction. Any $k$-mer that crosses the splice junction therefore occurs in the synthetic region.
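The read-level rules in panels (A), (B) and (D) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the pipeline in panel (C): the sequences, the helper names (`hamming`, `hits_within`, `is_ambiguous`, `synthetic_region`, `junction_kmers`) and the use of Hamming distance (mismatches only, no indels) as the alignment distance δ are all hypothetical. The sketch reproduces the scenario in (A), where a read from a mutated gene copy matches only the pseudogene within two mismatches, shows that the margin rule in (B) flags that read as ambiguous for ε = 1, and checks that every junction-spanning k-mer occurs in the synthetic region from (D).

```python
# Toy illustration of panels (A), (B) and (D). Sequences, helper names and the
# use of Hamming distance (no indels) as the alignment distance are hypothetical.

def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def hits_within(read: str, targets: dict, max_mismatches: int) -> list:
    """Names of targets the read aligns to with at most max_mismatches mismatches."""
    return [name for name, seq in targets.items()
            if hamming(read, seq) <= max_mismatches]


def is_ambiguous(delta1: int, delta2: int, epsilon: int) -> bool:
    """Panel (B): two alignments at distances delta1 and delta2 leave the read's
    position ambiguous unless |delta1 - delta2| > epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be a positive integer")
    return abs(delta1 - delta2) <= epsilon


def synthetic_region(donor_exon: str, acceptor_exon: str, k: int) -> str:
    """Panel (D): last k-1 bases of the donor exon + first k-1 bases of the acceptor."""
    if k < 2:
        raise ValueError("k must be at least 2")
    return donor_exon[-(k - 1):] + acceptor_exon[:k - 1]


def junction_kmers(donor_exon: str, acceptor_exon: str, k: int) -> list:
    """All k-mers of the spliced sequence that contain bases from both exons."""
    spliced = donor_exon + acceptor_exon
    j = len(donor_exon)  # index of the first acceptor base
    return [spliced[s:s + k]
            for s in range(max(0, j - k + 1), j)
            if s + k <= len(spliced)]


# Panel (A): the pseudogene differs from the reference gene at one position
# (index 4); the subject's gene copy carries three mutations (indices 1, 4, 8).
GENE_REF = "ACGTACGTAC"
PSEUDOGENE = "ACGTTCGTAC"
READ = "AGGTTCGTCC"

targets = {"gene": GENE_REF, "pseudogene": PSEUDOGENE}
print(hits_within(READ, targets, max_mismatches=2))  # ['pseudogene']

d_gene, d_pseudo = hamming(READ, GENE_REF), hamming(READ, PSEUDOGENE)
print(d_gene, d_pseudo, is_ambiguous(d_gene, d_pseudo, epsilon=1))  # 3 2 True

region = synthetic_region("AAACCCGGG", "TTTAAACCC", k=4)
print(region)  # GGGTTT
print(all(km in region for km in junction_kmers("AAACCCGGG", "TTTAAACCC", 4)))  # True
```

Comparing exactly two distances follows the caption literally; for reads with more than two candidate loci, comparing the best and second-best distances is one plausible generalization, but that is an assumption here. The slicing in `synthetic_region` also assumes both exons are at least k−1 bases long; shorter exons would allow k-mers that span more than one junction.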
