Inference of subclonal heterogeneity from RNA-Seq data. The scheme in panel A shows how allelic imbalance arises along a single exon. The exon of interest (black lines) carries a germline polymorphism on one of its copies at position X0 (grey dot). In addition, of three subclones, one has acquired a somatic point mutation at position X1 and two have acquired a somatic point mutation at position X2 (red dots). Assuming that both alleles are expressed at equal levels and that the expression level is the same across all subclones, the measured allelic imbalance is determined by the unequal distribution of somatic mutations, as shown in the table. Differences in allele-specific or subclonal expression levels affect the exon as a whole but do not change the overall picture of distinct allelic imbalances at the X0, X1 and X2 loci. Panel B shows allelic imbalance along AKIP1 (NM_020642), which reflects subclonal heterogeneity. The score of each non-reference nucleotide was calculated as 0.5 - |0.5 - #B/(#A + #B)|, where #A is the number of reads matching the reference sequence and #B is the number of reads with a mismatch (non-reference nucleotide) at the given position. AKIP1 exhibits a multi-modal B-allele frequency at heterogeneous nucleotides. Vertical bars represent the 95% confidence interval of the estimated proportion, assuming an underlying binomial distribution. All non-reference nucleotides shown are supported by at least 100 reads. Vertical dashed lines mark the boundaries between exons in the transcript.
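The quantities in this caption can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The function names are hypothetical. The Wilson score interval is an assumed choice, since the caption states only that a binomial distribution underlies the interval. The expected fractions further assume that each somatic mutation sits on one of two equally expressed copies.

```python
import math


def folded_score(n_ref, n_alt):
    """Score from the caption: 0.5 - |0.5 - #B/(#A + #B)|, which lies in [0, 0.5]."""
    frac = n_alt / (n_ref + n_alt)
    return 0.5 - abs(0.5 - frac)


def wilson_interval(n_alt, n_total, z=1.959963984540054):
    """Approximate 95% binomial interval for the non-reference read fraction.

    The Wilson score method is an assumption; the caption does not name a method.
    """
    p = n_alt / n_total
    denom = 1.0 + z * z / n_total
    center = (p + z * z / (2 * n_total)) / denom
    half = z * math.sqrt(p * (1 - p) / n_total + z * z / (4 * n_total ** 2)) / denom
    return center - half, center + half


def expected_fraction(n_mutated_subclones, n_subclones, copies=2):
    """Expected variant read fraction under the panel A assumptions:
    equal expression of all alleles and subclones, variant on one of `copies` alleles
    (the one-copy placement of somatic mutations is an assumption of this sketch)."""
    return n_mutated_subclones / (n_subclones * copies)


if __name__ == "__main__":
    # Panel A scenario: X0 is germline (all 3 subclones), X1 in 1 subclone, X2 in 2.
    for locus, k in (("X0", 3), ("X1", 1), ("X2", 2)):
        print(locus, expected_fraction(k, 3))

    # Hypothetical read counts for one position (not taken from the figure).
    n_ref, n_alt = 140, 60
    print(folded_score(n_ref, n_alt), wilson_interval(n_alt, n_ref + n_alt))
```

Because the score folds the non-reference fraction around 0.5, positions with fractions f and 1 - f receive the same score, so the score does not depend on which allele happens to match the reference.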