Figure 4: Transcript abundance distribution predicted from sampled reads. (a) Each point on a solid curve gives the number of transcripts (Y-axis) that had a specific normalized count (X-axis). Note that the number of transcripts with small counts relative to coverage drops sharply; for example, the curve for 100K reads drops at log(X) ≤ 5. The dotted lines correct for this undersampling, and the correction enables the computation of (b), the probability of detecting an arbitrary transcript. The solid lines correspond to predictions made from the empirical (or simulated empirical) distribution; the dotted lines correspond to corrected values from regression (see METHODS). Note the close fit obtained after correction, even with only 100,000 reads.