Changes in CodingQuarry prediction accuracy at various stages of gene prediction.
Gene-level sensitivity and specificity are shown at various stages of a CodingQuarry run (see Figure 1 and Methods). Results show comparisons with Sc. pombe where (A, left-hand panel) RNA-seq strand information was used and (B, right-hand panel) strand information was ignored. “Longest ORF” is the initial training set, found by taking the longest open reading frame in each transcript to be a gene. Stage 1 predictions are made from transcript sequences. Stage 2 predicts from the genome sequence, adding to and replacing some of the stage 1 predictions. Likely false-positive genes are filtered out (see Implementation section) before the set of predicted genes is written as the “final output”, which is the annotation generated by CodingQuarry.
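
As an illustration of the “Longest ORF” step, the following is a minimal sketch of picking the longest open reading frame in a single transcript. It is not CodingQuarry's code: the function name `longest_orf` is hypothetical, and the sketch assumes an ORF runs from an ATG to the first in-frame stop codon (TAA, TAG or TGA), searches only the given strand, and ignores ORFs that lack a stop codon. The rules CodingQuarry actually applies may differ.

```python
# Assumption: standard stop codons; not taken from the CodingQuarry source.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(transcript: str) -> str:
    """Return the longest ATG-to-stop ORF on the given strand, or '' if none.

    Hypothetical helper for illustration only.
    """
    seq = transcript.upper()
    best = ""
    # Scan each of the three forward reading frames independently.
    for frame in range(3):
        start = None  # index of the ATG that opened the current ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]  # include the stop codon
                if len(orf) > len(best):
                    best = orf
                start = None  # look for the next ORF in this frame
    return best


if __name__ == "__main__":
    # Toy transcript containing a 9 nt ORF and a 15 nt ORF in different frames.
    toy = "CC" + "ATGAAATAG" + "G" + "ATGCCCGGGTTTTAA"
    print(longest_orf(toy))
```

Under these assumptions, applying the function to every assembled transcript would give one candidate gene per transcript, which corresponds to the initial training set described in the caption.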