Overview of the in silico coding sORF prediction. (A) Histogram of the total number of sORFs by ORF length (in amino acids, AA). (B) Distribution of sORFs according to their genomic location. sORFs overlapping more than one category are grouped as “others”. (C) Evaluation of the sORF coding probability. The fractions of annotated and predicted coding and non-coding sORFs within the test dataset are plotted. (D) Visual representation of the classification of all 9,612 test sORFs by both SVMs (SVMlight and libSVM). True coding sORFs are shown in green and true non-coding sORFs in red (see Additional file 1: Figure S2).