General layout of the identification pipeline. The identification pipeline consists of five steps, as outlined in the workflow. Central to the analysis is the MySQL sORF database, in which all obtained and calculated data are stored. This overall sORF data matrix can be downloaded via Additional file 1. (1) Genome-wide search for sORFs (with high coding potential) with the sORFfinder package. (2) Calculation of different peptide conservation measures based on the UCSC Mouse multiple alignments. (3) Assessment of the coding capability of the sORFs by means of a Support Vector Machine (SVM) learning algorithm. (4) Inspection of the sORF locations for the presence of ribosome profiling signals obtained from mESC experiments. (5) Genome-wide visualization of all (experimental) data and all sORF information in our in-house developed H2G2 Genome Browser.
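To make step (1) more concrete, the following is a minimal sketch of a naive open reading frame scan. It is not the sORFfinder algorithm and does not score coding potential; it only illustrates what locating candidate sORFs in a sequence involves. The function name `find_sorfs`, the length bounds `min_codons` and `max_codons`, and the restriction to the forward strand are all illustrative assumptions, not values or choices taken from the pipeline.

```python
# Illustrative sketch only: a naive ATG..stop scan, not sORFfinder itself.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_sorfs(seq, min_codons=10, max_codons=100):
    """Return (start, end, orf_sequence) tuples for forward-strand ORFs.

    Coordinates are 0-based and half-open; the ORF sequence includes the
    stop codon. Length is counted in codons from ATG up to, but not
    including, the stop codon. The default bounds are hypothetical.
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        start = None  # position of the first ATG since the last stop codon
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if min_codons <= n_codons <= max_codons:
                    hits.append((start, i + 3, seq[start:i + 3]))
                start = None  # resume searching after this stop codon
    return hits
```

A full genome-wide search would also need to scan the reverse complement and, as in the pipeline described above, filter candidates by coding potential before storing them in the database.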