Fig. 3 | BMC Genomics

Fig. 3

From: Common and phylogenetically widespread coding for peptides by bacterial small RNAs

a Visualization of a machine-learning classifier for combining features into a single predictive score. A bagged decision tree classifier was trained on B. subtilis ORF subsets and mock ORFs, and its output is plotted for each value of SD score and composition bias score (above) or Dn/Ds (below). For each position, the hidden third feature is subsampled and the classifier output is averaged over these possibilities.

b Number of sRNA ORFs and mock ORFs classified as coding as a function of the coding score threshold in B. subtilis. Gray band represents 95% confidence intervals based on 20 mock ORF sets.

c Number of ORFs predicted as coding above background expectation for B. subtilis. For each coding score threshold, a false-discovery rate q-value is calculated using the ratio between the sRNA ORFs and mock ORFs plotted in (b). The difference between these two, i.e. the number of ORFs predicted coding above background, is plotted on the y-axis in red, with 95% confidence intervals plotted in gray. The cutoff with the highest sensitivity is marked (dashed black line). The calculation is also made separately for subsets of ORFs having no overlaps with annotated coding ORFs, or those with sense or antisense overlaps.

d Left: Estimated number of ORFs under selection for coding for all species. The numbers predicted for each species were calculated as illustrated in (c) at the most sensitive cutoff and a correction was applied for random fluctuations. Error bars represent 95% confidence intervals on the total estimate, and the breakdown by overlap with annotated coding ORFs is represented by colors. Right: The predicted coding ORFs are sampled to estimate the fraction of sRNAs having at least one coding ORF, and error bars represent 95% confidence intervals. Inset: The fraction of sRNAs having a predicted coding ORF for each species in box plot form. Box represents first and third quartiles and median; whiskers extend to most extreme values.
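The threshold scan described for panels b and c can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that each mock ORF set matches the sRNA ORF set in size, that the q-value is the mean mock count divided by the sRNA ORF count at each threshold, and that the "most sensitive" cutoff is the one maximizing the excess over background. The function names and the toy scores are hypothetical, and the 95% confidence intervals and the correction for random fluctuations are omitted.

```python
from statistics import mean


def count_at_or_above(scores, threshold):
    """Number of ORFs whose coding score meets the threshold."""
    return sum(1 for s in scores if s >= threshold)


def excess_over_background(real_scores, mock_score_sets, thresholds):
    """For each threshold, compare sRNA ORF calls with the mock expectation."""
    rows = []
    for t in thresholds:
        n_real = count_at_or_above(real_scores, t)
        # Background expectation: average count over the mock ORF sets.
        n_mock = mean(count_at_or_above(m, t) for m in mock_score_sets)
        # Assumed q-value: expected false calls / observed calls, capped at 1.
        q_value = min(1.0, n_mock / n_real) if n_real > 0 else 1.0
        rows.append({
            "threshold": t,
            "n_real": n_real,
            "n_mock": n_mock,
            "q_value": q_value,
            "excess": n_real - n_mock,
        })
    return rows


def most_sensitive_cutoff(rows):
    """Assumed reading of 'highest sensitivity': the largest excess."""
    return max(rows, key=lambda r: r["excess"])


# Toy scores for illustration only (not data from the paper).
real = [0.9, 0.8, 0.7, 0.4, 0.2]
mocks = [
    [0.6, 0.2, 0.1, 0.1, 0.05],
    [0.75, 0.2, 0.1, 0.3, 0.05],
]
rows = excess_over_background(real, mocks, thresholds=[0.25, 0.5, 0.75])
best = most_sensitive_cutoff(rows)
```

In this sketch, `best["threshold"]` plays the role of the dashed black line in panel c, and `best["excess"]` is the kind of per-species estimate that panel d aggregates.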
