Fig. 2 | BMC Genomics


From: Common and phylogenetically widespread coding for peptides by bacterial small RNAs


sRNA ORFs have features characteristic of coding ORFs.

a The three features used to separate coding from non-coding sequences, illustrated with a B. subtilis sRNA that was annotated by tiling array: (1) The Shine-Dalgarno free energy (SD score) is the free energy of pairing with the 16S ribosomal RNA sequence; this pairing enhances translation to a degree that varies by species. (2) The Dn/Ds test measures whether a sequence is significantly more conserved at the amino acid level than at the DNA level by comparing the rates of non-synonymous (Dn) and synonymous (Ds) mutations. For three selected orthologs, synonymous mutations are highlighted in green, non-synonymous mutations in red, and start and stop codons in bold. (3) The composition bias score measures phase-specific nucleotide, dinucleotide, and trinucleotide occurrences, using a logistic regression fitted to training data to learn the difference between coding and non-coding sequence. Bars show each codon's contribution to the logistic regression score; the black line shows the cumulative score.

b An example of using a single feature to predict coding ORFs in B. subtilis. Annotated coding ORFs (orange) have stronger Shine-Dalgarno free energies than expected by chance (green, with 95% confidence limits in gray). Actual sRNA ORFs follow the chance distribution, except for an excess of ORFs with free energies stronger (more negative) than about -10 kJ/mol (light blue region).

c The number of ORFs predicted as coding based on individual features. For each feature, the cutoff that maximized the difference between sRNA ORFs and the background expectation was selected. After empirically correcting for the degree of freedom introduced by choosing this cutoff (see Methods), the number of ORFs predicted as coding above background is plotted with 95% confidence intervals.

d Different features sometimes implicate the same ORFs as coding. B. subtilis sRNA ORFs were divided into those with an SD score stronger than -11 kJ/mol (n=27) and the rest (others). ORFs with a strong SD score also had higher Dn/Ds log likelihoods (left panel) but only marginally higher composition bias scores (right panel).
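To make the Dn/Ds comparison in panel a concrete, here is a minimal sketch that estimates the ratio for one pair of aligned ortholog ORFs using the simple Nei-Gojobori counting method. This is a swapped-in technique for illustration only: the figure reports Dn/Ds log likelihoods, which points to a likelihood-based test across orthologs that is not reproduced here. The function names are hypothetical, the alignment is assumed gap-free and in frame, and no multiple-hit correction or stop-codon exclusion is applied.

```python
from itertools import permutations, product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_sites(codon: str) -> float:
    """Number of synonymous sites: per position, the fraction of the 3 possible
    single-base changes that keep the amino acid, summed over the 3 positions."""
    aa = CODON_TABLE[codon]
    sites = 0.0
    for i in range(3):
        for base in BASES:
            if base != codon[i]:
                mutant = codon[:i] + base + codon[i + 1:]
                if CODON_TABLE[mutant] == aa:
                    sites += 1 / 3
    return sites


def codon_differences(c1: str, c2: str) -> tuple[float, float]:
    """(synonymous, non-synonymous) differences between two codons,
    averaged over every order in which the differing positions could mutate."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    paths = list(permutations(positions))
    syn = nonsyn = 0.0
    for path in paths:
        current = c1
        for i in path:
            step = current[:i] + c2[i] + current[i + 1:]
            if CODON_TABLE[step] == CODON_TABLE[current]:
                syn += 1
            else:
                nonsyn += 1
            current = step
    return syn / len(paths), nonsyn / len(paths)


def dn_ds(seq1: str, seq2: str) -> float:
    """Ratio of non-synonymous to synonymous p-distances for two aligned ORFs."""
    seq1 = seq1.upper().replace("U", "T")
    seq2 = seq2.upper().replace("U", "T")
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned, of equal length, and in frame")
    s_sites = n_sites = s_diffs = n_diffs = 0.0
    for k in range(0, len(seq1), 3):
        c1, c2 = seq1[k:k + 3], seq2[k:k + 3]
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        s_sites += s
        n_sites += 3 - s
        sd, nd = codon_differences(c1, c2)
        s_diffs += sd
        n_diffs += nd
    dn = n_diffs / n_sites
    ds = s_diffs / s_sites if s_sites else 0.0
    if ds == 0:
        # No synonymous changes: the ratio is undefined (nan) or unbounded (inf).
        return float("inf") if dn > 0 else float("nan")
    return dn / ds
```

A ratio well below 1 across orthologs is the signature of amino-acid-level conservation that panel a uses as evidence of coding; for example, the synonymous CTG to CTA change in `dn_ds("ATGCTGAAA", "ATGCTAAAA")` gives 0.0.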
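The composition bias feature in panel a can be sketched as a logistic regression over phase-specific 1-, 2-, and 3-mer counts, with each codon's share of the score summed into a running total, as in the bar-and-line plot. The feature encoding, the plain stochastic-gradient trainer, its hyperparameters, and all function names below are assumptions for illustration; the paper's Methods define the actual features and fitting procedure.

```python
import math
from collections import defaultdict
from itertools import accumulate


def phase_features(seq):
    """Yield (codon_index, feature) for every phase-specific 1-, 2- and 3-mer.
    A k-mer is assigned to the codon in which it starts."""
    seq = seq.upper()
    for i in range(len(seq)):
        phase = i % 3  # position of this base within its codon
        for k in (1, 2, 3):
            if i + k <= len(seq):
                yield i // 3, f"{phase}:{seq[i:i + k]}"


def feature_counts(seq):
    counts = defaultdict(float)
    for _, feat in phase_features(seq):
        counts[feat] += 1.0
    return counts


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score_counts(x, weights, bias):
    return bias + sum(weights.get(f, 0.0) * v for f, v in x.items())


def train_logistic(seqs, labels, epochs=200, lr=0.01, l2=1e-3):
    """Stochastic gradient descent on L2-regularized log-loss (label 1 = coding)."""
    weights = defaultdict(float)
    bias = 0.0
    data = [(feature_counts(s), y) for s, y in zip(seqs, labels)]
    for _ in range(epochs):
        for x, y in data:
            err = sigmoid(score_counts(x, weights, bias)) - y
            bias -= lr * err
            for feat, value in x.items():
                weights[feat] -= lr * (err * value + l2 * weights[feat])
    return dict(weights), bias


def score(seq, weights, bias):
    """Log-odds that an ORF is coding under the fitted model."""
    return score_counts(feature_counts(seq), weights, bias)


def codon_contributions(seq, weights):
    """Per-codon contributions (the bars) and their running sum (the line).
    The final cumulative value plus the bias equals score(seq, ...)."""
    per_codon = [0.0] * math.ceil(len(seq) / 3)
    for codon_idx, feat in phase_features(seq):
        per_codon[codon_idx] += weights.get(feat, 0.0)
    return per_codon, list(accumulate(per_codon))
```

Because the score is linear in the feature counts, it decomposes exactly into per-codon terms, which is what makes a cumulative-score plot along the ORF possible.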
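Panel c picks, for each feature, the cutoff that maximizes the excess of sRNA ORFs over the background expectation. A minimal sketch of that search follows, assuming the background is a sample of feature values from sequences expected to be non-coding. The resampling step in `corrected_excess` is only one plausible way to account for the freedom of choosing the cutoff; it is a hypothetical stand-in, not the procedure described in the Methods, and both function names are illustrative.

```python
import random


def best_cutoff(observed, background, higher_is_coding=True):
    """Cutoff maximizing (# observed ORFs past the cutoff) minus the number
    expected if observed values followed the background distribution."""
    sign = 1 if higher_is_coding else -1  # flip when lower values look more coding-like
    n, m = len(observed), len(background)
    best_c, best_excess = None, float("-inf")
    for c in sorted(set(observed)):
        obs = sum(1 for x in observed if sign * x >= sign * c)
        expected = n * sum(1 for b in background if sign * b >= sign * c) / m
        excess = obs - expected
        if excess > best_excess:
            best_c, best_excess = c, excess
    return best_c, best_excess


def corrected_excess(observed, background, higher_is_coding=True, n_null=1000, seed=0):
    """HYPOTHETICAL correction: subtract the mean best excess that the same
    cutoff search finds in null samples drawn from the background."""
    cutoff, excess = best_cutoff(observed, background, higher_is_coding)
    rng = random.Random(seed)
    null_excesses = [
        best_cutoff(rng.choices(background, k=len(observed)), background, higher_is_coding)[1]
        for _ in range(n_null)
    ]
    return cutoff, excess - sum(null_excesses) / n_null
```

For SD free energies, where more negative values mean stronger pairing, pass `higher_is_coding=False`. The correction matters because searching over cutoffs always finds some positive excess, even in ORF sets drawn entirely from the background.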
