Shoelaces: an interactive tool for ribosome profiling processing and visualization
BMC Genomics volume 19, Article number: 543 (2018)
The emergence of ribosome profiling to map actively translating ribosomes has laid the foundation for a diverse range of studies on translational regulation. The data obtained with different variations of this assay is typically manually processed, which has created a need for tools that would streamline and standardize processing steps.
We present Shoelaces, a toolkit for ribosome profiling experiments automating read selection and filtering to obtain genuine translating footprints. Based on periodicity, favoring enrichment over the coding regions, it determines the read lengths corresponding to bona fide ribosome protected fragments. The specific codon under translation (P-site) is determined by automatic offset calculations resulting in sub-codon resolution. Shoelaces provides both a user-friendly graphical interface for interactive visualisation in a genome browser-like fashion and a command line interface for integration into automated pipelines. We process 79 libraries and show that studies typically discard excessive amounts of quality data in their manual analysis pipelines.
Shoelaces streamlines ribosome profiling analysis offering automation of the processing, a range of interactive visualization features and export of the data into standard formats. Shoelaces stores all processing steps performed in an XML file that can be used by other groups to exactly reproduce the processing of a given study. We therefore anticipate that Shoelaces can aid researchers by automating what is typically performed manually and contribute to the overall reproducibility of studies. The tool is freely distributed as a Python package, with additional instructions, tutorial and demo datasets available at https://bitbucket.org/valenlab/shoelaces.
Ribosome profiling provides the first opportunity to monitor the behavior of translating ribosomes on a transcriptome-wide scale. Since its development, the technique has been widely adopted and inspired a diverse range of studies on translational regulation. While the assay itself has been partially standardized, the processing of data has not. A significant bottleneck is that of reproducibility and interpretation. In particular, most studies rely on manual selection of read lengths and manual P-site determination. The choices made are highly variable between studies, biasing the sub-codon resolution or discarding excessive amounts of data, which makes it challenging to compare results in the literature.
The consistent processing of such data necessitates that two major challenges are met: (1) separating signal from noise, i.e. distinguishing footprints of translating ribosomes from reads originating from other processes and (2) determining the specific codon being translated by the ribosome which the read fragment originates from (a P-site offset). While some software tools have been developed for analyzing ribosome profiling data (for an overview see ), few address these challenges directly. Instead, tools typically rely on manual selection of read lengths and offsets [3, 4] or perform selection as part of an integrated pipeline for open reading frame prediction, with no option to export ribosome coverage after processing .
Here, we introduce Shoelaces, a software tool for processing ribosome profiling data. Shoelaces addresses the processing challenges by (1) utilizing a property of phasing, a strong 3-nucleotide periodicity of the reads stemming from coding regions [1, 6, 7] to filter genuine translating footprints and (2) calibrating P-site offsets based on metagene profiles over start or stop codons, stratified by footprint length [1, 8]. Shoelaces automatically selects these lengths and offsets, and offers a batch mode for processing multiple libraries in bulk.
The tool can be run in two modes, using either a graphical or a command line interface. The graphical interface is accessible to users of all levels: it guides the user through each processing step, allows for interactive adjustments and offers a range of extra visualization features at either the gene/transcript or the global level. The command line interface offers the same functionality as the graphical interface, without the interactivity, and can be easily integrated into automated processing pipelines.
Shoelaces is implemented in Python3 and designed to run on Linux and MacOS operating systems. It relies on OpenGL for rendering graphics and PyQt5 for the cross-platform graphical user interface (GUI). The GUI is composed of a set of windows that the user can rearrange by drag-and-drop to customize the layout. The plots are interactive, making the processing easily adjustable to specific needs. While primarily designed for its visualization features, Shoelaces can also be run from the command line, making it easy to incorporate into processing pipelines. Shoelaces operates on common genomic formats (BAM, GTF, BED, wiggle) and stores settings in XML files, for maximum ease of use and reproducibility of analyses.
Results and discussion
Data processing workflow
The workflow of Shoelaces is shown in Fig. 1. Shoelaces accepts standard genomic formats requiring alignment of ribosome profiling reads (BAM) and corresponding gene/transcript annotations (GTF or BED). Shoelaces then guides the user through three main steps: (1) read filtering, (2) footprint identification and (3) P-site determination.
In the initial step, Shoelaces filters reads from noise regions. Here, users can optionally include an additional annotation file with regions (such as ribosomal RNA or repetitive elements) which will be masked from all further analyses. Specific genes can also be deselected during this step if certain outliers are undesired.
In the following step, Shoelaces automatically determines the correct footprint lengths. This is based on the intrinsic 3-nucleotide periodicity characteristic of ribosome-derived fragments as opposed to reads originating from other processes. The periodicity is detected using a discrete Fourier transform (see below) over the coding regions (CDS) of annotated genes. Lengths displaying periodicity are selected for further analysis. The remaining lengths are classified as noise but stay available for further analysis by the user.
Finally, for each footprint Shoelaces determines the codon that is actively translated. A length-dependent P-site offset is calibrated using change point analysis (see below) over the distribution of footprints surrounding start and stop codons of annotated genes. Based on this, Shoelaces will automatically suggest offsets and provide plots of the summed footprints over start and stop codons of all genes. In addition, ribosome footprints are known to map preferentially to the first nucleotide in the codon, and Shoelaces therefore displays the fraction of reads falling into each reading frame. Manual adjustment is also possible if deemed necessary by the user.
After confirming the selection of the suggested footprint lengths and offsets, the user can export the ribosome coverage into a flat file format (wiggle) for further downstream analysis, either in genomic or transcriptomic coordinates. Optionally, different footprint lengths can be exported into separate files. Separation by length can be useful for more specific analyses, such as detection of conformational changes of the ribosome at certain positions [9, 10].
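The reading-frame fractions mentioned above can be illustrated with a minimal sketch. It assumes P-site positions have already been computed and are expressed relative to the first nucleotide of the CDS (0-based); the function name `frame_fractions` is hypothetical and is not part of the Shoelaces API.

```python
import numpy as np

def frame_fractions(psite_positions):
    """Fraction of P-sites in reading frames 0, 1 and 2.

    Assumption: positions are 0-based and relative to the first
    nucleotide of the CDS, so frame 0 is the first codon position.
    """
    frames = np.asarray(psite_positions, dtype=int) % 3
    counts = np.bincount(frames, minlength=3)
    return counts / counts.sum()

# Toy example: six P-sites on first codon positions, one on each of the others.
fractions = frame_fractions([0, 3, 6, 9, 12, 15, 1, 2])
print(fractions)
```

A library with well-calibrated offsets would be expected to show most of its reads in frame 0, as in this toy input.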
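As an illustration of the export step, the sketch below writes per-position counts in the variableStep flavor of the wiggle format. The exact layout that Shoelaces writes is not specified in the text, so the function `to_wiggle`, the track name and the input structure (chromosome mapped to 0-based position and count) are assumptions made for this example.

```python
def to_wiggle(coverage, track_name="psites"):
    """Serialize {chrom: {0-based position: count}} as variableStep wiggle text.

    Wiggle positions are 1-based, so each position is shifted by one.
    """
    lines = [f'track type=wiggle_0 name="{track_name}"']
    for chrom in sorted(coverage):
        lines.append(f"variableStep chrom={chrom}")
        for pos in sorted(coverage[chrom]):
            lines.append(f"{pos + 1} {coverage[chrom][pos]}")
    return "\n".join(lines) + "\n"

# Toy coverage: two P-site positions on chr1.
wig_text = to_wiggle({"chr1": {99: 5, 102: 2}})
print(wig_text)
```

Exporting each footprint length separately would amount to calling such a function once per length with that length's coverage.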
To aid the researcher, the GUI produces summary statistics and counts for individual genes and transcripts, as well as for the whole library. It provides an overview of how many reads of a given length fall into different genomic regions (CDSs, 5′ leaders, 3′ trailers and introns) as well as how many footprints are found over non-coding transcripts or mapping to multiple positions in the genome. Users can update the statistics after read length and offset selection to see how they change. Together, these give an indication of the quality of the library and how well the reads represent genuine ribosome protected fragments.
Additionally, Shoelaces can produce expression tables for ribosome profiling data normalized to reads per kilobase of exon per million mapped reads (RPKM). Optionally, if additional RNA-seq data is loaded, Shoelaces calculates translational efficiency per gene as well.
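For reference, the normalization can be sketched as follows. RPKM follows its standard definition; translational efficiency is computed here as the ratio of ribosome profiling RPKM to RNA-seq RPKM, which is a common convention but is an assumption with respect to the exact formula Shoelaces uses. Function names and the toy numbers are hypothetical.

```python
def rpkm(read_count, length_nt, total_mapped_reads):
    """Reads per kilobase of exon per million mapped reads."""
    return read_count * 1e9 / (length_nt * total_mapped_reads)

def translational_efficiency(ribo_rpkm, rna_rpkm):
    """Ratio of footprint density to mRNA abundance (assumed definition)."""
    return ribo_rpkm / rna_rpkm if rna_rpkm > 0 else float("nan")

# Toy gene: 2000 nt of exon, 500 footprints out of 10 M mapped reads,
# and 2000 RNA-seq reads out of 20 M mapped reads.
ribo = rpkm(500, 2000, 10_000_000)
rna = rpkm(2000, 2000, 20_000_000)
print(ribo, rna, translational_efficiency(ribo, rna))
```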
Automatic selection of read lengths and offsets
An ideal-case scenario is presented in Fig. 2: the given footprint length is periodic (Fig. 2d), the metagene profiles have distinct peaks over start and before stop codons (Fig. 2a, b) and reads preferentially map to the first reading frame (Fig. 2c). However, library-specific biases can result in varying distributions of coding footprint lengths, as well as varying offsets (for various examples see Additional file 1: Figures S1-S3). To take these biases into account, as well as to make processing large amounts of ribosome profiling data easy for the user, Shoelaces automatically suggests read lengths and offsets to be used.
Selection of periodic lengths
For each fragment length, the 5′ ends of footprints mapping to the first 150 nucleotides of CDSs (by default from the top 10% of protein-coding genes with highest coverage) are summed together. As the reads map preferentially to the first nucleotide of every codon, the periodic pattern will be conserved. The resulting vector is subjected to a discrete Fourier transform, and the fragment lengths whose highest amplitude corresponds to a period of 3 are considered to be periodic.
For each fragment length, the distribution of 5′ ends of footprints surrounding start and stop codons (-30/+10 nucleotides) of protein-coding genes is calculated. The resulting window is subjected to change point analysis, where for each adjacent position we calculate the difference of means. The maximum shift in means is assumed to correspond to the 5′ end of the footprints of initiating ribosomes and the distance from these to the P-site becomes the offset for that fragment length.
Stratification per footprint length covers all different assignment strategies [1, 8], as the effective position of the P-site will be the same whether calibrated from the 5′ end or the 3′ end of a footprint of a given length (see Additional file 1: Figure S5). This accounts for biases in different ribosome profiling libraries, which can have uniform offsets from the 5′ ends of reads (Additional file 1: Figure S2), or offsets from the 5′ ends that change in increments of one nucleotide with increasing footprint length and are thus uniform from the 3′ ends (with minor variations, Additional file 1: Figures S1 and S3). Shoelaces offers calibration over both start and stop codons, accounting for libraries where there is no clear peak defined over either end (shorter footprint lengths in Additional file 1: Figures S2 and S3).
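The periodicity test described above can be sketched with NumPy. This is an illustration rather than the Shoelaces implementation: the function name `is_periodic` is hypothetical, and subtracting the mean before the transform (so that the zero-frequency term does not dominate) is an assumption of this example.

```python
import numpy as np

def is_periodic(counts, period=3):
    """True if the strongest non-zero frequency of `counts` has the given period.

    `counts` is the summed 5' end profile over the first nucleotides of CDSs
    for one fragment length (150 positions in the default setting).
    """
    counts = np.asarray(counts, dtype=float)
    # Remove the mean so the zero-frequency (DC) term does not dominate.
    amplitudes = np.abs(np.fft.rfft(counts - counts.mean()))
    freqs = np.fft.rfftfreq(counts.size)  # cycles per nucleotide
    # Skip index 0 (DC) and take the frequency with the highest amplitude.
    peak = 1 + int(np.argmax(amplitudes[1:]))
    return bool(np.isclose(1.0 / freqs[peak], period))

# Toy profiles over 150 nt: reads piling up on the first codon position,
# and a signal that repeats every 5 nt instead.
codon_like = np.tile([10, 2, 1], 50)
not_codon_like = np.tile([5, 1, 1, 1, 1], 30)
print(is_periodic(codon_like), is_periodic(not_codon_like))
```

With a 150-nucleotide window, a period of 3 falls exactly on a frequency bin of the transform, which keeps the comparison against the target period simple.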
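One plausible reading of this change point step is sketched below: for every split point in the window, the mean of the counts downstream is compared with the mean upstream, and the split with the largest increase is taken as the 5′ end of initiating footprints. The function name `psite_offset`, the coordinate convention (position 0 is the first nucleotide of the start codon) and the toy profile are assumptions of this example, not details confirmed by the text.

```python
import numpy as np

def psite_offset(counts, first_pos=-30):
    """Offset from the 5' end to the P-site for one fragment length.

    `counts[i]` is the number of 5' ends at position `first_pos + i`
    relative to the first nucleotide of the start codon.
    """
    counts = np.asarray(counts, dtype=float)
    # Shift in means for every split point i: mean(counts[i:]) - mean(counts[:i]).
    shifts = [counts[i:].mean() - counts[:i].mean() for i in range(1, counts.size)]
    change_idx = 1 + int(np.argmax(shifts))
    change_pos = first_pos + change_idx
    # The distance from the change point to the start codon is the offset.
    return -change_pos

# Toy window from -30 to +10: background of 1, an initiation peak at -12,
# then a codon-periodic signal from -9 onwards.
positions = np.arange(-30, 11)
counts = np.ones(positions.size)
counts[positions == -12] = 50
counts[(positions >= -9) & ((positions + 9) % 3 == 0)] = 10
print(psite_offset(counts))
```

The same calculation, run on a window around the stop codon, would need its own convention for converting the change point into an offset.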
Shoelaces also allows for visual inspection of coverage over individual genes (or groups of genes) of interest. Users can manually zoom in/out to adjust the view, inspect the summary statistics with and without using offsets, and export high quality figures and tracks for further analysis and visualization.
For processing multiple libraries in bulk, a batch mode is available. For instance, for a number of same-batch libraries, one can be inspected visually, processing steps stored in an XML file and applied to the others. This additionally makes the processing easily reproducible later on. The processing can also be performed and fully automated from the command line allowing Shoelaces to be a part of a more comprehensive pipeline.
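The idea of storing processing choices in XML and reapplying them to other libraries can be sketched with the Python standard library. The element and attribute names used here (`processing`, `footprint`, `length`, `offset`) are hypothetical; the actual schema of the Shoelaces settings file is not described in the text.

```python
import xml.etree.ElementTree as ET

def settings_to_xml(offsets):
    """Serialize {footprint length: P-site offset} to an XML string."""
    root = ET.Element("processing")
    for length in sorted(offsets):
        ET.SubElement(root, "footprint",
                      length=str(length), offset=str(offsets[length]))
    return ET.tostring(root, encoding="unicode")

def settings_from_xml(text):
    """Read the same mapping back from XML."""
    root = ET.fromstring(text)
    return {int(e.get("length")): int(e.get("offset"))
            for e in root.iter("footprint")}

# Choices made interactively on one library ...
chosen = {28: 12, 29: 12, 30: 13}
xml_text = settings_to_xml(chosen)
# ... can be reloaded and applied to the remaining libraries of the batch.
print(settings_from_xml(xml_text))
```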
Analysis of human ribosome profiling data
We analyzed 79 libraries of human ribosome profiling data from 12 studies [11–22] and compared our read selection to the original, where applicable. Compared with the original processing, Shoelaces retains up to 32% more data mapping to the coding regions of the genome, namely CDSs and 5′ leaders (see Additional file 1: Table S1). At the same time, it decreases the relative frequency of non-translating footprints, such as those that map primarily to 3′ trailers; these might originate from mRNA-binding proteins, which are abundant in 3′ trailers, from secondary structure or from other sources of noise (see Additional file 1: Figure S4).
Shoelaces aims for an intuitive and streamlined processing of libraries from different studies and treatments, making them comparable and the analysis easily reproducible. The precision in bringing the data to sub-codon resolution is especially important in studies on translational efficiency of different codons [23, 24], but also allows for detection of translational events such as ribosomal pausing, stop codon readthrough or frameshifting. The automation and batch processing facilitate dealing with large amounts of data, while visualization features add to user-friendliness and allow for more specific analyses. As we demonstrate on human ribosome profiling data, Shoelaces retains more reads mapping to coding regions than arbitrary manual processing. Overall, Shoelaces is a comprehensive tool for ribosome profiling data processing, and should prove useful to anyone interested in small or large-scale studies on ribosome profiling.
Availability and requirements
Project name: Shoelaces
Project home page: https://bitbucket.org/valenlab/shoelaces
Operating systems: Linux and MacOS
Programming language: Python3
Other requirements: Python3 packages: pysam, numpy, pyqt5, pyopengl
License: MIT license
CDS: Coding sequence, the coding part of a messenger RNA
RPKM: Reads per kilobase of exon per million mapped reads
Ingolia NT, Ghaemmaghami S, Newman JRS, Weissman JS. Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science. 2009; 324(5924):218–23.
Wang H, Wang Y, Xie Z. Computational resources for ribosome profiling: from database to web server and software. Brief Bioinform. 2017. https://doi.org/10.1093/bib/bbx093.
Dunn JG, Weissman JS. Plastid: nucleotide-resolution analysis of next-generation sequencing and genomics data. BMC Genomics. 2016; 17(1):958.
Popa A, Lebrigand K, Paquet A, Nottet N, Robbe-Sermesant K, Waldmann R, Barbry P. RiboProfiling: a Bioconductor package for standard Ribo-seq pipeline processing. F1000Res. 2016; 5:1309.
Malone B, Atanassov I, Aeschimann F, Li X, Grosshans H, Dieterich C. Bayesian prediction of RNA translation from ribosome profiling. Nucleic Acids Res. 2017; 45(6):2960–72.
Michel AM, Choudhury KR, Firth AE, Ingolia NT, Atkins JF, Baranov PV. Observation of dually decoded regions of the human genome using ribosome profiling data. Genome Res. 2012; 22(11):2219–29.
Bazzini AA, Johnstone TG, Christiano R, Mackowiak SD, Obermayer B, Fleming ES, Vejnar CE, Lee MT, Rajewsky N, Walther TC, Giraldez AJ. Identification of small ORFs in vertebrates using ribosome footprinting and evolutionary conservation. EMBO J. 2014; 33(9):981–93.
Woolstenhulme CJ, Guydosh NR, Green R, Buskirk AR. High-precision analysis of translational pausing by ribosome profiling in bacteria lacking EFP. Cell Rep. 2015; 11(1):13–21.
Giess A, Jonckheere V, Ndah E, Chyzynska K, Van Damme P, Valen E. Ribosome signatures aid bacterial translation initiation site identification. BMC Biol. 2017; 15(1):76.
Lareau LF, Hite DH, Hogan GJ, Brown PO. Distinct stages of the translation elongation cycle revealed by sequencing ribosome-protected mRNA fragments. Elife. 2014; 3:01257.
Andreev DE, O’Connor PBF, Fahey C, Kenny EM, Terenin IM, Dmitriev SE, Cormican P, Morris DW, Shatsky IN, Baranov PV. Translation of 5′ leaders is pervasive in genes resistant to eIF2 repression. Elife. 2015; 4:03971.
Gonzalez C, Sims JS, Hornstein N, Mela A, Garcia F, Lei L, Gass DA, Amendolara B, Bruce JN, Canoll P, Sims PA. Ribosome profiling reveals a cell-type-specific translational landscape in brain tumors. J Neurosci. 2014; 34(33):10924–36.
Guo H, Ingolia NT, Weissman JS, Bartel DP. Mammalian microRNAs predominantly act to decrease target mRNA levels. Nature. 2010; 466(7308):835–40.
Hsieh AC, Liu Y, Edlind MP, Ingolia NT, Janes MR, Sher A, Shi EY, Stumpf CR, Christensen C, Bonham MJ, Wang S, Ren P, Martin M, Jessen K, Feldman ME, Weissman JS, Shokat KM, Rommel C, Ruggero D. The translational landscape of mTOR signalling steers cancer initiation and metastasis. Nature. 2012; 485(7396):55–61.
Ingolia NT, Brar GA, Rouskin S, McGeachy AM, Weissman JS. The ribosome profiling strategy for monitoring translation in vivo by deep sequencing of ribosome-protected mRNA fragments. Nat Protoc. 2012; 7(8):1534–50.
Ingolia NT, Brar GA, Stern-Ginossar N, Harris MS, Talhouarne GJS, Jackson SE, Wills MR, Weissman JS. Ribosome profiling reveals pervasive translation outside of annotated protein-coding genes. Cell Rep. 2014; 8(5):1365–79.
Lee S, Liu B, Lee S, Huang S-X, Shen B, Qian S-B. Global mapping of translation initiation sites in mammalian cells at single-nucleotide resolution. Proc Natl Acad Sci U S A. 2012; 109(37):2424–32.
Liu B, Han Y, Qian S-B. Cotranslational response to proteotoxic stress by elongation pausing of ribosomes. Mol Cell. 2013; 49(3):453–63.
Sidrauski C, McGeachy AM, Ingolia NT, Walter P. The small molecule ISRIB reverses the effects of eIF2α phosphorylation on translation and stress granule assembly. Elife. 2015; 4:1–16.
Stern-Ginossar N, Weisburd B, Michalski A, Le VTK, Hein MY, Huang S-X, Ma M, Shen B, Qian S-B, Hengel H, Mann M, Ingolia NT, Weissman JS. Decoding human cytomegalovirus. Science. 2012; 338(6110):1088–93.
Stumpf CR, Moreno MV, Olshen AB, Taylor BS, Ruggero D. The translational landscape of the mammalian cell cycle. Mol Cell. 2013; 52(4):574–82.
Subtelny AO, Eichhorn SW, Chen GR, Sive H, Bartel DP. Poly(A)-tail profiling reveals an embryonic switch in translational control. Nature. 2014; 508(7494):66–71.
Dana A, Tuller T. The effect of tRNA levels on decoding times of mRNA codons. Nucleic Acids Res. 2014; 42(14):9171–81.
Dana A, Tuller T. Mean of the typical decoding rates: A new translation efficiency index based on the analysis of ribosome profiling data. G3. 2015; 5(1):73–80.
Li G-W, Oh E, Weissman JS. The anti-Shine-Dalgarno sequence drives translational pausing and codon choice in bacteria. Nature. 2012; 484(7395):538–41.
We would like to acknowledge Guo-Liang “Chewie” Chew for his suggestion of the name Shoelaces.
This work was supported by the Bergen Research Foundation and the Norwegian Research Council (#250049). The funding bodies did not have any role in design or execution of the study.
Availability of data and materials
The datasets analyzed in the current study are available in the Sequence Read Archive with accession numbers SRP038695, SRP031501, SRP002605, SRP010679, SRP012648, SRP045257, SRP014629, SRP017263, SRP053402, SRP016143, SRP029589, SRP033369. The demo dataset is available together with the pipeline at https://bitbucket.org/valenlab/shoelaces.
Ethics approval and consent to participate
Not applicable. This is a tool building on previously published, public data.
Consent for publication
Competing interests
The authors declare that they have no competing interests.
Analysis examples. Figures S1-S3: Three different examples of offset selection for human ribosome profiling datasets: SRR493747, treated with harringtonine and cycloheximide; SRR1039861, treated with cycloheximide; SRR592961, no drug. Table S1: Comparison of selected footprint lengths as originally in human ribosome profiling studies and Shoelaces. Figure S4: Comparison of reads mapping to different parts of transcript as selected by Shoelaces and the original manual selection (SRR493747). (PDF 8213 kb)
Cite this article
Birkeland, Å., Chyżyńska, K. & Valen, E. Shoelaces: an interactive tool for ribosome profiling processing and visualization. BMC Genomics 19, 543 (2018). https://doi.org/10.1186/s12864-018-4912-6