
Table 2 Eukaryotic proteogenomics pipelines and Galaxy workflows

From: Peptimapper: proteogenomics workflow for the expert annotation of eukaryotic genomes

Name

Pipeline Interface

Database-driven peptide identification

de novo peptide interpretation

User-friendly for biologists

Results curation

Results visualization

Description

Relevance

Peptimapper (released in 2018)

Command line, Docker image, Galaxy tools

Peptide Sequence Tags (PSTs), obtained by partial interpretation of ion trap mass spectra, are mapped onto the six-frame translation of genomic sequences to produce hits. Hits are then clustered to detect potential coding regions. Clusters are evaluated and compared with existing gene predictions, and are exported as a GFF file that can be loaded into a genome viewer. https://galaxy.protim.eu; https://hub.docker.com/r/dockerprotim/peptimapper/ or https://docker-ui.genouest.org/app/#/container/dockerprotim/peptimapper; https://github.com/laeticlo/Ectoline

Improves genome annotation

IPAW (2018) [61]

Command line

This is an Integrated Proteomics Analysis Workflow: i) peptide spectra are searched in parallel against two databases: VarDB, with results filtered by class-specific FDR for single amino acid variant (SAAV) peptides, and the six-frame translation (6FT) of the human genome, with results filtered by peptide pI; ii) SAAV candidates are curated with SpectrumAI, and potential novel proteins are BLAST-searched against public databases; iii) curated results are validated with several controls. https://github.com/yafeng/proteogenomics_python

Identification of pseudogenes, lncRNAs, nsSNPs and somatic mutations

JUMPg (2016) [62]

Command line

This pipeline includes construction of multiple customized databases, tag-based database search, peptide-spectrum match filtering, and data visualization. https://github.com/gatechatl/JUMPg/

Improves genome annotation

PGMiner (2016) [63]

Command line

This workflow allows acquisition of mass spectrometric data, peptide identification against preprocessed sequence databases, assignment of statistical confidence to identified peptides, and mapping confident peptides to gene models. https://github.com/olalonde/pgtools

Improves genome annotation

PROTEO-FORMER (2015) [64]

Command line, Virtual machine, Galaxy tools

Ribosome profiling (RIBO-seq) NGS data are processed to delineate proteoforms. RIBO-seq-derived sequences are then translated and mapped to a public database, creating a custom search database for matching peptides to MS/MS spectra.

Identification of novel translation products

PGTools (2015) [65]

Command line

The software is divided into two phases. Phase 1 contains 8 modules for analysing MS/MS data against known protein databases. Phase 2 contains 5 modules and 7 customized databases that allow MS/MS data to be analysed against the genome. The software includes applications, libraries, customized databases and visualization tools.

Improves genome annotation

NextSearch (2015) [66]

Command line

Nucleotide EXon-graph Transcriptome Search (NextSearch) identifies peptides by searching tandem mass spectra directly against a nucleotide exon graph. NextSearch outputs proteome-genome/transcriptome mappings that can be visualized with public tools.

Improves genome annotation

ProteoAnnotator (2014) [52]

Command line, Stand alone application

MS/MS spectra are searched with one or several proteomics database search engines (MASCOT, OMSSA, X!Tandem or MS-GF+), and the results are converted into GFF with genome coordinates and statistical confidence values added. It also exports mzIdentML files.

http://www.proteoannotator.org

Improves genome annotation

Peppy (2013) [67]

Command line, Stand alone application

N/A

This workflow generates a peptide database from a genome, tracks peptide loci, matches peptides to MS/MS spectra and assigns FDR confidence values to those matches.

http://geneffects.com/peppy

Improves genome annotation

Protk (released in 2012)

Command line, Galaxy tools

A suite of proteomics tools providing the following analysis tasks: (i) MS/MS database search with X!Tandem, Mascot, OMSSA and MS-GF+; (ii) peptide and protein inference with PeptideProphet, iProphet and ProteinProphet; (iii) conversion of pepXML or protXML to tabular format; and (iv) mapping of peptides to genomic coordinates. https://github.com/iracooke/protk

Improves genome annotation

IggyPep (2010) [54]

Web interface

N/A

The pipeline is built on a database system with an advanced indexing and querying strategy that holds the genome translated in all six reading frames. It can be queried with de novo sequences or partial peptide sequence tags (PSTs). It determines the ORF amino acid sequences containing these tags and compiles a FASTA-formatted sequence file for a database-driven search. www.iggypep.org (no longer accessible)

Improves genome annotation

PepLine (2008) [18]

Command line

N/A

Peptide Sequence Tags (PSTs) obtained from partial interpretation of QTOF mass spectra are mapped onto the six-frame translation of genomic sequences giving hits. Hits are then clustered to detect potential coding regions.

www.grenoble.prabi.fr/protehome/software/pepline (no longer accessible)

Improves genome annotation

Workflows for Proteomics Informed by Transcriptomics (2015) [57]

Galaxy tools

Galaxy Integrated Omics (GIO) provides workflows for four common use cases: i) a standard search against a reference proteome; ii) protein identification by proteomics informed by transcriptomics (PIT) without a reference genome; iii) PIT protein identification using a genome guide; and iv) PIT genome annotation. http://gio.sbcs.qmul.ac.uk

Improves genome annotation

Workflows for proteogenomics studies using Galaxy-P (2014–2018) [55, 56, 58, 59]

Galaxy tools

These modular workflows incorporate both established and customized software tools to improve the depth and quality of proteogenomic results. http://galaxyp.org

Improves genome annotation

  1. Available eukaryotic proteogenomics pipelines are listed at https://omictools.com/proteogenomics-category. Only software of the types “pipeline/workflow” or “Toolkit/Suite” was selected for comparison with our pipeline. Proteogenomics Galaxy workflows [49, 50] are added at the end of the table.
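The Peptimapper entry in Table 2 maps PSTs onto the six-frame translation of a genome and clusters the resulting hits into candidate coding regions. The Python sketch below illustrates those two steps under stated assumptions: the standard genetic code, exact string matching of tags, and a hypothetical `max_gap` distance for merging hits on the same strand. The function names are illustrative only; this is not Peptimapper's code or scoring scheme.

```python
from __future__ import annotations

# Standard genetic code (NCBI table 1); codon index = 16*b1 + 4*b2 + b3 in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate from the first base; codons with non-ACGT characters become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frames(genome: str) -> dict[tuple[str, int], str]:
    """Return {(strand, offset): protein} for the three forward and three reverse frames."""
    genome = genome.upper()
    revcomp = genome.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[("+", offset)] = translate(genome[offset:])
        frames[("-", offset)] = translate(revcomp[offset:])
    return frames


def map_tags(genome: str, tags: list[str]) -> list[tuple[str, int, str]]:
    """Exact-match each tag in every frame; return hits as (strand, nt_start, tag).

    nt_start is 0-based on the forward strand for '+' hits and on the
    reverse complement for '-' hits.
    """
    hits = []
    for (strand, offset), protein in six_frames(genome).items():
        for tag in tags:
            pos = protein.find(tag)
            while pos != -1:
                hits.append((strand, offset + 3 * pos, tag))
                pos = protein.find(tag, pos + 1)
    return hits


def cluster_hits(hits: list[tuple[str, int, str]], max_gap: int = 300) -> list[dict]:
    """Merge same-strand hits whose gap to the previous hit is at most max_gap nt."""
    clusters: list[dict] = []
    for strand, start, tag in sorted(hits):
        end = start + 3 * len(tag)
        last = clusters[-1] if clusters else None
        if last and last["strand"] == strand and start - last["end"] <= max_gap:
            last["end"] = max(last["end"], end)
            last["tags"].append(tag)
        else:
            clusters.append({"strand": strand, "start": start, "end": end, "tags": [tag]})
    return clusters
```

For example, `map_tags("ATGGCTAAAGGTTGA", ["AKG"])` finds the tag once, in forward frame 0 starting at nucleotide 3. A real pipeline would also tolerate mass-equivalent residues and score each cluster, which this sketch leaves out.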
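The IPAW entry in Table 2 filters variant (SAAV) peptides by a class-specific FDR. The sketch below shows one common way to do this, target-decoy q-values computed separately within each peptide class. It assumes each PSM carries a score where higher is better, a decoy flag, and a class label; the `PSM` type, the class names and the 1% threshold are illustrative assumptions, not IPAW's implementation.

```python
from __future__ import annotations

from collections import defaultdict
from typing import NamedTuple


class PSM(NamedTuple):
    peptide: str
    score: float        # higher means a better match
    is_decoy: bool
    peptide_class: str  # hypothetical labels, e.g. "canonical" or "SAAV"


def q_values(psms: list[PSM]) -> list[tuple[PSM, float]]:
    """Target-decoy q-values, where FDR(t) = decoys / targets scoring at or above t."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    fdrs = []
    decoys = targets = 0
    for psm in ranked:
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # A q-value is the lowest FDR at which the PSM would still be accepted.
    result = []
    running_min = float("inf")
    for psm, fdr in zip(reversed(ranked), reversed(fdrs)):
        running_min = min(running_min, fdr)
        result.append((psm, running_min))
    return result[::-1]


def class_specific_filter(psms: list[PSM], threshold: float = 0.01) -> list[PSM]:
    """Estimate FDR separately within each peptide class, then keep passing targets."""
    by_class: dict[str, list[PSM]] = defaultdict(list)
    for psm in psms:
        by_class[psm.peptide_class].append(psm)
    accepted = []
    for members in by_class.values():
        accepted.extend(p for p, q in q_values(members) if not p.is_decoy and q <= threshold)
    return accepted
```

Estimating the error rate per class keeps a small, error-prone class such as variant peptides from being judged by the error rate of the much larger canonical class, and vice versa.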
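The JUMPg entry in Table 2 relies on a tag-based database search, and both Peptimapper and PepLine start from partial sequence tags read from spectra. The toy sketch below shows the basic idea of reading a tag: chains of fragment peaks whose consecutive m/z differences equal single amino acid residue masses. It assumes singly charged fragments from one ion series, a fixed tolerance in Da, and unmodified residues; `extract_tags` is a hypothetical helper, not JUMPg's tag-generation algorithm, and it enumerates every chain, so it is only suitable for tiny peak lists.

```python
from __future__ import annotations

# Monoisotopic residue masses in Da; I and L are isobaric and reported as "L".
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def residues_matching(delta: float, tol: float) -> list[str]:
    """Residues whose mass lies within tol Da of a peak-to-peak gap."""
    return [aa for aa, mass in RESIDUE_MASS.items() if abs(delta - mass) <= tol]


def extract_tags(peaks: list[float], min_len: int = 3, tol: float = 0.02) -> set[str]:
    """Read sequence tags from chains of peaks separated by single residue masses.

    An ambiguous gap (for example Q/K at a loose tolerance) yields one tag
    per candidate residue.
    """
    peaks = sorted(peaks)
    tags: set[str] = set()

    def extend(i: int, tag: str) -> None:
        if len(tag) >= min_len:
            tags.add(tag)
        for j in range(i + 1, len(peaks)):
            for aa in residues_matching(peaks[j] - peaks[i], tol):
                extend(j, tag + aa)

    for i in range(len(peaks)):
        extend(i, "")
    return tags
```

Feeding it the b-ion ladder of a short peptide such as GASPV returns the tags ASP, SPV and ASPV, which could then be searched against a six-frame or customized database.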
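The NextSearch entry in Table 2 searches spectra against a nucleotide exon graph, which lets peptides spanning alternative splice junctions be found. The brute-force sketch below only illustrates why a graph representation captures such peptides: it enumerates every path through a small acyclic exon graph and checks the translation of each path. That enumeration is a simple stand-in, not NextSearch's direct graph search. It also assumes the reading frame starts at the first base of the source exon and uses the standard genetic code; `exon_paths` and `find_peptide` are hypothetical names.

```python
from __future__ import annotations

# Standard genetic code (NCBI table 1); codon index = 16*b1 + 4*b2 + b3 in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate(dna: str) -> str:
    """Translate from the first base; codons with non-ACGT characters become 'X'."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if all(base in BASES for base in codon):
            b1, b2, b3 = (BASES.index(base) for base in codon)
            protein.append(AMINO[16 * b1 + 4 * b2 + b3])
        else:
            protein.append("X")
    return "".join(protein)


def exon_paths(graph: dict[str, list[str]], node: str) -> list[list[str]]:
    """All paths from node to a sink in an acyclic exon graph {exon: successors}."""
    successors = graph.get(node, [])
    if not successors:
        return [[node]]
    return [[node] + rest for nxt in successors for rest in exon_paths(graph, nxt)]


def find_peptide(peptide: str, exons: dict[str, str],
                 graph: dict[str, list[str]], source: str) -> list[list[str]]:
    """Return every exon path whose in-frame translation contains the peptide."""
    return [
        path for path in exon_paths(graph, source)
        if peptide in translate("".join(exons[e] for e in path))
    ]
```

With exons e1, e2 and e3 where e2 can be skipped, a peptide spanning the e1-e3 junction is reported only on the exon-skipping path, which a search against a single linear transcript would miss.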
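Several entries in Table 2 (ProteoAnnotator, Protk, PGMiner) place identified peptides on genome coordinates, and ProteoAnnotator writes them as GFF. The sketch below shows the coordinate arithmetic for a peptide located in a protein whose CDS exons are known, then emits one GFF3 line per genomic segment. It assumes 1-based inclusive exon coordinates listed in transcript order (descending for the minus strand); the source value `sketch` and the feature type `peptide` are placeholders, not what any listed tool writes.

```python
from __future__ import annotations


def peptide_to_genome(cds_exons: list[tuple[int, int]], strand: str,
                      pep_start: int, pep_len: int) -> list[tuple[int, int]]:
    """Map a peptide (0-based residue offset in the protein) onto genomic segments.

    cds_exons holds (start, end) pairs, 1-based inclusive, in transcript order.
    Returns the covered genomic (start, end) pairs, also 1-based inclusive.
    """
    nt_start = 3 * pep_start
    nt_end = nt_start + 3 * pep_len  # exclusive, in CDS coordinates
    segments = []
    offset = 0  # CDS position of the first base of the current exon
    for start, end in cds_exons:
        length = end - start + 1
        lo = max(nt_start, offset)
        hi = min(nt_end, offset + length)
        if lo < hi:
            if strand == "+":
                segments.append((start + lo - offset, start + hi - offset - 1))
            else:  # minus strand: CDS position 0 of this exon is its highest coordinate
                segments.append((end - (hi - offset - 1), end - (lo - offset)))
        offset += length
    return segments


def gff3_lines(seqid: str, peptide: str, segments: list[tuple[int, int]],
               strand: str, feature_id: str, score: str = ".") -> list[str]:
    """One GFF3 line per segment; the shared ID marks them as one feature."""
    return [
        "\t".join([seqid, "sketch", "peptide", str(start), str(end), score,
                   strand, ".", f"ID={feature_id};Name={peptide}"])
        for start, end in segments
    ]
```

A peptide that crosses an intron comes back as two segments sharing one ID, which genome viewers usually draw as a single split feature.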
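The Peppy entry in Table 2 generates a peptide database from a genome while tracking peptide loci. A minimal sketch of that idea follows: an in silico tryptic digest of frame translations that records where each peptide starts. It assumes the usual trypsin rule (cleave after K or R unless the next residue is P), treats stop codons as hard breaks, and uses hypothetical length limits and frame labels; it is not Peppy's implementation.

```python
from __future__ import annotations

from collections import defaultdict


def tryptic_peptides(protein: str, nt_offset: int = 0,
                     min_len: int = 6, max_len: int = 40) -> list[tuple[str, int]]:
    """Return (peptide, nt_start) pairs for one frame translation.

    nt_start is the 0-based nucleotide position of the peptide's first codon,
    given the frame's own offset nt_offset.
    """
    peptides = []
    start = 0
    for i, aa in enumerate(protein):
        is_last = i + 1 == len(protein)
        is_stop = aa == "*"
        is_cut = aa in "KR" and (is_last or protein[i + 1] != "P")
        if is_stop or is_cut or is_last:
            end = i if is_stop else i + 1  # the stop residue itself is dropped
            peptide = protein[start:end]
            if min_len <= len(peptide) <= max_len:
                peptides.append((peptide, nt_offset + 3 * start))
            start = i + 1
    return peptides


def peptide_index(frames: dict[str, tuple[int, str]]) -> dict[str, list[tuple[str, int]]]:
    """Map each peptide to every (frame label, nt_start) locus it comes from."""
    index: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for frame, (offset, protein) in frames.items():
        for peptide, locus in tryptic_peptides(protein, offset):
            index[peptide].append((frame, locus))
    return dict(index)
```

Keeping every locus per peptide matters in a genome-wide search, because a peptide that occurs at several loci cannot be assigned to one of them unambiguously.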
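The Protk entry in Table 2 includes conversion of pepXML to tabular format. The sketch below shows one way such a conversion can be done with the Python standard library. It assumes the pepXML namespace `http://regis-web.systemsbiology.net/pepXML` and reads only `spectrum_query`, `search_hit` and `search_score` elements; PeptideProphet results stored in `analysis_result` elements are not parsed. The function names are hypothetical, and this is not Protk's code.

```python
from __future__ import annotations

import csv
import io
import xml.etree.ElementTree as ET

NS = {"pep": "http://regis-web.systemsbiology.net/pepXML"}


def pepxml_to_rows(xml_text: str) -> list[dict[str, str]]:
    """Flatten spectrum_query/search_hit elements into one dict per hit."""
    root = ET.fromstring(xml_text)
    rows = []
    for query in root.iterfind(".//pep:spectrum_query", NS):
        for hit in query.iterfind("pep:search_result/pep:search_hit", NS):
            # Search-engine scores are stored as name/value attribute pairs.
            scores = {s.get("name"): s.get("value") for s in hit.iterfind("pep:search_score", NS)}
            rows.append({
                "spectrum": query.get("spectrum"),
                "peptide": hit.get("peptide"),
                "protein": hit.get("protein"),
                "hit_rank": hit.get("hit_rank"),
                **scores,
            })
    return rows


def rows_to_tsv(rows: list[dict[str, str]], fieldnames: list[str]) -> str:
    """Write the selected columns as tab-separated text; extra keys are ignored."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, delimiter="\t",
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Because score names differ between search engines, the caller chooses which columns to export rather than the converter hard-coding them.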
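The IggyPep entry in Table 2 retrieves the ORFs that contain a queried tag and compiles them into a FASTA file for a database-driven search. The sketch below shows that retrieval step on already translated frames, under two assumptions: an ORF is taken to be a stop-to-stop segment of a frame translation, and the frames are scanned linearly rather than through IggyPep's indexed database. The header format and function names are hypothetical.

```python
from __future__ import annotations


def orfs_containing_tag(frames: dict[str, str], tag: str) -> list[tuple[str, str]]:
    """Return (header, sequence) for every stop-to-stop segment that contains tag.

    frames maps a frame label to its translation, with '*' for stop codons.
    The header records the frame and the segment's 0-based residue start.
    """
    records = []
    for label, protein in frames.items():
        start = 0
        for segment in protein.split("*"):
            if tag in segment:
                records.append((f"{label}|aa{start}", segment))
            start += len(segment) + 1  # skip past the segment and its stop
    return records


def to_fasta(records: list[tuple[str, str]], width: int = 60) -> str:
    """Format records as FASTA with sequence lines wrapped at width residues."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

The resulting FASTA file is small because it holds only the ORFs hit by the tags, which keeps the subsequent database-driven search fast.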