
Table 2 Eukaryotic proteogenomics pipelines and Galaxy workflows

From: Peptimapper: proteogenomics workflow for the expert annotation of eukaryotic genomes

Name Pipeline Interface Database-driven for peptide identification de novo peptide interpretation User-friendly for biologists Results curation Results visualization Description Relevance
Peptimapper (released in 2018) Command line, Docker image, Galaxy tools Peptide Sequence Tags (PSTs) obtained from partial interpretation of ion trap mass spectra are mapped onto the six-frame translation of genomic sequences, giving hits. Hits are then clustered to detect potential coding regions. Clusters are evaluated and further compared to existing gene predictions. Clusters are available as a GFF file to be uploaded into a genome viewer. Improves genome annotation
IPAW (2018) [61] Command line This is an Integrated Proteomics Analysis Workflow: i) Peptide spectra are searched against two different databases in parallel: VarDB, filtered by class-specific FDR for SAAV peptides, and the 6FT of the human genome, filtered by peptide pI. ii) SAAV candidates are curated by SpectrumAI and potential novel proteins are searched with BLAST against public databases. iii) Curated results are validated by different controls. Identification of pseudogenes, lncRNAs, nsSNPs and somatic mutations
JUMPg (2016) [62] Command line This pipeline includes construction of multiple customized databases, tag-based database search, peptide-spectrum match filtering, and data visualization. Improves genome annotation
PGMiner (2016) [63] Command line This workflow allows acquisition of mass spectrometric data, peptide identification against preprocessed sequence databases, assignment of statistical confidence to identified peptides, and mapping confident peptides to gene models. Improves genome annotation
PROTEO-FORMER (2015) [64] Command line, Virtual machine, Galaxy tools RIBO-seq NGS data are processed to delineate proteoforms. RIBO-seq-derived sequences are then translated and mapped to a public database, creating a custom search database for peptide-to-MS/MS matching. Identification of novel translation products
PGTools (2015) [65] Command line The software is divided into 2 phases: Phase 1 contains 8 modules to analyse MS/MS data using known protein databases. Phase 2 contains 5 modules and 7 customized databases that allow MS/MS data to be analysed against the genome. The software includes applications, libraries, customized databases and visualization tools. Improves genome annotation
NextSearch (2015) [66] Command line Nucleotide EXon-graph Transcriptome Search identifies peptides by directly searching the nucleotide exon graph against tandem mass spectra. NextSearch outputs the proteome-genome/transcriptome mapping, which can be visualized using public tools. Improves genome annotation
ProteoAnnotator (2014) [52] Command line, Stand alone application MS spectra are searched with one or several proteomics database search engines (MASCOT, OMSSA, X!Tandem or MSGF+), and results are converted into GFF, adding genome coordinates and statistical confidence values. It exports mzIdentML files. Improves genome annotation
Peppy (2013) [67] Command line, Stand alone application N/A This workflow generates a peptide database from a genome, tracks peptide loci, matches peptides to MS/MS spectra and assigns FDR confidence values to those matches. Improves genome annotation
Protk (released in 2012) Command line, Galaxy tools It is a suite of proteomics tools providing the following analysis tasks: (i) MS/MS data search with X!Tandem, Mascot, OMSSA and MS-GF+; (ii) peptide and protein inference with PeptideProphet, iProphet and ProteinProphet; (iii) conversion of pepXML or protXML to tabular format; and (iv) mapping of peptides to genomic coordinates. Improves genome annotation
IggyPep (2010) [54] Web interface N/A The pipeline is based on a database system with an advanced indexing and querying strategy, which holds the translated genome in all six reading frames. It can be queried with de novo sequences or partial peptide sequence tags (PSTs). It determines the ORF amino acid sequences comprising these tags and compiles a FASTA-formatted sequence file for a database-driven search. (No longer accessible) Improves genome annotation
PepLine (2008) [18] Command line N/A Peptide Sequence Tags (PSTs) obtained from partial interpretation of QTOF mass spectra are mapped onto the six-frame translation of genomic sequences, giving hits. Hits are then clustered to detect potential coding regions. (No longer accessible) Improves genome annotation
Workflows for Proteomics Informed by Transcriptomics (2015) [57] Galaxy tools Galaxy Integrated Omics (GIO) provides workflows for 4 common use cases: i) a standard search against a reference proteome; ii) PIT protein identification without a reference genome; iii) PIT protein identification using a genome guide; and iv) PIT genome annotation. Improves genome annotation
Workflows for proteogenomics studies using Galaxy-P (2014–2018) [55, 56, 58, 59] Galaxy tools These modular workflows incorporate both established and customized software tools that improve the depth and quality of proteogenomic results. Improves genome annotation
  1. Available Eukaryotic Proteogenomics pipelines are listed in We only selected software types “pipeline/workflow” or “Toolkit/Suite” for comparison to our pipeline. Proteogenomics Galaxy workflows [49, 50] are added at the end of the table
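
Several entries in the table (Peptimapper, PepLine, IggyPep) rely on the same core idea: translate the genome in all six reading frames, locate sequence tags in the translations, and cluster nearby hits into candidate coding regions. The Python sketch below illustrates that idea only and is not the code of any listed pipeline. It assumes the standard genetic code, treats a PST as a plain amino acid string (real PSTs also carry flanking masses), and uses hypothetical function names and a hypothetical `max_gap` clustering threshold.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons (e.g. with N) become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna):
    """Map (strand, frame offset) to the translated protein string."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[(strand, offset)] = translate(seq[offset:])
    return frames


def map_tag(dna, tag):
    """Return sorted (strand, start, end) hits of an amino acid tag.

    Coordinates are 0-based, half-open, and always given on the forward strand.
    """
    hits = []
    n = len(dna)
    for (strand, offset), protein in six_frame_translation(dna).items():
        pos = protein.find(tag)
        while pos != -1:
            start = offset + 3 * pos
            end = start + 3 * len(tag)
            if strand == "-":
                # Convert reverse-strand coordinates back to the forward strand.
                start, end = n - end, n - start
            hits.append((strand, start, end))
            pos = protein.find(tag, pos + 1)
    return sorted(hits)


def cluster_hits(hits, max_gap=1000):
    """Group same-strand hits whose start lies within max_gap of the previous hit's end."""
    clusters = []
    for hit in sorted(hits, key=lambda h: (h[0], h[1])):
        last = clusters[-1] if clusters else None
        if last and last[-1][0] == hit[0] and hit[1] - last[-1][2] <= max_gap:
            last.append(hit)
        else:
            clusters.append([hit])
    return clusters
```

For example, `map_tag("ATGGCCAAATGA", "AK")` locates the tag in the first forward frame, and `cluster_hits` can then be applied to the pooled hits of many tags. A production pipeline would additionally score clusters and export them, for instance as GFF, which this sketch leaves out.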