VESPA: software to facilitate genomic annotation of prokaryotic organisms through integration of proteomic and transcriptomic data
© Peterson et al; licensee BioMed Central Ltd. 2012
Received: 27 December 2011
Accepted: 5 April 2012
Published: 5 April 2012
The procedural aspects of genome sequencing and assembly have become relatively inexpensive, yet the full, accurate structural annotation of these genomes remains a challenge. Next-generation sequencing transcriptomics (RNA-Seq), global microarrays, and tandem mass spectrometry (MS/MS)-based proteomics have demonstrated immense value to genome curators as individual sources of information, however, integrating these data types to validate and improve structural annotation remains a major challenge. Current visual and statistical analytic tools are focused on a single data type, or existing software tools are retrofitted to analyze new data forms. We present Visual Exploration and Statistics to Promote Annotation (VESPA) is a new interactive visual analysis software tool focused on assisting scientists with the annotation of prokaryotic genomes though the integration of proteomics and transcriptomics data with current genome location coordinates.
VESPA is a desktop Java™ application that integrates high-throughput proteomics data (peptide-centric) and transcriptomics (probe or RNA-Seq) data into a genomic context, all of which can be visualized at three levels of genomic resolution. Data is interrogated via searches linked to the genome visualizations to find regions with high likelihood of mis-annotation. Search results are linked to exports for further validation outside of VESPA or potential coding-regions can be analyzed concurrently with the software through interaction with BLAST. VESPA is demonstrated on two use cases (Yersinia pestis Pestoides F and Synechococcus sp. PCC 7002) to demonstrate the rapid manner in which mis-annotations can be found and explored in VESPA using either proteomics data alone, or in combination with transcriptomic data.
VESPA is an interactive visual analytics tool that integrates high-throughput data into a genomic context to facilitate the discovery of structural mis-annotations in prokaryotic genomes. Data is evaluated via visual analysis across multiple levels of genomic resolution, linked searches and interaction with existing bioinformatics tools. We highlight the novel functionality of VESPA and core programming requirements for visualization of these large heterogeneous datasets for a client-side application. The software is freely available at https://www.biopilot.org/docs/Software/Vespa.php.
High throughput (HTP) molecular technologies are at the core of new capabilities to derive genomic-level profiles oforganisms [1, 2]. One challenge often not addressed in the context of HTP technologies is the relationship of the analyses to the defined structural annotation of the genome. For example, the accuracy of global bottom-up proteomics is directly dependent upon accurately defined open reading frames (ORFs), because spectra are matched directly to an in silico enzymatic digest of the predicted proteins. Although a well-annotated genome is typically needed to analyze HTP data, it is also true that HTP data can contribute to genome annotation. Specifically, both next-generation sequencing transcriptomic data (RNA-Seq) and tandem mass spectrometry (MS/MS)-based proteomics have demonstrated immense value to genome curators [3–9] to locate features such as missed genes and intron/exon borders. While the procedural aspects of genome sequencing and assembly have become relatively inexpensive, the full and accurate annotation of these genomes, and integration of HTP data types to improve structural genome annotation is not straightforward, still very labor-intensive, and few computational tools have been developed to address this issue.
The development of RNA-Seq has been a major leap forward for transcriptomics, providing data to identify differentially expressed genes, as well as improve structural gene annotation. Common tools to process RNA-Seq data, such as IGV , SAMtools , Tablet , and Bambino , focus on aligning individual reads with the genome, because the number of reads aligned with particular genes can be used as a metric to quantify differential gene expression within the context of an experiment. Although most RNA-Seq experiments are focused on differential expression, the expression pattern in the context of the genome can yield information about transcriptional units, such as operons, and annotation errors, such as missed genes. Similar observations can also be made from other transcriptomics platforms, such as tiled arrays. However, visualization and analysis of these data in genomic context, in order to enhance the annotations or make inferences about mis-annotations, remains a challenge.
While transcriptomics data can give valuable insight into genome annotation, transcription does not necessarily mean translation into protein. Mass spectrometry-based proteomics can fill this gap through global identification of proteins expressed in a sample. However, similar to transcriptomics, proteomics usually focuses on comparative studies to identify differentially expressed proteins. In particular, in tandem mass spectrometry (MS/MS), spectra from proteolytic peptides are matched to theoretical spectra derived from candidate peptides from a defined genome annotation. In this traditional manner, only peptides from an annotated gene will be identified. However, in theory, proteomics data includes spectra from any gene translated into protein. Thus, an alternate strategy is to match spectra against peptide candidates from any potential open reading frame between two stop codons in any of the six frames of the DNA - proteogenomics. Proteogenomics experiments have successfully corrected gene locations (start sites), located novel genes, and identified additional various mis-annotations, such as frameshifts [7, 14]; however, because mass spectrometry-based proteogenomics analyses require investigation of large numbers of potential peptides relative to the standard analysis, parsing and visualizing this data is challenging. Current software tools for proteomics data primarily focus on the processes of peptide identification, quantification and statistical comparison [15–18], whereas for proteogenomics, prokaryotic genome browser tools such as ARTEMIS [19, 20] or Gbrowse [21, 22] have been used due to their ability to compare different gene annotation models. To use these genome browsing tools for proteomics requires significant data formatting on the side of the user, because peptide identifications must be put into a standard format, such as a general feature format (GFF). Furthermore, there is no simple way to search for locations of interest in the genome, such as peptides located outside the defined gene annotations.
We present a novel software platform for Visual Exploration and Statistics to Promote Annotation (VESPA). VESPA was developed as a specialized tool within an overarching tool suite focused on the visualization and statistical integration of multiple data sources in a genomic context. VESPA 1.1.1 is a client-side Java application focused on assisting scientists with the annotation of prokaryotic genomes through the integration of proteomics (peptide-centric) and transcriptomics (probe or RNA-Seq) data with current genome location coordinates. VESPA visualizes all potential reading frames in a genome and has the capability to browse and query the data to quickly identify regions of interest with respect to structural annotation (e.g., novel genes, frameshifts). A basic proteotypic peptide statistic called SVM Technique to Evaluate Proteotypic Peptides (STEPP)  can be computed within VESPA, and used to filter peptides displayed in the visualization and queries. In addition, sequences of interest can be sent directly to BLAST  to assess the homology of genes identified within VESPA to known genes in the public databases. Alternatively, information extracted from the data, based on user queries to locate regions of interest, can be exported in easy-to-use formats for continued exploration outside of VESPA. VESPA is freely available at https://www.biopilot.org/docs/Software/Vespa.php. Here, we demonstrate the capabilities of VESPA with several use-case scenarios.
There are two modules in VESPA, an independent data analyzer module and the user interface (UI) platform. These two modules are installed together as one application built entirely in Java including an embedded H2 database http://www.h2database.com. From the data import and analytical operations, performance gains were obtained by a fast database running in embedded mode and by modularizing technical analysis of different types of data in several phases. Most of the intensive processing is performed at project creation or load so that quick data retrievals are possible.
The VESPA user interface is built using the Netbeans Platform and relies heavily on Java 2D for its visualizations http://netbeans.org/features/platform. Each visual component was built by extending existing Swing components for containers and providing custom paint code to render data from specific UI rendering models. The NetBeans platform provides a mature windowing system, module loader, persistence mechanism, and a Service Provider Application Programming Interface (API). With the Service Provider API, we developed a custom extension point or Service Interface and a Service Provider implementation that wraps the Analyzer. Using this platform, the visualization modules can be dynamically registered on start-up. This approach allows new visualizations to be added to the software with relative ease and additionally allows for an "auto update" feature that will download updates and new modules without any additional installation steps.
The VESPA Data Analyzer is written to read and process the various data types for storage in an H2 database. It relies heavily on the Apache POI libraries http://poi.apache.org to handle reading and writing Excel files. The Analyzer is independent from the UI, so that it can be used to process data independently or to load projects behind the scenes. After processing and storing the data the Analyzer serves up objects for the UI to visualize and query against. In-memory UI rendering models are used to provide a rapid response UI during navigation and trivial data filtering or searching. More complex searches and data exports are deferred to the database.
Data import, processing and summarization
VESPA displays four levels of genome resolution, shown in the default layout in Figure 1. The highest level of resolution is the Genome View (Figure 1B), which shows an entire chromosome or plasmid, on which the density of features, such as orphan peptides or RNA-Seq, can be displayed. The second level of resolution is the Intermediate View (Figure 1C) along the top of the visualization; this view allows the user to more easily navigate around the Reading Frame View (third tier of resolution). The Reading Frame View (Figure 1D) is the primary visualization, displaying the double stranded DNA as thin black lines, with the three positive reading frames above and the three negative reading frames below these lines. This view wraps from left to right, and thus more of the genome can be visualized than in a single linear view. The proteomic and transcriptomic data are displayed directly within the Reading Frame View, and can be viewed or hidden using the buttons on the top control panel (Figure 1E). The resolution level of this screen can be modified by simply clicking on the "Number of Base Pairs per Pixel" and setting it to a desired level (Figure 1F). The fourth level of resolution is displayed based on regions defined by the user; a click and drag activity opens the Sequence View (Figure 1G), where all of the specific nucleic and amino acids in that region are displayed and accordingly color-coded to match the Reading Frame View. All individual visualization components can be easily resized within the application or completely un-docked from the application main window to allow the user to customize the application to suit their analysis task. The user-defined settings will be restored each time the application is launched, although, under the "Windows" pull-down selection the user may "Reset Windows" to the default view.
Filter, query and export capabilities
Users can export all orphan peptide locations and potential ORFs that meet query definitions (highlighted in Figure 2), or simple lists of ORFs, peptides or probes with coordinates. Furthermore, from within VESPA, a protein sequence of interest can be used as a BLAST query to search for sequence homology with proteins in the public databases at NCBI, by a right click on the protein in the Search or Reading Frame Data panel (Figure 1J). This gives the user a quick method to infer if an O SS of interest has homology with an annotated gene from another species, and could potentially be a true ORF.
Simplified annotation discoveries through proteogenomic queries (case study 1)
Proteogenomics is focused on the utilization of MS/MS data to facilitate structural annotation efforts. VESPA dramatically simplifies proteogenomic tasks by allowing easy upload, browse, query and export capabilities for genomic and proteomic data. To demonstrate these capabilities, Figure 2 shows a screenshot of VESPA displaying proteomics data from Yersinia pestis Pestoides F. The project creation took in the genome files downloaded from GenBank (NC_009381.fna and NC_009381.gff) and an Excel table containing over 20,000 peptides, which were collected from a peptide identification search against protein translations of all potential coding regions in the Y. pestis Pestoides F genome. VESPA imported these peptides as their raw amino acid sequences, matching each peptide against all potential reading frames in the genome, thus not requiring prior mapping of the peptides to genomic coordinates.
Integrated omic data queries: Simultaneous evaluation of proteogenomic and transcriptomic data (case study 2)
Integration of proteomics and probe-based transcriptomic data
Integration of proteomics and RNA-Seq transcriptomic data
Despite recent advances in the generation and processing of high-throughput proteomics and transcriptomics data, the availability of visual analytics tools for the purposes of studying genome annotation, as well as the integration and exploration of these data streams in concert, remains a challenge. Here we have presented VESPA, a freely available software tool, for the purpose of proteogenomics and the integration of peptide-centric data with other forms of high-throughput transcriptomics data. The proteogenomic queries available through VESPA enable the discovery of regions of mis-annotations in a rapid manner, which can reduce the number of candidate reading frames for evaluation.
While VESPA and similar software tools facilitate data integration to improve genome annotations, the contribution of annotation corrections back to the genome databases is an ongoing challenge within the genomics community. NCBI does provide a mechanism to submit Third Party Annotations based on experimental evidence http://www.ncbi.nlm.nih.gov/genbank/tpa. Though these annotations do not become incorporated into the whole genome annotation unless the genome project owner updates the genome, this at least provides a mechanism to make specific gene annotation corrections found using VESPA publicly available.
VESPA is designed with a plug-in-play architecture to allow the addition the new visualizations and query interfaces. Future development will include the enhancement of these capabilities, such as the ability to query and filter on transcriptomic data and the visualization of CHIP-Seq data. In addition, VESPA enhancements are planned that will allow ingest of additional data formats (e.g., GenBank, pepXML), and facilitate the analysis of genomes with multiple genetic elements. Since VESPA has an auto-update feature that will notify the user when newer versions are available, the user will have nearly immediate access to these features as they are added.
Methods - biological case studies
Synechococcus sp. PC 7002
The Synechococcus sp. PCC 7002 data generation methods have been described previously [27, 28], thus the data generation and analyses will be only briefly described here. The proteomics and RNA-Seq experiments were performed on cells grown under atmospheric CO2 levels to study photorespiration processes in this organism.
The Synechococcus sp. PCC 7002 cell samples were processed for LC-MS/MS proteomics essentially as described previously ; the peptide identification computations were performed at the Molecular Sciences Computing Facility at the Environmental Molecular Sciences Laboratory (Richland, WA). MS/MS peaks were determined using DeconMSn, v2.2  and MSPolygraph  was used for identifying peptides. Tryptic peptides were searched for using a parent mass-to-charge window of +/- 3 Da, and fragment ion windows of +/- 1 Da. Two missed cleavages were allowed for peptides with a parent mass charge of +1, three for +2 parent mass peptides, and four missed cleavages for peptides with a parent mass of +3. The spectra were searched against a six frame translation (no minimum length was imposed) of the Synechococcus sp. PCC 7002 genome and plasmids (NC_010474 through NC_010480) and spectra matching peptides at least six amino acids in length reported. For estimating error rates, random peptides were generated using the program mimic, released with percolator. The false discovery rates (q-values) were estimated with qvality. Peptides identified at a q-value < 5% were retained for visualization in VESPA.
The Synechococcus sp. PCC 7002 RNA-Seq data were generated from an 0.5 μg RNA sample using a SOLiD™ Whole Transcriptome Analysis Kit (Applied Biosystems) and the SOLiD™ 3Plus protocol as described previously . Sequencing was performed at the Genomics Core Facility at The Pennsylvania State University (University Park, PA). The raw RNA-Seq data were processed as described previously , the sequence reads mapping to rRNA-coding regions removed and a SAM file generated for VESPA import.
Yersinia pestis pestoides F
The Y. pestis Pestoides F data were a subset of data collected for a larger experiment focused on the comparison of genome annotations across multiple Yersiniae strains . Here we briefly describe the methods used to generate the proteomics and global microarray data.
Peptides (0.5 μg/μL) from global preparations (run in triplicate, total of n = 30 LC-MS/MS runs per strain), and SCX fractionated samples (n = 48 fractionated samples run per strain) were separated by a custom-built nanocapillary HPLC system. The eluate from the global preparations and fractionated samples was directly analyzed by electrospray ionization (ESI) using a LTQ Orbitrap Velos mass spectrometer or linear ion trap (LTQ) mass spectrometer (Thermo Scientific), respectively. Raw data are available to the public at http://omics.pnl.gov. MS/MS fragmentation spectra were searched against a six frame translation (minimum open reading frame length of 30 amino acids) of Y. pestis Pestoides F genome and plasmids using the SEQUEST peptide identification software . The mass tolerance used for matching was set to ± 3 Da. Peptide identifications were retained based upon the following criteria: 1) SEQUEST DelCn2 value ≥ 0.10; 2) SEQUEST correlation score (Xcorr) ≥ 1.9 for charge state 1+ for fully tryptic peptides and Xcorr ≥ 2.20 for 1+ for partially tryptic peptides; Xcorr ≥ 2.2 for charge state 2+ and fully tryptic peptides and Xcorr ≥ 3.3 for charge state 2+ and partially tryptic peptides; Xcorr ≥ 3.3 for charge state 3+ and fully tryptic peptides and Xcorr ≥ 4.0 for charge state 3+ and partially tryptic peptides. Using the reverse database approach, the false discovery rate (FDR) was calculated to be < 0.4% at the spectrum level.
Universal Yersinia Microarray
The global microarray data included 7641 designed oligos from Y. pestis strains CO92, KIM, Pestoides F, Antiqua, Nepal516, and biovar Microtus str. 91001, and Y. pseudotuberculosis. The array platform description and oligo list is available at NCBI Gene Expresssion Omnibus (GEO) under accession GPL9009. Scanning, image analysis, and normalization were performed as outlined in PFGRC standard protocol http://pfgrc.jcvi.org/index.php/microarray/protocols.html. Individual TIFF images from each channel were analyzed with JCVI Spotfinder software (available at http://pfgrc.jcvi.org/index.php/bioinformatics.html). Microarray data were normalized by LOWESS normalization using TM4 software MIDAS http://pfgrc.jcvi.org/index.php/bioinformatics.html. Oligos generating intensity signals ≥ 25,000 from samples prepared at 1 hour time point under 37 degree growth were considered to have positive hybridization above background and therefore incorporated as experimental measurements. Transcriptomics data have been deposited in the GEO repository under series accession GSE30634.
Availability and requirements
VESPA is freely available at https://www.biopilot.org/docs/Software/Vespa.php with installers for Windows XP, Windows 7, Macintosh and Linux. Java Runtime Environment 1.6 is required to run the application.
Sequencing by Oligonucleotide Ligation and Detection
General Feature Format
Basic Local Alignment Search Tool
Application Programming Interface
This work was supported by the National Institutes of Health for work performed at Pacific Northwest National Laboratory (PNNL). The software was developed under grant 1R01GM084892-01 (BMW) from the National Institute of General Medical Sciences and the data shown in screen shots were generated in part under contract Y1-A1-8401 (JNA) from the National Institute for Allergy and Infectious Disease, contract 56812 from the Genomic Science Program (GSP) and contracts 54876 and 57271 from the Office of Advanced Scientific Computing Research and the Office of Biological and Environmental Research of the U.S. Department of Energy (DOE). PNNL is a multiprogram national laboratory operated by Battelle for the DOE under Contract DE-AC06-76RL01830. The proteomics data presented were processed by the Instrument Development Laboratory at the Environmental Molecular Sciences Laboratory (EMSL). EMSL is a national scientific user facility supported by the DOE Office of Biological and Environmental Research.
The authors authors gratefully acknowledge Drs. Donald A. Bryant and Marcus Ludwig for sharing Synechococcus sp. PCC 7002 RNA-Seq data and Sebastian Jaramillo-Riveri for processing the Synechococcus sp. PCC 7002 proteomics data shown in screenshots. The transcriptomic data presented for Yersinia pestis Pestoides F were processed the J. Craig Venter Institute. We gratefully acknowledge Vladimir Motin, Sadhana Chauhan, Scott N. Peterson, Marcus B. Jones, and Bryan C. Frank for their support in acquisition of these data.
- Steinfath M, Repsilber D, Scholz M, Walther D, Selbig J: Integrated data analysis for genome-wide research. EXS. 2007, 97: 309-329.PubMedGoogle Scholar
- Zhang W, Li F, Nie L: Integrating multiple 'omics' analysis for microbial biology: application and methodologies. Microbiology. 2010, 156 (Pt 2): 287-301.View ArticlePubMedGoogle Scholar
- Ansong C, Purvine SO, Adkins JN, Lipton MS, Smith RD: Proteogenomics: needs and roles to be filled by proteomics in genome annotation. Brief Funct Genomic Proteomic. 2008, 7 (1): 50-62.View ArticlePubMedGoogle Scholar
- Armengaud J: Proteogenomics and systems biology: quest for the ultimate missing parts. Expert Rev Proteomics. 2010, 7 (1): 65-77.View ArticlePubMedGoogle Scholar
- Castellana N, Bafna V: Proteogenomics to discover the full coding content of genomes: a computational perspective. J Proteomics. 2010, 73 (11): 2124-2135.PubMed CentralView ArticlePubMedGoogle Scholar
- Croucher NJ, Vernikos GS, Parkhill J, Bentley SD: Identification, variation and transcription of pneumococcal repeat sequences. BMC Genomics. 2011, 12: 120-PubMed CentralView ArticlePubMedGoogle Scholar
- Payne SH, Huang ST, Pieper R: A proteogenomic update to Yersinia: enhancing genome annotation. BMC Genomics. 2010, 11: 460-PubMed CentralPubMedGoogle Scholar
- Renuse S, Chaerkady R, Pandey A: Proteogenomics. Proteomics. 2011, 11 (4): 620-630.View ArticlePubMedGoogle Scholar
- Skinner ME, Uzilov AV, Stein LD, Mungall CJ, Holmes IH: JBrowse: a next-generation genome browser. Genome Res. 2009, 19 (9): 1630-1638.PubMed CentralView ArticlePubMedGoogle Scholar
- Robinson JT, Thorvaldsdottir H, Winckler W, Guttman M, Lander ES, Getz G, Mesirov JP: Integrative genomics viewer. Nat Biotechnol. 2011, 29 (1): 24-26.PubMed CentralView ArticlePubMedGoogle Scholar
- Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R: The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009, 25 (16): 2078-2079.PubMed CentralView ArticlePubMedGoogle Scholar
- Milne I, Bayer M, Cardle L, Shaw P, Stephen G, Wright F, Marshall D: Tablet-next generation sequence assembly visualization. Bioinformatics. 2010, 26 (3): 401-402.PubMed CentralView ArticlePubMedGoogle Scholar
- Edmonson MN, Zhang J, Yan C, Finney RP, Meerzaman DM, Buetow KH: Bambino: a variant detector and alignment viewer for next-generation sequencing data in the SAM/BAM format. Bioinformatics. 2011, 27 (6): 865-866.PubMed CentralView ArticlePubMedGoogle Scholar
- Helmy M, Tomita M, Ishihama Y: OryzaPG-DB: Rice Proteome Database based on Shotgun Proteogenomics. BMC Plant Biol. 2011, 11 (1): 63-PubMed CentralView ArticlePubMedGoogle Scholar
- Hwang D, Zhang N, Lee H, Yi E, Zhang H, Lee IY, Hood L, Aebersold R: MS-BID: a Java package for label-free LC-MS-based comparative proteomic analysis. Bioinformatics. 2008, 24 (22): 2641-2642.PubMed CentralView ArticlePubMedGoogle Scholar
- Mortensen P, Gouw JW, Olsen JV, Ong SE, Rigbolt KT, Bunkenborg J, Cox J, Foster LJ, Heck AJ, Blagoev B: MSQuant, an open source platform for mass spectrometry-based quantitative proteomics. J Proteome Res. 2010, 9 (1): 393-403.View ArticlePubMedGoogle Scholar
- Polpitiya AD, Qian WJ, Jaitly N, Petyuk VA, Adkins JN, Camp DG, Anderson GA, Smith RD: DAnTE: a statistical tool for quantitative analysis of -omics data. Bioinformatics. 2008, 24 (13): 1556-1558.PubMed CentralView ArticlePubMedGoogle Scholar
- Yu K, Salomon AR: PeptideDepot: flexible relational database for visual analysis of quantitative proteomic data and integration of existing protein information. Proteomics. 2009, 9 (23): 5350-5358.PubMed CentralView ArticlePubMedGoogle Scholar
- Carver T, Berriman M, Tivey A, Patel C, Bohme U, Barrell BG, Parkhill J, Rajandream MA: Artemis and ACT: viewing, annotating and comparing sequences stored in a relational database. Bioinformatics. 2008, 24 (23): 2672-2676.PubMed CentralView ArticlePubMedGoogle Scholar
- Rutherford K, Parkhill J, Crook J, Horsnell T, Rice P, Rajandream MA, Barrell B: Artemis: sequence visualization and annotation. Bioinformatics. 2000, 16 (10): 944-945.View ArticlePubMedGoogle Scholar
- Podicheti R, Gollapudi R, Dong Q: WebGBrowse-a web server for GBrowse. Bioinformatics. 2009, 25 (12): 1550-1551.View ArticlePubMedGoogle Scholar
- Wilkinson M: Gbrowse Moby: a Web-based browser for BioMoby Services. Source Code Biol Med. 2006, 1: 4-PubMed CentralView ArticlePubMedGoogle Scholar
- Webb-Robertson BJ, Cannon WR, Oehmen CS, Shah AR, Gurumoorthi V, Lipton MS, Waters KM: A support vector machine model for the prediction of proteotypic peptides for accurate mass and time proteomics. Bioinformatics. 2010, 26 (13): 1677-1683.View ArticlePubMedGoogle Scholar
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25 (17): 3389-3402.PubMed CentralView ArticlePubMedGoogle Scholar
- Deutsch EW, Mendoza L, Shteynberg D, Farrah T, Lam H, Tasman N, Sun Z, Nilsson E, Pratt B, Prazen B: A guided tour of the Trans-Proteomic Pipeline. Proteomics. 2010, 10 (6): 1150-1159.PubMed CentralView ArticlePubMedGoogle Scholar
- Schrimpe-Rutledge AC, Jones MB, Chauhan S, Purvine SO, Sanford JA, Monroe ME, Brewer HM, Payne SH, Ansong C, Frank BC: Comparative Omics-Driven Genome Annotation Refinement: Application Across Yersiniae. PLoS One. 2012, 7 (3): e33903-PubMed CentralView ArticlePubMedGoogle Scholar
- Cannon WR, Rawlins MM, Baxter DJ, Callister SJ, Lipton MS, Bryant DA: Large improvements in MS/MS-based peptide identification rates using a hybrid analysis. J Proteome Res. 2011, 10 (5): 2306-2317.View ArticlePubMedGoogle Scholar
- Ludwig M, Bryant DA: Transcription Profiling of the Model Cyanobacterium Synechococcus sp. Strain PCC 7002 by Next-Gen (SOLiD) Sequencing of cDNA. Front Microbiol. 2011, 2: 41-PubMed CentralView ArticlePubMedGoogle Scholar
- Mazauric MH, Licznar P, Prere MF, Canal I, Fayet O: Apical loop-internal loop RNA pseudoknots: a new type of stimulator of -1 translational frameshifting in bacteria. J Biol Chem. 2008, 283 (29): 20421-20432.View ArticlePubMedGoogle Scholar
- Mayampurath AM, Jaitly N, Purvine SO, Monroe ME, Auberry KJ, Adkins JN, Smith RD: DeconMSn: a software tool for accurate parent ion monoisotopic mass determination for tandem mass spectra. Bioinformatics. 2008, 24 (7): 1021-1023.PubMed CentralView ArticlePubMedGoogle Scholar
- Cannon WR, Jarman KH, Webb-Robertson BJ, Baxter DJ, Oehmen CS, Jarman KD, Heredia-Langner A, Auberry KJ, Anderson GA: Comparison of probability and likelihood models for peptide identification from tandem mass spectrometry data. J Proteome Res. 2005, 4 (5): 1687-1698.View ArticlePubMedGoogle Scholar
- Kall L, Canterbury JD, Weston J, Noble WS, MacCoss MJ: Semi-supervised learning for peptide identification from shotgun proteomics datasets. Nat Methods. 2007, 4 (11): 923-925.View ArticlePubMedGoogle Scholar
- Kall L, Storey JD, Noble WS: QVALITY: non-parametric estimation of q-values and posterior error probabilities. Bioinformatics. 2009, 25 (7): 964-966.PubMed CentralView ArticlePubMedGoogle Scholar
- Yates JR, Eng JK, McCormack AL, Schieltz D: Method to correlate tandem mass spectra of modified peptides to amino acid sequences in the protein database. Anal Chem. 1995, 67 (8): 1426-1436.View ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.