- Open Access
WHAM!: a web-based visualization suite for user-defined analysis of metagenomic shotgun sequencing data
© The Author(s). 2018
- Received: 11 January 2018
- Accepted: 14 June 2018
- Published: 25 June 2018
Exploration of large data sets, such as shotgun metagenomic sequence or expression data, by biomedical experts and medical professionals remains a major bottleneck in the scientific discovery process. Although tools for this purpose exist for 16S ribosomal RNA sequencing analysis, there is a growing but still insufficient number of user-friendly interactive visualization workflows for easy data exploration and figure generation. Developing such platforms is necessary to accelerate and streamline microbiome laboratory research.
We developed the Workflow Hub for Automated Metagenomic Exploration (WHAM!) as a web-based interactive tool capable of user-directed data visualization and statistical analysis of annotated shotgun metagenomic and metatranscriptomic data sets. WHAM! includes exploratory and hypothesis-based gene and taxa search modules for visualizing differences in microbial taxa and gene family expression across experimental groups, and for creating publication-quality figures without the need for a command line interface or in-house bioinformatics support.
WHAM! is an interactive and customizable tool for downstream metagenomic and metatranscriptomic analysis providing a user-friendly interface allowing for easy data exploration by microbiome and ecological experts to facilitate discovery in multi-dimensional and large-scale data sets.
- Data exploration
- DNA analysis
- Expression analysis
As metagenomic and metatranscriptomic shotgun sequencing data become both less expensive to generate and more readily available, researchers have turned to automated pipelines such as MetaPhlAn, HUMAnN2 [2, 3], MEGAN and SAMSA for annotation and analysis. While these applications provide high-quality functional and taxonomic annotations, a computational hurdle still exists between the data output and biologically interpretable results. Output formats from annotation pipelines are typically cumbersome tables and large matrices of genes, assigned taxa, and abundance or expression levels. Researchers must then sift through the data for their genes of interest to test their stated hypotheses. Furthermore, because of the size and density of the information, open-ended exploration of the data is an even more overwhelming task for experimentalists, which inhibits data-driven discovery.
Concurrently with the increasing interest in the field, many of the tools described above have been employed to analyze and characterize the human microbiome. Two widely used tools, HUMAnN2 and QIIME2, provide extensive frameworks for gene annotation and taxonomic analysis, respectively. However, both tools have limitations for downstream visualization and user-driven data exploration. Although HUMAnN2 includes a visualization script that generates relative abundance plots for a particular pathway or gene family of interest, users have limited control over figure customization and must use the command line. Moreover, requiring users to specify the feature of interest in advance hinders exploration of the data set in its entirety. Other platforms such as QIIME2 have recognized the utility of command line independence and user-defined exploration of sequencing data. A novel feature of the QIIME2 platform is a Graphical User Interface (GUI)-based Shiny derivative with which users can visualize taxonomic information and download high-quality figures. Nevertheless, QIIME users are limited to taxonomic investigations and therefore miss the opportunity to correlate gene expression observations with taxonomic abundance. In addition to these commonly used resources, new tools and methods are continuously being developed to deal with the challenges of visualizing these complex datasets. Several R packages and command line tools exist for this purpose, including MG-RAST, CAMERA and ASAR. Others focus only on 16S rRNA sequence data input and cannot accommodate shotgun metagenomics data containing information on both taxa and functional elements [9–13]. Therefore, there is a growing need for tools that address the specific challenges biomedical experts face when analyzing metagenomics data.
Our Workflow Hub for Automated Metagenomic Exploration (WHAM!) aims to provide a platform for simple and intuitive exploration and targeted analysis of metagenomic sequencing data. Our platform requires no computational background or processing on the part of the user to generate publication-quality figures. Furthermore, this application allows users to interactively explore their dataset for patterns and changes in expression or taxonomic composition while also providing a platform for analyzing specific biological features and their taxonomic contributors.
WHAM! UI architecture
WHAM! is an easy-to-use, web-based R Shiny application that generates publication-quality figures for metagenomic sequencing analyses (https://ruggleslab.shinyapps.io/wham_v1/). The application employs a number of R packages, including ggplot2, psych, gplots and plotly, for visualization (for the source code and a full list of packages and dependencies, see https://github.com/ruggleslab/jukebox/tree/master/wham_v1). All dependencies are packaged within the application, so users need only web access and their input data. Currently, the application accepts two input options based on commonly used metagenomics pipelines, and the platform is open to adding further input options as they are developed by the community. The first is a tab-delimited output of gene families, pathways or Gene Ontology (GO) terms and their abundance or expression levels, in the format specified in Additional file 1: Table S1. This format is based on the Huttenhower Biobakery pipeline, which comprises a suite of tools including FastQC, KneadData, MetaPhlAn and HUMAnN2 [2, 3]. We chose this pipeline in part because the next iteration of the Human Microbiome Project uses a workflow that includes Biobakery-based tools, and because a curated database of metagenomics studies processed through this pipeline is available through the Bioconductor ExperimentHub platform. Creating user-friendly web tools downstream of these analysis steps will allow researchers to explore ongoing large-scale metagenomics projects without having to do the computational heavy lifting. The second input option is output from the European Bioinformatics Institute (EBI) Metagenomics service, for which the user can upload up to two files containing functional features (InterPro protein families, GO terms, etc.) and/or a taxa file, in the formats specified in Additional file 2: Table S2.
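As an illustration of what a stratified gene-family table contains, the following minimal Python sketch splits such a table into community totals and per-taxon contributions, which is the kind of split that taxa-contribution plots rely on. It assumes the HUMAnN2 output convention (a `# Gene Family` header, one column per sample, and stratified rows written as `feature|taxon`). The exact WHAM! input specification is given in Additional file 1: Table S1 and may differ. The function name `parse_stratified` and the example rows are hypothetical, and this is not WHAM! code, which is written in R.

```python
import csv
import io
from collections import defaultdict

# Hypothetical table in the HUMAnN2 gene-family layout (assumed): the first
# column is the feature ID, optionally followed by "|taxon" on stratified rows;
# the remaining columns are per-sample abundances.
EXAMPLE = (
    "# Gene Family\tS1\tS2\n"
    "UniRef90_A\t10.0\t4.0\n"
    "UniRef90_A|g__Bacteroides.s__Bacteroides_fragilis\t6.0\t1.0\n"
    "UniRef90_A|unclassified\t4.0\t3.0\n"
    "UniRef90_B\t2.0\t8.0\n"
)

def parse_stratified(text):
    """Split a stratified table into community totals and per-taxon rows."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    samples = header[1:]
    totals = {}                   # feature -> [abundance per sample]
    by_taxon = defaultdict(dict)  # feature -> {taxon: [abundance per sample]}
    for row in reader:
        if not row:
            continue  # skip blank lines
        values = [float(v) for v in row[1:]]
        feature, sep, taxon = row[0].partition("|")
        if sep:
            by_taxon[feature][taxon] = values
        else:
            totals[feature] = values
    return samples, totals, dict(by_taxon)

samples, totals, by_taxon = parse_stratified(EXAMPLE)
```

Keeping the unstratified totals separate from the per-taxon rows avoids double counting when abundances are later summed by feature.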
Pipeline architecture and visualization methods
Methods for the analysis of metagenomics data are being developed rapidly to meet the needs of the community (see reviews [22–24]). The choice of statistical methods, in particular, must be tailored to the specific challenges inherent to metagenomics data. For differential expression analysis, we chose the ANOVA-Like Differential Expression (ALDEx2) method, which takes into account within-condition variation and the compositional nature of high-throughput sequencing data, and applies multiple testing corrections. This method evaluates differential expression between experimental groups using a combination of statistical significance and effect size estimates, both of which are included in our pipeline. The WHAM! ‘Explore Your Data’ module provides sliders for selecting absolute effect size and Wilcoxon test p-value cutoffs, to isolate meaningful findings in the data. For cross-correlation tests we chose a non-parametric Spearman correlation analysis, with Benjamini-Hochberg correction to control the false discovery rate (FDR).
We also carefully considered the options available for metagenomic data visualization during application development. We chose to focus on a combination of stacked bar plots (for taxa contribution) and heatmaps (for relative abundance, correlation analysis and pairwise statistics). Stacked bar plots efficiently represent the proportion of taxa present in each sample across many metagenomes and are commonly used in microbiome studies. Heatmaps are particularly useful for highlighting taxa and gene abundance across a collection of samples and for taxa correlation plots, where other representations such as box or bar plots can become cumbersome.
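To make the compositional reasoning concrete, the sketch below applies a centred log-ratio (CLR) transform to each sample and then computes a simplified effect size for one feature: the median between-group difference divided by the larger of the two median within-group absolute differences. This is only a rough, single-instance stand-in for ALDEx2, which additionally draws Monte Carlo instances from a Dirichlet distribution and summarizes the effect across them. The 0.5 pseudocount, the function names and the example counts are assumptions for illustration.

```python
import math
from itertools import combinations
from statistics import median

def clr(counts, pseudocount=0.5):
    """Centred log-ratio transform of one sample's feature counts.

    The pseudocount (an assumption here) avoids log(0) for absent features.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]

def effect_size(group_a, group_b):
    """Simplified effect size for one feature, given its CLR values per group.

    Each group needs at least two samples so that within-group differences
    exist. Positive values mean the feature is higher in group_b.
    """
    between = [b - a for a in group_a for b in group_b]
    within_a = [abs(x - y) for x, y in combinations(group_a, 2)]
    within_b = [abs(x - y) for x, y in combinations(group_b, 2)]
    dispersion = max(median(within_a), median(within_b))
    if dispersion == 0:
        return float("nan")  # no within-group spread: effect is undefined
    return median(between) / dispersion

# Hypothetical counts: rows are samples, columns are three features.
group_a_counts = [[10, 100, 50], [12, 90, 55], [8, 110, 45]]
group_b_counts = [[40, 100, 50], [50, 95, 52], [45, 105, 48]]
clr_a = [clr(sample) for sample in group_a_counts]
clr_b = [clr(sample) for sample in group_b_counts]
feature0_effect = effect_size([s[0] for s in clr_a], [s[0] for s in clr_b])
```

Scaling the between-group difference by within-group spread is what lets an effect size cutoff complement a p-value cutoff: a feature with a large but noisy shift scores lower than one with a consistent shift.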
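Stacked bar plots of taxa contribution rest on a simple per-sample normalization: each taxon's abundance is divided by the sample total. The Python sketch below shows that step, plus an optional merge of low-abundance taxa into an "Other" category, which is a common way to keep bar plots legible. The 5% threshold, the function names and the taxon labels are hypothetical and are not taken from WHAM!.

```python
def to_percentages(abundances):
    """Convert one sample's {taxon: abundance} into percentages summing to 100."""
    total = sum(abundances.values())
    if total == 0:
        return {taxon: 0.0 for taxon in abundances}  # empty sample: all zero
    return {taxon: 100.0 * value / total for taxon, value in abundances.items()}

def collapse_minor(percentages, min_percent=5.0):
    """Merge taxa below a (hypothetical) display threshold into 'Other'."""
    kept = {}
    other = 0.0
    for taxon, pct in percentages.items():
        if pct >= min_percent:
            kept[taxon] = pct
        else:
            other += pct
    if other > 0:
        kept["Other"] = other
    return kept

# Hypothetical sample with three taxa; each bar in the plot is one such dict.
sample = {"Taxon_A": 90, "Taxon_B": 7, "Taxon_C": 3}
bar_segments = collapse_minor(to_percentages(sample))
```

Because every sample is rescaled to 100%, bars remain comparable across samples with very different sequencing depths.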
To demonstrate the utility of WHAM!, we used two independent, publicly available test datasets. The first was derived from 47 human microbiome samples from four body sites made available by the Human Microbiome Project (HMP). Shotgun metagenomic sequencing data were processed with the Huttenhower Biobakery pipeline, including FastQC, KneadData, MetaPhlAn and HUMAnN2 [2, 3], to obtain an annotated gene abundance matrix. After host decontamination and quality filtering, the estimated counts in each sample were calculated by multiplying the relative abundance of each feature by the total sum of profiled counts. Following count estimation, the gene family identifiers were further collapsed to GO terms using the “humann2_regroup_table” utility provided with HUMAnN2. This dataset has been mounted as a test case in our web app and is available through the ‘Try a Sample Dataset’ mode on the application homepage. Although this dataset is already well studied, our analysis of these HMP sequencing data highlights the utility and exploratory capabilities of our visualization suite. As expected, body sites vary widely in the taxonomic species present and in the abundance of these taxa (Fig. 2a). Arm samples were dominated by the genus Cutibacterium (previously classified as Propionibacterium), as was also observed in the original HMP analysis (Fig. 2b, c). Furthermore, at the depth of resolution provided in the original data, stool and saliva samples exhibited much greater microbial diversity than arm and vaginal samples (Fig. 2a). As demonstrated, WHAM! can readily identify and visualize taxonomic differences based on group classifications, which could include diets, drug treatment groups, disease states or any other user-defined classification.
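The count-estimation and regrouping steps described above amount to a multiplication followed by a grouped sum. The sketch below assumes that relative abundances are stored as fractions summing to 1 within a sample and that the gene-family-to-GO mapping is available as a dictionary. The helper names, the feature and GO labels, and the choice to drop unmapped features are our assumptions for illustration, not the HUMAnN2 implementation.

```python
from collections import defaultdict

def estimate_counts(relative_abundance, total_profiled_counts):
    """Scale one sample's relative abundances (fractions summing to 1) to counts."""
    return {feature: fraction * total_profiled_counts
            for feature, fraction in relative_abundance.items()}

def regroup(counts, mapping):
    """Sum feature counts into higher-level groups (e.g. gene families -> GO terms).

    mapping: feature -> list of group IDs. A feature mapped to several groups
    contributes its full count to each; unmapped features are dropped here.
    """
    grouped = defaultdict(float)
    for feature, value in counts.items():
        for group in mapping.get(feature, []):
            grouped[group] += value
    return dict(grouped)

# Hypothetical sample: two gene families and a made-up mapping to GO terms.
counts = estimate_counts({"family_1": 0.25, "family_2": 0.75}, 1000)
go_counts = regroup(counts, {"family_1": ["GO_term_1"],
                             "family_2": ["GO_term_1", "GO_term_2"]})
```

Here a sample with 1,000 profiled counts yields 250 and 750 estimated counts for the two families, and GO_term_1 receives both (1,000) because both families map to it.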
We can similarly explore the GO term abundance across samples using the ‘Explore Features’ tab, automatically identifying differentially abundant GO terms across samples based on user-controlled p-value and effect size cutoffs (Fig. 2d). Of those found to be significantly different, several antibiotic resistance-related GO terms were represented, including drug transmembrane transport, differing between stool and all other body sites tested (Fig. 2e). The taxa contributing to the abundance of this pathway also differed between sites, with high diversity, including E. coli and Bacteroides, found in stool samples (Fig. 2f).
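The p-value and effect size cutoffs used in the ‘Explore Features’ tab can be thought of as a two-condition filter over a per-feature results table. The sketch below is hypothetical (the `FeatureResult` type, the default cutoffs, the feature labels and the sort order are ours) and only illustrates the logic of combining an absolute effect size threshold with a p-value threshold.

```python
from dataclasses import dataclass

@dataclass
class FeatureResult:
    feature: str
    effect: float   # signed effect size (sign shows which group is higher)
    p_value: float  # e.g. a Wilcoxon test p-value

def filter_results(results, min_abs_effect=1.0, max_p=0.05):
    """Keep features that pass both cutoffs, ordered by decreasing |effect|."""
    hits = [r for r in results
            if abs(r.effect) >= min_abs_effect and r.p_value <= max_p]
    return sorted(hits, key=lambda r: abs(r.effect), reverse=True)

results = [
    FeatureResult("GO_term_A", 2.5, 0.001),
    FeatureResult("GO_term_B", -1.8, 0.01),
    FeatureResult("GO_term_C", 0.4, 0.001),  # fails the effect size cutoff
    FeatureResult("GO_term_D", 3.0, 0.20),   # fails the p-value cutoff
]
selected = [r.feature for r in filter_results(results)]
```

Using the absolute value of the effect keeps features that are higher in either group, so moving one slider tightens both directions at once.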
Because of our interest in the emergence of antibiotic resistance, we chose to explore our test data set for patterns in pathway abundances for antibiotic resistance mechanisms based on GO-term categories. By searching for these keywords in the ‘Feature Search’ tab, we detected several antibiotic resistance-related GO-term categories across the four body sites (Fig. 3a). Clicking on the features in the heatmap revealed significant differences in relative abundance levels of a subset of GO terms across the four body sites. These included the ‘response to antibiotic’ GO-term, which was significantly different in abundance in comparisons between stool and vagina, stool and saliva, vagina and saliva, and arm and saliva (Fig. 3b). Our analysis also demonstrates relatively high abundance levels of antibiotic resistance gene families in saliva and a wide dispersion of these gene families in stool samples (Fig. 3a).
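A keyword search such as the one described above can be as simple as case-insensitive substring matching over feature names. The sketch below is a hypothetical illustration of that idea rather than the WHAM! search logic; the function name is ours and the feature names are taken from the GO terms discussed in the text.

```python
def search_features(feature_names, keywords):
    """Return feature names containing any keyword, ignoring case, in input order."""
    lowered = [k.lower() for k in keywords]
    return [name for name in feature_names
            if any(k in name.lower() for k in lowered)]

features = [
    "response to antibiotic",
    "antibiotic transporter activity",
    "kanamycin kinase activity",
]
hits = search_features(features, ["Antibiotic"])
```

Passing several keywords (for example "antibiotic" and "kanamycin") widens the search, since a feature matches if it contains any of them.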
Further investigation via the ‘Feature Search’ tab also identified the taxa underlying the differences in ‘response to antibiotic’ GO-term abundance across the four body sites. In arm samples, the ‘response to antibiotic’ GO term was contributed almost exclusively by C. acnes, whereas in saliva and stool samples the contributing taxa were more diverse, with the highest prevalence in Streptococcus oralis in saliva and Prevotella copri in stool (Fig. 3c). Such observations in other data sets can address a number of biologically relevant questions, including how commensal bacteria contribute to the spread of antibiotic resistance, how particular bacterial species are able to inhabit multiple body sites, and whether their attributes differ across those sites.
Correlation analyses of functional features can provide information about shared selection or interactions between gene families, based on abundance patterns across the classification groups in the studied dataset. In this example, the highly correlated antibiotic transporter activity (GO term 9), kanamycin kinase activity (GO term 11) and response to antibiotic (GO term 7) pathways suggest shared selection. These three pathways were also anti-correlated with antibiotic metabolism (GO term 3) and beta-lactam antibiotic catabolism (GO term 5) (Fig. 3d). Establishing and evaluating these relationships in real time gives biomedical experts the opportunity to generate and test hypotheses on the fly.
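The correlation workflow described in the methods (Spearman correlation with Benjamini-Hochberg FDR correction) can be sketched as follows. This standard-library-only Python version is an illustration, not the WHAM! R code. Because the paper does not state how the Spearman p-values are obtained, the sketch swaps in a permutation test; the permutation count, the seed and all function names are assumptions.

```python
import math
import random
from itertools import combinations

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return float("nan")  # a constant feature has no defined correlation
    return sxy / math.sqrt(sxx * syy)

def permutation_p(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for Spearman's rho (an assumed choice)."""
    observed = abs(spearman(x, y))
    if math.isnan(observed):
        return 1.0  # undefined correlation: treat as non-significant
    rng = random.Random(seed)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(spearman(x, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for position, i in enumerate(order):
        k = m - position  # ascending rank of p_values[i]
        running_min = min(running_min, p_values[i] * m / k)
        adjusted[i] = running_min
    return adjusted

def correlate_features(abundance):
    """All pairwise Spearman correlations with BH-adjusted permutation p-values.

    abundance maps feature name -> list of values, one per sample.
    Returns (feature_a, feature_b, rho, q_value) tuples.
    """
    pairs = list(combinations(sorted(abundance), 2))
    rhos = [spearman(abundance[a], abundance[b]) for a, b in pairs]
    pvals = [permutation_p(abundance[a], abundance[b]) for a, b in pairs]
    qvals = benjamini_hochberg(pvals)
    return [(a, b, r, q) for (a, b), r, q in zip(pairs, rhos, qvals)]
```

For example, `correlate_features({"a": [1, 2, 3, 4, 5, 6], "b": [2, 4, 6, 8, 10, 12], "c": [6, 5, 4, 3, 2, 1]})` reports rho = 1 for the pair (a, b) and rho = -1 for (a, c). The resulting rho matrix is what a correlation heatmap displays, with the q-values used to decide which cells to treat as significant.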
These implementation examples demonstrate how WHAM! can be applied to metagenomics data to easily identify and visualize biologically relevant relationships and to generate novel hypotheses. The recently developed tools Metaviz, BURRITO and MetaComp address similar challenges; however, WHAM! differs from them in several important ways. Although visually striking and useful, Metaviz focuses on taxonomic analysis without factoring in biological processes, gene features or pathways. Like WHAM!, BURRITO enables users to interactively explore their metagenomics data, but it lacks feature search and hypothesis testing capabilities and provides fewer statistical tests for relative abundance across groups than WHAM!. MetaComp has robust statistics and accepts a range of inputs, but it requires an external download and installation, which can lead to unexpected issues depending on the user’s computing platform. WHAM! allows for web-based hypothesis generation based on both taxa and functional features, permitting on-the-fly confirmation and figure generation, and substantially adds to the current suite of tools available for metagenomic analysis.
WHAM! is an interactive and customizable tool for data exploration, hypothesis generation and figure generation for downstream metagenomics and metatranscriptomics analysis. Offering these capabilities as an R Shiny web tool provides a user-friendly interface that allows ecologists and microbiologists to explore their data easily, streamlining discovery in multi-dimensional and large-scale data sets. Overall, WHAM! strives to give users the opportunity for in-depth exploration and targeted analysis of metagenomic and metatranscriptomic sequencing information, with special emphasis on microbiome-related investigations. As demonstrated, the ease and utility of the WHAM! visualization suite enable users to explore patterns in the microbiome and to understand relationships between taxonomic communities and the processes in which they engage. For 16S rRNA taxonomic analysis, QIIME and Mothur have dominated the field as user-friendly, comprehensive bioinformatics pipelines for microbial taxonomic analysis [34, 35]. QIIME2 improved upon the pipeline not only in its taxonomic inference algorithm but also in its user interface, which now includes interactive web-based visualization and no longer requires a command line interface. Currently, there is a growing, but still insufficient, number of tools that allow real-time exploratory visualization of complex shotgun metagenomics data and that are designed specifically for biomedical scientists and medical professionals lacking computational training. WHAM! helps to fill this gap, and we will continue to expand its capabilities by increasing the supported input data structures and statistical packages to reflect evolving analysis methods as they are adopted by the field.
Project name: Workflow Hub for Automated Metagenomic Exploration (WHAM!)
Project home page: https://ruggleslab.shinyapps.io/wham_v1/
Operating system: Platform independent.
Programming Language: R/Rshiny.
Other requirements: None.
Any restrictions to use by non-academics: None.
This work has used computing resources at the NYU High Performance Computing Facility (HPCF).
This work has been supported by U01 AI22285 from the National Institutes of Health and by the C & D fund.
Availability of data and materials
The first dataset analyzed has been mounted to the WHAM! web page as a sample dataset and was downloaded from the published article Abubucker et al. (2012) Metabolic reconstruction for metagenomic data and its application to the human microbiome. PLoS Comput Biol. 8:e1002358. The second dataset was downloaded from the EBI Metagenomics server (project ERP106171), and the original data can be found in the published article Rose G, Shaw AG, Sim K, Wooldridge DJ, Li M-S, Gharbia S, et al. Antibiotic resistance potential of the healthy preterm infant gut microbiome. PeerJ. 2017;5:e2928.
JCD developed the software, ran test analysis and contributed to writing the manuscript. TB set up upstream metagenomics analysis pipelines (Huttenhower Biobakery) and edited the manuscript. MB aided in conceptualizing the tool and edited the manuscript. KVR led the project, aided in software development and wrote the manuscript. All authors read and approved the final manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Truong DT, Franzosa EA, Tickle TL, Scholz M, Weingart G, Pasolli E, et al. MetaPhlAn2 for enhanced metagenomic taxonomic profiling. Nat Methods. 2015;12:902–3.
- Abubucker S, Segata N, Goll J, Schubert AM, Izard J, Cantarel BL, et al. Metabolic reconstruction for metagenomic data and its application to the human microbiome. PLoS Comput Biol. 2012;8:e1002358.
- Pasolli E, Schiffer L, Renson A, Obenchain V, Manghi P, Truong DT, et al. Accessible, curated metagenomic data through ExperimentHub. bioRxiv [Internet]. 2017; Available from: http://biorxiv.org/content/early/2017/01/27/103085.abstract.
- Huson DH, Weber N. Microbial community analysis using MEGAN. Meth Enzymol. 2013;531:465–85.
- Westreich ST, Korf I, Mills DA, Lemay DG. SAMSA: a comprehensive metatranscriptome analysis pipeline. BMC Bioinformatics. 2016;17:399.
- Keegan KP, Glass EM, Meyer F. MG-RAST, a Metagenomics Service for Analysis of microbial community structure and function. Methods Mol Biol. 2016;1399:207–33.
- Seshadri R, Kravitz SA, Smarr L, Gilna P, Frazier M. CAMERA: a community resource for metagenomics. PLoS Biol. 2007;5:e75.
- Oranov AN, Sakenova NK, Sorokin A, Goryanin II. ASAR: visual analysis of metagenomes in R. Bioinformatics. 2018;34(8):1404–5.
- Huse SM, Mark Welch DB, Voorhis A, Shipunova A, Morrison HG, Eren AM, et al. VAMPS: a website for visualization and analysis of microbial population structures. BMC Bioinformatics. 2014;15:41.
- Wang Y, Xu L, Gu YQ, Coleman-Derr D. MetaCoMET: a web platform for discovery and visualization of the core microbiome. Bioinformatics. 2016;32:3469–70.
- Ayyala DN, Lin S. GrammR: graphical representation and modeling of count data with application in metagenomics. Bioinformatics. 2015;31:1648–54.
- Vázquez-Baeza Y, Pirrung M, Gonzalez A, Knight R. EMPeror: a tool for visualizing high-throughput microbial community data. Gigascience. 2013;2:16.
- Visualize your metagenomics 16S results with Krona charts [Internet]. [cited 2018 May 4]. Available from: https://ionreporter.thermofisher.com/ionreporter/help/GUID-BE5F627D-27BE-48E3-ACCF-6C8C1585CF92.html.
- Wickham H. ggplot2: elegant graphics for data analysis. New York: Springer; 2009.
- Revelle W. psych: procedures for psychological, psychometric and personality research [Internet]. Evanston, Illinois: Northwestern University; 2017. Available from: https://CRAN.R-project.org/package=psych.
- Warnes GR, Bolker B, Bonebakker L, Gentleman R, Huber W, Liaw A, Lumley T, et al. gplots: Various R Programming Tools for Plotting Data [Internet]. 2016 [cited 2018 May 5]. Available from: https://CRAN.R-project.org/package=gplots.
- Plotly Technologies Inc. Collaborative data science. Montreal, QC: Plotly Technologies Inc; 2015.
- McIver LJ, Abu-Ali G, Franzosa EA, Schwager R, Morgan XC, Waldron L, et al. bioBakery: a meta’omic analysis environment. Bioinformatics. 2018;34:1235–7.
- KneadData | The Huttenhower Lab [Internet]. [cited 2017 Dec 19]. Available from: http://huttenhower.sph.harvard.edu/kneaddata.
- Integrative HMP (iHMP) Research Network Consortium. The integrative human microbiome project: dynamic analysis of microbiome-host omics profiles during periods of human health and disease. Cell Host Microbe. 2014;16:276–89.
- Pasolli E, Schiffer L, Manghi P, Renson A, Obenchain V, Truong DT, et al. Accessible, curated metagenomic data through ExperimentHub. Nat Methods. 2017;14:1023–4.
- Fernandes AD, Macklaim JM, Linn TG, Reid G, Gloor GB. ANOVA-like differential expression (ALDEx) analysis for mixed population RNA-Seq. PLoS One. 2013;8:e67019.
- Sudarikov K, Tyakht A, Alexeev D. Methods for the metagenomic data visualization and analysis. Curr Issues Mol Biol. 2017;24:37–58.
- Odintsova V, Tyakht A, Alexeev D. Guidelines to statistical analysis of microbial composition data inferred from metagenomic sequencing. Curr Issues Mol Biol. 2017;24:17–36.
- Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B Methodol. 1995;57:289–300.
- hclust function | R Documentation [Internet]. [cited 2018 May 1]. Available from: https://www.rdocumentation.org/packages/fastcluster/versions/1.1.24/topics/hclust.
- Human Microbiome Project Consortium. Structure, function and diversity of the healthy human microbiome. Nature. 2012;486:207–14.
- Rose G, Shaw AG, Sim K, Wooldridge DJ, Li M-S, Gharbia S, et al. Antibiotic resistance potential of the healthy preterm infant gut microbiome. PeerJ. 2017;5:e2928.
- Novick RP, Muir TW. Virulence gene regulation by peptides in staphylococci and other gram-positive bacteria. Curr Opin Microbiol. 1999;2:40–5.
- Khamash DF, Voskertchian A, Milstone AM. Manipulating the microbiome: evolution of a strategy to prevent S. aureus disease in children. J Perinatol. 2018;38:105–9.
- Wagner J, Chelaru F, Kancherla J, Paulson JN, Zhang A, Felix V, et al. Metaviz: interactive statistical and visual analysis of metagenomic data. Nucleic Acids Res. 2018;46:2777–87.
- McNally CP, Eng A, Noecker C, Gagne-Maynard WC, Borenstein E. BURRITO: An Interactive Multi-Omic Tool for Visualizing Taxa-Function Relationships in Microbiome Data. Front Microbiol. 2018;9:365.
- Zhai P, Yang L, Guo X, Wang Z, Guo J, Wang X, et al. MetaComp: comprehensive analysis software for comparative meta-omics including comparative metagenomics. BMC Bioinformatics. 2017;18:434.
- Kuczynski J, Stombaugh J, Walters WA, González A, Caporaso JG, Knight R. Using QIIME to Analyze 16S rRNA Gene Sequences from Microbial Communities. Curr Protoc Microbiol. 2012;0 1:Unit-1E.5.
- Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, et al. Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol. 2009;75:7537–41.
- Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJA, Holmes SP. DADA2: High-resolution sample inference from Illumina amplicon data. Nat Methods. 2016;13:581.
- QIIME 2 [Internet]. [cited 2017 Dec 19]. Available from: https://qiime2.org/.