From: Vipie: web pipeline for parallel characterization of viral populations from multiple NGS samples
Pipeline Tool | Vipie | ViromeScan [11] | VirusTAP [8] | Virome [16] | Metavir [14] | Taxonomer [6] | MetaShot [12] |
---|---|---|---|---|---|---|---|
Primary goal | Web-based parallel analysis of multiple viral metagenomes, suited to molecular epidemiology studies. | To profile viromes using databases of existing eukaryotic viruses without assembly. | Identification of viruses in a sample, after a thorough elimination of known non-viral sequences. | Classification of all putative ORFs found in a viral metagenome, characterization of viral communities. | Analysis of viromes, diversity metrics and marker gene phylogenies. | Ultra-fast metagenomics analysis focusing on detection of microorganisms, including viruses and bacteria. | Highly accurate and comprehensive workflow for host-associated microbiome classification on multiple samples. |
Web based | Yes. | No. | Yes. | Yes (Flash required). | Yes. | Yes. | No. |
Outputs | Interactive table, plots and raw downloads. Clustered heatmaps with dynamic group assignment re-plots. | Static population pie charts. Sample-based clustered heatmaps. | Contig-based hits and seamless web BLAST interface. | Rich collection of sample source virome ORF and sequence categories. | Comparative analysis of viromes and annotations including networks, nonmetric distance and tree maps. | Interactive pie charts with kingdoms in bins and also impressive sunburst flare sub-classifiers. | A Krona graph and an interactive taxonomy HTML table, along with a CSV file. |
Source data | Paired-end reads; fastq format. | Single-end or paired-end reads; fastq format. | Paired-end reads; single-end reads also accepted; fastq format. | sff or fastq; intended for 454-generated metagenomes. | Reads (>300 bases) or assembled contigs. | Paired-end reads in fastq and fasta formats. | Paired-end reads in fastq format. |
Trimming and filtering | YES, as the first step. | YES, after selection of viral reads, at the level of a bam file. | YES, as the first step. | YES: quality based; duplicate filtering; contamination | Not specified. | Not specified. | YES, as the first step. |
De-novo assembly | YES, a choice of assemblers. | No. | YES, a choice of assemblers; done after subtraction steps. | No. | No. | No. | No. |
Subtraction of human ref. and bacterial ribosomal sequences | Optional, only for the output of dark matter sequences. | YES, using Human Best Match Tagger. No for ribosomal. | YES, also other host databases available (mouse etc.). | Not specified for human. Ribosome is removed using BLAST against rDNA db. | Not specified. | Not subtracted but reported as part of detection. | Yes, reports identification of human host reads and bacterial mappings. |
Means of virus identification | (a) BLAST against a pan-viral database. (b) Remapping of original reads to the identified candidates. | Mapping to the members of the virus database using bowtie2 [24]. | BLAST search against the NCBI nt database. | Protein BLASTP against two databases. Several tiers of classification of the ORFs. | Not specified. | Taxonomer Binner DB with 21 bp k-mers as unique identifiers of known viruses. | Custom similarity workflow with Hamming distance. |
Virus database for identification | A custom database containing 20759 human, animal, plant and bacterial viruses. | Eukaryotic viruses only. Four custom databases available for download. | Specificity is maintained by the subtraction steps prior to assembly and BLAST search. | UniRef 100 peptide database, five annotated protein databases, MetaGenomes On-line. | GAAS tool (https://sourceforge.net/projects/gaas/). | Binner DB needs to be built using KAnalyze [42] (https://sourceforge.net/projects/kanalyze/files/). | |
Action when a read maps to different viruses | Score is split among the hit reference sequences. | Not specified. | Not specified. | Not specified. | Not specified. | Assigned as ambiguous. | Parsed for human endogenous retroviruses; otherwise classified as ambiguous and discarded. |
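
The last row notes that Vipie splits a read's score among the reference sequences it hits. The table does not say how the split is weighted, so the sketch below assumes an equal split (each read contributes a total of 1.0, divided evenly among its distinct hits). The function name `split_read_scores` and the input layout are hypothetical and are not taken from the Vipie code.

```python
from collections import defaultdict


def split_read_scores(read_hits):
    """Distribute each read's unit score across the references it hit.

    read_hits: dict mapping a read id to a list of reference ids.
    Assumption: the split is equal; the source table does not specify
    the actual weighting used by Vipie.
    """
    scores = defaultdict(float)
    for read_id, refs in read_hits.items():
        unique_refs = set(refs)  # count each reference once per read
        if not unique_refs:
            continue  # unmapped read contributes nothing
        share = 1.0 / len(unique_refs)
        for ref in unique_refs:
            scores[ref] += share
    return dict(scores)


# Illustrative (made-up) hits: one unique read, two multi-mapped reads.
hits = {
    "read1": ["virusA"],
    "read2": ["virusA", "virusB"],
    "read3": ["virusA", "virusB", "virusC", "virusD"],
}
scores = split_read_scores(hits)
for ref in sorted(scores):
    print(ref, scores[ref])
```

Under this assumption the per-reference scores always sum to the number of mapped reads, so a multi-mapped read is neither discarded nor counted more than once.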