Table 1 Performance comparison between major assembly tools and HyDA-Vista on the error-corrected standard multicell E. coli dataset (6.2 Gbp, 28 million reads, 100 bp, treated as single-end), evaluated with QUAST in default mode [43].

From: HyDA-Vista: towards optimal guided selection of k-mer size for sequence assembly

| Assembler | NGA50 (bp) | NA50 (bp) | Largest (bp) | Total (bp) | MA | GF (%) | Unaligned (bp) |
|---|---|---|---|---|---|---|---|
| SOAPdenovo | 32,032 | 35,343 | 101,201 | 4,304,232 | 3 | 95.2 | 3,421 |
| ABySS | 31,237 | 32,987 | 110,012 | 4,530,701 | 0 | 97.56 | 0 |
| SPAdes | 60,338 | 60,768 | 173,976 | 4,545,775 | 0 | 97.8 | 3,001 |
| IDBA | 57,826 | 58,549 | 173,964 | 4,538,426 | 0 | 97.7 | 2,349 |
| HyDA | 36,292 | 39,069 | 123,771 | 4,524,075 | 0 | 97.4 | 0 |
| HyDA-Vista | 82,838 | 94,910 | 204,602 | 4,544,286 | 0 | 97.9 | 0 |
  1. All statistics are based on contigs no shorter than 500 bp. Since there are no (QUAST-defined) misassemblies in any of the assemblies, the length statistics are based on correct contigs. NGA50 (NA50) is the (QUAST-corrected) contig size such that contigs of that size or longer cover half of the genome (assembly) size [43, 44]. Total is the sum of the lengths of all contigs. MA is the number of misassemblies. GF is the genome fraction, that is, the percentage of genome bases covered by the assembly. Unaligned is the total length of all contigs that could not be aligned to the reference.
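The NGA50 and NA50 definitions in the footnote can be illustrated with a minimal Python sketch. It is not QUAST's implementation: QUAST derives these values from aligned blocks after breaking contigs at misassemblies, whereas this sketch assumes the block lengths are already given. The function name `nx_length`, its parameters, and the toy lengths are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Optional


def nx_length(
    block_lengths: Iterable[int],
    target_size: int,
    fraction: float = 0.5,
    min_length: int = 500,
) -> Optional[int]:
    """Return the largest length L such that blocks of length >= L
    together cover at least `fraction` of `target_size`.

    Blocks shorter than `min_length` are ignored, mirroring the 500 bp
    cutoff stated in the footnote. Returns None if the blocks never
    reach the required coverage.
    """
    threshold = fraction * target_size
    covered = 0
    kept = sorted((n for n in block_lengths if n >= min_length), reverse=True)
    for length in kept:
        covered += length
        if covered >= threshold:
            return length
    return None


# Toy block lengths (hypothetical, not taken from Table 1).
blocks = [8000, 7000, 5000, 4000, 3000, 2000, 400]

# NA50-style value: the target is the assembly size (blocks >= 500 bp).
assembly_size = sum(n for n in blocks if n >= 500)  # 29000
na50_like = nx_length(blocks, assembly_size)        # 7000

# NGA50-style value: the target is an assumed reference genome size.
genome_size = 40000
nga50_like = nx_length(blocks, genome_size)         # 5000

print(na50_like, nga50_like)
```

The only difference between the two calls is the target size: the assembly length for NA50 and the reference genome length for NGA50. This is why NGA50 is never larger than NA50 when the assembly is shorter than the genome, as in every row of the table.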