
Table 1 Performance comparison of major assembly tools and HyDA-Vista on the error-corrected standard multicell E. coli dataset (6.2 Gbp, 28 million 100 bp reads, treated as single-end), evaluated with QUAST in default mode [43].

From: HyDA-Vista: towards optimal guided selection of k-mer size for sequence assembly

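| Assembler  | NGA50 (bp) | NA50 (bp) | Largest contig (bp) | Total length (bp) | MA | GF (%) | Unaligned (bp) |
|------------|-----------:|----------:|--------------------:|------------------:|---:|-------:|---------------:|
| SOAPdenovo | 32,032     | 35,343    | 101,201             | 4,304,232         | 3  | 95.2   | 3,421          |
| ABySS      | 31,237     | 32,987    | 110,012             | 4,530,701         | 0  | 97.56  | 0              |
| SPAdes     | 60,338     | 60,768    | 173,976             | 4,545,775         | 0  | 97.8   | 3,001          |
| IDBA       | 57,826     | 58,549    | 173,964             | 4,538,426         | 0  | 97.7   | 2,349          |
| HyDA       | 36,292     | 39,069    | 123,771             | 4,524,075         | 0  | 97.4   | 0              |
| HyDA-Vista | 82,838     | 94,910    | 204,602             | 4,544,286         | 0  | 97.9   | 0              |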

  1. All statistics are based on contigs of at least 500 bp. Since there are no (QUAST-defined) misassemblies in any of the assemblies, the length statistics are based on correct contigs. NGA50 (NA50) is the length L such that the (QUAST-corrected) contigs of length L or longer together cover at least half of the genome (assembly) size [43, 44]. Total is the sum of the lengths of all contigs. MA is the number of misassemblies. GF is the genome fraction, the percentage of reference genome bases covered by the assembly. Unaligned is the total length of all contigs that could not be aligned to the reference.
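The NGA50/NA50 and genome-fraction definitions in the note can be sketched as follows. This is a simplified illustration under stated assumptions, not QUAST's implementation: it assumes the caller already has the lengths of the aligned (QUAST-corrected) contig pieces and the reference intervals they cover, and the function names `n50_like` and `genome_fraction` are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def n50_like(lengths: Iterable[int], reference_size: int,
             min_length: int = 500) -> Optional[int]:
    """Return the largest length L such that pieces of length >= L cover at
    least half of reference_size, or None if that threshold is never reached.

    Hypothetical helper: pass the genome size for an NGA50-style value, or the
    total assembly length for an NA50-style value.
    """
    # Mirror the table's cutoff: ignore pieces shorter than min_length.
    kept = sorted((x for x in lengths if x >= min_length), reverse=True)
    covered = 0
    for length in kept:
        covered += length
        # Integer comparison avoids rounding issues with reference_size / 2.
        if 2 * covered >= reference_size:
            return length
    return None


def genome_fraction(intervals: Iterable[Tuple[int, int]], genome_size: int) -> float:
    """Percentage of reference positions covered by at least one half-open
    interval [start, end). Overlapping or adjacent intervals are merged so no
    base is counted twice."""
    covered = 0
    cur_start: Optional[int] = None
    cur_end: Optional[int] = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: close it out and start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return 100.0 * covered / genome_size


if __name__ == "__main__":
    pieces = [100, 80, 60, 40, 20]
    print(n50_like(pieces, reference_size=sum(pieces), min_length=0))  # NA50-style
    print(n50_like(pieces, reference_size=500, min_length=0))          # NGA50-style
    print(genome_fraction([(0, 50), (40, 100), (200, 300)], genome_size=1000))
```

Because the genome size and the assembly size usually differ, the same contigs can yield different NGA50 and NA50 values, which is why the table reports both.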