Table 1 Statistics of contigs assembled from simulated data

From: Pseudo-Sanger sequencing: massively parallel production of long and near error-free reads using NGS technology

| Dataset (a) | Program | Total length (bp) | Mean (bp) | N50 (bp) | N90 (bp) | Error (b) |
|---|---|---|---|---|---|---|
| D. melanogaster (simulation) | anytag (c) | 113,166,478 | 66,141 | 197,693 | 43,974 | 109 |
| | ABySS (d) | 116,966,148 | 5,795 | 177,493 | 33,254 | 89 |
| | MSR-CA | 116,924,670 | 48,396 | 163,131 | 34,562 | 346 |
| | soap (d) | 113,971,825 | 16,208 | 56,061 | 13,361 | 90 |
| | velvet (d) | 114,719,611 | 16,573 | 104,879 | 23,729 | 330 |
| Human Chr1 (simulation) | anytag | 216,049,114 | 49,360 | 106,803 | 27,723 | 189 |
| | ABySS | 221,070,068 | 1,578 | 9,362 | 1,332 | 122 |
| | MSR-CA | 218,489,997 | 16,398 | 37,472 | 9,204 | 1,785 |
| | soap | 221,093,414 | 4,002 | 21,237 | 5,295 | 46 |
| | velvet | Out of memory (e) | | | | |

  1. (a) All programs used the same simulated raw data. The dataset was simulated as four libraries with insert sizes of 200 bp, 300 bp, 400 bp and 600 bp. Sequencing errors were simulated at a rate of 0.005 and distributed randomly across the reads. Diploid heterozygosity was set at 0.001.
  2. (b) Error = Inversion + Relocation + Translocation. The evaluation was performed with the evaluator from GAGE.
  3. (c) anytag constructed pseudo-Sanger sequences; Newbler and minimus2 were then used to assemble them.
  4. (d) The k-mer size was set iteratively to 21, 25, 31, 41 and 51 for ABySS, SOAPdenovo and velvet. The assembly with the largest contig N50 is shown.
  5. (e) The memory limit was 450 GB.
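
The N50 and N90 columns, and the k-mer selection rule in the footnote on k-mer sizes, can be illustrated with a minimal Python sketch. It assumes the common definition of Nx (the contig length at which contigs of that length or longer hold at least x% of the total assembled bases); the table does not state exactly how the authors computed these values, and some tools measure against genome size instead. The names `nx` and `best_by_n50` and the toy contig lengths are hypothetical and are not taken from the paper.

```python
def nx(lengths, percent):
    """Return the Nx statistic for a list of contig lengths.

    Assumed definition: the length L such that contigs of length >= L
    together contain at least `percent` % of the total assembled bases.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer arithmetic avoids floating-point rounding at the threshold.
        if running * 100 >= percent * total:
            return length


def best_by_n50(assemblies):
    """Return the k-mer size whose assembly has the largest N50.

    `assemblies` maps k-mer size -> list of contig lengths (hypothetical input).
    """
    return max(assemblies, key=lambda k: nx(assemblies[k], 50))


# Toy contig lengths invented for illustration only (not data from the paper).
toy = {
    21: [100, 80, 60, 40, 20],
    31: [150, 90, 30, 20, 10],
    41: [120, 70, 50, 30, 30],
}

if __name__ == "__main__":
    for k in sorted(toy):
        print(k, nx(toy[k], 50), nx(toy[k], 90))
    print("selected k:", best_by_n50(toy))
```

In this toy input every assembly has the same total length, so the choice depends only on how the bases are distributed among contigs; the k = 31 set wins because a single contig already holds half of the bases.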