Table 4 Mismatches analysis for human genome assembly

From: BASE: a practical de novo assembler for large genomes using long NGS reads

| Dataset | Assembler | Whole genome: Total | Whole genome: Variant Call | Whole genome: Public SNP | Whole genome: Novel | Exon region: Total | Exon region: Variant Call | Exon region: Public SNP | Exon region: Novel |
|---|---|---|---|---|---|---|---|---|---|
| YH_100bp | SOAPdenovo2 | 2,735,141 | 2,423,482 | 108,044 | 203,615 | 47,926 | 40,701 | 1,590 | 5,635 |
| YH_100bp | BASE | 3,479,046 | 2,872,396 | 208,351 | 398,299 | 47,515 | 42,124 | 1,724 | 3,667 |
| YH_150bp | SOAPdenovo2 | 2,911,990 | 2,613,670 | 113,037 | 185,283 | 46,002 | 43,151 | 1,148 | 1,703 |
| YH_150bp | BASE | 3,839,110 | 3,075,913 | 256,217 | 506,980 | 52,561 | 46,420 | 1,822 | 4,319 |
| NA12878_150bp | SOAPdenovo2 | 2,301,111 | 2,025,220 | 109,497 | 166,394 | 39,702 | 35,644 | 1,740 | 2,318 |
| NA12878_150bp | BASE | 3,459,648 | 3,052,269 | 129,933 | 277,446 | 49,711 | 45,361 | 1,151 | 3,199 |
| NA12878_250bp | SOAPdenovo2 | 2,544,785 | 2,122,144 | 130,806 | 291,835 | 42,744 | 35,890 | 1,936 | 4,918 |
| NA12878_250bp | BASE | 3,751,887 | 2,613,065 | 604,805 | 534,017 | 48,635 | 37,853 | 5,068 | 5,714 |

  1. We mapped the assembled contigs to Hg19 and identified the mismatches between each contig and the reference sequence. We then used the detected SNPs and SNPs from published SNP databases to analyze these mismatches in the whole genome and in exon regions, respectively.
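
The classification described in the footnote can be illustrated with a minimal sketch. In every row of the table, the Variant Call, Public SNP and Novel counts add up to the Total, so the three categories appear to be mutually exclusive. The order of precedence used below (variant calls first, then public SNPs, otherwise novel) is an assumption, as are the function name `classify_mismatches`, the `(chromosome, position, allele)` site representation and the toy data. None of these come from the paper.

```python
from collections import Counter


def classify_mismatches(mismatches, variant_calls, public_snps):
    """Count mismatches per category.

    Assumed precedence (not stated in the source): a mismatch that matches a
    detected variant call is counted as "Variant Call"; otherwise, if it
    matches a public SNP, as "Public SNP"; otherwise as "Novel".
    """
    counts = Counter({"Variant Call": 0, "Public SNP": 0, "Novel": 0})
    for site in mismatches:
        if site in variant_calls:
            counts["Variant Call"] += 1
        elif site in public_snps:
            counts["Public SNP"] += 1
        else:
            counts["Novel"] += 1
    # The total is the sum of the three disjoint categories.
    counts["Total"] = sum(counts.values())
    return counts


# Toy, made-up sites: (chromosome, position, alternative allele).
mismatches = [
    ("chr1", 100, "A"),
    ("chr1", 200, "G"),
    ("chr2", 50, "T"),
    ("chr2", 75, "C"),
]
variant_calls = {("chr1", 100, "A"), ("chr2", 50, "T")}
public_snps = {("chr1", 200, "G"), ("chr1", 100, "A")}

result = classify_mismatches(mismatches, variant_calls, public_snps)
for category in ("Total", "Variant Call", "Public SNP", "Novel"):
    print(category, result[category])
```

Restricting `mismatches` to sites that fall inside exon intervals before calling the function would give the exon-region counts in the same way.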