Table 4 Mismatches analysis for human genome assembly

From: BASE: a practical de novo assembler for large genomes using long NGS reads

| Dataset | Assembler | Whole genome: Total | Whole genome: Variant Call | Whole genome: Public SNP | Whole genome: Novel | Exon region: Total | Exon region: Variant Call | Exon region: Public SNP | Exon region: Novel |
|---|---|---|---|---|---|---|---|---|---|
| YH_100bp | SOAPdenovo2 | 2,735,141 | 2,423,482 | 108,044 | 203,615 | 47,926 | 40,701 | 1,590 | 5,635 |
| YH_100bp | BASE | 3,479,046 | 2,872,396 | 208,351 | 398,299 | 47,515 | 42,124 | 1,724 | 3,667 |
| YH_150bp | SOAPdenovo2 | 2,911,990 | 2,613,670 | 113,037 | 185,283 | 46,002 | 43,151 | 1,148 | 1,703 |
| YH_150bp | BASE | 3,839,110 | 3,075,913 | 256,217 | 506,980 | 52,561 | 46,420 | 1,822 | 4,319 |
| NA12878_150bp | SOAPdenovo2 | 2,301,111 | 2,025,220 | 109,497 | 166,394 | 39,702 | 35,644 | 1,740 | 2,318 |
| NA12878_150bp | BASE | 3,459,648 | 3,052,269 | 129,933 | 277,446 | 49,711 | 45,361 | 1,151 | 3,199 |
| NA12878_250bp | SOAPdenovo2 | 2,544,785 | 2,122,144 | 130,806 | 291,835 | 42,744 | 35,890 | 1,936 | 4,918 |
| NA12878_250bp | BASE | 3,751,887 | 2,613,065 | 604,805 | 534,017 | 48,635 | 37,853 | 5,068 | 5,714 |

  1. We mapped the assembled contigs to Hg19 and identified the mismatches between each contig and the reference sequence. We then used the detected SNPs and SNPs from published SNP databases to analyze these mismatches in the whole genome and in exon regions, respectively.
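
The classification described in the footnote can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `classify_mismatches`, the `(chromosome, position)` site representation, and the precedence order (a mismatch is first checked against the detected SNPs, then against the public SNP set, and is otherwise counted as novel) are all assumptions made for this example.

```python
def classify_mismatches(mismatches, called_snps, public_snps):
    """Count contig-vs-reference mismatches per category.

    mismatches  : iterable of (chromosome, position) tuples (hypothetical format)
    called_snps : set of sites detected as SNPs in the sample
    public_snps : set of sites listed in published SNP databases

    Assumption: categories are mutually exclusive, with detected SNPs
    taking precedence over public SNPs.
    """
    counts = {"Total": 0, "Variant Call": 0, "Public SNP": 0, "Novel": 0}
    for site in mismatches:
        counts["Total"] += 1
        if site in called_snps:
            counts["Variant Call"] += 1
        elif site in public_snps:
            counts["Public SNP"] += 1
        else:
            counts["Novel"] += 1
    return counts


# Toy data for illustration only; these are not values from the paper.
mismatches = [("chr1", 100), ("chr1", 250), ("chr2", 40), ("chr2", 75)]
called_snps = {("chr1", 100), ("chr2", 40)}
public_snps = {("chr1", 100), ("chr1", 250)}

counts = classify_mismatches(mismatches, called_snps, public_snps)
print(counts)
```

In the table, the three categories add up to the Total in every row (for example, 2,423,482 + 108,044 + 203,615 = 2,735,141 for SOAPdenovo2 on YH_100bp), which is consistent with mutually exclusive categories of this kind. Restricting `mismatches` to sites that fall inside exon intervals would give the exon-region counts under the same assumptions.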