
Table 3 Summary of human contig assembly

From: BASE: a practical de novo assembler for large genomes using long NGS reads

| Metric | YH, 100 bp: SOAPdenovo2 (k = 41) | YH, 100 bp: BASE | YH, 150 bp: SOAPdenovo2 (k = 61) | YH, 150 bp: BASE | NA12878, 150 bp: SOAPdenovo2 (k = 41) | NA12878, 150 bp: BASE | NA12878, 250 bp: SOAPdenovo2 (k = 61) | NA12878, 250 bp: BASE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Contig num | 3,420,897 | 3,319,617 | 2,279,026 | 2,145,792 | 8,068,278 | 1,934,261 | 1,416,658 | 1,511,270 |
| Contig size (bp) | 2.67E+09 | 2.88E+09 | 2.76E+09 | 2.95E+09 | 2.44E+09 | 2.90E+09 | 2.60E+09 | 2.94E+09 |
| Contig N50 (bp) | 2,244 | 2,279 | 3,008 | 3,126 | 1,140 | 3,823 | 3,368 | 4,199 |
| Contig aligned rate | 99.10% | 97.07% | 98.87% | 95.96% | 99.40% | 97.62% | 99.34% | 96.33% |
| Genome coverage | 90.36% | 93.76% | 93.12% | 93.90% | 84.11% | 95.58% | 89.55% | 94.09% |
| RepeatMasked coverage | 97.05% | 96.13% | 97.28% | 95.32% | 93.94% | 97.38% | 95.60% | 95.99% |
| Exon coverage | 93.76% | 91.51% | 95.73% | 94.13% | 91.48% | 96.84% | 93.90% | 91.49% |
| Mismatch base | 2,735,141 | 3,479,046 | 2,911,990 | 3,839,110 | 2,301,111 | 3,459,648 | 2,544,785 | 3,751,887 |
| Mismatch ratio | 0.103% | 0.121% | 0.105% | 0.130% | 0.094% | 0.119% | 0.098% | 0.128% |
| Indel num | 340,930 | 327,469 | 358,358 | 334,989 | 259,190 | 322,214 | 327,695 | 372,941 |
| Indel base | 1,412,005 | 1,587,265 | 1,692,213 | 1,741,947 | 1,086,014 | 1,602,240 | 1,400,230 | 1,953,311 |
| Indel ratio | 0.053% | 0.057% | 0.062% | 0.061% | 0.045% | 0.057% | 0.054% | 0.069% |
  1. We mapped the raw contigs to Hg19. Aligned rate is the aligned contig length divided by the total contig length. Genome coverage is calculated after excluding the gap regions of Hg19 from the reference length. For unique coverage, repetitive regions are additionally excluded. All SOAPdenovo2 contig assemblies used the single-kmer method, with M1 to treat heterozygous regions.
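
The ratio definitions in the footnote can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' evaluation pipeline: the function names, argument names, and toy numbers are hypothetical. In particular, the unique-coverage formula assumes that repeats are excluded from both the covered bases and the reference length, which the footnote does not spell out.

```python
def aligned_rate(aligned_len: int, total_contig_len: int) -> float:
    """Aligned contig length divided by total contig length (per the footnote)."""
    return aligned_len / total_contig_len


def genome_coverage(covered_bases: int, genome_len: int, gap_len: int) -> float:
    """Covered reference bases over the reference length with gap regions removed."""
    return covered_bases / (genome_len - gap_len)


def unique_coverage(covered_unique_bases: int, genome_len: int,
                    gap_len: int, repeat_len: int) -> float:
    """Assumed form: covered non-repeat bases over the gap-free, repeat-free length."""
    return covered_unique_bases / (genome_len - gap_len - repeat_len)


if __name__ == "__main__":
    # Toy values chosen for readability; they are not taken from Table 3.
    print(f"aligned rate:    {aligned_rate(990, 1000):.2%}")
    print(f"genome coverage: {genome_coverage(900, 1100, 100):.2%}")
    print(f"unique coverage: {unique_coverage(450, 1100, 100, 500):.2%}")
```

Keeping each denominator explicit makes it easier to see why an assembly can have a high aligned rate but lower genome coverage: the first is normalized by the assembly size, the second by the gap-free reference length.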