
Table 2 Comparing the assembly results of PE subset selection for the grouper dataset.

From: Subset selection of high-depth next generation sequencing reads for de novo genome assembly using MapReduce framework

| | Original dataset | Selected subset |
| --- | --- | --- |
| **Dataset characteristics** | | |
| Dataset size (G bp) | 125 | 63 |
| # read pairs | 319,878,932 | 158,651,599 |
| Mean length of reads | 195.3 | 198.6 |
| %GC content of reads | 41.0% | 39.7% |
| **Assembly statistics**¹ | | |
| # contigs | 39,911 | 53,488 |
| Total contig length | 996,203,993 | 991,109,739 |
| N50 contig size (K bp) | 82.2 | 43.5 |
| # scaffolds | 3,917 | 4,043 |
| Total scaffold length | 1,076,396,971 | 1,062,462,514 |
| Largest scaffold length | 12,701,604 | 21,777,629 |
| N50 scaffold size (K bp) (L50 number)² | 3,354 (97 scaffolds) | 5,443 (61 scaffolds) |
| N75 scaffold size (K bp) (L75 number)² | 1,429 (218 scaffolds) | 2,493 (131 scaffolds) |
| %GC of scaffolds | 41.23% | 41.17% |
| # 'N's | 79,902,759 | 71,510,549 |
| # 'N's per 100K bp | 7,423.10 | 6,730.57 |
| # scaffolds for 1G bp³ | 482 | 304 |
  1. All statistics are based on contigs and scaffolds of size ≥ 1K bp.
  2. L50/L75 denotes the minimal number of scaffolds that together contain 50%/75% of the bases of the assembly (i.e., of all the scaffolds).
  3. The minimal number of scaffolds whose total length is ≥ 1G bp.
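
The scaffold metrics defined in the footnotes can be illustrated with a minimal Python sketch. It assumes the conventional definitions (sort lengths in descending order and accumulate until the target is reached); the function names `nx_lx` and `min_scaffolds_for` and the toy lengths are hypothetical and are not taken from the paper or from any assembly-evaluation tool.

```python
def nx_lx(lengths, fraction=0.5):
    """Return (Nx, Lx) for a list of scaffold lengths.

    Nx is the length of the scaffold at which the running total, taken
    from the longest scaffold downwards, first reaches `fraction` of the
    total assembly length. Lx is the number of scaffolds needed to get there.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count


def min_scaffolds_for(lengths, target_bp):
    """Return the minimal number of scaffolds whose total length is
    >= target_bp, or None if the whole assembly is shorter than target_bp."""
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= target_bp:
            return count
    return None


# Toy scaffold lengths (illustrative only, not the grouper assembly).
toy = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = nx_lx(toy, 0.5)
n75, l75 = nx_lx(toy, 0.75)
print(n50, l50, n75, l75, min_scaffolds_for(toy, 200))
```

Applied to the real scaffold lengths after filtering to scaffolds ≥ 1K bp, `nx_lx(lengths, 0.5)` and `nx_lx(lengths, 0.75)` would correspond to the N50/L50 and N75/L75 rows, and `min_scaffolds_for(lengths, 1_000_000_000)` to the "# scaffolds for 1G bp" row, provided the paper's tool uses the same conventions.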