
Table 2 Comparing the assembly results of PE subset selection for the grouper dataset.

From: Subset selection of high-depth next generation sequencing reads for de novo genome assembly using MapReduce framework

 

| | Original dataset | Selected subset |
|---|---|---|
| Dataset characteristics | | |
| Dataset size (G bp) | 125 | 63 |
| # read pairs | 319,878,932 | 158,651,599 |
| Mean length of reads | 195.3 | 198.6 |
| %GC content of reads | 41.0% | 39.7% |
| Assembly statistics¹ | | |
| # contigs | 39,911 | 53,488 |
| Total contig length | 996,203,993 | 991,109,739 |
| N50 contig size (K bp) | 82.2 | 43.5 |
| # scaffolds | 3,917 | 4,043 |
| Total scaffold length | 1,076,396,971 | 1,062,462,514 |
| Largest scaffold length | 12,701,604 | 21,777,629 |
| N50 scaffold size (K bp) (L50 number)² | 3,354 (97 scaffolds) | 5,443 (61 scaffolds) |
| N75 scaffold size (K bp) (L75 number)² | 1,429 (218 scaffolds) | 2,493 (131 scaffolds) |
| %GC of scaffolds | 41.23% | 41.17% |
| # 'N's | 79,902,759 | 71,510,549 |
| # 'N's per 100K bp | 7,423.10 | 6,730.57 |
| # scaffolds for 1G bp³ | 482 | 304 |

  1. All statistics are based on contigs and scaffolds of size ≥ 1K bp.
  2. L50/L75 denotes the minimal number of scaffolds that together contain 50%/75% of the bases of the assembly (i.e., of all the scaffolds).
  3. The minimal number of scaffolds whose total length is ≥ 1G bp.
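
The N50/L50, N75/L75 and "# scaffolds for 1G bp" rows all follow the same pattern: sort the scaffolds from longest to shortest and accumulate lengths until a threshold is reached. The following is a minimal sketch of that calculation, not the authors' code. The function names `nx_lx` and `min_count_for_length` and the toy lengths are hypothetical, and the sketch assumes that sequences shorter than 1K bp have already been filtered out, as the first footnote states.

```python
def nx_lx(lengths, fraction):
    """Return (Nx, Lx) for a fraction of the assembly, e.g. 0.5 for N50/L50.

    Nx is the length of the scaffold at which the running total first
    reaches `fraction` of all bases; Lx is how many scaffolds that took.
    """
    ordered = sorted(lengths, reverse=True)  # longest scaffolds first
    target = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return length, count
    raise ValueError("lengths must be non-empty and fraction must be in (0, 1]")


def min_count_for_length(lengths, target_bp):
    """Return the minimal number of scaffolds whose total length >= target_bp.

    Returns None when the whole assembly is shorter than target_bp.
    """
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= target_bp:
            return count
    return None


if __name__ == "__main__":
    # Toy scaffold lengths (hypothetical, not taken from the grouper assembly).
    toy = [80, 70, 50, 40, 30, 20, 10]  # total = 300
    print(nx_lx(toy, 0.5))                 # N50 and L50
    print(nx_lx(toy, 0.75))                # N75 and L75
    print(min_count_for_length(toy, 200))  # analogue of "# scaffolds for 1G bp"
```

Under these definitions, a larger N50 paired with a smaller L50 means that fewer, longer scaffolds cover half of the assembly, which is how the scaffold rows for the selected subset compare with those for the original dataset.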