Table 1 Overview of assembly statistics

From: An improved genome assembly uncovers prolific tandem repeats in Atlantic cod

| Assembly | Total assembly size (Mbp) | N50 contig (kbp) | N50 scaffold (Mbp) | Percentage gap bases | CEGMA (a) | BUSCO (b) | REAPR (c) | FRCbam (d) | Potential conflict (sequences) (e) |
|---|---|---|---|---|---|---|---|---|---|
| gadMor1 (f) | 832 | 2.3 | 0.14 | 26.9 | 444 (96.9%) | 3 308 (89.4%) | 2 547 | 4 210 772 | 76 |
| ALPILM | 660 | 4.4 | 0.16 | 28.7 | 424 (92.6%) | 3 016 (81.6%) | 19 787 | 2 182 096 | 122 |
| NEWB454 | 656 | 6.2 | 1.30 | 24.4 | 435 (95.0%) | 3 109 (84.1%) | 18 117 | 2 044 008 | 26 |
| CA454ILM | 647 | 9.9 | 0.50 | 3.49 | 447 (97.5%) | 3 379 (91.4%) | 7 406 | 1 351 500 | 96 |
| CA454PB | 682 | 95 | 0.27 | 1.62 | 431 (94.1%) | 3 310 (89.5%) | 8 617 | 1 508 054 | 188 |
| gadMor2 (g) | 643 | 116 | 1.15 | 1.69 | 435 (95.0%) | 3 447 (93.2%) | 7 359 | 1 248 792 | 15 |

  (a) CEGMA annotates 458 highly conserved eukaryotic genes
  (b) BUSCO annotates 3,698 actinopterygii specific genes
  (c) REAPR analyses the discordance between the expected order, orientation and distance of mapped paired reads; the value is the number of potential errors detected, fewer is better
  (d) FRCbam uses a similar approach to REAPR; the value is the total number of features (i.e., potential assembly problems), fewer is better
  (e) Number of sequences mapping to more than one linkage group or to multiple linkage groups, fewer is better
  (f) From [5]
  (g) 93% of the gadMor2 assembly is additionally oriented and ordered into 23 linkage groups (Additional file 1: Table S3)
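
As an illustration of the two N50 columns in Table 1, the following minimal Python sketch computes N50 from a list of sequence lengths. It uses the common definition of N50: the length L such that sequences of length L or longer together cover at least half of the total assembly size. The function name `n50` and the toy lengths are assumptions made for illustration and are not taken from the paper; the tools used in the study may apply slightly different conventions (for example, minimum length cutoffs).

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the summed length.
    """
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Toy contig lengths (hypothetical values, not from Table 1).
toy_contigs = [100, 80, 60, 40, 20]
print(n50(toy_contigs))
```

Applied to contig lengths this gives the contig N50, and applied to scaffold lengths the scaffold N50; a higher value indicates a more contiguous assembly.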