Table 1 Genus-level classification accuracy and speed of CLARK, KRAKEN, and NBC for four simulated metagenomes and several k-mer lengths

From: CLARK: fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers

                 HiSeq                 MiSeq                 simBA-5               simHC.20.500
Method        k    Prec   Sens  Speed    Prec   Sens  Speed    Prec   Sens  Speed    Prec   Sens  Speed
NBC          15   82.57  82.57  0.008   81.00  81.00  0.007   97.69  97.69  0.007   99.40  99.40  0.005
             13   78.85  78.85  0.011   77.70  77.70  0.009   92.41  92.41  0.010   98.57  98.57  0.006
             11   58.97  58.97  0.020   64.43  64.43  0.016   46.10  46.10  0.017   86.83  86.83  0.008
CLARK(full)  31   99.26  77.78    541   95.33  77.69    435   98.88  89.67    591   99.68  99.42    121
             27   98.98  79.88    538   93.50  78.57    433   98.90  93.09    585   99.67  99.42    122
             23   97.33  81.97    530   90.06  80.02    426   98.71  94.54    559   99.59  99.42    119
             20   87.00  82.87    532   82.45  80.19    420   97.38  94.80    549   99.43  99.41    115
KRAKEN       31   99.26  77.76  2,332   95.50  77.59  1,361   98.28  89.35  1,976   96.83  96.55    237
             27   99.01  79.85  2,048   93.91  78.47  1,240   98.31  92.73  1,917   96.85  96.57    231
             23   97.45  81.89  1,923   90.56  79.75  1,186   98.25  94.18  1,824   96.80  96.57    228
             20   90.22  82.67  1,546   86.28  79.99    965   98.07  94.44  1,478   96.71  96.59    211
CLARK        31   99.31  77.25  3,116   95.66  77.44  1,670   98.91  88.62  2,855   99.68  99.42    251
             27   99.07  79.37  2,796   93.90  78.29  1,522   98.90  92.26  2,554   99.67  99.42    241
             23   97.85  81.36  2,679   90.98  79.57  1,482   98.75  94.26  2,394   99.60  99.42    244
             20   88.60  82.26  2,567   83.35  79.77  1,456   97.73  94.49  2,306   99.43  99.41    239
KRAKEN-Q     31   99.20  76.84  6,224   95.81  74.13  5,308   98.17  87.46  7,023   91.17  85.79  3,809
             27   98.79  78.19  6,410   94.12  73.73  5,555   98.11  89.89  7,992   90.99  83.71  4,196
             23   96.67  78.48  7,015   90.57  72.35  6,329   97.21  89.07  8,989   90.46  79.27  4,574
             20   82.07  70.11  9,437   80.05  65.25  9,537   90.02  77.04 10,961   70.86  57.40  5,819
CLARK-E      31   99.55  72.72 32,450   98.11  74.58 28,988   99.00  77.85 26,171   97.63  97.31 15,426
             27   99.43  74.67 29,897   96.93  75.68 28,459   98.93  84.86 27,451   97.47  97.18 16,124
             23   98.93  78.20 31,112   95.01  76.88 26,747   98.34  90.20 26,647   98.56  98.32 15,408
             20   94.74  78.46 30,029   90.57  76.60 25,789   96.61  89.98 26,545   93.94  93.82 15,587
CLARK-l      27   98.45  62.30  1,525   92.11  69.64    861   95.96  52.00  1,705   99.49  98.94    143
Performance statistics for several choices of the k-mer length for NBC, KRAKEN, CLARK and their fast variants on the classification of the "HiSeq", "MiSeq", "simBA-5" and "simHC.20.500" metagenomic datasets against the 695 genus-level targets. Precision (Prec) and sensitivity (Sens) are expressed as percentages; speed is expressed in 10^3 reads per minute. KRAKEN-Q and CLARK-E are faster, but less accurate, variants of these tools; CLARK-l is a less memory-intensive version of CLARK that runs only for k = 27. All experiments were carried out in single-threaded mode. Parameter k is referred to as N in the NBC manuscript.
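The caption reports precision and sensitivity as percentages and speed in 10^3 reads per minute. The sketch below shows one way to compute these quantities from per-read results. It assumes, as is common in this kind of benchmark but not stated in the table itself, that sensitivity is the fraction of all reads assigned to the correct genus and precision is the fraction of classified reads that are correct. Under that assumption, a classifier that assigns every read (as the equal Prec and Sens values for NBC suggest) reports identical precision and sensitivity. The function names, the `None`-for-unclassified convention and the 10-million-read dataset size are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def precision_sensitivity(
    predicted: Sequence[Optional[str]], truth: Sequence[str]
) -> Tuple[float, float]:
    """Return (precision, sensitivity) as percentages.

    Assumed definitions (hypothetical, for illustration only):
      precision   = correctly classified reads / classified reads
      sensitivity = correctly classified reads / all reads
    A prediction of None means the read was left unclassified.
    """
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    classified = sum(1 for p in predicted if p is not None)
    correct = sum(1 for p, t in zip(predicted, truth) if p is not None and p == t)
    precision = 100.0 * correct / classified if classified else 0.0
    sensitivity = 100.0 * correct / len(truth) if truth else 0.0
    return precision, sensitivity


def minutes_to_classify(n_reads: int, speed_k_reads_per_min: float) -> float:
    """Convert a table speed (in 10^3 reads per minute) into minutes of wall-clock time."""
    return n_reads / (speed_k_reads_per_min * 1_000)


if __name__ == "__main__":
    # Toy example: four reads, one left unclassified, one assigned to the wrong genus.
    truth = ["Escherichia", "Bacillus", "Vibrio", "Escherichia"]
    predicted = ["Escherichia", "Bacillus", None, "Salmonella"]
    print(precision_sensitivity(predicted, truth))  # (66.66666666666667, 50.0)

    # HiSeq speeds at k = 31 (NBC at k = 15) from the table,
    # applied to a hypothetical dataset of 10 million reads.
    for tool, speed in [("CLARK", 3116), ("KRAKEN", 2332), ("NBC", 0.008)]:
        print(tool, round(minutes_to_classify(10_000_000, speed), 1))
```

Leaving a read unclassified lowers sensitivity but not precision, which is consistent with the table pattern where the k-mer based tools trade some sensitivity for very high precision.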