
Table 1 Genus-level classification accuracy and speed of CLARK, KRAKEN, and NBC for four simulated metagenomes and several k-mer lengths

From: CLARK: fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers

 

| Tool | k | HiSeq Prec | HiSeq Sens | HiSeq Speed | MiSeq Prec | MiSeq Sens | MiSeq Speed | simBA-5 Prec | simBA-5 Sens | simBA-5 Speed | simHC.20.500 Prec | simHC.20.500 Sens | simHC.20.500 Speed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NBC | 15 | 82.57 | 82.57 | 0.008 | 81.00 | 81.00 | 0.007 | 97.69 | 97.69 | 0.007 | 99.40 | 99.40 | 0.005 |
| | 13 | 78.85 | 78.85 | 0.011 | 77.70 | 77.70 | 0.009 | 92.41 | 92.41 | 0.010 | 98.57 | 98.57 | 0.006 |
| | 11 | 58.97 | 58.97 | 0.020 | 64.43 | 64.43 | 0.016 | 46.10 | 46.10 | 0.017 | 86.83 | 86.83 | 0.008 |
| Clark(full) | 31 | 99.26 | 77.78 | 541 | 95.33 | 77.69 | 435 | 98.88 | 89.67 | 591 | 99.68 | 99.42 | 121 |
| | 27 | 98.98 | 79.88 | 538 | 93.50 | 78.57 | 433 | 98.90 | 93.09 | 585 | 99.67 | 99.42 | 122 |
| | 23 | 97.33 | 81.97 | 530 | 90.06 | 80.02 | 426 | 98.71 | 94.54 | 559 | 99.59 | 99.42 | 119 |
| | 20 | 87.00 | 82.87 | 532 | 82.45 | 80.19 | 420 | 97.38 | 94.80 | 549 | 99.43 | 99.41 | 115 |
| Kraken | 31 | 99.26 | 77.76 | 2,332 | 95.50 | 77.59 | 1,361 | 98.28 | 89.35 | 1,976 | 96.83 | 96.55 | 237 |
| | 27 | 99.01 | 79.85 | 2,048 | 93.91 | 78.47 | 1,240 | 98.31 | 92.73 | 1,917 | 96.85 | 96.57 | 231 |
| | 23 | 97.45 | 81.89 | 1,923 | 90.56 | 79.75 | 1,186 | 98.25 | 94.18 | 1,824 | 96.80 | 96.57 | 228 |
| | 20 | 90.22 | 82.67 | 1,546 | 86.28 | 79.99 | 965 | 98.07 | 94.44 | 1,478 | 96.71 | 96.59 | 211 |
| Clark | 31 | 99.31 | 77.25 | 3,116 | 95.66 | 77.44 | 1,670 | 98.91 | 88.62 | 2,855 | 99.68 | 99.42 | 251 |
| | 27 | 99.07 | 79.37 | 2,796 | 93.90 | 78.29 | 1,522 | 98.90 | 92.26 | 2,554 | 99.67 | 99.42 | 241 |
| | 23 | 97.85 | 81.36 | 2,679 | 90.98 | 79.57 | 1,482 | 98.75 | 94.26 | 2,394 | 99.60 | 99.42 | 244 |
| | 20 | 88.60 | 82.26 | 2,567 | 83.35 | 79.77 | 1,456 | 97.73 | 94.49 | 2,306 | 99.43 | 99.41 | 239 |
| Kraken-Q | 31 | 99.20 | 76.84 | 6,224 | 95.81 | 74.13 | 5,308 | 98.17 | 87.46 | 7,023 | 91.17 | 85.79 | 3,809 |
| | 27 | 98.79 | 78.19 | 6,410 | 94.12 | 73.73 | 5,555 | 98.11 | 89.89 | 7,992 | 90.99 | 83.71 | 4,196 |
| | 23 | 96.67 | 78.48 | 7,015 | 90.57 | 72.35 | 6,329 | 97.21 | 89.07 | 8,989 | 90.46 | 79.27 | 4,574 |
| | 20 | 82.07 | 70.11 | 9,437 | 80.05 | 65.25 | 9,537 | 90.02 | 77.04 | 10,961 | 70.86 | 57.40 | 5,819 |
| Clark-E | 31 | 99.55 | 72.72 | 32,450 | 98.11 | 74.58 | 28,988 | 99.00 | 77.85 | 26,171 | 97.63 | 97.31 | 15,426 |
| | 27 | 99.43 | 74.67 | 29,897 | 96.93 | 75.68 | 28,459 | 98.93 | 84.86 | 27,451 | 97.47 | 97.18 | 16,124 |
| | 23 | 98.93 | 78.20 | 31,112 | 95.01 | 76.88 | 26,747 | 98.34 | 90.20 | 26,647 | 98.56 | 98.32 | 15,408 |
| | 20 | 94.74 | 78.46 | 30,029 | 90.57 | 76.60 | 25,789 | 96.61 | 89.98 | 26,545 | 93.94 | 93.82 | 15,587 |
| Clark-l | 27 | 98.45 | 62.30 | 1,525 | 92.11 | 69.64 | 861 | 95.96 | 52.00 | 1,705 | 99.49 | 98.94 | 143 |

  1. Performance statistics for several choices of the k-mer length for NBC, KRAKEN, CLARK and their fast variants on the classification of the “HiSeq”, “MiSeq”, “simBA-5” and “simHC.20.500” metagenomic datasets against the 695 genus-level targets. Precision (Prec) and sensitivity (Sens) are expressed as percentages, and speed is expressed in 10^3 reads per minute. KRAKEN-Q and CLARK-E are faster, but less accurate, variants of these tools. CLARK-l is a less memory-intensive version of CLARK that runs only for k = 27. Experiments were carried out in single-threaded mode. The parameter k is referred to as N in the NBC manuscript.
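
To make the units in the table concrete, the following Python sketch shows one way the three reported quantities could be computed. It is an illustration only, not code from CLARK, KRAKEN, or NBC. It assumes the common definitions (precision = correct assignments / reads assigned, sensitivity = correct assignments / all reads), which this page does not spell out; the function names and the read counts in the example are hypothetical.

```python
def precision_sensitivity(correct, assigned, total):
    """Return (precision, sensitivity) as percentages.

    Assumed definitions (not stated in the table itself):
      precision   = correct assignments / reads that received an assignment
      sensitivity = correct assignments / all reads in the dataset
    """
    if assigned <= 0 or total <= 0:
        raise ValueError("assigned and total must be positive")
    precision = 100.0 * correct / assigned
    sensitivity = 100.0 * correct / total
    return precision, sensitivity


def minutes_to_classify(n_reads, speed_kreads_per_min):
    """Convert a table speed (in 10^3 reads per minute) into a run time in minutes."""
    if speed_kreads_per_min <= 0:
        raise ValueError("speed must be positive")
    return n_reads / (speed_kreads_per_min * 1000.0)


if __name__ == "__main__":
    # Hypothetical counts: 10,000 reads, 8,000 assigned, 7,900 assigned correctly.
    print(precision_sensitivity(correct=7900, assigned=8000, total=10000))

    # Speeds taken from the HiSeq column of the table: CLARK at k = 31 and NBC at k = 15.
    print(minutes_to_classify(1_000_000, 3116))
    print(minutes_to_classify(1_000_000, 0.008))
```

Under these assumed definitions, a classifier that assigns every read has equal precision and sensitivity, which is consistent with the identical Prec and Sens values in the NBC rows.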