Table 5 Evaluations on the viral and the 14-bacterial-species data sets. The viral set was clustered with an identity score of 50%; it includes nine clusters representing nine viruses. The 14-bacterial-species set was clustered with multiple identity scores; it includes 14 clusters representing 14 bacterial species. We mark MeShClust v3.0 with “auto” when the threshold is estimated automatically; otherwise, the specific threshold provided to the tool is given in parentheses.

From: MeShClust v3.0: high-quality clustering of DNA sequences using the mean shift algorithm and alignment-free identity scores

| Tool | Purity | Jaccard | G-Measure | Cluster quality | Coverage | Time (hh:mm:ss) | Memory (GB) |
|---|---|---|---|---|---|---|---|
| Viral data set | | | | | | | |
| CD-HIT | 0.97 | 0.67 | 0.77 | 0.78 | 0.95 | 00:00:02 | 0.18 |
| MeShClust v1.0 | 0.91 | 0.72 | 0.83 | 0.81 | 0.98 | 00:00:28 | 0.08 |
| MeShClust v3.0 | 0.96 | 0.56 | 0.72 | 0.71 | 0.72 | 00:00:07 | 0.12 |
| UCLUST | 1.00 | 0.26 | 0.46 | 0.43 | 0.64 | 00:00:17 | 0.08 |
| 14-bacterial-species set | | | | | | | |
| MeShClust v3.0 (0.80) | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 00:29:36 | 14.09 |
| MeShClust v3.0 (0.85) | 1.00 | 0.93 | 0.97 | 0.97 | 0.91 | 00:46:01 | 14.11 |
| MeShClust v3.0 (0.90) | 1.00 | 0.64 | 0.76 | 0.78 | 0.96 | 00:42:54 | 14.21 |
| MeShClust v3.0 (auto) | 1.00 | 0.62 | 0.73 | 0.77 | 0.93 | 02:48:41 | 14.21 |
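
The Purity, Jaccard, and G-Measure columns compare a predicted clustering against known reference labels. The table does not restate how these measures are computed, so the following is only a minimal sketch under assumed, commonly used definitions: purity as the fraction of sequences that belong to the majority reference class of their cluster, Jaccard as a pair-counting index, and G-Measure as the geometric mean of pairwise precision and recall (the Fowlkes-Mallows index). The exact formulas used by the authors may differ, and all function names here are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def purity(true_labels, pred_labels):
    """Fraction of items in the majority reference class of their cluster (assumed definition)."""
    per_cluster = {}
    for t, p in zip(true_labels, pred_labels):
        per_cluster.setdefault(p, Counter())[t] += 1
    majority_total = sum(max(c.values()) for c in per_cluster.values())
    return majority_total / len(true_labels)


def pair_counts(true_labels, pred_labels):
    """Count item pairs: together in both (tp), only in prediction (fp), only in reference (fn)."""
    tp = fp = fn = 0
    # Quadratic in the number of items; fine for illustration, not for large sets.
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true and same_pred:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
    return tp, fp, fn


def jaccard(true_labels, pred_labels):
    """Pair-counting Jaccard index (assumed definition)."""
    tp, fp, fn = pair_counts(true_labels, pred_labels)
    denom = tp + fp + fn
    return tp / denom if denom else 1.0


def g_measure(true_labels, pred_labels):
    """Geometric mean of pairwise precision and recall (assumed definition)."""
    tp, fp, fn = pair_counts(true_labels, pred_labels)
    if tp == 0:
        return 0.0
    return tp / sqrt((tp + fp) * (tp + fn))


if __name__ == "__main__":
    # Toy data: two reference species, three predicted clusters.
    reference = ["a", "a", "a", "b", "b", "b"]
    predicted = [0, 0, 1, 1, 2, 2]
    print(purity(reference, predicted))
    print(jaccard(reference, predicted))
    print(g_measure(reference, predicted))
```

Under these assumed definitions, a clustering that matches the reference labels exactly scores 1.0 on all three measures. Splitting a species across several otherwise pure clusters leaves purity at 1.0 while lowering the Jaccard and G-Measure values.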