
Table 3 Evaluations on the synthetic testing data sets. The first set of evaluations was conducted on 12 data sets, each of which includes clusters whose members are 80%, 90%, 95%, or 97% identical to a template sequence, i.e., a true center. The second set of evaluations was conducted on six data sets representing clusters of degenerate sequences (e.g., members are 60% or 70% identical to true centers). Each data set in the first and second sets of evaluations includes fewer than 25,000 sequences. The third set of evaluations was conducted on four data sets, each of which includes more than one million sequences (80%, 90%, 95%, or 97% identical to true centers). All clusters in the same data set have the same minimum identity score. For example, cluster members of the Short-97 data set are 97.00–99.99% identical to the true centers. The direction of the arrow next to each criterion indicates whether a high or a low value is better. We mark MeShClust v3.0 with “auto” when the threshold is estimated automatically; otherwise, a specific threshold is provided to the tool.

From: MeShClust v3.0: high-quality clustering of DNA sequences using the mean shift algorithm and alignment-free identity scores

| Tool | Purity (↑) | Jaccard (↑) | G-Measure (↑) | Cluster quality (↑) | Coverage (↑) | Centers (↑) | Time (↓) | Memory (GB) (↓) |
|---|---|---|---|---|---|---|---|---|
| Short, Medium, and Long: 80–97% | | | | | | | | |
| CD-HIT | 0.92 | 0.19 | 0.33 | 0.29 | 0.92 | 0.01 | 00:71:00 | 0.36 |
| MeShClust v1.0 | 0.99 | 0.92 | 0.93 | 0.94 | 0.99 | 0.35 | 00:00:26 | 0.20 |
| MeShClust v3.0 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.78 | 00:05:18 | 6.55 |
| MeShClust v3.0 (auto) | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.80 | 00:12:08 | 6.59 |
| UCLUST | 0.70 | 0.08 | 0.20 | 0.15 | 0.70 | 0.00 | 00:00:16 | 0.12 |
| Short, Medium, and Long: 60–70% | | | | | | | | |
| CD-HIT | 1.00 | 0.14 | 0.28 | 0.24 | 0.90 | 0.01 | 01:23:46 | 0.35 |
| MeShClust v1.0 | 0.98 | 0.93 | 0.96 | 0.96 | 1.00 | 0.53 | 00:00:25 | 0.19 |
| MeShClust v3.0 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.76 | 00:11:44 | 5.79 |
| MeShClust v3.0 (auto) | 1.00 | 1.00 | 1.00 | 1.00 | 0.98 | 0.65 | 00:14:32 | 5.83 |
| UCLUST | 1.00 | 0.22 | 0.34 | 0.34 | 0.83 | 0.01 | 00:00:28 | 0.08 |
| Numerous: 80–97% | | | | | | | | |
| CD-HIT | 1.00 | 0.48 | 0.59 | 0.62 | 1.00 | 0.00 | 00:39:31 | 0.91 |
| MeShClust v1.0 | 1.00 | 0.81 | 0.83 | 0.87 | 0.99 | 0.01 | 00:19:04 | 2.58 |
| MeShClust v3.0 | 1.00 | 0.99 | 0.99 | 1.00 | 1.00 | 0.15 | 02:58:15 | 12.76 |
| MeShClust v3.0 (auto) | 1.00 | 0.98 | 0.99 | 0.99 | 0.99 | 0.15 | 02:50:40 | 13.06 |
| UCLUST | 1.00 | 0.07 | 0.20 | 0.15 | 0.89 | 0.00 | 00:06:41 | 0.72 |
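
The Purity, Jaccard, and G-Measure columns compare a tool's clusters against the known true clusters of the synthetic data. The table does not restate how each criterion is computed, so the following is only a minimal sketch using common textbook definitions: purity as the share of sequences matching the majority true label of their predicted cluster, and pair-counting forms of the Jaccard index and the G-measure (geometric mean of pairwise precision and recall). These definitions, and the function names `purity`, `pair_counts`, `jaccard`, and `g_measure`, are assumptions for illustration and may differ from the exact formulas used in the article.

```python
from collections import Counter
from itertools import combinations


def purity(true_labels, pred_labels):
    """Share of items that carry the majority true label of their predicted cluster."""
    members = {}
    for t, p in zip(true_labels, pred_labels):
        members.setdefault(p, []).append(t)
    majority_total = sum(Counter(ts).most_common(1)[0][1] for ts in members.values())
    return majority_total / len(true_labels)


def pair_counts(true_labels, pred_labels):
    """Count item pairs: together in both (tp), only predicted together (fp),
    only truly together (fn). Quadratic in the number of items, so this is
    suitable for small illustrations only."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true and same_pred:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
    return tp, fp, fn


def jaccard(true_labels, pred_labels):
    """Pair-counting Jaccard index: tp / (tp + fp + fn)."""
    tp, fp, fn = pair_counts(true_labels, pred_labels)
    denom = tp + fp + fn
    return tp / denom if denom else 1.0


def g_measure(true_labels, pred_labels):
    """Geometric mean of pairwise precision and recall (assumed definition)."""
    tp, fp, fn = pair_counts(true_labels, pred_labels)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return (precision * recall) ** 0.5


# Hypothetical toy example: six sequences, two true clusters,
# one sequence placed in the wrong predicted cluster.
true_labels = [0, 0, 0, 1, 1, 1]
pred_labels = [0, 0, 1, 1, 1, 1]
print(purity(true_labels, pred_labels))
print(jaccard(true_labels, pred_labels))
print(g_measure(true_labels, pred_labels))
```

Under these definitions, a tool that splits true clusters into many small pure clusters can keep purity near 1.00 while the pairwise scores drop. That pattern is consistent with rows in the table where purity is 1.00 but the Jaccard value is low.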