Table 2 Maximum classification accuracy scores when using Euclidean vs. Pearson’s correlation coefficient (PCC) as a distance measure

From: ML-DSP: Machine Learning with Digital Signal Processing for ultrafast, accurate, and scalable genome classification at all taxonomic levels

     

| Data set | No. of seq. | Max length (bp) | Min length (bp) | Median length (bp) | Max. accuracy, Euclidean, norm. to max length (a) | Max. accuracy, Euclidean, norm. to median length (b) | Max. accuracy, PCC, norm. to max length (c) | Max. accuracy, PCC, norm. to median length (d) |
|---|---|---|---|---|---|---|---|---|
| Primates (Haplorrhini: 115, Strepsirrhini: 33) | 148 | 17531 | 15467 | 16554 | 98.6% | 100% | 100% | 100% |
| Protists (Alveolata: 34, Rhodophyta: 46, Stramenopiles: 79) | 159 | 77356 | 5882 | 35660 | 89.3% | 90.6% | 96.2% | 91.2% |
| Fungi (Basidiomycota: 30, Pezizomycotina: 104, Saccharomycotina: 92) | 226 | 235849 | 1364 | 39154 | 70.1% | 82.6% | 87.9% | 89.3% |
| Plants (Chlorophyta: 44, Streptophyta: 130) | 174 | 1999595 | 12998 | 128211 | 95.4% | 94.8% | 90.2% | 91.4% |
| Amphibians (Anura: 161, Caudata: 95, Gymnophiona: 34) | 290 | 28757 | 15757 | 17271 | 95.2% | 97.6% | 98.3% | 99.0% |
| Mammals (Xenarthrans: 30, Bats: 54, Carnivores: 135, Even-toed Ungulates: 242, Insectivores: 40, Marsupials: 34, Primates: 148, Rodents and Rabbits: 147) | 830 | 17734 | 15289 | 16537 | 95.2% | 96.1% | 97.8% | 97.1% |
| Insects (Coleoptera: 95, Dictyptera: 77, Diptera: 149, Hemiptera: 126, Hymenoptera: 47, Lepidoptera: 294, Orthoptera: 110) | 898 | 20731 | 10662 | 15529 | 87.9% | 90.0% | 91.3% | 94.2% |
| 3 classes (Amphibians: 290, Mammals: 874, Insects: 1006) | 2170 | 28757 | 8118 | 16361 | 99.9% | 99.7% | 99.8% | 99.7% |
| Vertebrates (Amphibians: 290, Birds: 553, Fish: 2313, Mammals: 874, Reptiles: 292) | 4322 | 28757 | 14935 | 16616 | 99.6% | 99.8% | 99.6% | 99.7% |
| Table Average Accuracy | - | - | - | - | 92.4% | 94.6% | 95.7% | 95.7% |

(a), (c): Genomes normalized to the maximum genome sequence length. (b), (d): Genomes normalized to the median genome sequence length.
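The "Table Average Accuracy" row appears to be the unweighted mean of each accuracy column over the nine data sets. The short Python sketch below recomputes it from the values in the table as a consistency check. The names `ACCURACY` and `column_averages` are illustrative only and do not come from the ML-DSP software, and rounding to one decimal place is an assumption based on how the table is printed.

```python
from statistics import mean

# Maximum accuracy (%) per data set, copied from Table 2.
# Tuple order: (a) Euclidean / max length, (b) Euclidean / median length,
#              (c) PCC / max length,       (d) PCC / median length.
ACCURACY = {
    "Primates":    (98.6, 100.0, 100.0, 100.0),
    "Protists":    (89.3, 90.6, 96.2, 91.2),
    "Fungi":       (70.1, 82.6, 87.9, 89.3),
    "Plants":      (95.4, 94.8, 90.2, 91.4),
    "Amphibians":  (95.2, 97.6, 98.3, 99.0),
    "Mammals":     (95.2, 96.1, 97.8, 97.1),
    "Insects":     (87.9, 90.0, 91.3, 94.2),
    "3 classes":   (99.9, 99.7, 99.8, 99.7),
    "Vertebrates": (99.6, 99.8, 99.6, 99.7),
}


def column_averages(table):
    """Return the unweighted mean of each accuracy column, rounded to 0.1."""
    columns = zip(*table.values())  # transpose rows into columns (a)..(d)
    return tuple(round(mean(col), 1) for col in columns)


averages = column_averages(ACCURACY)
print(averages)  # (92.4, 94.6, 95.7, 95.7)
```

The result matches the last row of the table. Because the mean is unweighted, small data sets such as Primates (148 sequences) count as much as Vertebrates (4322 sequences); a mean weighted by sequence count would give different figures.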