
Table 4 Maximum classification accuracy (the highest of the accuracies obtained with each of the six classifiers) of ML-DSP, for datasets at different taxonomic levels, from ‘domain into kingdoms’ down to ‘family into genera’

From: ML-DSP: Machine Learning with Digital Signal Processing for ultrafast, accurate, and scalable genome classification at all taxonomic levels

Numerical representation maximum accuracy: columns PP, Real, Just-A, Random3*, Random13**

| Test | No. of Seq. | Max Length | Min Length | Median Length | Mean Length | PP | Real | Just-A | Random3* | Random13** |
|---|---|---|---|---|---|---|---|---|---|---|
| Domain to Kingdom (Domain: Eukaryota; Kingdoms: Plants: 254, Animals: 6697, Fungi: 267, Protists: 178) | 7396 | 1999595 | 1136 | 16580 | 25434 | 96.2% | 97.3% | 96.1% | 95.5% | 92.8% |
| Domain to Kingdom (No Protists) (Domain: Eukaryota; Kingdoms: Plants: 254, Animals: 6697, Fungi: 267) | 7218 | 1999595 | 1136 | 16573 | 25254 | 97.9% | 98.4% | 97.9% | 97.4% | 94.4% |
| Kingdom to Phylum (Kingdom: Animalia; Phylum: Chordata: 4367, Cnidaria: 127, Ecdysozoa: 1572, Porifera: 60, Echinodermata: 44, Lophotrochozoa: 403, Platyhelminthes: 100) | 6673 | 48161 | 5596 | 16553 | 16474 | 96.2% | 95.9% | 95.3% | 93.6% | 85.6% |
| Phylum to SubPhylum (Phylum: Chordata; SubPhylum: Cephalochordata: 9, Craniata: 4334, Tunicata: 24) | 4367 | 28757 | 13424 | 16615 | 16791 | 99.7% | 99.8% | 99.8% | 99.5% | 99.7% |
| SubPhylum to Class (SubPhylum: Vertebrata; Class: Amphibians (Amphibia): 290, Birds (Aves): 553, Fish (Actinopterygii, Chondrichthyes, Dipnoi, Coelacanthiformes): 2313, Mammals (Mammalia): 874, Reptiles (Crocodylia, Sphenodontia, Squamata, Testudines): 292) | 4322 | 28757 | 14935 | 16616 | 16806 | 99.7% | 99.6% | 99.3% | 99.2% | 86.2% |
| Class to SubClass (Class: Actinopterygii; SubClass: Chondrostei: 24, Cladistia: 11, Neopterygii: 2141) | 2176 | 22217 | 15534 | 16589 | 16656 | 100% | 99.9% | 99.9% | 99.8% | 99.2% |
| SubClass to SuperOrder (SubClass: Neopterygii; SuperOrder: Osteoglossomorpha: 23, Elopomorpha: 60, Clupeomorpha: 75, Ostariophysi: 792, Protacanthopterygii: 66, Paracanthopterygii: 46, Acanthopterygii: 426) | 1488 | 22217 | 15534 | 16597 | 16669 | 96.2% | 96.4% | 95.4% | 94.4% | 78.8% |
| SuperOrder to Order (SuperOrder: Ostariophysi; Order: Cypriniformes: 643, Characiformes: 31, Siluriformes: 107) | 781 | 17859 | 16123 | 16597 | 16621 | 99.0% | 98.7% | 98.8% | 97.6% | 92.2% |
| Order to Family (Order: Cypriniformes; Family: Balitoridae: 25, Catostomidae: 12, Cobitidae: 51, Cyprinidae: 502, Nemacheilidae: 47) | 635 | 17859 | 16411 | 16601 | 16627 | 98.9% | 97.8% | 98.3% | 97.3% | 85.7% |
| Family to Genus (Family: Cyprinidae; Genus: Schizothorax: 19, Labeo: 19, Acrossocheilus: 12, Acheilognathus: 10, Rhodeus: 11, Onychostoma: 10) | 81 | 17155 | 16563 | 16597 | 16630 | 91.8% | 92.6% | 91.4% | 85.2% | 66.7% |
| Table Average Accuracy | - | - | - | - | - | 97.6% | 97.6% | 97.2% | 96.0% | 88.1% |
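The "Table Average Accuracy" row appears to be the plain arithmetic mean of the ten per-test accuracies in each representation column. The following minimal sketch recomputes those means from the values listed in the table and compares them with the reported row; the names `ACCURACIES`, `REPORTED_AVERAGES`, and `column_averages` are hypothetical and introduced only for illustration.

```python
from statistics import mean

# Per-test maximum accuracies (%) copied from the table, in row order
# from "Domain to Kingdom" down to "Family to Genus".
ACCURACIES = {
    "PP":       [96.2, 97.9, 96.2, 99.7, 99.7, 100.0, 96.2, 99.0, 98.9, 91.8],
    "Real":     [97.3, 98.4, 95.9, 99.8, 99.6, 99.9, 96.4, 98.7, 97.8, 92.6],
    "Just-A":   [96.1, 97.9, 95.3, 99.8, 99.3, 99.9, 95.4, 98.8, 98.3, 91.4],
    "Random3":  [95.5, 97.4, 93.6, 99.5, 99.2, 99.8, 94.4, 97.6, 97.3, 85.2],
    "Random13": [92.8, 94.4, 85.6, 99.7, 86.2, 99.2, 78.8, 92.2, 85.7, 66.7],
}

# Values reported in the "Table Average Accuracy" row.
REPORTED_AVERAGES = {
    "PP": 97.6, "Real": 97.6, "Just-A": 97.2, "Random3": 96.0, "Random13": 88.1,
}


def column_averages(accuracies):
    """Return the arithmetic mean of each column of accuracies."""
    return {name: mean(values) for name, values in accuracies.items()}


if __name__ == "__main__":
    for name, average in column_averages(ACCURACIES).items():
        print(f"{name}: computed {average:.2f}, reported {REPORTED_AVERAGES[name]}")
```

Each computed mean agrees with the reported row once rounded to one decimal place, which supports reading the row as an unweighted average over the ten tests rather than one weighted by the number of sequences.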

  1. At each level, the cluster with the highest number of sequences was chosen as the next dataset to be classified into its sub-taxa. *Random3: each sequence is represented by a representation chosen at random from PP, Real, or Just-A. **Random13: each sequence is represented by a representation chosen at random from 13 representations (Integer, Integer (Other), Real, Atomic, EIIP, PP, Paired Numeric, Nearest-neighbor-based doublet, Codon, Just-A, Just-C, Just-G, or Just-T).
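The Random3 scheme described in the footnote can be sketched as follows: every sequence independently draws one of the three representations and is encoded with it. This is a rough illustration, not the authors' implementation. The nucleotide-to-number mappings in `REPRESENTATIONS` are assumptions made for this sketch (the table does not define them), and the function names `encode` and `encode_random3` are hypothetical.

```python
import random

# Assumed nucleotide-to-number mappings, for illustration only;
# the table does not specify the numeric values of each representation.
REPRESENTATIONS = {
    "PP":     {"A": -1.0, "G": -1.0, "C": 1.0, "T": 1.0},   # purine vs. pyrimidine
    "Real":   {"A": -1.5, "T": 1.5, "C": 0.5, "G": -0.5},
    "Just-A": {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},     # indicator for A
}


def encode(sequence, name):
    """Map a DNA string to a list of numbers using the named representation."""
    mapping = REPRESENTATIONS[name]
    return [mapping[base] for base in sequence.upper()]


def encode_random3(sequences, seed=0):
    """Encode each sequence with a representation drawn at random per sequence.

    Returns a list of (representation_name, encoded_signal) pairs.
    """
    rng = random.Random(seed)          # seeded so the draw is reproducible
    names = sorted(REPRESENTATIONS)
    encoded = []
    for seq in sequences:
        name = rng.choice(names)       # independent draw for every sequence
        encoded.append((name, encode(seq, name)))
    return encoded


if __name__ == "__main__":
    for name, signal in encode_random3(["ACGT", "GGCA", "TTAA"]):
        print(name, signal)
```

Random13 would follow the same pattern with thirteen entries in the mapping table, although representations such as Codon or Paired Numeric operate on groups of nucleotides and would need their own encoders rather than a per-base lookup.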