Table 1 Feature selection: Average testing set error rate and sparsity (in parentheses) for 10 random partitions of the data

From: Predictive computational phenotyping and biomarker discovery using reference-free genome comparisons

| Dataset | SCM | χ²+CART | χ²+L1SVM | χ²+L2SVM | χ²+SCM | Baseline |
|---|---|---|---|---|---|---|
| C. difficile | | | | | | |
| Azithromycin | 0.030 (3.3) | 0.086 (7.2) | 0.064 (20326.0) | 0.056 (10⁶) | 0.075 (3.0) | 0.446 |
| Ceftriaxone | 0.073 (2.6) | 0.117 (6.8) | 0.087 (8114.1) | 0.102 (10⁶) | 0.111 (3.2) | 0.306 |
| Clarithromycin | 0.011 (3.0) | 0.070 (8.0) | 0.062 (36686.1) | 0.059 (10⁶) | 0.069 (3.5) | 0.446 |
| Clindamycin | 0.021 (1.4) | 0.011 (2.0) | 0.009 (598.2) | 0.021 (10⁶) | 0.008 (2.3) | 0.136 |
| Moxifloxacin | 0.020 (1.0) | 0.020 (1.3) | 0.020 (25.6) | 0.048 (10⁶) | 0.021 (1.1) | 0.390 |
| M. tuberculosis | | | | | | |
| Ethambutol | 0.179 (1.4) | 0.185 (1.9) | 0.153 (201.3) | 0.221 (10⁶) | 0.174 (3.2) | 0.351 |
| Isoniazid | 0.021 (1.0) | 0.021 (1.1) | 0.017 (104.7) | 0.125 (10⁶) | 0.021 (1.2) | 0.421 |
| Pyrazinamide | 0.318 (3.1) | 0.371 (4.4) | 0.353 (481.2) | 0.342 (10⁶) | 0.366 (5.8) | 0.347 |
| Rifampicin | 0.031 (1.4) | 0.031 (1.5) | 0.031 (130.0) | 0.196 (10⁶) | 0.029 (1.3) | 0.452 |
| Streptomycin | 0.050 (1.0) | 0.052 (1.6) | 0.043 (98.8) | 0.137 (10⁶) | 0.050 (2.1) | 0.435 |
| P. aeruginosa | | | | | | |
| Amikacin | 0.175 (4.9) | 0.206 (14.1) | 0.187 (11514.6) | 0.164 (10⁶) | 0.164 (9.7) | 0.216 |
| Doripenem | 0.270 (1.4) | 0.261 (1.9) | 0.261 (950.0) | 0.275 (10⁶) | 0.307 (8.5) | 0.359 |
| Levofloxacin | 0.072 (1.2) | 0.076 (1.0) | 0.085 (148.9) | 0.212 (10⁶) | 0.083 (3.5) | 0.463 |
| Meropenem | 0.267 (1.6) | 0.261 (1.0) | 0.328 (5368.5) | 0.327 (10⁶) | 0.331 (9.1) | 0.404 |
| S. pneumoniae | | | | | | |
| Benzylpenicillin | 0.013 (1.1) | 0.012 (2.3) | 0.011 (124.9) | 0.013 (10⁶) | 0.013 (1.3) | 0.073 |
| Erythromycin | 0.037 (2.0) | 0.047 (3.8) | 0.041 (328.8) | 0.042 (10⁶) | 0.041 (5.1) | 0.142 |
| Tetracycline | 0.031 (1.1) | 0.029 (1.2) | 0.032 (1108.5) | 0.037 (10⁶) | 0.033 (2.2) | 0.106 |

  1. Results are shown for the SCM, which uses the entire feature space, and for the feature selection-based methods: χ²+CART, χ²+L1SVM, χ²+L2SVM and χ²+SCM. The baseline method predicts the most abundant class in the training set. The smallest error rates are in bold.
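
The baseline column can be illustrated with a minimal Python sketch, offered under stated assumptions rather than as the authors' code. It predicts the most abundant training label for every test example and averages the resulting error rate over 10 random partitions, as in the table caption. The function names, the `test_fraction` value and the seed are hypothetical; the table does not state the split proportions used.

```python
from collections import Counter
import random


def majority_baseline_error(train_labels, test_labels):
    """Error rate when every test example gets the most common training label."""
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    errors = sum(1 for label in test_labels if label != majority_label)
    return errors / len(test_labels)


def mean_baseline_error(labels, n_partitions=10, test_fraction=0.2, seed=0):
    """Average the baseline error over random train/test partitions.

    test_fraction and seed are illustrative assumptions, not values
    taken from the table or its caption.
    """
    rng = random.Random(seed)
    n_test = max(1, int(len(labels) * test_fraction))
    rates = []
    for _ in range(n_partitions):
        shuffled = list(labels)
        rng.shuffle(shuffled)
        test, train = shuffled[:n_test], shuffled[n_test:]
        rates.append(majority_baseline_error(train, test))
    return sum(rates) / len(rates)
```

Because this baseline ignores the genome entirely, its error tracks class imbalance: a value near 0.45 indicates nearly balanced classes, while a value near 0.07 indicates that one class dominates the dataset.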