
Table 2 Model parameters and performances

From: Maximizing biomarker discovery by minimizing gene signatures

| UniqueModelID | BR_D_Model | Swap_BR_D_Model | BR_E_Model | SwapBR_E_Model |
|---|---|---|---|---|
| Endpoint | D | D | E | E |
| Dataset | training dataset | validation dataset | training dataset | validation dataset |
| Samples | 130 | 100 | 130 | 100 |
| Features | 32 | 33 | 55 | 10 |
| Normalization | MAS5 | MAS5 | MAS5 | MAS5 |
| Batch Effect Removal Method | AGC | AGC | none | none |
| Feature Selection Method | MCC-robustness | MCC-robustness | MCC-robustness | MCC-robustness |
| Classification Method | SVM | SVM | SVM | SVM |
| Internal Validation | 5F-CV | 5F-CV | 5F-CV | 5F-CV |
| Validation Iterations | 10 | 10 | 10 | 10 |
| MFS Fitting Index | index1 | index1 | MCC | MCC |
| MFS Optimized Method | SVM | SVM | SVM | SVM |
| MFS Best Fitting Model | yes | yes | yes | yes |
| CV_MCC | 0.707 | 0.689 | 0.904 | 0.942 |
| CV_ACC | 0.892 | 0.827 | 0.955 | 0.972 |
| CV_SEN | 0.915 | 0.673 | 0.947 | 0.955 |
| CV_SPE | 0.815 | 0.981 | 0.959 | 0.983 |
| MCC_StdDev | 0.030 | 0.082 | 0.029 | 0.021 |
| ACC_StdDev | 0.011 | 0.048 | 0.014 | 0.010 |
| SEN_StdDev | 0.011 | 0.091 | 0.017 | 0.024 |
| SPE_StdDev | 0.026 | 0.013 | 0.013 | 0.000 |
| Val_MCC | 0.395 | 0.368 | 0.819 | 0.661 |
| Val_ACC | 0.850 | 0.792 | 0.910 | 0.838 |
| Val_SEN | 0.907 | 0.714 | 0.841 | 0.914 |
| Val_SPE | 0.500 | 0.802 | 0.964 | 0.811 |

  1. ACC stands for accuracy, SEN for sensitivity, SPE for specificity, MCC for Matthews correlation coefficient, SVM for support vector machine, 5F-CV for five-fold cross-validation and StdDev for standard deviation. The CV rows refer to internal validation. The Val rows refer to external validation, in which a model built on one dataset is evaluated on the other (for the Swap models, the training and validation datasets exchange roles). We balanced the training dataset for Swap_BR_D_Model because its P/N (positive/negative) ratio is very low (0.18): on such unbalanced datasets MCC is very sensitive, and its value can change considerably even for a small number of prediction errors. The features of BR_D_Model and BR_E_Model are available in Additional file 17.
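As an illustration of the metrics in the footnote, here is a minimal Python sketch (the document itself contains no code) that computes MCC, ACC, SEN and SPE from a binary confusion matrix and summarizes repeated cross-validation runs as mean and standard deviation. It also shows why an unbalanced P/N ratio matters: the same three errors on 118 samples give identical accuracy but a lower MCC when positives are scarce. The helper names (`binary_metrics`, `summarize`, `labels`) and the sample counts are hypothetical; the paper's SVM pipeline, the MFS procedure and its exact standard-deviation convention are not reproduced here (sample standard deviation is an assumption).

```python
import math
from statistics import mean, stdev


def binary_metrics(y_true, y_pred):
    """Return MCC, ACC, SEN and SPE for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = (tp + tn) / len(pairs)
    sen = tp / (tp + fn) if tp + fn else 0.0  # true positive rate
    spe = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention (assumption): report MCC = 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"MCC": mcc, "ACC": acc, "SEN": sen, "SPE": spe}


def summarize(runs):
    """Mean and sample standard deviation of each metric over repeated CV runs."""
    return {k: (mean(r[k] for r in runs), stdev(r[k] for r in runs)) for k in runs[0]}


def labels(n_pos, n_neg, missed_pos):
    """Hypothetical predictions: all negatives correct, `missed_pos` positives wrong."""
    y_true = [1] * n_pos + [0] * n_neg
    y_pred = [0] * missed_pos + [1] * (n_pos - missed_pos) + [0] * n_neg
    return y_true, y_pred


# Same 118 samples and the same 3 errors; only the P/N ratio differs.
unbalanced = binary_metrics(*labels(18, 100, 3))  # P/N = 0.18
balanced = binary_metrics(*labels(59, 59, 3))     # P/N = 1.0

for name, m in [("unbalanced", unbalanced), ("balanced", balanced)]:
    print(name, {k: round(v, 3) for k, v in m.items()})

# Summarizing several (hypothetical) runs, analogous to the CV and StdDev rows.
print(summarize([binary_metrics(*labels(18, 100, k)) for k in (2, 3, 4)]))
```

Accuracy is the same in both cases, while MCC drops for the unbalanced set, which is the behaviour the footnote gives as the reason for balancing the Swap_BR_D_Model training data.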