Table 2 Model parameters and performances

From: Maximizing biomarker discovery by minimizing gene signatures

| UniqueModelID | BR_D_Model | Swap_BR_D_Model | BR_E_Model | SwapBR_E_Model |
|---|---|---|---|---|
| Endpoint | D | D | E | E |
| Dataset | training dataset | validation dataset | training dataset | validation dataset |
| Samples | 130 | 100 | 130 | 100 |
| Features | 32 | 33 | 55 | 10 |
| Normalization | MAS5 | MAS5 | MAS5 | MAS5 |
| Batch Effect Removal Method | AGC | AGC | none | none |
| Feature Selection Method | MCC-robustness | MCC-robustness | MCC-robustness | MCC-robustness |
| Classification Method | SVM | SVM | SVM | SVM |
| Internal Validation | 5F-CV | 5F-CV | 5F-CV | 5F-CV |
| Validation Iterations | 10 | 10 | 10 | 10 |
| MFS Fitting Index | index1 | index1 | MCC | MCC |
| MFS Optimized Method | SVM | SVM | SVM | SVM |
| MFS Best Fitting Model | yes | yes | yes | yes |
| CV_MCC | 0.707 | 0.689 | 0.904 | 0.942 |
| CV_ACC | 0.892 | 0.827 | 0.955 | 0.972 |
| CV_SEN | 0.915 | 0.673 | 0.947 | 0.955 |
| CV_SPE | 0.815 | 0.981 | 0.959 | 0.983 |
| MCC_Std Dev | 0.030 | 0.082 | 0.029 | 0.021 |
| ACC_Std Dev | 0.011 | 0.048 | 0.014 | 0.010 |
| SEN_Std Dev | 0.011 | 0.091 | 0.017 | 0.024 |
| SPE_Std Dev | 0.026 | 0.013 | 0.013 | 0.000 |
| Val_MCC | 0.395 | 0.368 | 0.819 | 0.661 |
| Val_ACC | 0.850 | 0.792 | 0.910 | 0.838 |
| Val_SEN | 0.907 | 0.714 | 0.841 | 0.914 |
| Val_SPE | 0.500 | 0.802 | 0.964 | 0.811 |
  1. ACC is short for accuracy, SEN for sensitivity, SPE for specificity and Std Dev for standard deviation. The CV rows refer to internal validation, and the Val rows refer to validation of the model built on the training dataset against the validation dataset. We balanced the training dataset for Swap_BR_D_Model because its positive/negative (P/N) ratio is too small (0.18): MCC is very sensitive, and in datasets with an unbalanced P/N ratio its value can change considerably even for a small prediction error. Features of BR_D_Model and BR_E_Model are available in Additional file 17.
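
The footnote's point about MCC and unbalanced data can be illustrated with a minimal sketch. It uses the standard confusion-matrix definitions of ACC, SEN, SPE and MCC. The function name `binary_metrics` and all counts below are hypothetical, chosen only to give a P/N ratio of 0.18 (18 positives, 100 negatives); they are not taken from the study's data.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Compute ACC, SEN, SPE and MCC from confusion-matrix counts."""
    total = tp + fn + tn + fp
    acc = (tp + tn) / total
    sen = tp / (tp + fn)  # sensitivity: fraction of positives recovered
    spe = tn / (tn + fp)  # specificity: fraction of negatives recovered
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is set to 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "SEN": sen, "SPE": spe, "MCC": mcc}


# Hypothetical unbalanced set: 18 positives, 100 negatives (P/N = 0.18).
before = binary_metrics(tp=15, fn=3, tn=95, fp=5)
# Same set with two more positives misclassified.
after = binary_metrics(tp=13, fn=5, tn=95, fp=5)

for name in ("ACC", "MCC"):
    print(f"{name}: {before[name]:.3f} -> {after[name]:.3f}")
```

With these assumed counts, two additional misclassified positives lower ACC by less than 0.02 but lower MCC by more than 0.07, which is the kind of sensitivity the footnote describes.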