Maximizing biomarker discovery by minimizing gene signatures

BMC Genomics

Table 2 Model parameters and performances

UniqueModelID	BR_D_Model	Swap_BR_D_Model	BR_E_Model	Swap_BR_E_Model
Endpoint	D	D	E	E
Dataset	training dataset	validation dataset	training dataset	validation dataset
Samples	130	100	130	100
Features	32	33	55	10
Normalization	MAS5	MAS5	MAS5	MAS5
Batch Effect Removal Method	AGC	AGC	none	none
Feature Selection Method	MCC-robustness	MCC-robustness	MCC-robustness	MCC-robustness
Classification Method	SVM	SVM	SVM	SVM
Internal Validation	5F-CV	5F-CV	5F-CV	5F-CV
Validation Iterations	10	10	10	10
MFS Fitting Index	index1	index1	MCC	MCC
MFS Optimized Method	SVM	SVM	SVM	SVM
MFS Best Fitting Model	yes	yes	yes	yes
CV_MCC	0.707	0.689	0.904	0.942
CV_ACC	0.892	0.827	0.955	0.972
CV_SEN	0.915	0.673	0.947	0.955
CV_SPE	0.815	0.981	0.959	0.983
MCC_Std Dev	0.030	0.082	0.029	0.021
ACC_Std Dev	0.011	0.048	0.014	0.010
SEN_Std Dev	0.011	0.091	0.017	0.024
SPE_Std Dev	0.026	0.013	0.013	0.000
Val_MCC	0.395	0.368	0.819	0.661
Val_ACC	0.850	0.792	0.910	0.838
Val_SEN	0.907	0.714	0.841	0.914
Val_SPE	0.500	0.802	0.964	0.811

ACC is short for accuracy, SEN for sensitivity, SPE for specificity, MCC for Matthews correlation coefficient and Std Dev for standard deviation. The CV rows report internal validation (cross-validation) performance, and the Val rows report the performance of the model built on the training dataset when it is applied to the validation dataset. We balanced the training dataset for Swap_BR_D_Model because its positive/negative (P/N) ratio is very low (0.18). On datasets with such an unbalanced P/N ratio, MCC is very sensitive, and even a small number of prediction errors can change its value considerably. The features of BR_D_Model and BR_E_Model are available in Additional file 17.
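
The sensitivity of MCC to an unbalanced P/N ratio can be illustrated with a minimal sketch. The confusion-matrix counts below are hypothetical and are not taken from the study; only the P/N ratio of 0.18 comes from the note above. The function name `confusion_metrics` is our own, and the formulas are the standard definitions of ACC, SEN, SPE and MCC.

```python
import math


def confusion_metrics(tp, tn, fp, fn):
    """Return ACC, SEN, SPE and MCC computed from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    sen = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    spe = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is set to 0 when the denominator is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "SEN": sen, "SPE": spe, "MCC": mcc}


# Hypothetical example: 118 samples, two positives misclassified in each case.
# Unbalanced: 18 positives, 100 negatives (P/N = 0.18).
unbalanced = confusion_metrics(tp=16, tn=100, fp=0, fn=2)
# Balanced: 59 positives, 59 negatives (P/N = 1.0).
balanced = confusion_metrics(tp=57, tn=59, fp=0, fn=2)

for name, metrics in (("unbalanced", unbalanced), ("balanced", balanced)):
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

In this hypothetical case both datasets have the same accuracy, because two of 118 samples are misclassified in each, yet MCC is lower for the unbalanced dataset. The same two errors therefore cost more MCC when positives are scarce.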

ISSN: 1471-2164