Identification of biomarkers that distinguish chemical contaminants based on gene expression profiles

BMC Genomics

Table 1 Prediction of 14 compound classes using an independent dataset

Feature selection methods	Feature number	Classification algorithm	Training accuracy (%) (D2*)	Prediction accuracy (%) (D2 to D1*)
SVM-RFE	200	LibSVM	100	64.9
Relief	500	SL	83.7	72.6
InfoGain	500	SL	83.7	66.7
Chisquare	400	SL	82.6	72.6
Gainratio	500	SL	82.9	66.1
PCA	200	SMO	73.3	66.7
Gradient	300	SMO	85.7	79.7

*Dataset 1 (D1) has a total of 168 array samples that were produced in 2007, and dataset 2 (D2) includes 363 array samples that were hybridized in 2008. Each dataset includes the complete set of 105 compounds.
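
The two accuracy columns in Table 1 can be illustrated with a minimal sketch: a model is fitted on one batch (D2), scored on that same batch (training accuracy), and then scored on the independent batch (D1, prediction accuracy). The sketch below assumes a simple nearest-centroid classifier as a stand-in; it is not one of the classifiers listed in the table (LibSVM, SL, SMO), it performs no feature selection, and the expression values, class labels, and function names are invented for illustration only.

```python
from collections import defaultdict
from math import dist


def fit_centroids(samples, labels):
    """Return {class label: mean expression vector} from training samples."""
    grouped = defaultdict(list)
    for vector, label in zip(samples, labels):
        grouped[label].append(vector)
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(centroids, samples):
    """Assign each sample to the class whose centroid is nearest (Euclidean)."""
    return [
        min(centroids, key=lambda label: dist(centroids[label], vector))
        for vector in samples
    ]


def accuracy_percent(true_labels, predicted_labels):
    """Percentage of samples whose predicted class matches the true class."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)


# Toy stand-ins for the two batches (invented values: two genes, two classes).
d2_x = [[1.0, 0.2], [0.9, 0.1], [0.1, 1.0], [0.2, 0.8]]  # plays the role of D2
d2_y = ["A", "A", "B", "B"]
d1_x = [[0.8, 0.3], [0.3, 0.9], [0.6, 0.5], [0.4, 0.6]]  # plays the role of D1
d1_y = ["A", "B", "B", "B"]

# Fit on D2 only, then score on D2 (training) and on D1 (independent prediction).
centroids = fit_centroids(d2_x, d2_y)
training_accuracy = accuracy_percent(d2_y, predict(centroids, d2_x))
prediction_accuracy = accuracy_percent(d1_y, predict(centroids, d1_x))
print(f"training: {training_accuracy:.1f}%, prediction: {prediction_accuracy:.1f}%")
```

Because the second batch never influences the fitted centroids, the gap between the two printed numbers reflects how well the model transfers across batches, which is the comparison the "D2" and "D2 to D1" columns report.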

ISSN: 1471-2164