Evaluation of data discretization methods to derive platform independent isoform expression signatures for multi-class tumor subtyping

Table 3 Comparison of classification accuracy using top ranked features for platform transition

Feature selection	CV (#)				SVM-RFE (#)				RF_based_FS (#)
Classifier	FC	Equal-W	Equal-F	k-means	FC	Equal-W	Equal-F	k-means	FC	Equal-W	Equal-F	k-means
SVM	43.4 (500)	35.5 (80)	100 (700)	92.1 (300)	51.3 (400)	75.0 (200)	100 (1000)	73.6 (60)	48.6 (20)	39.4 (50)	97.3 (600)	92.1 (200)
RF	69.7 (300)	84.2 (1000)	97.3 (1000)	89.4 (600)	61.8 (60)	89.4 (700)	96.0 (1000)	81.5 (100)	73.6 (40)	85.5 (100)	97.3 (800)	88.1 (300)
NB	27.6 (800)	30.2 (10)	92.1 (500)	75 (200)	35.5 (40)	38.1 (10)	85.5 (600)	67.1 (60)	35.5 (200)	34.2 (20)	94.7 (600)	78.9 (90)
PAM	44.7 (300)	26.3 (10)	92.1 (400)	76.3 (300)	44.7 (900)	39.4 (600)	89.4 (400)	60.5 (60)	46.0 (10)	34.2 (10)	93.4 (500)	82.8 (200)

# Number of variables in the classification model
Comparison of classification methods trained on exon-array data (342 samples) and tested on RNA-seq data (76 samples). The best accuracy (percentage of samples correctly predicted) achieved by each combination of the four classifiers and three feature selection schemes is presented, with the number of features used in the best-fitting model shown in parentheses. The models were built by stepwise addition of feature variables, drawn from the top 1,000 ranked feature variables. For each classification method, the highest accuracy achieved with the fewest features is marked in bold.
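
The selection rule described in the legend (grow the model from the top-ranked features, then report the best accuracy reached with the fewest features) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `best_feature_count`, the candidate model sizes, and the `accuracy_for` callback are hypothetical, and the callback stands in for training a classifier on exon-array data and scoring it on RNA-seq data.

```python
from typing import Callable, Sequence, Tuple


def best_feature_count(
    ranked_features: Sequence[str],
    sizes: Sequence[int],
    accuracy_for: Callable[[Sequence[str]], float],
) -> Tuple[float, int]:
    """Return (best accuracy, number of features), preferring fewer features on ties.

    ranked_features: feature names ordered from most to least important.
    sizes: candidate model sizes to try (hypothetical grid, e.g. 10, 20, ..., 1000).
    accuracy_for: hypothetical callback that trains and tests a model on the
        given feature subset and returns its accuracy in percent.
    """
    best_acc, best_n = -1.0, 0
    # Visit sizes in increasing order so that a tie never replaces a smaller model.
    for n in sorted(sizes):
        acc = accuracy_for(ranked_features[:n])
        if acc > best_acc:  # strict comparison keeps the smallest n at the best accuracy
            best_acc, best_n = acc, n
    return best_acc, best_n


if __name__ == "__main__":
    # Toy scorer (assumption): accuracy saturates once 30 features are included.
    features = [f"isoform_{i}" for i in range(1000)]
    toy_scorer = lambda subset: min(len(subset), 30) / 30 * 100
    print(best_feature_count(features, [10, 20, 30, 40], toy_scorer))
```

Under this rule, each table cell corresponds to one call: the returned accuracy is the reported percentage and the returned size is the number in parentheses.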

ISSN: 1471-2164