The SAAP pipeline and database: tools to analyze the impact and predict the pathogenicity of mutations

BMC Genomics

Table 3. Performance of different prediction methods on a balanced dataset of mutations extracted from HumVar that map to structure.

Method	Cross-validated	MCC	Accuracy
SAAPpred	Yes	0.692	0.846
SAAPpred	No	0.894	0.944
PolyPhen2	No	0.572	0.785
SIFT	?	0.528	0.763
MutationAssessor	N/A	0.453	0.698

The cross-validated values for SAAPpred were obtained from 10-fold cross-validation performed during Weka training, using all 1540 SNPs from HumVar that mapped to structure together with a random sample of 1540 of the 7182 PDs that mapped to structure. This was repeated 10 times and the results were averaged. The non-cross-validated results were obtained using a slightly smaller set of 1451 SNPs that mapped to structure and could be assessed by all the other methods, together with a random sample of 1451 PDs that could also be assessed by all methods. Again, this was repeated 10 times and the results were averaged. The non-cross-validated values for SAAPpred give the fairest comparison with PolyPhen2, which is trained on the HumVar dataset. It is unclear exactly what data were used to train the most recent version of SIFT, so there may be some overlap between its training and test sets, while MutationAssessor has no training set per se.
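The balancing and averaging procedure described above can be illustrated with a minimal Python sketch. It is not the authors' Weka workflow: the function names, the `predict` callback, and the fixed random seed are hypothetical, and it assumes that SNPs are treated as the neutral class and PDs as the pathogenic class. MCC and accuracy are computed from confusion counts using their standard definitions.

```python
import math
import random
from statistics import mean


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient (standard definition).

    Returns 0.0 when the denominator is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def confusion_counts(labels, predictions):
    """Count (tp, tn, fp, fn), with True meaning 'pathogenic'."""
    tp = sum(1 for y, p in zip(labels, predictions) if y and p)
    tn = sum(1 for y, p in zip(labels, predictions) if not y and not p)
    fp = sum(1 for y, p in zip(labels, predictions) if not y and p)
    fn = sum(1 for y, p in zip(labels, predictions) if y and not p)
    return tp, tn, fp, fn


def repeated_balanced_eval(snps, pds, predict, repeats=10, seed=0):
    """Average MCC and accuracy over repeated balanced samples.

    Assumption: `snps` are neutral, `pds` are pathogenic, and there are
    at least as many PDs as SNPs. `predict` is a hypothetical callback
    that returns True when a mutation is predicted to be pathogenic.
    """
    rng = random.Random(seed)
    mccs, accs = [], []
    for _ in range(repeats):
        # Draw as many PDs as there are SNPs so the classes are balanced.
        pd_sample = rng.sample(pds, len(snps))
        items = [(m, False) for m in snps] + [(m, True) for m in pd_sample]
        labels = [label for _, label in items]
        predictions = [predict(m) for m, _ in items]
        counts = confusion_counts(labels, predictions)
        mccs.append(mcc(*counts))
        accs.append(accuracy(*counts))
    return mean(mccs), mean(accs)
```

Under these assumptions, the non-cross-validated comparison would correspond to calling `repeated_balanced_eval` once per method with the 1451 SNPs, the pool of PDs assessable by all methods, and that method's predictor. Note that this sketch draws a separate PD sample on each call unless the same seed is used.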

ISSN: 1471-2164