

Figure 8 | BMC Genomics


From: The SAAP pipeline and database: tools to analyze the impact and predict the pathogenicity of mutations


Performance of the machine learning method trained on different-sized sets of data from SAAPdb. In each case, a balanced dataset of the required size was extracted at random from the SAAPdb dataset of mutations mapped to protein chains (Table 2), and random forests were trained and tested using 10-fold cross-validation. Performance falls as the dataset size decreases, with a marked drop for datasets of fewer than 10,000 samples (5,000 SNPs and 5,000 PDs).
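The caption does not say how the balanced subsets or the cross-validation folds were built. The sketch below is a minimal illustration of one way to do it, assuming plain Python lists of per-mutation feature records and unstratified random folds. The function names `balanced_subsample` and `k_fold_indices` are hypothetical, and the random forest training and scoring step is omitted.

```python
import random


def balanced_subsample(snps, pds, total_size, seed=0):
    """Draw a balanced random subset with total_size // 2 records per class.

    snps, pds: lists of feature records (hypothetical representation).
    Returns a shuffled list of (record, label) pairs; label 0 = SNP, 1 = PD.
    """
    half = total_size // 2
    if half > len(snps) or half > len(pds):
        raise ValueError("requested size exceeds the data available in one class")
    rng = random.Random(seed)
    # Sample without replacement from each class separately so the result is balanced.
    chosen = [(r, 0) for r in rng.sample(snps, half)]
    chosen += [(r, 1) for r in rng.sample(pds, half)]
    rng.shuffle(chosen)
    return chosen


def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation over range(n).

    Indices are shuffled once and dealt round-robin into k near-equal folds;
    each fold serves as the test set exactly once.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train, test
```

For example, `balanced_subsample(snps, pds, 10_000)` returns 5,000 records of each class, and `k_fold_indices(10_000)` yields ten train/test splits with 1,000 test records each. Repeating this for a range of `total_size` values, and training a classifier on each split, would trace out a learning curve of the kind shown in the figure.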
