Classifier performance as a function of the number of contigs in the training set. We ran the classifier with training sets of increasing size, from 50 W and 50 non-W contigs up to 200 W and 200 non-W contigs. This was done by subsetting the mapped contigs 100 times: in each iteration, the training contigs were selected at random and the remaining contigs were used for validation. The mean AUC over iterations is shown for each training set size. AUC (area under the ROC curve) is a commonly used statistic for comparing models.
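
The resampling scheme described in this caption can be sketched as follows. This is a minimal illustration, not the classifier used in the study: the caption does not specify the classifier or its features, so the sketch assumes a single numeric feature per contig and a toy score based on the midpoint between the two training-class means. The function names `auc` and `mean_auc_for_size`, the `seed` parameter, and the synthetic data are all hypothetical.

```python
import random
from statistics import mean


def auc(scores_pos, scores_neg):
    """Area under the ROC curve, computed as the probability that a random
    positive scores higher than a random negative (ties count as 0.5)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def mean_auc_for_size(w, non_w, n_train, n_iter=100, seed=0):
    """Mean validation AUC over n_iter random training/validation splits.

    w, non_w: lists of one numeric feature per contig (an assumption).
    n_train:  number of contigs drawn from EACH class for training.
    """
    if n_train >= min(len(w), len(non_w)):
        raise ValueError("n_train must leave contigs for validation")
    rng = random.Random(seed)
    aucs = []
    for _ in range(n_iter):
        # Randomly select the training contigs; the remainder is for validation.
        w_shuf = rng.sample(w, len(w))
        n_shuf = rng.sample(non_w, len(non_w))
        train_w, val_w = w_shuf[:n_train], w_shuf[n_train:]
        train_n, val_n = n_shuf[:n_train], n_shuf[n_train:]
        # Toy stand-in classifier (assumption): score each contig by its
        # signed distance from the midpoint of the two training-class means.
        mid = (mean(train_w) + mean(train_n)) / 2
        sign = 1.0 if mean(train_w) >= mean(train_n) else -1.0
        pos_scores = [sign * (x - mid) for x in val_w]
        neg_scores = [sign * (x - mid) for x in val_n]
        aucs.append(auc(pos_scores, neg_scores))
    return mean(aucs)


if __name__ == "__main__":
    # Synthetic data for illustration only; not the study's contigs.
    data_rng = random.Random(1)
    w = [data_rng.gauss(1.0, 1.0) for _ in range(300)]
    non_w = [data_rng.gauss(0.0, 1.0) for _ in range(300)]
    for n_train in (50, 100, 150, 200):
        print(n_train, round(mean_auc_for_size(w, non_w, n_train), 3))
```

Drawing the same number of contigs from each class mirrors the balanced training sets in the caption (50 W and 50 non-W up to 200 W and 200 non-W). With a real classifier, only the scoring step inside the loop would change.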