A machine learning model to determine the accuracy of variant calls in capture-based next generation sequencing

BMC Genomics

Table 2 Model performance

	Present	Not present
High confidence variant	6622 / 6622 (100.0%) True positive prediction	0 / 6622 (0%) False positive prediction
Low confidence variant	44 / 557 (7.9%) False negative prediction	513 / 557 (92.1%) True negative prediction

Results of a 10-fold cross-validation of the model on all 7179 variants. Variants called in the NGS pipeline were tested by Sanger sequencing. Those that were confirmed by Sanger sequencing are reported here as “Present”, and those that were not confirmed are reported as “Not present”. All variants were evaluated by the machine learning model and categorized as either a “High confidence variant” or a “Low confidence variant”. Percentages are calculated within each row, that is, relative to the number of variants in that confidence category (6622 high confidence, 557 low confidence).
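As an illustration of how the percentages in Table 2 follow from the four cell counts, the arithmetic can be sketched in a few lines of Python. This is a minimal sketch, not code from the paper: the function name `confusion_metrics` and the dictionary keys are hypothetical, and only the four counts are taken from the table. Sensitivity and specificity are column-wise rates that the table does not report; they are derived here from the same counts.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Derive summary rates from the four cells of a 2x2 table.

    Hypothetical helper for illustration; not part of the published pipeline.
    """
    return {
        "total": tp + fp + fn + tn,
        # Row-wise rates (denominator = variants in one confidence category)
        "ppv": tp / (tp + fp),          # high confidence and Sanger-confirmed
        "npv": tn / (tn + fn),          # low confidence and not confirmed
        # Column-wise rates (denominator = variants with one Sanger outcome)
        "sensitivity": tp / (tp + fn),  # confirmed variants labeled high confidence
        "specificity": tn / (tn + fp),  # unconfirmed variants labeled low confidence
    }


# Cell counts copied from Table 2.
metrics = confusion_metrics(tp=6622, fp=0, fn=44, tn=513)

if __name__ == "__main__":
    print(f"total variants: {metrics['total']}")
    for name in ("ppv", "npv", "sensitivity", "specificity"):
        print(f"{name}: {metrics[name]:.1%}")
```

With these counts, `ppv` and `npv` reproduce the row-wise 100.0% and 92.1% shown in the table, and `total` matches the 7179 variants stated in the caption.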

ISSN: 1471-2164