Table 3 Features of low confidence variants

From: A machine learning model to determine the accuracy of variant calls in capture-based next generation sequencing

Low confidence variants           Present (n = 44)              Not present (n = 513)
Feature                           AF < 30% (37)   AF ≥ 30% (7)  AF < 30% (505)  AF ≥ 30% (8)
Low coverage (20–30 reads)        0               2             15              2
Low GC content (GC20 < 0.25)      2               2             23              2
Low GC content (GC50 < 0.25)      1               0             5               0
High GC content (GC20 > 0.75)     1               0             206             1
High GC content (GC50 > 0.75)     0               0             35              1
Homopolymer (≥ 10 within 20 bp)   0               3             15              7
Segmental duplication             16              4             67              0
Processed pseudogene              0               0             3               0
RepeatMasker                      0               2             6               0
Other                             17              2             246             0
Common features of variants classified by the model as low confidence. "Present" means that Sanger sequencing confirmed the variant, and "Not present" means that Sanger sequencing did not detect it. The numbers in parentheses in the column headers give the number of low confidence variants with allele frequency (AF) below 30% and at or above 30%. Some categories are not mutually exclusive, so a variant can be counted in more than one row.
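The row labels above amount to simple sequence-context rules. Because a variant can meet several of them at once, column sums can exceed the group sizes: the Present, AF ≥ 30% column sums to 15 across rows, although that group contains only 7 variants. The Python sketch below shows one way such flags could be computed for a single variant. It is not the authors' pipeline. The helper names are hypothetical. GC20 and GC50 are assumed to be the GC fraction of a 20 bp or 50 bp window centred on the variant. The 20–30 read coverage range is assumed to be inclusive. The homopolymer rule is assumed to mean a run of at least 10 identical bases in that 20 bp window. Segmental duplication, processed pseudogene and RepeatMasker overlaps need genome annotation tracks and are left out.

```python
from typing import Set

# Thresholds read off the row labels of Table 3.
LOW_COVERAGE_RANGE = (20, 30)   # reads; treated as inclusive (assumption)
GC_LOW, GC_HIGH = 0.25, 0.75    # GC-fraction cut-offs for GC20 / GC50
HOMOPOLYMER_MIN_RUN = 10        # identical bases in a row


def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases in a non-empty sequence (case-insensitive)."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def centred_window(ref: str, pos: int, size: int) -> str:
    """Hypothetical helper: up to `size` bp of `ref` centred on 0-based `pos`.

    The window is clipped at the sequence ends, so near an end it may be
    shorter than `size` or off-centre.
    """
    half = size // 2
    start = max(0, pos - half)
    return ref[start:start + size]


def longest_run(seq: str) -> int:
    """Length of the longest stretch of one repeated base."""
    best = run = 0
    prev = ""
    for base in seq.upper():
        run = run + 1 if base == prev else 1
        prev = base
        best = max(best, run)
    return best


def flag_features(depth: int, ref: str, pos: int) -> Set[str]:
    """Return every sequence-context feature from Table 3 a variant triggers.

    A set is returned because the categories are not mutually exclusive.
    Annotation-based rows (segmental duplication, processed pseudogene,
    RepeatMasker) need external annotation tracks and are not computed here.
    """
    if not 0 <= pos < len(ref):
        raise ValueError("pos must index into ref")
    flags: Set[str] = set()
    lo, hi = LOW_COVERAGE_RANGE
    if lo <= depth <= hi:
        flags.add("low_coverage")
    win20 = centred_window(ref, pos, 20)
    gc20 = gc_fraction(win20)
    gc50 = gc_fraction(centred_window(ref, pos, 50))
    if gc20 < GC_LOW:
        flags.add("low_gc20")
    if gc50 < GC_LOW:
        flags.add("low_gc50")
    if gc20 > GC_HIGH:
        flags.add("high_gc20")
    if gc50 > GC_HIGH:
        flags.add("high_gc50")
    # Assumption: "≥ 10 within 20 bp" is read as a run of >= 10 identical
    # bases inside the same 20 bp window used for GC20.
    if longest_run(win20) >= HOMOPOLYMER_MIN_RUN:
        flags.add("homopolymer")
    return flags


if __name__ == "__main__":
    # A poly-A context at 25 reads triggers four rows at once.
    print(sorted(flag_features(25, "A" * 30, 15)))
    # → ['homopolymer', 'low_coverage', 'low_gc20', 'low_gc50']
```

Returning a set rather than a single label mirrors the table note: one variant in a low-complexity, low-coverage region is tallied in several rows, which is why the row counts in a column need not add up to that column's n.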