Table 2 Effects of standard variant filtering methods on precision and recall

From: Systematic benchmark of state-of-the-art variant calling pipelines identifies major factors affecting accuracy of coding sequence variant discovery

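| Caller and filtering strategy | Type | SNPsᵃ: raw calls F1 | SNPsᵃ: precision gain | SNPsᵃ: recall loss | Indelsᵃ: raw calls F1 | Indelsᵃ: precision gain | Indelsᵃ: recall loss |
|---|---|---|---|---|---|---|---|
| DeepVariant (default filter) | WGS | 0.996 | n.a.ᵇ | n.a.ᵇ | 0.988 | n.a.ᵇ | n.a.ᵇ |
| DeepVariant (default filter) | WES | 0.996 | n.a.ᵇ | n.a.ᵇ | 0.990 | n.a.ᵇ | n.a.ᵇ |
| Clair3 (default filter) | WGS | 0.991 | 0.0 | 0.0 | 0.983 | 0.0 | 0.0 |
| Clair3 (default filter) | WES | 0.991 | 0.0 | 0.0 | 0.975 | 0.0045 | 0.0 |
| Octopus (standard filter) | WGS | 0.987 | 0.0129 | −0.0120 | 0.973 | 0.0328 | −0.0153 |
| Octopus (standard filter) | WES | 0.992 | 0.0049 | −0.0028 | 0.967 | 0.0429 | −0.0056 |
| Octopus (random forest filter) | WGS | 0.987 | 0.0133 | −0.0003 | 0.973 | 0.0379 | 0.0 |
| Octopus (random forest filter) | WES | 0.992 | 0.0104 | −0.0471 | 0.967 | 0.0518 | −0.0360 |
| Strelka2 (default filter) | WGS | 0.936 | 0.1152 | −0.0034 | 0.980 | 0.0125 | −0.0026 |
| Strelka2 (default filter) | WES | 0.980 | 0.0274 | −0.0026 | 0.969 | 0.0120 | −0.0056 |
| GATK (1D CNN, tranches 99.9/99.5) | WGS | 0.987 | 0.0102 | −0.0014 | 0.971 | 0.0189 | 0.0 |
| GATK (1D CNN, tranches 99.9/99.5) | WES | 0.988 | 0.0039 | −0.0063 | 0.962 | 0.0196 | −0.0532 |
| GATK (2D CNN, tranches 99.9/99.5) | WGS | 0.987 | 0.0099 | −0.0006 | 0.971 | 0.0310 | 0.0 |
| GATK (2D CNN, tranches 99.9/99.5) | WES | 0.988 | 0.0108 | −0.3548 | 0.962 | 0.0219 | 0.0909 |
| GATK (recommended hard filtering) | WGS | 0.987 | 0.0056 | −0.0133 | 0.971 | 0.0067 | −0.0052 |
| GATK (recommended hard filtering) | WES | 0.988 | 0.0078 | −0.0125 | 0.962 | 0.0217 | −0.0027 |
| Freebayes (standard quality-based filter) | WGS | 0.975 | 0.0469 | −0.0301 | 0.950 | 0.0661 | −0.0971 |
| Freebayes (standard quality-based filter) | WES | 0.983 | 0.0206 | −0.0087 | 0.948 | 0.0695 | 0.0328 |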

ᵃ Median values across all samples are shown; for each filtering strategy, the greater absolute value (precision gain or recall loss) is highlighted in bold.
ᵇ The effects of filtering on DeepVariant calls could not be assessed because of the structure of the output files.
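
As a rough illustration of how per-sample figures of this kind might be computed and then summarized as medians, the sketch below derives raw-call F1, precision gain, and recall loss from true-positive, false-positive, and false-negative counts. It rests on assumptions not stated in the table: precision gain and recall loss are taken to be the filtered value minus the raw value (which matches the negative sign of most recall-loss entries), and the `Counts` container, the function names, and all counts are hypothetical.

```python
from statistics import median
from typing import NamedTuple


class Counts(NamedTuple):
    """Benchmark counts for one sample (hypothetical container)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives


def precision(c: Counts) -> float:
    """Fraction of called variants that are true."""
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0


def recall(c: Counts) -> float:
    """Fraction of true variants that were called."""
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


def f1_score(c: Counts) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


def filtering_effect(raw: Counts, filtered: Counts) -> tuple[float, float, float]:
    """Return (raw F1, precision gain, recall loss) for one sample.

    Assumption: gain and loss are simple differences, filtered minus raw,
    so a drop in recall shows up as a negative number.
    """
    return (
        f1_score(raw),
        precision(filtered) - precision(raw),
        recall(filtered) - recall(raw),
    )


def summarize(samples: list[tuple[Counts, Counts]]) -> tuple[float, ...]:
    """Median of each metric across samples."""
    effects = [filtering_effect(raw, filt) for raw, filt in samples]
    return tuple(median(column) for column in zip(*effects))


if __name__ == "__main__":
    # Invented counts for three samples: (raw calls, filtered calls).
    samples = [
        (Counts(tp=90, fp=10, fn=10), Counts(tp=88, fp=2, fn=12)),
        (Counts(tp=95, fp=8, fn=5), Counts(tp=94, fp=3, fn=6)),
        (Counts(tp=85, fp=12, fn=15), Counts(tp=84, fp=4, fn=16)),
    ]
    f1, gain, loss = summarize(samples)
    print(f"raw F1 {f1:.3f}, precision gain {gain:+.4f}, recall loss {loss:+.4f}")
```

Under this convention a filter is worthwhile when the precision gain outweighs the magnitude of the recall loss, which is the comparison the bold highlighting in the table is meant to support.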