Systematic benchmark of state-of-the-art variant calling pipelines identifies major factors affecting accuracy of coding sequence variant discovery

Barbitoff, Yury A.; Abasov, Ruslan; Tvorogova, Varvara E.; Glotov, Andrey S.; Predeus, Alexander V.

doi:10.1186/s12864-022-08365-3

Table 2 Effects of standard variant filtering methods on precision and recall


Caller and filtering strategy	Type	SNP raw calls F1^a	SNP precision gain^a	SNP recall loss^a	Indel raw calls F1^a	Indel precision gain^a	Indel recall loss^a
DeepVariant (default filter)	WGS	0.996	n.a.^b	n.a.^b	0.988	n.a.^b	n.a.^b
DeepVariant (default filter)	WES	0.996	n.a.^b	n.a.^b	0.990	n.a.^b	n.a.^b
Clair3 (default filter)	WGS	0.991	0.0	0.0	0.983	0.0	0.0
Clair3 (default filter)	WES	0.991	0.0	0.0	0.975	0.0045	0.0
Octopus (standard filter)	WGS	0.987	0.0129	−0.0120	0.973	0.0328	−0.0153
Octopus (standard filter)	WES	0.992	0.0049	−0.0028	0.967	0.0429	−0.0056
Octopus (random forest filter)	WGS	0.987	0.0133	−0.0003	0.973	0.0379	0.0
Octopus (random forest filter)	WES	0.992	0.0104	−0.0471	0.967	0.0518	−0.0360
Strelka2 (default filter)	WGS	0.936	0.1152	−0.0034	0.980	0.0125	−0.0026
Strelka2 (default filter)	WES	0.980	0.0274	−0.0026	0.969	0.0120	−0.0056
GATK (1D CNN, tranches 99.9/99.5)	WGS	0.987	0.0102	−0.0014	0.971	0.0189	0.0
GATK (1D CNN, tranches 99.9/99.5)	WES	0.988	0.0039	−0.0063	0.962	0.0196	−0.0532
GATK (2D CNN, tranches 99.9/99.5)	WGS	0.987	0.0099	−0.0006	0.971	0.0310	0.0
GATK (2D CNN, tranches 99.9/99.5)	WES	0.988	0.0108	−0.3548	0.962	0.0219	0.0909
GATK (recommended hard filtering)	WGS	0.987	0.0056	−0.0133	0.971	0.0067	−0.0052
GATK (recommended hard filtering)	WES	0.988	0.0078	−0.0125	0.962	0.0217	−0.0027
Freebayes (standard quality-based filter)	WGS	0.975	0.0469	−0.0301	0.950	0.0661	−0.0971
Freebayes (standard quality-based filter)	WES	0.983	0.0206	−0.0087	0.948	0.0695	0.0328

^aMedian values across all samples are shown; for each filtering strategy, the greater absolute value (precision gain or recall loss) is highlighted in bold. ^bEffects of filtering on DeepVariant calls could not be assessed due to the structure of the output files.
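Each row summarizes three per-sample quantities: the F1 score of the raw (unfiltered) calls, the gain in precision after filtering, and the loss of recall after filtering, with medians taken across samples. The sketch below shows one way these quantities could be computed from true-positive, false-positive and false-negative counts. It is a minimal illustration under stated assumptions, not the authors' benchmarking code: precision gain and recall loss are assumed to be simple differences (filtered minus raw), which makes a recall drop negative as in most rows of the table, and the `Counts` type and all function names are hypothetical.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Counts:
    """Hypothetical per-sample comparison counts against a truth set."""
    tp: int  # calls matching a truth variant
    fp: int  # calls absent from the truth set
    fn: int  # truth variants that were not called


def precision(c: Counts) -> float:
    return c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0


def recall(c: Counts) -> float:
    return c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0


def f1(c: Counts) -> float:
    # Harmonic mean of precision and recall.
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if p + r else 0.0


def filtering_effect(raw: Counts, filtered: Counts) -> tuple[float, float]:
    """Return (precision gain, recall change) of filtered vs. raw calls.

    Assumption: both are plain differences, so a drop in recall
    after filtering comes out as a negative number.
    """
    return (precision(filtered) - precision(raw),
            recall(filtered) - recall(raw))


def summarize(samples: list[tuple[Counts, Counts]]) -> dict[str, float]:
    """Median raw-call F1, precision gain and recall loss across samples."""
    effects = [filtering_effect(raw, filt) for raw, filt in samples]
    return {
        "raw_f1": median(f1(raw) for raw, _ in samples),
        "precision_gain": median(gain for gain, _ in effects),
        "recall_loss": median(loss for _, loss in effects),
    }


def dominant_effect(summary: dict[str, float]) -> str:
    """Name the effect with the greater absolute value (the bolded one)."""
    if abs(summary["precision_gain"]) >= abs(summary["recall_loss"]):
        return "precision_gain"
    return "recall_loss"


if __name__ == "__main__":
    # Toy counts for a single hypothetical sample, not data from the study:
    # filtering removes 40 false positives but also 10 true positives.
    raw = Counts(tp=950, fp=50, fn=50)
    filtered = Counts(tp=940, fp=10, fn=60)
    summary = summarize([(raw, filtered)])
    print(summary, dominant_effect(summary))
```

Because filtering only removes calls, any true positive it discards becomes a false negative, so `tp + fn` stays constant per sample; this is why precision can rise while recall falls, the trade-off the table quantifies for each caller.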

ISSN: 1471-2164

BMC Genomics