Table 3 Effects of duplicate marking, realignment & recalibration on SNP calling accuracy

From: Steps to ensure accuracy in genotype and SNP calling from Illumina sequencing data

| Call set | Site discovery | No. SNPs (all) | No. SNPs (known) | No. SNPs (novel) | dbSNP% | Ti/Tv (known) | Ti/Tv (novel) |
|---|---|---|---|---|---|---|---|
| Deep coverage with QUAL > 50 | initial | 96472 | 71534 | 24938 | 74.15% | 2.50 | 1.73 |
| | realignment | 94595 | 71374 | 23221 | 75.45% | 2.50 | 1.84 |
| | recalibration | 96316 | 71518 | 24798 | 74.25% | 2.50 | 1.75 |
| | mark duplicate | 96303 | 71502 | 24801 | 74.24% | 2.50 | 1.73 |
| Shallow coverage with QUAL > 20 | initial | 780490 | 607178 | 173312 | 77.79% | 2.13 | 1.39 |
| | realignment | 776560 | 606806 | 169754 | 78.14% | 2.13 | 1.41 |
| | recalibration | 783387 | 609601 | 173786 | 77.81% | 2.13 | 1.40 |
| | mark duplicate | 738198 | 583829 | 154369 | 79.09% | 2.13 | 1.53 |

  1. SNPs were called jointly for 5 samples with GATK, using bases with base quality ≥ 20 and reads with mapping quality ≥ 20. Only sites with QUAL > 50 (deep coverage) or QUAL > 20 (shallow coverage) were considered potentially variable sites.
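
The summary columns in the table can be recomputed from per-site calls. The following is a minimal sketch, not the authors' pipeline: the function names are hypothetical, dbSNP% is assumed to be the known SNP count divided by the total SNP count, and Ti/Tv is assumed to be the usual count of transitions (A<->G, C<->T) divided by the count of transversions. The QUAL thresholds are the ones stated in the footnote.

```python
# Illustrative helpers (hypothetical names); not taken from the source article.

# Transitions are purine<->purine or pyrimidine<->pyrimidine substitutions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def dbsnp_percent(known, total):
    """Percentage of called SNPs that are already present in dbSNP."""
    return 100.0 * known / total


def ti_tv_ratio(snps):
    """Ti/Tv ratio for an iterable of (ref, alt) single-base pairs."""
    pairs = list(snps)
    ti = sum(1 for pair in pairs if pair in TRANSITIONS)
    tv = len(pairs) - ti
    return ti / tv if tv else float("inf")


def passes_qual(qual, deep_coverage):
    """QUAL cut-off from the footnote: > 50 for deep, > 20 for shallow coverage."""
    return qual > (50 if deep_coverage else 20)


# Counts from the "initial" deep-coverage row: 71534 known of 96472 total.
print(round(dbsnp_percent(71534, 96472), 2))  # 74.15
```

Under these assumptions, a higher dbSNP% and a novel Ti/Tv ratio closer to the known Ti/Tv ratio are the signals the table uses to compare the processing steps.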