Table 3 Effects of duplicate marking, realignment & recalibration on SNP calling accuracy

From: Steps to ensure accuracy in genotype and SNP calling from Illumina sequencing data

Site discovery

| Call set | No. SNPs (All) | No. SNPs (Known) | No. SNPs (Novel) | dbSNP% | Ti/Tv ratio (Known) | Ti/Tv ratio (Novel) |
| --- | --- | --- | --- | --- | --- | --- |
| Deep coverage with QUAL > 50 | | | | | | |
| initial | 96472 | 71534 | 24938 | 74.15% | 2.50 | 1.73 |
| realignment | 94595 | 71374 | 23221 | 75.45% | 2.50 | 1.84 |
| recalibration | 96316 | 71518 | 24798 | 74.25% | 2.50 | 1.75 |
| mark duplicate | 96303 | 71502 | 24801 | 74.24% | 2.50 | 1.73 |
| Shallow coverage with QUAL > 20 | | | | | | |
| initial | 780490 | 607178 | 173312 | 77.79% | 2.13 | 1.39 |
| realignment | 776560 | 606806 | 169754 | 78.14% | 2.13 | 1.41 |
| recalibration | 783387 | 609601 | 173786 | 77.81% | 2.13 | 1.40 |
| mark duplicate | 738198 | 583829 | 154369 | 79.09% | 2.13 | 1.53 |
  1. SNPs were called for 5 samples together by GATK, using bases with base quality ≥ 20 and reads with mapping quality ≥ 20. Only sites with QUAL > 50 for deep coverage or QUAL > 20 for shallow coverage were considered potentially variable sites.
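
The two summary statistics in the table can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the table does not state how dbSNP% was computed, so the code assumes it is the Known count divided by the All count, which agrees with the printed values to within rounding. The Ti/Tv helper uses the standard definition (transitions are A<->G and C<->T, every other single-base change is a transversion). The function names `dbsnp_percent` and `titv_ratio` are hypothetical and do not come from GATK or from the article.

```python
# Minimal sketch of the table's summary statistics.
# Assumption: dbSNP% = Known / All * 100 (not stated in the source table).

# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def dbsnp_percent(known: int, total: int) -> float:
    """Percentage of called SNPs that are already known (assumed definition)."""
    if total <= 0:
        raise ValueError("total number of SNPs must be positive")
    return 100.0 * known / total


def titv_ratio(snps: list[tuple[str, str]]) -> float:
    """Transition/transversion ratio for a list of (ref, alt) single-base changes."""
    ti = sum(1 for ref, alt in snps if (ref.upper(), alt.upper()) in TRANSITIONS)
    tv = len(snps) - ti
    if tv == 0:
        raise ValueError("no transversions present; ratio is undefined")
    return ti / tv


if __name__ == "__main__":
    # "initial" deep-coverage row from the table: All = 96472, Known = 71534.
    total, known = 96472, 71534
    print(f"novel SNPs: {total - known}")
    print(f"dbSNP%: {dbsnp_percent(known, total):.2f}")

    # Toy (ref, alt) pairs, invented purely to exercise the helper.
    toy = [("A", "G"), ("C", "T"), ("A", "C")]
    print(f"Ti/Tv: {titv_ratio(toy):.2f}")
```

Keeping the two statistics as separate small functions makes it easy to apply them per call set (initial, realignment, recalibration, mark duplicate) and compare rows.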