
Table 2 Effects of data preprocessing on SNP calling accuracy

From: Steps to ensure accuracy in genotype and SNP calling from Illumina sequencing data

                Site discovery
Call set        No. SNPs                      dbSNP%   Ti/Tv ratio
(QUAL ≥ 50)     All       Known     Novel              Known   Novel
raw             640946    499377    141569    77.91%   2.19    1.65
filterY         630641    490722    139919    77.81%   2.19    1.65
trim            651391    502951    148440    77.21%   2.18    1.58
filterY&trim    640487    493741    146746    77.08%   2.18    1.58
1. raw: no preprocessing; filterY: reads that fail the Illumina chastity filter removed; trim: low-quality 3′ tails trimmed from reads with the BWA parameter -q 15; filterY&trim: both steps applied (chastity-filter failures removed and low-quality tails trimmed). Known and Novel: SNPs present in and absent from dbSNP, respectively; dbSNP% = Known/All × 100; Ti/Tv: transition/transversion ratio. SNPs were called jointly for the five samples with GATK, using bases with base quality ≥ 20 and reads with mapping quality ≥ 20. Only sites with QUAL ≥ 50 were considered potentially variable.
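The filterY step removes reads flagged by the Illumina chastity filter. The table does not say how the flag was read. The sketch below assumes CASAVA 1.8+ FASTQ headers, where the second space-separated field is `<read>:<is_filtered>:<control>:<index>` and `is_filtered` is `Y` for reads that failed the filter. Older Illumina pipelines stored this flag differently, so the parsing is an assumption. The function names `read_fastq`, `fails_chastity` and `filter_y` are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

# A FASTQ record as (header, sequence, separator line, quality string).
Record = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield 4-line FASTQ records from an iterable of text lines."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between or after records
        seq = next(it).rstrip("\n")
        plus = next(it).rstrip("\n")
        qual = next(it).rstrip("\n")
        yield header, seq, plus, qual


def fails_chastity(header: str) -> bool:
    """Return True if the header flags the read as filtered (is_filtered == 'Y').

    Assumes the CASAVA 1.8+ layout:
    '@<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y> <read>:<is_filtered>:<control>:<index>'
    """
    parts = header.split(" ", 1)
    if len(parts) < 2:
        return False  # no flag present; keep the read
    fields = parts[1].split(":")
    return len(fields) > 1 and fields[1] == "Y"


def filter_y(records: Iterable[Record]) -> List[Record]:
    """Keep only reads that passed the chastity filter (the 'filterY' step)."""
    return [rec for rec in records if not fails_chastity(rec[0])]
```

To apply it, pass an open FASTQ file handle to `read_fastq` and the result to `filter_y`. Reads whose header carries no flag are kept rather than discarded.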
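The trim step uses BWA's `-q` parameter. The `bwa aln` manual describes it as trimming a read of length l down to argmax_x ∑_{i=x+1}^{l} (INT − q_i), applied only when q_l < INT. Here INT = 15 as in the footnote. The sketch below implements that formula directly. It is not a copy of BWA's source, which may treat edge cases differently. The quality offset (33 for Sanger/Illumina 1.8+, 64 for older Illumina output) is an assumption the caller must set. The helper names are hypothetical.

```python
from typing import List, Tuple


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode an ASCII quality string; offset 33 (Sanger/Illumina 1.8+) or 64 (older Illumina)."""
    return [ord(c) - offset for c in qual]


def bwa_trim_length(quals: List[int], threshold: int = 15) -> int:
    """Length kept by BWA-style 3' trimming: argmax_x sum_{i=x+1}^{l} (threshold - q_i).

    Trimming is applied only when the last base is below threshold (q_l < threshold).
    """
    l = len(quals)
    if l == 0 or quals[-1] >= threshold:
        return l
    best_len, best_sum, running = l, 0, 0
    # Walk from the 3' end; after adding quals[x], running equals the sum over
    # 1-based positions x+1..l, i.e. the score for keeping the first x bases.
    for x in range(l - 1, -1, -1):
        running += threshold - quals[x]
        if running > best_sum:  # strict '>' keeps the longer read on ties
            best_sum, best_len = running, x
    return best_len


def trim_read(seq: str, qual: str, threshold: int = 15, offset: int = 33) -> Tuple[str, str]:
    """Trim a read and its quality string to the BWA-style retained length."""
    keep = bwa_trim_length(phred_scores(qual, offset), threshold)
    return seq[:keep], qual[:keep]
```

Consider a read with qualities 30, 30, 30, 10, 5, 2. Its tail scores are 13, 23, 28 and then fall. The argmax therefore keeps the first three bases. A high-quality base in the middle of a low-quality tail lowers the running sum but does not stop the scan, so an isolated good base does not by itself prevent the tail from being trimmed.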
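The table's columns can be recomputed from a call set. The steps are: keep sites with QUAL ≥ 50, split them into dbSNP-known and novel SNPs, and compute dbSNP% and the transition/transversion ratio for each group. The sketch below assumes each call has already been reduced to a hypothetical `(ref, alt, QUAL, in_dbSNP)` tuple for a biallelic SNV. It does not parse GATK VCF output or perform the dbSNP lookup. Transitions are taken as A↔G and C↔T; every other base change counts as a transversion.

```python
from typing import Dict, Iterable, Tuple

# A called site: (ref base, alt base, QUAL, present in dbSNP?). Biallelic SNVs only.
Call = Tuple[str, str, float, bool]

# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def is_transition(ref: str, alt: str) -> bool:
    return frozenset((ref.upper(), alt.upper())) in TRANSITIONS


def titv(pairs: Iterable[Tuple[str, str]]) -> float:
    """Transition/transversion ratio; inf if there are no transversions."""
    ti = tv = 0
    for ref, alt in pairs:
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
    return ti / tv if tv else float("inf")


def summarize(calls: Iterable[Call], min_qual: float = 50.0) -> Dict[str, float]:
    """Compute the table's columns for sites passing QUAL >= min_qual."""
    # Drop low-QUAL sites and any non-variant ref == alt entries.
    kept = [c for c in calls if c[2] >= min_qual and c[0].upper() != c[1].upper()]
    known = [(r, a) for r, a, _, in_db in kept if in_db]
    novel = [(r, a) for r, a, _, in_db in kept if not in_db]
    n_all = len(known) + len(novel)
    return {
        "all": n_all,
        "known": len(known),
        "novel": len(novel),
        "dbsnp_pct": 100.0 * len(known) / n_all if n_all else 0.0,
        "titv_known": titv(known),
        "titv_novel": titv(novel),
    }
```

As a check against the table, 100 × 499377 / 640946 ≈ 77.91, which matches the dbSNP% reported for the raw call set. In each row, All equals Known + Novel.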