Table 2 Effects of data preprocessing on SNP calling accuracy

From: Steps to ensure accuracy in genotype and SNP calling from Illumina sequencing data

| Call set (QUAL ≥ 50) | No. SNPs: All | No. SNPs: Known | No. SNPs: Novel | dbSNP% | Ti/Tv ratio: Known | Ti/Tv ratio: Novel |
|---|---|---|---|---|---|---|
| raw | 640946 | 499377 | 141569 | 77.91% | 2.19 | 1.65 |
| filterY | 630641 | 490722 | 139919 | 77.81% | 2.19 | 1.65 |
| trim | 651391 | 502951 | 148440 | 77.21% | 2.18 | 1.58 |
| filterY&trim | 640487 | 493741 | 146746 | 77.08% | 2.18 | 1.58 |

raw: no preprocessing; filterY: reads that fail the Illumina chastity filter removed; trim: low-quality tails trimmed from reads with the BWA parameter -q 15; filterY&trim: reads that fail the Illumina chastity filter removed and low-quality tails trimmed. SNPs were called jointly for five samples with GATK, using bases with base quality ≥ 20 and reads with mapping quality ≥ 20. Only sites with QUAL ≥ 50 were considered potentially variable.
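The summary columns of this table (SNP counts, dbSNP%, and Ti/Tv ratios for known and novel sites) can be recomputed from a call set with a short script. The following is a minimal sketch, not the authors' pipeline: the function name `summarize_vcf_lines` is hypothetical, and it assumes that a site counts as "known" when the VCF ID column is not `.`, which may differ from how the dbSNP annotation was actually done in the study. Only biallelic single-nucleotide records with QUAL ≥ 50 are counted.

```python
# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def summarize_vcf_lines(lines, min_qual=50.0):
    """Summarize biallelic SNPs with QUAL >= min_qual from VCF text lines.

    Assumption: a record is 'known' if its ID column (3rd field) is not '.'.
    """
    counts = {"known": {"ti": 0, "tv": 0}, "novel": {"ti": 0, "tv": 0}}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # not a complete VCF record
        snp_id, ref, alt, qual = fields[2], fields[3].upper(), fields[4].upper(), fields[5]
        # Keep only single-base REF/ALT pairs (drops indels and multi-allelic sites).
        if len(ref) != 1 or len(alt) != 1 or ref not in "ACGT" or alt not in "ACGT":
            continue
        if qual == "." or float(qual) < min_qual:
            continue  # apply the QUAL threshold
        group = "novel" if snp_id == "." else "known"
        kind = "ti" if (ref, alt) in TRANSITIONS else "tv"
        counts[group][kind] += 1

    def ratio(c):
        # Ti/Tv is undefined when there are no transversions.
        return c["ti"] / c["tv"] if c["tv"] else None

    n_known = sum(counts["known"].values())
    n_novel = sum(counts["novel"].values())
    n_all = n_known + n_novel
    return {
        "n_all": n_all,
        "n_known": n_known,
        "n_novel": n_novel,
        "dbsnp_pct": 100.0 * n_known / n_all if n_all else None,
        "titv_known": ratio(counts["known"]),
        "titv_novel": ratio(counts["novel"]),
    }
```

In this sketch dbSNP% is simply Known divided by All; for the raw call set in the table, 499377 / 640946 ≈ 77.91%. A file could be summarized with `summarize_vcf_lines(open("calls.vcf"))`, where `calls.vcf` is a placeholder path.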