Accurate variant detection across non-amplified and whole genome amplified DNA using targeted next generation sequencing

ElSharawy, Abdou; Warner, Jason; Olson, Jeff; Forster, Michael; Schilhabel, Markus B; Link, Darren R; Rose-John, Stefan; Schreiber, Stefan; Rosenstiel, Philip; Brayer, James; Franke, Andre

doi:10.1186/1471-2164-13-500

Table 3 Efficiency and SNP detection rates of non-barcoded and pooled samples

Minimum read count for SNP call	Library ID	Positive control SNPs	Positive control SNPs in sample	Total SNPs in sample	Sensitivity	False discovery rate
5	768_1L	244	226	376	92.6%	39.9%
5	768_2L	244	230	371	94.3%	38.0%
10	768_1L	244	212	277	86.9%	23.5%
10	768_2L	244	212	267	86.9%	20.6%
20	768_1L	244	193	214	79.1%	9.8%
20	768_2L	244	198	221	81.1%	10.4%

- Minimum read count for SNP call: Minimum number of reads supporting the non-reference allele required for a SNP to be considered detected.
- Positive control SNPs: Positive control SNPs derived from the non-pooled, non-barcoded data (samples 759–764). Because the HapMap genotyping data were incomplete, even for known SNPs, we attempted to create a positive control set of SNPs within the targeted regions. If a SNP was detected in any of samples 759–764, a combined genotype was determined for that SNP position. For example, if position X had a “CG” genotype in sample 759 and the reference genotype “CC” in samples 760–764, the predicted non-reference allele frequency in the pool would be 8.3% (1 of 12 alleles). In the non-pooled samples, a SNP with a non-reference allele frequency of 10–90% was considered heterozygous, and a SNP with a non-reference allele frequency above 90% was considered homozygous. The number in this column is the total number of SNPs that have a non-reference allele within a given pooled sample. Note that these positive control SNPs include HapMap samples with rs IDs, non-HapMap samples with rs IDs, and potentially novel SNPs.
- Positive control SNPs in sample: Number of positive control SNPs detected in a given pool with a given set of parameters.
- Total SNPs in sample: Total number of SNPs detected in a given pool with a given set of parameters. This number includes the positive control SNPs found in the sample plus other SNPs. Most of these additional SNPs are assumed to be false positives, because their number decreases substantially as the stringency of the SNP detection parameters increases. However, some genuine novel SNPs could be among them.
- Sensitivity: Percentage of positive control SNPs detected in a given pool with a given set of parameters, i.e. (Positive control SNPs in sample / Positive control SNPs) × 100. Sensitivity decreases as SNP detection stringency increases.
- False discovery rate: Defined as (Total SNPs in sample − Positive control SNPs in sample) / Total SNPs in sample × 100.
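The genotype thresholds and the pooled allele frequency example from the footnotes can be sketched in Python. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, positions with less than 10% non-reference reads are assumed to be homozygous reference, and every sample is assumed to contribute equally to the pool, so each diploid sample adds two alleles.

```python
from fractions import Fraction


def call_genotype(non_ref_fraction: float) -> int:
    """Number of non-reference alleles (0, 1 or 2) in one diploid, non-pooled sample.

    Thresholds follow the table footnote: 10-90% non-reference reads -> heterozygous,
    >90% -> homozygous non-reference. Treating <10% as homozygous reference is an
    assumption; the footnote does not say how such positions were handled.
    """
    if not 0.0 <= non_ref_fraction <= 1.0:
        raise ValueError("non_ref_fraction must lie between 0 and 1")
    if non_ref_fraction > 0.90:
        return 2
    if non_ref_fraction >= 0.10:
        return 1
    return 0


def predicted_pool_frequency(non_ref_allele_counts: list[int]) -> Fraction:
    """Expected non-reference allele frequency in a pool of diploid samples.

    Assumes every sample contributes the same amount of DNA, so the pool
    holds two alleles per sample.
    """
    if not non_ref_allele_counts:
        raise ValueError("at least one sample is required")
    return Fraction(sum(non_ref_allele_counts), 2 * len(non_ref_allele_counts))


if __name__ == "__main__":
    # Hypothetical non-reference read fractions at position X for samples 759-764:
    # sample 759 is heterozygous (CG), samples 760-764 are reference (CC).
    fractions_at_x = [0.48, 0.0, 0.01, 0.0, 0.02, 0.0]
    genotypes = [call_genotype(f) for f in fractions_at_x]
    freq = predicted_pool_frequency(genotypes)
    print(genotypes, freq, f"{float(freq):.1%}")
```

Run as a script, this prints `[1, 0, 0, 0, 0, 0] 1/12 8.3%`, reproducing the 1-in-12 example. Using `Fraction` keeps the expected frequency exact until it is formatted as a percentage.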
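The Sensitivity and False discovery rate columns can be recomputed from the two count columns using the footnote definitions. The Python sketch below copies the counts from Table 3; the function and variable names are illustrative and not part of the original analysis.

```python
# Counts from Table 3: (minimum read count, library ID,
# positive control SNPs in sample, total SNPs in sample).
POSITIVE_CONTROL_SNPS = 244
TABLE_3_ROWS = [
    (5, "768_1L", 226, 376),
    (5, "768_2L", 230, 371),
    (10, "768_1L", 212, 277),
    (10, "768_2L", 212, 267),
    (20, "768_1L", 193, 214),
    (20, "768_2L", 198, 221),
]


def sensitivity(found: int, positive_controls: int) -> float:
    """Percentage of positive control SNPs detected in the pool."""
    return found / positive_controls * 100


def false_discovery_rate(found: int, total: int) -> float:
    """Percentage of detected SNPs that are not positive control SNPs."""
    return (total - found) / total * 100


def recompute_table() -> list[tuple[int, str, str, str]]:
    """Return (min read count, library, sensitivity, FDR), rounded to one decimal as in Table 3."""
    return [
        (
            min_reads,
            library,
            f"{sensitivity(found, POSITIVE_CONTROL_SNPS):.1f}%",
            f"{false_discovery_rate(found, total):.1f}%",
        )
        for min_reads, library, found, total in TABLE_3_ROWS
    ]


if __name__ == "__main__":
    for row in recompute_table():
        print(*row, sep="\t")
```

Every recomputed value matches the published columns, for example 226 / 244 = 92.6% and (376 − 226) / 376 = 39.9%. Because every SNP outside the positive control set is counted as a false discovery, and some of those may be genuine novel SNPs, this definition may overstate the true false discovery rate.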

ISSN: 1471-2164

BMC Genomics