
Table 2 SAMQA Biological Tests

From: SAMQA: error classification and validation of high-throughput sequenced read data

Biological Tests | Inclusion Criteria
--- | ---
Mapping quality | Low Phred-adjusted mapping quality score
Read length | Shortened read lengths for a given sequencing technology
Read count | Low aggregate number of reads for a given sequencing technology
Read frequency | Low number of reads for a given set of kilobase regions
Coverage | Low coverage for a given read group, chromosome, or kilobase region
Structural variations | High numbers of localized structural variation
Anomalous sequence data | Instances of "random" chromosomes from human assembly [8]
Population estimates of structural variation | Very high projected structural variation across different platform units
Read group correlation | Low mapping quality correlation for megabase regions, across read groups; low coverage correlation of megabase regions, across read groups
  1. These tests extract useful biological features from the data for expert analysis. Other extraction tools (e.g. detection of polyadenylation within individual sequences, or determinants of the feature-dimensional "shape" of the data, as through multidimensional Bayesian analysis) may be added as the data or downstream analysis requires.
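
To make the inclusion criteria concrete, the following is a minimal sketch of how three of the tests in the table (mapping quality, read length, read count) might be applied per read group. It is not SAMQA's implementation: the `Read` record, the `flag_read_groups` function, and all three thresholds are hypothetical and chosen only for illustration, since the table does not state cutoff values.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Read:
    read_group: str
    mapq: int    # Phred-scaled mapping quality
    length: int  # read length in bases


# Hypothetical thresholds; the table does not specify cutoffs.
MIN_MEAN_MAPQ = 20
MIN_MEAN_LENGTH = 30
MIN_READ_COUNT = 3


def flag_read_groups(reads):
    """Map each read group to the tests whose inclusion criterion it meets."""
    by_group = {}
    for r in reads:
        by_group.setdefault(r.read_group, []).append(r)

    flags = {}
    for group, members in by_group.items():
        hits = []
        if mean(r.mapq for r in members) < MIN_MEAN_MAPQ:
            hits.append("Mapping quality")
        if mean(r.length for r in members) < MIN_MEAN_LENGTH:
            hits.append("Read length")
        if len(members) < MIN_READ_COUNT:
            hits.append("Read count")
        flags[group] = hits
    return flags


# Toy data: rg1 looks healthy, rg2 is short, sparse, and poorly mapped.
reads = [
    Read("rg1", 60, 36), Read("rg1", 55, 36), Read("rg1", 58, 35),
    Read("rg2", 5, 20), Read("rg2", 10, 22),
]
flags = flag_read_groups(reads)
```

In this toy input, `rg1` meets none of the criteria and `rg2` meets all three. In practice the cutoffs would presumably depend on the sequencing technology, as the inclusion criteria in the table indicate.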