Table 2 SAMQA Biological Tests

From: SAMQA: error classification and validation of high-throughput sequenced read data

| Biological Tests | Inclusion Criteria |
| --- | --- |
| Mapping quality | Low Phred-adjusted mapping quality score |
| Read length | Shortened read lengths for a given sequencing technology |
| Read count | Low aggregate number of reads for a given sequencing technology |
| Read frequency | Low number of reads for a given set of kilobase regions |
| Coverage | Low coverage for a given read group, chromosome, or kilobase region |
| Structural variations | High numbers of localized structural variation |
| Anomalous sequence data | Instances of "random" chromosomes from human assembly [8] |
| Population estimates of structural variation | Very high projected structural variation across different platform units |
| Read group correlation | Low mapping quality correlation for megabase regions, across read groups; low coverage correlation of megabase regions, across read groups |

  1. These tests extract useful biological features from the data for expert analysis. Other extraction tools (e.g. detection of polyadenylation within individual sequences, or determinants of the feature-dimensional "shape" of the data, as through multidimensional Bayesian analysis) may be added as the data or downstream analysis requires.
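
As a rough illustration of how a few of the inclusion criteria above might be checked per read, the following minimal sketch scans SAM alignment lines and counts reads with low mapping quality, short sequences, or a "random" chromosome as reference. It is not SAMQA's implementation: the function names, the thresholds `MIN_MAPQ` and `MIN_READ_LENGTH`, and the assumption that "random" chromosomes carry UCSC-style names ending in `_random` are all hypothetical choices made for this example.

```python
from collections import Counter

# Hypothetical cutoffs chosen for illustration only; the table does not
# state the thresholds SAMQA uses for any sequencing technology.
MIN_MAPQ = 20
MIN_READ_LENGTH = 30


def classify_read(sam_line):
    """Return the illustrative flags raised by one SAM alignment line."""
    fields = sam_line.rstrip("\n").split("\t")
    # SAM mandatory columns: RNAME is field 3, MAPQ field 5, SEQ field 10.
    rname, mapq, seq = fields[2], int(fields[4]), fields[9]
    flags = []
    if mapq < MIN_MAPQ:
        flags.append("low_mapping_quality")
    # "*" means the sequence was not stored, so its length is unknown.
    if seq != "*" and len(seq) < MIN_READ_LENGTH:
        flags.append("short_read")
    # Assumption: "random" chromosomes are named like chr1_random.
    if rname.endswith("_random"):
        flags.append("random_chromosome")
    return flags


def summarize(sam_lines):
    """Count alignment records and the flags raised across all of them."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # header lines carry no alignment
            continue
        counts["reads"] += 1
        counts.update(classify_read(line))
    return counts
```

The aggregate criteria in the table (read count, coverage, and read group correlation) would need the per-read results to be grouped by read group or genomic region, which this sketch leaves out.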