
Table 5 COAD/READ SAMQA Results

From: SAMQA: error classification and validation of high-throughput sequenced read data

| Sample group | Anomaly | Files affected |
| --- | --- | --- |
| 236 COAD/READ exon capture sequence files | "CIGAR should have zero elements for unmapped read" | 150 files |
| | Files were completely unpaired in sequencing | 5 files |
| | Files contained unpaired reads | 1 file |
| 48 COAD/READ whole genome files | "CIGAR should have zero elements for unmapped read" | 48 files |
| | "MAPQ should be 0 for unmapped read" | 48 files |
| | "RG ID on SAMRecord not found in header" | 18 files |

  1. The technical tests were run on 236 exon capture sequence files and 48 whole genome sequence files for COAD/READ cancer samples from the TCGA project. The results identified problems with the files, including 6 files that could not be used for further analysis. The mapping issues found in the whole genome and exon datasets are due to a documented issue within the alignment tools, where BWA maps beyond the reference. This is flagged as an error, but it is non-fatal to SAMQA. The files and SAMQA output are provided in Additional file 1.
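
To make the quoted anomalies concrete, the following is a minimal sketch of how three of them could be detected in SAM-format text. It is not SAMQA's implementation; the function name `check_sam_lines` is hypothetical, and the checks simply follow the SAM column layout (FLAG bit 0x4 for unmapped reads, `*` for an empty CIGAR, `RG:Z:` optional tags matched against `@RG` header lines).

```python
def check_sam_lines(lines):
    """Return (read_name, message) pairs for three anomalies in SAM text.

    Illustrative only: assumes tab-separated SAM lines with header
    lines (starting with '@') appearing before the alignment records.
    """
    header_rg_ids = set()
    problems = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            # Collect read group IDs declared in the header.
            if line.startswith("@RG"):
                for field in line.split("\t")[1:]:
                    if field.startswith("ID:"):
                        header_rg_ids.add(field[3:])
            continue
        cols = line.split("\t")
        name, flag, mapq, cigar = cols[0], int(cols[1]), int(cols[4]), cols[5]
        unmapped = bool(flag & 0x4)  # FLAG bit 0x4: segment unmapped
        if unmapped and cigar != "*":
            problems.append(
                (name, "CIGAR should have zero elements for unmapped read"))
        if unmapped and mapq != 0:
            problems.append((name, "MAPQ should be 0 for unmapped read"))
        # Optional fields start at column 12; RG:Z:<id> names the read group.
        for tag in cols[11:]:
            if tag.startswith("RG:Z:") and tag[5:] not in header_rg_ids:
                problems.append(
                    (name, "RG ID on SAMRecord not found in header"))
    return problems
```

Called on the lines of a SAM file (for example `check_sam_lines(open(path))`, with `path` supplied by the user), the function returns one entry per anomaly found, so counting affected files reduces to checking whether the returned list is non-empty for each file.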