Skip to main content

Table 3 Run Time for SAMQA on Hadoop

From: SAMQA: error classification and validation of high-throughput sequenced read data

Validation Test One Patient, aggregate time in minutes (over 23GB) Ten Patients, aggregate time in minutes (over 405GB) One Hundred Patients, aggregate time in minutes (over 3,904GB)
Mapping Ratio Test 1.10 6.98 66.91
Sequence Validation 5.40 66.11 621.54
Read Group Validation 8.40 86.38 670.79
Mapping Quality Test 9.03 57.27 706.02
Read Statistics 11.28 67.49 753.00
File Structure Validation 15.10 59.57 700.67
Chromosome Check 63.15 102.02 768.37
Pearson Coverage Correlation 279.41 280.50 3211.33
Pearson Mapping Quality Correlation 702.28 484.92 3327.94
  1. This table lists required time to completion at multiples of ten samples over an 80-core computational cluster running Hadoop. Each validation step includes a full Map-and-Reduce pass. All times are given in aggregate time to completion, assuming each file is scheduled for analysis after the previous one completes. The files selected are listed in additional file 2. File size is the only consideration for speed comparisons.