Skip to main content

Table 4 Run Time SAMQA on Hadoop vs. Standard Server

From: SAMQA: error classification and validation of high-throughput sequenced read data

SAMQA Run

Single Core Server (in hours)

Hadoop Cluster (aggregate in hours)

1 Sample (23GB)

1460.22

18.25

10 Samples (405GB)

1615.00

20.19

100 Samples (3904GB)

14435.42

180.44

  1. The SAMQA toolset consists of eleven different technical and biologically relevant tests run over each BAM file. Using the Hadoop cluster and data as referenced in Table 3, all of the tests can be run in parallel decreasing the overall time for analysis (files used for this test are listed in additional file 2).