FastQ dataset used to validate KvarQ. A total of 880 whole-genome sequences in FastQ format from various sources were used in this study. All 880 genome sequences were scanned for phylogenetic classification and identification of drug resistance mutations. Different, overlapping subsets were used to i) compare SNP calls obtained with KvarQ against those of our standard SNP-calling pipeline, based on BWA and SAMtools, for 206 isolates; ii) compare the KvarQ phylogenetic classification of 321 MTBC isolates with previous phylogenetic information; iii) compare the drug resistance mutations identified by KvarQ with previously identified drug resistance-associated mutations in 19 MTBC isolates; and iv) obtain additional information, i.e. phylogenetic classification and drug resistance mutations, from a “blind” set of 388 genome sequences from a recent study. More information can be found in Additional file 2.