Table 2 Effect of merging short one-sequence LCBs

From: seq-seq-pan: building a computational pan-genome data structure on whole genome alignment

                        Total alignment  Mean number of    Number of   Number of short    Precision  Recall  F-Score
                        length           sequences in LCB  short LCBs  one-sequence LCBs
Simulated dataset (13 genomes)
  With merging step     4809015          9.2               0           0                  0.993      0.475   0.643
  Without merging step  4789770          5.5               318         156                0.993      0.475   0.643
M. tuberculosis dataset (43 genomes)
  With merging step     4826979          16.1              0           0                  -          -       -
  Without merging step  4859842          7.5               154         109                -          -       -
  1. We compare the results of sequentially aligning two genome datasets with and without the merging step in the workflow. To estimate the fragmentation of the alignment, we compare the total alignment length, the mean number of sequences per LCB, and the number of short (< 10 bp) LCBs, with a focus on those containing sequences from only one genome. The precision, recall and F-score of both alignments, measured against the true alignment of the simulated dataset, show that the merging step does not affect the accuracy of the alignment.
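
The table does not state how the F-score is derived from precision and recall. As a minimal sketch, assuming the standard F1 definition (the harmonic mean of precision and recall), the reported value for the simulated dataset can be reproduced as follows. The function name `f_score` is illustrative and not part of seq-seq-pan.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1; an assumption here)."""
    if precision + recall == 0:
        # Avoid division by zero when both measures are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported in the table for the simulated dataset
# (identical with and without the merging step).
precision, recall = 0.993, 0.475
print(round(f_score(precision, recall), 3))  # 0.643
```

Under this assumption the result matches the F-Score column. Since precision and recall are the same for both workflows, the F-score is necessarily the same as well.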