Table 2 Effect of merging short one-sequence LCBs

From: seq-seq-pan: building a computational pan-genome data structure on whole genome alignment

 

| | Total alignment length | Mean number of sequences in LCB | Number of short LCBs | Number of short one-sequence LCBs | Precision | Recall | F-Score |
|---|---|---|---|---|---|---|---|
| Simulated dataset (13 genomes) | | | | | | | |
| With merging step | 4809015 | 9.2 | 0 | 0 | 0.993 | 0.475 | 0.643 |
| Without merging step | 4789770 | 5.5 | 318 | 156 | 0.993 | 0.475 | 0.643 |
| M. tuberculosis dataset (43 genomes) | | | | | | | |
| With merging step | 4826979 | 16.1 | 0 | 0 | - | - | - |
| Without merging step | 4859842 | 7.5 | 154 | 109 | - | - | - |

  1. We compare the results of sequentially aligning two genome datasets with and without the merging step in the workflow. To estimate the fragmentation of the alignment, we report the total alignment length, the mean number of sequences per LCB, and the number of short (< 10 bp) LCBs, with a focus on short LCBs that contain sequence from only one genome. Precision, recall and F-score, computed against the true alignment of the simulated dataset, are identical for both alignments, which shows that the merging step does not affect the accuracy of the alignment.
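
The F-score values in the table can be checked against the reported precision and recall. The table does not state the formula, so the sketch below assumes the standard F1 definition (the harmonic mean of precision and recall); the function name `f_score` is hypothetical and is not part of seq-seq-pan.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1, assumed here)."""
    if precision + recall == 0:
        # Avoid division by zero when both metrics are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported for the simulated dataset (with and without merging).
print(round(f_score(0.993, 0.475), 3))  # 0.643
```

Under this assumption the result matches the F-Score column for both rows of the simulated dataset.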