seq-seq-pan: building a computational pan-genome data structure on whole genome alignment

BMC Genomics

Table 2 Effect of merging short one-sequence LCBs

	Total alignment length	Mean number of sequences in LCB	Number of short LCBs	Number of short one-sequence LCBs	Precision	Recall	F-Score
Simulated dataset (13 genomes)
With merging step	4809015	9.2	0	0	0.993	0.475	0.643
Without merging step	4789770	5.5	318	156	0.993	0.475	0.643
M. tuberculosis dataset (43 genomes)
With merging step	4826979	16.1	0	0	-	-	-
Without merging step	4859842	7.5	154	109	-	-	-

We compare the results of sequentially aligning two genome datasets with and without the merging step in the workflow. To estimate the fragmentation of the alignment, we compare the total alignment length, the mean number of sequences per LCB, and the number of short (< 10 bp) LCBs, focusing on those that contain sequence from only one genome. Precision, recall and F-score, computed for both alignments against the true alignment of the simulated dataset, show that the merging step does not affect the accuracy of the alignment.
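
The quantities in Table 2 can be illustrated with a minimal Python sketch. It is not the seq-seq-pan implementation: the LCB class, its fields and the function names are hypothetical and introduced only for illustration. The 10 bp threshold is taken from the legend above, and the F-score is assumed to be the harmonic mean of precision and recall, which is consistent with the reported values (0.993 and 0.475 give 0.643).

```python
from dataclasses import dataclass

# Threshold from the table legend: an LCB counts as "short" if it is < 10 bp.
SHORT_LCB_MAX_BP = 10


@dataclass
class LCB:
    """Hypothetical stand-in for one locally collinear block (LCB)."""
    length: int        # length of the block in the alignment
    genome_ids: tuple  # identifiers of the genomes with sequence in the block


def fragmentation_stats(lcbs):
    """Summarise a list of LCBs with the fragmentation measures of Table 2."""
    if not lcbs:
        raise ValueError("at least one LCB is required")
    short = [b for b in lcbs if b.length < SHORT_LCB_MAX_BP]
    return {
        "total_alignment_length": sum(b.length for b in lcbs),
        "mean_sequences_per_lcb": sum(len(b.genome_ids) for b in lcbs) / len(lcbs),
        "short_lcbs": len(short),
        "short_one_sequence_lcbs": sum(1 for b in short if len(b.genome_ids) == 1),
    }


def f_score(precision, recall):
    """Harmonic mean of precision and recall (assumed definition of F-score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy alignment of four blocks; the values are invented for illustration.
    toy = [LCB(1200, (1, 2, 3)), LCB(7, (2,)), LCB(4, (1, 3)), LCB(500, (1, 2))]
    print(fragmentation_stats(toy))
    # Precision and recall as reported for the simulated dataset.
    print(round(f_score(0.993, 0.475), 3))
```

In the toy input, two blocks are shorter than 10 bp and one of them contains sequence from a single genome, so it is the kind of block the merging step is meant to remove.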

ISSN: 1471-2164