Table 1 The synthetic datasets and the number of simulated sequence variations. The Average Sequence Identity (ASI) is estimated as one minus the total mismatches divided by the number of nucleobases.

From: GSAlign: an efficient sequence alignment tool for intra-species genomes

| Dataset | Genome size | SNV | Small indel | Large indel | ASI |
| --- | --- | --- | --- | --- | --- |
| simHG-1X | 3,088,279,342 | 58,421,383 | 1,001,626 | 285,757 | 97.93% |
| simHG-3X | 3,088,292,247 | 175,100,939 | 962,721 | 275,584 | 93.86% |
| simHG-5X | 3,088,289,999 | 291,714,646 | 919,762 | 263,271 | 89.90% |
| NA12878 | 6,070,700,436 | 3,088,156 | 531,315 | NA | 99.84% |
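
As a rough sanity check on the ASI column of Table 1, the sketch below computes an SNV-only identity, 100 × (1 − SNV / genome size), for each dataset. This is not the authors' calculation: the table lists indel counts but not indel lengths, so the total number of mismatched bases cannot be recovered from it. The SNV-only figure is therefore only an upper bound on the reported ASI, under the assumption that indel bases count as mismatches. The names `ROWS`, `snv_only_identity`, and `summarize` are hypothetical and introduced here for illustration.

```python
# Values copied from Table 1: (genome size, SNV count, reported ASI in percent).
ROWS = {
    "simHG-1X": (3_088_279_342, 58_421_383, 97.93),
    "simHG-3X": (3_088_292_247, 175_100_939, 93.86),
    "simHG-5X": (3_088_289_999, 291_714_646, 89.90),
    "NA12878": (6_070_700_436, 3_088_156, 99.84),
}


def snv_only_identity(genome_size, snv_count):
    """Return 100 * (1 - SNVs / nucleobases) as a percentage.

    Assumption: indel bases are ignored because Table 1 does not give
    indel lengths, so this is an upper bound on ASI, not ASI itself.
    """
    return 100.0 * (1.0 - snv_count / genome_size)


def summarize(rows=ROWS):
    """Map each dataset name to its SNV-only identity, rounded to 2 decimals."""
    return {
        name: round(snv_only_identity(size, snv), 2)
        for name, (size, snv, _reported_asi) in rows.items()
    }


if __name__ == "__main__":
    for name, bound in summarize().items():
        reported = ROWS[name][2]
        print(f"{name}: SNV-only identity {bound:.2f}% (reported ASI {reported:.2f}%)")
```

For every dataset the SNV-only value should sit at or above the reported ASI; the gap between the two reflects the bases contributed by small and large indels.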