Table 1 Identification errors of homopolymer length with different methods

From: Improving alignment accuracy on homopolymer regions for semiconductor-based sequencing technologies

| No | Nt | Pos | Count | KNN errors (%) | Torrent Suite errors (%) | Bayesian errors (%) | Reference errors (%) | Proposed approach: weight | Proposed approach: errors (%) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | A | 1–75 | 144230 | 7.002 | 1.119 | 2.296 | 0.298 | 0.28 | 0.250 |
| 2 | A | 76–150 | 112776 | 12.121 | 1.651 | 4.722 | 0.489 | 0.34 | 0.453 |
| 3 | A | 151–225 | 97568 | 18.733 | 2.926 | 8.150 | 0.423 | 0.14 | 0.421 |
| 4 | A | 226–300 | 48033 | 22.292 | 4.655 | 10.259 | 0.535 | 0.24 | 0.510 |
| 5 | C | 1–75 | 88732 | 6.534 | 1.843 | 2.779 | 0.034 | 0.14 | 0.027 |
| 6 | C | 76–150 | 77650 | 10.382 | 2.489 | 4.595 | 0.556 | 0.36 | 0.121 |
| 7 | C | 151–225 | 63658 | 18.581 | 3.187 | 6.383 | 0.545 | 0.28 | 0.542 |
| 8 | C | 226–300 | 35736 | 17.910 | 4.600 | 6.159 | 0.926 | 0.30 | 0.923 |
| 9 | G | 1–75 | 97493 | 4.141 | 1.422 | 1.826 | 0.609 | 0.30 | 0.376 |
| 10 | G | 76–150 | 78192 | 14.874 | 1.623 | 3.864 | 0.322 | 0.32 | 0.152 |
| 11 | G | 151–225 | 64680 | 16.868 | 2.273 | 5.683 | 1.062 | 0.14 | 1.062 |
| 12 | G | 226–300 | 34116 | 18.754 | 2.492 | 7.985 | 0.147 | 0.12 | 0.147 |
| 13 | T | 1–75 | 156550 | 5.186 | 1.106 | 2.504 | 0.076 | 0.14 | 0.054 |
| 14 | T | 76–150 | 152034 | 11.446 | 1.571 | 5.780 | 0.342 | 0.30 | 0.297 |
| 15 | T | 151–225 | 111090 | 14.720 | 2.331 | 7.290 | 0.419 | 0.32 | 0.362 |
| 16 | T | 226–300 | 68448 | 13.912 | 3.315 | 8.240 | 0.723 | 0.28 | 0.599 |
  1. “Count” is the number of homopolymers in each class. “KNN” is the K-nearest-neighbors method. “Reference” means that only reference information is used in the designed model (Weight = 0).
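
The per-class error rates in Table 1 can be condensed into one figure per method by weighting each row by its “Count”. The sketch below is only an illustrative summary of the table's values: the count-weighted mean is not a statistic reported in the source, and the names `ROWS`, `METHODS`, and `weighted_mean_errors` are hypothetical helpers introduced here.

```python
# Rows of Table 1: (nucleotide, position range, count, then errors in percent
# for KNN, Torrent Suite, Bayesian, Reference, Proposed approach).
# The per-row weight of the proposed approach is omitted; it is not needed here.
ROWS = [
    ("A", "1–75", 144230, 7.002, 1.119, 2.296, 0.298, 0.250),
    ("A", "76–150", 112776, 12.121, 1.651, 4.722, 0.489, 0.453),
    ("A", "151–225", 97568, 18.733, 2.926, 8.150, 0.423, 0.421),
    ("A", "226–300", 48033, 22.292, 4.655, 10.259, 0.535, 0.510),
    ("C", "1–75", 88732, 6.534, 1.843, 2.779, 0.034, 0.027),
    ("C", "76–150", 77650, 10.382, 2.489, 4.595, 0.556, 0.121),
    ("C", "151–225", 63658, 18.581, 3.187, 6.383, 0.545, 0.542),
    ("C", "226–300", 35736, 17.910, 4.600, 6.159, 0.926, 0.923),
    ("G", "1–75", 97493, 4.141, 1.422, 1.826, 0.609, 0.376),
    ("G", "76–150", 78192, 14.874, 1.623, 3.864, 0.322, 0.152),
    ("G", "151–225", 64680, 16.868, 2.273, 5.683, 1.062, 1.062),
    ("G", "226–300", 34116, 18.754, 2.492, 7.985, 0.147, 0.147),
    ("T", "1–75", 156550, 5.186, 1.106, 2.504, 0.076, 0.054),
    ("T", "76–150", 152034, 11.446, 1.571, 5.780, 0.342, 0.297),
    ("T", "151–225", 111090, 14.720, 2.331, 7.290, 0.419, 0.362),
    ("T", "226–300", 68448, 13.912, 3.315, 8.240, 0.723, 0.599),
]

# Method names in the same order as the error columns of ROWS.
METHODS = ("KNN", "Torrent Suite", "Bayesian", "Reference", "Proposed")


def weighted_mean_errors(rows):
    """Return the count-weighted mean error (%) for each method."""
    total = sum(row[2] for row in rows)
    return {
        name: sum(row[2] * row[3 + i] for row in rows) / total
        for i, name in enumerate(METHODS)
    }


if __name__ == "__main__":
    for name, err in weighted_mean_errors(ROWS).items():
        print(f"{name}: {err:.3f}%")
```

Weighting by count keeps small classes, such as the 226–300 position bins, from dominating the summary; an unweighted mean over the 16 rows would treat every class as equally important.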