
Table 1 Identification errors of homopolymer length with different methods

From: Improving alignment accuracy on homopolymer regions for semiconductor-based sequencing technologies

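All error columns are percentages, Errors (%). The last two columns belong to the proposed approach.

| No | Nt | Pos | Count | KNN | Torrent suite | Bayesian | Reference | Proposed: Weight | Proposed: Errors |
|----|----|---------|--------|--------|-------|--------|-------|------|-------|
| 1  | A  | 1–75    | 144230 | 7.002  | 1.119 | 2.296  | 0.298 | 0.28 | 0.250 |
| 2  | A  | 76–150  | 112776 | 12.121 | 1.651 | 4.722  | 0.489 | 0.34 | 0.453 |
| 3  | A  | 151–225 | 97568  | 18.733 | 2.926 | 8.150  | 0.423 | 0.14 | 0.421 |
| 4  | A  | 226–300 | 48033  | 22.292 | 4.655 | 10.259 | 0.535 | 0.24 | 0.510 |
| 5  | C  | 1–75    | 88732  | 6.534  | 1.843 | 2.779  | 0.034 | 0.14 | 0.027 |
| 6  | C  | 76–150  | 77650  | 10.382 | 2.489 | 4.595  | 0.556 | 0.36 | 0.121 |
| 7  | C  | 151–225 | 63658  | 18.581 | 3.187 | 6.383  | 0.545 | 0.28 | 0.542 |
| 8  | C  | 226–300 | 35736  | 17.910 | 4.600 | 6.159  | 0.926 | 0.30 | 0.923 |
| 9  | G  | 1–75    | 97493  | 4.141  | 1.422 | 1.826  | 0.609 | 0.30 | 0.376 |
| 10 | G  | 76–150  | 78192  | 14.874 | 1.623 | 3.864  | 0.322 | 0.32 | 0.152 |
| 11 | G  | 151–225 | 64680  | 16.868 | 2.273 | 5.683  | 1.062 | 0.14 | 1.062 |
| 12 | G  | 226–300 | 34116  | 18.754 | 2.492 | 7.985  | 0.147 | 0.12 | 0.147 |
| 13 | T  | 1–75    | 156550 | 5.186  | 1.106 | 2.504  | 0.076 | 0.14 | 0.054 |
| 14 | T  | 76–150  | 152034 | 11.446 | 1.571 | 5.780  | 0.342 | 0.30 | 0.297 |
| 15 | T  | 151–225 | 111090 | 14.720 | 2.331 | 7.290  | 0.419 | 0.32 | 0.362 |
| 16 | T  | 226–300 | 68448  | 13.912 | 3.315 | 8.240  | 0.723 | 0.28 | 0.599 |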

  1. “Count” is the number of homopolymers in each class. “KNN” is the K-nearest-neighbors method. “Reference” means that only reference information is used in the designed model (Weight = 0).
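
To compare methods across position ranges, one option is to pool the rows of a nucleotide into a single count-weighted error rate. The sketch below does this for the four A rows of Table 1. It assumes that each error value is a percentage of that row's Count; the table does not state this explicitly, and the pooling itself is an illustration rather than a computation from the source article. The names `ROWS_A` and `weighted_error` are hypothetical.

```python
# Rows for nucleotide A, copied from Table 1:
# (position range, Count, error % per method).
ROWS_A = [
    ("1-75", 144230, {"KNN": 7.002, "Torrent suite": 1.119,
                      "Bayesian": 2.296, "Reference": 0.298, "Proposed": 0.250}),
    ("76-150", 112776, {"KNN": 12.121, "Torrent suite": 1.651,
                        "Bayesian": 4.722, "Reference": 0.489, "Proposed": 0.453}),
    ("151-225", 97568, {"KNN": 18.733, "Torrent suite": 2.926,
                        "Bayesian": 8.150, "Reference": 0.423, "Proposed": 0.421}),
    ("226-300", 48033, {"KNN": 22.292, "Torrent suite": 4.655,
                        "Bayesian": 10.259, "Reference": 0.535, "Proposed": 0.510}),
]


def weighted_error(rows, method):
    """Count-weighted mean error (%) of one method over the given rows.

    Assumption: each error value is a percentage of that row's Count.
    """
    total = sum(count for _, count, _ in rows)
    return sum(count * errors[method] for _, count, errors in rows) / total


if __name__ == "__main__":
    for method in ("KNN", "Torrent suite", "Bayesian", "Reference", "Proposed"):
        print(f"{method:14s} {weighted_error(ROWS_A, method):.3f}")
```

Weighting by Count keeps the sparsely populated 226–300 range from dominating the pooled figure, which a plain average of the four percentages would allow.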