Table 1 Error rate comparison between DPS, DNN and Omni-PolyA derived by using different feature sets from benchmark poly(A) dataset

From: Omni-PolyA: a method and tool for accurate recognition of Poly(A) signals in human genomic DNA

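All model columns report the error rate (%).

| Variant | Size | DPS model, DPS feature set | DNN model, DPS feature set | Omni-PolyA model, DPS feature set | Omni-PolyA model, Omni-PolyA feature set | Omni-PolyA model, Omni-PolyA feature set, PAS-weak data pooled |
|---|---|---|---|---|---|---|
| AATAAA | 5190 | 23.72 | 16.80 | **14.02** | 14.20 | 14.20 |
| ATTAAA | 2400 | 16.63 | 15.50 | 14.00 | **12.50** | **12.50** |
| AAGAAA | 1250 | 14.00 | 16.88 | 11.84 | **10.80** | 11.36 |
| AAAAAG | 1230 | 8.05 | 8.29 | **4.87** | 5.85 | 5.45 |
| AATACA | 880 | 20.00 | 17.72 | **13.52** | 14.09 | **13.52** |
| TATAAA | 780 | 18.08 | 21.28 | 20.38 | 14.74 | **13.85** |
| ACTAAA | 690 | 23.33 | 23.04 | 19.56 | 16.23 | **14.49** |
| AGTAAA | 670 | 19.55 | 22.98 | 16.71 | 14.77 | **13.13** |
| GATAAA | 460 | 21.74 | 16.73 | 13.69 | 10.65 | **8.48** |
| AATATA | 410 | 18.05 | 20.00 | 16.82 | 15.85 | **13.41** |
| CATAAA | 410 | 20.00 | 26.34 | 24.14 | **14.39** | **14.39** |
| AATAGA | 370 | 18.38 | 15.40 | 12.93 | 12.97 | **11.62** |
| Average | | 19.25 | 17.07 | 14.08 | 12.99 | 12.50 |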

‘Size’ is the number of samples for each PAS motif variant. The ‘error rate’ is the percentage of misclassified motifs; it is equal to 1 − accuracy. DPS results are those obtained by applying the method described in Kalkatawi et al. [31]. ‘Average’ denotes the weighted average of a column. The error rate of the best performing model for each PAS variant is highlighted in bold. Columns 5–7 show the results obtained by Omni-PolyA with different feature sets. The seventh column shows results obtained by pooling the PAS-weak variant sequences to expand the training data (see Methods).
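
As a rough illustration of the ‘Average’ row, the sketch below recomputes the weighted average for the DPS model column. It assumes the weights are the per-variant sample counts in the ‘Size’ column, which is consistent with the footnote but not stated explicitly. The function and variable names are illustrative and do not come from the Omni-PolyA tool.

```python
def weighted_average(values, weights):
    """Return the average of `values`, weighting each entry by `weights`."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# 'Size' column: number of samples per PAS variant, in table order
# (AATAAA, ATTAAA, AAGAAA, AAAAAG, AATACA, TATAAA,
#  ACTAAA, AGTAAA, GATAAA, AATATA, CATAAA, AATAGA).
sizes = [5190, 2400, 1250, 1230, 880, 780, 690, 670, 460, 410, 410, 370]

# Error rate (%) of the DPS model with the DPS feature set, same row order.
dps_error_rates = [23.72, 16.63, 14.00, 8.05, 20.00, 18.08,
                   23.33, 19.55, 21.74, 18.05, 20.00, 18.38]

# Assumption: weights are the sample counts from the 'Size' column.
average_error = weighted_average(dps_error_rates, sizes)
print(f"Weighted average error rate: {average_error:.2f}%")  # 19.25%

# Error rate is 1 - accuracy, so on a percentage scale accuracy = 100 - error.
average_accuracy = 100.0 - average_error
print(f"Corresponding accuracy: {average_accuracy:.2f}%")  # 80.75%
```

Weighting by sample count means the two most frequent variants, AATAAA and ATTAAA, dominate the average, so a model's overall figure mostly reflects its performance on those two motifs.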