Table 2 Experimental data distribution

From: Time series-based hybrid ensemble learning model with multivariate multidimensional feature coding for DNA methylation prediction

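| Dataset | Species         | Type | Training (70%) | Validation (15%) | Testing (15%) | Total  |
|---------|-----------------|------|----------------|------------------|---------------|--------|
| 1       | H.sapiens       | 5hmC | 2915           | 624              | 624           | 4163   |
| 2       | M.musculus      | 5hmC | 5152           | 1103             | 1103          | 7358   |
| 3       | C.equisetifolia | 4mC  | 2772           | 593              | 593           | 3958   |
| 4       | F.vesca         | 4mC  | 22116          | 4739             | 4739          | 31594  |
| 5       | S.cerevisiae    | 4mC  | 2772           | 593              | 593           | 3958   |
| 6       | Tolypocladium   | 4mC  | 21456          | 4598             | 4598          | 30652  |
| 7       | D.melanogaster  | 6mA  | 15668          | 3357             | 3357          | 22382  |
| 8       | R.chinensis     | 6mA  | 838            | 180              | 180           | 1198   |
| 9       | Xoc BLS256      | 6mA  | 24102          | 5164             | 5164          | 34430  |
| 10      | C.elegans       | 6mA  | 11146          | 2388             | 2388          | 15922  |
| 11      | T.thermophile   | 6mA  | 11146          | 2388             | 2388          | 15922  |
| 12      | A.thaliana      | 6mA  | 44622          | 9562             | 9562          | 63746  |
| 13      | H.sapiens       | 6mA  | 25670          | 5500             | 5500          | 36670  |
| 14      | C.equisetifolia | 6mA  | 8492           | 1820             | 1820          | 12132  |
| 15      | F.vesca         | 6mA  | 4344           | 930              | 930           | 6204   |
| 16      | S.cerevisiae    | 6mA  | 5300           | 1136             | 1136          | 7572   |
| 17      | Tolypocladium   | 6mA  | 4730           | 1014             | 1014          | 6758   |
| Total   |                 |      | 213241         | 45689            | 45689         | 304619 |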
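
The split arithmetic in Table 2 can be checked with a short script. The following is a minimal sketch, not part of the original work: the counts are copied from the table, while the names `ROWS`, `split_fractions`, and `grand_totals` are hypothetical helpers introduced only for illustration. It confirms that each row's three partitions sum to its total and computes the realized training, validation, and testing fractions. It does not attempt to reproduce the authors' splitting procedure, which the table does not specify.

```python
# Counts copied from Table 2: (dataset, training, validation, testing, total).
ROWS = [
    (1, 2915, 624, 624, 4163),
    (2, 5152, 1103, 1103, 7358),
    (3, 2772, 593, 593, 3958),
    (4, 22116, 4739, 4739, 31594),
    (5, 2772, 593, 593, 3958),
    (6, 21456, 4598, 4598, 30652),
    (7, 15668, 3357, 3357, 22382),
    (8, 838, 180, 180, 1198),
    (9, 24102, 5164, 5164, 34430),
    (10, 11146, 2388, 2388, 15922),
    (11, 11146, 2388, 2388, 15922),
    (12, 44622, 9562, 9562, 63746),
    (13, 25670, 5500, 5500, 36670),
    (14, 8492, 1820, 1820, 12132),
    (15, 4344, 930, 930, 6204),
    (16, 5300, 1136, 1136, 7572),
    (17, 4730, 1014, 1014, 6758),
]


def split_fractions(train, val, test, total):
    """Return (train, val, test) fractions after checking the parts sum to total."""
    if train + val + test != total:
        raise ValueError(f"partitions sum to {train + val + test}, expected {total}")
    return train / total, val / total, test / total


# Realized fractions per dataset (hypothetical helper output, keyed by dataset id).
fractions = {ds: split_fractions(tr, va, te, tot) for ds, tr, va, te, tot in ROWS}

# Column sums, to compare against the "Total" row of the table.
grand_totals = tuple(sum(row[i] for row in ROWS) for i in (1, 2, 3, 4))

if __name__ == "__main__":
    for ds, (f_train, f_val, f_test) in fractions.items():
        print(f"dataset {ds:2d}: train={f_train:.4f} val={f_val:.4f} test={f_test:.4f}")
    print("column sums:", grand_totals)
```

Because the partition sizes are whole numbers, the realized fractions deviate slightly from the nominal 70%, 15%, and 15%.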