
Table 7 Percentage distributions of the 10 most abundant species among the sequences in the positive training dataset and in the independent positive test dataset

From: Deep learning for HGT insertion sites recognition

| The positive training dataset | Percentage (%) | The independent positive test dataset | Percentage (%) |
| --- | --- | --- | --- |
| Microbacterium esteraromaticum | 13.13 | Faecalibacterium prausnitzii A2-165 | 7.69 |
| Mycolicibacterium monacense | 7.36 | Microbacterium esteraromaticum | 4.84 |
| Mycobacterium sp. 852002-51961_SCH5331710 | 3.08 | Prevotella copri DSM 18205 | 4.38 |
| Faecalibacterium prausnitzii A2-165 | 2.39 | Mycobacterium sp. 852002-51961_SCH5331710 | 3.5 |
| Collinsella aerofaciens ATCC 25986 | 1.97 | Mycolicibacterium monacense | 3.04 |
| Collinsella sp. 4_8_47FAA | 1.94 | Bacteroides stercoris ATCC 43183 | 2.53 |
| Gemmiger formicilis | 1.69 | Roseburia faecis | 2.33 |
| Collinsella sp. TF06-26 | 1.64 | Roseburia intestinalis L1-82 | 2.01 |
| Bifidobacterium longum | 1.55 | Gemmiger formicilis | 1.56 |
| Bacteroides caccae | 1.50 | Acinetobacter sp. AR2-3 | 1.48 |