Table 1 Summary of different feature models.

From: A structure-based Multiple-Instance Learning approach to predicting in vitro transcription factor-DNA interaction

| Name | Length of k-mer kernel | Number of instances per bag | Number of features per instance | Description |
| --- | --- | --- | --- | --- |
| MIL3D_kmer | k ∈ [5, 8] | 35 - k + 1 | (k - 3 + 1) triplets per k-mer × 6 base structural features per triplet | For each of the 35 - k + 1 continuous k-mers in the 35-mer, and for each of the k - 3 + 1 triplets in that k-mer, map the structural features to the k-mer sequentially. The feature vector of one k-mer represents an instance, and the 35 - k + 1 instances form a bag. |
| SIL3D | 3 | 1 | 198 (6 structural features per triplet × 33 continuous 3-mers in the 35-mer) | For each of the 33 continuous 3-mers in the 35-mer, map the 6 structural features to the 3-mer. |
| kmer_counting | k ∈ [3, 8] | 1 | 4^k (number of occurrences of each possible k-mer) | For each 35-mer DNA sequence in the PBM array, count the number of occurrences of each of the 4^k possible k-mers; map the k-mer count table to the PBM sequence. This k-mer-based method has been widely used in previous decades and remains very effective [5]. |
| 3+4+5mer_counting | 3, 4 and 5 | 1 | 1344 (64 + 256 + 1024) | For each 35-mer DNA sequence in the PBM array, map the three count tables above (3-mer, 4-mer and 5-mer) to the sequence. |
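
The feature dimensions in Table 1 can be checked with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, sequences are assumed to contain only A, C, G and T, and the structural lookup table is a zero-filled placeholder because the six structural values per triplet are not given in the table.

```python
from itertools import product

ALPHABET = "ACGT"  # assumption: probe sequences contain only these four bases


def kmer_counts(seq, k):
    """kmer_counting: occurrences of each of the 4**k possible k-mers in seq."""
    index = {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=k))}
    counts = [0] * len(index)
    for i in range(len(seq) - k + 1):
        counts[index[seq[i:i + k]]] += 1
    return counts


def combined_counts(seq, ks=(3, 4, 5)):
    """3+4+5mer_counting: concatenate the 3-mer, 4-mer and 5-mer count tables."""
    features = []
    for k in ks:
        features.extend(kmer_counts(seq, k))
    return features


def sil3d_vector(seq, triplet_features):
    """SIL3D: one vector holding the structural features of every 3-mer in seq."""
    vector = []
    for i in range(len(seq) - 3 + 1):
        vector.extend(triplet_features[seq[i:i + 3]])
    return vector


def mil3d_bag(seq, k, triplet_features):
    """MIL3D_kmer: one instance per k-mer, built from its k - 3 + 1 triplets."""
    bag = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        instance = []
        for j in range(k - 3 + 1):
            instance.extend(triplet_features[kmer[j:j + 3]])
        bag.append(instance)
    return bag


# Placeholder lookup: 6 zeros per triplet. These are NOT real structural values.
placeholder = {"".join(p): [0.0] * 6 for p in product(ALPHABET, repeat=3)}

seq = ("ACGT" * 9)[:35]  # an arbitrary 35-mer used only to check shapes
bag = mil3d_bag(seq, 5, placeholder)

print(len(kmer_counts(seq, 3)))            # 64
print(len(combined_counts(seq)))           # 1344
print(len(sil3d_vector(seq, placeholder))) # 198
print(len(bag), len(bag[0]))               # 31 18
```

For k = 5 the bag holds 35 - 5 + 1 = 31 instances, each with (5 - 3 + 1) × 6 = 18 features, which matches the MIL3D_kmer row; replacing the placeholder with a real triplet-to-feature table would leave these shapes unchanged.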