Table 1 Summary of different feature models.

From: A structure-based Multiple-Instance Learning approach to predicting in vitro transcription factor-DNA interaction

| Name | Length of k-mer kernel | Number of instances per bag | Number of features per instance | Description |
|---|---|---|---|---|
| MIL3D_kmer | k ∈ [5, 8] | 35 - k + 1 | (k - 3 + 1) triplets per k-mer × 6 base structural features per triplet | For each of the 35 - k + 1 contiguous k-mers in the 35-mer, and for each of the k - 3 + 1 triplets in that k-mer, map the structural features to the k-mer in sequence order. The feature vector of one k-mer represents an instance, and the 35 - k + 1 instances form a bag. |
| SIL3D | 3 | 1 | 198 (6 structural features per triplet × 33 contiguous 3-mers in the 35-mer) | For each of the 33 contiguous 3-mers in the 35-mer, map the 6 structural features to the 3-mer. |
| kmer_counting | k ∈ [3, 8] | 1 | 4^k (number of occurrences of each distinct k-mer) | For each 35-mer DNA sequence in the PBM array, count the occurrences of each of the 4^k k-mers and map the resulting k-mer count table to the PBM sequence. This k-mer based method has been widely used over the past decades and remains very effective [5]. |
| 3+4+5mer_counting | 3, 4 and 5 | 1 | 1344 (64 + 256 + 1024) | For each 35-mer DNA sequence in the PBM array, map the three count tables above (the 3-mer, 4-mer and 5-mer tables) to the sequence. |
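
The feature dimensions listed in Table 1 can be checked with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, sequences are assumed to contain only A, C, G and T, and the table does not give the six structural feature values per triplet, so `placeholder_triplet_table` fills them with random numbers purely to make the shapes concrete.

```python
from itertools import product

import numpy as np

ALPHABET = "ACGT"
N_STRUCT = 6  # structural features per triplet, as stated in Table 1


def kmer_counts(seq, k):
    """Count overlapping occurrences of each of the 4**k k-mers in seq."""
    index = {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=k))}
    counts = np.zeros(4 ** k, dtype=int)
    for i in range(len(seq) - k + 1):
        counts[index[seq[i:i + k]]] += 1
    return counts


def combined_counts(seq, ks=(3, 4, 5)):
    """Concatenate the count tables for several k (3+4+5mer_counting)."""
    return np.concatenate([kmer_counts(seq, k) for k in ks])


def placeholder_triplet_table(seed=0):
    """Hypothetical stand-in: random values, NOT the paper's structural features."""
    rng = np.random.default_rng(seed)
    return {"".join(p): rng.random(N_STRUCT) for p in product(ALPHABET, repeat=3)}


def mil_bag(seq, k, table):
    """Build one bag: len(seq)-k+1 instances, each joining k-3+1 triplet vectors."""
    instances = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        feats = [table[kmer[j:j + 3]] for j in range(k - 3 + 1)]
        instances.append(np.concatenate(feats))
    return np.stack(instances)


def sil_vector(seq, table):
    """Single-instance variant: all 3-mer feature vectors flattened into one."""
    return mil_bag(seq, 3, table).reshape(-1)


if __name__ == "__main__":
    seq = "ACGT" * 8 + "ACG"  # an arbitrary 35-mer used only for illustration
    table = placeholder_triplet_table()
    print(kmer_counts(seq, 3).shape)     # 4**3 count features
    print(combined_counts(seq).shape)    # 64 + 256 + 1024 count features
    print(mil_bag(seq, 5, table).shape)  # (35-5+1 instances, (5-3+1)*6 features)
    print(sil_vector(seq, table).shape)  # 33 * 6 features
```

Building the bag as a two-dimensional array (instances by features) keeps the multiple-instance layout explicit, while the single-instance variant is the same construction with k = 3 flattened into one vector.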