Name | Length of k-mer kernel | Number of instances per bag | Number of features per instance | Description |
---|---|---|---|---|
MIL3D_kmer | k ∈ [5, 8] | 35-k+1 | (k-3+1) triplets per k-mer × 6 base structural features per triplet | For each of the 35-k+1 contiguous k-mers in the 35-mer, and for each of the k-3+1 triplets within that k-mer, map the structural features to the k-mer in sequence order. The feature vector of one k-mer represents an instance, and the 35-k+1 instances form a bag. |
SIL3D | 3 | 1 | 198 (6 structural features per triplet × 33 contiguous 3-mers in the 35-mer) | For each of the 33 contiguous 3-mers in the 35-mer, map the 6 structural features to the 3-mer. |
kmer_counting | k ∈ [3, 8] | 1 | 4^k (number of occurrences of each distinct k-mer) | For each 35-mer DNA sequence in the PBM array, count the number of occurrences of each of the 4^k k-mers; map the k-mer count table to the PBM sequence. This k-mer based method has been widely used for decades and remains very effective [5]. |
3+4+5mer_counting | 3, 4 and 5 | 1 | 1344 (64+256+1024) | For each 35-mer DNA sequence in the PBM array, map the 3 count tables above (the 3-mer, 4-mer and 5-mer tables) to the sequence. |
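
The dimensions in the table can be checked with a minimal Python sketch. The function names (`kmer_counts`, `combined_counts`, `mil_bag_shape`) and the example sequence are hypothetical and are not taken from the original work. The sketch counts k-mers on the forward strand only and orders them lexicographically over A, C, G, T; both choices are assumptions, since the table does not say how the count vector is ordered or whether reverse complements are pooled. The structural feature values themselves are not reproduced here, only the resulting shapes.

```python
from itertools import product

ALPHABET = "ACGT"


def kmer_counts(seq, k):
    """Count occurrences of each of the 4**k k-mers in seq.

    Assumption: forward strand only, lexicographic k-mer order.
    """
    index = {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=k))}
    counts = [0] * len(index)
    # Slide a window of width k over the sequence (len(seq) - k + 1 windows).
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if kmer in index:  # skip windows containing non-ACGT symbols
            counts[index[kmer]] += 1
    return counts


def combined_counts(seq, ks=(3, 4, 5)):
    """Concatenate the count tables for several k (3+4+5mer_counting row)."""
    features = []
    for k in ks:
        features.extend(kmer_counts(seq, k))
    return features


def mil_bag_shape(k, seq_len=35, features_per_triplet=6):
    """Return (instances per bag, features per instance) for a k-mer kernel."""
    n_instances = seq_len - k + 1
    n_features = (k - 3 + 1) * features_per_triplet
    return n_instances, n_features


if __name__ == "__main__":
    seq = "ACGT" * 8 + "ACG"  # hypothetical 35-mer, for illustration only
    print(len(kmer_counts(seq, 3)))   # 4**3 entries
    print(len(combined_counts(seq)))  # 64 + 256 + 1024 entries
    print(mil_bag_shape(5))           # bag shape for k = 5
```

Setting `k = 3` in `mil_bag_shape` gives 33 windows of 6 features each, which is the 198-feature single-instance layout of the SIL3D row once the windows are flattened into one vector.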