Table 3 Effect of sequence redundancy on algorithm cross-validation performance. HMMs were constructed using the 186 bootstrap sequences used to train HMM266, then tested for accuracy and coverage against non-overlapping positive and negative test sets. "Max. sequence similarity" is the maximum number of matching amino acid positions allowed between any two sequences in a given row, whether within or between the test and training sets. Jack-knife (leave-one-out) testing for each row was performed against the training set described in that row.

From: Predicting N-terminal myristoylation sites in plant proteins

| Model name | Max. sequence similarity | Training seqs. | Positive test seqs. | Negative test seqs. | Accuracy, (TP+TN)/Total | Coverage, TP/(TP+FN) | Jack-knife detection |
|---|---|---|---|---|---|---|---|
| HMM186B | 24/25 residues (96%) | 186 | 80 | 185 | 96.6% | 96.3% | 98.4% |
| HMM162B | 20/25 residues (80%) | 162 | 53 | 128 | 96.1% | 92.5% | 96.9% |
| HMM151B | 15/25 residues (60%) | 151 | 42 | 102 | 98.6% | 95.2% | 96.7% |
| HMM127B | 10/25 residues (40%) | 127 | 25 | 94 | 97.5% | 96.0% | 96.1% |
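
The accuracy and coverage columns follow the formulas given in the header, (TP+TN)/Total and TP/(TP+FN). The sketch below shows one way to compute them from confusion counts. The function names are hypothetical, and the example counts are an assumption: the table reports only percentages, so the counts were chosen to be consistent with the HMM127B row (25 positive and 94 negative test sequences) rather than taken from the source.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all test sequences classified correctly: (TP+TN)/Total."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no test sequences")
    return (tp + tn) / total


def coverage(tp: int, fn: int) -> float:
    """Fraction of positive test sequences that were detected: TP/(TP+FN)."""
    positives = tp + fn
    if positives == 0:
        raise ValueError("no positive test sequences")
    return tp / positives


# Assumed counts, consistent with the HMM127B row but not reported in the source:
# 25 positives split as TP=24, FN=1; 94 negatives split as TN=92, FP=2.
tp, fn, tn, fp = 24, 1, 92, 2

print(f"Accuracy: {accuracy(tp, tn, fp, fn):.1%}")
print(f"Coverage: {coverage(tp, fn):.1%}")
```

With small test sets, a single misclassified sequence moves these percentages noticeably (one missed positive out of 25 changes coverage by 4 points), which is worth keeping in mind when comparing rows.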