Figure 1 | BMC Genomics

Figure 1

From: A predicted physicochemically distinct sub-proteome associated with the intracellular organelle of the anammox bacterium Kuenenia stuttgartiensis

Performance comparison of the RF model trained on different types of input data. To correct for the imbalance between classes A and P, 500 RF models with randomly generated P1 and P2 sets were trained on each of the following 6 types of data: the full-length amino acid sequences, the signal peptides (SP), and the mature protein amino acid sequences, each analyzed with either the frequency of single amino acid residues or the frequency of 2 adjacent amino acids. When the 6 top-performing models of each input type are compared, the model trained on full-length protein sequences with the frequencies of 2 adjacent amino acids shows the highest overall accuracy (89%) and A protein recall (90%).
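
The two sequence encodings named in the caption (single-residue frequency and frequency of 2 adjacent amino acids) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the restriction to the 20 standard amino acids, the use of overlapping residue pairs, and the `balanced_subsample` helper (one plausible way to draw a random subset of the larger class) are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product
import random

# Assumption: only the 20 standard amino acids are counted.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
DIPEPTIDE_SET = set(DIPEPTIDES)


def residue_frequencies(seq):
    """Return a 20-element vector of single amino acid frequencies."""
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS)
    return [counts[a] / total if total else 0.0 for a in AMINO_ACIDS]


def dipeptide_frequencies(seq):
    """Return a 400-element vector of adjacent-pair frequencies.

    Assumption: pairs are overlapping (positions i and i+1).
    """
    pairs = (seq[i:i + 2] for i in range(len(seq) - 1))
    counts = Counter(p for p in pairs if p in DIPEPTIDE_SET)
    total = sum(counts.values())
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


def balanced_subsample(smaller, larger, rng):
    """Hypothetical helper: draw a random subset of the larger class
    with the same size as the smaller class."""
    return rng.sample(larger, len(smaller))


single = residue_frequencies("AAC")
double = dipeptide_frequencies("AAC")
subset = balanced_subsample(["a1", "a2"], ["p1", "p2", "p3", "p4"], random.Random(0))
```

Either feature vector could then be passed to a random forest classifier; repeating the subsampling with different random seeds would give an ensemble of models comparable in spirit to the 500 runs described above.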