
Table 3 The similarities of proteins and their frameshifts (aligned by FrameAlign)

From: Frameshift and wild-type proteins are often highly similar because the genetic code and genomes were optimized for frameshift tolerance

| Type | Species | Number of CDSs | Average similarity δ12 | Average similarity δ13 | Average similarity δ23 | Average similarity δ | MAX | MIN | Number of gaps |
|---|---|---|---|---|---|---|---|---|---|
| Real CDSs (FrameAlign) | H. sapiens | 71,853 | 0.492 ± 0.043 | 0.472 ± 0.044 | 0.434 ± 0.040 | 0.466 ± 0.029 | 0.713 | 0.194 | 2 |
| | P. troglodytes | 15,781 | 0.491 ± 0.046 | 0.468 ± 0.046 | 0.431 ± 0.042 | 0.463 ± 0.030 | 0.625 | 0.311 | 2 |
| | M. musculus | 27,208 | 0.484 ± 0.046 | 0.469 ± 0.042 | 0.426 ± 0.040 | 0.460 ± 0.029 | 0.739 | 0.286 | 2 |
| | X. tropicalis | 7,706 | 0.481 ± 0.042 | 0.481 ± 0.041 | 0.439 ± 0.037 | 0.467 ± 0.028 | 0.644 | 0.353 | 2 |
| | D. rerio | 14,151 | 0.471 ± 0.044 | 0.468 ± 0.040 | 0.408 ± 0.040 | 0.449 ± 0.030 | 0.614 | 0.314 | 2 |
| | D. melanogaster | 23,936 | 0.475 ± 0.046 | 0.457 ± 0.044 | 0.362 ± 0.047 | 0.431 ± 0.030 | 0.689 | 0.236 | 2 |
| | C. elegans | 29,227 | 0.450 ± 0.047 | 0.475 ± 0.045 | 0.421 ± 0.043 | 0.449 ± 0.032 | 0.634 | 0.224 | 2 |
| | A. thaliana | 35,378 | 0.442 ± 0.045 | 0.477 ± 0.044 | 0.412 ± 0.041 | 0.444 ± 0.031 | 0.882 | 0.244 | 2 |
| | S. cerevisiae | 5,889 | 0.461 ± 0.041 | 0.510 ± 0.042 | 0.423 ± 0.038 | 0.465 ± 0.029 | 0.692 | 0.259 | 2 |
| | E. coli | 4,140 | 0.435 ± 0.046 | 0.426 ± 0.047 | 0.372 ± 0.043 | 0.411 ± 0.030 | 0.571 | 0.237 | 2 |
| | Average | 235,269 | 0.468 ± 0.045 | 0.470 ± 0.043 | 0.413 ± 0.041 | 0.450 ± 0.030 | 0.882 (a) | 0.194 (a) | 2 |
| Random CDSs (FrameAlign) | Three frames | 100,000 | 0.394 ± 0.028 | 0.394 ± 0.028 | 0.395 ± 0.028 | 0.394 ± 0.016 | 0.477 | 0.330 | 2 |
| | Three random CDSs | 100,000 × 3 | 0.383 ± 0.028 | 0.383 ± 0.028 | 0.383 ± 0.028 | 0.383 ± 0.018 | 0.458 | 0.304 | 0 |
(a) Very large/small similarity values were observed in a few very short or repetitive peptides.
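
As a quick consistency check on the table, the sketch below recomputes the overall similarity δ for a few rows. It assumes that δ is the arithmetic mean of the three pairwise values δ12, δ13 and δ23; the table does not state this explicitly, although the reported values agree with it to within rounding. The names `ROWS`, `mean_similarity` and `max_deviation` are hypothetical and used only for illustration.

```python
# Values copied from Table 3: species -> (d12, d13, d23, reported overall delta).
ROWS = {
    "H. sapiens": (0.492, 0.472, 0.434, 0.466),
    "D. melanogaster": (0.475, 0.457, 0.362, 0.431),
    "S. cerevisiae": (0.461, 0.510, 0.423, 0.465),
    "E. coli": (0.435, 0.426, 0.372, 0.411),
}


def mean_similarity(d12, d13, d23):
    """Arithmetic mean of the three pairwise similarities (an assumption)."""
    return (d12 + d13 + d23) / 3


def max_deviation(rows):
    """Largest absolute gap between the recomputed mean and the reported delta."""
    return max(
        abs(mean_similarity(d12, d13, d23) - reported)
        for d12, d13, d23, reported in rows.values()
    )


if __name__ == "__main__":
    for species, (d12, d13, d23, reported) in ROWS.items():
        recomputed = mean_similarity(d12, d13, d23)
        print(f"{species}: recomputed {recomputed:.3f}, reported {reported:.3f}")
```

A deviation below 0.001 for every row is what one would expect if the reported δ values were rounded to three decimals after averaging.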