
Table 2 The similarities of proteins and their frameshifts (aligned by ClustalW or MSA)

From: Frameshift and wild-type proteins are often highly similar because the genetic code and genomes were optimized for frameshift tolerance

| Type | Species | Number of CDSs | δ12 | δ13 | δ23 | δ | MAX | MIN | No. of gaps |
|---|---|---|---|---|---|---|---|---|---|
| Real CDSs (ClustalW) | H. sapiens | 71,853 | 0.474 ± 0.039 | 0.454 ± 0.046 | 0.433 ± 0.043 | 0.464 ± 0.033 | 0.890 | 0.271 | 53.3 |
| | P. troglodytes | 15,781 | 0.473 ± 0.04 | 0.452 ± 0.047 | 0.431 ± 0.042 | 0.463 ± 0.034 | 0.657 | 0.309 | 48.9 |
| | M. musculus | 27,208 | 0.469 ± 0.038 | 0.448 ± 0.046 | 0.43 ± 0.041 | 0.459 ± 0.033 | 0.739 | 0.286 | 52.5 |
| | X. tropicalis | 7,706 | 0.477 ± 0.038 | 0.455 ± 0.044 | 0.439 ± 0.042 | 0.466 ± 0.032 | 0.638 | 0.320 | 36.8 |
| | D. rerio | 14,151 | 0.465 ± 0.036 | 0.443 ± 0.043 | 0.433 ± 0.038 | 0.454 ± 0.032 | 0.658 | 0.332 | 51.4 |
| | D. melanogaster | 23,936 | 0.455 ± 0.039 | 0.432 ± 0.045 | 0.426 ± 0.039 | 0.444 ± 0.033 | 0.702 | 0.250 | 69.4 |
| | C. elegans | 29,227 | 0.475 ± 0.037 | 0.444 ± 0.042 | 0.441 ± 0.042 | 0.459 ± 0.032 | 0.750 | 0.261 | 50.4 |
| | A. thaliana | 35,378 | 0.468 ± 0.038 | 0.439 ± 0.042 | 0.436 ± 0.043 | 0.453 ± 0.032 | 0.828 | 0.217 | 47.6 |
| | S. cerevisiae | 5,889 | 0.482 ± 0.043 | 0.451 ± 0.042 | 0.463 ± 0.047 | 0.467 ± 0.035 | 0.692 | 0.259 | 39.7 |
| | E. coli | 4,140 | 0.441 ± 0.039 | 0.415 ± 0.043 | 0.408 ± 0.042 | 0.428 ± 0.032 | 0.614 | 0.280 | 45.6 |
| | Average | 235,269 | 0.468 ± 0.039 | 0.443 ± 0.044 | 0.434 ± 0.042 | 0.456 ± 0.033 | 0.890^a | 0.217^a | 49.6 |
| Random CDSs (ClustalW) | Three frames | 100,000 × 3 | 0.475 ± 0.019 | 0.428 ± 0.020 | 0.427 ± 0.020 | 0.452 ± 0.013 | 0.512 | 0.391 | 80.1 |
| | Three random CDSs | 100,000 × 3 | 0.476 ± 0.019 | 0.429 ± 0.020 | 0.428 ± 0.020 | 0.452 ± 0.013 | 0.520 | 0.388 | 137.1 |
| Random CDSs (MSA) | Three frames | 100,000 × 3 | 0.409 ± 0.06 | 0.411 ± 0.059 | 0.448 ± 0.044 | 0.410 ± 0.055 | 0.541 | 0.207 | 108.27 |
| | Three random CDSs | 100,000 × 3 | 0.411 ± 0.06 | 0.413 ± 0.059 | 0.447 ± 0.043 | 0.412 ± 0.055 | 0.540 | 0.201 | 109.47 |

δ12, δ13, δ23 and δ are average similarities (mean ± SD). In the Average row, Number of CDSs is the total over the ten species.
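The caption compares each protein with its frameshifts, but the table does not show how the frameshifted proteins are obtained. A minimal sketch, assuming frame 1 is the annotated reading frame and frames 2 and 3 start one and two nucleotides downstream (the article's exact frame convention and its handling of stop codons in shifted frames are not stated in this table), translates one CDS in all three frames with the standard genetic code. `translate_frame` and `three_frames` are hypothetical helpers, the example sequence is arbitrary, and the δ scores themselves came from ClustalW or MSA alignments, which this sketch does not reproduce.

```python
from itertools import product

# Standard genetic code; codons are enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_frame(cds: str, offset: int) -> str:
    """Translate `cds` starting at `offset` (0, 1 or 2).

    '*' marks stop codons, 'X' marks codons containing non-ACGT symbols,
    and a trailing partial codon is ignored.
    """
    seq = cds.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(offset, len(seq) - 2, 3)
    )


def three_frames(cds: str) -> dict:
    """Proteins of frames 1, 2 and 3, assuming offsets 0, 1 and 2."""
    return {frame: translate_frame(cds, frame - 1) for frame in (1, 2, 3)}


if __name__ == "__main__":
    example = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
    for frame, protein in three_frames(example).items():
        print(frame, protein)
```

For the example sequence this prints `MAIVMGR*KGAR*` for frame 1, `WPL*WAAERVPD` for frame 2 and `GHCNGPLKGCPI` for frame 3; under the stated assumption, δ12 would score the alignment of the first two proteins and δ13 the first and third.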
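The table does not define δ or say how the Average row was computed, but the printed numbers are consistent with two simple relations: in every row δ equals (δ12 + δ13) / 2 to within rounding, and the Average row equals the unweighted mean of the ten species rows, with Number of CDSs as their sum and MAX and MIN as the overall extremes. The sketch below re-derives both from the transcribed mean values (the ± SD parts are omitted). The formulas are inferred from the numbers, not taken from the article, and `SPECIES`, `AVERAGE`, `RANDOM`, `delta_from_pairs` and `reconstruct_average` are hypothetical names.

```python
# Mean values transcribed from Table 2.
# Tuple layout: (n_cds, d12, d13, d23, d, max, min, gaps)
SPECIES = {
    "H. sapiens":      (71853, 0.474, 0.454, 0.433, 0.464, 0.890, 0.271, 53.3),
    "P. troglodytes":  (15781, 0.473, 0.452, 0.431, 0.463, 0.657, 0.309, 48.9),
    "M. musculus":     (27208, 0.469, 0.448, 0.430, 0.459, 0.739, 0.286, 52.5),
    "X. tropicalis":   (7706,  0.477, 0.455, 0.439, 0.466, 0.638, 0.320, 36.8),
    "D. rerio":        (14151, 0.465, 0.443, 0.433, 0.454, 0.658, 0.332, 51.4),
    "D. melanogaster": (23936, 0.455, 0.432, 0.426, 0.444, 0.702, 0.250, 69.4),
    "C. elegans":      (29227, 0.475, 0.444, 0.441, 0.459, 0.750, 0.261, 50.4),
    "A. thaliana":     (35378, 0.468, 0.439, 0.436, 0.453, 0.828, 0.217, 47.6),
    "S. cerevisiae":   (5889,  0.482, 0.451, 0.463, 0.467, 0.692, 0.259, 39.7),
    "E. coli":         (4140,  0.441, 0.415, 0.408, 0.428, 0.614, 0.280, 45.6),
}
AVERAGE = (235269, 0.468, 0.443, 0.434, 0.456, 0.890, 0.217, 49.6)

# Random controls: (d12, d13, d)
RANDOM = {
    "ClustalW, three frames":      (0.475, 0.428, 0.452),
    "ClustalW, three random CDSs": (0.476, 0.429, 0.452),
    "MSA, three frames":           (0.409, 0.411, 0.410),
    "MSA, three random CDSs":      (0.411, 0.413, 0.412),
}


def delta_from_pairs(d12: float, d13: float) -> float:
    """Inferred relation: delta is the mean of d12 and d13."""
    return (d12 + d13) / 2


def reconstruct_average(rows: dict) -> tuple:
    """Inferred Average row: total CDSs, unweighted means of the similarity
    and gap columns, largest MAX and smallest MIN."""
    values = list(rows.values())
    k = len(values)
    total = sum(v[0] for v in values)
    d12, d13, d23, d, gaps = (sum(v[i] for v in values) / k for i in (1, 2, 3, 4, 7))
    overall_max = max(v[5] for v in values)
    overall_min = min(v[6] for v in values)
    return (total, d12, d13, d23, d, overall_max, overall_min, gaps)


if __name__ == "__main__":
    for name, row in SPECIES.items():
        print(f"{name:16s} d={row[4]:.3f}  (d12+d13)/2={delta_from_pairs(row[1], row[2]):.4f}")
    print("reconstructed Average row:", reconstruct_average(SPECIES))
```

Weighting the species by their CDS counts instead would give δ12 ≈ 0.470 rather than the printed 0.468, which is why the unweighted mean is the better fit.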