Table 2 The similarities of proteins and their frameshifts (aligned by ClustalW or MSA)

From: Frameshift and wild-type proteins are often highly similar because the genetic code and genomes were optimized for frameshift tolerance

Average Similarity is reported in the columns δ12, δ13, δ23, δ, MAX, and MIN.

| Type | Species | Number of CDSs | δ12 | δ13 | δ23 | δ | MAX | MIN | Num of Gaps |
|---|---|---|---|---|---|---|---|---|---|
| Real CDSs (ClustalW) | H. sapiens | 71,853 | 0.474 ± 0.039 | 0.454 ± 0.046 | 0.433 ± 0.043 | 0.464 ± 0.033 | 0.890 | 0.271 | 53.3 |
| | P. troglodytes | 15,781 | 0.473 ± 0.04 | 0.452 ± 0.047 | 0.431 ± 0.042 | 0.463 ± 0.034 | 0.657 | 0.309 | 48.9 |
| | M. musculus | 27,208 | 0.469 ± 0.038 | 0.448 ± 0.046 | 0.43 ± 0.041 | 0.459 ± 0.033 | 0.739 | 0.286 | 52.5 |
| | X. tropicalis | 7706 | 0.477 ± 0.038 | 0.455 ± 0.044 | 0.439 ± 0.042 | 0.466 ± 0.032 | 0.638 | 0.320 | 36.8 |
| | D. rerio | 14,151 | 0.465 ± 0.036 | 0.443 ± 0.043 | 0.433 ± 0.038 | 0.454 ± 0.032 | 0.658 | 0.332 | 51.4 |
| | D. melanogaster | 23,936 | 0.455 ± 0.039 | 0.432 ± 0.045 | 0.426 ± 0.039 | 0.444 ± 0.033 | 0.702 | 0.250 | 69.4 |
| | C. elegans | 29,227 | 0.475 ± 0.037 | 0.444 ± 0.042 | 0.441 ± 0.042 | 0.459 ± 0.032 | 0.750 | 0.261 | 50.4 |
| | A. thaliana | 35,378 | 0.468 ± 0.038 | 0.439 ± 0.042 | 0.436 ± 0.043 | 0.453 ± 0.032 | 0.828 | 0.217 | 47.6 |
| | S. cerevisiae | 5889 | 0.482 ± 0.043 | 0.451 ± 0.042 | 0.463 ± 0.047 | 0.467 ± 0.035 | 0.692 | 0.259 | 39.7 |
| | E. coli | 4140 | 0.441 ± 0.039 | 0.415 ± 0.043 | 0.408 ± 0.042 | 0.428 ± 0.032 | 0.614 | 0.280 | 45.6 |
| | Average | 235,269 | 0.468 ± 0.039 | 0.443 ± 0.044 | 0.434 ± 0.042 | 0.456 ± 0.033 | 0.890ᵃ | 0.217ᵃ | 49.6 |
| Random CDSs (ClustalW) | Three frames | 100000 × 3 | 0.475 ± 0.019 | 0.428 ± 0.020 | 0.427 ± 0.020 | 0.452 ± 0.013 | 0.512 | 0.391 | 80.1 |
| | Three random CDSs | 100000 × 3 | 0.476 ± 0.019 | 0.429 ± 0.020 | 0.428 ± 0.020 | 0.452 ± 0.013 | 0.520 | 0.388 | 137.1 |
| Random CDSs (MSA) | Three frames | 100000 × 3 | 0.409 ± 0.06 | 0.411 ± 0.059 | 0.448 ± 0.044 | 0.410 ± 0.055 | 0.541 | 0.207 | 108.27 |
| | Three random CDSs | 100000 × 3 | 0.411 ± 0.06 | 0.413 ± 0.059 | 0.447 ± 0.043 | 0.412 ± 0.055 | 0.540 | 0.201 | 109.47 |