In silico analysis of bacterial translation factors reveal distinct translation event specific pI values

Background: Protein synthesis is a cellular process that proceeds through successive translation events within the ribosome, each assisted by event-specific protein factors, namely initiation, elongation, release, and recycling factors. We asked how similar these translation factors are to each other across a wide variety of bacteria. We therefore carried out a thorough in silico study of the translation factors from 495 bacterial species, covering 4262 amino acid sequences, by theoretically computing their pI and MW values, the two properties that distinguish individual proteins in experimental 2D gel electrophoresis. We then analyzed the output from various angles.
Results: Our study revealed that the pI values are neither all the same nor random; instead, they follow distinct patterns and are translation event specific. We found that the translation initiation factors are mainly basic, whereas the elongation and release factors, which interact with the inter-subunit space of the intact 70S ribosome during translation, are strictly acidic across bacterial species. These acidic elongation and release factors contain higher frequencies of glutamic acid. However, among all the translation factors, translation initiation factor 2 (IF2) and ribosome recycling factor (RRF) showed variable pI values that are linked to phylogeny.
Conclusions: From the results of our study, we conclude that among all the bacterial translation factors, elongation and release factors are more conserved in terms of their pI values than initiation and recycling factors. The acidic properties of these factors are independent of the habitat, nature, and phylogeny of the bacterial species. Furthermore, irrespective of the different shapes, sizes, and functions of the elongation and release factors, their strictly acidic pI values throughout the domain Bacteria indicate that the acidic nature of these factors is a necessary criterion, perhaps for interacting with the partially enclosed, rRNA-rich inter-subunit space of the translating 70S ribosome.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-021-07472-x.


Background
Translation is a complex, universal biological process that takes place in a large macromolecular machine called the ribosome in all living organisms. It is an energy-exhaustive cellular process: in Escherichia coli, 40% of the total cellular energy is utilized by the translation system [1]. With the help of specific protein factors and aminoacyl tRNAs, ribosomes carry out protein synthesis by decoding the genetic information in mRNA in successive events, namely initiation, elongation, and termination (release and recycling). The protein factors involved in these successive events are initiation factors (IF), elongation factors (EF), release factors (RF), and ribosome recycling factors (RRF). Accurate coordination of every participating protein factor is necessary to complete the process successfully. Based on several years of biochemical and structural studies worldwide, the mechanisms of cellular protein synthesis are now known in fair detail [2][3][4]. However, it remains to be investigated which characteristics of the translation factors, i.e., IF, EF, and RF, must be conserved across different kinds of organisms for the accuracy of the universal process of protein synthesis.
In this study, we focused on the charge distribution (in terms of acidic and basic properties) of the translation factors throughout the domain Bacteria to understand how the charge of these factors influences their accommodation on the ribosome and thus their functions during translation. For this, we made use of the principle of 2D gel electrophoresis [5] and computed the pI values using the "Compute pI/Mw tool" of the ExPASy web server (https://web.expasy.org/compute_pi/). This web server calculates the pI values of proteins using pK values of amino acids as defined in [6][7][8]. In that work, the authors determined the focusing positions of 29 polypeptides of known amino acid sequence within a narrow range of immobilized pH gradients, i.e., between pH 4.5 and 7.3, under denaturing conditions with 9.2 M and 9.8 M urea at 15°C or 25°C, respectively. They separately calculated the pI values of those proteins from their amino acid sequences. The isoelectric points calculated from the amino acid sequences agreed well with the experimentally determined pI values. The tool is broadly reliable, except for highly basic proteins and small proteins. As the translation factors are neither highly basic nor too small, we considered our study to be within the scope of this web-based method. Our study revealed that the bacterial translational elongation and release factors have similar pI value distributions, which were strictly acidic throughout the domain Bacteria. Irrespective of the habitat, nature, or phylogeny of the bacterial species, and irrespective of the different shapes, sizes, and functions of the elongation and release factors, these factors had strictly acidic pI values. We believe that our study indicates that the charge distribution of these factors might play important roles in the fidelity of translation.
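As a rough illustration of how a pI can be derived from a sequence alone, the following Python sketch sums Henderson-Hasselbalch charges over the ionizable groups and finds the pH of zero net charge by bisection. The pK values below are illustrative assumptions and not necessarily the exact table used by the ExPASy web server, which follows [6][7][8]; the function names are hypothetical.

```python
# Illustrative pK values (assumed for this sketch only; the ExPASy tool
# uses its own table from refs [6][7][8]).
PK_POSITIVE = {"Nterm": 7.5, "K": 10.0, "R": 12.0, "H": 5.98}
PK_NEGATIVE = {"Cterm": 3.55, "D": 4.05, "E": 4.45, "C": 9.0, "Y": 10.0}


def net_charge(sequence: str, ph: float) -> float:
    """Net charge of an unfolded chain at a given pH (Henderson-Hasselbalch)."""
    seq = sequence.upper()
    charge = 0.0
    # Basic groups: one N-terminus plus every K, R, and H side chain.
    for group, pk in PK_POSITIVE.items():
        n = 1 if group == "Nterm" else seq.count(group)
        charge += n / (1.0 + 10.0 ** (ph - pk))
    # Acidic groups: one C-terminus plus every D, E, C, and Y side chain.
    for group, pk in PK_NEGATIVE.items():
        n = 1 if group == "Cterm" else seq.count(group)
        charge -= n / (1.0 + 10.0 ** (pk - ph))
    return charge


def isoelectric_point(sequence: str, lo: float = 0.0, hi: float = 14.0,
                      tol: float = 1e-4) -> float:
    """Bisection on pH; the net charge decreases monotonically with pH."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(sequence, mid) > 0.0:
            lo = mid  # still positively charged, so the pI lies above mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

Under these assumptions, a glutamate-rich sequence yields a pI below 7 and a lysine-rich one a pI above 7, which is the acidic versus basic distinction used throughout this study.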

pI and molecular weight value distribution of translation protein factors
We found a unique pattern of pI value distribution among the factors of the translation process, as depicted in Fig. 1a (see Additional file 1; Table S1). Among the initiation factors, IF1 and IF3 were strictly basic, whereas IF2 was not. Conversely, the elongation and release factors were strictly acidic. Like IF2, RRF also showed a broad pI value distribution ranging from acidic to basic. All four quartiles of initiation factor 1 (IF1) and initiation factor 3 (IF3) were above pI 7. Elongation factor Tu (EF-Tu), elongation factor G (EF-G), elongation factor 4 (EF-4), and elongation factor P (EF-P), as well as release factor 1 (RF1), release factor 2 (RF2), and release factor 3 (RF3), had all four quartiles in the acidic range. For a comprehensive in silico study, we also examined the molecular weight (MW) value distribution of these translation factors (Fig. 1b; see Additional file 1; Table S1). As with its pI values, IF2 showed a wide variation in MW values as well (Fig. 1b). All the other proteins showed narrow MW value distributions. Notably, although the RRF proteins showed a highly variable pI value distribution, their MW value distribution was quite narrow.
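The quartile-based reading of Fig. 1a can be sketched in Python as follows. This is a minimal illustration, assuming that "all four quartiles" on one side of pI 7 means the whole range from minimum to maximum lies on that side; the function names and the neutral threshold of 7.0 are assumptions of this sketch, not part of the original analysis.

```python
from statistics import quantiles


def quartile_summary(pi_values):
    """Five-number summary (min, Q1, median, Q3, max) of a list of pI values."""
    q1, q2, q3 = quantiles(pi_values, n=4, method="inclusive")
    return {"min": min(pi_values), "q1": q1, "median": q2,
            "q3": q3, "max": max(pi_values)}


def classify(pi_values, neutral=7.0):
    """Label a factor by where its whole pI range lies relative to neutral."""
    summary = quartile_summary(pi_values)
    if summary["max"] < neutral:
        return "acidic"   # every quartile lies below the neutral pI
    if summary["min"] > neutral:
        return "basic"    # every quartile lies above the neutral pI
    return "mixed"        # the range spans acidic and basic values
```

Applied to per-factor lists of pI values, a factor such as IF2 or RRF would fall in the "mixed" class, whereas the elongation and release factors would be expected to fall in the "acidic" class.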

Statistical analysis of pI values of translation factors
We further performed asymptotic tests [11] for the 5% and 95% quantiles (Table 1) of these translation factors. For both initiation factors IF1 and IF3, the p values corresponding to the null hypotheses (H0: q05 ≥ 7 and H0: q95 ≤ 9.95) for the 5% and 95% quantiles, respectively, were greater than 0.05, from which we inferred that 90% of the data lay in the basic pI range, i.e., between 7 and 9.95. In contrast, for the elongation factors (EF-Tu, EF-G, EF-4, and EF-P) and release factors (RF1, RF2, and RF3), 90% of the data lay entirely in the acidic pI range, i.e., between 4.635 and 6.225 (the p values corresponding to H0: q05 ≥ 4.635 and H0: q95 ≤ 6.225 were both greater than 0.05). We found a different scenario for initiation factor IF2 and ribosome recycling factor RRF. In both these cases, 90% of data stretched in between acidic 5.1 to

Amino acid frequency distribution of elongation and release factors
Interestingly, when we randomly chose 60 amino acid sequences (representing 60 bacterial species) for each of the elongation and release factors and calculated their amino acid frequencies, we found a high frequency of glutamic acid in all of those factors (Fig. 2). In 2001, Schwartz et al. [12] likewise observed that cytosolic acidic proteins have a high frequency of glutamic acid.
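The amino acid frequency calculation can be sketched as below. This is a minimal Python illustration under the assumption that frequency means the percentage of each of the 20 standard residues in a sequence; the function name is hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aa_frequencies(sequence: str) -> dict:
    """Percentage of each standard amino acid in one sequence."""
    seq = sequence.upper()
    # Count only standard residues; ambiguity codes such as X are ignored.
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}
```

Plotting the value for "E" across the 60 selected sequences of each factor would give the kind of comparison summarized in Fig. 2.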

Surface charge distribution of the elongation and release factors
To further understand our observation in a physiological context, we examined the surface charge distribution of the atomic coordinates of these elongation and release factors: EF-Tu (PDB ID: 2FX3) [13], EF-G (PDB ID: 3J0E) [14], EF-4 (PDB ID: 3DEG) [15], EF-P (PDB ID: 3OYY) [16], RF1 (PDB ID: 4V7P) [17], RF2 (PDB ID: 5MGP) [18], and RF3 (PDB ID: 4V85) [19]. We used the online APBS-PDB2PQR software [20,21], which employs Poisson-Boltzmann electrostatics calculations, to analyze the surface charge of these factors. We found that although there are some patches of positive charge (blue) on the surface, the overall charge of all these factors (Fig. 3) is negative (red). All the PDB IDs studied here are listed in Table 2.

Relation of pI values of IF2 and RRF proteins with phylogeny
Since IF2 had a wide pI value distribution ranging from acidic to basic, we performed a phylogenetic analysis (Fig. 4a) of the IF2 proteins (Additional file 1; Table S1) to investigate the relation of its pI values with phylogeny. Within the phylum Proteobacteria, we found that the classes Gammaproteobacteria (blue) and Betaproteobacteria (verdigris) were acidic (with only a few exceptions), whereas the class Alphaproteobacteria (brown) had a few genera that were acidic (i.e., Ehrlichia spp.), some that were basic (i.e., Brucella spp. and Bartonella spp.), and others with both acidic and basic pI values (i.e., Rickettsia spp.). Among the other phyla, Chlorobi (cyan), Cyanobacteria (red), Thermotogae (yellow), and Deinococcus-Thermus (light grey) had mostly acidic pI values, whereas Chlamydiae (saffron) and Spirochaetes (light green) had basic pI values.
The IF2 proteins of the phyla Firmicutes (pink), Actinobacteria (light blue), and Tenericutes (purple) had both acidic and basic pI values.
The phylogenetic analysis of RRF (which had a wide range of pI value distribution) showed that the pI value distribution of RRF (Fig. 4b; Additional file 1; Table S1), like that of IF2 (Fig. 4a), is also linked to phylogeny. We found that different classes of Proteobacteria had different pI value distributions. The Gammaproteobacteria (blue), and Alphaproteobacteria (brown) (with a few exceptions e.g., Genus; Salmonella spp. of Gammaproteobacteria and Genus; Rickettsia spp. and Ehrlichia spp.

Discussion
Our study revealed that, irrespective of external environment or bacterial phylum, all the translation factors (except IF2 and RRF) are conserved throughout the domain Bacteria in terms of their isoelectric point distribution. Along with translation, we additionally studied the pI value distribution for the two other universal processes of the central dogma, i.e., replication and transcription, in the domain Bacteria. We studied 529 bacterial species and 1707 amino acid sequences for replication (Additional file 2; Table S2) and 488 bacterial species and 1998 amino acid sequences for transcription (Additional file 3; Table S3). For replication and transcription, some of the proteins showed a narrow range and others a wide range of pI values (Additional file 2; Fig. S1 and Additional file 3; Fig. S3, respectively) and molecular weight values (Additional file 2; Fig. S2 and Additional file 3; Fig. S4, respectively). Unlike for the translation factors, we found no specific pattern in the pI value distribution of the proteins involved in the individual steps of initiation, elongation, and release in those two processes. Thus, the precise pI value distribution of the translation factors throughout the domain Bacteria indicates that the overall acidity or basicity of translation factors is an essential feature of the process of translation. The proteins involved in the initiation event of translation, i.e., the initiation factors, were basic, whereas those involved in the elongation and release events, i.e., the elongation and release factors, were strictly acidic owing to the high frequency of negatively charged amino acids, i.e., glutamic acid (Fig. 2).
If we focus on the mode of interaction of these factors with the ribosome, we can categorize them as follows. The initiation factors IF1, IF2, and IF3 are involved in the formation of the 30S initiation complex, which is an open complex. The elongation and release factors, on the other hand, interact with the ribosome when the 50S ribosomal subunit binds to the 30S initiation complex and all three initiation factors are ejected from it. Both the elongation and release factors, irrespective of their different shapes, sizes, and functions, interact with the A site of the semi-enclosed inter-subunit space of the translating 70S ribosome. It should also be noted that initiation of translation takes some seconds [22][23][24] to assemble the ribosome on the mRNA with the help of the initiation factors, whereas elongation proceeds at a faster rate: several amino acids are incorporated within a second [22][23][24], and this continues until the whole mRNA has been read and the stop codon appears.
Fig. 2 Amino acid frequency distribution of elongation and release factors. For each of the elongation (EF-Tu, EF-G, EF-4, and EF-P) and release factors (RF1, RF2, and RF3), we selected 60 amino acid sequences that correspond to 60 bacterial species to study the amino acid frequency distribution. Each colour represents one randomly selected bacterial species
Jana and Datta BMC Genomics (2021) 22:220
Based on our observation, if we focus on the molecular details of the individual steps of translation, the charge distribution of the factors and its importance for proper electrostatic interaction help to give a more comprehensive picture of the process. In the case of initiation, a detailed biochemical and mutagenesis study of the interaction between IF1 and the 30S ribosomal subunit showed that IF1 interacts with the 530 loop and helix 44 of 16S rRNA [25], which carry a highly negative charge. Accordingly, the surface region of IF1 responsible for the interaction has a positive surface potential [25]. In the case of IF3, site-directed mutagenesis studies showed that eight positively charged arginine residues, which are present in the IF3C domain, play an important role in the interaction with the 30S ribosomal subunit [25,26].
In the case of elongation and release factors, in 2004, Trylska et al. [27] measured the electrostatic potential of the ribosomal A-site. They found a positive potential area in the A-site of the 70S ribosome complex that was mainly contributed by the S12, L11, and S19 proteins. Biochemical and structural studies have shown that the elongation factors EF-Tu [28][29][30], EF-G [31][32][33], EF-4 [34], and EF-P [35] interact with the L11 protein, which has a positive potential [27]. The positive potential contributed by these A-site proteins may be necessary for the interaction, as an E. coli mutant lacking L11 has been found to be extremely compromised [36]. EF-G interacts with the S12 and S19 proteins as well [37]. This kind of interaction between the complementary electrostatic potentials of the translation factors and the A-site proteins may help in the proper accommodation of these factors in the A-site. In this direction, a recent study [38] sheds light on the role of electrostatic interactions in the accommodation of cognate aa-tRNA in the A site as well.
In the next step, the ratchet-like rotation of the 30S ribosomal subunit with respect to the 50S ribosomal subunit causes a rearrangement of the electrostatic potential of the A-site, i.e., a reduction of the positive potentials around the A-site. It thus promotes the translocation [27] of tRNA from the A-site to the P-site and then from the P-site to the E-site. In the case of release factors, the positive potential of L11 allows the proper accommodation of the negatively charged release factors RF1 and RF2. After the RF3-induced ribosome rearrangements, the interactions between RF1/RF2 and the L11 region break, which causes the release of RF1/RF2 [39,40]. On the other hand, the wide pI value distribution of IF2 and RRF suggests that conservation of acidic or basic character may not be as important for these two translation factors as for the other translation factors in bacteria.
… [52], and RF3-70S ribosome (PDB ID: 6GXM) [53]. The gray dotted boxes show the surface charge distribution of the elongation and release factors [13][14][15][16][17][18][19]. All the domains of these factors are marked on the right side and the left side of their structures. The calculated electrostatic net charge of EF-Tu (PDB ID: 2FX3) was −1.40e+01 e, EF-G (PDB ID: 3J0E) was −1.50e+01 e, EF-4 (PDB ID: 3DEG) was −2.00e+01 e, EF-P (PDB ID: 3OYY) was −8.00e+00 e, RF1 (PDB ID: 4V7P) was −1.40e+01 e, RF2 (PDB ID: 5MGP) was −2.60e+01 e, and RF3 (PDB ID: 4V85) was −7.00e+00 e. Red and blue colours indicate negative and positive charge, respectively, whereas white indicates neutral charge
In this study, we took into account a wide range of bacterial species spanning the entire domain Bacteria. To survive, bacteria evolve numerous mechanisms to adapt to their environment. The habitats of these bacteria vary widely, including soil, water, food, industrial waste, the deep ocean, acidic hot springs, symbiotic and parasitic relationships with animals and plants, and even radioactive waste [41]. The nature of these bacteria also differs (e.g., acidophiles, alkaliphiles, aerobic, anaerobic, phototrophs, chemotrophs, nitrogen-fixing bacteria, nitrifying and denitrifying bacteria, bioluminescent bacteria, free-living bacteria, enteric bacteria, and obligate intracellular parasites) [41]. Irrespective of the wide range of phylogeny, habitat, and nature of these bacteria, our statistical tests showed that, except for IF2 and RRF, all the initiation, elongation, and release factors are conserved in terms of pI values throughout the domain Bacteria.
Besides the elongation factors, the highly conserved basic pI value distribution of the initiation factors IF1 and IF3 indicated that the pI values of these two translation factors are likewise unaffected by the phylogeny, nature, or habitat of the bacteria. The wide pI value distribution of IF2 and RRF (Fig. 4a and Fig. 4b, respectively) revealed that different phyla of bacteria have different pI value distribution traits.

Conclusions
We concluded our study with a pictorial description of our findings in Fig. 5, where we depicted the mean pI value distribution along with the standard deviation values of all the translation factors in bacteria that showed distinct translation event specificity.

Data collection
We studied the following bacterial translation factors that directly interact with the ribosome: IF1, IF2, IF3, EF-Tu, EF-G, EF-4, EF-P, RF1, RF2, RF3, and RRF. Of the reviewed and unreviewed categories of protein sequences in the UniProt [42] database, we collected only the reviewed sequences to ensure the accuracy of the sample data. We also removed all incomplete fragments and repeated sequences to avoid erroneous assumptions.
We calculated pI and MW values from 4262 reviewed amino acid sequences of the bacterial translation factors; the pI values, MW values, and corresponding accession numbers are provided in Additional file 1; Table S1.
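The filtering step described above could be scripted along the following lines. This Python sketch is only an illustration: it assumes the sequences were exported in FASTA format, that fragments carry the "(Fragment)" marker in the header as in UniProt FASTA exports, and that "repeated sequences" means exact duplicate sequences; the function names are hypothetical, and the original curation may have been done differently.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def clean_records(records):
    """Drop fragment entries and exact duplicate sequences, keeping order."""
    seen = set()
    kept = []
    for header, seq in records:
        if "(Fragment)" in header:  # assumption: fragments are marked this way
            continue
        if seq in seen:             # assumption: "repeated" = identical sequence
            continue
        seen.add(seq)
        kept.append((header, seq))
    return kept
```

For example, `clean_records(parse_fasta(text))` applied to a downloaded FASTA file would return only complete, non-duplicated entries.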

Method of pI value and MW value calculation
We used the "Compute pI/MW tool" (http://web.expasy.org/compute_pi/) of the ExPASy bioinformatics resource portal to calculate the pI and MW values. We chose this web server because its calculated pI values show reasonably good agreement with experimentally determined pI values [6][7][8].
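For orientation, the MW of a polypeptide can be approximated from its sequence as the sum of average residue masses plus one water molecule. The Python sketch below uses commonly tabulated average residue masses; whether the web server uses exactly these values is an assumption, and the function name is hypothetical.

```python
# Average residue masses in Da (amino acid minus one water), as commonly
# tabulated; assumed here, not taken from the web server itself.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594,
    "N": 114.1038, "D": 115.0886, "Q": 128.1307, "K": 128.1741,
    "E": 129.1155, "M": 131.1926, "H": 137.1411, "F": 147.1766,
    "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water is added for the free N- and C-termini


def molecular_weight(sequence: str) -> float:
    """Average molecular weight (Da) of an unmodified linear polypeptide."""
    seq = sequence.upper()
    unknown = set(seq) - set(RESIDUE_MASS)
    if not seq or unknown:
        raise ValueError(f"empty sequence or unsupported residues: {sorted(unknown)}")
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER_MASS
```

As a sanity check, the dipeptide "GG" gives about 132.12 Da, which matches the known molecular weight of glycylglycine.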

Statistical test
We performed the asymptotic test [11] for the 5% and 95% quantiles of the translation proteins. We calculated the p values corresponding to the null hypotheses for the 5% and 95% quantiles in MATLAB (R2019b) software (https://in.mathworks.com/products/new_products/release2019b.html). We generated all the graphs of this study in OriginPro 8.5 software (Origin (Pro), "Version 2019b") [43].
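To make the logic of a one-sided quantile test concrete, the Python sketch below uses a standard large-sample approach: under the boundary of the null hypothesis, the number of observations at or below the threshold c is binomial with parameters n and p, which is approximated by a normal distribution. This is a swapped-in textbook procedure given for illustration only; it is not necessarily the exact test of [11] or the MATLAB implementation used here, and the function names are hypothetical.

```python
import math


def _normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def quantile_test(values, p: float, c: float, null: str) -> float:
    """Approximate p value for a one-sided test on the p-th quantile.

    null="ge" tests H0: q_p >= c (many values <= c count against H0).
    null="le" tests H0: q_p <= c (few values <= c count against H0).
    """
    n = len(values)
    k = sum(1 for v in values if v <= c)           # observations at or below c
    z = (k - n * p) / math.sqrt(n * p * (1.0 - p))  # normal approximation
    if null == "ge":
        return 1.0 - _normal_cdf(z)
    if null == "le":
        return _normal_cdf(z)
    raise ValueError("null must be 'ge' or 'le'")
```

With this sketch, a p value above 0.05 for both H0: q05 ≥ 7 and H0: q95 ≤ 9.95 would mean that neither hypothesis is rejected, which is how the 90% ranges are read in this study.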

Electrostatic potential calculation
We downloaded the following atomic coordinates from the Protein Data Bank (PDB) database (www.rcsb.org): 2FX3 of EF-Tu, 3J0E of EF-G, 3DEG of EF-4, 3OYY of EF-P, 4V7P of RF1, 5MGP of RF2, and 4V85 of RF3. We deleted all ions, solvent molecules, and other chemical modifications using Chimera software [44] (https://www.rbvi.ucsf.edu/chimera/). We calculated the charges of these factors with the APBS-PDB2PQR software (https://server.poissonboltzmann.org/), which uses the Poisson-Boltzmann equation to calculate the charge of a molecule. We used the output file to visualize the surface charge of these factors in Chimera.
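A net charge such as the values reported with Fig. 3 can be cross-checked from a PQR file by summing the per-atom charges. The Python sketch below assumes whitespace-delimited PQR records in which the charge is the second-to-last field and the radius is the last; the function name is hypothetical and the example records are synthetic.

```python
def net_charge_from_pqr(lines) -> float:
    """Sum the per-atom charges of ATOM/HETATM records in PQR-formatted lines."""
    total = 0.0
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            fields = line.split()
            # Assumed layout: ... x y z charge radius
            total += float(fields[-2])
    return total


# Synthetic example records (not taken from any of the structures above).
example = [
    "REMARK synthetic example",
    "ATOM 1 N   GLU 1 0.000 0.000 0.000  0.10 1.85",
    "ATOM 2 OE1 GLU 1 1.000 0.000 0.000 -0.55 1.66",
    "ATOM 3 OE2 GLU 1 1.000 1.000 0.000 -0.55 1.66",
]
example_charge = net_charge_from_pqr(example)
```

For a real file, the same function could be called as `net_charge_from_pqr(open(path))`, where `path` points to the PQR output.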

Phylogenetic analysis
We used MEGA7 software [45,46] to investigate the distribution of the pI values of the IF2 and RRF proteins across the bacterial taxonomy. We used primary amino acid sequences to construct the phylogenetic trees for both IF2 and RRF. We used 500 bootstrap replicates to analyze each phylogenetic tree, and we present here the tree with the highest log-likelihood.
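The resampling behind the bootstrap replicates can be illustrated with a short Python sketch: each replicate is built by drawing alignment columns with replacement, and a tree would then be inferred from every replicate. Only the resampling step is shown; the tree inference itself was done in MEGA7, and the function names and the fixed seed are assumptions of this sketch.

```python
import random


def bootstrap_alignment(alignment: dict, rng: random.Random) -> dict:
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    # The same sampled column indices are applied to every sequence.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in cols)
            for name, seq in alignment.items()}


def bootstrap_replicates(alignment: dict, n_replicates: int = 500,
                         seed: int = 0) -> list:
    """Generate n_replicates resampled alignments with a reproducible seed."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]
```

The bootstrap percentage of a branch is then the share of replicate trees in which that branch is recovered.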
Fig. 4 Phylogenetic tree constructed using primary amino acid sequences of IF2 and RRF proteins. a Phylogenetic analysis of IF2 protein. b Phylogenetic analysis of RRF protein. In both cases, a and b, we analyzed the evolutionary history using the Maximum Likelihood method based on the JTT matrix-based model. We took 500 bootstrap replicates to build the phylogenetic tree. Numbers near the branches refer to the bootstrap percentages (only values greater than 50% are shown). The trees are drawn to scale, with branch lengths measured in the number of substitutions per site; all positions containing gaps and missing data were eliminated. Blue triangles and red circles refer to the basic and acidic pI values respectively