Comparative proteome analysis of psychrophilic versus mesophilic bacterial species: Insights into the molecular basis of cold adaptation of proteins

Background Cold-adapted or psychrophilic organisms grow at low temperatures, where most other organisms cannot grow. This adaptation requires a vast array of sequence, structural and physiological adjustments. To understand the molecular basis of cold adaptation of proteins, we analyzed proteomes of psychrophilic and mesophilic bacterial species and compared their differences in amino acid composition and substitution patterns to investigate their likely association with growth temperature. Results In psychrophilic bacteria, serine, aspartic acid, threonine and alanine are overrepresented in the coil regions of secondary structures, whilst glutamic acid and leucine are underrepresented in the helical regions. Compared to mesophiles, psychrophiles have a significantly higher proportion of amino acids that contribute to higher protein flexibility in the coil regions of proteins, such as those with tiny/small or neutral side chains. Amino acids with aliphatic, basic, aromatic and hydrophilic side chains are underrepresented in the helical regions of psychrophilic proteins. For several amino acids, the substitution patterns between orthologous proteins of psychrophiles and mesophiles differ significantly from the substitution patterns between orthologous proteins within the mesophiles or within the psychrophiles. Conclusion The current results provide quantitative substitution preferences (or avoidances) of amino acids that lead to the adaptation of proteins to cold temperatures. These findings could help future efforts to select mutations for the rational design of proteins with enhanced psychrophilic properties.


Background
Microorganisms that live under forbidding conditions are called extremophiles, and their discovery highlights the remarkable adaptability of primitive life-forms. These microorganisms are grouped according to the conditions under which they grow optimally: acidophiles (optimal growth at acidic pH), alkaliphiles (thriving at alkaline pH), barophiles (surviving under great pressure), endoliths (living deep inside rocks), halophiles (thriving in high salt concentrations), psychrophiles (optimal temperature below 20°C), thermophiles (optimal temperature between 45 and 80°C) and hyperthermophiles (optimal temperature above 80°C) [1]. Of the extreme conditions found in the Earth's biosphere, temperatures below 10°C cover by far the largest area. For example, three fourths of the Earth's surface is covered by oceans, which maintain an average temperature of one to three degrees Celsius, and the vast land areas of the Arctic and Antarctic are permanently frozen [1]. Other cold habitats include cold deserts, high alpine soils, sea ice, cold caves, marine sediments, permafrost soils, glaciers and snow.
The majority of known psychrophiles are archaea and bacteria, together with a few species of yeast, fungi and algae [2]. Thriving at temperatures close to the freezing point of water, which are life-endangering for most organisms, requires a vast array of adaptations in all cellular components, including membranes, energy-generating systems, the protein synthesis machinery, biodegradative enzymes and the components responsible for nutrient uptake, in order to maintain metabolism and sustain growth and reproduction at these low temperatures [3,4]. Having evolved such special mechanisms, psychrophiles have successfully colonized these niches [2,5]. Psychrophilic proteins, especially enzymes, display sequences and structures comparable to those of their mesophilic and (hyper)thermophilic homologs, yet work efficiently as catalysts at low temperatures [6]. The thermolability of these proteins at moderate temperatures makes them attractive for a wide range of industrial applications in biotechnology, bioremediation, food, textiles, detergents and biocatalysis under low-water conditions [5-9].
Historically, starting from the mid-1970s, much attention was paid to the sequence and structural attributes that contribute to the adaptation of proteins (mainly enzymes) to high temperatures. Many investigators have compared sequence- and structure-based parameters of thermophilic and mesophilic proteins [10]. Following pioneering efforts in the late 1990s to solve the three-dimensional structures of cold-adapted enzymes from Antarctic microorganisms, such as alpha-amylase [11], alkaline protease [12], triose phosphate isomerase [13] and malate dehydrogenase [14], and with a handful of such structures available in the Protein Data Bank (PDB), several groups have addressed the structural basis of cold adaptation of proteins [4,15-20].
The steady increase in the number of sequenced extremophile proteomes has opened many new avenues for understanding adaptation to extreme conditions [16,21-25]. A comprehensive comparison of global amino acid preferences and substitution patterns deduced from the proteomes of different organisms is now possible [26-28].
Using homologous sequences, clustering and various statistical methods, we conducted an extensive analysis of the proteomes of psychrophilic, mesophilic, thermophilic and hyperthermophilic microorganisms to examine a possible correlation between amino acid substitution patterns and adaptation to their respective optimal growth conditions. In this manuscript we discuss the results of a comparative analysis of the fully sequenced proteomes of six psychrophilic and six mesophilic species.

Results
On average, we analyzed 2,816 proteins with 875,219 amino acids per mesophile proteome and 3,665 proteins with 1,169,678 amino acids per psychrophile proteome. The amino acid (AA) frequencies given in Table 1 show that some AAs differ significantly between psychrophile and mesophile proteomes. The mesophile proteomes show larger standard deviations of residue frequencies than the psychrophile proteomes, indicating that the six mesophile proteomes we used are considerably more divergent than the psychrophile proteomes. The frequencies of individual amino acids and of property groups were further analyzed with Student's t-test.

Amino acid composition preferences
The t-test results show significant preferences in the frequencies of amino acids and property groups in psychrophilic proteomes compared to mesophilic proteomes, or vice versa (Table 1). The overall AA composition is broadly similar in both groups of proteomes. However, as indicated by the t-values in Table 1, a few residues, namely A, D, S and T, are significantly preferred in psychrophiles compared to mesophiles. Conversely, E and L are significantly less favored in psychrophile proteomes. Among property groups, tiny/small and neutral amino acids are significantly preferred in psychrophiles, whereas charged, basic, aromatic and hydrophilic groups are significantly less favored, as shown by their corresponding t-values in Table 1. When we compared the AA compositions of the aligned sequences of orthologous proteins alone (data not shown), we observed similar trends.

Secondary Structural Elements
The AA compositions of psychrophilic and mesophilic proteomes in the three major secondary structural elements, α-helices, β-sheets and coils, are given in Table 2. Taken together, the psychrophilic proteomes contain significantly fewer residues (~2%) in α-helices and significantly more residues (~2%) in coil regions. Most amino acids have similar compositions in both groups of proteomes. However, E, F, L, N and Y show significantly lower frequencies in the α-helices of psychrophilic proteomes, and A, D, G, S, T and V are significantly more frequent in their coil regions. E is also significantly less frequent in the coil regions of psychrophilic proteomes. Apart from an increase in alanine, the β-sheets of psychrophile proteomes show no significant changes compared to mesophiles.
When evaluated by AA property group, most groups show significantly lower frequencies in the helices of psychrophilic proteomes. Apart from the tiny group, the β-sheet regions of psychrophile proteomes show no significant changes (Table 2). The tiny, small, hydrophobic, neutral, acidic, aliphatic and non-polar groups show significantly higher frequencies in the coil/loop regions of psychrophilic proteomes.

Comparative Proteome Analysis
To identify the residue substitutions that psychrophilic proteins are likely to have undergone as the species adapted to cold temperatures, we performed a comparative proteome analysis based on the amino acid substitutions between orthologous protein sequences of psychrophile and mesophile proteomes. Only orthologous sequence pairs with significant length coverage were considered for the analysis. The coverage of hits for each proteome is shown in Table 3. On average, 16.3% of psychrophile proteins have orthologs in the mesophiles we used. This was cross-checked using the mesophile proteomes as queries and the psychrophile proteomes as subjects, which yielded 13.9% orthologs (Table 3). We used the alignments of these sequence pairs to compute amino acid substitutions from mesophilic to psychrophilic proteomes and vice versa, and averaged the resulting values. On average, ten to twenty-five percent of the sequences in individual proteomes had best-hit homologs among members of the other thermal group (Table 3). This percentage depended on the size of the query proteome: the more proteins a query proteome contained, the higher the percentage of hits in the subject proteomes searched.
This may be because several paralogous sequences select the same protein as their ortholog (this redundancy was removed from our final data). We also identified orthologous proteins within the psychrophiles and within the mesophiles, and calculated the substitutions between them to use as background substitution frequencies [see Eqn. (ii) and (iii)]. On average, we observed 16.7% and 17.1% orthologous proteins within the psychrophilic and mesophilic proteomes, respectively (Table 3).
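The best-hit ortholog selection and the removal of paralog redundancy described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the pipeline used in this study: hits are assumed to be already parsed from BLASTP output into a hypothetical `Hit` record, "best" is taken to mean the highest bit score, and the `min_coverage=0.5` default is a hypothetical stand-in for the unspecified "significant length coverage". The E-value cutoff of 10⁻³ follows Methods.

```python
from typing import NamedTuple

class Hit(NamedTuple):
    """One parsed BLASTP hit (hypothetical record; fields mirror tabular output)."""
    query: str
    subject: str
    evalue: float
    coverage: float   # aligned length / query length (assumed definition)
    bitscore: float

def best_hit_orthologs(hits, max_evalue=1e-3, min_coverage=0.5):
    """Return sorted (query, subject) ortholog pairs.

    Step 1: per query, keep the highest-bitscore hit passing both filters.
    Step 2: if several queries (e.g. paralogs) chose the same subject,
            keep only the highest-scoring one to remove redundancy.
    """
    best_per_query = {}
    for hit in hits:
        if hit.evalue > max_evalue or hit.coverage < min_coverage:
            continue
        current = best_per_query.get(hit.query)
        if current is None or hit.bitscore > current.bitscore:
            best_per_query[hit.query] = hit

    best_per_subject = {}
    for hit in best_per_query.values():
        current = best_per_subject.get(hit.subject)
        if current is None or hit.bitscore > current.bitscore:
            best_per_subject[hit.subject] = hit

    return sorted((h.query, h.subject) for h in best_per_subject.values())

# Hypothetical hits, not data from this study.
hits = [
    Hit("p1", "m1", 1e-50, 0.9, 200.0),
    Hit("p1", "m2", 1e-10, 0.9, 80.0),
    Hit("p2", "m1", 1e-20, 0.8, 90.0),   # paralog of p1 selecting the same subject
    Hit("p3", "m3", 1e-2, 0.9, 40.0),    # fails the E-value cutoff
    Hit("p4", "m4", 1e-30, 0.2, 120.0),  # fails the coverage filter
    Hit("p5", "m5", 1e-40, 0.7, 150.0),
]
orthologs = best_hit_orthologs(hits)
```

Keeping only one query per subject is one simple way to drop the paralog redundancy mentioned above; a reciprocal-best-hit criterion would be a stricter alternative.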

Amino Acid Substitution Patterns
Log-odds scores (LOS) of AA substitutions were calculated from the frequencies of substitutions between orthologous proteins of psychrophiles and mesophiles, normalized by the frequencies of substitutions within the proteomes of the same thermal group. This normalization largely cancels substitutions driven by factors other than temperature, so the scores mainly reflect substitutions associated with cold adaptation. Table 4 shows LOS_Meso values computed with Eqn. (ii) as described in Methods. These values clearly show that psychrophilic proteins avoid E, F, K, N and Y and prefer A, D, G, S and T compared to mesophilic proteins. The individual values in Table 4 show the extent to which particular substitutions are favored (positive LOS_Meso) or avoided (negative LOS_Meso). For example, W in mesophiles substituted by S in psychrophiles is highly favored, with an LOS of 10.9, whereas G in mesophiles substituted by K in psychrophiles is avoided, with an LOS of -12.4. LOS_Psychro scores [Eqn. (iii)], calculated using substitution frequencies within psychrophiles as the normalizing factor, gave similar results (data not shown).
Student's t-test was further applied to evaluate the significance of the LOS substitution scores; these data are given in Additional files 1 and 2. LOS_Meso values (Table 4) whose corresponding t-values (Additional file 1) are greater than 1.37 or less than -1.37 can be considered significantly preferred or avoided, respectively, at the 90% confidence level (shaded in color in Table 4). About 45% of the substitutions in the table are significant. The LOS_Meso values for AA property-group substitutions are given in Table 5, and their corresponding t-values in Additional file 2. Table 5 shows a clear preference for tiny, small and neutral AAs in psychrophiles, whereas charged (both basic and acidic), aromatic and hydrophilic AAs are significantly avoided compared to mesophiles.

Discussion
Our objective in this study was to systematically analyze the compositional variation and substitution preferences of amino acids in the proteomes of psychrophiles compared to those of mesophiles, in order to identify general proteome-wide characteristics of cold adaptation. We considered compositional differences across whole proteomes as well as in orthologous proteins alone. The analysis was performed at several levels: simple amino acid compositions, Student's t-tests and, finally, substitution patterns in orthologous proteins. Some of these methods have been applied previously [26,29-31], but not to complete proteomes.
In psychrophiles, individual residue compositions show a significant preference for A, D, S and T, a significant avoidance of E and L, a moderate preference for G and a moderate avoidance of F and K (Table 1). These preferences and avoidances are consistent with the lower helical content of psychrophilic proteins: S, D and G are helix breakers [32] and T is helix-indifferent, whereas E favors the formation of helices and L stabilizes them [32], and both are strongly avoided in psychrophiles. D has been observed to be unstable at high temperatures, and its frequency decreases as the optimal growth temperature of organisms increases [33]. The reverse trend is observed for E, which favors ion-pair interactions that form salt bridges at higher temperatures [34,35]. Helix-destabilizing beta-branched residues (I, T and V) are preferred in the beta sheets and loop regions of psychrophilic proteins [36,37]. The substitution patterns in the orthologous proteins of the two temperature groups show several interesting features that are not readily seen in the simple AA compositions. At the same time, they strongly support the observed compositional differences and give additional insights, such as which specific substitutions are favored or avoided, as shown by the LOS values.

Nonpolar Amino Acids
Our results confirm that the overall composition of the nonpolar AA group differs little between the proteins of psychrophiles and mesophiles, but in psychrophilic proteins its frequency decreases in helices and increases significantly in loop regions. This agrees with earlier findings that most psychrophilic proteins have more nonpolar amino acids on their exposed surface [35,38], since more loops are found in surface regions. Among nonpolar residues, I, L and V form the aliphatic group, which is significantly reduced in psychrophiles, favoring protein flexibility. It is widely accepted that aliphatic amino acids contribute to the hydrophobic interactions that maintain conformational stability and rigidity in the core of proteins [36,39,40]. The lower average hydropathy and lower aliphatic content observed in psychrophiles are mainly due to their significantly lower L content [35].

Tiny and Small Amino acids
Tiny and small amino acids have short side chains; they cannot participate in long-range interactions between secondary structural elements and are usually confined to local interactions. Overall, their frequencies are significantly increased in the beta sheets and loops of psychrophilic proteins compared to their mesophilic counterparts, as is also clear from the substitutions observed in orthologous proteins. G lacks a side chain, is more flexible with greater rotational freedom, can create cavities in the core of protein structures [39] and has been shown to be less frequent in thermophiles [41]. P, in contrast, has a pyrrolidine ring that restricts its conformations and has been shown to be more frequent in thermophiles [40]. Our LOS values confirm that G is preferred and P is avoided in psychrophiles compared to mesophiles.

Charged amino acids
Charged residues are polar and hydrophilic. They form electrostatic ion-pair interactions, an important force for maintaining conformational stability at the protein surface [37,39,42]. Thermophilic proteins have been found to contain more charged residues than mesophilic proteins [36,40]. The present analysis supports the notion that charged residues, especially basic and hydrophilic ones, are significantly avoided in psychrophiles, as seen from both compositions and LOS scores (Table 5). The most striking feature of psychrophilic proteins is an increase in D and a decrease in E compared to mesophiles. Charged residues in mesophiles are mainly replaced by small and tiny residues in psychrophiles.

Aromatic Amino acids
Psychrophilic proteins and their secondary structural elements show a significant decrease in aromatic amino acids, especially F and Y, which can interact with the cationic side chains of K and R to form cation-π interactions that play an important role in stabilizing the three-dimensional structure of proteins [43-45]. Our results show that this decrease in aromatic residues occurs through substitution by tiny/small and neutral amino acids in psychrophiles (Table 5). Finally, our LOS values, combined with PAM or BLOSUM substitution scores, could assist in selecting suitable mutations for engineering mesophilic proteins to function optimally at cold temperatures, or vice versa. For example, substituting D for E in a mesophilic protein may not change its chemical character significantly but may help optimize its function at cold temperatures. Along similar lines, K is strongly avoided and R is preferred in psychrophiles, although both are basic amino acids.

Conclusion
We analyzed the compositions of individual amino acid residues and amino acid groups and their distribution in secondary structures, and then computed and quantified their substitution patterns and directional preferences between mesophiles and psychrophiles. We observed significant differences in amino acid composition between mesophilic and psychrophilic proteins, summarized as follows: (i) an increase in the frequency of A, D, S and T (of which D and S are helix breakers and T is helix-indifferent) and a decrease in the helix-favoring E and L; (ii) an increase in small/tiny and neutral residues, which contribute to protein flexibility, and a decrease in charged amino acids, particularly basic and hydrophilic residues that contribute to ionic interactions; (iii) a decrease in aromatic residues, which contribute to cation-π interactions; (iv) a decrease in aliphatic residues, which cover and mask the hydrophobic pockets involved in stabilizing protein structure; (v) a reduction in helix-preferring residues and an increase in coil-forming residues; (vi) a significant level of substitution of aliphatic and charged amino acids by tiny/small or neutral amino acids; and (vii) the results of this analysis, especially the significant t-values of LOS substitution pairs, can serve as a knowledge base for the rational design of mutations to engineer mesophilic proteins to function optimally at cold temperatures, or vice versa.

Methods
Proteome sequences of six completely sequenced species each of psychrophiles and mesophiles (listed in the footnote of Table 1) were collected. They were selected randomly from independent genera of mesophiles and from 6 of the 9 available completely sequenced psychrophile genomes, to control for possible effects of phylogenetic non-independence (PNI), since related species may share similar traits through common ancestry. The proteome sequences were downloaded in FASTA format from the NCBI proteome project server. The growth temperatures of these species were obtained from NCBI [46] and/or PGTdb (Prokaryotic Growth Temperature database) [47]. We computed the frequencies of amino acid residues in the protein sequences of the psychrophilic and mesophilic proteomes. We also grouped the amino acids into 12 property groups [48]: acidic: D and E; aliphatic: I, L and V; aromatic: H, F, W and Y; basic: R, H and K; the remaining groups are listed in Table 1. Some amino acids are included in more than one property group.
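The residue and property-group frequencies described above can be sketched as follows. This is a minimal illustration, not the authors' code: it includes only the four groups spelled out in the text (the other eight are defined in Table 1), and skipping non-standard letters such as X or U is an assumption. A group frequency is taken as the sum of its members' frequencies, so H contributes to both the aromatic and the basic group.

```python
from collections import Counter

STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"

# Only the four groups spelled out in Methods; the other eight are defined in Table 1.
PROPERTY_GROUPS = {
    "acidic": "DE",
    "aliphatic": "ILV",
    "aromatic": "HFWY",
    "basic": "RHK",
}

def aa_frequencies(sequences):
    """Fraction of each standard amino acid among all residues of a proteome.

    Non-standard letters (e.g. X, U, B) are skipped, which is an assumption.
    """
    counts = Counter()
    for seq in sequences:
        counts.update(res for res in seq.upper() if res in STANDARD_AA)
    total = sum(counts.values())
    return {aa: counts[aa] / total for aa in STANDARD_AA}

def group_frequencies(aa_freq):
    """Sum member frequencies per group; H is counted in both aromatic and basic."""
    return {name: sum(aa_freq[aa] for aa in members)
            for name, members in PROPERTY_GROUPS.items()}

toy_proteome = ["MKDEILV", "HFWYRK"]  # hypothetical two-protein proteome
freqs = aa_frequencies(toy_proteome)
groups = group_frequencies(freqs)
```

Because groups overlap, group frequencies need not sum to one, unlike the per-residue frequencies.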

Student t-test
The t-test is a standard tool for comparing the means of two groups of data; it is in effect a signal-to-noise ratio. In the present analysis, we compared the mean frequencies of single amino acids, 12 property groups of amino acids and three secondary structural elements, computed from all protein sequences of the psychrophilic and mesophilic proteomes considered. The t-values are calculated as follows:

$$
t = \frac{\bar{F}_{Psychro} - \bar{F}_{Meso}}{\sqrt{\dfrac{Var_{Psychro}}{n_{Psychro}} + \dfrac{Var_{Meso}}{n_{Meso}}}}
$$

where Var_Psychro and Var_Meso are the variances of the frequencies of a residue or property group, F_Psychro and F_Meso are the mean frequencies in the psychrophilic and mesophilic proteomes, respectively, and n_Psychro and n_Meso are the numbers of psychrophilic and mesophilic proteomes investigated in this study, respectively. Critical values of Student's t-distribution at various probabilities are given in Table 6. If the t-value is positive and greater than the critical value at 10% probability (1.372), the mean frequency in psychrophilic proteomes (F_Psychro) is significantly greater than that in mesophilic proteomes (F_Meso) at the 90% or higher confidence level. If the t-value of a residue or property group is negative and less than -1.372, the mean frequency in psychrophilic proteomes (F_Psychro) is significantly less than that in mesophilic proteomes (F_Meso) at the 90% or higher confidence level [18,40,49].
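The t-value and the ±1.372 decision rule described above can be sketched with Python's standard `statistics` module. This is a minimal sketch, not the authors' implementation: each proteome contributes one frequency, the use of sample variance (n - 1 denominator) is an assumption, and the input numbers are hypothetical. The value 1.372 is the one-tailed 10% critical value of Student's t with 10 degrees of freedom, consistent with 6 + 6 proteomes.

```python
import statistics
from math import sqrt

# One-tailed critical value at 10% quoted in Methods; it equals the tabulated
# Student t value for 10 degrees of freedom (consistent with 6 + 6 proteomes - 2).
T_CRITICAL_90 = 1.372

def t_value(freq_psychro, freq_meso):
    """t = (mean_P - mean_M) / sqrt(Var_P/n_P + Var_M/n_M), one frequency per proteome."""
    var_p = statistics.variance(freq_psychro)  # sample variance (n - 1); an assumption
    var_m = statistics.variance(freq_meso)
    diff = statistics.mean(freq_psychro) - statistics.mean(freq_meso)
    return diff / sqrt(var_p / len(freq_psychro) + var_m / len(freq_meso))

def classify(t):
    """Apply the +/-1.372 thresholds described in the text."""
    if t > T_CRITICAL_90:
        return "preferred in psychrophiles"
    if t < -T_CRITICAL_90:
        return "avoided in psychrophiles"
    return "not significant"

# Hypothetical per-proteome frequencies of one residue (not data from this study).
psychro = [0.09, 0.10, 0.11, 0.09, 0.10, 0.11]
meso = [0.08, 0.09, 0.10, 0.08, 0.09, 0.10]
t = t_value(psychro, meso)
print(round(t, 2), classify(t))
```

Here both groups have the same spread and the means differ by 0.01, giving t of roughly 1.94, which passes the 1.372 threshold.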

Secondary Structure Prediction
We predicted secondary structural elements in protein sequences using GTOP [50] and/or PSIPRED [51], and used these predictions to compute the frequencies of amino acids and property groups of residues in three major secondary structural states: helix (H), strand (E) and coil (C). PSIPRED is a highly reliable secondary structure prediction method with a reported accuracy of ~83%. We also tested its accuracy on known psychrophilic and mesophilic proteins to check whether its predictions differ significantly for psychrophilic proteins. On about 25 proteins (approximately 5,000 residues) from each group, PSIPRED prediction accuracy was 78.42% for psychrophilic and 80.89% for mesophilic proteins.
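Given per-residue predictions, the composition within each secondary-structure state and the per-residue accuracy check described above can be sketched as follows. This sketch assumes predictions have already been mapped to a three-state H/E/C string (it does not run GTOP or PSIPRED), and it assumes the reported accuracy is the usual three-state per-residue measure (Q3); the toy inputs are hypothetical.

```python
from collections import Counter, defaultdict

def _normalize(counter):
    """Convert raw counts to fractions of the counter's total."""
    total = sum(counter.values())
    return {key: n / total for key, n in counter.items()}

def sse_composition(records):
    """Amino acid frequencies within each secondary-structure state.

    records: iterable of (sequence, ss) pairs where ss is a per-residue
    string over {'H', 'E', 'C'} of the same length as the sequence.
    """
    counts = defaultdict(Counter)
    for seq, ss in records:
        if len(seq) != len(ss):
            raise ValueError("sequence and secondary-structure string differ in length")
        for residue, state in zip(seq, ss):
            counts[state][residue] += 1
    return {state: _normalize(c) for state, c in counts.items()}

def q3_accuracy(predicted, observed):
    """Percentage of residues whose predicted three-state label matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("strings differ in length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)

comp = sse_composition([("ADSTEL", "CCCHHH")])  # hypothetical toy prediction
acc = q3_accuracy("HHHCCE", "HHCCCE")           # 5 of 6 residues match
```

Pooling residues over all proteins of a proteome before normalizing, as done here, weights long proteins more heavily than averaging per-protein compositions would.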

Comparative Analysis of Amino Acid Substitutions
All protein sequences of each mesophilic species in the dataset were searched against each psychrophilic proteome, and vice versa, using BLASTP [52] with an expectation value cutoff of 10⁻³ and considerable length coverage. For each protein in a query proteome, we selected the pairwise alignment with its best-hit homolog (ortholog) in the subject proteome. The pairwise alignments (excluding gapped regions) were parsed to count amino acid substitutions between the two proteins from the respective proteomes. The substitution counts were normalized by the total number of amino acids in each proteome pair individually and then across all pairs. The resulting substitution frequencies were used to calculate two types of log-odds scores (LOS):

$$
\mathrm{LOS}_{Meso}(X \rightarrow Y) = \log \frac{F(X_{Meso} \rightarrow Y_{Psychro})}{F(X_{Meso} \rightarrow Y_{Meso})} \qquad \text{(ii)}
$$

$$
\mathrm{LOS}_{Psychro}(X \rightarrow Y) = \log \frac{F(X_{Meso} \rightarrow Y_{Psychro})}{F(X_{Psychro} \rightarrow Y_{Psychro})} \qquad \text{(iii)}
$$

where F(X_Meso → Y_Psychro) represents the normalized frequency with which amino acid X in a mesophile is substituted by amino acid Y in a psychrophile, and F(X_Meso → Y_Meso) and F(X_Psychro → Y_Psychro) are the background substitution frequencies between orthologs within the mesophilic and within the psychrophilic proteomes, respectively. Because the background substitution frequencies appear in the denominator, the LOS values indicate substitution patterns that are predominantly due to thermal adaptation and minimize the effect of substitutions arising from speciation events during evolution.
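The substitution counting and LOS_Meso calculation described above can be sketched in Python. This is a simplified illustration under several assumptions that the text does not settle: identities (X→X) are counted alongside substitutions, each proteome pair is normalized by its total number of aligned residue pairs before averaging over pairs, and the log base (2) and the small pseudocount that avoids log(0) are choices made here, not the study's scaling. The alignments are hypothetical toy strings.

```python
import math
from collections import Counter

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

def count_substitutions(aligned_pairs):
    """Count X->Y over aligned columns, skipping gaps and non-standard letters.

    aligned_pairs: (row_a, row_b) strings of equal length taken from pairwise
    alignments, e.g. the mesophile row and the psychrophile row.
    Identities (X->X) are counted too; whether the study did so is an assumption.
    """
    counts = Counter()
    for row_a, row_b in aligned_pairs:
        for x, y in zip(row_a, row_b):
            if x in AMINO_ACIDS and y in AMINO_ACIDS:
                counts[(x, y)] += 1
    return counts

def mean_frequencies(per_pair_counts):
    """Normalize counts within each proteome pair, then average over pairs."""
    freqs = [{k: n / sum(c.values()) for k, n in c.items()} for c in per_pair_counts]
    keys = set().union(*freqs)
    return {k: sum(f.get(k, 0.0) for f in freqs) / len(freqs) for k in keys}

def log_odds(foreground, background, pseudo=1e-6):
    """LOS(X->Y) = log2(F_fg / F_bg); log base and pseudocount are assumptions."""
    keys = set(foreground) | set(background)
    return {k: math.log2((foreground.get(k, 0.0) + pseudo) /
                         (background.get(k, 0.0) + pseudo)) for k in keys}

# Hypothetical toy alignments: one mesophile-psychrophile proteome pair (foreground)
# and one mesophile-mesophile pair (background, as in Eqn. (ii)).
fg = mean_frequencies([count_substitutions([("AE-K", "SDGK")])])
bg = mean_frequencies([count_substitutions([("AEK", "SEK")])])
los_meso = log_odds(fg, bg)
```

In this toy case A→S and K→K occur equally often in foreground and background and score 0, E→D occurs only between thermal groups and scores positive, and E→E occurs only in the background and scores negative. Swapping in a psychrophile-psychrophile background gives the LOS_Psychro variant of Eqn. (iii).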

Authors' contributions
MRPR and BVBR conceived of the study and participated in its design and coordination. MRPR carried out the analysis and authored the first draft of this manuscript. BVBR was involved in useful discussions and provided comments and revisions to the final version of this text. Both authors read and approved the final manuscript.