The relationships between the isoelectric point and: length of proteins, taxonomy and ecology of organisms
© Kiraga et al; licensee BioMed Central Ltd. 2007
Received: 07 September 2006
Accepted: 12 June 2007
Published: 12 June 2007
The distribution of isoelectric point (pI) of proteins in a proteome is universal for all organisms. It is bimodal, dividing the proteome into two sets of acidic and basic proteins. Different species, however, have different abundances of acidic and basic proteins that may be correlated with taxonomy, subcellular localization, ecological niche of organisms and proteome size.
We have analysed 1784 proteomes encoded by chromosomes of Archaea, Bacteria, Eukaryota, and also mitochondria, plastids, prokaryotic plasmids, phages and viruses. We have found a significant correlation between protein length and pI in more than 95% of proteomes – positive for acidic proteins and negative for the basic ones. Plastids, viruses and plasmids encode more basic proteomes while chromosomes of Archaea, Bacteria, Eukaryota, mitochondria and phages encode more acidic ones. Mitochondrial proteomes of Viridiplantae, Protista and Fungi are more basic than those of Metazoa. This results from the presence of basic proteins in the former proteomes and their absence from the latter ones and is related to the reduction of metazoan genomes. A significant correlation was found between the pI bias of proteomes encoded by prokaryotic chromosomes and proteomes encoded by plasmids, but there is no correlation between eukaryotic nuclear-coded proteomes and proteomes encoded by organelles. Detailed analyses of prokaryotic proteomes showed significant relationships between pI distribution and habitat, relation to the host cell and salinity of the environment, but no significant correlation with oxygen and temperature requirements. Salinity is positively correlated with the acidity of proteomes. Host-associated organisms, and especially intracellular species, have more basic proteomes than free-living ones. The higher rate of mutation accumulation in intracellular parasites and endosymbionts is responsible for the basicity of their tiny proteomes, which explains the observed positive correlation between the decrease of genome size and the increase of basicity of proteomes. The results indicate that even conserved proteins subjected to strong selective constraints follow the global trend in the pI distribution.
The distribution of pI of proteins in proteomes shows clear relationships with length of proteins, subcellular localization, taxonomy and ecology of organisms. The distribution is also strongly affected by mutational pressure especially in intracellular organisms.
The abundance of genomic data allows for answering questions concerning the structure and evolution of whole genomes or proteomes. One of the most intriguing recent findings is the distribution of isoelectric point (pI) of proteins in whole proteomes. So-called virtual 2D gels (i.e. plots where the pI of proteins is plotted against their molecular weight) seem to be universal – in all proteomes they are usually bimodal with very low fractions of proteins with pI close to 7.4 [1–8]. More detailed analyses have shown that the distribution is connected with specific properties of basic and acidic residues of amino acids and combinations of their pK values. The probability of constructing a protein with pI close to 7.4 from naturally occurring amino acids is low. This is in agreement with expectations because proteins are most insoluble, least reactive and unstable at pH close to their pI, and the pH of the majority of the cell interior compartments is close to 7.5 [5, 9]. Thus, this property of proteomes could be the result of selection at the very early steps of evolution.
Different relationships between pI and other phenomena have been discovered. Schwartz et al. observed a correlation between the trimodal distribution of pI and the subcellular localization of proteins. Knight et al. characterized more than 100 prokaryotic and eukaryotic proteomes by virtual 2D-gels and observed very little or no relation to phylogeny, while a significant relationship existed with the ecological niche of organisms. Moreover, they found a negative correlation between proteome size and basicity for the smallest and the most basic proteomes. The shift of the pI distribution toward acidity in halophilic bacteria, which is probably related to their adaptation to high-salt environments, has also been very well documented [10–12]. Similarly, it has been assumed that the basic proteomes of Coxiella burnetii and Helicobacter pylori are connected with an adaptation to an acidic environment and that the basic proteomes of some Archaea are an adaptation to high temperature. On the other hand, Nandi et al. have focused on an evolutionary approach and found that the molecular weight of proteins is a much more conserved feature than their pI value. They concluded that many orthologous proteins change their pI between acidic and basic and only a few stay exclusively acidic or basic in different organisms. Furthermore, Schwartz et al. and Knight et al. found that membrane proteins are larger and more basic than non-membrane ones, whereas Nandi et al. found that many orthologous membrane proteins have very variable pI values and may be used as markers to predict the organism's ecological niche.
In this paper we have broadened our analysis to proteomes of organelles, viruses and bacteriophages and applied different methods and parameters to describe quantitatively the pI distribution of proteomes. One of the aims of these studies is to uncover the relationship between the pI and the protein length. In the previous analyses such correlations, even if they existed, were neglected. Furthermore, we have analysed the relations between the pI distributions of proteomes and the taxonomy and ecology of the corresponding organisms, considering different taxonomical levels and ecological signatures such as habitat, relation to the host cell, salinity, oxygen and temperature requirements. Moreover, we have tried to explain the observed relationship between the proteome size and the pI distribution of proteomes.
Results and Discussion
General properties of isoelectric point distributions in proteomes
To simplify comparative analyses of the pI distribution of different proteomes, we have divided the whole sets of proteins into two sets, hereafter called the acidic and basic sets. The division point corresponds to the pI value for which the pI distribution reaches the minimum between acidic and basic sets (see the Methods section for details). For most proteomes the division point was between 7.4 and 7.5. The distribution of the values of division points is very narrow – the range is 7.22 – 7.54 with the average and median 7.41. We have not found any correlation between the pH of these points and the taxonomical or ecological classification of organisms or genome sizes. The universality of the pH value of the minimum in pI distributions supports the conclusion of other authors that the bimodal distribution of pI results from intrinsic chemical properties of amino acids.
For each proteome we have calculated the average pI separately for the basic and the acidic sets of proteins, the average length of proteins and the pI bias (see the Methods section). The pI bias simply describes the asymmetry of the bimodal distribution of pI (Fig. 1). It ranges from -100% to 100%. These two extreme values indicate that all proteins in a given proteome are acidic or basic, respectively. A value close to zero means that a given proteome has similar fractions of acidic and basic proteins. We have found that if a proteome has more acidic proteins, the average pI of proteins of the acidic set is lower and the relationship is statistically significant (correlation coefficient, r = 0.76, p < 0.001) – additional data file 5A. On the other hand, if the basic proteins prevail in the proteome, this is not connected with a shift of the average pI value of basic proteins, with the exception of some intracellular microorganisms (additional data file 5B). We have also found that the basic sets have greater variance of their pI than the acidic sets, which indicates greater diversity of the basic proteins.
Relationships between pI value and size of proteins
Table 1. Properties of artificial and real E. coli proteomes. Columns: proteome; correlation coefficient; pI bias [%].
Artificial proteome with the average amino acid composition of E. coli proteins and the length of E. coli proteins: pI bias -32 (-29 to -36)
Artificial proteome with equal frequencies of amino acids and the length of E. coli proteins: pI bias -5 (-8 to -2)
Artificial proteome with the average amino acid composition of E. coli proteins and the uniform length distribution in the range of E. coli proteins: pI bias -50 (-53 to -46)
The real proteome of E. coli K12
The results mean that longer proteins can maintain a neutral or nearly neutral pI whereas the most extreme pI values are specific to shorter proteins. This probably has purely statistical causes. Shorter proteins show higher fluctuation of amino acid composition, which strongly influences their pI – incorporation of even one charged residue significantly shifts the protein pI to a lower or higher value. On the other hand, long proteins, usually composed of more charged amino acids, buffer the effect of fluctuation in their composition much better and can keep their pI even close to 7.4. This 'statistical' explanation of the relationship between protein size and pI does not exclude the possibility that this relation may have some biological consequences in the diversification of proteins' structure and function. For example, if there exists a selection constraint to generate a very acidic or very basic protein, the easiest way to accomplish that is "to make" it short. Actually, very basic proteins interacting with nucleic acids – ribosomal proteins – are usually very short. We think that more proteins subjected to such selection should be found. These proteins may belong to specific regulatory proteins, transcriptional factors, modulators, signalling proteins, small proteins interacting with other proteins etc.
We have also noticed a difference in the length of acidic and basic proteins. The comparative analyses have shown that acidic proteins are significantly longer than basic ones (Student's t-test, Benjamini-Hochberg adjusted p < 0.05) in more than 95% of Archaea, Bacteria and Eukaryota proteomes. The acidic proteins are on average 73 and 107 amino acid residues longer than the basic ones in the case of Prokaryota and Eukaryota, respectively. This is probably connected with the presence of very short ribosomal proteins in the basic set and long aminoacyl-tRNA synthetases in the acidic one.
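The statistical argument can be made slightly more concrete with a toy model that is not part of the original analysis. Assume, purely for illustration, that each of the n residues of a protein is independently basic with probability p_b, acidic with probability p_a and uncharged otherwise, and let f denote the net excess of basic over acidic residues per residue (all symbols here are hypothetical). Then:

```latex
f = \frac{n_{\mathrm{basic}} - n_{\mathrm{acidic}}}{n}, \qquad
\mathbb{E}[f] = p_b - p_a, \qquad
\operatorname{Var}[f] = \frac{p_a + p_b - (p_b - p_a)^2}{n}
```

Under these assumptions the spread of f shrinks as 1/\sqrt{n}, so a protein four times longer has half the standard deviation in its charge balance. The model ignores differences in pK values, histidine and the terminal groups, so it only indicates the direction of the effect.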
Nandi et al.  observed that the size of orthologous proteins found in closely related organisms is much more conserved than their pI. We compared the length of orthologous proteins of many proteomes whose pI values were changed from acidic to basic or vice versa but we did not observe any statistically significant differences in the length of these proteins, either. We found that the change of pI did not depend on the length of the compared orthologs.
Relationships between pI values of proteomes and taxonomy
Our analyses of prokaryotic proteomes performed on the lower taxonomical level did not show any relation between the pI bias and phylogeny, which is in agreement with the results of Knight et al. . Many monophyletic or closely related groups (e.g. taxa of Archaea or proteobacteria) were not grouped when the pI bias was used as a criterion. It indicates that the pI distribution has not been conserved during evolution of prokaryotes. We did not observe such a relationship for eukaryotic proteomes, either.
Table 2. Comparison of the pI bias for subsets of organellar proteomes. Columns: nuclear-coded proteins targeted to organelle (the whole sequence; the mature protein). Rows: mitochondrion – Protista; mitochondrion – Viridiplantae; mitochondrion – Fungi; mitochondrion – non-Chordata; mitochondrion – Chordata.
The observed phylogenetic signal in the pI bias is probably the result of different composition of proteomes. Mitochondrial proteomes of Viridiplantae, Protista and Fungi are usually several times larger than Metazoa proteomes, which usually contain 12–13 proteins. Actually, most of the proteins (e.g. ribosomal proteins) that are absent from Metazoa but present in Viridiplantae, Protista and Fungi are very basic (additional data file 7). Therefore, when we performed the same analysis based on proteomes consisting only of the 12 proteins present in most mitochondrial proteomes, the phylogenetic signal disappeared and the pI bias of basic proteomes became neutral or even acidic (for Fungi). We did not find any relationship between the pI bias and phylogeny on lower taxonomic levels, among metazoan phyla and subgroups of Craniata, when analysing both full proteomes and proteomes containing only the 12 common proteins.
Similarly to Knight et al. but analysing a larger set of proteomes, we have found a significant correlation between the pI bias of proteomes coded by prokaryotic genomes and their plasmids (N = 63, r = 0.74, p < 0.001). There is no such correlation between the pI bias of eukaryotic nuclear-coded proteomes and proteomes of their organelles (N = 36, r = 0.04, p = 0.81). We have obtained similar results when analysing the average pI values of acidic and basic sets separately, and no correlation was found in these two cases when the average length of proteins in these two sets was studied. It seems that organelle genomes are more independent of the host genome than plasmids are and therefore do not follow the genomic trends in pI, whereas plasmids do. For example, the organelles are separated by two membranes and possess their own replicational, transcriptional and translational machinery. Plasmids generally code for more basic proteins than chromosomes, but mutational pressure or some selection constraints affect these two proteomes simultaneously, and gene transfer between plasmids and chromosomes probably occurs more often than between organellar and nuclear genomes. When more data are available, it would be interesting to analyse such relationships between viral or phage proteomes and proteomes of infected organisms.
Relationships between pI values and subcellular localization of proteins
The analysed mitochondrial and plastid proteomes include only proteins coded by the organellar genomes. However, there are many proteins encoded by nuclear genes which are targeted to the organelles. The nuclear-encoded organellar proteins are usually equipped with N-terminal targeting signals (transit peptides) responsible for their import into organelles. These peptides are cleaved off by organellar peptidases after the proteins are transported. Because these presequences are rich in basic amino acid residues, the pI of premature unprocessed proteins (i.e. the whole sequences) should be more basic than the pI of mature proteins (i.e. without the transit peptide). In Tab. 2 we have compared the pI bias of the premature and mature proteins with proteomes coded by organellar genomes. Because mitochondrial proteomes differ between various taxonomical groups, we analysed them separately. (See additional data file 8 for distributions of pI and relationships between length of proteins and their pI for these proteomes.) As expected, the presence of very basic transit peptides (pI bias 93 – 100%) in the premature proteins shifts the pI distribution of these proteins towards basicity (i.e. higher values of pI bias), whereas most mature proteins are acidic. Interestingly, a weak surplus of acidic premature proteins still exists only in plastid and mitochondrial proteomes of plants. The nuclear-encoded organellar proteomes generally differ from the organelle-encoded ones. Only the pI bias values of the mitochondrial premature proteins of Viridiplantae and Fungi and of the mitochondrial mature proteins of Chordata fall within the quartile range of the pI bias of the corresponding organelle-encoded proteomes.
Table 3. Comparison of the pI bias for proteomes of different subcellular localization. Row: integral to membrane.
Relationships between pI values of proteomes and ecology of their organisms
Table 4. Statistical analysis of the pI bias for different ecological groups. Row: relation to host cell.
The analyses showed that salinity is positively correlated with the acidity of proteomes – the more halophilic organisms have more acidic proteomes. It agrees with results of other authors who observed in halophiles predominance of acidic over basic residues [18–22] and low isoelectric point of their proteins [10–12]. Extremely halophilic and moderately halophilic bacteria are present only in the 'acidic' class and mesohalophiles disappear in the 'basic' class (Fig. 5C). This relationship is usually explained by the higher stability and solubility of proteins rich in acidic residues in hypersaline environment [19, 23–28].
Considering habitat preferences, host-associated organisms have the least acidic proteomes compared to other groups and aquatic bacteria possess the most acidic ones. In the 'acidic' class aquatic bacteria are the most overrepresented group and host-associated species the most underrepresented one (Fig. 5D). On the other hand, the 'basic' class contains only host-associated microorganisms. Although proteomes of host-associated species are shifted towards basicity, they are still acidic on average (Tab. 4). However, a more detailed classification of organisms considering their relation to the host cell has revealed that proteomes of intracellular bacteria are on average basic, whereas extracellular and free-living/intracellular species have slightly acidic proteomes (Fig. 5E). More acidic proteomes are characteristic of free-living/extracellular and free-living species. Actually, these two groups are overrepresented in the 'acidic' class whereas intracellular bacteria are strongly overrepresented in the 'basic' class. The results show that the more closely an organism is associated with the host cell, the more basic its proteome is. The explanation of this result will be discussed in the next section, where the relationship between the pI bias, proteome size and GC content of the genome is considered. We have also noticed that all intracellular organisms that have slightly acidic proteomes (Anaplasma, Brucella, Chlamydiae, Ehrlichia) and the majority (with only one exception) of those that have slightly basic proteomes with the pI bias ≤ 20% (Bartonella, Coxiella, Parachlamydia, Rickettsia conorii, Tropheryma, Wolbachia) reside and usually replicate in vacuoles or phagosomes. Interestingly, R. conorii has the least basic proteome within the Rickettsia genus and is the only representative of its genus that was observed in a vacuole. This would imply that the environment of vacuoles modifies proteomes of intracellular organisms towards acidity.
Ecological changes influence the pI bias quite strongly and quickly in the course of evolution of proteomes, because changes in the pI bias are seen even among closely related organisms. We have gathered all analysed species belonging to the same genus and having different ecological assignments (see additional data file 10). In every case the pI bias of host-associated species is shifted further towards higher values (i.e. basicity) than the bias of species living in multiple environments. The most pronounced example is two species of Burkholderia, of which one lives in a terrestrial habitat and possesses an acidic proteome and the other is associated with a host and has a basic proteome. The proteomes of other host-associated species, although shifted towards basicity, are still acidic, probably because these species are still facultative and extracellular parasites. Moreover, a clear shift of the pI bias is visible when species with different salinity requirements are compared. The proteomes of halophilic and mesohalophilic species have more acidic proteins than non-halophilic ones.
Relationships between pI values of proteomes, their sizes and GC content of genomes
The relationship between the pI value of proteomes, their sizes and the GC content of genomes was analysed by Knight et al. They concluded that there is no correlation between median pI and GC content in small basic proteomes and opened the discussion about the reasons for the basicity of tiny proteomes by giving two unverified potential explanations: selection of proteins in the proteome and selection within particular proteins. To see how general these relations are, we have performed analyses on a larger set of proteomes using the pI bias and considering the relation of organisms to the host cell in more detail.
The observed correlations are clearer when only species with proteomes smaller than 1000 proteins are considered. The correlation coefficients change from -0.64 to -0.82 for the relationship between the pI bias and proteome size and from -0.49 to -0.58 for the relationship between the pI bias and GC content and are statistically significant. Therefore, the hypothesis that the inefficiency of DNA repair mechanisms in intracellular microorganisms causes the large AT bias of their small genomes and in consequence a greater content of basic lysine cannot be completely excluded. It is possible that there are some other reasons for the observed relationships between the basicity of the proteome and the reduction of its size, such as gain or loss of some groups of proteins causing the pI shift of the whole proteome, or selection for pI changes of particular proteins, e.g. those involved in adaptation to the new environment or host. The former hypothesis explains at least the differences between mitochondrial proteomes (see above).
The results indicate that AT bias alone cannot explain the observed basicity of tiny proteomes. We have also not observed any correlation between the pI bias and proteome size for these virtual proteomes. It seems that the acceleration of mutation accumulation itself could be responsible for the basicity of tiny proteomes. This is in agreement with the well-documented higher evolutionary rate of intracellular bacteria compared with their free-living relatives. It has been suggested that the higher rate of evolution results from an enhanced mutation rate and/or Muller's ratchet effect or the easier fixation of mutations by genetic drift in small asexual populations [37–46]. Accordingly, the increase of basicity of proteomes and the increase of mutational AT bias would be phenomena proceeding in parallel, both resulting from the higher evolution rate of genomes of intracellular bacteria. The elevated AT bias probably results from the elimination of genes encoding DNA repair and recombination-associated enzymes or at least from the decrease in their efficiency in intracellular organisms [31, 32, 37, 47–49]. These enzymes would normally correct the error-related tendency toward AT enrichment, for example the deamination of cytosine to uracil, which is then replaced by thymine.
To confirm our explanation we have analysed 39 sets of highly conserved orthologous proteins present in each of 100 selected prokaryotic organisms, representing quite uniformly the values of the pI bias. We have assumed that if the increase in substitution rate affects the pI distribution we should observe a positive correlation between the pI of these proteins and the pI bias of the whole proteomes. Actually, for 36 of them we have found such a statistically significant correlation (see additional data file 12). The three sets for which the correlation was not significant represent the most basic ribosomal proteins, and perhaps their further shift was improbable. The results indicate that even highly conserved proteins subjected to strong selective constraints follow the global trend in the pI distribution. Such a relationship probably concerns many other proteins because analyses of pI of Clusters of Orthologous Groups (COGs) showed that proteins of only a few clusters are conserved and stay in the same acidic or basic set while the majority of them jump between the two sets. Many of these promiscuous proteins are membrane proteins that have direct contact with the external environment and may be considered adaptive proteins. This would favour the hypothesis of selection for pI changes of particular proteins. However, it does not fully explain the basicity of small prokaryotic proteomes because the changes of pI also concern many non-membrane and highly conserved proteins. (Relationships between pI and other phenomena are discussed in additional data file 13.)
It would be interesting to investigate the relationship between the change of the pI distribution of proteomes and the transition of organisms from the free-living to the intracellular way of life at different stages of genome reduction. Sequencing and analysis of reduced genomes of bacterial endosymbionts identified in eukaryotic hosts would give very insightful results on this subject, e.g. a cyanobacterium in the amoeba Paulinella chromatophora, a cyanobacterium Cyanothece in the diatom Rhopalodia gibba, a Gram-negative bacterium in the diatom Pinnularia nobilis and many different bacterial endosymbionts found in various species of insects. Additional interesting results would come from analyses of nucleomorphs – small, reduced eukaryotic nuclei found in certain plastids present in some groups of algae such as cryptomonads and chlorarachniophytes. Moreover, it would be interesting to estimate how many mutations are required or should be accepted to remodel one proteome into another with respect to their pI. However, many factors such as length, amino acid composition, pI of the original proteins, mutation rate, patterns of nucleotide substitutions and the resulting patterns of amino acid substitutions should be taken into account.
Although the distribution of pI of proteins in proteomes is generally bimodal, different species have different abundances of acidic and basic proteins, which is correlated with the ecology of these species, especially with habitat, relation to the host cell and salinity of the environment. The pI distribution is also related to the taxonomy of organisms, but only on higher taxonomical levels, and to the subcellular localization of some proteomes. The other factor that shapes the distribution is the rate of mutation accumulation. The rate is higher in intracellular organisms than in free-living ones and it is responsible for the basicity of tiny proteomes, which explains the observed relationship between the proteome size and pI of proteomes.
Proteomic sets were downloaded from different sources (see additional data file 1 for details): National Center for Biotechnology Information, European Bioinformatics Institute, DOE Joint Genome Institute, Broad Institute of MIT and Harvard, The Institute for Genomic Research (TIGR), Wellcome Trust Sanger Institute, Ensembl project, Stanford Genomic Resources, Virginia Commonwealth University, National Institute of Genetics, Japan, PlasmoDB, International Fugu Genome Consortium, Genoscope, DictyBase and SilkDB. We have not analysed proteomes containing fewer than 10 proteins. In total we have analysed 1784 proteomes grouped in the following sets encoded by: chromosomes in Archaea (22), Bacteria (210) and Eukaryota (63), mitochondria (720), plastids (42), prokaryotic plasmids (319), phages (245) and viruses (163). Note that in the paper the proteomes of Archaea, Bacteria and Eukaryota refer only to chromosome-encoded proteins. Sequences of nuclear-encoded proteins with annotated transit peptides and targeted to mitochondria (2021) or plastids (1173) were downloaded from the UniProt database. Proteomes of 12 subcellular localizations (in total 13,039 proteins) were extracted from non-redundant datasets from the DBSubLoc database.
Isoelectric points were calculated using the standard iterative algorithm [52, 53] that gives relatively precise results of pI calculations for raw protein sequences [1, 2]. The algorithm is used in the Compute pI/Mw tool at the ExPASy server. The source code of the algorithm was kindly supplied by Elisabeth Gasteiger.
Each proteome was divided into two sets, named the acidic and the basic one, according to the pI value of its proteins. To find the point of division of proteomes into the two sets, we ranked proteins according to their pI values, cut off 10% tails of both acidic and basic proteins and scanned the rest of the proteome distribution for the largest difference in pI between two neighbouring proteins. The point of division between acidic and basic proteins was set in the middle of this distance. For statistical reasons, the procedure was applied only to big chromosome-encoded proteomes of Archaea, Bacteria and Eukaryota. Because of the narrow range and the universality of the midpoint, we assumed as the division point of smaller proteomes encoded by plasmids, mitochondria, plastids, viruses and phages the median of the midpoints calculated for the big proteomes, which equals 7.41.
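The idea behind an iterative pI calculation can be sketched as a bisection search for the pH at which the net charge of a sequence is zero. The following is only a minimal illustration, not the ExPASy source code: the pK values, the function names `net_charge` and `isoelectric_point`, and the tolerance are assumptions chosen for the example.

```python
# Illustrative pK values (assumed for this sketch; the published tool may use other values).
PK_POSITIVE = {"K": 10.5, "R": 12.5, "H": 6.0}            # groups charged +1 when protonated
PK_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}   # groups charged -1 when deprotonated
PK_N_TERM = 8.6   # free amino terminus (assumed)
PK_C_TERM = 3.6   # free carboxyl terminus (assumed)


def net_charge(sequence, ph):
    """Net charge of a raw sequence at a given pH (Henderson-Hasselbalch fractions)."""
    positive = 1.0 / (1.0 + 10 ** (ph - PK_N_TERM))
    negative = 1.0 / (1.0 + 10 ** (PK_C_TERM - ph))
    for residue in sequence:
        if residue in PK_POSITIVE:
            positive += 1.0 / (1.0 + 10 ** (ph - PK_POSITIVE[residue]))
        elif residue in PK_NEGATIVE:
            negative += 1.0 / (1.0 + 10 ** (PK_NEGATIVE[residue] - ph))
    return positive - negative


def isoelectric_point(sequence, tolerance=1e-4):
    """Bisection on pH in [0, 14]; the net charge decreases monotonically with pH."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid    # still positively charged: the pI lies at higher pH
        else:
            high = mid   # negatively charged or neutral: the pI lies at lower pH
    return (low + high) / 2.0
```

With these assumed constants, a lysine-rich peptide yields a higher value than a glycine peptide, which in turn yields a higher value than an aspartate-rich one; absolute values depend on the pK set chosen.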
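The search for the division point described above can be sketched as follows. This is a minimal illustration under one reading of the text, namely that 10% of the ranked proteins are trimmed from each end; the function name `division_point` and the `tail_fraction` parameter are hypothetical.

```python
def division_point(pi_values, tail_fraction=0.10):
    """Midpoint of the largest pI gap between neighbouring proteins after tail trimming.

    Assumption: `tail_fraction` of the ranked proteins is removed from each end.
    """
    ranked = sorted(pi_values)
    n_tail = int(len(ranked) * tail_fraction)
    core = ranked[n_tail:len(ranked) - n_tail]
    if len(core) < 2:
        raise ValueError("too few proteins left after trimming the tails")
    # Largest difference between two neighbouring pI values in the trimmed ranking.
    gap, index = max((core[i + 1] - core[i], i) for i in range(len(core) - 1))
    return (core[index] + core[index + 1]) / 2.0
```

For a toy list of ten pI values with a single wide gap between 6.5 and 8.5, the function returns the middle of that gap. Real proteomes contain thousands of proteins, which is consistent with the procedure being applied only to the big chromosome-encoded proteomes.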
Proteomes were characterized by the average pI and the average length of proteins separately for the basic and the acidic sets of proteins and by the "pI bias" (b) describing the relation between the number of proteins in the basic set and the number of proteins in the acidic set: b = 100· (Nbasic-Nacidic)/(Nbasic+Nacidic), where Nacidic and Nbasic denote the numbers of proteins in the acidic and the basic sets, respectively.
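The pI bias formula translates directly into a few lines of code. The sketch below is illustrative: the function name `pi_bias` is hypothetical, the default division point of 7.41 is the median reported above, and counting a protein whose pI equals the division point as acidic is an assumption.

```python
def pi_bias(pi_values, division_point=7.41):
    """pI bias b = 100 * (N_basic - N_acidic) / (N_basic + N_acidic), in percent."""
    n_basic = sum(1 for pi in pi_values if pi > division_point)
    n_acidic = sum(1 for pi in pi_values if pi <= division_point)  # ties counted as acidic (assumption)
    if n_basic + n_acidic == 0:
        raise ValueError("empty proteome")
    return 100.0 * (n_basic - n_acidic) / (n_basic + n_acidic)
```

A proteome containing only acidic proteins gives -100, one containing only basic proteins gives 100, and equal numbers of both give 0, in line with the range described in the Results.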
The different sets of proteomes were clustered by the UPGMA (Unweighted Pair-Group Method with Arithmetic Averages) method based on the median of the pI bias. The clustering was performed with the neighbor program from the PHYLIP 3.6 package. To evaluate the reliability of specific clades in UPGMA trees we created 10 000 matrices of the median of the pI bias generated by random sampling of 2/3 of the members of each group of proteomes (subsampling method). Then we applied the neighbor and consense programs (from the PHYLIP package) to calculate the percentage of randomised trees containing a given clade. Moreover, the WeightLESS program was used to perform the WLS-LRT (weighted least-squares likelihood ratio test) and F-test.
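The subsampling step can be sketched as below. This is only an illustration: the text does not state how the matrix entries were derived from the medians, so the absolute difference between group medians is used here as an assumed distance, and the function name `subsampled_median_matrix` is hypothetical. Tree building with neighbor and consense is not reproduced.

```python
import random
from statistics import median


def subsampled_median_matrix(groups, rng, fraction=2 / 3):
    """Draw `fraction` of each group's pI-bias values and build a pairwise matrix.

    groups: dict mapping a group name to a list of pI-bias values.
    Matrix entries are |median_a - median_b| (an assumption of this sketch).
    """
    names = sorted(groups)
    medians = {}
    for name in names:
        values = groups[name]
        k = max(1, round(len(values) * fraction))
        medians[name] = median(rng.sample(values, k))  # sampling without replacement
    matrix = [[abs(medians[a] - medians[b]) for b in names] for a in names]
    return names, matrix
```

Repeating the call 10 000 times with the same `random.Random` instance would give the set of randomised matrices from which clade support could then be counted.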
All analysed prokaryotic proteomes were classified according to five ecological signatures: habitat (aquatic, host-associated, multiple, specialized, terrestrial), relation to host cell (extracellular, free living, free living/extracellular, free living/intracellular, intracellular), salinity (extreme halophilic, moderate halophilic, mesohalophilic, non-halophilic), oxygen (aerobic, anaerobic, facultative, microaerophilic) and temperature requirements (hyperthermophilic, mesophilic, psychrophilic, thermophilic). The classification was based on the data published on the NCBI web site, papers related to the sequenced genomes and other sources. A given species was assigned only to one subgroup, the most typical for its ecological property. To analyse the relationship between the pI bias and ecological classification of proteomes the analysed proteomes of particular ecological subgroups were distributed among three classes of the pI bias ('acidic': -90% to -30%, 'neutral': -30% to 30% and 'basic': 30% to 90%). Then the observed numbers of proteomes in the given class were compared to the expected ones by χ2 test.
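The class assignment and the comparison of observed with expected counts can be sketched as follows. This is an illustration only: the handling of the boundary values -30% and 30% is an assumption because the stated ranges share their end points, the expected counts are assumed to come from the row and column totals of the contingency table, and the function names are hypothetical.

```python
def bias_class(bias):
    """Assign a pI bias (in percent) to one of the three classes used in the text."""
    if bias < -30.0:
        return "acidic"
    if bias <= 30.0:       # boundary values are placed in 'neutral' (assumption)
        return "neutral"
    return "basic"


def chi_square_statistic(table):
    """Pearson chi-square statistic for a table of observed counts.

    table: list of rows (ecological subgroups), each a list of counts per class.
    Expected counts are derived from the marginal totals; all totals must be non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic
```

The statistic would then be compared with the chi-square distribution with (rows - 1) × (columns - 1) degrees of freedom to obtain a p-value.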
Sets of orthologous proteins of prokaryotic organisms were downloaded from the Microbial Genome Database (MBGD).
The non-parametric Kruskal-Wallis test and Student's t-test were applied as appropriate to determine the statistical significance of tested hypotheses. The Benjamini-Hochberg multiple comparisons procedure for controlling the false discovery rate was used.
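The Benjamini-Hochberg adjustment can be sketched in a few lines. This is a generic step-up implementation given for illustration, not the authors' code; the function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices from smallest to largest p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted
```

A test is then called significant when its adjusted p-value is below the chosen false discovery rate, for example 0.05 as in the length comparison reported in the Results.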
We would like to thank Elisabeth Gasteiger for sending us the source code of the algorithm calculating pI. We are grateful to two anonymous Reviewers for their insightful comments and suggestions. The work was supported by Polish Foundation for Science and by grant #1016/S/IGiM/05. It was done in the frame of the ESF program COST Action P10.
- Link AJ, Hays LG, Carmack EB, Yates JR: Identifying the major proteome components of Haemophilus influenzae type-strain NCTC 8143. Electrophoresis. 1997, 18: 1314-1334. 10.1002/elps.1150180808.
- Link AJ, Robison K, Church GM: Comparing the predicted and observed properties of proteins encoded in the genome of Escherichia coli K-12. Electrophoresis. 1997, 18: 1259-1313. 10.1002/elps.1150180807.
- VanBogelen RA, Abshire KZ, Moldover B, Olson ER, Neidhardt FC: Escherichia coli proteome analysis using the gene-protein database. Electrophoresis. 1997, 18: 1243-1251. 10.1002/elps.1150180805.
- Urquhart BL, Cordwell SJ, Humphery-Smith J: Comparison of Predicted and Observed Properties of Proteins Encoded in the Genome of Mycobacterium tuberculosis H37Rv. Biochem Biophys Res Commun. 1998, 253: 70-79. 10.1006/bbrc.1998.9709.
- VanBogelen RA, Schilles EE, Thomas JD, Neidhardt FC: Diagnosis of cellular states of microbial organisms using proteomics. Electrophoresis. 1999, 20: 2149-2159. 10.1002/(SICI)1522-2683(19990801)20:11<2149::AID-ELPS2149>3.0.CO;2-N.
- Schwartz R, Ting CS, King J: Whole Proteome pI Values Correlate with Subcellular Localizations of Proteins for Organisms within the Three Domains of Life. Genome Res. 2001, 11: 703-709. 10.1101/gr.GR-1587R.
- Knight ChG, Kassen R, Hebestreit H, Rainey PB: Global analysis of predicted proteomes: Functional adaptation of physical properties. Proc Natl Acad Sci USA. 2004, 101: 8390-8395. 10.1073/pnas.0307270101.
- Weiller GF, Caraux G, Sylvester N: The modal distribution of protein isoelectric points reflects amino acid properties rather than sequence evolution. Proteomics. 2004, 4: 943-949. 10.1002/pmic.200200648.
- Arakawa T, Timasheff SE: Theory of protein solubility. Methods Enzymol. 1985, 114: 49-77.
- Kennedy SP, Ng WV, Salzberg SL, Hood L, DasSarma S: Understanding the adaptation of Halobacterium species NRC-1 to its extreme environment through computational analysis of its genome sequence. Genome Res. 2001, 11: 1641-1650. 10.1101/gr.190201.
- Goo YA, Roach J, Glusman G, Baliga NS, Deutsch K, Pan M, Kennedy S, DasSarma S, Ng WV, Hood L: Low-pass sequencing for microbial comparative genomics. BMC Genomics. 2004, 5: 3. 10.1186/1471-2164-5-3.
- Mongodin EF, Nelson KE, Daugherty S, Deboy RT, Wister J, Khouri H, Weidman J, Walsh DA, Papke RT, Sanchez Perez G, Sharma AK, Nesbo CL, MacLeod D, Bapteste E, Doolittle WF, Charlebois RL, Legault B, Rodriguez-Valera F: The genome of Salinibacter ruber: convergence and gene exchange among hyperhalophilic bacteria and archaea. Proc Natl Acad Sci USA. 2005, 102: 18147-18152. 10.1073/pnas.0509073102.
- Seshadri R, Paulsen IT, Eisen JA, Read TD, Nelson KE, Nelson WC, Ward NL, Tettelin H, Davidsen TM, Beanan MJ, Deboy RT, Daugherty SC, Brinkac LM, Madupu R, Dodson RJ, Khouri HM, Lee KH, Carty HA, Scanlan D, Heinzen RA, Thompson HA, Samuel JE, Fraser CM, Heidelberg JF: Complete genome sequence of the Q-fever pathogen Coxiella burnetii. Proc Nat Acad Sci USA. 2003, 100: 5455-5460. 10.1073/pnas.0931379100.
- Kawashima T, Amano N, Koike H, Makino S, Higuchi S, Kawashima-Ohya Y, Watanabe K, Yamazaki M, Kanehori K, Kawamoto T, Nunoshiba T, Yamamoto Y, Aramaki H, Makino K, Suzuki M: Archaeal adaptation to higher temperatures revealed by genomic sequence of Thermoplasma volcanium. Proc Natl Acad Sci USA. 2000, 97: 14257-14262. 10.1073/pnas.97.26.14257.
- Nandi S, Mehra N, Lynn A, Bhattacharya A: Comparison of theoretical proteome: Identification of COGs with conserved and variable PI with the multimodal PI distribution. BMC Genomics. 2005, 6: 116. 10.1186/1471-2164-6-116.
- Baldauf SL: A Search for the Origins of Animals and Fungi: Comparing and Combining Molecular Data. Am Nat. 1999, 154 (S4): S178-S188. 10.1086/303292.
- Baldauf SL, Roger AJ, Wenk-Siefert I, Doolittle WF: A kingdom-level phylogeny of eukaryotes based on combined protein data. Science. 2000, 290: 972-977. 10.1126/science.290.5493.972.
- Lanyi JK: Salt-dependent properties of proteins from extremely halophilic bacteria. Bacteriol Rev. 1974, 38: 272-290.
- Bonnete F, Madern D, Zaccai G: Stability against denaturation mechanisms in halophilic malate dehydrogenase "adapt" to solvent conditions. J Mol Biol. 1994, 244: 436-447. 10.1006/jmbi.1994.1741.
- Ng WV, Kennedy S, Mahairas GG, Berquist B, Pan M, Shukla HD, Lasky SR, Baliga N, Thorsson V, Sbrogna J, Swartzell S, Weir D, Hall J, Dahl TA, Welti R, Goo YA, Leithauser B, Keller K, Cruz R, Danson MJ, Hough DW, Maddocks DG, Jablonski PE, Krebs MP, Angevine CM, Dale H, Isenbarger TA, Peck RF, Pohlschroder M, Spudich JL, Jung K, Alam M, Freitas T, Hou S, Daniels CJ, Dennis PP, Omer AD, Ebhardt H, Lowe TM, Liang P, Riley M, Hood L, DasSarma S: Genome Sequence of Halobacterium Species NRC-1. Proc Nat Acad Sci USA. 2000, 97: 12176-12181. 10.1073/pnas.190337797.
- Karlin S, Brocchieri L, Trent J, Blaisdell BE, Mrazek J: Heterogeneity of genome and proteome content in Bacteria, Archaea, and Eukaryotes. Theor Popul Biol. 2002, 61: 367-390. 10.1006/tpbi.2002.1606.
- Fukuchi S, Yoshimune K, Wakayama M, Moriguchi M, Nishikawa K: Unique amino acid composition of proteins in halophilic bacteria. J Mol Biol. 2003, 327: 347-357. 10.1016/S0022-2836(03)00150-5.
- Rao JK, Argos P: Structural stability of halophilic proteins. Biochemistry. 1981, 20: 6536-6543. 10.1021/bi00526a004.
- Dym O, Mevarech JL, Sussman L: Structural features that stabilize halophilic malate dehydrogenase from an archaebacterium. Science. 1995, 267: 1334-1346. 10.1126/science.267.5202.1344.
- Eisenberg H: Life in unusual environments: progress in understanding the structure and function of enzymes from extreme halophilic bacteria. Arch Biochem Biophys. 1995, 318: 1-5. 10.1006/abbi.1995.1196.
- Madern D, Zaccai G: Stabilisation of halophilic malate dehydrogenase from Haloarcula marismortui by divalent cations – effects of temperature, water isotope, cofactor and pH. Eur J Biochem. 1997, 249: 607-611. 10.1111/j.1432-1033.1997.00607.x.
- Elcock AH, McCammon JA: Electrostatic contributions to the stability of halophilic proteins. J Mol Biol. 1998, 280: 731-748. 10.1006/jmbi.1998.1904.
- Oren A, Mana L: Amino acid composition of bulk protein and salt relationships of selected enzymes of Salinibacter ruber, an extremely halophilic Bacterium. Extremophiles. 2002, 6: 217-223. 10.1007/s007920100241.
- Teysseire N, Boudier JA, Raoult D: Rickettsia conorii entry into Vero cells. Infect Immun. 1995, 63: 366-374.
- Douglas S, Zauner S, Fraunholz M, Beaton M, Penny S, Deng LT, Wu X, Reith M, Cavalier-Smith T, Maier UG: The highly reduced genome of an enslaved algal nucleus. Nature. 2001, 410: 1091-1096. 10.1038/35074092.
- Moran NA: Genome reduction in bacterial pathogens. Cell. 2002, 108: 583-586. 10.1016/S0092-8674(02)00665-7.
- Shigenobu S, Watanabe H, Hattori M, Sakaki Y, Ishikawa H: Genome sequence of the endocellular bacterial symbiont of aphids Buchnera sp. APS. Nature. 2000, 407: 81-86. 10.1038/35024074.
- Akman L, Yamashita A, Watanabe H, Oshima K, Shiba T, Hattori M, Aksoy S: Genome sequence of the endocellular obligate symbiont of tsetse flies, Wigglesworthia glossinidia. Nat Genet. 2002, 32: 402-407. 10.1038/ng986.
- Moran NA: Tracing the evolution of gene loss in obligate symbionts. Curr Opin Microbiol. 2003, 6: 512-518. 10.1016/j.mib.2003.08.001.
- Muto A, Osawa S: The guanine and cytosine content of genomic DNA and bacterial evolution. Proc Natl Acad Sci USA. 1987, 84: 166-169. 10.1073/pnas.84.1.166.
- Itoh T, Martin W, Nei M: Acceleration of genomic evolution caused by enhanced mutation rate in endocellular symbionts. Proc Natl Acad Sci USA. 2002, 99: 12944-12948. 10.1073/pnas.192449699.
- Moran NA: Accelerated evolution and Muller's rachet in endosymbiotic bacteria. Proc Nat Acad Sci USA. 1996, 93: 2873-2878. 10.1073/pnas.93.7.2873.
- Andersson SGE, Kurland CG: Reductive evolution of resident genomes. Trends Microbiol. 1998, 6: 263-268. 10.1016/S0966-842X(98)01312-2.
- Lambert JL, Moran NA: Deleterious mutations destabilize ribosomal RNA of endosymbionts. Proc Natl Acad Sci USA. 1998, 95: 4458-4462. 10.1073/pnas.95.8.4458.
- Clark MA, Moran NA, Baumann P: Sequence evolution in bacterial endosymbionts having extreme base composition. Mol Biol Evol. 1999, 16: 1586-1598.
- Wernegreen JJ, Moran NA: Evidence for genetic drift in endosymbionts (Buchnera): analyses of protein-coding genes. Mol Biol Evol. 1999, 16: 83-97.
- Funk DJ, Wernegreen JJ, Moran NA: Intraspecific variation in symbiont genomes: bottlenecks and the aphid Buchnera association. Genetics. 2001, 157: 477-489.
- Moran NA, Mira A: The process of genome shrinkage in the obligate symbiont, Buchnera aphidicola. Genome Biol. 2001, 2: 1-0054. 10.1186/gb-2001-2-12-research0054.
- van Ham RCHJ, Kamerbeek J, Palacios C, Rausell C, Abascal F, Bastolla U, Fernández JM, Jiménez L, Postigo M, Silva FJ, Tamames J, Viguera E, Latorre A, Valencia A, Morán F, Moya A: Reductive genome evolution in Buchnera aphidicola. Proc Nat Acad Sci USA. 2003, 100: 581-586. 10.1073/pnas.0235981100.
- Woolfit M, Bromham L: Increased rates of sequence evolution in endosymbiotic bacteria and fungi with small effective population sizes. Mol Biol Evol. 2003, 20: 1545-1555. 10.1093/molbev/msg167.
- Wu M, Sun LV, Vamathevan J, Riegler M, Deboy R, Brownlie JC, McGraw EA, Martin W, Esser C, Ahmadinejad N, Wiegand C, Madupu R, Beanan MJ, Brinkac LM, Daugherty SC, Durkin AS, Kolonay JF, Nelson WC, Mohamoud Y, Lee P, Berry K, Young MB, Utterback T, Weidman J, Nierman WC, Paulsen IT, Nelson KE, Tettelin H, O'Neill SL, Eisen JA: Phylogenomics of the reproductive parasite Wolbachia pipientis wMel: a streamlined genome overrun by mobile genetic elements. PLoS Biol. 2004, 2: 0327-0341. 10.1371/journal.pbio.0020327.
- Moran NA, Wernegreen JJ: Are mutualism and parasitism irreversible evolutionary alternatives for endosymbiotic bacteria? Insights from molecular phylogenetics and genomics. Trends Ecol Evol. 2000, 15: 321-326. 10.1016/S0169-5347(00)01902-9.
- Tamas I, Klasson L, Canbäck B, Näslund K, Eriksson AS, Sandström J, Wernegreen J, Moran NA, Andersson SGE: 50 million years of genomic stasis in endosymbiotic bacteria. Science. 2002, 296: 2376-2379. 10.1126/science.1071278.
- Klasson L, Andersson SG: Evolution of minimal-gene-sets in host-dependent bacteria. Trends Microbiol. 2004, 12: 37-43. 10.1016/j.tim.2003.11.006.
- The UniProt Consortium: The Universal Protein Resource (UniProt). Nucleic Acids Res. 2007, 35: D193-D197. 10.1093/nar/gkl929.
- Guo T, Hua S, Ji X, Sun Z: DBSubLoc: database of protein subcellular localization. Nucleic Acids Res. 2004, 32: D122-124. 10.1093/nar/gkh109.
- Bjellqvist B, Hughes GJ, Pasquali Ch, Paquet N, Ravier F, Sanchez JCh, Frutiger S, Hochstrasser D: The focusing positions of polypeptides in immobilized pH gradients can be predicted from their amino acid sequences. Electrophoresis. 1993, 14: 1023-1031. 10.1002/elps.11501401163.
- Bjellqvist B, Basse B, Olsen E, Celis JE: Reference points for comparisons of two-dimensional maps of proteins from different human cell types defined in a pH scale where isoelectric points correlate with polypeptide compositions. Electrophoresis. 1994, 15: 529-539. 10.1002/elps.1150150171.
- Gasteiger E, Hoogland C, Gattiker A, Duvaud S, Wilkins MR, Appel RD, Bairoch A: Protein Identification and Analysis Tools on the ExPASy Server. The Proteomics Protocols Handbook. Edited by: Walker JM. 2005, Humana Press. Full text – Copyright Humana Press, 112: 531-552.
- Felsenstein J: PHYLIP (Phylogeny Inference Package) version 3.6. Distributed by the author. Department of Genome Sciences. 2004, University of Washington, Seattle.
- Sanjuan R, Wrobel B: Weighted least-squares likelihood ratio test for branch testing in phylogenies reconstructed from distance measures. Syst Biol. 2005, 54: 218-229. 10.1080/10635150590923308.
- Uchiyama I: MBGD: microbial genome database for comparative analysis. Nucleic Acids Res. 2003, 31: 58-62. 10.1093/nar/gkg109.
- Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Statist Soc Ser B. 1995, 57: 289-300.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.