A pursuit of lineage-specific and niche-specific proteome features in the world of archaea
© Roy Chowdhury and Dutta; licensee BioMed Central Ltd. 2012
Received: 25 January 2012
Accepted: 12 June 2012
Published: 12 June 2012
Archaea evoke interest among researchers for two enigmatic characteristics: a combination of bacterial and eukaryotic components in their molecular architectures, and an enormous diversity in their life-style and metabolic capabilities. Despite considerable research efforts, lineage-specific/niche-specific molecular features of the whole archaeal world are yet to be fully unveiled. The study offers the first large-scale in silico proteome analysis of all archaeal species with known genome sequences, with a special emphasis on methanogenic and sulphur-metabolising archaea.
Overall amino acid usage in archaea is dominated by GC-bias. However, environmental factors such as oxygen requirement or thermal adaptation seem to play important roles in the selection of residues with no GC-bias at the codon level. All methanogens, irrespective of their thermal/salt adaptation, show higher usage of Cys and have relatively acidic proteomes, while the proteomes of sulphur-metabolisers have higher aromaticity and more positive charges. Despite its thermophilic life-style, korarchaeota possesses an acidic proteome. Among the distinct trends prevailing in COGs (Clusters of Orthologous Groups of proteins) distribution profiles, crenarchaeal organisms display higher intra-order variations in COGs repertoire, especially in the metabolic ones, as compared to euryarchaea. All methanogens are characterised by the presence of 22 exclusive COGs.
Divergences in amino acid usage, aromaticity/charge profiles and COG repertoire among methanogens and sulphur-metabolisers, aerobic and anaerobic archaea or korarchaeota and nanoarchaeota, as elucidated in the present study, point towards the presence of distinct molecular strategies for niche specialization in the archaeal world.
Keywords: Amino acid usage; Isoelectric point; COG distribution; Methanogen; Sulphur metaboliser; Korarchaeota; Oxygen requirement
Over the past few decades, the establishment of archaea as the third domain of life has been a landmark event in the life sciences. The world became familiar with this kingdom in 1977, when Woese & Fox [1] first proposed archaebacteria (subsequently renamed archaea) as a major domain, distinct from bacteria and eukaryotes but on an equal footing with them. Prior to this three-domain classification of life, which has been described by Makarova & Koonin [2] as “arguably one of the most important scientific discoveries of the twentieth century”, many of the ‘would-be’ archaea used to be grouped under the bacterial lineage [2–4]. Phylogenetic analyses of rRNA and some proteins involved in the processes of translation, transcription, and replication have placed the notion of archaea on a firm footing [5–9]. Analysis of small subunit rRNA sequences revealed that there are two distinct phyla, viz. euryarchaeota and crenarchaeota, within this third domain. Three more distinct phyla, viz. nanoarchaeota, korarchaeota and thaumarchaeota, have later been introduced to the domain of archaea [11–17].
Subsequent work on archaea has revealed many surprises that have spurred the scientific community to explore the world of these microbial life forms. Archaea have a unique mosaic combination of “eubacterial form and eukaryotic content”. Like bacteria, they are single-celled prokaryotes, devoid of a nucleus or other cell organelles. They usually share with bacteria some major aspects of genome organisation and expression strategy, such as the presence of a single circular chromosome, the absence of introns, the operonic organisation of certain genes, the presence of ribosome-binding (Shine-Dalgarno) sites and so on, though there are some reports on the presence of archaeal introns [19, 20]. Yet they possess a number of genes and metabolic pathways, especially the ones associated with the processes of transcription, translation and replication, that are typical of eukaryotes [21, 22]. More than 30 ribosomal proteins are shared between the archaea and the eukarya that are not found in the bacteria. The structure of chromatin, the presence of histones and the significant similarity between proteins involved in information processing systems all indicate a close evolutionary link between archaea and eukaryotes [23–26]. Archaea also possess some unique characteristics not shared by the other domains. For example, their membrane is made of ether-linked lipids, and the glycerol phosphate backbone has the opposite stereochemistry compared with that of bacteria or eukaryotes [27, 28]. They also exhibit some unique metabolic processes, such as methanogenesis, and several unique enzymes, e.g. specific types of DNA topoisomerases and DNA polymerases [29–31]. To date there has been no report on archaeal virulence, but archaea have been found associated with diseased states of the colon and with periodontal diseases [32, 33].
Another intriguing feature of archaea is their unusual ability to survive and thrive in extreme environmental conditions, such as thermal vents, volcanic springs, hypersaline basins, alkaline lakes, acid mines, or even petroleum deposits deep underground, which are completely devoid of oxygen. Furthermore, certain groups of archaea employ distinct strategies for energy conversion and hence are characterised by special metabolic traits like methane production under anaerobic conditions, or sulphur respiration. Adaptation to such extreme environments or to atypical metabolism is expected to require special, adaptive gene and/or protein features, clearly distinguishable from those of organisms living under conventional ecological conditions. There are some reports on the molecular, physiological and evolutionary mechanisms of adaptation of some specific groups of extremophilic microbes, including some archaea, such as the organisms adapted to high temperature or salinity [34–39].
But to our knowledge, no comprehensive comparative study on lineage-specific and/or niche-specific genome/proteome features of the archaeal world has so far been reported.
Therefore, the domain archaea seems a deep sea into which researchers can dive to gather more and more information about its specific characteristics. The availability of complete genome sequences of hundreds of archaea has paved the way for comparative genomics and proteomics studies. The lack of established model systems for large-scale experimentation on archaeal biology has made in silico genome data mining even more crucial for archaeal genomics than it is in the cases of bacteria and eukaryotes. The present analysis offers the first large-scale comparative study of the proteomic architectures of all the archaeal species with publicly available genome sequences. Special emphasis has been placed on the comparative analysis of methanogenic and sulphur-metabolising archaea, with the aim of unveiling the niche-specific molecular features, if any, of these two groups of microbes with specialised life-styles. Identification of such features may not only give an insight into the molecular mechanism of ecological adaptation in archaea, but may also be important from the metagenomic or biotechnological view-points.
Results and discussion
Analysis of the whole archaeal dataset
Amino acid usage profile within the groups
However, a closer look at the heat map reveals that the amino acid preferences of archaeal species are not solely governed by the mutational bias of the respective genomes. The taxonomic or ecological background of the species may also play an important role in shaping their protein composition. In many cases, members of the same phylum, class or order appear under distinct nodes far apart from one another, yet they share some common compositional features, which may not always comply with their genomic GC-bias. For instance, P. torridus (P. tor), a thermoplasmata species, appears under node F, far apart from T. acidophilum (T. aci) and T. volcanium (T. vol), two other members of thermoplasmata that clustered together under node W.
But in all these species, the usages of Phe and Met, two residues encoded by AU-rich codons, are higher than in sulfolobales, methanococci or nanoarchaeota, the genomic GC-contents of which are comparable to or lower than those of thermoplasmata. These three species are also typified by relatively low usage of Glu, Leu and Cys and higher usage of Ser. The sulfolobales, which in general have a GC-bias similar to that of P. tor (35-37%), segregate together with P. tor under node F, but they differ in the usage patterns of many of the residues like Leu, His, Asp, Trp etc., and many of these features are also shared by M. sedula (M. sed), the only member of sulfolobales in the dataset with a much higher GC-content (46%, under node U). A trend of lower usage of Asp and Met and higher usage of Leu and Val is observed in all thermoproteales, including C. maquilingensis (C. maq) with a much lower GC-content (43%), which has clustered together with M. sed and D. kam, far apart from other thermoproteales (GC-content >50%). Each of the three single-member phyla in the dataset, namely nanoarchaeota, korarchaeota and thaumarchaeota, exhibits distinct trends in amino acid usage and appears as a singular species in a separate branch under the nodes N, O and G respectively. A detailed examination of the biological implications of such conspicuous amino acid usage patterns is, however, beyond the scope of the present study and will be taken up separately in future.
From their proteome compositional features, archaea appear to adapt to a specific niche or life-style. Most of the methanogens exhibit relatively high frequencies of Cys. All the halophiles have clustered under node I (except H. wal) and are marked by high usage of Asp, Thr and His and low occurrence of Cys, Leu and Met. It is worth noting that H. walsbyi (H. wal) (displayed under node J) has a much lower GC-content (48%) than other halophiles (> 60%), yet it shares many typical features of high salt-adapted proteomes, like under-representation of Lys, Phe, Tyr, Met and Leu (all of these except Leu are encoded by AU-rich codons) and over-representation of Asp, Thr, His etc. It may, therefore, be said that the genomic GC-bias, taxonomic history and life-style or niche adaptation have all played important roles in sculpting the amino acid composition of an archaeon.
Physico-chemical characteristics of the proteomes of different groups of archaea
We have categorised the organisms according to their classes in case of euryarchaeota (Figure 2a) and down to their orders in case of crenarchaeota (Figure 2b), as the entire crenarchaeal group comes under a single class viz. thermoprotei. The remaining three phyla viz. korarchaeota, nanoarchaeota and thaumarchaeota (Figure 2c), have only one fully sequenced organism in each case, so there is no need of any further division.
In most cases, bimodal distributions of isoelectric points are observed, with an acidic peak in the pI range of 5.0–5.5 and a basic peak at ~9.5. (Here, “acidic peak” refers to the frequency peak in the isoelectric point plot that lies in the acidic region, below pH 7.0; similarly, “basic peak” refers to the peak in the region above pH 7.0.) Being the largest phylum in the archaeal world, euryarchaeota consists of eight classes, and for all these classes except thermococci, the acidic peak is significantly higher than the basic peak, implying the overall acidic nature of the euryarchaeal proteomes, irrespective of their genomic GC-bias or niche adaptation. Among these, Halobacteria, a group of halophilic archaea, has the most acidic proteome, showing a large acidic peak around pI 4.0 and almost no peak at basic pI, a feature attributable to over-representation of Asp and under-representation of Lys, as observed earlier in most of the microbial halophiles [37, 40]. The only methanopyri in the dataset, M. kandleri, which is known to have dual adaptation to high salinity and high temperature, also exhibits a large and sharp acidic peak around pI 5.0 along with a small basic peak. A large acidic peak at pI 5.0 is also displayed by methanobacteria. Though their salinity adaptation is not yet reported, they have been found in large amounts in tropical estuarine sediments along with other halophiles. For all other euryarchaeal proteomes, acidic peaks (at pI values around 6.0) are slightly larger than the respective basic peaks (at pI values around 10.0), implying that these proteomes are also comparatively acidic in nature, whereas thermococci stands out as an exception, probably owing to its adaptation to high temperature and sulphur metabolism.
Crenarchaeal organisms are all under one class viz. thermoprotei. Though we have further divided them into three orders, they do not exhibit any significant variation in their pI profiles. For all three orders, proteomes are comparatively basic in nature (Figure 2b).
As reported earlier, nanoarchaeota, being a parasitic hyperthermophile, has a highly basic proteome, while the proteome of thaumarchaeota, being mesophilic in nature, is comparatively acidic (Figure 2c) [12, 34]. Korarchaeota, in spite of being a hyperthermophile, has an acidic proteome, which is quite surprising in view of earlier reports on thermal adaptation of microbial proteomes. However, considering the fact that korarchaeal samples were collected from the Obsidian Pool, Yellowstone National Park, the possibility of hypersaline adaptation in K. cryptofilum cannot be ruled out, and in that case, the halophilic signatures of its proteome may overshadow the thermophilic characteristics, as observed in M. kandleri.
Distribution of cluster of orthologous groups of proteins
As revealed in Figure 3, the overall COGs distribution profiles of the archaeal groups are, in general, much closer to that of E. coli than to the yeast profile. The majority of the COGs categories related to cellular processes & signaling are present in relatively low frequencies in archaea as compared to the eukaryotic representative S. cerevisiae. However, proteins belonging to the categories M (cell wall/membrane/envelope biogenesis) and N (cell motility) have higher frequencies in archaea as well as in E. coli than in yeast. The COGs related to metabolism also occur at relatively higher frequencies in archaea than in yeast.
R & S, the two categories of poorly characterised genes, together encompass around 20 to 28% of predicted gene-products in each group of archaeal proteomes (Figure 3). Among the well characterised COGs categories, the ones showing the highest abundances in distribution profiles across different classes/orders are E (amino acid transport and metabolism), J (translation, ribosomal structure and biogenesis) and C (energy production and conversion), whereas the categories showing the largest standard deviations are J (translation, ribosomal structure and biogenesis), T (signal transduction mechanism) and L (replication, recombination and repair). Nanoarchaeota shows a very different COGs distribution pattern, with 34.5% of its total proteome falling under the translation, ribosomal structure and biogenesis (J) category (Additional file 2). Strikingly enough, most of the COGs categories pertaining to metabolism, such as C, G, E, H, I, P and Q, are significantly underrepresented in N. equitans. Such a conspicuous trend in COGs distribution in N. equitans may be attributed to the parasitic/symbiotic lifestyle of the organism. Methanopyri, the halophilic, thermophilic archaeon, is characterised by comparatively low proportions of the K (transcription) and P (inorganic ion transport & metabolism) categories.
Detailed analysis of methanogenic and sulphur-metabolising archaea
The dataset of 69 archaeal species under study includes various types of extremophiles: species thriving in extreme habitats such as thermal vents or hypersaline water, as well as species exhibiting specialised metabolism, such as methanogenesis or sulphur metabolism. Since the distinct genome/proteome features of thermophilic and halophilic organisms have been reported earlier [44, 45], an attempt is made in the present study to delineate the niche-specific molecular features, if any, of the groups of archaea exhibiting specialised metabolic traits, i.e., the methane-producing archaea and the sulphur-oxidising/sulphur-reducing archaea (for details, see Additional file 3). It is worth mentioning at this point that these two groups of archaea also contain some thermophilic/hyperthermophilic and acidophilic organisms and, as already mentioned, M. kandleri exhibits dual adaptation to thermophilic and halophilic environments.
Usage of amino acids with no GC bias at the codon levels
As revealed in Figure 4, the two factors that primarily govern the usage of these eight amino acid residues are the oxygen requirement and temperature adaptation of the respective archaeal species. All aerobic species, along with C. maq, the only micro-aerophilic archaeon in the dataset, are completely segregated from the anaerobic organisms and clustered exclusively under the node i, suggesting a significant influence of the respiratory habits of the organisms on preferences for these amino acids. The dominance of temperature adaptation is apparent from the observation that there are two major nodes, a and b, in Figure 4, dividing the whole set of methanogens and sulphur-metabolisers under consideration into two major clusters, where all mesophiles except M. aeolicus have been segregated under the node a, and all hyperthermophiles and the majority of the thermophiles (except M. the, M. del and M. kan) have clustered together under the node b.
All crenarchaeal species except S. mar are clustered together under the node e. A careful examination of the accompanying heat map suggests that the unexpected segregation of S. mar and M. aeo apparently represents an artifact, since this segregation might be attributed to the similarity in the “others” column, which represents the total frequencies of occurrence of the residues encoded by GC-rich/AU-rich codons.
In the heat map, all methanogenic archaea, in spite of their differences in genomic GC content and habitat, show an affinity for higher usage of Cys residues in their proteomes as compared to their sulphur-metabolising counterparts. In terms of frequencies of occurrence, Cys usage is almost double in methanogens. Asp is more abundant in methanogens than in the other group, whereas Leu is used more in sulphur metabolisers.
Highest and lowest values of amino acids in the methanogenic and sulphur metabolising orders (table columns: Order with the highest value, Max value (%), Min value (%), Order with the lowest value; surviving entry: Unclassified methanogen RC-I)
Correspondence analysis on amino acid usage with two groups
Correspondence analysis on the unbiased amino acid usage dataset segregates the two types of organisms diagonally, i.e. the variable that decides the segregation should have nearly equal correlations with both axes (Figure 5b). Together, the first two axes explain 63.47% of the total variation. Since the genomic GC-bias has hardly any influence on the usage of these amino acid residues, significant correlations of isoelectric point have been observed with both axes (with r = 0.63 and −0.60 respectively).
Comparative analysis of physico-chemical features of proteomes
Surface charge distribution
Amino acid substitution in orthologous sequences
Since comparative studies on isoelectric point distribution and surface charge distribution in proteins of methanogenic and sulphur-metabolising archaea have revealed a clear trend of higher usage of acidic residues in the former group of species (Figures 6 and 7), it is tempting to examine the general trend, if any, in the amino acid substitution patterns between the orthologous proteins from members of these two different groups. Among all the organisms chosen for this study, we selected one organism from each group such that their thermal adaptations, genomic GC contents, genome sizes, and the phylum they are placed in are the same. M. del and T. onn suited well (Additional file 3) for analysing amino acid substitution in their orthologous sequences.
Thus, a distinct trend in the resultant substitution patterns across the orthologs from these organisms can be attributed to the differences in their metabolic traits. The amino acid sequences of these orthologous genes were aligned using ClustalW, and the amino acid replacements were arranged in a 20 × 20 matrix using the Substitution Pattern Analysis Software Tool (SPAST), a program in C++ developed in-house. Frequencies of all possible amino acid replacements (i.e. (20 × 19)/2 = 190 possible pairs of replacements) between the orthologous protein sequences were determined in the direction from the methanogenic archaeon M. del to the sulphur-metabolising archaeon T. onn, following the method reported by Paul et al., described in detail in the Materials & Methods section.
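The tallying of directional replacements described above can be sketched as follows. This is only an illustration under stated assumptions, not the SPAST program itself: the function names `count_replacements` and `forward_reverse` are hypothetical, the input is assumed to be a pair of already-aligned sequences of equal length with `-` marking gaps, and the toy sequences are invented for the example.

```python
from collections import Counter
from itertools import combinations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Number of unordered residue pairs: (20 x 19) / 2 = 190.
N_PAIRS = len(list(combinations(AMINO_ACIDS, 2)))


def count_replacements(aligned_from, aligned_to):
    """Count directed replacements (x -> y) between two aligned sequences.

    Gap positions and identical positions are skipped. The result is a
    sparse view of the 20 x 20 replacement matrix, keyed by (x, y).
    """
    if len(aligned_from) != len(aligned_to):
        raise ValueError("aligned sequences must have equal length")
    counts = Counter()
    for x, y in zip(aligned_from, aligned_to):
        if x == "-" or y == "-" or x == y:
            continue
        counts[(x, y)] += 1
    return counts


def forward_reverse(counts, x, y):
    """Return (number of x -> y, number of y -> x) replacements."""
    return counts[(x, y)], counts[(y, x)]


# Toy aligned pair (hypothetical): first row plays the methanogen,
# second row the sulphur metaboliser.
counts = count_replacements("MRDE-KC", "MKDKAKS")
print(N_PAIRS, forward_reverse(counts, "R", "K"))
```

In a full analysis the counters from all orthologous alignments would be summed before comparing forward and reverse totals for each of the 190 pairs.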
Top 20 amino acid pairs displaying highest bias in terms of differences and ratios in number of forward (methanogen → sulphur metaboliser) and reverse (sulphur metaboliser → methanogen) replacements in 213 orthologous proteins from M. del to T. onn

Most biased in gain    Most biased in ratio
R → K                  Q → W
E → K                  C → M
D → E                  H → W
S → K                  C → I
D → K                  C → S
S → A                  C → N
S → E                  D → W
I → L                  R → W
R → E                  S → K
M → L                  C → A
G → K                  M → W
T → V                  C → V
A → K                  C → G
L → F                  S → W
M → I                  D → Y
C → A                  C → T
S → T                  C → F
D → N                  D → K
T → L                  V → W
H → Y                  I → W
COGs category distribution in the two groups
Among the different COGs categories, certain categories like J (translation, ribosomal structure & biogenesis), L (replication, recombination & repair), M (cell wall/membrane biogenesis), C (energy production & conversion) etc. exhibit more inter-species divergence within a specific class/order, while categories like U (intracellular trafficking, secretion & vesicular transport), V (defence mechanism), O (post-translational modifications, protein turnover, chaperones) or Q (secondary metabolite biosynthesis, transport and catabolism) show little variation within or even across different taxonomic orders. Interestingly enough, all three orders under the class thermoprotei of crenarchaeota, namely sulfolobales, desulfurococcales and thermoproteales, exhibit, in general, appreciable intra-order inter-species variations in frequencies of occurrence for most of the COGs categories pertaining to metabolism, but not for the COGs categories under Cellular Processes & Signaling. Such intra-class variations in frequencies are, in general, relatively smaller in euryarchaeal organisms, both for the metabolism and for the cellular processes & signaling COGs categories (Figure 8, left panel).
Identification of core, but exclusive COGs of methanogens
Methanogen-specific COGs ID and their descriptions
Domains found on TIGR
Domains found on Pfam
Methyl coenzyme M reductase, alpha subunit
met-coenzyme M reductase
met-coenzyme M reductase
Methyl coenzyme M reductase, beta subunit
Methyl coenzyme M reductase, subunit C
Methyl coenzyme M reductase, subunit D
Nitrogenase molybdenum-iron protein, alpha and beta chains
methanogenesis marker protein 13
Nitrogenase subunit NifH (ATPase)
Predicted metal-binding transcription factor
methanogenesis marker protein 9
Predicted peptidyl-prolyl cis-trans isomerase (rotamase), cyclophilin family
methanogenesis marker protein 3
methanogenesis marker protein 4
Predicted thiamine-pyrophosphate-binding protein
Selenophosphate synthetase-related proteins
methanogenesis marker protein 2
Tetrahydromethanopterin S-methyltransferase, subunit A
Methyl transferase (Mtr)
Methyl transferase (Mtr)
Tetrahydromethanopterin S-methyltransferase, subunit B
Tetrahydromethanopterin S-methyltransferase, subunit C
Tetrahydromethanopterin S-methyltransferase, subunit D
Tetrahydromethanopterin S-methyltransferase, subunit E
Uncharacterized protein conserved in archaea
methanogenesis marker protein 5
Domain of unknown function (DUF)
Uncharacterized protein conserved in archaea
methanogenesis marker protein 17
Uncharacterized protein conserved in archaea
methanogenesis marker protein 14
Uncharacterized protein conserved in archaea
Uncharacterized protein conserved in archaea
methanogenesis marker protein 6
Uncharacterized protein related to methyl coenzyme M reductase subunit C
methanogenesis marker protein 7
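The notion of "core, but exclusive" COGs used for the table above amounts to a set operation: COGs present in every methanogen and absent from every other archaeon. A minimal sketch follows, assuming each organism is represented by the set of COG IDs annotated in its proteome. The function name `exclusive_core` and all organism and COG labels in the example are hypothetical placeholders, not data from the study.

```python
def exclusive_core(group, others):
    """Return COG IDs found in every member of `group` and in no member of `others`.

    Both arguments map an organism name to the set of COG IDs annotated
    in its proteome.
    """
    if not group:
        return set()
    # Core: intersection over all organisms in the group of interest.
    core = set.intersection(*(set(cogs) for cogs in group.values()))
    # Anything seen in at least one organism outside the group.
    seen_elsewhere = set().union(*(set(cogs) for cogs in others.values()))
    return core - seen_elsewhere


# Placeholder data (hypothetical labels).
methanogens = {
    "methanogen_1": {"COG_A", "COG_B", "COG_C"},
    "methanogen_2": {"COG_A", "COG_B", "COG_D"},
}
other_archaea = {
    "archaeon_1": {"COG_B", "COG_E"},
    "archaeon_2": {"COG_D"},
}
print(sorted(exclusive_core(methanogens, other_archaea)))
```

Here `COG_B` is shared by both methanogens but is also present in another archaeon, so only `COG_A` survives as core and exclusive.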
Estimation of COGs shared mutually between distinct groups of methanogens and sulphur metaboliser
The present study gives an account of the amino acid usage, physico-chemical features and COG repertoire of 69 archaeal species of varying GC-content, habitats, respiratory habits and metabolism. The amino acid usage pattern in archaea is, in general, dominated by the genomic GC-content, but in some cases niche-specialisation overrules the GC-bias. For amino acids having no GC-bias at the codon level, environmental factors like oxygen requirement or temperature adaptation appear to be the primary selection forces. Among the physico-chemical parameters, the overall charge profile and aromaticity of proteins seem to modulate, or be modulated by, the metabolic traits and/or niche adaptation of the respective species. All methanogenic proteomes, irrespective of their temperature or salinity adaptation, are relatively acidic and have higher usage of Cys, while the proteomes of sulphur metabolisers are more basic and aromatic in nature. The atypical (acidic) nature of the proteome of the thermophilic archaeon K. cryptofilum is surprising and demands further investigation. So far as the COGs repertoire is concerned, crenarchaeal organisms display higher intra-order variations as compared to their euryarchaeal counterparts, especially for the proteins involved in metabolism, probably because of the divergence of sulphur reduction pathways from those of sulphur oxidation. There are 22 COGs that are found in all methanogenic archaea under study but not in any other archaea. No such core COGs could be found exclusively within the sulphur-metabolising groups.
Identification of distinct trends in amino acid usage, physicochemical properties and COG distribution profiles in methanogens and sulphur-metabolisers, aerobic and anaerobic archaea or korarchaeota and nanoarchaeota points towards diverse evolutionary strategies for niche specialisation in the archaeal world. Characterisation of such niche-specific features may have far-reaching implications from metagenomic or biotechnological perspectives.
The complete genome sequences and the predicted protein coding sequences of 69 archaea (all the fully sequenced archaea available by the year 2009) have been downloaded from NCBI GenBank. In order to reduce sampling errors, the annotated ORFs having fewer than 100 codons in every genome have been excluded from the analysis. Additional file 1 and Additional file 3 show the basic information about all the archaea under study and about the two studied groups.
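The length filter described above can be sketched as follows. This is a minimal illustration rather than the pipeline actually used: the function names `parse_fasta` and `filter_short_orfs` are hypothetical, the input is assumed to be protein sequences in FASTA format, and sequence length in residues is used as a stand-in for the number of codons (stop codon ignored).

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping header -> sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


def filter_short_orfs(records, min_codons=100):
    """Keep only sequences with at least `min_codons` residues."""
    return {h: s for h, s in records.items() if len(s) >= min_codons}


# Toy input: one sequence above and one below the threshold.
example = ">orf1\n" + "M" * 150 + "\n>orf2\n" + "M" * 40 + "\n"
kept = filter_short_orfs(parse_fasta(example))
print(sorted(kept))
```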
Amino acid usage
Relative amino acid usage frequencies for each organism have been calculated using CODONW. The heat map gives a pictorial representation of the amino acid frequencies of all the organisms, where the colour gradient from red to green in every column shows increasing values of abundance for a particular amino acid.
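The quantity behind each heat map column, the relative usage of an amino acid across a whole proteome, can be sketched as below. This is an illustrative stand-in for the CODONW calculation, not its implementation: the function name `amino_acid_usage` is hypothetical, and non-standard residue symbols are simply ignored, which is an assumption.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_usage(proteins):
    """Return the relative frequency of each of the 20 amino acids.

    `proteins` is an iterable of protein sequences from one organism;
    frequencies are pooled over the whole proteome and sum to 1.
    """
    counts = Counter()
    for seq in proteins:
        counts.update(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


# Toy proteome of two short peptides (hypothetical sequences).
usage = amino_acid_usage(["MKCCD", "ACDEF"])
print(round(usage["C"], 3), round(usage["D"], 3))
```

One such frequency vector per organism gives the organism-by-residue table on which the cluster and correspondence analyses operate.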
Cluster analysis and correspondence analysis on amino acid usage
To find out the inter-proteomic differences between organisms, correspondence analysis and cluster analysis on amino acid composition are carried out using STATISTICA (version 6.0) for all organisms. Correspondence analysis has been done for the two groups of organisms, viz. sulphur metabolisers and methanogens. This analysis generates a series of orthogonal axes, with each subsequent axis explaining a decreasing share of the total variation in the dataset.
Indices used to identify the amino acid usage pattern
To identify the major factors influencing the amino acid usage, we calculated the average hydrophobicity (Gravy score), aromaticity, aliphatic index, instability index and isoelectric point distribution for every organism. We observed significant variation in the isoelectric point distribution across all the organisms, and in the aromaticity distribution pattern along with the pI distribution for the two studied groups; hence these results are presented here.
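Two of the indices named above can be sketched directly from a protein sequence. The sketch assumes the conventional definitions: the Gravy score as the mean Kyte-Doolittle hydropathy over all residues, and aromaticity as the fraction of Phe, Tyr and Trp residues. Whether the study used exactly these definitions is an assumption, and the function names `gravy` and `aromaticity` are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AROMATIC = set("FWY")


def _standard_residues(seq):
    """Return the residues of `seq` that are standard amino acids."""
    return [ch for ch in seq.upper() if ch in KYTE_DOOLITTLE]


def gravy(seq):
    """Mean hydropathy of a sequence (higher means more hydrophobic)."""
    residues = _standard_residues(seq)
    if not residues:
        raise ValueError("sequence contains no standard residues")
    return sum(KYTE_DOOLITTLE[ch] for ch in residues) / len(residues)


def aromaticity(seq):
    """Fraction of aromatic residues (Phe, Trp, Tyr) in a sequence."""
    residues = _standard_residues(seq)
    if not residues:
        raise ValueError("sequence contains no standard residues")
    return sum(ch in AROMATIC for ch in residues) / len(residues)


print(round(gravy("AILV"), 3), aromaticity("FWYA"))
```

Averaging these per-protein values over a proteome, or plotting their distribution, gives the organism-level profiles compared in the text.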
Surface charge distribution
The surface charge distributions are mapped onto the predicted surface using the program MOLMOL. The protein used here is glyceraldehyde-3-phosphate dehydrogenase from a methanogenic and a sulphur-metabolising archaeon, viz. M. jannaschii DSM 2661 and S. solfataricus P2 respectively (PDB ID 2YYY and 1B7G).
Amino acid exchange bias with orthologous sequences
Orthologous sequences between M. del and T. onn are identified using the tBlastx program. Orthologs are defined as sequence pairs with at least 40% similarity, less than 20% difference in length and an e-value ≤ 1E-10. The amino acid sequences of 213 orthologous genes are aligned using the pairwise alignment program ClustalW, and the amino acid replacements are obtained in the form of a matrix, using a program developed in-house in Visual Basic. Under unbiased conditions, the ratio of forward to reverse substitutions is expected to be 1:1 for each pair of residues. To test this hypothesis, the observed and expected numbers (based on a 1:1 ratio) are recorded for each pair of residues and the chi-square test is applied to assess the significance of the directional bias, if any, at significance levels of 10^-3 to 10^-6. For a given pair of amino acids, the ‘forward’ direction is defined as the more common of the two replacements in the conversion of methanogenic to sulphur-metabolising proteins. To assess the significance of the directional bias, if any, replacement values are compared by 2 × 2 contingency tables having one degree of freedom. For each pair of replacements, the first and second rows of the contingency table represent, respectively, the number of replacements from one particular residue (say, i) to the other residue (say, j) of the pair and the total count of the remaining replacements from the residue i (i.e. to residues k, where k ≠ j).
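The 1:1 test described above can be illustrated with a chi-square goodness-of-fit calculation on the forward and reverse counts of a single residue pair. The sketch covers only this 1:1 comparison (one degree of freedom), not the 2 × 2 contingency-table variant. The function name `directional_bias` and the example counts are hypothetical. For one degree of freedom the p-value can be written with the complementary error function, p = erfc(sqrt(chi2 / 2)), which avoids any external statistics library.

```python
import math


def directional_bias(forward, reverse):
    """Chi-square test of observed forward/reverse counts against a 1:1 ratio.

    Returns (chi2, p_value) with one degree of freedom.
    """
    total = forward + reverse
    if total == 0:
        raise ValueError("no replacements observed for this pair")
    expected = total / 2.0
    chi2 = ((forward - expected) ** 2 + (reverse - expected) ** 2) / expected
    # Survival function of the chi-square distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts for one residue pair: 60 forward, 20 reverse.
chi2, p = directional_bias(60, 20)
print(chi2, p < 1e-3)
```

With these illustrative counts the statistic is 20, which is significant at the 10^-3 level; equal forward and reverse counts give a statistic of 0 and a p-value of 1.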
COGs (cluster of orthologous groups of proteins) distribution
The COGs annotations for each protein of all the organisms have been predicted with the help of the WebMGA server. Only those proteins for which COGs IDs have been annotated are considered; other proteins are excluded. The percentage of each COGs category present in every organism has been calculated. For a better resolution of the gene possession of every phylum, the methanogens and sulphur metabolisers have been further divided into their corresponding classes and orders respectively.
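The percentage calculation described above can be sketched as follows, assuming each annotated protein carries a string of one-letter COG functional category codes. The function name `cog_category_percentages` and the protein labels are hypothetical, and counting a multi-category protein once per category is an assumption rather than a documented choice of the study.

```python
from collections import Counter


def cog_category_percentages(protein_categories):
    """Percentage of each COG functional category in one proteome.

    `protein_categories` maps a protein ID to a string of one-letter
    category codes (e.g. "J" or "EH"). Proteins without any category
    (empty string) are excluded from the total.
    """
    counts = Counter()
    for categories in protein_categories.values():
        for code in categories:
            counts[code] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {code: 100.0 * n / total for code, n in sorted(counts.items())}


# Toy annotation table (hypothetical protein IDs); p5 has no COG assignment.
annotations = {"p1": "J", "p2": "J", "p3": "C", "p4": "E", "p5": ""}
percentages = cog_category_percentages(annotations)
print(percentages)
```

Averaging such profiles over the organisms of a class or order, together with their standard deviations, gives the group-level distributions discussed in the results.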
COGs: Clusters of Orthologous Groups of proteins
SPAST: Substitution Pattern Analysis Software Tool
We are grateful to Dr. Sandip Paul, Microbiology, University of Washington, for thoughtful and constructive suggestions during the progress of the study and for critical reading of the manuscript. We thankfully acknowledge the infrastructural support obtained from the Bioinformatics Centre of this institute. This work was supported by Council of Scientific and Industrial Research (Project no. CMM 017). ARC is supported by Senior Research Fellowships from Council of Scientific and Industrial Research, India.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.