Distinct, ecotype-specific genome and proteome signatures in the marine cyanobacteria Prochlorococcus
- Sandip Paul†1,
- Anirban Dutta†1,
- Sumit K Bag2, 3,
- Sabyasachi Das2, 4 and
- Chitra Dutta1, 2
© Paul et al; licensee BioMed Central Ltd. 2010
Received: 5 October 2009
Accepted: 10 February 2010
Published: 10 February 2010
The marine cyanobacterium Prochlorococcus marinus, which comprises multiple ecotypes with distinct genotypic/phenotypic traits and is the first documented example of genome shrinkage in a free-living organism, offers an ideal system for studying niche-driven molecular micro-diversity in closely related microbes. Through an extensive comparative analysis of various genomic/proteomic features of 6 high light (HL) and 6 low light (LL) adapted strains, the present study attempts to identify molecular determinants associated with their vertical niche partitioning.
Pronounced strand-specific asymmetry in synonymous codon usage is observed exclusively in LL strains. Distinct dinucleotide abundance profiles are exhibited by 2 LL strains with larger genomes and G+C content ≈ 50% (group LLa), 4 LL strains with reduced genomes and G+C content ≈ 35-37% (group LLb), and the 6 HL strains. Following the order of emergence of LLa, LLb and HL strains (based on 16S rRNA phylogeny), the core proteins show a gradual increase in average aromaticity, pI and beta- and coil-forming propensities, and a decrease in mean hydrophobicity, instability index and helix-forming propensity. Orthologous gene repertoires vary more between LLa and LLb strains, whereas a higher number of positively selected genes is found between LL and HL strains.
Strains of different Prochlorococcus groups are characterized by distinct compositional, physicochemical and structural traits that are not mere remnants of continuous genetic drift, but potential outcomes of a grand scheme of niche-oriented stepwise diversification, which might have driven them chronologically towards greater stability/fidelity and conferred on them a special ability to inhabit diverse oceanic environments.
Evolution of a microbe is often driven by its environment or lifestyle. Microorganisms adapted to specialized environmental conditions have been reported to display conspicuous genome and/or proteome features [1–8]. Species of widely varying taxonomic origins that thrive in the same or similar environmental conditions, such as high temperature or high salinity, may converge to similar genome and/or proteome compositions. In contrast, closely related bacterial species inhabiting distinct ecological niches may display substantial genomic diversity [1–3, 6, 8–11]. Unveiling the plausible causes and consequences, at the genome and proteome levels, of such niche-dependent evolution in the microbial world is a major challenge for present-day life scientists. The marine cyanobacterium Prochlorococcus marinus, with multiple ecotypes exhibiting distinct niche-specific phenotypic as well as genotypic characteristics, offers a useful system to address this issue.
Table 1. General features of 12 Prochlorococcus strains under study. Columns: strain; accession no. (RefSeq); genome size (Mb); G+C content (%).
Low light adapted strains:
- P. marinus str. MIT 9313
- P. marinus str. MIT 9303
- P. marinus subsp. marinus str. CCMP1375 (SS120)
- P. marinus str. MIT 9211
- P. marinus str. NATL1A
- P. marinus str. NATL2A
High light adapted strains:
- P. marinus str. AS9601
- P. marinus str. MIT 9312
- P. marinus subsp. pastoris str. CCMP1986 (MED4)
- P. marinus str. MIT 9515
- P. marinus str. MIT 9215
- P. marinus str. MIT 9301
It is worth mentioning in this context that Prochlorococcus is the first documented example of genome shrinkage accompanied by A+T enrichment in a free-living organism. Earlier examples of genome reduction had been restricted to endosymbionts or pathogens with a host-dependent lifestyle, which evolve under the constraint of frequent population bottlenecks and a consequent increase in genetic drift [2, 3, 26–29]. Given the abundance of P. marinus in the marine ecosystem, its reductive genome evolution may not have been shaped by similar population bottlenecks and the resulting genetic drift, and is therefore a more complex phenomenon to explain. Although P. marinus genome evolution has been investigated previously [25, 30] and genome shrinkage has been ascribed to various factors, including growth in oligotrophic waters [20, 23, 31], selection for metabolic economy [25, 31, 32], loss of low-fitness genes and smaller cell size, it is still unclear to what extent it has been driven by random genetic drift and/or other specific selective force(s). Our analyses indicate that the ecotype-specific molecular signatures exhibited by the P. marinus strains under study are not mere remnants of continuous genetic drift, but a potential outcome of niche-oriented stepwise diversification of Prochlorococcus, orchestrated by an array of interacting adaptive forces.
In an attempt to understand the trends in molecular evolution in Prochlorococcus, we analyzed various genome and proteome characteristics of 6 LL and 6 HL strains of P. marinus. These analyses cover trends in codon, dinucleotide and amino acid usage, gene synteny of orthologous sequences, intergenic sequence composition, physicochemical properties of the encoded proteins and the extent of positive selection among different strains, and were primarily directed towards identifying niche-specific variations among Prochlorococcus strains.
Strand-specific asymmetry in synonymous codon usage in low light adapted P. marinus genomes
General features and correlations of GC3 and GT3 content with the first two axes of COA on RSCU values of genes in 12 Prochlorococcus genomes. Columns: strain; ORFs under study; G+C content (%); % of total variation explained by COA on RSCU; correlation coefficient (r) of Axis 1 and of Axis 2 vs. GC3 and GT3.
- P. marinus str. MIT 9313
- P. marinus str. MIT 9303
- P. marinus subsp. marinus str. CCMP1375
- P. marinus str. MIT 9211
- P. marinus str. NATL1A
- P. marinus str. NATL2A
- P. marinus str. AS9601
- P. marinus str. MIT 9312
- P. marinus subsp. pastoris str. CCMP1986
- P. marinus str. MIT 9515
- P. marinus str. MIT 9215
- P. marinus str. MIT 9301
Chi-square tests on the occurrences of different codons on the two replicating strands of representative LL strains (LL1 and LL6) further reveal significant overrepresentation of 28 and 22 G-/U-ending codons on the leading strands of LL1 and LL6, respectively (p < 0.001), while 28 and 25 A-/C-ending codons are overrepresented in the genes encoded on the lagging strands of LL1 and LL6, respectively (p < 0.001) (Additional files 1 and 2). The codon 'CUG' is the only exception: although G-ending, it is significantly overrepresented in the lagging-strand genes of LL1.
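The per-codon strand comparison above can be sketched as a 2 × 2 contingency test. This is a minimal illustration under stated assumptions, not the authors' procedure: it assumes each codon's count on the leading versus lagging strand is compared against all other codons on those strands, it uses Pearson's chi-square without continuity correction, and the function names and counts are hypothetical.

```python
from __future__ import annotations

import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table
        [[a, b],   # leading strand: target codon, all other codons
         [c, d]]   # lagging strand: target codon, all other codons
    Returns (statistic, p-value) for 1 degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 df the chi-square survival function equals erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


def strand_bias(codon: str, leading: dict[str, int], lagging: dict[str, int]):
    """Test whether `codon` is unevenly used between leading- and lagging-strand genes."""
    a = leading.get(codon, 0)
    b = sum(leading.values()) - a
    c = lagging.get(codon, 0)
    d = sum(lagging.values()) - c
    chi2, p = chi_square_2x2(a, b, c, d)
    # The strand where the codon takes the larger share is the overrepresented one.
    side = "leading" if a / (a + b) > c / (c + d) else "lagging"
    return chi2, p, side


# Hypothetical codon counts for one amino acid family on each strand.
leading = {"GUG": 1200, "GUA": 800, "GUC": 600, "GUU": 1400}
lagging = {"GUG": 900, "GUA": 1000, "GUC": 800, "GUU": 1300}
print(strand_bias("GUG", leading, lagging))
```

For a 2 × 2 table the statistic has one degree of freedom, so the p-value can be taken from `math.erfc` without a statistics library. Testing every codon at once would normally call for a multiple-testing correction, which this sketch leaves out.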
In microbial genomes characterized by pronounced strand asymmetry [2, 3, 34, 35], replicational-transcriptional selection usually plays a major role in shaping genome organization. The leading strands of replication in such organisms generally carry a higher number of genes, owing to replicational selection, and are also enriched in highly expressed genes as an effect of transcriptional selection. In LL strains of P. marinus, however, the predicted protein coding sequences are distributed almost equally between the two strands. In fact, in three LL strains (LL1, LL2 and LL3), fewer predicted protein coding sequences lie on the leading strand than on the lagging strand (approximately 48% versus 52%), indicating the absence of replicational selection. The lagging strands of MIT9313 (LL1) and MIT9303 (LL2) are also enriched in ribosomal protein genes, which are typically highly expressed: only two such genes are encoded on each of their leading strands, leaving 36 (LL1) and 37 (LL2) ribosomal protein genes on their lagging strands. For the other LL strains, the distribution of ribosomal protein genes is quite conventional (≈ 25 on the leading strand and ≈ 12 on the lagging strand). However, most other potentially highly expressed genes (e.g., RNA polymerases, transcription and translation processing factors) are present in higher numbers on the leading strands of all LL strains. It is therefore difficult to reach a definite conclusion about the effects of transcriptional selection on the LL strains of Prochlorococcus.
Larger extent of genomic rearrangements between small and large P. marinus genomes
Niche-specific dinucleotide abundance values of P. marinus genomes
However, significant intra-Prochlorococcus differences are also present in dinucleotide abundance profiles, on the basis of which all P. marinus strains under study may be divided into three distinct groups:
(a) Group LLa, comprising the two LL strains P. marinus MIT 9313 (LL1) and MIT 9303 (LL2), both with larger genomes (≈ 2.5 Mb) and average G+C content ≈ 50%: the genomes of these two strains are characterized by significantly high values of CA/TG and low values of TA (Additional file 3). Compared with other P. marinus strains, their values for AT, AC/GT and CG are also relatively high, and those for CC/GG are lower.
(b) Group LLb, consisting of the other four LL strains (LL3, LL4, LL5 and LL6), characterized by relatively low G+C content (35-37%) and small genome size (< 2 Mb): these four LL strains exhibit highly similar patterns, which differ visibly from the almost overlapping profiles of the HL strains, mainly at CA/TG and CC/GG (Figure 3).
(c) Group HL, including all 6 HL strains, with reduced genomes and G+C content ≈ 31%: the dinucleotide CC/GG is significantly overrepresented only in the HL Prochlorococcus strains.
Clustering by amino acid composition reveals a balance between genomic G+C-bias and Prochlorococcus-specific selection forces
In an attempt to investigate whether the strand-specific mutational bias has any impact on amino acid usage in gene products of LL strains of P. marinus in comparison to their HL counterparts, we performed correspondence analysis (COA) on relative amino acid usage (RAAU) of the encoded proteins of each organism. No clear segregation can be observed for proteins encoded by the leading and lagging strands in any of the P. marinus genomes under study (data not shown), implying that the strand-specific mutational bias has hardly any influence on the amino acid compositions of the gene products of LL strains of Prochlorococcus. In all the strains of P. marinus, the first three axes generated by COA on amino acid usage cumulatively explain about 39% of the total variability. Both mean hydrophobicity and aromaticity of the encoded proteins exhibit strong correlations with either of the first two principal axes and seem to be the major contributors to amino acid usage variation in P. marinus proteins (data not shown).
Niche-specific variations in physicochemical and structural features of Prochlorococcus orthologs
Different amino acid indices and secondary structural traits of 519 orthologous proteins present in 12 Prochlorococcus strains. Columns: amino acid indices (mean), including isoelectric point (pI) and instability index (II); secondary structural traits (%).
Comparison of various amino acid indices and secondary structural traits between six sets of Prochlorococcus and non-Prochlorococcus orthologs. Columns: mean of amino acid indices; secondary structural traits (%).
- Set I (303 pairs)
- Set II (136 pairs)
- Set III (265 pairs)
- Set IV (175 pairs)
- Set V (963 pairs)
- Set VI (961 pairs)
Higher positive selection between orthologs from strains with opposite light optima
Pronounced effects of directional mutational bias in the intergenic regions of HL P. marinus strains
To examine whether the G+C bias of intergenic regions in the different strains (with varying genomic G+C content) follows trends similar to those of the respective coding regions, we calculated the G+C content of the intergenic regions. Intergenic regions are, in general, more A+T rich than the genome as a whole in each organism (Additional file 4). The A+T bias of intergenic regions is also more pronounced in HL strains than in their LL counterparts.
Putative remnants of coding regions and their G+C content (%). Columns: putative remnants of coding regions; no. of hits.
Among the several genome/proteome signatures of P. marinus strains reported for the first time in this work, the most notable is the impact of pronounced replication-strand-specific asymmetry on synonymous codon usage, observed exclusively in the low light adapted strains of P. marinus (Figure 1). This is noteworthy for two reasons. (i) Pronounced strand-specific mutational bias with a detectable influence on codon usage has so far been observed mostly in obligate intracellular microorganisms with reduced genomes [2, 3, 5]. Interestingly, all 6 LL strains of P. marinus exhibiting strand-specific synonymous codon usage are free-living, and two of them (LL1 and LL2) have relatively large genomes. On the other hand, the reduced genomes of the 6 HL strains show no perceivable sign of strand asymmetry in their usage of synonymous codons. (ii) In most other microbial genomes with asymmetric mutational bias, genes, especially highly expressed ones, are present in significantly higher numbers on the leading strand of replication, a phenomenon referred to as replicational-transcriptional selection [2, 3, 34, 35]. No such significant bias in gene distribution between the two replication strands is observed in the LL strains of P. marinus. Strand asymmetry in the codon usage of Prochlorococcus may therefore not be causally linked either to genome reduction or to replicational-transcriptional selection.
The homogenization of strand-asymmetric bias in the HL strains may be attributed, at least partially, to the absence of a specific DNA repair enzyme, MutY. Previous studies by Rocap et al. and Dufresne et al. have shown that MutY is absent in P. marinus str. CCMP1986 (HL3), but present in P. marinus str. CCMP1375 (LL3) and P. marinus str. MIT9313 (LL1). MutY, an A/G-specific DNA glycosylase, acts together with MutT (NTP pyrophosphohydrolase) and MutM (formamidopyrimidine-DNA glycosylase) to prevent misincorporation of oxidized guanine (8-oxoG) into DNA and to repair A:8-oxoG mismatches. Knocking out both mutM and mutY in E. coli results in a 1,000-fold increase in G:C to T:A transversions compared with the wild-type strain. Our BLASTP searches reveal that mutY is present only in the LL strains and absent from all 6 HL strains. The excess 'G's on the leading strands of LL strains might have mutated to 'A's in the HL strains owing to the absence of mutY in the latter; this, in turn, would have caused a simultaneous increase of 'T's on the lagging strands, eventually homogenizing the G+T and A+C frequencies of the two replication strands in the HL strains. The existing mutational drift towards A+T enrichment in the HL strains might also have helped achieve this uniformity. The availability of more completely sequenced Prochlorococcus genomes may provide further insight in this regard.
In the process of gradual genome reduction, mutations often accumulate in expendable genes, progressively transforming them into pseudogenes, then into small fragments, and finally eliminating them. In the reduced genomes of P. marinus, we found some putative remnants of coding regions whose A+T content is, in general, higher than that of coding regions but lower than that of other non-coding regions. This agrees with the fact that the reduced genomes of P. marinus (especially those of HL strains) are subject to a strong mutational A+T drift, which would gradually enrich in A+T the genic remnants recently released from amino acid coding constraints. The base composition of such remnants is expected to approach, over time, the A+T content of bona fide non-coding regions.
Comparison of orthologous gene synteny among five representative strains with different genome sizes and G+C contents clearly points to a high level of chromosomal rearrangement during genome shrinkage in Prochlorococcus. This agrees with earlier findings associating chromosomal rearrangement with higher rates of chromosomal evolution and/or genome reduction, as in Arabidopsis thaliana and various endoparasites/endosymbionts [41, 42]. Intra-chromosomal recombination between duplicated sequences often results in deletion of the intervening sequences and rearrangement of flanking regions, thereby leading to genome shrinkage.
Previous analyses of endosymbiotic or endoparasitic organisms such as Bartonella, Tropheryma, Buchnera and Wigglesworthia [2, 3, 28, 29] revealed that genome reduction is normally associated with population bottlenecks or other mechanisms such as selective sweeps. In the case of the hyperthermophile Nanoarchaeum equitans, extreme genome reduction is a feature of its thermoparasitic adaptation. Although our knowledge of bacterial populations in the open ocean is not exhaustive, it may safely be assumed that P. marinus ecotypes, the most abundant free-living marine cyanobacteria and an important contributor to global photosynthesis, are not subject to small population sizes [13–15, 30]. More importantly, the HL strains with reduced genomes are apparently biologically superior to their LL counterparts. It is possible that the bias towards reduced, A+T-rich genomes in HL strains is consistent with cellular economy in regions with limited nitrogen and phosphorus near the ocean surface. Scarcity of these elements, which are essential for DNA synthesis, favors the incorporation of A:T base pairs, which contain seven nitrogen atoms, one fewer than a G:C base pair. It is worth mentioning here that the trends in amino acid usage across P. marinus strains observed in this study are quite compatible with the earlier report by Lv et al. on the influence of resource availability on the proteome composition of these species. For instance, the increase in overall aromaticity from LLa to LLb and HL strains agrees fully with the observation by Lv et al. of increased carbon content in the encoded proteins of different HL strains compared with LL strains. The average instability indices of HL proteins are significantly lower than those of their LL orthologs, suggesting that HL proteins, in general, may be more stable.
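The nitrogen argument above reduces to simple bookkeeping. The sketch below counts nitrogen atoms in the bases of a DNA duplex (adenine C5H5N5, guanine C5H5N5O, cytosine C4H5N3O, thymine C5H6N2O2; the sugar-phosphate backbone carries no nitrogen) and expresses the per-base-pair cost as a function of G+C content. The function names are illustrative, and the 31% and 50% values used below are the approximate HL and LLa G+C contents quoted earlier in this article.

```python
# Nitrogen atoms per nucleobase; the deoxyribose-phosphate backbone contains none.
# adenine C5H5N5, guanine C5H5N5O, cytosine C4H5N3O, thymine C5H6N2O2
BASE_N = {"A": 5, "G": 5, "C": 3, "T": 2}
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def duplex_nitrogen(seq: str) -> int:
    """Nitrogen atoms in the bases of a double-stranded segment whose top strand is `seq`."""
    return sum(BASE_N[b] + BASE_N[PAIR[b]] for b in seq.upper())


def nitrogen_per_bp(gc_fraction: float) -> float:
    """Average nitrogen atoms per base pair: 7 for A:T, 8 for G:C (i.e. 7 + GC fraction)."""
    return 7 * (1 - gc_fraction) + 8 * gc_fraction


# Nitrogen saved per megabase of duplex DNA at HL-like (~31% G+C)
# versus LLa-like (~50% G+C) composition.
saving_per_mb = (nitrogen_per_bp(0.50) - nitrogen_per_bp(0.31)) * 1_000_000
print(round(saving_per_mb))
```

With these definitions a duplex megabase at HL-like composition needs roughly 190,000 fewer nitrogen atoms than one at LLa-like composition, before counting any saving from genome shrinkage itself.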
Proteins with higher percentages of helical structure experience increased overall packing, which imparts greater rigidity. Hence, the decrease in helix-forming propensity, together with the accompanying increase in coil structure, probably makes HL proteins more flexible. It is also tempting to suggest that the higher aromaticity and pI of HL proteins, compared with their LL orthologs, might facilitate cation-π interactions in the former, imparting greater stability. The central issue in the adaptation of HL proteins to their environmental niches may therefore be the conservation of their functional state, characterized by a well-balanced optimization of stability and flexibility.
The current study suggests the presence of adaptive selective forces that might have played a significant role in governing Prochlorococcus evolution and fitness at the genome and proteome levels. A balance between these adaptive forces and directional mutational bias has set definite trends in the molecular evolution of P. marinus. This balance has endowed the different P. marinus ecotypes with distinct niche-specific compositional, physicochemical and structural traits, thereby driving them chronologically towards increasing stability and/or fidelity.
All predicted protein coding sequences and the complete genome sequences of the 12 P. marinus strains were retrieved from NCBI GenBank (listed in Table 1). For comparison, the predicted protein coding sequences of E. coli (NC_000913.2), Bacillus cereus (NC_003909.8), Francisella tularensis (NC_006570.1), Synechococcus elongatus (NC_006576.1), Synechocystis sp. (NC_000911.1), Nostoc sp. (NC_003272.1), Campylobacter jejuni (NC_003912.7), Cyanothece (NC_011884.1) and Gramella forsetii (NC_008571.1) were also retrieved from GenBank. Annotated ORFs encoding proteins shorter than 100 amino acids were excluded from further analysis.
Determination of leading and lagging strand genes
To identify the replication origin (oriC) and terminus (ter) sites, we performed GC-skew analysis, (G-C)/(G+C), using a sliding window of 10 kb along each genome sequence. The predicted sites were validated by checking the neighbouring gene organization (e.g., the origins identified in Prochlorococcus genomes were flanked by the DNA polymerase III beta subunit gene on the 3' side and the Threonine synthatase gene on the 5' side) and the presence of DnaA boxes in their vicinity. Based on the predicted oriC and ter sites (Additional file 6), the leading and lagging strands of replication were identified for each genome, along with the genes encoded on each strand.
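The window-based skew scan and the assignment of genes to replication strands can be sketched as follows. This is a simplified illustration, not the study's pipeline: window starts are 0-based, the sign-change rule (skew turning from negative to positive at a candidate origin and from positive to negative at a candidate terminus, read along the top strand) is a standard heuristic, and the gene-context and DnaA-box checks described above are not modelled. All function names are hypothetical.

```python
from typing import List, Optional, Tuple


def gc_skew_windows(genome: str, window: int = 10_000,
                    step: Optional[int] = None) -> List[Tuple[int, float]]:
    """Return (window_start, skew) pairs with skew = (G - C) / (G + C) per window."""
    step = step or window
    seq = genome.upper()
    out = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        out.append((start, (g - c) / (g + c) if g + c else 0.0))
    return out


def skew_sign_changes(skews: List[Tuple[int, float]]) -> Tuple[List[int], List[int]]:
    """Candidate origins (skew turns - to +) and termini (+ to -) along the top strand.
    The last window is compared with the first because the chromosome is circular."""
    origins, termini = [], []
    for (_, k0), (s1, k1) in zip(skews, skews[1:] + skews[:1]):
        if k0 < 0 <= k1:
            origins.append(s1)
        elif k0 > 0 >= k1:
            termini.append(s1)
    return origins, termini


def on_leading_strand(gene_pos: int, gene_strand: str,
                      ori: int, ter: int, genome_len: int) -> bool:
    """True if the gene is co-oriented with the replication fork (leading strand).
    In the replichore running clockwise from ori to ter, '+' genes are leading;
    in the other replichore, '-' genes are."""
    d_gene = (gene_pos - ori) % genome_len
    d_ter = (ter - ori) % genome_len
    in_clockwise_replichore = d_gene < d_ter
    return (gene_strand == "+") == in_clockwise_replichore


# Toy genome: C-rich, then G-rich, then C-rich again.
skews = gc_skew_windows("C" * 10 + "G" * 20 + "C" * 10, window=10)
print(skews, skew_sign_changes(skews))
```

Real genomes give noisy 10 kb skew profiles with many spurious sign flips, so in practice candidate sites would still need the flanking-gene and DnaA-box validation described in the text.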
Multivariate analyses on synonymous codon and amino acid usage and cluster analysis on amino acid usage
Correspondence analysis (COA) on relative synonymous codon usage (RSCU) and on amino acid usage of genes/proteins was performed for each genome separately, using the program CODONW 1.4.2, to identify significant variation in the usage of codons or amino acids, if present, and to help ascertain its underlying cause(s).
To examine the variation in amino acid usage between LL and HL Prochlorococcus strains, a cluster analysis on standardized amino acid usage was carried out using STATISTICA (version 6.0, StatSoft Inc., USA) for all 12 Prochlorococcus strains (Table 1), together with E. coli, Cyanothece, C. jejuni and G. forsetii, whose G+C contents are close to those of the different LL and HL strains. The amino acid usage of E. coli was chosen as a well-defined reference for standardizing amino acid composition in this analysis and in the accompanying heat map. Using a program developed in-house in Visual Basic, a 16 × 20 matrix (heat map) was generated, whose rows correspond to the data sources (i.e., the organisms in the cluster) and whose columns correspond to standardized amino acid usage values. Overrepresented and underrepresented standardized values are shown as green and red blocks, respectively (Figure 4), with intensity varying according to the deviation from the standard (yellow). The leftmost column shows the genomic G+C content of each organism.
Dinucleotide analysis of DNA sequences
For all Prochlorococcus genomes and E. coli, the abundance of each possible dinucleotide was calculated as the ratio between its observed frequency and the frequency expected from its genomic context. Dinucleotide abundance values generally represent the genomic signature of a species, and here we wanted to see whether all Prochlorococcus genomes follow a similar trend.
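The observed/expected ratio described above is commonly computed as Karlin's relative abundance, rho(XY) = f(XY) / (f(X) f(Y)). The sketch below assumes the symmetrized form, in which counts are pooled over the sequence and its reverse complement; this is why complementary pairs such as CA/TG receive identical values, matching how dinucleotides are reported in this study. Function names are illustrative.

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def dinucleotide_abundance(seq: str) -> dict:
    """Symmetrized relative abundance rho*(XY) = f*(XY) / (f*(X) * f*(Y)).

    f* are frequencies pooled over `seq` and its reverse complement, so
    complementary pairs (e.g. CA and TG) get identical values."""
    seq = seq.upper()
    mono = {b: 0 for b in "ACGT"}
    di = {x + y: 0 for x, y in product("ACGT", repeat=2)}
    for strand in (seq, reverse_complement(seq)):
        for base in strand:
            if base in mono:
                mono[base] += 1
        for i in range(len(strand) - 1):
            pair = strand[i:i + 2]
            if pair in di:  # skips pairs containing N or other symbols
                di[pair] += 1
    n_mono, n_di = sum(mono.values()), sum(di.values())
    if n_di == 0:
        raise ValueError("sequence too short to contain any dinucleotide")
    rho = {}
    for pair, count in di.items():
        fx, fy = mono[pair[0]] / n_mono, mono[pair[1]] / n_mono
        rho[pair] = (count / n_di) / (fx * fy) if fx and fy else float("nan")
    return rho


print(dinucleotide_abundance("ACCAGTTGCAAGTC")["CA"])
```

Pooling the two strands' counts, rather than concatenating the sequence with its reverse complement, avoids counting a spurious dinucleotide at the junction.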
Determination of orthologs
A stand-alone BLAST package (ver. 2.2.18) was downloaded from the NCBI FTP site and used to perform all-against-all BLASTN and BLASTP searches with the genes from all 12 strains of P. marinus. For this study, orthologs across these organisms were defined as protein coding genes with BLASTP sequence identity ≥ 60%, a length difference of no more than 20% and E-value ≤ 1e-20. The resulting list of orthologs was checked for consistency with data from GenePlot (http://www.ncbi.nlm.nih.gov/sutils/geneplot.cgi), which provides pairwise lists of genes giving mutually best BLASTP hits when all genes from the genomes of any two organisms are searched against each other. We identified 519 orthologs present in all 12 P. marinus genomes as their core proteome. The stringent criteria used for the similarity search ensure that these orthologs have been well conserved throughout the adaptive evolution of P. marinus, so any niche-specific features deciphered from this dataset would not be a trivial outcome. For comparative analysis with suitable outgroup organisms, we retrieved orthologs for nine organisms, including three representative P. marinus strains (LL1, LL3 and HL3), E. coli, B. cereus, F. tularensis, S. elongatus, Synechocystis sp. and Nostoc sp., from NCBI GenePlot by filtering the symmetrical best hits of protein homologs.
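The ortholog criteria above can be expressed as a filter followed by a reciprocal-best-hit check, sketched here for one pair of genomes. Assumptions: the 20% length difference is measured relative to the longer protein; the best hit is the one with the lowest E-value (ties broken by higher identity); and hits are held in a hypothetical `Hit` record whose fields are obtainable from a BLASTP report. Extending this to a core set shared by all 12 genomes would require combining such pairwise results, which is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One BLASTP hit (hypothetical record)."""
    query: str
    subject: str
    identity: float  # percent identity
    qlen: int        # query protein length (aa)
    slen: int        # subject protein length (aa)
    evalue: float


def passes_filters(h: Hit, min_identity=60.0, max_len_diff=0.20, max_evalue=1e-20) -> bool:
    # Assumption: length difference is taken relative to the longer protein.
    len_diff = abs(h.qlen - h.slen) / max(h.qlen, h.slen)
    return h.identity >= min_identity and len_diff <= max_len_diff and h.evalue <= max_evalue


def best_hits(hits) -> dict:
    """Best filtered hit per query: lowest E-value, ties broken by higher identity."""
    best = {}
    for h in hits:
        if not passes_filters(h):
            continue
        current = best.get(h.query)
        if current is None or (h.evalue, -h.identity) < (current.evalue, -current.identity):
            best[h.query] = h
    return best


def reciprocal_best_hits(a_vs_b, b_vs_a) -> set:
    """Gene pairs (gene_a, gene_b) that are each other's best filtered hit."""
    forward, reverse = best_hits(a_vs_b), best_hits(b_vs_a)
    return {(q, h.subject) for q, h in forward.items()
            if h.subject in reverse and reverse[h.subject].subject == q}
```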
Estimation of synonymous and non-synonymous substitution patterns in orthologous sequences
Positive selection can be inferred from a higher rate of non-synonymous than synonymous substitutions per site (dN/dS > 1). The dN and dS values were calculated for the 519 orthologs of LL1, LL6 and HL3 using MEGA (version 4). The calculation was based on the modified Nei-Gojobori method with Jukes-Cantor correction, which accounts for deviation from equal frequencies of transitions and transversions [50, 51].
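Once synonymous and non-synonymous sites and differences have been counted (the counting step, which in the modified Nei-Gojobori method accounts for transition/transversion bias, is omitted here), the final Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3), and the dN/dS ratio reduce to the arithmetic below. This is a sketch with hypothetical function names, not a reimplementation of MEGA.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance d = -(3/4) ln(1 - 4p/3)."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(syn_sites: float, nonsyn_sites: float,
          syn_diffs: float, nonsyn_diffs: float) -> float:
    """dN/dS from pre-computed site counts (S, N) and observed differences (Sd, Nd)."""
    d_s = jukes_cantor(syn_diffs / syn_sites)        # pS = Sd / S
    d_n = jukes_cantor(nonsyn_diffs / nonsyn_sites)  # pN = Nd / N
    if d_s == 0:
        return float("inf") if d_n > 0 else float("nan")
    return d_n / d_s
```

A ratio above 1 would flag the ortholog pair as a candidate for positive selection, following the criterion stated above.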
Gene synteny visualization
Gene repertoires and gene synteny of 5 representative Prochlorococcus strains (LL1, LL3, LL6, HL3 and HL4) were compared using a Java program developed in-house, which displays the arrangement of orthologous genes between two chromosomes by joining the locations of the orthologs with coloured lines. Red lines connect orthologs on the same strand (+/-) and blue lines connect orthologs coded on different strands of the chromosomes being compared.
Calculation of codon/amino acid usage indices and estimation of secondary structure of proteins
Indices such as relative synonymous codon usage (RSCU), G+C and G+T content at third codon positions (GC3 and GT3, respectively), aromaticity and average hydrophobicity (GRAVY score) of protein coding sequences were calculated to identify the factors influencing codon and amino acid usage. The isoelectric point (pI) and instability index of each protein were calculated using the ExPASy proteomics server. Secondary structures of the identified orthologs were predicted with the software PREDATOR, and the percentages of structural components (helices, sheets and coils) in proteins from different strains were recorded.
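The sequence indices listed above can be sketched in a few functions. Assumptions: RSCU is computed over pooled codon counts, excluding stop codons, as a codon's observed count divided by the mean count of its synonymous family; GC3 and GT3 are computed over all sense codons (CODONW's GC3s variant restricts to synonymous third positions); GRAVY uses the Kyte-Doolittle scale; and aromaticity is the relative frequency of Phe, Trp and Tyr. Function names are illustrative; pI and the instability index, computed on ExPASy in this study, are not reproduced.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
AA_STRING = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AA_STRING)}

# Kyte-Doolittle hydropathy values used for the GRAVY score.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def codons(cds: str) -> list:
    """Split a coding sequence into complete codons (RNA 'U' is read as 'T')."""
    seq = cds.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def rscu(cds_list) -> dict:
    """RSCU = observed codon count / mean count of its synonymous family (stops excluded)."""
    counts = Counter(c for cds in cds_list for c in codons(cds) if c in CODE)
    values = {}
    for aa in set(CODE.values()) - {"*"}:
        family = [c for c, a in CODE.items() if a == aa]
        total = sum(counts[c] for c in family)
        for c in family:
            values[c] = counts[c] * len(family) / total if total else float("nan")
    return values


def third_position_content(cds: str, bases: str) -> float:
    """Fraction of sense-codon third positions in `bases` ("GC" -> GC3, "GT" -> GT3)."""
    thirds = [c[2] for c in codons(cds) if CODE.get(c, "*") != "*"]
    return sum(b in bases for b in thirds) / len(thirds) if thirds else float("nan")


def gravy(protein: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value over standard residues."""
    residues = [aa for aa in protein.upper() if aa in KD]
    return sum(KD[aa] for aa in residues) / len(residues)


def aromaticity(protein: str) -> float:
    """Relative frequency of the aromatic residues Phe, Trp and Tyr."""
    residues = [aa for aa in protein.upper() if aa in KD]
    return sum(aa in "FWY" for aa in residues) / len(residues)
```

For example, `rscu(["ATGGCTGCTGCCGCAGCG"])["GCT"]` returns 1.6: two of the five alanine codons are GCT, in a four-codon family whose mean count is 1.25.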
Identification of intergenic regions
The sequences coding for proteins and structural RNAs were taken, for each organism, from the protein table and the structural RNA table (available from NCBI), respectively. Intergenic regions were identified by subtracting these gene regions from the whole genome. For each of the 12 Prochlorococcus strains, the overall G+C content of the intergenic regions was calculated after concatenating all intergenic sequences. To identify probable pseudogenes/remnants of coding DNA in LL6 and HL3 (two representative strains with reduced genomes, from groups LLb and HL), their intergenic regions were subjected to a similarity search (tBLASTX) against a pool of Prochlorococcus genes from three representative strains (LL1, LL6 and HL3). In total, 48 hits for LL6 and 93 hits for HL3 were identified with sequence identity ≥ 30%, aligned length ≥ 15 amino acids and E-value < 1e-3.
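The subtraction step above, and the hit thresholds used for remnant detection, can be sketched as follows. Assumptions: gene coordinates are 1-based and inclusive (minus-strand genes may be given with start > end), genes spanning the chromosome's start/end junction are not handled, and G+C content is computed over A/C/G/T only. All names are hypothetical.

```python
def intergenic_intervals(genome_len: int, genes) -> list:
    """Complement of annotated gene intervals on a linearized chromosome.

    `genes` holds (start, end) pairs, 1-based and inclusive; start > end is allowed.
    Returns 1-based inclusive (start, end) intervals covered by no gene."""
    merged = []
    for s, e in sorted((min(a, b), max(a, b)) for a, b in genes):
        if merged and s <= merged[-1][1] + 1:  # overlapping or adjacent genes
            merged[-1][1] = max(merged[-1][1], e)
        else:
            merged.append([s, e])
    gaps, pos = [], 1
    for s, e in merged:
        if s > pos:
            gaps.append((pos, s - 1))
        pos = max(pos, e + 1)
    if pos <= genome_len:
        gaps.append((pos, genome_len))
    return gaps


def concatenated_gc(genome: str, intervals) -> float:
    """G+C content (%) of all intervals joined together, counting only A/C/G/T."""
    seq = "".join(genome[s - 1:e] for s, e in intervals).upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return 100 * (seq.count("G") + seq.count("C")) / acgt if acgt else float("nan")


def is_remnant_hit(identity: float, aligned_aa: int, evalue: float) -> bool:
    """Thresholds quoted for tBLASTX remnant hits: identity >= 30%, >= 15 aa, E < 1e-3."""
    return identity >= 30.0 and aligned_aa >= 15 and evalue < 1e-3
```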
RSCU: relative synonymous codon usage
GT3: G+T content at third codon positions
GC3: G+C content at third codon positions
We thank Sanjib Chatterjee and Avik Datta, IICB for giving technical support in calculating dinucleotide abundance and synonymous and non-synonymous divergence. This work was supported by the Department of Biotechnology, Government of India (Grant Number BT/BI/04/055-2001) and Council of Scientific and Industrial Research (Project no. CMM 0017). SP and AD are supported by Senior Research Fellowships from Council of Scientific and Industrial Research, India.
- Das S, Paul S, Bag SK, Dutta C: Analysis of Nanoarchaeum equitans genome and proteome composition: indications for hyperthermophilic and parasitic adaptation. BMC Genomics. 2006, 7: 186-10.1186/1471-2164-7-186.PubMed CentralPubMedView ArticleGoogle Scholar
- Das S, Paul S, Chatterjee S, Dutta C: Codon and amino acid usage in two major human pathogens of genus Bartonella--optimization between replicational-transcriptional selection, translational control and cost minimization. DNA Res. 2005, 12 (2): 91-102. 10.1093/dnares/12.2.91.PubMedView ArticleGoogle Scholar
- Das S, Paul S, Dutta C: Evolutionary constraints on codon and amino acid usage in two strains of human pathogenic actinobacteria Tropheryma whipplei. J Mol Evol. 2006, 62 (5): 645-658. 10.1007/s00239-005-0164-6.PubMedView ArticleGoogle Scholar
- Eisenberg H: Life in unusual environments: progress in understanding the structure and function of enzymes from extreme halophilic bacteria. Arch Biochem Biophys. 1995, 318 (1): 1-5. 10.1006/abbi.1995.1196.PubMedView ArticleGoogle Scholar
- Moran NA, Wernegreen JJ: Lifestyle evolution in symbiotic bacteria: insights from genomics. Trends Ecol Evol. 2000, 15 (8): 321-326. 10.1016/S0169-5347(00)01902-9.PubMedView ArticleGoogle Scholar
- Paul S, Bag SK, Das S, Harvill ET, Dutta C: Molecular signature of hypersaline adaptation: insights from genome and proteome composition of halophilic prokaryotes. Genome Biol. 2008, 9 (4): R70-10.1186/gb-2008-9-4-r70.PubMed CentralPubMedView ArticleGoogle Scholar
- Pikuta EV, Hoover RB, Tang J: Microbial extremophiles at the limits of life. Crit Rev Microbiol. 2007, 33 (3): 183-209. 10.1080/10408410701451948.PubMedView ArticleGoogle Scholar
- Singer GA, Hickey DA: Thermophilic prokaryotes have characteristic patterns of codon usage, amino acid composition and nucleotide content. Gene. 2003, 317 (1-2): 39-47. 10.1016/S0378-1119(03)00660-7.PubMedView ArticleGoogle Scholar
- Bliska JB, Casadevall A: Intracellular pathogenic bacteria and fungi--a case of convergent evolution?. Nat Rev Microbiol. 2009, 7 (2): 165-171.PubMedView ArticleGoogle Scholar
- Merhej V, Royer-Carenzi M, Pontarotti P, Raoult D: Massive comparative genomic analysis reveals convergent evolution of specialized bacteria. Biol Direct. 2009, 4: 13-10.1186/1745-6150-4-13.PubMed CentralPubMedView ArticleGoogle Scholar
- Mongodin EF, Nelson KE, Daugherty S, Deboy RT, Wister J, Khouri H, Weidman J, Walsh DA, Papke RT, Sanchez Perez G: The genome of Salinibacter ruber: convergence and gene exchange among hyperhalophilic bacteria and archaea. Proc Natl Acad Sci USA. 2005, 102 (50): 18147-18152. 10.1073/pnas.0509073102.PubMed CentralPubMedView ArticleGoogle Scholar
- Chisholm S, Olson R, Zettler E, Goericke R, Waterbury J, Welschmeyer N: A novel free-living prochlorophyte abundant in the oceanic euphotic zone. Nature. 1988, 334: 340-343. 10.1038/334340a0.View ArticleGoogle Scholar
- Goericke R, Welschmeyer N: The marine prochlorophyte Prochlorococcus contributes significantly to phytoplankton biomass and primary production in the Sargasso Sea. Deep-sea research Part 1 Oceanographic research papers. 1993, 40 (11-12): 2283-2294. 10.1016/0967-0637(93)90104-B.View ArticleGoogle Scholar
- Partensky F, Blanchot J, Vaulot D: Differential distribution and ecology of Prochlorococcus and Synechococcus in oceanic waters: a review. Bulletin de l'Institut océanographique(Monaco). 1999, 457-475.Google Scholar
- Partensky F, Hess WR, Vaulot D: Prochlorococcus, a marine photosynthetic prokaryote of global significance. Microbiol Mol Biol Rev. 1999, 63 (1): 106-127.
- Moore LR, Rocap G, Chisholm SW: Physiology and molecular phylogeny of coexisting Prochlorococcus ecotypes. Nature. 1998, 393 (6684): 464-467. 10.1038/30861.
- Urbach E, Scanlan DJ, Distel DL, Waterbury JB, Chisholm SW: Rapid diversification of marine picophytoplankton with dissimilar light-harvesting structures inferred from sequences of Prochlorococcus and Synechococcus (Cyanobacteria). J Mol Evol. 1998, 46 (2): 188-201. 10.1007/PL00006294.
- West NJ, Scanlan DJ: Niche-partitioning of Prochlorococcus populations in a stratified water column in the eastern North Atlantic Ocean. Appl Environ Microbiol. 1999, 65 (6): 2585-2591.
- Kettler GC, Martiny AC, Huang K, Zucker J, Coleman ML, Rodrigue S, Chen F, Lapidus A, Ferriera S, Johnson J: Patterns and implications of gene gain and loss in the evolution of Prochlorococcus. PLoS Genet. 2007, 3 (12): e231. 10.1371/journal.pgen.0030231.
- Rocap G, Larimer FW, Lamerdin J, Malfatti S, Chain P, Ahlgren NA, Arellano A, Coleman M, Hauser L, Hess WR: Genome divergence in two Prochlorococcus ecotypes reflects oceanic niche differentiation. Nature. 2003, 424 (6952): 1042-1047. 10.1038/nature01947.
- Coleman ML, Chisholm SW: Code and context: Prochlorococcus as a model for cross-scale biology. Trends Microbiol. 2007, 15 (9): 398-407. 10.1016/j.tim.2007.07.001.
- Garcia-Fernandez JM, Diez J: Adaptive mechanisms of nitrogen and carbon assimilatory pathways in the marine cyanobacteria Prochlorococcus. Res Microbiol. 2004, 155 (10): 795-802. 10.1016/j.resmic.2004.06.009.
- Martiny AC, Coleman ML, Chisholm SW: Phosphate acquisition genes in Prochlorococcus ecotypes: evidence for genome-wide adaptation. Proc Natl Acad Sci USA. 2006, 103 (33): 12552-12557. 10.1073/pnas.0601301103.
- Sullivan MB, Waterbury JB, Chisholm SW: Cyanophages infecting the oceanic cyanobacterium Prochlorococcus. Nature. 2003, 424 (6952): 1047-1051. 10.1038/nature01929.
- Dufresne A, Garczarek L, Partensky F: Accelerated evolution associated with genome reduction in a free-living prokaryote. Genome Biol. 2005, 6 (2): R14. 10.1186/gb-2005-6-2-r14.
- Berg OG, Kurland CG: Evolution of microbial genomes: sequence acquisition and loss. Mol Biol Evol. 2002, 19 (12): 2265-2276.
- Lynch M, Blanchard JL: Deleterious mutation accumulation in organelle genomes. Genetica. 1998, 102-103 (1-6): 29-39. 10.1023/A:1017022522486.
- Wernegreen JJ: Genome evolution in bacterial endosymbionts of insects. Nat Rev Genet. 2002, 3 (11): 850-861. 10.1038/nrg931.
- Wernegreen JJ, Moran NA: Evidence for genetic drift in endosymbionts (Buchnera): analyses of protein-coding genes. Mol Biol Evol. 1999, 16 (1): 83-97.
- Hu J, Blanchard JL: Environmental sequence data from the Sargasso Sea reveal that the characteristics of genome reduction in Prochlorococcus are not a harbinger for an escalation in genetic drift. Mol Biol Evol. 2009, 26 (1): 5-13. 10.1093/molbev/msn217.
- Garcia-Fernandez JM, de Marsac NT, Diez J: Streamlined regulation and gene loss as adaptive mechanisms in Prochlorococcus for optimized nitrogen utilization in oligotrophic environments. Microbiol Mol Biol Rev. 2004, 68 (4): 630-638. 10.1128/MMBR.68.4.630-638.2004.
- Dufresne A, Salanoubat M, Partensky F, Artiguenave F, Axmann IM, Barbe V, Duprat S, Galperin MY, Koonin EV, Le Gall F: Genome sequence of the cyanobacterium Prochlorococcus marinus SS120, a nearly minimal oxyphototrophic genome. Proc Natl Acad Sci USA. 2003, 100 (17): 10020-10025. 10.1073/pnas.1733211100.
- Marais GA, Calteau A, Tenaillon O: Mutation rate and genome reduction in endosymbiotic and free-living bacteria. Genetica. 2008, 134 (2): 205-210. 10.1007/s10709-007-9226-6.
- Lafay B, Lloyd AT, McLean MJ, Devine KM, Sharp PM, Wolfe KH: Proteome composition and codon usage in spirochaetes: species-specific and DNA strand-specific mutational biases. Nucleic Acids Res. 1999, 27 (7): 1642-1649. 10.1093/nar/27.7.1642.
- McInerney JO: Replicational and transcriptional selection on codon usage in Borrelia burgdorferi. Proc Natl Acad Sci USA. 1998, 95 (18): 10698-10703. 10.1073/pnas.95.18.10698.
- Gentles AJ, Karlin S: Genome-scale compositional comparisons in eukaryotes. Genome Res. 2001, 11 (4): 540-546. 10.1101/gr.163101.
- Michaels ML, Cruz C, Grollman AP, Miller JH: Evidence that MutY and MutM combine to prevent mutations by an oxidatively damaged form of guanine in DNA. Proc Natl Acad Sci USA. 1992, 89 (15): 7022-7025. 10.1073/pnas.89.15.7022.
- Horst JP, Wu TH, Marinus MG: Escherichia coli mutator genes. Trends Microbiol. 1999, 7 (1): 29-36. 10.1016/S0966-842X(98)01424-3.
- Andersson SG, Zomorodipour A, Andersson JO, Sicheritz-Ponten T, Alsmark UC, Podowski RM, Naslund AK, Eriksson AS, Winkler HH, Kurland CG: The genome sequence of Rickettsia prowazekii and the origin of mitochondria. Nature. 1998, 396 (6707): 133-140. 10.1038/24094.
- Yogeeswaran K, Frary A, York TL, Amenta A, Lesser AH, Nasrallah JB, Tanksley SD, Nasrallah ME: Comparative genome analyses of Arabidopsis spp.: inferring chromosomal rearrangement events in the evolutionary history of A. thaliana. Genome Res. 2005, 15 (4): 505-515. 10.1101/gr.3436305.
- Belda E, Moya A, Silva FJ: Genome rearrangement distances and gene order phylogeny in gamma-Proteobacteria. Mol Biol Evol. 2005, 22 (6): 1456-1467. 10.1093/molbev/msi134.
- Mira A, Ochman H, Moran NA: Deletional bias and the evolution of bacterial genomes. Trends Genet. 2001, 17 (10): 589-596. 10.1016/S0168-9525(01)02447-7.
- Lv J, Li N, Niu DK: Association between the availability of environmental resources and the atomic composition of organismal proteomes: evidence from Prochlorococcus strains living at different depths. Biochem Biophys Res Commun. 2008, 375 (2): 241-246. 10.1016/j.bbrc.2008.08.011.
- Fleming PJ, Richards FM: Protein packing: dependence on protein size, secondary structure and amino acid composition. J Mol Biol. 2000, 299 (2): 487-498. 10.1006/jmbi.2000.3750.
- Mackiewicz P, Zakrzewska-Czerwinska J, Zawilak A, Dudek MR, Cebrat S: Where does bacterial replication start? Rules for predicting the oriC region. Nucleic Acids Res. 2004, 32 (13): 3781-3791. 10.1093/nar/gkh699.
- Peden J: Analysis of codon usage. PhD thesis. 1997, University of Nottingham, Department of Genetics.
- Karlin S, Burge C: Dinucleotide relative abundance extremes: a genomic signature. Trends Genet. 1995, 11 (7): 283-290. 10.1016/S0168-9525(00)89076-9.
- Karlin S, Mrazek J, Campbell AM: Compositional biases of bacterial genomes and evolutionary implications. J Bacteriol. 1997, 179 (12): 3899-3913.
- Tamura K, Dudley J, Nei M, Kumar S: MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol Biol Evol. 2007, 24 (8): 1596-1599. 10.1093/molbev/msm092.
- Nei M, Gojobori T: Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol Biol Evol. 1986, 3 (5): 418-426.
- Nei M, Kumar S: Molecular evolution and phylogenetics. 2000, Oxford University Press, USA.
- Sharp PM, Li WH: The codon Adaptation Index--a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res. 1987, 15 (3): 1281-1295. 10.1093/nar/15.3.1281.
- Kyte J, Doolittle RF: A simple method for displaying the hydropathic character of a protein. J Mol Biol. 1982, 157 (1): 105-132. 10.1016/0022-2836(82)90515-0.
- Guruprasad K, Reddy BV, Pandit MW: Correlation between stability of a protein and its dipeptide composition: a novel approach for predicting in vivo stability of a protein from its primary sequence. Protein Eng. 1990, 4 (2): 155-161. 10.1093/protein/4.2.155.
- ExPASy Proteomics Server. [http://expasy.org]
- Frishman D, Argos P: Seventy-five percent accuracy in protein secondary structure prediction. Proteins. 1997, 27 (3): 329-335. 10.1002/(SICI)1097-0134(199703)27:3<329::AID-PROT1>3.0.CO;2-8.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.