Compositional patterns in the genomes of unicellular eukaryotes
© Costantini et al.; licensee BioMed Central Ltd. 2013
Received: 1 November 2012
Accepted: 31 October 2013
Published: 5 November 2013
The genomes of multicellular eukaryotes are compartmentalized in mosaics of isochores, large and fairly homogeneous stretches of DNA that belong to a small number of families characterized by different average GC levels, different gene concentrations (which increase with GC), different chromatin structures, different replication timing in the cell cycle, and other distinct properties. A question raised by these basic results is how far back in evolution the compartmentalized organization of eukaryotic genomes arose.
In the present work we approached this problem by studying the compositional organization of the genomes of unicellular eukaryotes for which full sequences are available, using a representative sample. The average GC levels of these genomes cover an extremely wide range (19%-60% GC), and the compositional patterns of individual genomes are extremely different, but all genomes tested show a compositional compartmentalization.
The average GC range of the genomes of unicellular eukaryotes is very broad (as broad as that of prokaryotes), and individual compositional patterns range from very narrow to very complex. Neither feature is surprising for organisms that are very far from each other in terms of both phylogenetic distance and environmental life conditions. Most importantly, all genomes tested, a representative sample of all supergroups of unicellular eukaryotes, are compositionally compartmentalized, a major difference from prokaryotes.
Investigations at the sequence level showed that the genomes of multicellular eukaryotes are compartmentalized in mosaics of isochores that belong to a small number of families characterized by different GC levels and dinucleotide frequencies [1–6]. These findings confirmed and extended previous investigations (originally using density gradient ultracentrifugation [7–9]) carried out by our laboratory over many years (see  for a review).
The results available so far support the idea of isochores being a “fundamental level of genome organization”  not only in vertebrates but also in the other multicellular eukaryotes analyzed. Indeed, as established by our previous work, not only gene distribution, but also chromatin structure, short sequence frequencies, DNA methylation, gene expression, replication timing and recombination are the main structural and functional properties associated with isochore families of all multicellular eukaryotes explored so far. We also proposed that the strong conservation of GC levels and dinucleotide frequencies of isochore families reflects the conservation of chromatin structures, whereas the conservation of isochore size might be due to the role played by isochores in chromosome structure and replication [2, 11]. These results underline the interest of understanding the structure and the evolution of compositional patterns in unicellular eukaryotes.
Some early results indicated that the nuclear genome of Euglena gracilis and the macro-nuclear genome of Tetrahymena pyriformis were remarkably homogeneous in base composition, while the nuclear genome of Saccharomyces cerevisiae showed a slight heterogeneity . Later work based on sequenced yeast chromosomes showed that some of them consist of alternating large domains of GC-rich and GC-poor DNA [12–14], generally correlating with a variation in gene density. More recent work showed that in yeast GC-rich and GC-poor isochores are different in chromatin conformation, histone modification and transcription; more precisely, GC-rich isochores have a more extended chromatin conformation, different levels of histone acetylation and more highly expressed GC-rich genes .
In the case of Plasmodium falciparum, the unicellular parasite responsible for the most virulent and widespread form of human malaria, a striking feature is that it hosts the GC-poorest (19.4% GC) nuclear genome known so far [16, 17]. In Plasmodium cynomolgi, a compositional compartmentalization was demonstrated in the nuclear DNA, which consists of DNA segments likely to average 100 kb .
The DNAs of both Trypanosoma brucei and Trypanosoma equiperdum (two closely related trypanosomes [19, 20]) showed a bimodal distribution characterized by two major peaks banding at 1.702-1.703 and 1.707-1.708 g/cm³ in CsCl density gradients and representing 1/3 and 2/3 of total DNA, respectively; a number of minor components were also detected, corresponding to satellite DNAs and possibly to ribosomal DNA .
In conclusion, the results on yeast, Plasmodia and Trypanosomes indicated that a compositional compartmentalization was present not only in the genomes of metazoans and plants, but also in those of unicellular eukaryotes. These findings encouraged us to extend our investigations to other unicellular eukaryotes.
Other important aspects, indicative of a wide genomic diversity, are worth mentioning: 1) The range of genome sizes of unicellular eukaryotes (8.7 Mb to 357 Mb, a 41-fold range; ) is even broader than that of metazoans (from 94.4 Mb to 3000 Mb, a 32-fold range, neglecting cases of polyploidy; [4, 5]). 2) The range of average GC levels of the genomes of unicellular eukaryotes is as broad as that of prokaryotes [23, 24]. 3) The chromatin of unicellular eukaryotes may be organized differently from that of multicellular eukaryotes. For example, Saccharomyces cerevisiae lacks histone H1; similarly, although Trypanosomes have histone H1, this protein is quite divergent and their chromatin does not reach high levels of compaction during mitosis. 4) The environmental conditions under which unicellular eukaryotes live are much more diverse than those of vertebrates and also of invertebrates. 5) Unicellular eukaryotes lack the very complex regulatory system involved in the developmental process of multicellular eukaryotes.
All these considerations prompted us to tackle the analysis of compositional organization in unicellular eukaryotes. Here we approached these problems by studying the genomes of representative species from all the so-called “supergroups” of unicellular eukaryotes.
In this work we studied the compositional organization in representative species from all the so-called eukaryotic “supergroups” (see also Additional file 1: Table S1 and refs. [25, 26]). In Additional file 2: Figure S1 we report the phylogenetic distribution of the unicellular species analyzed in the present work .
Green and red algae (Ostreococcus tauri and Cyanidioschyzon merolae, respectively) represent Plantae. The supergroup Amoebozoa is represented by the slime mold Dictyostelium discoideum. In the supergroup Chromalveolata we analysed species from the four main groups. Stramenopiles are represented by two diatoms (Thalassiosira pseudonana and Phaeodactylum tricornutum). For the Apicomplexans (which include species parasitic in mammals) we analysed the human pathogen Toxoplasma gondii and the malarial parasites Plasmodium berghei, Plasmodium chabaudi, Plasmodium knowlesi, Plasmodium falciparum and Plasmodium vivax. The Cryptophyta group is represented by Guillardia theta, while for the last group, Ciliates, the analysis was only partial because of the fragmented genome assembly that is available for this species. The Excavata supergroup is represented by two Kinetoplastids (Trypanosoma brucei and Trypanosoma cruzi) and, for the Fornicata group, by Giardia lamblia, although in this case too the analysis was only partial because of the incompleteness of the assembled genome. Finally, the supergroup Opisthokonta (which also includes animals) is represented here by several unicellular fungi: Saccharomyces cerevisiae, Candida glabrata, Ashbya gossypii and Cryptococcus neoformans.
Table 1. Average GC (A) and relative amounts (B), in percentage, of components from unicellular eukaryotes.
The genome of the parasitic protist Toxoplasma gondii consisted of one major component centered at 52% GC and a smaller component at 55% GC, whereas that of the amoeba Dictyostelium discoideum showed one major component centered at 28% GC (Figure 3).
Table 2. GC content, number of contigs/scaffolds and their total lengths in megabases (Mb), and lengths of scaffolds < 100 kb and > 100 kb with their percentages of the total length.
The results just reported clearly show that the genomes of unicellular eukaryotes range from narrow compositional distributions, as in the case of O. tauri, T. pseudonana, C. neoformans, P. falciparum, P. berghei and P. chabaudi, to more heterogeneous patterns, such as those of S. cerevisiae and T. brucei, while in other species, such as P. knowlesi, P. vivax and T. cruzi, the heterogeneity is remarkable. These observations deserve some general comments (in addition to those already made in the preceding section).
Several findings are very striking when compared with both vertebrate and invertebrate genomes. Even if the number of genomes is admittedly modest, a first observation is that free-living unicellular organisms generally show narrower compositional distributions with only minor additional components (S. cerevisiae and A. gossypii, the latter showing, however, a slightly wider compositional range; 52%-55% GC). These narrow distributions are centered, however, on very different GC levels, which range from 38%-40% GC for the two yeasts to almost 60% GC for the green alga O. tauri. Obviously, it would be interesting to correlate these very different compositions with environmental factors. This seems, however, to be possible only for C. merolae, in which case the high GC level (55% GC) might be related to the hot acid springs (45°C; pH 2.0) of its habitat. This idea is supported by our previous findings, in which high GC levels were correlated with high body temperatures in vertebrates and with high optimal growth temperatures in bacteria (see  for a review). Interestingly, protein divergence between Galdieria sulphuraria, which, like C. merolae, lives in hot springs, and Galdieria phlegrea, which lives in a less extreme habitat (i.e. moderate pH and temperature), is similar to that between human and medaka .
In contrast, parasitic unicellular organisms show some striking features, namely that within the same genus one species may have a wide compositional distribution (this is the case of T. cruzi and of P. vivax) while others have a very narrow distribution (P. falciparum, P. berghei and P. chabaudi). These results are highly suggestive of compositional adaptation. Needless to say, it would be of great interest to identify the causes of such adaptations, especially since recent results  reported a lack of synteny among Apicomplexa due to genome rearrangements.
The compositional compartmentalization of some genomes of unicellular eukaryotes is possibly linked to a different chromatin structure and different regulation of gene expression. The results of Table 1 also show something of great potential interest, namely that, apart from the extreme cases of P. falciparum and O. tauri, the GC values for the single or multiple DNA components are very close to those previously found for the isochore families of vertebrates and invertebrates. This might be a coincidence, but might also be linked to specific features of chromatin structures. Needless to say, it would also be very interesting to consider whether genes characterized by specific functions are differentially distributed in the two major families exhibited by T. brucei and P. vivax, respectively.
At this point, it is worth mentioning that an intrachromosomal compositional heterogeneity was also found in prokaryotic genomes . In fact, while most prokaryotic species tested are compositionally homogeneous, a minority are rather heterogeneous in composition; this heterogeneity is, however, associated with recent lateral transfers.
Previous results on the genomes of a small number of unicellular eukaryotes provided the first indication that a compositional compartmentalization was present not only in the genomes of multicellular eukaryotes, but also in those of some protozoa. The findings presented here revealed that compositional compartmentalization, covering a very broad range of patterns, is generally present in unicellular eukaryotes. Even if the sample of organisms investigated is admittedly modest, this point is clearly demonstrated. This distinguishes eukaryotes, which always show compartmentalized genomes, from prokaryotes, in which compositional heterogeneity is exceedingly rare and possibly always associated with recent lateral transfers.
The results presented here, and previous observations (like those already mentioned for the budding yeast), lead us to suggest that genome compartmentalization is a very general feature of all eukaryotes. Different levels of compartmentalization are probably linked with increasing regulatory complexity and/or other functional requirements to which organisms are bound. This idea is in line with a more general notion in biology concerning the role of compartmentalization as a fundamental way to organize structure and function at all levels, from the organ level down to the cellular and genome level.
We consider two additional conclusions to be preliminary; if confirmed by investigations on a larger sample, however, they would be of very great interest. The first concerns the differences found between free-living and parasitic unicellular eukaryotes. The second is the fact that the GC levels found in unicellular eukaryotes are very close (with two exceptions) to those of isochore families from multicellular eukaryotes. The first point suggests compositional adaptation of the genomes of parasitic unicellular organisms, the second a correlation with chromatin structure.
Genome and gene sequences: the resources
The sequences of unicellular genomes, as well as those of the genes analyzed in this study, were downloaded from different websites (see Additional file 3: Table S2). Genes annotated as partial, putative, synthetic construct, predicted, not experimental or hypothetical protein, as well as rRNA, tRNA, ribosomal and mitochondrial genes, were eliminated, and the cleanup program  was then applied to remove redundancies from the nucleotide sequence set. For the remaining genes, a script implemented by us was used to identify the coding sequences beginning with a start codon and ending with a stop codon. The coordinates of genes on the chromosomes were retrieved from the website used for downloading the chromosomes.
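The last filtering step can be illustrated with a minimal sketch. It is not the script used in this study, which is not reproduced here. It assumes ATG as the only start codon and the three stop codons of the standard genetic code (which do not apply unchanged to organisms with variant codes, such as some ciliates), and it adds a length check (a multiple of three) that the text above does not mention. The function names are hypothetical.

```python
# Illustrative sketch only; not the authors' script.
# Assumptions: ATG start codon, standard-code stop codons,
# and an extra "length is a multiple of three" check.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_complete_cds(seq, start_codon="ATG"):
    """Return True if seq begins with a start codon and ends with a stop codon."""
    s = seq.upper()
    return (
        len(s) >= 6                  # at least a start and a stop codon
        and len(s) % 3 == 0          # whole number of codons (added assumption)
        and s.startswith(start_codon)
        and s[-3:] in STOP_CODONS
    )


def filter_complete_cds(records):
    """Keep only the complete coding sequences from a {name: sequence} mapping."""
    return {name: seq for name, seq in records.items() if is_complete_cds(seq)}
```

For example, `filter_complete_cds({"a": "ATGAAATAA", "b": "ATGAAA"})` keeps only entry `a`, because entry `b` lacks a terminal stop codon.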
Compositional patterns: methodology and nomenclature
The entire chromosomal sequences of the finished genome assemblies were partitioned into non-overlapping windows, and their GC levels were calculated using the program draw_chromosome_gc.pl [32, 33]. The general methodology used to map DNA segments on unicellular genomes was that described for the isochore maps of vertebrate  and invertebrate genomes [4, 5]. It should be stressed that this methodology tends to overestimate compositionally homogeneous regions, because the standard deviation tends to decrease with increasing size of the regions. Because of the small chromosome sizes of several unicellular genomes under analysis, we used a non-overlapping window of 25 kb, a size suitable for all the unicellular genomes. The GC levels of compositionally nearly homogeneous DNA segments were calculated using a script implemented by us. The sequences of contigs/scaffolds for unicellular genomes reported in Table 2 were downloaded from Ensembl Genome Browser (http://protists.ensembl.org/).
In order to demonstrate that the different compositional patterns found were not an artifact due to the small window used (25 kb), we analyzed two unicellular genomes showing a strong compositional heterogeneity using two different non-overlapping windows. Additional file 4: Figure S2A-B displays the compositional profiles of T. brucei and P. vivax at windows of 25 kb and 100 kb. The results clearly demonstrate that the levels of heterogeneity at 25 kb were barely larger than at 100 kb.
Additional file 5: Tables S3-S20 report the coordinates, sizes and GC levels of the segments identified in the genomes. When these segments were pooled in bins of 0.5% GC, families of segments were found according to their average GC levels. Table 1 reports the average GC levels and the relative amounts of these families. For the sake of comparison, Table 1 also shows the average GC levels calculated for the different isochore families of vertebrates  and invertebrates .
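As an illustration of the windowing step, the following minimal sketch computes GC levels in fixed, non-overlapping windows. It is not draw_chromosome_gc.pl and makes no claim about how that program works internally. Two choices are assumptions made here: ambiguous bases such as N are excluded from the GC percentage, and a trailing window shorter than the window size is dropped. The function names are hypothetical.

```python
# Illustrative sketch only; not the program draw_chromosome_gc.pl.
def gc_percent(seq):
    """GC level (%) of a sequence, ignoring ambiguous bases such as N."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    total = gc + at
    return 100.0 * gc / total if total else None


def window_gc(seq, window=25_000):
    """Return (start, end, GC%) for each full non-overlapping window.

    Assumption: a trailing partial window is dropped.
    """
    result = []
    for start in range(0, len(seq) - window + 1, window):
        end = start + window
        result.append((start, end, gc_percent(seq[start:end])))
    return result
```

With a toy sequence and a window of 4 bp, `window_gc("GGGGAAAAGCATAC", window=4)` yields three windows at 100%, 0% and 50% GC; the last two bases are not reported.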
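The reasoning behind this control can be sketched with a toy calculation. The code below measures heterogeneity as the population standard deviation of window GC levels, which is an assumption made for illustration and not necessarily the measure used for Figure S2. If compositional domains are longer than both window sizes, the standard deviation barely changes between windows. If the variation occurs on a scale shorter than the larger window, it is averaged out. The function names are hypothetical.

```python
# Illustrative sketch only; the heterogeneity measure is an assumption.
from statistics import pstdev


def window_gc_values(seq, window):
    """GC level (%) of each full non-overlapping window of the sequence."""
    s = seq.upper()
    values = []
    for start in range(0, len(s) - window + 1, window):
        chunk = s[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        at = chunk.count("A") + chunk.count("T")
        if gc + at:
            values.append(100.0 * gc / (gc + at))
    return values


def gc_heterogeneity(seq, window):
    """Population standard deviation of the window GC levels."""
    values = window_gc_values(seq, window)
    return pstdev(values) if len(values) > 1 else 0.0


# Toy genome with two long domains: same spread at both window sizes.
long_domains = "G" * 400 + "A" * 400
# Toy genome with short alternating blocks: the spread vanishes at the larger window.
short_blocks = ("G" * 100 + "A" * 100) * 2
```

Here `gc_heterogeneity(long_domains, 100)` and `gc_heterogeneity(long_domains, 200)` agree, whereas for `short_blocks` the value drops to zero at the larger window.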
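The pooling step can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used to build Table 1: segments are given as hypothetical (size in bp, GC%) pairs, each bin is labelled by its lower edge, and the relative amount of a bin is taken as its share of the total segment length.

```python
# Illustrative sketch only; input format and length weighting are assumptions.
import math
from collections import defaultdict


def pool_segments(segments, bin_width=0.5):
    """Pool (size_bp, gc_percent) segments into GC bins.

    Returns {bin lower edge: percentage of total segment length}.
    """
    totals = defaultdict(int)
    for size, gc in segments:
        bin_start = math.floor(gc / bin_width) * bin_width
        totals[bin_start] += size
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {b: 100.0 * t / grand_total for b, t in sorted(totals.items())}
```

For instance, three segments of 100 kb at 52.1% GC, 50 kb at 52.4% GC and 50 kb at 55.0% GC fall into two bins, 52.0 and 55.0, holding 75% and 25% of the total length.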
As far as the name of each DNA segment is concerned we used a convention in which the first number in the name represents the chromosome number, the following two letters are the initials of the scientific name of the species under consideration, and the last number identifies the fragment.
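A minimal sketch of this naming convention is given below. The convention itself (chromosome number, two initials, fragment number) is taken from the text; the exact formatting is an assumption made here, namely that the three parts are concatenated without separators and that both initials are written in upper case. The function names are hypothetical.

```python
# Illustrative sketch only; separators and letter case are assumptions.
import re


def segment_name(chromosome, species, fragment):
    """Build a segment name from chromosome number, species name and fragment number."""
    genus, epithet = species.split()[:2]
    initials = (genus[0] + epithet[0]).upper()
    return f"{chromosome}{initials}{fragment}"


def parse_segment_name(name):
    """Split a segment name into (chromosome, initials, fragment)."""
    match = re.fullmatch(r"(\d+)([A-Za-z]{2})(\d+)", name)
    if match is None:
        raise ValueError(f"not a valid segment name: {name!r}")
    chromosome, initials, fragment = match.groups()
    return int(chromosome), initials, int(fragment)
```

Under these assumptions, fragment 12 of chromosome 3 of Plasmodium vivax would be named `3PV12`.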
We thank Fabio Auletta for bioinformatics support. We also thank Dr. Daniel Neafsey at the Broad Institute of MIT and Harvard for providing us with a set of orthologous genes of P. vivax and P. falciparum for our analysis, and Dr. Oliver Clay for helpful discussions.
- Costantini M, Clay O, Auletta F, Bernardi G: An isochore map of human chromosomes. Genome Res. 2006, 16: 536-541. 10.1101/gr.4910606.
- Costantini M, Clay O, Federico C, Saccone S, Auletta F, Bernardi G: Human chromosomal bands: nested structure, high-definition map and molecular basis. Chromosoma. 2007, 116: 29-40. 10.1007/s00412-006-0078-0.
- Costantini M, Clay O, Auletta F, Bernardi G: Isochore and gene distribution in fish genomes. Genomics. 2007, 90: 364-371. 10.1016/j.ygeno.2007.05.006.
- Costantini M, Cammarano R, Bernardi G: The evolution of isochore patterns in vertebrate genomes. BMC Genomics. 2009, 10: 146. 10.1186/1471-2164-10-146.
- Cammarano R, Costantini M, Bernardi G: The isochore patterns of invertebrate genomes. BMC Genomics. 2009, 10: 538. 10.1186/1471-2164-10-538.
- Bernardi G: Structural and Evolutionary Genomics. Natural Selection in Genome Evolution. 2004, Amsterdam, The Netherlands: Elsevier, reprinted in 2005.
- Filipski J, Thiery JP, Bernardi G: An analysis of the bovine genome by Cs2SO4/Ag+ density gradient centrifugation. J Mol Biol. 1973, 80: 177-197. 10.1016/0022-2836(73)90240-4.
- Thiery JP, Macaya G, Bernardi G: An analysis of eukaryotic genomes by density gradient centrifugation. J Mol Biol. 1976, 108: 219-235. 10.1016/S0022-2836(76)80104-0.
- Macaya G, Thiery JP, Bernardi G: An approach to the organization of eukaryotic genomes at a macromolecular level. J Mol Biol. 1976, 108: 237-254. 10.1016/S0022-2836(76)80105-2.
- Eyre-Walker A, Hurst LD: The evolution of isochores. Nat Rev Genet. 2001, 2: 549-555. 10.1038/35080577.
- Costantini M, Bernardi G: Replication timing, chromosomal bands and isochores. Proc Natl Acad Sci U S A. 2008, 105: 3433-3437. 10.1073/pnas.0710587105.
- Karlin S, Blaisdell BE, Sapolsky RJ, Cardon L, Burge C: Assessment of DNA inhomogeneities in yeast chromosome III. Nucleic Acids Res. 1993, 21: 703-711. 10.1093/nar/21.3.703.
- Sharp PM, Lloyd AT: Regional base composition variation along yeast chromosome III: evolution of chromosome primary structure. Nucleic Acids Res. 1993, 21: 179-183. 10.1093/nar/21.2.179.
- Dujon B: The yeast genome project: what did we learn?. Trends Genet. 1996, 12: 263-270. 10.1016/0168-9525(96)10027-5.
- Dekker J: GC- and AT-rich chromatin domains differ in conformation and histone modification status and are differentially modulated by Rpd3p. Genome Biol. 2007, 8: R116.
- Pollak Y, Katzen A, Spira D, Golenser J: The genome of Plasmodium falciparum I: DNA composition. Nucleic Acids Res. 1982, 10: 539-546. 10.1093/nar/10.2.539.
- McCutchan TF, Dame JB, Miller LH, Barnwell J: Evolutionary relatedness of Plasmodium species as determined by the structure of DNA. Science. 1984, 225: 808-811. 10.1126/science.6382604.
- McCutchan TF, Dame JB, Gwadz RW, Vernick KD: The genome of Plasmodium cynomolgi is partitioned into separable domains which appear to differ in sequence stability. Nucleic Acids Res. 1988, 16: 4499-4510. 10.1093/nar/16.10.4499.
- Haag J, O’Huigin C, Overath P: The molecular phylogeny of trypanosomes: evidence for an early divergence of the Salivaria. Mol Biochem Parasitol. 1998, 91: 37-49. 10.1016/S0166-6851(97)00185-0.
- Stevens JR, Noyes HA, Dover GA, Gibson WC: The ancient and divergent origins of the human pathogenic trypanosomes, Trypanosoma brucei and T. cruzi. Parasitology. 1999, 118: 107-116. 10.1017/S0031182098003473.
- Isacchi A, Bernardi G, Bernardi G: Compositional compartmentalization of the nuclear genomes of Trypanosoma brucei and Trypanosoma equiperdum. FEBS Lett. 1993, 335: 181-183. 10.1016/0014-5793(93)80725-A.
- Suga H, Chen Z, de Mendoza A, Sebé-Pedrós A, Brown MW, Kramer E, Carr M, Kerner P, Vervoort M, Sánchez-Pons N, Torruella G, Derelle R, Manning G, Lang BF, Russ C, Haas BJ, Roger AJ, Nusbaum C, Ruiz-Trillo I: The Capsaspora genome reveals a complex unicellular prehistory of animals. Nat Commun. 2013, 4: 2325.
- Bernardi G, Bernardi G: Compositional transitions in the nuclear genomes of cold-blooded vertebrates. J Mol Evol. 1990, 31: 282-293. 10.1007/BF02101123.
- Katz LA, Grant JR, Parfrey LW, Burleigh JG: Turning the crown upside down: gene tree parsimony roots the eukaryotic tree of life. Syst Biol. 2012, 61: 653-660. 10.1093/sysbio/sys026.
- Cavalier-Smith T: Deep phylogeny, ancestral groups and the four ages of life. Philos Trans R Soc Lond B Biol Sci. 2010, 365: 111-132. 10.1098/rstb.2009.0161.
- Parfrey LW, Lahr DJ, Knoll AH, Katz LA: Estimating the timing of early eukaryotic diversification with multigene molecular clocks. Proc Natl Acad Sci U S A. 2011, 108: 13624-13629. 10.1073/pnas.1110633108.
- D’Onofrio G, Bernardi G: A universal compositional correlation among codon positions. Gene. 1992, 110: 81-88. 10.1016/0378-1119(92)90447-W.
- Qiu H, Price DC, Weber APM, Reeb V, Chan Yang E, Mo Lee J, Yeon Kim S, Su Yoon H, Bhattacharya D: Adaptation through horizontal gene transfer in the cryptoendolithic red alga Galdieria phlegrea. Curr Biol. 2013, 23: R865-R866. 10.1016/j.cub.2013.08.046.
- Debarry JD, Kissinger JC: Jumbled genomes: missing Apicomplexan synteny. Mol Biol Evol. 2011, 28: 2855-2871. 10.1093/molbev/msr103.
- Bernaola-Galvan P, Oliver JL, Carpena P, Clay O, Bernardi G: Quantifying intrachromosomal GC heterogeneity in prokaryotic genomes. Gene. 2004, 333: 121-133.
- Grillo G, Attimonelli M, Liuni S, Pesole G: CLEANUP: a fast computer program for removing redundancies from nucleotide sequence databases. Comp Appl Biosci. 1996, 12: 1-8.
- Pavlìček A, Jabbari K, Paces J, Paces V, Hejnar JV, Bernardi G: Similar integration but different stability of Alus and LINEs in the human genome. Gene. 2001, 276: 39-45. 10.1016/S0378-1119(01)00645-X.
- Pačes J, Zika R, Pavlìček A, Clay O, Bernardi G: Representing GC variation along eukaryotic chromosomes. Gene. 2004, 333: 135-141.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.