Insights into a dinoflagellate genome through expressed sequence tag analysis
© Hackett et al. 2005
Received: 02 February 2005
Accepted: 29 May 2005
Published: 29 May 2005
Dinoflagellates are important marine primary producers and grazers and cause toxic "red tides". These taxa are characterized by many unique features such as immense genomes, the absence of nucleosomes, and photosynthetic organelles (plastids) that have been gained and lost multiple times. We generated EST sequences from non-normalized and normalized cDNA libraries from a culture of the toxic species Alexandrium tamarense to elucidate dinoflagellate evolution. Previous analyses of these data have clarified plastid origin and here we study the gene content, annotate the ESTs, and analyze the genes that are putatively involved in DNA packaging.
Approximately 20% of the 6,723 unique ESTs (11,171 total 3'-reads) could be annotated using BLAST searches against GenBank. Several putative dinoflagellate-specific mRNAs were identified, including one novel plastid protein. Dinoflagellate genes, similar to those of other eukaryotes, have a high GC-content that is reflected in the amino acid codon usage. Highly represented transcripts include histone-like (HLP) and luciferin binding proteins and several genes occur in families that encode nearly identical proteins. We also identified rare transcripts encoding a predicted protein highly similar to histone H2A.X. We speculate this histone may be retained for its role in DNA double-strand break repair.
This is the most extensive collection to date of ESTs from a toxic dinoflagellate. These data will be instrumental to future research to understand the unique and complex cell biology of these organisms and for potentially identifying the genes involved in toxin production.
Dinoflagellates play critical roles in marine ecosystems as primary producers and grazers of other bacterial and eukaryotic plankton. Approximately one-half of the ca. 4,000 species of dinoflagellates contain plastids, although many are mixotrophic. Many taxa produce potent toxins and form harmful algal blooms, or "red tides", resulting from populations of more than 20 million cells per liter of seawater. The toxins cause a variety of poisonings that affect humans and marine wildlife and have a significant impact on coastal ecosystems throughout the world. Yet, other dinoflagellates, like Symbiodinium, are central contributors to the health of reef ecosystems as the symbionts of corals. Loss of the dinoflagellate symbiont results in coral bleaching. In addition to their ecological role, dinoflagellates display some fascinating and unique aspects of cell biology. One intriguing character is nuclear biology. The nucleus of dinoflagellates is unlike that of any other eukaryote because the chromosomes are condensed throughout the cell cycle except during DNA replication. The morphologically similar chromosomes are attached to the nuclear envelope and can number in the hundreds. Dinoflagellates also lack nucleosomes; instead, the nuclear DNA is associated with basic proteins that are moderately similar to bacterial histone-like proteins (HLPs [8, 9]). Dinoflagellates were thought to lack histones, but in a recent gene expression study, a putative histone H3 was annotated in Pyrocystis lunula, although the sequence was not analyzed further. The general lack of nucleosomes raises many questions about transcription and gene regulation in these organisms. Dinoflagellate nuclei also contain vast amounts of DNA compared to other eukaryotes. Estimates range from 3 - 250 pg·cell⁻¹, or approximately 3,000 - 215,000 megabases (Mb). In comparison, human nuclei contain 3.2 pg·cell⁻¹ (3,180 Mb).
The dinoflagellate nucleus contains such a high concentration of DNA that it exists in a liquid crystal state, which is responsible for the unique morphology [13, 14]. The DNA to basic protein ratio of dinoflagellate chromosomes has been estimated to be 10:1, which is dramatically higher than the 1:1 ratio observed in most eukaryotes. This indicates that very little basic protein is associated with dinoflagellate chromosomes and that the crystal structure is the primary cause of the unusual morphology. Dinoflagellates are also the only eukaryotes to contain hydroxymethyluracil, a deaminated nucleotide that can be produced by oxidative damage of DNA, which replaces 12 - 70% of the thymidine. The role of polyploidy or, potentially, genome amplification within particular life history stages remains to be clarified for dinoflagellates. It is highly unlikely, however, given their relatively simple morphology, that the immense DNA content is explained solely by gene content.
The most widespread plastid in dinoflagellates contains the unique photopigment peridinin. The "peridinin plastid" is remarkably different from this organelle in other eukaryotes because it lacks a typical genome. Plastids normally contain a circular genome of about 150 kb that encodes 100 - 200 genes that are necessary for plastid function. In peridinin-containing dinoflagellates, the plastid genome has been broken into minicircles that each encode a single gene or a few genes. However, only 16 genes have been identified thus far on minicircles [16, 17]. Recent studies show that most of the plastid genes have been transferred to the nucleus [18, 19], with 15 of these genes found exclusively on the plastid genome in all other photosynthetic eukaryotes. The peridinin dinoflagellates therefore encode the smallest number of plastid genes of any photosynthetic eukaryote, making them a model for understanding organellar gene transfer. Nuclear-encoded plastid proteins are targeted to the plastid using a tripartite N-terminal targeting signal. As in Euglena, nuclear-encoded plastid proteins are co-translationally inserted into the endoplasmic reticulum and embedded in this membrane using a stop-transfer sequence in the N-terminus. Through algal endosymbioses, the dinoflagellates have been able to acquire four other types of plastids from distantly related evolutionary lineages including the haptophytes, cryptophytes, diatoms, and prasinophytes [1, 21]. This aspect of their evolutionary history highlights the unmatched ability of dinoflagellates to capture and retain foreign plastids.
Alexandrium tamarense is one of the best-studied dinoflagellates. This species forms toxic blooms and causes paralytic shellfish poisoning through saxitoxin production. It has a peridinin-containing plastid and in North America, A. tamarense blooms from Alaska to Southern California in the Pacific and along the Canadian and New England coasts in the Atlantic. There has been a recent increase in blooms of A. tamarense and other Alexandrium species in other parts of the world making this genus of high importance to the world's fisheries. We undertook a gene discovery project with this organism using expressed sequence tag (EST) data to investigate dinoflagellate evolution and to create a genomic resource for scientists working on different aspects of A. tamarense and dinoflagellate biology. The EST method was the most reasonable approach in this case because haploid A. tamarense cells contain approximately 143 chromosomes and have a genome size of 200 pg/cell (ca. 200,000 Mb [Erdner and Anderson unpublished data]). Our EST results comprise the first extensive high-throughput, genome-wide data set for a dinoflagellate.
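The genome size quoted above (200 pg/cell, ca. 200,000 Mb) rests on a conversion from DNA mass to sequence length. The following is a minimal sketch of that arithmetic, assuming the commonly used factor of roughly 978 Mb per picogram of double-stranded DNA. This constant and the helper name `pg_to_mb` are illustrative assumptions and are not taken from the article, whose rounded figures may reflect a slightly different factor.

```python
# Assumption (not from the article): 1 pg of double-stranded DNA
# corresponds to roughly 978 megabases (Mb).
MB_PER_PG = 978.0


def pg_to_mb(picograms: float, mb_per_pg: float = MB_PER_PG) -> float:
    """Convert a DNA mass in picograms to an approximate length in Mb."""
    if picograms < 0:
        raise ValueError("DNA mass cannot be negative")
    return picograms * mb_per_pg


if __name__ == "__main__":
    # Haploid A. tamarense genome size as reported in the text (200 pg/cell).
    print(f"A. tamarense: about {pg_to_mb(200):,.0f} Mb")
```

Under this assumed factor, 200 pg gives a value slightly below the rounded "ca. 200,000 Mb" stated in the text, which is consistent with the approximate nature of the published figure.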
Cluster size and frequency of the A. tamarense ESTs.
Best BLAST hit(s)
peridinin-chl a protein, Cytochrome C6, EF1-alpha, unknown
ATP synthase C chain
Form II Rubisco, unknown putative dino. specific protein
fucoxanthin chlorophyll a/c binding protein like
Unknown putative plastid protein
peridinin-chlorophyll a protein, ATP synthase C chain, unknown
histone-like protein/basic nuclear protein
Codon Usage in the A. tamarense ESTs.
Top 20 A. tamarense EST blast hits against the genome of the apicomplexan P. falciparum.
A. tamarense EST
flavoprotein subunit of succinate dehydrogenase
serine/threonine protein phosphatase
26S proteasome regulatory subunit 4
bifunctional dihydrofolate reductase-thymidylate synthase
eukaryotic translation initiation factor 2 gamma subunit
ADP ribosylation factor 1
eukaryotic initiation factor
40S ribosomal protein S5
ribosomal protein S2
protein serine/threonine phosphatase
RNA helicase 1
Top 20 hits of the A. tamarense ESTs to the GenBank nr database.
A. tamarense EST
ribulose 1,5-bisphosphate carboxylase
bifunctional dihydrofolate reductase-thymidylate synthase
S-adenosyl-homocysteine hydrolase like protein
oxygen evolving enhancer 1 precursor
glutamate 1-semialdehyde 2,1-aminomutase
proliferating cell nuclear antigen
Similar to DEAD box polypeptide 48
succinate dehydrogenase flavoprotein subunit, mitochondrial
ADP ribosylation factor 1
DEAD/DEAH box helicase, putative
eukaryotic translation initiation factor 2, subunit 3 gamma
Serine/threonine protein phosphatase PP1 isozyme 2
26S proteasome regulatory subunit 4, putative
40S ribosomal protein S5
As previously mentioned, our bioinformatic analyses identified 6,723 clusters of unique genes. However, this is likely to be a conservative estimate of the number of unique transcripts that were sequenced. A combination of short 3'-UTRs and highly conserved coding regions caused many related transcripts to be assembled together, even though their 3'-UTRs contained sequence differences. For example, two large clusters comprise ESTs that correspond to the plastid atpH gene that encodes the ATP synthase C chain. This gene is normally plastid encoded in other photosynthetic eukaryotes. These two clusters form closely related, but clearly distinct sets of transcripts. An additional atpH-encoding transcript was identified by a single EST. Together, the three clusters contain 43 ESTs, 16 of which are unique. The N-terminal extensions, which encode the tripartite plastid-targeting signals, share an average 74.3% nucleotide and 68.6% amino acid identity, respectively. Similar to many other species, the dinoflagellate transit peptides appear to be under selection to maintain hydrophobicity rather than a conserved amino acid sequence. This may explain why the nucleotide conservation is greater than that of the encoded amino acids. Five hydrophobic amino acids (phenylalanine, leucine, isoleucine, methionine, and valine) are, for example, encoded by codons with a T in the second position. This, combined with the high GC-content at third positions, results in higher conservation at second and third positions than at first positions. In addition, the high proportion of alanine (28.6%), leucine (10.2%), and valine (11.8%) rather than phenylalanine (2.4%), isoleucine (3.6%), methionine (4.3%, excluding starting methionine), and tyrosine (0.3%) in the N-terminal extensions may reflect the underlying GC-richness, because alanine, leucine, and valine are encoded by GC-rich codons.
It is unclear if these amino acids are evolutionarily selected for specifically, or if they are selected for the combination of their hydrophobic character and the GC-content of their codons. In contrast, the conserved core of the protein shared an average 88.4% nucleotide and 98% amino acid identity, respectively, which corresponds to the more typical pattern of third position variation resulting from selection. The 3'-UTRs of the atpH genes show substantial variation and were difficult to align. There are several groups of more closely related 3'-UTRs that may be the result of recently duplicated genes. In all, there are five alignable groups of UTRs (and one singleton) that may have originated from more closely related genes.
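The codon-level argument above (five hydrophobic amino acids share a T at the second codon position, and third positions are GC-rich) can be checked against the standard genetic code. The sketch below assumes the standard nuclear code applies, and the function names `amino_acids_with_second_base` and `gc3_fraction` are hypothetical helpers introduced for illustration. It is not the procedure used to produce the codon usage table.

```python
from itertools import product

# Standard genetic code, written in the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def amino_acids_with_second_base(base):
    """Return amino acids whose codons ALL carry `base` at position two."""
    second_bases = {}
    for codon, aa in CODON_TABLE.items():
        second_bases.setdefault(aa, set()).add(codon[1])
    return {
        aa for aa, seen in second_bases.items()
        if seen == {base} and aa != "*"
    }


def gc3_fraction(cds):
    """Fraction of complete codons in `cds` that end in G or C."""
    cds = cds.upper()
    thirds = [cds[i + 2] for i in range(0, len(cds) - 2, 3)]
    if not thirds:
        raise ValueError("sequence is shorter than one codon")
    return sum(base in "GC" for base in thirds) / len(thirds)


if __name__ == "__main__":
    print(sorted(amino_acids_with_second_base("T")))
    # Toy coding sequence (hypothetical, not an A. tamarense EST).
    print(gc3_fraction("GCCCTGGTT"))
```

Calling `amino_acids_with_second_base("T")` returns exactly F, I, L, M, and V, which are the five residues named in the text.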
H2A.X proteins are closely related to the canonical H2A except for the C-terminus, which contains the distinctive SQ(E/D)Φ motif (where Φ is a hydrophobic residue). H2A.X plays an important role in the recognition and repair of double-strand DNA breaks by non-homologous end-joining. At the site of double-strand breaks, the serine of the SQ(E/D)Φ motif is rapidly phosphorylated. The phosphorylation signal spreads a large distance down the chromosome around the breaks, signalling the recruitment of the DNA repair proteins Rad50, Rad51, and BRCA1 [26, 27]. We also identified histone H2A and H2A.X from the haptophyte Emiliania huxleyi through high-throughput EST sequencing of this alga (J. D. H. and D. B. unpublished data). Phylogenetic analysis places A. tamarense H2A.X in its predicted position (with moderate bootstrap support) as sister to the E. huxleyi homolog within a group of chromalveolates that includes haptophytes, stramenopiles, and apicomplexans (Figure 3B). H2A.X from A. tamarense, E. huxleyi, and Toxoplasma gondii do not, however, form a monophyletic group, suggesting multiple origins within chromalveolates. This is not surprising because H2A.X appears to have arisen independently many times during eukaryotic evolution [28, 29].
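The C-terminal SQ(E/D)Φ motif described above lends itself to a simple pattern search over predicted protein sequences. The sketch below is illustrative only: the set of residues counted as hydrophobic (Φ) is an assumption, since the article does not list them, and `has_h2ax_motif` is a hypothetical helper name. The example sequences are made up for demonstration.

```python
import re

# Assumption: Φ is taken to be any one of these hydrophobic residues.
HYDROPHOBIC = "AVLIMFWY"

# S, Q, then E or D, then one hydrophobic residue, anchored at the C-terminus.
H2AX_MOTIF = re.compile(rf"SQ[ED][{HYDROPHOBIC}]$")


def has_h2ax_motif(protein):
    """Return True if the protein ends with an SQ(E/D)Φ motif."""
    return H2AX_MOTIF.search(protein.strip().upper()) is not None


if __name__ == "__main__":
    # Hypothetical toy sequences, not real histone entries.
    for name, seq in [("ends_SQEY", "MSGRGKAAATSQEY"),
                      ("ends_SQKY", "MSGRGKAAATSQKY"),
                      ("internal", "SQEYAAAKKG")]:
        print(name, has_h2ax_motif(seq))
```

Anchoring the pattern at the end of the sequence reflects the statement that the motif sits at the C-terminus. An internal occurrence is deliberately not counted.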
We tested the strength of these results using the Approximately Unbiased (AU-) statistical test. A 16-taxon ML backbone tree was generated without A. tamarense H2A.X and then we made a set of 17-taxon trees by placing this sequence on every possible branch (29 in total). This analysis provides good support for the position shown in Figure 3B (P = 0.827); however, many alternative positions were included in the 5% confidence set of trees (i.e., as sister to Thalassiosira pseudonana, Phaeodactylum tricornutum, Homo sapiens, or Drosophila melanogaster, and at the base of or sister to either of the land plants). The lack of robust phylogenetic signal for the divergence point of A. tamarense H2A.X likely reflects the short length and high conservation of these histones.
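The count of 29 candidate placements follows from the number of branches in a fully resolved unrooted tree, which is 2n - 3 for n taxa. A brief sketch of that arithmetic is given here, assuming the 16-taxon backbone is strictly bifurcating. The function name `unrooted_branch_count` is a hypothetical helper.

```python
def unrooted_branch_count(n_taxa):
    """Number of branches in a fully resolved unrooted tree with n_taxa tips.

    Each branch is one possible attachment point for an additional sequence.
    """
    if n_taxa < 3:
        raise ValueError("an unrooted binary tree needs at least 3 taxa")
    return 2 * n_taxa - 3


if __name__ == "__main__":
    # 16-taxon backbone described in the text.
    print(unrooted_branch_count(16))
```

For the 16-taxon backbone this gives 29 branches, matching the number of 17-taxon trees evaluated in the AU-test.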
One group of proteins (referred to here as bacterial HLPs) is more closely related to dinoflagellate HLPs and includes Bph2 from Bordetella pertussis. Bph2 has a role in virulence gene expression and shares limited (likely convergent) sequence similarity with histone H1. The dinoflagellate and bacterial HLPs also contain an N-terminal extension in comparison to HU proteins. This extension is rich in alanine, lysine, and proline, which is reminiscent of the C-terminus of histone H1. The dinoflagellate HLP N-termini are, however, also enriched in methionines. Compared to the bacterial HLPs, this N-terminal region is generally shorter in the dinoflagellates, although there is variability among species in both groups (Figure 4A). In contrast to the primary sequence, secondary structure predictions for these three classes of proteins are remarkably similar. The crystal structure of E. coli HU has been determined (PDB ID: 1MUL) and the known secondary structure was compared to the predicted secondary structures of B. pertussis Bph2 and an A. tamarense HLP (Figure 4B). Both types of HLPs are predicted to have two α-helices that are identical in size and spacing to the N-terminal helices in E. coli HU, followed by two β-strands that are similar in size and position. We conclude from this analysis that dinoflagellate HLPs show structural similarity to HU proteins from bacteria; however, it is unclear if these proteins are functional homologs. It is also apparent that dinoflagellate HLPs are distantly related to bacterial HU proteins. The dinoflagellates have one putatively homologous functional residue corresponding to Lys3 (arrow in Figure 4A) of HU proteins, which interacts with the DNA and is involved in wrapping the DNA around the protein. A proline residue (asterisk in Figure 4A), which intercalates into the DNA during HU binding, appears to be conserved among HU proteins and bacterial HLPs, but is not present in the dinoflagellate HLPs.
However, there are several prolines conserved among dinoflagellates in the C-terminal end of the protein. The C-terminal arms of HU are critical for the interactions that bend the DNA. Given the low level of sequence similarity and the absence of a homologous proline in this region, it is unclear if the dinoflagellate HLPs are able to interact with DNA in the same manner as HU proteins.
In our phylogenetic analyses, the proteobacterial HLPs form a well-supported monophyletic group with the dinoflagellates (Figure 4C), suggesting an origin of the dinoflagellate gene through lateral transfer (followed by several rounds of gene duplication). It is also noteworthy that dinoflagellates are the only eukaryotes to possess a proteobacterial form II rubisco. The position of the dinoflagellate HLPs is distinct from that of other eukaryotic HU proteins. These latter proteins group with the canonical HU proteins from bacteria and have likely originated through intracellular transfer from the mitochondrial or plastid endosymbiont. Statistical support for the monophyly of the dinoflagellate and proteobacterial HLPs was tested using the AU-test. In these analyses (details not shown), a sister group relationship between the HLPs was the most highly favored topology (P = 0.659) and all other positions for the dinoflagellates (except branching inside the bacterial HLP clade) had significantly lower probabilities (P < 0.05).
Dinoflagellates no longer use the nucleosome as the major DNA packaging protein complex. Chromosomal DNA strands in these taxa are smooth, in contrast to the "beads on a string" conformation in other eukaryotes. The chromosomes are also unique in that they are uniform in size and morphology, remain condensed throughout the cell cycle, and are birefringent, indicating a liquid crystal state [5, 14, 37]. Transcription is thought to take place in DNA loops that protrude from the condensed chromosome. It appears that dinoflagellates have acquired DNA binding proteins from a proteobacterium, possibly to facilitate the compaction of their immense genomes. HU and related proteins from bacteria induce sharp bends in DNA strands and some models suggest that HLPs are responsible for creating DNA bends at the periphery of the chromosomes [39, 40]. Immunolocalization shows dinoflagellate HLP to be associated with the periphery of chromosomes.
However, the HLP concentration is very low relative to DNA content. Dinoflagellate chromosomes have a 1:10 protein:DNA ratio (in contrast to the 1:1 ratio in other eukaryotes). The HLP concentration may therefore be too low for these proteins to function in DNA compaction; rather, they may act as transcriptional regulators [41, 42].
In summary, our discovery of H2A.X in A. tamarense shows that, whereas dinoflagellates appear to no longer use nucleosomes for DNA packaging, at least one histone has been retained and is weakly expressed. Interestingly, in a recent paper, histone H3 appears in a table of redox-regulated genes in the dinoflagellate Pyrocystis lunula. Until now, only these two histones have been identified in dinoflagellates and it is unclear if all dinoflagellates possess either of these two genes, or others that have not yet been found. If other histones are present (which is likely), they may, however, also be expressed at a low level (as is the case for A. tamarense H2A.X). This would make their identification difficult using the EST-based approach unless comprehensive sequencing of normalized and subtracted cDNA libraries is used. In metazoans, replication-dependent canonical histone (H2A, H2B, H3 and H4) mRNAs are not polyadenylated, raising the possibility that they have been excluded from this poly-A primed cDNA library. However, these histone mRNAs are polyadenylated in plants, apicomplexans, and ciliates, suggesting that if they are present, they may be polyadenylated in dinoflagellates as well [44–46]. Given the important role that H2A.X plays in DNA repair, we speculate that this gene may have been maintained specifically to perform this function. Consistent with this idea, the core region of A. tamarense H2A.X is highly conserved, indicating that it may still be able to interact with DNA in a manner similar to H2A in other species.
This collection of ESTs is the most extensive genomic resource for a toxic dinoflagellate species to date and provides a useful glimpse into its nuclear genome. These data will be instrumental to future research to understand the unique and complex cell biology of these organisms and for understanding the method of toxin production in these species. We have likely not yet exhausted the gene discovery potential using the EST approach (i.e., note the high discovery rate of our normalized library). In the future, we will use serial subtraction of cDNA libraries to improve or maintain the novelty rate of our cDNA library and create cDNA libraries from A. tamarense under various growth conditions and life history stages to generate a more complete catalog of the gene content of this important organism.
Total RNA from a culture of the toxic dinoflagellate Alexandrium tamarense (CCMP 1598) was extracted using Trizol (GibcoBRL) and mRNA was purified using the Oligotex mRNA Midi Kit (Qiagen). This culture strain was produced by isolating a single cyst, a diploid resting stage that produces haploid vegetative cells by meiosis. However, it is unknown if a single or multiple vegetative cells were isolated after antibiotic treatment to make the culture axenic. If a single vegetative cell was isolated, the culture would be clonal. The culture was grown at 20°C on a 13:11 hour light:dark cycle (80 μEinsteins of light) in L1 media. Start and normalized directionally cloned (3' NotI-5' EcoRI) cDNA libraries were constructed as previously described. ESTs were sequenced from the 3' end to maximize clustering accuracy using the 3' untranslated region (UTR). All ESTs were processed as previously described. A non-redundant "unigene" set of 6,723 unique clusters was identified from 11,171 sequences using UIcluster v3.0.5.
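The relationship between total reads and unique clusters gives a rough measure of library redundancy. The sketch below shows that arithmetic for the figures reported above. The function names are hypothetical, and these summary statistics are an illustration rather than part of the published analysis or of UIcluster's output.

```python
def library_redundancy(total_reads, unique_clusters):
    """Fraction of reads that did not add a new cluster."""
    if total_reads <= 0 or not 0 < unique_clusters <= total_reads:
        raise ValueError("need 0 < unique_clusters <= total_reads")
    return 1 - unique_clusters / total_reads


def mean_cluster_size(total_reads, unique_clusters):
    """Average number of reads per cluster."""
    if unique_clusters <= 0:
        raise ValueError("unique_clusters must be positive")
    return total_reads / unique_clusters


if __name__ == "__main__":
    reads, clusters = 11171, 6723  # values reported in the text
    print(f"redundancy: {library_redundancy(reads, clusters):.1%}")
    print(f"mean cluster size: {mean_cluster_size(reads, clusters):.2f}")
```

With the reported values, roughly 40% of reads fall into an already sampled cluster, and the average cluster holds fewer than two reads.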
Data were gathered from GenBank (including the recently released Karenia brevis EST data, Frances Van Dolah, unpublished data) using BLAST searches. Maximum likelihood (ML) analyses were done with PHYLIP using the JTT model of protein evolution with gamma corrected rates (JTT + Γ) with 5 random additions. ML bootstrap analyses (100 replications) were done as described with either 5 (histone H2A) or 1 (HLPs) rounds of random taxon addition. Bayesian analyses were done using MrBayes V3.0b4. Four chains (1 cold, 3 heated) were run for 1 million generations, sampled every 1000 generations, using the JTT + Γ model. The first 500 trees were discarded as burn-in. Neighbor joining (NJ) bootstrap (500 replicates) analyses were done with PHYLIP using the JTT + Γ model. Minimum evolution (ME) analyses were done with PHYLIP using the JTT + Γ model with global rearrangements and 10 rounds of random taxon addition (1 round was used in the bootstrap analysis).
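The Bayesian sampling scheme above implies a fixed number of sampled trees and a fixed burn-in fraction. The sketch below works through that arithmetic under one simplifying assumption: the sample count is taken as generations divided by sampling frequency, ignoring any starting tree that a program may also write out. The function name `mcmc_sample_counts` is hypothetical and this is not MrBayes code.

```python
def mcmc_sample_counts(generations, sample_every, burn_in):
    """Return (total sampled trees, trees kept after burn-in).

    Assumes one tree is saved every `sample_every` generations and that
    the starting state is not counted as a sample.
    """
    if generations <= 0 or sample_every <= 0:
        raise ValueError("generations and sample_every must be positive")
    total = generations // sample_every
    if not 0 <= burn_in <= total:
        raise ValueError("burn_in must lie between 0 and the sample count")
    return total, total - burn_in


if __name__ == "__main__":
    # Settings reported in the text: 1 million generations, sampled every
    # 1000 generations, first 500 trees discarded.
    total, kept = mcmc_sample_counts(1_000_000, 1000, 500)
    print(total, kept, f"burn-in fraction: {500 / total:.0%}")
```

Under this assumption the run yields 1,000 sampled trees, so discarding the first 500 corresponds to a burn-in of about half the sample.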
The Approximately Unbiased test was done using CONSEL. ML trees without the groups of interest were generated as described above. A pool of trees was then generated by adding the group of interest (A. tamarense H2A.X or dinoflagellate HLPs) to every possible branch in the ML tree. For the HLP analyses, a reduced taxon set was used that included Bordetella, Ralstonia, Xylella, Pasteurella, Nostoc, Synechocystis, Agrobacterium, Rickettsia, Escherichia, Guillardia, Cyanidioschyzon, Sorghum, Toxoplasma, Xenopus, and Homo. A. tamarense 1 and C. cohnii HCC2 were added as a monophyletic group to every branch in this reduced ML tree. Secondary structure prediction was done using Jpred [53, 54]. The consensus secondary structures were used in the comparison to the known structure of E. coli HU (PDB ID: 1MUL).
JDH was supported by an Institutional NRSA (T 32 GM98629) from the National Institutes of Health. This work was supported by grants from the National Science Foundation awarded to DB (DEB 01-07754, MCB 02-36631). TES was partially supported by a Career Development Award from Research to Prevent Blindness.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.