De novo assembly and characterization of the transcriptome of the toxic dinoflagellate Karenia brevis
© Ryan et al.; licensee BioMed Central Ltd. 2014
Received: 6 February 2014
Accepted: 24 September 2014
Published: 11 October 2014
Karenia brevis is a harmful algal species that blooms in the Gulf of Mexico and produces brevetoxins that cause neurotoxic shellfish poisoning. Elevated brevetoxin levels in K. brevis cells have been measured during laboratory hypo-osmotic stress treatments. To investigate mechanisms underlying K. brevis osmoacclimation and osmoregulation and establish a valuable resource for gene discovery, we assembled reference transcriptomes for three clones: Wilson-CCFWC268, SP3, and SP1 (a low-toxin producing variant). K. brevis transcriptomes were annotated with gene ontology terms and searched for putative transmembrane proteins that may elucidate cellular responses to hypo-osmotic stress. An analysis of single nucleotide polymorphisms among clones was used to characterize genetic divergence.
K. brevis reference transcriptomes were assembled with 58.5 (Wilson), 78.0 (SP1), and 51.4 million (SP3) paired reads. Transcriptomes contained 86,580 (Wilson), 93,668 (SP1), and 84,309 (SP3) predicted transcripts. Approximately 40% of the transcripts were homologous to proteins in the BLAST nr database with an E value ≤ 1.00E-6. Greater than 80% of the highly conserved CEGMA core eukaryotic genes were identified in each transcriptome, which supports assembly completeness. Seven putative voltage-gated Na+ or Ca2+ channels, two aquaporin-like proteins, and twelve putative VATPase subunits were discovered in all clones using multiple bioinformatics approaches. Furthermore, 45% (Wilson) and 43% (SP1 and SP3) of the K. brevis putative peptides > 100 amino acids long produced significant hits to a sequence in the NCBI nr protein database. Of these, 77% (Wilson and SP1) and 73% (SP3) were successfully assigned gene ontology functional terms. The predicted single nucleotide polymorphism (SNP) frequencies between clones were 0.0028 (Wilson to SP1), 0.0030 (Wilson to SP3), and 0.0028 (SP1 to SP3).
The K. brevis transcriptomes assembled here provide a foundational resource for gene discovery and future RNA-seq experiments. The identification of ion channels, VATPases, and aquaporins in all three transcriptomes indicates that K. brevis regulates cellular ion and water concentrations via transmembrane proteins. Additionally, > 40,000 unannotated loci may include potentially novel K. brevis genes. Ultimately, the SNPs identified among the three ecologically diverse clones with different toxin profiles may help to elucidate variations in K. brevis brevetoxin production.
The dinoflagellate Karenia brevis blooms almost annually in the Gulf of Mexico and is the region’s major harmful algal species . K. brevis produces two ladder-frame polyether brevetoxin compounds, PbTx-1 and PbTx-2, which bind to receptor site 5 of voltage-gated Na+ channels [2, 3]. Both parent compounds and their derivatives inhibit channel deactivation . Because they affect voltage-gated Na+ channel activity, brevetoxins are responsible for neurotoxic shellfish poisoning (NSP), may cause marine animal mortalities during blooms, and have been implicated in fish kills . Additionally, K. brevis cells that are damaged by the breaking surf have been shown to release enough aerosolized brevetoxins to cause eye irritation and respiratory distress in humans near the shore .
Despite the health risks associated with brevetoxins, their biological function within K. brevis is currently unknown. Notably, recent evidence suggests that PbTx-1 and PbTx-2 production increases in response to hypo-osmotic stress. Within 24 hours after a rapid media salinity reduction (35 to 27), the K. brevis Wilson clone (CCFWC268) produced ~25% more total brevetoxin per cell. It was therefore hypothesized that toxin production facilitates the response of K. brevis to salinity variations in the natural environment . As populations move from offshore oceanic to coastal waters, cells experience a range of environmental salinities. For example, cells of this oceanic species have even been observed in the hyposaline Mississippi River Delta .
To better characterize brevetoxin production and osmoacclimation in K. brevis, the reference transcriptomes of three K. brevis clones, Wilson-CCFWC268, SP1, and SP3 were assembled. These clones represent diverse geographic origins, duration of years in culture, and toxin profiles. Wilson was initially collected off the coast of Florida in 1953. In contrast, the SP1 and SP3 clones originate from the Texas coast in 1999. While SP1 produces low, often undetectable amounts of PbTx-1 and PbTx-2, the total brevetoxin in SP3 and Wilson consistently exceeds 10 pg cell-1 [7, 9]. Thus, differences among the three transcriptomes may improve our understanding of brevetoxin production.
Brevetoxins are ladder-frame polyethers that belong to the polyketide family. Polyketide synthase (PKS) genes have been isolated from Wilson cultures . In 2008, Monroe et al. identified four novel K. brevis PKS mRNA sequences in K. brevis clones Wilson, C2, NOAA-1, and SP2 that were not present in other dinoflagellates, including closely-related Karenia mikimotoi and Karlodinium veneficum. Because these four PKS sequences are unique to K. brevis and are structurally novel, it was hypothesized that they participate in the brevetoxin biosynthetic pathway . As part of this project, we searched the SP1, SP3, and Wilson transcriptomes for novel K. brevis PKS genes to determine if the known PKS genes were transcribed in all three clones.
Because brevetoxins increase voltage-gated Na+ channel activity, determining whether K. brevis expresses these channel proteins may elucidate the physiological function of this toxin. Previously constructed K. brevis EST libraries contained ~12,000 unique genes [12, 13], but no complete Na+ channel protein-coding region was identified in this gene set. Further, channel-based Ca2+, K+, and Na+ transport across cell membranes is one known method that plants and algae use to maintain homeostasis . The presence of ion channel sequences in K. brevis transcriptomes would suggest that this dinoflagellate is capable of osmoregulation through selective transmembrane ion transport.
Although aquaporins and VATPases have not been implicated in brevetoxin binding, they might affect osmoacclimation efficiency in K. brevis. Aquaporins were originally discovered in human red blood cells  but have since been found in taxa belonging to the bacterial, archaeal, and eukaryotic domains . These bidirectional transport proteins belong to the major intrinsic protein (MIP) family and move water and/or glycerol molecules across lipid membranes more quickly than diffusion [17, 18]. Similarly, VATPases generate pH gradients that trigger secondary ion transport . They have been identified in diverse eukaryotes and may participate in osmoregulation. For example, in Arabidopsis thaliana, Ca2+, Na+, and K+ starvation induced transcript-level downregulation of VATPase family genes , and inhibition of plasma H+‒ATPases in green alga Dunaliella salina prevented cell volume recovery after hyper-osmotic stress .
The haploid K. brevis genome is estimated to be 1 × 1011 base pairs (bp) [22–26] and has not been sequenced. This exceedingly large genome size highlights the crucial need for a reference transcriptome in order to initiate serious genomic analyses of this species. The Wilson, SP1, and SP3 K. brevis transcriptomes are among the first dinoflagellate transcriptomes to be assembled. Because no reference genome was available, a goal of this study was to evaluate different techniques to determine the de novo assembly method best suited for our data. Among eukaryotes, dinoflagellates are unique in a number of ways. A 22-nucleotide spliced leader sequence (SLS) has been identified in nuclear mRNA from all dinoflagellate species, including K. brevis [13, 27]. Though the dinoflagellate SLS is conserved within the dinoflagellate group, it is not homologous to SLSs used by other eukaryotic phyla . Additionally, dinoflagellate chromosomes are permanently condensed and nuclear genomes often contain a high quantity of repetitive, non-coding DNA [23, 28, 29]. These characteristics make dinoflagellates a biologically interesting target for transcriptome characterization, analysis of the gene complement, and gene expression studies. Here we report the results of a search for K. brevis PKSs, voltage-gated ion channels, aquaporins, and VATPases and describe transcriptome sequence differences in these genes among ecologically diverse clones.
A comparison of K. brevis transcriptome read number, locus number, apparently clone-unique locus number, N50 length, and mean locus length values
# Unique loci
Mean locus length (bp)
Short read alignment results
# Transcripts without Wilson alignments
# Transcripts without SP1 alignments
# Transcripts without SP3 alignments
CEGMA and TRAPID analyses
Core eukaryotic genes (CEGs) represent an unbiased set of proteins that are expressed and conserved within diverse eukaryotes , and therefore CEG identification helps gauge transcriptome assembly completeness. With CEGMA, we found 81% (Wilson), 84% (SP1), and 82% (SP3) (Additional file 1) of the complete highly conserved core genes described by Parra et al. . In comparison, 74% and 90% of the complete CEGs were identified in the dinoflagellate Karlodinium micrum CCMP2283 transcriptome and diatom Thalassiosira pseudonana genome, respectively. Thus, as a metric of transcriptome assembly success, the identification of complete CEGs in our transcriptomes indicates a high level of completeness in their coverage. The analysis also identified many conserved proteins that can be used to refine dinoflagellate phylogenetic relationships.
The “missing” CEGs that were not identified in any K. brevis reference transcriptome may represent genes that are not used by dinoflagellates. It is also possible that “missing” K. brevis CEG orthologs are not highly conserved by CEGMA criteria. To investigate these options, the SP1 transcriptome was used to search the CEG protein list using BLASTx. SP1 transcripts hit 235 of the 248 core CEGs with an E value < 1.00E-6. Therefore, homologs of 25 of the 38 “missing” genes were detected in the CEGMA dataset using BLASTx.
The ratio of full-length/quasi full-length to partial coding regions predicted by TRAPID  characterizes protein coding region completeness. The Wilson, SP1, and SP3 Oases reference transcriptomes were compared against three databases: the OrthoMCLDB 5.0 alveolate clade db, the PLAZA 2.5 green plants clade db, and the OrthoMCLDB 5.0 T. pseudonana CCMP1335 species db. The ratios of complete to partial coding regions identified in the reference transcriptomes were 7.3:1 (Wilson), 13.5:1 (SP1), and 12.5:1 (SP3). Over 90% of the K. brevis transcripts that significantly matched sequences in the TRAPID databases had ORFs that fell within two standard deviations of the mean ORF length in their gene families.
PKS, aquaporin, voltage-gated ion channel, and VATPase identification
The Wilson, SP1, and SP3 transcriptomes each included positive hits to the four novel K. brevis PKS sequences (gi # 148536478, 148536480,148536474, and 148536472), with nucleotide similarity values >99% over the aligned regions. No unique non-synonymous SNPs were identified in the SP1 PKS ORFs.
The same seven putative voltage-gated Na+ or Ca2+ channel genes were identified in all three K. brevis transcriptomes (SP1 transcriptome locus # 394, 12559, 19932, 26784, 30595, 36263, and 64946). Each sequence contained the voltage-sensing motif and four Pfam00520 domains with ~ six predicted transmembrane regions. During the BLASTx search against the nr database, the putative proteins very significantly (E value ≤ 1.00E-50) matched voltage-gated Na+ and Ca+2 channels from a range of organisms, including mammals.
The transcriptomes contained twelve different putative VATPase subunits with V-ATPase conserved domains. Two MIP genes were also identified in all the transcriptomes. The 267 and 405 aa putative proteins (SP1 transcriptome locus # 102528 and 12778, respectively) each contained six predicted transmembrane regions and conserved domains belonging to the MIP family. The aquaporin Asn-Pro-Ala (NPA) water selectivity filter motif was also identified, with conservation of the Asn in each occurrence. The K. brevis MIP sequences were compared to the nr protein database. Each returned over 100 hits with E values ≤ 1.00E-20. Of these, ~70% were aquaporins or predicted MIPs. The complete ORF of MIP #12778 was located in Wilson, SP1, and SP3, with almost 100% aa sequence conservation among clones, with just one aa variation in SP3, a Val to Met substitution at aa position 270. The ORF of MIP # 102529 was incomplete in the Wilson and SP3 transcriptomes.
SNP detection results
Mean SNP Rate
# Transcripts with SNPs
Wilson vs SP1
Wilson vs SP3
SP1 vs SP3
Wilson vs SP1
Wilson vs SP3
SP1 vs SP3
SNPs in putative voltage-gated Na + or Ca 2+ channel sequences
# SNPs in transcript
Mean SNP rate
Non-synonymous/synonymous SNPs in ORF
Wilson vs SP1
Wilson vs SP3
SP1 vs SP3
Our K. brevis transcriptomes are among the first dinoflagellate transcriptomes to be assembled, so it is not possible to make comparisons with closely related species. Lacking a reference genome sequence, several metrics, including transcript length, TRAPID-predicted full-length genes, and the identification of CEGs, were employed to gauge the completeness of the transcriptome assembly. Each of these criteria suggests that the transcriptomes are highly complete. First, the estimated mean protein-coding gene length of 19 model eukaryotes with sequenced genomes, including Arabidopsis thaliana, Caenorhabditis elegans, and red alga Cyanidioschyzon merolae, is 1,346 bp . This value is similar to the Wilson, SP1, and SP3 mean reference transcript lengths of 1340, 1376, and 1941 bp, respectively; therefore, our de novo assemblies of the K. brevis transcriptome appear to yield transcripts of a reasonable length.
Next, a 66% CEG identification rate was reported when Parra et al. analyzed the Toxoplasma gondii genome with CEGMA . Apicomplexan T. gondii and dinoflagellate K. brevis both belong to the alveolate group and are close phylogenetic relatives, based on rRNA analyses . Identification of >80% of the highly conserved CEGs from each K. brevis reference transcriptome provides additional support for the completeness of the assembly, since this value exceeds the expected percent based on T. gondii results. TRAPID results also indicated that more transcripts contained complete or mostly complete ORFs than partial ORFs. Of the transcriptomes similar to TRAPID alveolate, gene families, over 92% were within two standard deviations of the expected length.
Only ~40% of the genes in the K. brevis transcriptomes are homologous to sequences in the nr protein database with a hit significance ≤ 1.00E-6. Because little is known about dinoflagellate genomes, the high percentage of unknown loci was expected. The Blast2GO annotation step did not annotate enough hits to allow conclusions about the total gene ontology distribution of the transcriptomes. The low annotation rate is a result of low (<50%) similarity scores between K. brevis sequences and annotated proteins. This may be the result of the phylogenetic uniqueness of dinoflagellates combined with limited phylogenetic representation in the Blast2GO databases.
The four novel K. brevis PKS sequences  were found and expressed in all three clones. No unique non-synonymous SNPs were identified in the SP1 PKS ORFs. It is therefore possible that SP1 expresses the genes involved in brevetoxin production, though cellular PbTx-1 and PbTx-2 are often undetectable in this clone. This result is consistent with prior work investigating transcriptional and post-transcriptional regulation in K. brevis. Microarray studies have observed a high percent of expressed genes related to RNA post-transcriptional processing and protein processing in K. brevis, thus suggesting that this dinoflagellate species is highly reliant on post-transcriptional regulation .
Some phenotypic variance between clones may result from SNPs that alter gene function. SNPs affecting 25123 (Wilson to SP1), 28427 (Wilson to SP3), or 22794 (SP1 to SP3) expressed genes were identified in laboratory-cultured K. brevis clones. Based on overall nucleotide divergence rates between Wilson and SP1, Wilson and SP3, and SP1 and SP3 (Table 3), SP1 and SP3 were the most similar clones. This may be the result of time in culture. SP1 and SP3 were isolated in 1999, while Wilson-CCFWC268 has been in culture since 1953 .
Typically, eukaryotic algae respond to osmotic stress by differential metabolite production rates and/or the transmembrane flux of ions and water . The identification of putative MIPs, VATPases and voltage-gated Na+ or Ca2+ channels in K. brevis supports the hypothesis that cells osmoregulate with transmembrane channels. In K. brevis, aquaporins may facilitate quick responses to changes in the osmotic pressure gradient. Further, putative voltage-gated Na+ or Ca2+ channels with voltage-sensing motifs may facilitate cation transport across the cell membrane in response to ion gradients. The high interspecies similarity of these protein sequences indicates functional conservation among K. brevis clones that show varying brevetoxin profiles. Future experimental work will need to confirm that aquaporins, VATPases, or ion channels are involved with osmoacclimation in K. brevis. Potential experiments may measure cell volume post hypo-osmotic stress with and without specific protein blockers.
Our discovery of putative ion channel, aquaporin, and VATPase sequences supports the hypothesis that K. brevis cells use transmembrane proteins during osmoregulation and osmoacclimation. In the future, clone-to-clone and treatment-to-treatment comparisons of the expression of these and other novel genes using RNA-seq  will elucidate K. brevis responses to osmotic stress. The transcriptomes assembled during this study also provide a foundational reference for future differential expression and protein discovery work. Thousands of K. brevis transcripts have been assigned GO functions (Additional files 2, 3, 4, 5, 6 and 7), and over 40,000 K. brevis loci containing unknown hypothetical protein-coding regions > 100 aa long (~11 kDa) were assembled. Even if just a fraction of these transcribed loci encode functional proteins, our datasets identify a vast number of novel genes and gene-products. Future analyses of these genes will yield insights into the unique biology of K. brevis.
Cell culturing and RNA sequencing
K. brevis clones Wilson, SP1, and SP3 were maintained in L1 medium  at salinity 35. The medium was prepared with filtered (0.2 μm pore) and autoclave-sterilized sea water from the Flower Garden Banks region, Gulf of Mexico. For each clone, triplicate 1-L sterile glass bottles were inoculated with cells from laboratory cultures and maintained on a 12:12 hour light:dark cycle at 25°C. Cell counts were performed by light microscopy every other day to monitor growth rates.
During the late exponential growth phase, 500 ml were concentrated by centrifugation (800 × g, 10 min) and RNA was immediately extracted from the pellets in 40 μL of nuclease-free water with the Qiagen RNEasy Mini Kit (Qiagen Inc., Valencia, CA), in accordance with the manufacturer’s protocol. After final elution of RNA in 40 μL of nuclease-free water, samples were stored at -80°C until library preparation. The RNA concentration and purity of each extraction was estimated by measuring the absorption spectra of 2 μL sample aliquots with a NanoDrop Spectrophotometer.
The remaining culture in each bottle was diluted from a salinity of 35 to a salinity of 27 with nutrient-enriched Milli-Q water to simulate hypo-osmotic stress. Stressed cultures were incubated for one hour before RNA was extracted, as described above. RNA was shipped on dry ice to the National Center for Genome Resources Sequencing Lab (Santa Fe NM, USA) for paired-end Illumina sequencing (Illumina, San Diego CA, USA). Libraries were prepared with the TruSeq RNA Sample Preparation Kit (Illumina) using 2 μg RNA. Paired-end 50 bp reads were sequenced with the Illumina Hi-Seq 2000 platform.
Reference transcriptome assembly
Paired-end reads are available at the NCBI SRA repository. Reads were trimmed for quality and filtered for length with CLC Genomics Workbench 5.5.1 (CLC Bio, Aarhus, Denmark); the minimum read length permitted was 45 bp. Reads were trimmed based on Phred quality scores at the probability threshold of p = 0.05.
Reference transcriptomes for each clone were assembled with reads pooled from both control and salinity stress treatments. Pooling the reads allowed the assembly of genes that may only be expressed in one of the two treatments. Assembly was completed with Velvet-Oases, which uses a de Bruijn graph algorithm to build transcripts de novo [30, 31]. Coverage cutoff values were chosen automatically, and the edge fraction cutoff was increased from the default 10% to 50%. For each clone, single k-mer assemblies (k-mer lengths 21, 25, 29, 33, 37, and 41 bp) were merged into a non-redundant consensus transcriptome assembly. A 250 bp minimum transcript length threshold was enforced. For comparison purposes, an SP1 reference transcriptome was also assembled with the ABySS de novo paired-end assembler . Single k-mer assemblies (odd lengths, 25 to 45 bp) were merged into a final transcriptome with “bubble popping” enabled.
Oases may output several transcript isoforms belonging to the same predicted locus. When multiple isoforms were present, we removed all but one representative sequence based on the following criteria. Isoforms were searched using BLAST  against the other K. brevis transcriptomes, with the E value significance threshold 1.00E-6. The isoform that hit another sequence with the longest alignment length and highest significant bit score was retained. In the event that a locus containing multiple isoforms did not return a significant hit, the longest transcript was chosen to represent its locus. This technique retains the isoform that is most abundant across all clones, when multiple isoforms were present.
The mean locus (unigene) length and N50 value were calculated using a transcript length list. To identify putative complete genes, the transcriptomes were analyzed with TRAPID. Transcripts were assigned gene families by a comparison with the TRAPID alveolate clade database. ORFs within two standard deviations of the mean gene family length were considered “full-length”. For each K. brevis clone, the transcriptome assembly method that produced the greatest N50 length and the most full-length genes was considered optimal and used during subsequent analyses.
Identification of core eukaryotic proteins
To assess transcriptome completeness, loci were analyzed with the Core Eukaryotic Genes Mapping Approach (CEGMA) pipeline. CEGMA was developed to identify a subset of 248 highly conserved core eukaryotic genes (CEGs) in eukaryotic genomes. The CEGs were derived from six diverse model organisms: Homo sapiens, Drosophila melanogaster, Arabidopsis thaliana, Caenorhabditis elegans, Saccharomyces cerevisiae and Schizosaccharomyces pombe . For comparison purposes, the T. pseudonana genome was downloaded from the Joint Genome Institute and analyzed with CEGMA. T. pseudonana is a eukaryotic oceanic alga, for which a complete genome is available. Additionally, the K. micrum CCMP2283 transcriptome was downloaded from the CAMERA Data Distribution Center and analyzed with CEGMA. Karlodinium dinoflagellates are close phylogenetic relatives of K. brevis, based on rRNA analyses .
Assessing gene completeness with TRAPID
Full-length, quasi full-length, and partial protein coding regions were predicted in the Wilson, SP1, and SP3 Oases reference transcriptomes and the SP1 ABySS merged assembly. All the transcriptomes were compared against sequences in the OrthoMCLDB 5.0 alveolate clade database, while the Oases reference transcriptomes were also compared against sequences in the PLAZA 2.5 green plants clade database and the OrthoMCLDB 5.0 T. pseudonana CCMP1335 species database. The similarity search considered hits that yielded an E value < 1.00E-5 to be significant, and transcripts were annotated according to best hit values. Transcripts that hit one or more sequences in the TRAPID databases were qualified as “full-length”, “quasi full-length”, or “partial” based on the ORF length. ORFs that were more than two deviations shorter than the average ORF length of their assigned gene family (excluding the 10% longest and shortest sequences within the family) are “partial”. ORFs that are longer than the mean minus two standard deviations are “full length” if they also contain a start and stop codon and “quasi full-length” if they lack a stop and/or start codon .
Predicting unique assemblies and SNP locations in the transcriptomes
To identify similarities and possible differences among clones, the transcriptomes were searched for transcripts that are present in just one or two out of the three clones. Unique assemblies may be caused by differences in transcript coverage or transcriptome assembly artifacts. First, each Velvet-Oases assembled K. brevis transcriptome was converted to a searchable BLAST nucleotide database. All transcriptomes were searched against each other with BLASTn. Only hits with E values ≤ 1.00E-50 were considered as significant matches. Next, Wilson, SP1, and SP3 paired-end reads were aligned to each transcriptome with CLC Genomics Workbench 6.5. During mapping, 85% similarity fraction and 90% length fraction cutoffs were enforced, as well as conservative mismatch (3), insertion (3), and deletion (3) costs.
Reads from Wilson and SP3 were mapped to the SP1 transcriptome and analyzed for SNPs with the CLC quality-based variant detection function. The following quality filtering criteria were enforced: neighborhood radius = 5, maximum gap and mismatch count = 2, minimum neighborhood quality = 15, and minimum central quality = 20. Variants also had to be present in both forward and reverse reads. Additionally, non-specific matches and broken pairs were ignored, and the program enforced a 20-fold minimum coverage threshold and 90% minimum variant frequency. The analysis was also run with less conservative but more inclusive 10-fold coverage threshold.
Whole-transcriptome annotation and targeted gene discovery
The longest putative ORF was identified in each transcript and translated to amino acids (aa) via the longorf.pl Bioperl script . Using BLASTp, peptides were compared to the NCBI non-redundant protein database, which was downloaded from the NCBI FTP site on September 20, 2013. Significant (E value ≤ 1.00E-6) hits were uploaded to Blast2GO and annotated with possible gene ontology (GO) terms using a 35% minimum similarity cutoff .
In order to identify novel K. brevis PKS mRNA sequences, BLASTn was used to search the Wilson, SP1, and SP3 transcriptomes for the four novel K. brevis mRNA sequences identified by Monroe and Van Dolah . Hits were aligned against each other with the Clustal Omega alignment program  to investigate sequence similarity among the clones and original PKS sequences.
To identify possible voltage-gated cation channels, VATPase subunits, and aquaporin transcripts, the transcriptomes were compared to annotated voltage-gated Na+ channel alpha subunit and aquaporin sequences using BLASTx.
During the targeted search for these particular genes, loci with significant (E value ≤ 1.00E-6) matches to one or more proteins from the databases were translated into ORFs. Any locus containing an ORF < 300 bp long was discarded.
Probable transmembrane domains within the complete ion channel and aquaporin ORFs were identified with the Center for Biological Sequence Analysis TMHMM Server, version 2.0, which uses a hidden Markov model approach to predict transmembrane, intracellular, and extracellular regions in proteins . Since cation channels and aquaporins each contain domains with six transmembrane regions, sequences containing at least six predicted transmembrane helices were analyzed with two additional tools. First, they were compared to the NCBI nr database with BLASTx to identify homologous sequences from a range of eukaryotes. Next, conserved protein domains from the NCBI Conserved Domain Database  were identified with CD-search . In particular, ion channel transcripts were expected to contain domains belonging to the Pfam ion transport protein family (Pfam00520), which includes Na+, K+, and Ca2+ channels. Aquaporin transcripts were expected to contain MIP family conserved domains. VATPases were searched for V-ATPase domains, including Walker motifs, which are conserved among VATPase A subunits .
Potential ion channel transcripts were also searched for the voltage-gated Na+ channel S4 voltage-sensing motif. This motif, a repeated triad containing one positively charged and two hydrophobic aa, is highly conserved and helps regulate conformational changes that occur during channel activation and inactivation .
Availability of supporting data
The short read data supporting the results of this article are available in the NCBI SRA repository as of March 2014 under BioProject PRJNA214017, accession IDs SRX363776, SRX363775, and SRX361898. Short read data is also available at the publically accessible camera repository (project IDs MMETSP0573, MMETSP0574, MMETSP0648, MMETSP0649, MMETSP0527 and MMETSP0528). The reference transcriptomes can be accessed through LabArchive (doi:10.6070/H44F1NPC, 10.6070/H40P0X0H, 10.6070/H4VX0DH6).
Additional supporting data are included as additional files within the article (Wilson_CC_GO.tab, Wilson_MF_GO.tab, SP1_CC_GO.tab, SP1_MF_GO.tab, SP3_CC_GO.tab, SP3_MF_GO.tab, Wilson_CEG.txt, SP1_CEG.txt, SP3_CEG.txt).
This research was supported by NSF-IOS 1155376 award to LC and, in part, by the Gordon and Betty Moore Foundation through Grant #2637 to the National Center for Genome Resources. Samples MMETSP0573, MMETSP0574, MMETSP0648, MMETSP0649, MMETSP0527 and MMETSP0528 were sequenced at the National Center for Genome Resources.
We thank D. Henrichs, A. Hawkins, S. W. Lockless, and R. Errera for their expertise and assistance during this project. We thank D. Henrichs for providing the SP1 transcriptome assembled with ABySS. We would like to acknowledge participants in the Molecular Ecology/Transcriptomics Journal Club at Texas A & M University for valuable discussions throughout this study.
- Steidinger KA, Vargo GA, Tester PA, Tomas CR: Bloom dynamics and physiology of Gymnodinium breve with emphasis on the Gulf of Mexico. NATO ASI Ser G: Ecol Sci. 1998, 41: 133-154.Google Scholar
- Baden DG: Brevetoxins: unique polyether dinoflagellate toxins. FASEB J. 1989, 3 (7): 1807-1817.PubMedGoogle Scholar
- Dechraoui M-Y, Naar J, Pauillac S, Legrand A-M: Ciguatoxins and brevetoxins, neurotoxic polyether compounds active on sodium channels. Toxicon. 1999, 37 (1): 125-143. 10.1016/S0041-0101(98)00169-X.PubMedView ArticleGoogle Scholar
- Huang J, Wu CH, Baden DG: Depolarizing action of a red-tide dinoflagellate brevetoxin on axonal membranes. J Pharmacol. 1984, 229 (2): 615-621.Google Scholar
- Landsberg JH: The effects of harmful algal blooms on aquatic organisms. Rev Fisher Sci. 2002, 10 (2): 113-390. 10.1080/20026491051695.View ArticleGoogle Scholar
- Backer LC, Fleming LE, Rowan A, Cheng Y-S, Benson J, Pierce RH, Zaias J, Bean J, Bossart GD, Johnson D: Recreational exposure to aerosolized brevetoxins during Florida red tide events. Harmful Algae. 2003, 2 (1): 19-28. 10.1016/S1568-9883(03)00005-2.View ArticleGoogle Scholar
- Errera R, Campbell L: Correction for Errera and Campbell, Osmotic stress triggers toxin production by the dinoflagellate Karenia brevis. Proc Natl Acad Sci U S A. 2012, 109 (43): 17723-17724.View ArticleGoogle Scholar
- Maier Brown AF, Dortch Q, Dolah FMV, Leighfield TA, Morrison W, Thessen AE, Steidinger K, Richardson B, Moncreiff CA, Pennock JR: Effect of salinity on the distribution, growth, and toxicity of Karenia spp. Harmful Algae. 2006, 5 (2): 199-212. 10.1016/j.hal.2005.07.004.View ArticleGoogle Scholar
- Errera RM, Bourdelais A, Drennan M, Dodd E, Henrichs D, Campbell L: Variation in brevetoxin and brevenal content among clonal cultures of Karenia brevis may influence bloom toxicity. Toxicon. 2010, 55 (2): 195-203.PubMedView ArticleGoogle Scholar
- Snyder R, Gibbs P, Palacios A, Abiy L, Dickey R, Lopez J, Rein K: Polyketide synthase genes from marine dinoflagellates. Mar Biotechnol. 2003, 5 (1): 1-12. 10.1007/s10126-002-0077-y.PubMedView ArticleGoogle Scholar
- Monroe EA, Van Dolah FM: The toxic dinoflagellate Karenia brevis encodes novel type I-like polyketide synthases containing discrete catalytic domains. Protist. 2008, 159 (3): 471-482. 10.1016/j.protis.2008.02.004.PubMedView ArticleGoogle Scholar
- Lidie KB, Ryan JC, Barbier M, Van Dolah FM: Gene expression in Florida red tide dinoflagellate Karenia brevis: analysis of an expressed sequence tag library and development of DNA microarray. Mar Biotechnol. 2005, 7 (5): 481-493. 10.1007/s10126-004-4110-6.PubMedView ArticleGoogle Scholar
- Lidie KB, Van Dolah FM: Spliced leader RNA‒mediated trans‒splicing in a dinoflagellate, Karenia brevis. J Eukaryot Microbiol. 2007, 54 (5): 427-435. 10.1111/j.1550-7408.2007.00282.x.PubMedView ArticleGoogle Scholar
- Glass AD: Regulation of ion transport. Annu Rev Plant Physiol Plant Mol Biol. 1983, 34 (1): 311-326. 10.1146/annurev.pp.34.060183.001523.View ArticleGoogle Scholar
- Agre P, Preston G, Smith B, Jung J, Raina S, Moon C, Guggino WB, Nielsen S: Aquaporin CHIP: the archetypal molecular water channel. Am J Physiol-Renal. 1993, 265 (4): F463-F476.Google Scholar
- Heymann JB, Engel A: Aquaporins: phylogeny, structure, and physiology of water channels. Physiology. 1999, 14 (5): 187-193.Google Scholar
- Agre P, King LS, Yasui M, Guggino WB, Ottersen OP, Fujiyoshi Y, Engel A, Nielsen S: Aquaporin water channels–from atomic structure to clinical medicine. J Physiol. 2002, 542 (1): 3-16. 10.1113/jphysiol.2002.020818.PubMed CentralPubMedView ArticleGoogle Scholar
- Borgnia M, Nielsen S, Engel A, Agre P: Cellular and molecular biology of the aquaporin water channels. Annu Rev Biochem. 1999, 68 (1): 425-458. 10.1146/annurev.biochem.68.1.425.PubMedView ArticleGoogle Scholar
- Nelson N, Perzov N, Cohen A, Hagai K, Padler V, Nelson H: The cellular biology of proton-motive force generation by V-ATPases. J Exp Biol. 2000, 203 (1): 89-95.PubMedGoogle Scholar
- Maathuis FJ, Filatov V, Herzyk P, Krijger CG, Axelsen BK, Chen S, Green BJ, Li Y, Madagan KL, Sánchez‒Fernández R: Transcriptome analysis of root transporters reveals participation of multiple gene families in the response to cation stress. Plant J. 2003, 35 (6): 675-692. 10.1046/j.1365-313X.2003.01839.x.PubMedView ArticleGoogle Scholar
- Oren-Shamir M, Pick U, Avron M: Involvement of the plasma membrane ATPase in the osmoregulatory mechanism of the alga Dunaliella salina. Plant Physiol. 1989, 89 (4): 1258-1263. 10.1104/pp.89.4.1258.PubMed CentralPubMedView ArticleGoogle Scholar
- Kim YS, Martin DF: Effects of salinity on synthesis of DNA, acidic polysaccharide, and ichthyotoxin in Gymnodinium breve. Phytochemistry. 1974, 13 (3): 533-538. 10.1016/S0031-9422(00)91348-7.View ArticleGoogle Scholar
- Rizzo P, Jones M, Ray S: Isolation and properties of isolated nuclei from the Florida red tide dinoflagellate Gymnodinium breve (Davis). J Eukaryot Microbiol. 1982, 29 (2): 217-222.Google Scholar
- Sigee DC: Structural DNA and genetically active DNA in dinoflagellate chromosomes. Biosystems. 1984, 16 (3): 203-210.Google Scholar
- Kamykowski D, Milligan EJ, Reed RE: Biochemical relationships with the orientation of the autotrophic dinoflagellate Gymnodinium breve under nutrient replete conditions. Mar Ecol-Prog Ser. 1998, 167: 105-117.View ArticleGoogle Scholar
- Van Dolah FM, Lidie KB, Monroe EA, Bhattacharya D, Campbell L, Doucette GJ, Kamykowski D: The Florida red tide dinoflagellate Karenia brevis: new insights into cellular and molecular processes underlying bloom dynamics. Harmful Algae. 2009, 8 (4): 562-572. 10.1016/j.hal.2008.11.004.View ArticleGoogle Scholar
- Zhang H, Hou Y, Miranda L, Campbell DA, Sturm NR, Gaasterland T, Lin S: Spliced leader RNA trans-splicing in dinoflagellates. Proc Natl Acad Sci. 2007, 104: 4618-4623. 10.1073/pnas.0700258104.PubMed CentralPubMedView ArticleGoogle Scholar
- Lin S: Genomic understanding of dinoflagellates. Res Microbiol. 2011, 162 (6): 551-569. 10.1016/j.resmic.2011.04.006.PubMedView ArticleGoogle Scholar
- Rizzo PJ: Those amazing dinoflagellate chromosomes. Cell Res. 2003, 13 (4): 215-217. 10.1038/sj.cr.7290166.PubMedView ArticleGoogle Scholar
- Schulz MH, Zerbino DR, Vingron M, Birney E: Oases: robust de novo RNA-seq assembly across the dynamic range of expression levels. Bioinformatics. 2012, 28 (8): 1086-1092. 10.1093/bioinformatics/bts094.PubMed CentralPubMedView ArticleGoogle Scholar
- Zerbino DR, Birney E: Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 2008, 18 (5): 821-829. 10.1101/gr.074492.107.PubMed CentralPubMedView ArticleGoogle Scholar
- Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJ, Birol İ: ABySS: a parallel assembler for short read sequence data. Genome Res. 2009, 19 (6): 1117-1123. 10.1101/gr.089532.108.PubMed CentralPubMedView ArticleGoogle Scholar
- Van Bel M, Proost S, Van Neste C, Deforce D, Van de Peer Y, Vandepoele K: TRAPID: an efficient online tool for the functional and comparative analysis of de novo RNA-Seq transcriptomes. Genome Biol. 2013, 14 (12): R134-10.1186/gb-2013-14-12-r134.PubMed CentralPubMedView ArticleGoogle Scholar
- Parra G, Bradnam K, Korf I: CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics. 2007, 23 (9): 1061-1067. 10.1093/bioinformatics/btm071.PubMedView ArticleGoogle Scholar
- Xu L, Chen H, Hu X, Zhang R, Zhang Z, Luo Z: Average gene length is highly conserved in prokaryotes and eukaryotes and diverges only between the two kingdoms. Mol Biol Evol. 2006, 23 (6): 1107-1108. 10.1093/molbev/msk019.PubMedView ArticleGoogle Scholar
- Van de Peer Y, De Wachter R: Evolutionary relationships among the eukaryotic crown taxa taking into account site-to-site rate variation in 18S rRNA. J Mol Evol. 1997, 45: 619-630. 10.1007/PL00006266.PubMedView ArticleGoogle Scholar
- Wegmann K: Osmoregulation in eukaryotic algae. FEMS Microbiol Lett. 1986, 39 (1): 37-43.View ArticleGoogle Scholar
- Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008, 5 (7): 621-628. 10.1038/nmeth.1226.PubMedView ArticleGoogle Scholar
- Guillard R, Hargraves P: Stichochrysis immobilis is a diatom, not a chrysophyte. Phycologia. 1993, 32 (3): 234-236. 10.2216/i0031-8884-32-3-234.1.View ArticleGoogle Scholar
- Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Genome Acids Res. 1997, 25 (17): 3389-3402. 10.1093/nar/25.17.3389.View ArticleGoogle Scholar
- Bachvaroff TR, Gornik SG, Concepcion GT, Waller RF, Mendez GS, Lippmeier JC, Delwiche CF: Dinoflagellate phylogeny revisited: Using ribosomal proteins to resolve deep branching dinoflagellate clades. Mol Phylogenet Evol. 2014, 70: 314-322.PubMed CentralPubMedView ArticleGoogle Scholar
- Kortschak D: longorf.pl [perlscript]. Available at https://github.com/bioperl/bioperl-live/blob/master/examples/longorf.pl (Accessed 12 December 2013)
- Conesa A, Götz S, García-Gómez JM, Terol J, Talón M, Robles M: Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005, 21 (18): 3674-3676. 10.1093/bioinformatics/bti610.PubMedView ArticleGoogle Scholar
- Sievers F, Wilm A, Dineen D, Gibson TJ, Karplus K, Li W, Lopez R, McWilliam H, Remmert M, Söding J, Thompson JD, Higgins DG: Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol. 2011, 7: 539-PubMed CentralPubMedView ArticleGoogle Scholar
- Krogh A, Larsson B, Von Heijne G, Sonnhammer EL: Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol. 2001, 305 (3): 567-580. 10.1006/jmbi.2000.4315.PubMedView ArticleGoogle Scholar
- Marchler-Bauer A, Lu S, Anderson JB, Chitsaz F, Derbyshire MK, DeWeese-Scott C, Fong JH, Geer LY, Geer RC, Gonzales NR: CDD: a Conserved Domain Database for the functional annotation of proteins. Nucleic Acids Res. 2011, 39 (suppl 1): D225-D229.PubMed CentralPubMedView ArticleGoogle Scholar
- Marchler-Bauer A, Bryant SH: CD-Search: protein domain annotations on the fly. Nucleic Acids Res. 2004, 32 (suppl 2): W327-W331.PubMed CentralPubMedView ArticleGoogle Scholar
- Forgac M: Structure and function of vacuolar class of ATP-driven proton pumps. Physiol Rev. 1989, 69 (3): 765-796.PubMedGoogle Scholar
- Marban E, Yamagishi T, Tomaselli GF: Structure and function of voltage-gated sodium channels. J Physiol. 1998, 508 (3): 647-657. 10.1111/j.1469-7793.1998.647bp.x.PubMed CentralPubMedView ArticleGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.