Draft de novo transcriptome assembly and proteome characterization of the electric lobe of Tetronarce californica: a molecular tool for the study of cholinergic neurotransmission in the electric organ
- Maria Stavrianakou†1,
- Ricardo Perez1,
- Cheng Wu1,
- Matthew S. Sachs1,
- Rodolfo Aramayo†1 and
- Mark Harlow1
© The Author(s) 2017
Received: 24 August 2016
Accepted: 21 June 2017
Published: 14 August 2017
The electric organ of Tetronarce californica (an electric ray formerly known as Torpedo californica) is a classic preparation for biochemical studies of cholinergic neurotransmission. To broaden the usefulness of this preparation, we have performed a transcriptome assembly of the presynaptic component of the electric organ (the electric lobe). We combined our assembled transcriptome with a previous transcriptome of the postsynaptic electric organ, to define a MetaProteome containing pre- and post-synaptic components of the electric organ.
Sequencing yielded 102 million paired-end 100 bp reads. De novo Trinity assembly was performed at Kmer 25 (default) and Kmers 27, 29, and 31. Trinity generated around 103,000 transcripts and 78,000 genes per assembly. Assemblies were evaluated based on the number of bases/transcripts assembled, RSEM-EVAL scores, and informational content and completeness. We found that different assemblies scored differently according to the evaluation criteria used, and that while each individual assembly contained unique information, much of the assembly information was shared by all assemblies. To generate the presynaptic transcriptome (electric lobe) while capturing all information, assemblies were first clustered and then combined with postsynaptic transcripts (electric organ) downloaded from NCBI. The completeness of the resulting clustered predicted MetaProteome was rigorously evaluated by comparing its information against the predicted proteomes from Homo sapiens and Callorhinchus milii, and against the Transporter Classification Database (TCDB).
In summary, we obtained a MetaProteome containing 92%, 88.5%, and 66% of the ultra-conserved sequences (i.e., BUSCOs) expected to be found for Eukaryotes, Metazoa, and Vertebrata, respectively. We cross-annotated the conserved set of proteins shared between the T. californica MetaProteome and the proteomes of H. sapiens and C. milii, using the H. sapiens genome as a reference. This information was used to predict the position in human pathways of the conserved members of the T. californica MetaProteome. We found proteins not detected before in T. californica, corresponding to processes involved in synaptic vesicle biology. Finally, we identified 42 transporter proteins in TCDB that were detected by the T. californica MetaProteome (electric fish) and not selected by a control proteome consisting of the combined proteomes of 12 widely diverse non-electric fishes by Reverse-Blast-Hit Blast. Combined, the information provided here is not only a unique tool for the study of cholinergic neurotransmission, but it is also a starting point for understanding the evolution of early vertebrates.
Molecular genomics tools have been rather limited for the rays. To date, only the transcriptome of the postsynaptic electric organ of T. californica has been studied [4, 5], and no organismal genomic sequences are yet available for these animals. In the ray, the cholinergic neurons that innervate the electric organ reside in a specialized pair of lobes within the central nervous system, termed the electric lobes (Fig. 1). The lack of a transcriptome from the presynaptic cholinergic neurons, located in the electric lobe, hinders our ability to study the presynaptic components of the synapse, and more generally, the lack of genomic and transcriptome information hinders the study of the evolution and ecology of these early vertebrates. To address this deficit, we have isolated mRNAs from the electric lobe of T. californica, and applied a Next Generation Sequencing (NGS) approach to provide the first transcriptome of the cholinergic presynaptic neurons that innervate the peripheral electric organs.
Results and discussion
Transcriptome assembly evaluation
Evaluating a de novo transcriptome assembly is a challenging problem. Before accepting any one of our assemblies as the best one and using it to predict a proteome, we first tested whether any one of them contained all the information present in the other assemblies. We did this using the following criteria: 1) Number of bases assembled, 2) Number of transcripts assembled, 3) RSEM-EVAL scores, 4) Informational content, and 5) Informational completeness (BUSCO analysis).
Trinity + CD-HIT-EST
Next, we used Detonate, a package designed to assess the quality of an assembly without a reference. It does this by a reference-free evaluation method based on a novel probabilistic model that depends only on an assembly and the RNA-seq reads used for its construction. Using RSEM-EVAL, a component of the Detonate package, we scored our different assemblies (Table 1). We found that Assembly04 had a higher RSEM-EVAL score than any of the other assemblies. In fact, Assembly01 was outranked by Assemblies 02, 03, and 04. Clearly, neither the number of assembled bases, the number of transcripts, nor the number of genes correlated with RSEM-EVAL scores. Examination of the Detonate RSEM-EVAL results revealed both an inverse correlation between the RSEM-EVAL scores and the number of contigs with no reads aligned, and a direct correlation between the RSEM-EVAL scores and the number of alignable reads (Additional file 3: Table S01). Therefore, according to this criterion, Assembly04 is the best assembly.
To simplify these assemblies, we reduced sequence redundancy by applying the clustering algorithm implemented in CD-HIT [2, 9–13]. CD-HIT-EST has the potential of merging partially assembled transcripts (i.e., transfragments) into the longest assembled related sequence. We used stringent parameters (i.e., 100% sequence identity) and observed a reduction in the complexity of our assemblies (Table 1 and Additional file 3: Table S01). The Trinity + CD-HIT-EST assemblies contained approximately 92% of the assembled bases of their corresponding original Trinity assemblies, and between 90% and 91.5% of their transcripts and genes. The observed reduction was proportional for all four assemblies. Again, Trinity + CD-HIT-EST Assembly01 outperformed all other assemblies on the number of assembled bases, transcripts and genes. Importantly, as observed for the unclustered Trinity assemblies, the best clustered Trinity + CD-HIT-EST assembly, as determined by Detonate, was Assembly04 (Table 1). The clustered, or ‘flattened’, Assembly01 was again outranked in Detonate by clustered Assemblies 02, 03, and 04, even though the CD-HIT-EST algorithm had reduced the assemblies’ complexity.
Looking at the clustered assemblies, we observed a direct correlation between the number of alignable reads and RSEM-EVAL scores (i.e., the more alignable reads, the better the RSEM-EVAL score). We also observed an inverse correlation between the number of contigs with no reads aligned and RSEM-EVAL scores (i.e., the lower the number of contigs with no reads aligned, the higher the RSEM-EVAL score) (Additional file 3: Table S01). In all cases, RSEM-EVAL scores consistently pointed to the same best assembly regardless of the redundancy of the sample. This last result is important, as it underscores the ability of Detonate RSEM-EVAL to correct for these duplications through its prior modeling of assemblies. Finally, it is worth noting that in all cases more than 87% of the reads that entered assembly mapped to the different transcriptome assemblies, and that the best assembly as called by Detonate RSEM-EVAL has the highest number of mapped reads (Additional file 3: Table S01).
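The effect of clustering at 100% identity can be illustrated with a toy example. The sketch below is a deliberate simplification and not the CD-HIT-EST algorithm: it only collapses exact duplicates and fragments that are exact substrings of a longer sequence, ignoring strand and CD-HIT's greedy word-based filtering. The function name and the toy sequences are hypothetical.

```python
def collapse_redundant(sequences):
    """Keep a sequence only if no already-kept, longer sequence contains it.

    Simplified stand-in for 100% identity clustering: exact duplicates and
    exact sub-fragments are merged into the longest related sequence.
    """
    representatives = []
    # Process longest first so fragments are compared against longer candidates.
    for seq in sorted(set(sequences), key=lambda s: (-len(s), s)):
        if not any(seq in rep for rep in representatives):
            representatives.append(seq)
    return representatives


# Toy transfragments: one duplicate and one fragment of the first sequence.
toy = ["ATGGCGTACGTTAGC", "GCGTACGTT", "ATGGCGTACGTTAGC", "TTTTCCCCGGGG"]
kept = collapse_redundant(toy)
print(f"{len(toy)} input sequences -> {len(kept)} representatives")
```

As in the text, the reduced set is smaller but loses no sequence information, because every removed sequence is fully contained in a retained one.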
Next, we further evaluated these assemblies by looking at their ‘informational’ content. We detected differences between assemblies by estimating the full-length transcript ‘coverage’ of the different assembled transcripts (or, as we prefer to call them, transfragments) when compared to the Uniprot_Sprot protein database with Blastx [14, 15]. We selected Uniprot_Sprot because it is a high quality database [16–19]. We started by running blastx using as ‘query’ the transfragments corresponding to the different assemblies and as ‘subject’ the proteins in the Uniprot_Sprot database. We used a stringent E-value (1e-20) and retrieved only the best hit for each alignment (i.e., max-target-seqs=1). Results were then processed using ‘analyze_blastPlus_topHit_coverage’, a script provided with the Trinity package, summarized in Additional file 3: Table S02, and displayed in Additional file 1: Figure S02. We found on average 5,582 full-length transcripts for the four original Trinity assemblies and, on average, 5,556 full-length transcripts for the four Trinity + CD-HIT-EST assemblies. Looking just at the CD-HIT-EST processed assemblies, we found that 9,783 transcripts (average of all four ‘flattened’ assemblies) covered 60% or more of the length of the proteins present in Uniprot_Sprot. The equivalent number for the original Trinity assemblies was 9,880 transcripts. We were unable to observe marked differences between assemblies, as they all looked similar (see Additional file 1: Figure S02). Given that the Trinity assemblies clustered with CD-HIT-EST are less complex while retaining the same sequence information, we decided to concentrate on these ‘flattened’ or ‘non-redundant’ assemblies (which from now on will simply be called Assembly01, Assembly02, Assembly03, and Assembly04).
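The coverage statistic described above can be sketched as follows. This is a minimal illustration and not the Trinity script itself. It assumes a hypothetical tabular layout in which each best hit provides the subject start, subject end, and subject length, and it counts database proteins whose best alignment covers at least a given percentage of their length.

```python
def count_covered_subjects(rows, min_coverage=60.0):
    """Count subject proteins covered to at least `min_coverage` percent.

    rows: iterable of (query_id, subject_id, sstart, send, slen) tuples,
    a hypothetical layout (for example, custom BLAST tabular columns).
    """
    best = {}
    for _query, subject, sstart, send, slen in rows:
        aligned = abs(send - sstart) + 1          # aligned subject residues
        coverage = 100.0 * aligned / slen         # percent of subject length
        best[subject] = max(best.get(subject, 0.0), coverage)
    return sum(1 for cov in best.values() if cov >= min_coverage)


# Toy hits (all values invented for illustration).
hits = [
    ("tf1", "P1", 1, 100, 100),   # covers 100% of P1
    ("tf2", "P1", 10, 50, 100),   # covers 41% of P1; best for P1 stays 100%
    ("tf3", "P2", 1, 59, 100),    # covers 59% of P2
    ("tf4", "P3", 21, 80, 100),   # covers 60% of P3
]
print(count_covered_subjects(hits, 60.0))
print(count_covered_subjects(hits, 100.0))
```

Counting each database protein once, by its best-covering hit, keeps redundant transfragments from inflating the totals.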
Table: Informational assembly content comparison against Uniprot_Sprot database (reported value: total hits missing)
Table: Informational ‘between assemblies’ content comparison (reported value: total hits missing)
Based on these results it is hard to select a single “best assembly”. First, we were unable to select a given assembly based on the RSEM-EVAL score. Second, we found that each one of the individual assemblies obtained had informational content not present in the other assemblies. This ‘informational-content’ difference was evident when the different assemblies were queried against standard databases. We used a ‘complexity-reduction’ strategy designed to capture the unique information generated by the different assemblies by clustering them using very stringent parameters (i.e., 100% ID). This resulted in a substantial redundancy reduction, while preserving new information. Our results emphasize the need to evaluate assembly results critically and not just accept a given assembly as the best assembly using a single parameter. Even when using one of the best transcriptome assemblers available to date (i.e., Trinity), the complexity of transcriptome assembly should not be underestimated, and results coming from a single assembly must be taken with caution. While it has been established that assembling at different Kmers generates a potentially larger set of assembled transfragments, and elegant solutions have been proposed [21–27], finding a unified final solution to this problem is still an area of active investigation.
Defining the conserved proteome of the electric organ
The main motivation for this work was to identify key presynaptic proteins of the electric lobe that are important for the structure and function of the fish electric organ, and to combine these with proteins identified in previous transcriptomes of the postsynaptic organ (see Fig. 1). We defined the conserved proteome in three general steps: 1) The longest open reading frame (ORF) of each assembled transfragment was extracted, making sure not to leave behind any conserved and/or potentially functional smaller ORFs. 2) All available public records corresponding to T. californica deposited in the NCBI (GenBank) were downloaded and processed. 3) Potential protein sequences identified in steps 1 and 2 above were combined and characterized (see Fig. 2 for a general outline).
1. Defining the assembled presynaptic electric lobe proteome
To define our assembled proteome, without leaving behind any information, we first combined Assemblies 01, 02, 03, and 04 into a single file. This combined assembly, called ‘Combined-Assemblies01-04’ (Fig. 2), contained a total of 378,591 assembled transfragments. This file was then ‘flattened’ using CD-HIT-EST (at 100% ID) to generate a file (Combined-Clustered-Assemblies01-04) containing 180,840 transfragments (or 47.8% of the original sequences). We then used ‘TransDecoder.LongOrfs’ to extract the longest ORF from each transfragment. The resulting file (Combined-Clustered-Translated-Assemblies01-04) had 1,057,426 proteins. The complexity of this file was reduced to 211,589 proteins with CD-HIT (100% ID). The final file (Combined-Clustered-Translated-Clustered-Assemblies01-04) was then used as a ‘query’ for blastp searches against Uniprot_Sprot and HMMER searches against PfamA. The final Assembled Proteome file was generated by running ‘TransDecoder.Predict’, using the Combined-Clustered-Assemblies01-04 file and the results of both Blastp and Pfam searches to generate a set of 124,536 predicted assembled proteins. The nucleotide sequences corresponding to these predicted proteins were screened for those shorter than 200 bp. We found 2,896 such sequences. Although these 2,896 sequences were used in downstream analyses, they were separated for sequence submission (see Availability of data and materials for details). The final file contained 121,640 predicted assembled proteins (Translated-Assemblies01-04; Fig. 2).
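The length screen at the end of this step can be sketched in a few lines. The example below is illustrative only: the function names and toy records are hypothetical, and the 200 bp cutoff is the one stated in the text (sequences shorter than 200 bp are set aside).

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_by_length(records, min_len=200):
    """Separate records into (>= min_len, < min_len) lists."""
    long_enough, too_short = [], []
    for header, seq in records:
        (long_enough if len(seq) >= min_len else too_short).append((header, seq))
    return long_enough, too_short


# Toy input: sequences of 240, 160 (split over two lines), and 200 nucleotides.
toy_fasta = [">a", "ACGT" * 60, ">b", "ACGT" * 30, "ACGT" * 10, ">c", "ACGT" * 50]
kept_records, short_records = split_by_length(read_fasta(toy_fasta))
print([h for h, _ in kept_records], [h for h, _ in short_records])
```

Applied to the counts reported above, setting aside the 2,896 short sequences from the 124,536 predictions leaves 124,536 - 2,896 = 121,640 records.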
2. Defining the publicly available postsynaptic electric organ proteome
After downloading all currently available sequences from NCBI, we followed the same logic outlined above. We started by processing 10,185 transcripts (file NCBI-Tcalifornica-Sequences) that clustered into 9,099 unique transcripts (Clustered-NCBI-Tcalifornica-Sequences). We obtained 18,404 peptides after ‘TransDecoder.LongOrfs’ translation (to get Clustered-Translated-NCBI-Tcalifornica-Sequences). This set produced 13,129 peptides after CD-HIT (Clustered-Translated-Clustered-NCBI-Tcalifornica-Sequences). The resulting peptides were then used for both Blastp and Pfam searches. Finally, we extracted a set of 6,490 proteins after running ‘TransDecoder.Predict’ (Translated-NCBI-Tcalifornica-Sequences).
3. Defining the MetaProteome
The MetaProteome of the electric organ was defined by combining the files Translated-Assemblies01-04 and Translated-NCBI-Tcalifornica-Sequences to generate ‘Combined-Tcalifornica-MetaProteome’, containing a total of 128,130 proteins. This combined file was then further clustered to generate a final file called Combined-Clustered-Tcalifornica-MetaProteome, containing a set of 74,195 predicted proteins. This final set of 74,195 predicted proteins was defined as our MetaProteome (see Additional file 4 for details). In summary, of the 74,195 predicted MetaProteomic proteins, 70,338 proteins (94.8%) come from the presynaptic transcriptome, while only 3,857 proteins (5.2%) originated in the postsynaptic transcriptome.
Evaluating the MetaProteome
Analysis of the MetaProteome against Uniprot_Sprot database
To determine the full-length ‘coverage’ of the MetaProteome, we performed a Blastp search against Uniprot_Sprot. We wanted to determine how many of the MetaProteome sequences were complete. As expected, we observed an increase in the total number of hits to the database (from 15,929 to 17,615; compare Additional file 3: Tables S02 with S05). Similarly, the total number of Uniprot_Sprot proteins with a coverage percentage of 60% or more increased from 9,783 to 10,642 (859 more). We extracted the UniProt identifiers of proteins that had a coverage of 70% or higher (9,600; Additional file 5). Using these IDs and the Panther database [28–31], we identified the Gene Ontology (GO) terms associated with these proteins. Looking at the ‘Pathway’ category, we found enrichment in signaling pathways such as the Gonadotropin-releasing hormone receptor, CCRK, Wnt, Integrin, and Huntington disease pathways. In the ‘Protein Class’ category, the highest percentage of hits was against terms associated with RNA binding proteins, such as translation factors, mRNA processing, and ribosomal proteins. In addition, we observed enrichment associated with macromolecular complexes of the nervous system, for example tubulin, the SNARE proteins, and Vesicle Coat proteins, among others. We also observed enrichment in terms associated with phosphoprotein phosphatase hydrolase activity and non-membrane spanning protein tyrosine kinase activity, and enrichment for the Alzheimer disease-amyloid secretase pathway. These observations support the notion that the electric organ is a good system to study the biochemistry of important signaling pathways. The compiled MetaProteome thus provides a good starting point to understand the biology of how the electric organ works (Additional file 5).
Evolutionary analysis of the MetaProteome
To understand the biology of the T. californica electric organ MetaProteome from an evolutionary point of view, we performed two broadly different analyses. In the first one, the MetaProteome was compared against the proteomes of two very different genomes: H. sapiens and C. milii (Elephant Shark). In the second one, transporter proteins were detected in the MetaProteome and in the combined proteomes of 12 fish genomes (Additional file 3: Table S06), by comparing them against the Transporter Classification Database (TCDB) [32–35].
1. Comparative analysis of the MetaProteome against the human and elephant shark genomes
The number of orthologous pairs found between T. californica and C. milii was 7,100, between T. californica and H. sapiens was 5,403, and between H. sapiens and C. milii was 7,209 (Additional file 7, Additional file 8, and Additional file 9). The number of orthologous pairs found to be uniquely shared between T. californica and C. milii (i.e., excluding H. sapiens) was 3,618, between T. californica and H. sapiens (i.e., excluding C. milii) was 1,921, and between H. sapiens and C. milii (i.e., excluding T. californica) was 3,727 (Additional file 10, Additional file 11, and Additional file 12). We calculated that 3,482 proteins define the common set shared by all three datasets (Additional file 13).
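The counts in this paragraph are internally consistent: each pairwise total equals the pairs shared exclusively by the two species plus the set common to all three. The short check below uses only the numbers reported above. The abbreviations Tc, Hs, and Cm and the function name are introduced here for illustration.

```python
# Counts taken from the paragraph above.
pairwise = {"Tc-Cm": 7100, "Tc-Hs": 5403, "Hs-Cm": 7209}
exclusive = {"Tc-Cm": 3618, "Tc-Hs": 1921, "Hs-Cm": 3727}
shared_by_all = 3482


def is_consistent(pair):
    """True if exclusive pairs plus the three-way core equal the pairwise total."""
    return exclusive[pair] + shared_by_all == pairwise[pair]


for pair in pairwise:
    print(pair, exclusive[pair], "+", shared_by_all, "=", pairwise[pair],
          "OK" if is_consistent(pair) else "MISMATCH")
```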
Table: Percentage conservation of proteins between proteomes of T. californica and H. sapiens (columns: hit coverage percentage; number of hits)
Comparing and contrasting the GO terms retrieved by the set of proteins present only in H. sapiens versus T. californica (not C. milii) with those present in H. sapiens versus C. milii (not T. californica) (Additional file 16 and Additional file 17, respectively), we find a distinctive pattern: the H. sapiens versus T. californica set is highly enriched in terms associated with Axon guidance mediated by Slit/Robo, Beta3 adrenergic receptor signaling pathway, Opioid prodynorphin pathway, Opioid proopiomelanocortin pathway, Metabotropic glutamate receptor group II pathway, and Muscarinic acetylcholine receptor 2 and 4 signaling pathway, among others (Additional file 16). In contrast, the H. sapiens versus C. milii set is distinctively enriched in terms related to Alzheimer disease-presenilin pathway, Cadherin signaling pathway, and Heterotrimeric G-protein signaling pathway-Gi alpha and Gs alpha mediated pathway (Additional file 17). Similarly, among the protein classes observed in the H. sapiens versus T. californica only set, we detect acetyltransferase, G-protein, membrane traffic protein and RNA binding, whereas the terms we observed in the H. sapiens versus C. milii set are enriched in homeobox-related helix-turn-helix transcription factors and ion channel transporters. Overall, there is a clear pattern that favors terms associated with RNA binding, vesicle, membrane, and synaptic signalling in the protein set shared only by H. sapiens and T. californica, versus wnt/frizzled and phototransduction in the proteins shared by H. sapiens and C. milii.
Table: Number of T. californica proteins present in important neurobiological human pathways
Rows (pathways): Neurotrophin signaling pathway; Synaptic vesicle cycle; Amyotrophic lateral sclerosis (ALS); SNARE interactions in vesicular transport
2. Analysis of the Proteomes of T. californica and Twelve Fish Genomes Against the Transporter Classification Database (TCDB)
Table: Distribution of eukaryotic transporter proteins present in T. californica and combined fish proteomes
Columns: Classes of transporters; Combined fish proteome; T. californica proteome; Families specific to T. californica
Rows (transporter classes): Porters (uniporters, symporters, antiporters); Alpha type channels; Auxiliary transport proteins; Recognized transporters of unknown mechanism; Putative transport proteins; Vesicle fusion pores; Transmembrane 1-electron transfer carriers; Acyl CoA ligase-coupled transporters
We then asked whether the T. californica MetaProteome detected transporter proteins present in the TCDB dataset that were not detected by the 12-Fish proteome. We found 42 such hits (Table 6; Additional file 23). The majority of the hits were in the Porters (uniporters, symporters, antiporters) class, including proteins such as the Excitatory Amino Acid Transporter 1 (EAAT1), which is important in the uptake of the excitatory neurotransmitter glutamate, and the Chloride Channel protein 2 (ClC-2), which modulates neuronal excitability. Five Alpha-Type Channels were unique to T. californica, including the Voltage-dependent L-type calcium channel subunit 1 (Cav1.2), and ligand-gated channels such as the Glutamate [NMDA] receptor subunit epsilon-3 (NR2C) and the GABA(A) receptor subunit beta-1. Finally, among the representative transporters detected only by T. californica, we found two proteins that have been previously identified in the Marbled electric ray T. marmorata: Dispanin, which is a type of auxiliary transport protein, and the proton-conducting portion of the vacuolar-ATPase, the V-type proton ATPase 16 kDa proteolipid subunit.
The T. californica-specific hits described here (i.e., signal) were found despite the use of a combined 12-Fish proteome (i.e., noise) composed of 12 evolutionarily diverse bony vertebrates (Euteleostomi): Coelacanth (Latimeria), Spotted gar (Lepisosteus), Zebrafish (Danio), Blind Cave Fish (Astyanax), Atlantic Cod (Gadus), Southern Platyfish (Xiphophorus), Japanese Medaka (Oryzias), Nile Tilapia (Oreochromis), Fugu (Takifugu), Amazon molly (Poecilia), Three-spined Stickleback (Gasterosteus), and Spotted Green Pufferfish (Tetraodon). Many of these organisms (i.e., the Euteleostomi) are arguably closer to H. sapiens than T. californica. Also, note that of the 42 transporters identified, only 22 were present in the set of proteins shared by C. milii, H. sapiens, and T. californica, and 12 were present in the set shared by both H. sapiens and T. californica and absent in C. milii. This suggests that these transporters are specific to T. californica (an electric fish) and are not just present in the Chondrichthyes. Finding Dispanin, a Torpedo protein, validated both the assembly and the protein prediction strategy employed. The RBH-Blast strategy used here is very specific, as it establishes a one-to-one relationship between two datasets. RBH-Blast establishes that the hits observed are mutual, both from the query and from the subject (or database) point of view. Importantly, these results neither argue against nor rule out the existence of homologous proteins in the 12-Fish proteome. These results do establish, however, that in an RBH-Blast between T. californica and the current components of the TCDB database, the T. californica transporter proteins selected showed higher performance, both at the level of Identity and Coverage, than the proteins of the combined 12-Fish-Proteome set used in this work. As a result, we were able to identify transporter proteins not identified by the other fish proteomes.
In summary, we find that the set of transporters preferentially identified by the T. californica MetaProteome is highly enriched in proteins that play key roles in important neurological processes. These results are consistent with electric fish having a set of transporters that have been ‘optimized’ to respond to, and recover quickly from, a massive cellular depolarization. These results are also consistent with the idea that the genomes of organisms like H. sapiens and Mus musculus, in addition to other transporters, also possess the set of such ‘electrically-optimized’ transporters or derivatives of them.
We have assembled a de novo transcriptome corresponding to the electric lobe of T. californica. We critically evaluated the quality of our assemblies using ‘Industry Standard’ methods. We found a high degree of variability between assemblies produced at different Kmers. Neither the number of transcripts assembled nor the value of the Detonate scores calculated gave us a definitive prediction for best assembly. In contrast, we observed that the only reliable parameter for assembly evaluation was related to the information content of the assembly in question, when compared to a standard database. We also generated a non-redundant set of transcripts by combining the transcriptome of the electric lobe with previous transcriptomes of the electric organ, and determined those predicted proteins having high homology against the genomes of both H. sapiens and C. milii. Finally, we mapped and cross-annotated these highly conserved predicted proteins against the well annotated Human biochemical and developmental pathways. We also identified transporter proteins present in the T. californica MetaProteome and in a MetaProteome set corresponding to the proteomes of 12 fish genomes, and identified a set of important transporters that were only detected by the T. californica MetaProteome. The combined information provides not only a unique tool for the study of cholinergic neurotransmission, but also a starting point for understanding the biology of early vertebrates, as well as the biology of strongly electric fish, such as T. californica.
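The one-to-one logic of an RBH comparison can be sketched as follows. This is a generic reciprocal-best-hit illustration under stated assumptions, not the authors' pipeline: it assumes the best hit for every sequence in each search direction has already been extracted into a dictionary, and all identifiers are invented.

```python
def reciprocal_best_hits(best_a_to_b, best_b_to_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    return sorted(
        (a, b) for a, b in best_a_to_b.items() if best_b_to_a.get(b) == a
    )


# Hypothetical best hits: proteome A searched against database B, and back.
a_to_b = {"tc_prot1": "tcdb_X", "tc_prot2": "tcdb_Y", "tc_prot3": "tcdb_X"}
b_to_a = {"tcdb_X": "tc_prot1", "tcdb_Y": "tc_prot9"}

print(reciprocal_best_hits(a_to_b, b_to_a))
```

Only `tc_prot1` and `tcdb_X` select each other, so only that pair is reported. A database entry whose best hit lies elsewhere (here `tcdb_Y`) is not paired, even though homologs of it may exist in the query set, which is the point made in the text about the 12-Fish proteome.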
Poly(A)+ RNA purification, cDNA library preparation, and sequencing
The electric lobe from a female marine ray Tetronarce californica (Aquatic Research Consultants; San Pedro, CA) was dissected from the central nervous system, and total RNA was isolated from the frozen tissue as described. RNA concentration was determined using a Nanodrop spectrophotometer, and quality was assessed by denaturing gel electrophoresis in formaldehyde gels and Northern analyses. Poly(A)+ RNA was purified using the Poly(A)Purist MAG Kit (Ambion) and further cleaned up with Turbo DNase (Ambion) and Terminator 5’-phosphate-dependent exonuclease (Epicentre) to remove any trace amount of DNA and rRNA. cDNA was synthesized from purified poly(A)+ mRNA using random hexamer oligonucleotides or oligo dT as primer. Synthesized cDNA was then sheared into small pieces of 100 to 800 bp in length using a Bioruptor. The fragmented cDNA was prepared for Illumina sequencing using the TruSeq Sample Prep Kit (Illumina). Paired-end sequencing (100 bp) of the cDNA libraries was performed on a HiSeq 2500 instrument (Illumina).
De novo transcriptome assembly
The description of the software, version number and origin (when applicable) and representative commands and databases used are presented in Additional file 2.
Reads quality control (RQC)
Briefly, read quality scores were initially evaluated using FastQC. The first and last nucleotides of every read were trimmed using fastx_trimmer (FASTX-Toolkit). Similarly, sequencing artifacts were removed using fastx_artifacts_filter (FASTX-Toolkit). Sequencing adaptors were then removed using Cutadapt. Reads were then trimmed with fastq_quality_trimmer (FASTX-Toolkit) to remove any nucleotide with a quality score lower than 20. Reads shorter than 40 nucleotides (after trimming) were discarded. The read size distributions before and after RQC are shown in Additional file 2.
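The quality-trimming and length-filtering step can be illustrated with a small sketch. It is not a reimplementation of the FASTX-Toolkit. It assumes Phred+33 encoded qualities, trims low-quality bases from the 3′ end only, and uses the thresholds stated above (quality 20, minimum length 40). The function names and toy reads are hypothetical.

```python
def trim_3prime(seq, qual, min_q=20, offset=33):
    """Remove trailing bases whose Phred quality is below `min_q`.

    Assumes Phred+33 encoding (an assumption, not stated in the text).
    """
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def passes_length(seq, min_len=40):
    """Reads shorter than `min_len` after trimming are discarded."""
    return len(seq) >= min_len


# Toy reads: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
read_ok = trim_3prime("A" * 45, "I" * 42 + "#" * 3)
read_bad = trim_3prime("A" * 45, "I" * 30 + "#" * 15)
print(len(read_ok[0]), passes_length(read_ok[0]))
print(len(read_bad[0]), passes_length(read_bad[0]))
```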
Assembly was performed using Trinity at four different Kmers (25, 27, 29, and 31). All assemblies were performed with the ’jaccard_clip’ flag on (see Additional file 2 for details).
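As a rough illustration of how the four assemblies could be launched, the sketch below only builds the command lines and does not run them. The commands actually used are given in Additional file 2. The read file names, CPU count, memory limit, and output directory names here are hypothetical, and the option names reflect our understanding of Trinity's command-line interface, so they should be checked against the installed version.

```python
def trinity_commands(left, right, kmers=(25, 27, 29, 31), cpu=8, max_memory="50G"):
    """Build one Trinity command (as an argument list) per k-mer size."""
    commands = []
    for k in kmers:
        commands.append([
            "Trinity",
            "--seqType", "fq",
            "--left", left,
            "--right", right,
            "--KMER_SIZE", str(k),         # 25 is the default k-mer size
            "--jaccard_clip",              # flag named in the Methods text
            "--CPU", str(cpu),             # hypothetical resource settings
            "--max_memory", max_memory,
            "--output", f"trinity_k{k}",   # hypothetical output directory
        ])
    return commands


# Hypothetical file names for the quality-controlled paired-end reads.
cmds = trinity_commands("lobe_R1.clean.fq", "lobe_R2.clean.fq")
for cmd in cmds:
    print(" ".join(cmd))
```

Keeping each k-mer size in its own output directory makes it straightforward to evaluate the four assemblies separately before combining them.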
Assembly experimental strategy
The general Assembly experimental strategy used is presented in Fig. 2.
Other essential tools
We would like to dedicate this work to the memory of Maria Stravianaku (1964-2014). Maya’s premature departure has become our motivation and inspiration to understand how the brain works. This work was in part performed using the Supercomputer Ada, managed and maintained by the Texas A&M University High Performance Research Computing Resources Center (http://hprc.tamu.edu). In particular, we would like to acknowledge Dr. Michael Dickens for his invaluable help and support during the realization of this work and Tatiana Aramayo for her invaluable help editing the manuscript.
Funding was provided by National Institute of General Medical Sciences (Grants No. R01GM58770 and R01GM123139) to RA and by the National Science Foundation (Award No. HRD-1612776) to RP.
Availability of data and materials
The reads assembled during this project have been deposited in the National Center for Biotechnology Information (NCBI) as Bioproject PRJNA322346 (Transcriptome Assembly of the Electric Organ of Tetronarce californica) and as Biosample SAMN05158845 (Transcriptome Assembly of the Electric Organ of Tetronarce californica). Transcripts larger than 200 bp generated by this project have been deposited in the National Center for Biotechnology Information (NCBI) at DDBJ/EMBL/GenBank under the accession GFBV00000000. The version described in this paper is the first version, GFBV01000000. GenBank IDs corresponding to NCBI sequences that were incorporated into the MetaProteome are presented in Additional file 4. Because NCBI rules preclude submission of sequences shorter than 200 bp, the nucleotide sequences shorter than 200 bp whose predicted peptides were incorporated into the MetaProteome are presented in Additional file 24 (Nucleotide) and Additional file 25 (Peptides). The authors will gladly provide any files and/or additional information related to this work upon request.
Listed from highest to lowest. Experimental Design and Methodology: RA, RP, MS and MH. Software Selection, Implementation and Scripts Generation: RA and RP. RNA Extraction, purification and library preparation: CW. Analysis: RA and RP. Figures: RA, RP and MH. Tables: RA and RP. Supplemental Data compilation and assembly: RA. Computational Resources: RA. Data Curation: RA. Writing - Original Draft: RA. Writing - Review and Editing: RA, RP and MH. Visualization: RA and RP. Funding Acquisition: RA, RP and MH. All authors read and approved the final manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.