
De novo transcriptome assembly and positive selection analysis of an individual deep-sea fish



High hydrostatic pressure and low temperatures make the deep sea a harsh environment for life forms. Actin organization and microtubule assembly, which are essential for intracellular transport and cell motility, can be disrupted by high hydrostatic pressure. High hydrostatic pressure can also damage DNA. Nucleic acids exposed to low temperatures can form secondary structures that hinder genetic information processing. One of the most straightforward ways to study how deep-sea creatures adapt to such a hostile environment is to sequence their genes and compare them with those of their shallow-water relatives.


We captured an individual of the fish species Aldrovandia affinis, which is a typical deep-sea inhabitant, from the Okinawa Trough at a depth of 1550 m using a remotely operated vehicle (ROV). We sequenced its transcriptome and analyzed its molecular adaptation. We obtained 27,633 protein coding sequences using an Illumina platform and compared them with those of several shallow-water fish species. Analysis of 4918 single-copy orthologs identified 138 positively selected genes in A. affinis, including genes involved in microtubule regulation. In particular, functional domains related to cold shock and DNA repair are under positive selection in both the deep-sea fish and a hadal amphipod.


Overall, we have identified a set of positively selected genes related to cytoskeleton structures, DNA repair and genetic information processing, which shed light on molecular adaptation to the deep sea. These results suggest that amino acid substitutions of these positively selected genes may contribute crucially to the adaptation of deep-sea animals. Additionally, we provide a high-quality transcriptome of a deep-sea fish for future deep-sea studies.


The deep sea is characterized by high hydrostatic pressure, darkness and low temperatures [1]. Among these characteristics, high hydrostatic pressure is regarded as the harshest for living organisms, since it can inhibit the functions of proteins by denaturing them and impairing their structures [2, 3]. This is especially true for enzymes [4, 5] and cytoskeleton proteins [6]. In addition, at low temperatures, DNA and RNA strands tend to tighten their structures, which hinders the access of enzymes involved in DNA replication, transcription and translation [7] and thus disrupts these processes.

In eukaryotes, actin and microtubules are the primary constituents of cytoskeleton organization, which contributes to maintaining cytoskeletal structures, intracellular transport and cell motility [8]. However, high hydrostatic pressure has been found to disrupt actin fibers, microtubules and myosins in mammalian cells [6]. High hydrostatic pressure can also influence the cellular regulatory system, which controls and regulates the cytoskeletal changes, thus disrupting the assembly of actin filaments and microtubules [9]. Therefore, high hydrostatic pressure can affect all sorts of biological processes that rely on the cytoskeleton, such as spindle formation, cell division, mitosis and meiosis [10].

As a response to high hydrostatic pressure, cells in deep-sea organisms develop counteractive strategies, such as amino acid substitutions at specific key sites of actin sequences [8, 11] that help stabilize the higher-order structures of proteins. For instance, Q137K and A155S can maintain the coupling of ATP and Ca2+ to counteract the dissociation effects of high pressure, and both V54A and L67P can help sustain the DNase I activity [8, 11]. A few amino acid substitutions in lactate dehydrogenase from deep-sea fish were suggested to help the enzyme better tolerate and function under high hydrostatic pressure [12]. Amino acid substitutions may also contribute to the hydrostatic pressure adaptation of protein-protein interactions or ligand binding [13]. This is not limited to deep-sea fish. In the amphipod Hirondellea gigas, amino acid substitutions likely provide the main resource for molecular adaptation, allowing this creature to survive and thrive in hadal trenches [14]. Additionally, osmolytes can help proteins fold properly and remain stable so that they can maintain their functions under high hydrostatic pressure [15, 16].

Aldrovandia affinis (Günther, 1877) [17] is a benthopelagic teleost fish (Actinopterygii: Teleostei) commonly found in the deep sea. It has a snake-like body and a pointed snout. This species is widespread in the Atlantic Ocean and Pacific Ocean at depths ranging from 730 m to 2560 m [18]. To adapt to a wide range of hydrostatic pressure, A. affinis has likely developed capabilities to maintain protein structures and functions, especially cytoskeleton organization [2], but little is known about the molecular mechanisms.

Genome-wide patterns of positive selection can be effectively identified through transcriptome sequencing combined with a branch site model [19, 20, 21]. In this approach, the proteins of a specific species are compared with their ancestral protein sequences predicted through phylogeny. If the nonsynonymous substitution rate (dN) is significantly larger than the synonymous substitution rate (dS), the genes are defined as positively selected [22, 23]. In the present study, the transcriptome of A. affinis captured from the Okinawa Trough at a depth of 1550 m using an ROV was sequenced and compared with those of three shallow-water fish species (the cave fish Astyanax mexicanus, the cod fish Gadus morhua, and the platy fish Xiphophorus maculatus) in order to identify positively selected genes.


Sample collection, RNA extraction, and sequencing

During the Japan Agency for Marine-Earth Science and Technology (JAMSTEC) R/V Kairei cruise KR15–17 in November 2015, an individual A. affinis (JAMSTEC sample no. 1150047615) (Fig. 1) was captured from the Sakai hydrothermal vent field [24, 25] of the Okinawa Trough (21°31.4749’ N, 126°59.021′ E) at a depth of approximately 1550 m by the ROV Kaiko Mk-IV. A section of muscle tissue was preserved in RNAlater at 4 °C overnight and then transferred to −80 °C. The species was identified initially based on morphology [18] and later confirmed with the COI barcoding sequence. RNA was extracted using the TRIzol Reagent (Invitrogen, USA) according to the manufacturer’s instructions. The quality and quantity of RNA were evaluated by 1.5% agarose gel electrophoresis and with the NanoDrop 2000 (Thermo). The RNA quality was further tested with the Agilent 2100 Bioanalyzer, and the RNA integrity number was 7.8. A full-length cDNA library was constructed and sequenced on the Illumina HiSeq 4000 with a read length of 150 bp.

Fig. 1 Photograph of an individual of the deep-sea fish Aldrovandia affinis. Aldrovandia affinis being captured from the Sakai hydrothermal vent field of the Okinawa Trough (21°31.4749’ N, 126°59.021′ E) at a depth of 1550 m by the remotely operated vehicle Kaiko (Dive Number #676)

Data filtering, de novo assembly, functional annotation and reference species determination

Trimmomatic version 0.33 [26] was used to trim the adaptors and remove low-quality reads. Trinity version 2.0.6 [27] was utilized to de novo assemble all filtered reads. Due to redundancy of isoforms generated during alternative splicing, only the isoform with the highest abundance as estimated by RSEM was retained for each gene. CD-HIT-EST version 4.6.5 [28] with the setting of “-c 0.95” was used to remove transcripts whose sequence similarity exceeded 95% [14, 29]. BUSCO version 2.0 [30] and the metazoa_odb9 database were used to assess the completeness of the non-redundant transcripts. Coding sequences of the non-redundant transcripts were then predicted and translated using TransDecoder [27, 31] with a cut-off of 100 bp as recommended by the Trinity manual [27]. Each transcript was represented by the longest translated protein sequence in subsequent analyses.
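
The isoform-reduction step described above can be illustrated with a minimal Python sketch. It assumes that isoform identifiers end in an `_i<number>` suffix and that abundances (for example TPM values estimated by RSEM) have already been loaded into a dictionary; the identifiers and values below are hypothetical and are not taken from the study.

```python
def gene_id(isoform_id):
    """Return the gene part of an isoform ID (assumes a trailing '_i<number>' suffix)."""
    return isoform_id.rsplit("_i", 1)[0]


def pick_top_isoforms(abundance):
    """Map each gene to its most abundant isoform.

    abundance: dict of isoform ID -> abundance estimate (e.g. TPM).
    """
    best = {}
    for isoform, value in abundance.items():
        gene = gene_id(isoform)
        # Keep the isoform only if it beats the current best for this gene.
        if gene not in best or value > abundance[best[gene]]:
            best[gene] = isoform
    return best


# Hypothetical abundances for two genes, one of which has two isoforms.
example = {"c0_g1_i1": 3.2, "c0_g1_i2": 11.5, "c0_g2_i1": 0.8}
print(pick_top_isoforms(example))
```

Ties are resolved in favor of the isoform encountered first, which is an arbitrary choice made for this sketch.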

All translated protein sequences were compared to sequences in the NCBI non-redundant (NR) database using BLASTp version 2.2.31 with an E-value < 1e-5. Then, the NR hits of all protein sequences were classified according to the taxonomy database of NCBI using MEGAN5 [32]. The functions and annotations of proteins were predicted with Blast2GO version 3.1 [33] to search against the Gene Ontology (GO) database. The KEGG (Kyoto Encyclopedia of Genes and Genomes) Automatic Annotation Server [34] was used together with the bi-directional BLAST method to identify pathway information. Subsequently, the GO item distribution for biological process, cellular components and molecular functions was summarized and plotted with WEGO version 1.0 [35].
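
As an illustration of the E-value filtering step, the sketch below keeps the best hit per query from BLAST tabular output. It assumes the standard 12-column tabular layout (`-outfmt 6`), in which the E-value is the eleventh column; the output format actually used in the study is not stated, and the example lines are invented.

```python
def best_hits(lines, max_evalue=1e-5):
    """Return {query: (subject, evalue)} for the lowest-E-value hit below the threshold.

    lines: iterable of tab-separated BLAST records in the 12-column tabular layout.
    """
    hits = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue >= max_evalue:
            continue  # hit is not significant at the chosen threshold
        if query not in hits or evalue < hits[query][1]:
            hits[query] = (subject, evalue)
    return hits


# Invented records: two hits for q1 (one too weak) and one weak hit for q2.
records = [
    "q1\ts1\t90.0\t100\t10\t0\t1\t100\t1\t100\t2e-30\t200",
    "q1\ts2\t40.0\t50\t30\t0\t1\t50\t1\t50\t0.5\t20",
    "q2\ts3\t35.0\t40\t26\t0\t1\t40\t1\t40\t1e-3\t30",
]
print(best_hits(records))
```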

Amino acids and codons usage analysis

The codon usage of the overall coding region was calculated in CodonW version 1.4.4 [36]. The codon usage bias was determined by the relative synonymous codon usage (RSCU). RSCU > 1 and RSCU < 1 imply positive and negative codon usage bias, respectively, and an RSCU of 1 indicates no bias. A Perl script was written to calculate the proportion of amino acids.
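
To make the RSCU definition concrete, the following Python sketch computes RSCU as the observed count of a codon divided by the mean count of all synonymous codons for the same amino acid. It is an illustration under the assumption of the standard genetic code and is not the CodonW implementation or the authors' Perl script; the input sequence is a toy example.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return {codon: RSCU} for every sense codon whose amino acid occurs in cds."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore an incomplete trailing codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))

    families = defaultdict(list)  # amino acid -> synonymous codons
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent: RSCU undefined
        expected = total / len(codons)  # count expected under equal usage
        for c in codons:
            values[c] = counts[c] / expected
    return values


# Toy sequence: alanine encoded three times by GCT and once by GCC.
print(rscu("GCTGCTGCTGCC"))
```

In the toy sequence, GCT is used three times as often as expected under equal usage of the four alanine codons, so its RSCU exceeds 1, whereas the unused codons GCA and GCG have an RSCU of 0.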

Identification of orthologs and phylogenetic analysis

Single-copy orthologs shared by the deep-sea fish A. affinis and the shallow-water fishes Astyanax mexicanus, Gadus morhua, Lepisosteus oculatus, Oryzias latipes, Tetraodon nigroviridis, Xiphophorus maculatus and Latimeria chalumnae in the Ensembl database were identified using OrthoMCL version 2.0.9 [37], relying on all-vs-all BLASTp with an E-value threshold of 1e-5, and MultiParanoid [38], which clusters pairwise orthologs inferred with InParanoid [39]. Aligned amino acid sequences of single-copy orthologs between A. affinis and the shallow-water species (with Latimeria chalumnae serving as the outgroup; class Sarcopterygii) were concatenated and used for constructing a phylogenetic tree using RAxML version 8.2.4 [40], which applies maximum-likelihood analysis based on the substitution model of PROTGAMMA + GTR with 100 bootstraps. The phylogenetic tree with the highest bootstrap value was used in subsequent positive selection analysis. To exclude the influences of paralogs generated from genome duplication within the species, single-copy orthologs derived from speciation were used. Including a greater number of shallow-water species would increase the statistical significance of the result. However, doing so would also reduce the number of common single-copy orthologs. Therefore, there is a trade-off between the number of shared single-copy orthologs and the number of species used. Moreover, genome duplication is common in fish [41, 42] and also reduces the number of single-copy orthologs. We ran several trials and found that three particular shallow-water fish species (A. mexicanus, G. morhua, and X. maculatus) provide the subsequent positive selection analysis with the greatest number of single-copy orthologs. Therefore, these three species were used in the subsequent positive selection analysis of A. affinis.
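
The trade-off between the number of species and the number of shared single-copy orthologs can be sketched as follows. The function keeps only those ortholog groups in which every selected species contributes exactly one gene; the group structure, species codes and gene names are hypothetical and do not reproduce the OrthoMCL or MultiParanoid output formats.

```python
from collections import Counter


def single_copy_groups(groups, species):
    """Return the groups in which each selected species has exactly one gene.

    groups:  dict of group ID -> list of (species, gene) pairs.
    species: iterable of species codes that must all be present once.
    Members from species outside the selection are ignored.
    """
    wanted = set(species)
    kept = {}
    for group_id, members in groups.items():
        per_species = Counter(sp for sp, _ in members if sp in wanted)
        if set(per_species) == wanted and all(n == 1 for n in per_species.values()):
            kept[group_id] = {sp: gene for sp, gene in members if sp in wanted}
    return kept


# Hypothetical ortholog groups for four species codes.
groups = {
    "OG1": [("Aa", "a1"), ("Am", "m1"), ("Gm", "g1"), ("Xm", "x1")],
    "OG2": [("Aa", "a2"), ("Am", "m2"), ("Am", "m3"), ("Gm", "g2"), ("Xm", "x2")],
    "OG3": [("Aa", "a3"), ("Gm", "g3"), ("Xm", "x3")],
}
print(sorted(single_copy_groups(groups, ["Aa", "Am", "Gm", "Xm"])))
print(sorted(single_copy_groups(groups, ["Aa", "Gm", "Xm"])))
```

Dropping one species from the selection can only keep or increase the number of retained groups, which is the effect described in the paragraph above.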

Positive selection analysis

The same analytical pipeline described in the previous genome study [29] was used to identify positively selected genes in the deep-sea A. affinis. A modified branch site model A [43] coupled with Bayesian Empirical Bayes (BEB) methods [44] was adopted to compare A. affinis with the shallow-water species.

MUSCLE [45] was used to align amino acid sequences, and the amino acid alignment further guided the alignment of coding DNA sequences in ParaAT version 1.0 [46] with the “-g” flag to delete gaps in the aligned sequences. The strength of positive selection on each codon of each orthologous gene along a specific targeted lineage of the phylogenetic tree, designated as the deep-sea A. affinis, was estimated with the modified branch site model using codeml of the PAML package [47]. To determine whether the codon sequences along the targeted lineage fit a branch site model that allows positive selection better than one that allows only neutral or negative selection, an alternative branch site model (model = 2, NSsites = 2 and fix_omega = 0) and a neutral branch site model (model = 2, NSsites = 2, fix_omega = 1 and omega = 1) were each fitted, and their log-likelihood values were compared using likelihood ratio tests. The model fit was assessed using the Chi-square test with one degree of freedom [43]. A multiple testing correction method [48] was then applied to correct the P values. In addition, potential positive selection of codon sites was assessed by their posterior probabilities calculated with the BEB method. If the posterior probability exceeded 0.9, the amino acid site was considered a positively selected site. Genes with an adjusted P value < 0.1 [49, 50] and at least one positively selected amino acid site were regarded as positively selected genes.
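
The statistical part of this procedure can be sketched in a few lines of Python. The sketch assumes that the log-likelihoods of the alternative and neutral models have already been extracted from the codeml output, computes the likelihood ratio test P value from the Chi-square distribution with one degree of freedom, and applies the correction of Benjamini and Hochberg [48]. Function names and input values are illustrative and do not come from the authors' pipeline.

```python
import math


def lrt_pvalue(lnl_alt, lnl_null):
    """P value of the likelihood ratio test, Chi-square with 1 degree of freedom."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # For 1 degree of freedom the survival function is erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_positively_selected(adjusted_p, beb_probabilities, alpha=0.1, beb_cutoff=0.9):
    """Apply the two criteria: adjusted P < alpha and at least one BEB site > cutoff."""
    return adjusted_p < alpha and any(p > beb_cutoff for p in beb_probabilities)


# Illustrative log-likelihood pairs (alternative, neutral) for three genes.
pairs = [(-1000.0, -1010.0), (-2500.0, -2500.4), (-800.0, -803.5)]
raw = [lrt_pvalue(alt, null) for alt, null in pairs]
adj = benjamini_hochberg(raw)
print(raw)
print(adj)
```

Negative test statistics, which can arise from incomplete optimization, are clamped to zero in this sketch so that the P value is 1.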


A total of 38,370,894 raw paired-end reads (150 bp) were cleaned and filtered, resulting in 31,858,276 reads (83%) that were retained and used for de novo assembly. After removing redundant isoforms from the raw transcriptome assembly and predicting the open reading frame, 27,633 non-redundant transcripts ranging from 297 bp to 19,469 bp had a total size of 27,427,719 bp and a contig N50 value of 1359 nt (Table 1). The length distribution of the assembled contigs is shown in Additional file 1: Figure S1. These non-redundant transcripts hit 90.9% of the single-copy orthologs in the BUSCO metazoan database, including 82.2% of the complete orthologs and 8.7% of the fragmented orthologs. Translating all of these A. affinis transcripts with the open reading frames, 23,196 (~ 84%) of these sequences were significantly matched to the existing protein sequences in the NCBI NR database; 15,999 had at least one significant match to the GO item; and 7954 had significant hits in terms of KEGG pathways (Table 1). The GO item distribution of A. affinis (Fig. 2) for biological processes, molecular functions and cellular components did not appear to be significantly biased, indicating that there was no sequence bias in the reads.
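
For reference, the contig N50 reported above is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly size. A minimal Python sketch of this calculation, using toy lengths rather than the actual assembly, is:

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 for an empty input)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first contig that brings the running sum to half the total.
        if running * 2 >= total:
            return length
    return 0


# Toy example: total length 32, so the two 8-unit contigs reach the halfway point.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```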

Table 1 Statistics of assembly and annotation for Aldrovandia affinis

Fig. 2 Gene ontology distribution for the cellular component, molecular function and biological process of Aldrovandia affinis

A phylogenetic tree (Fig. 3) was constructed from a total of 349,341 amino acids that were aligned and trimmed from single-copy orthologs shared by the deep-sea fish A. affinis and the shallow-water fishes A. mexicanus, G. morhua, L. oculatus, O. latipes, T. nigroviridis, X. maculatus and L. chalumnae (the last of which served as the outgroup). A. mexicanus, G. morhua and X. maculatus were chosen for positive selection analysis because they shared 4918 single-copy orthologs with A. affinis, which is the greatest number possible from this pool. The Venn diagram shows that 7475 gene families were shared among A. affinis, A. mexicanus, G. morhua and X. maculatus, including both single-copy orthologs and multi-copy paralogs (Fig. 4). There was no significant amino acid or codon usage bias among these four species (Additional file 1: Table S1). Among these orthologous genes, 138 genes (Additional file 1: Table S2) in A. affinis fitted the alternative branch site model, which assumes positive selection, significantly better than the neutral model and had positively selected amino acid sites with a posterior probability exceeding 0.9. A set of proteins involved in cytoskeleton organization, especially proteins stabilizing actin and microtubules, and nucleic acid-binding proteins involved in genetic information processing showed clear signals of positive selection (Table 2).

Fig. 3 Maximum-likelihood phylogenetic tree for Aldrovandia affinis and shallow-water fish. The shallow-water fish species include the cave fish Astyanax mexicanus, the cod fish Gadus morhua, the spotted gar Lepisosteus oculatus, the medaka fish Oryzias latipes, the tetraodon fish Tetraodon nigroviridis, the platy fish Xiphophorus maculatus and the coelacanth Latimeria chalumnae (class Sarcopterygii; serving as the outgroup). This tree was constructed based on the substitution model of PROTGAMMA + GTR with 100 bootstraps

Fig. 4 Gene families shared by Aldrovandia affinis, Astyanax mexicanus, Gadus morhua and Xiphophorus maculatus

Table 2 Positively selected genes related to the cytoskeleton system and genetic information processing in Aldrovandia affinis


High hydrostatic pressure and low temperatures are considered two major barriers to survival in the deep sea [2, 7]. How certain animals cope with such adverse conditions remains largely unknown. A set of positively selected genes related to cytoskeleton structures, DNA repair and genetic information processing were identified in this study. This finding implies that certain genes contribute to molecular adaptation to the deep sea. By comparing our results with those from our previous study of the giant amphipod H. gigas collected from the Challenger Deep at a depth of approximately 11,000 m [14], we found that functional domains involved in separating the strands of the DNA double helix or of self-annealed RNA, including DEAD (Asp-Glu-Ala-Asp motif) box helicase, helicase conserved C-terminal domain, UvrD/REP helicase N-terminal domain and RNA helicase, as well as eukaryotic initiation factor 4G, are positively selected in both A. affinis and H. gigas. These domains are capable of generating a cold shock response and are further involved in unwinding unfavorable secondary structures, which helps maintain genetic information processing in the deep-sea environment [51, 52]. Neither the fish A. affinis nor the amphipod H. gigas can regulate its own body temperature, which means that essential genetic processes such as DNA replication, transcription and translation are threatened by the low temperatures of the deep ocean [7, 53]. Moreover, high hydrostatic pressure treatment can trigger a cold shock response in bacteria [54]. Therefore, the positive selection acting on cold shock genes may help animals deal with not only low temperatures but also high hydrostatic pressure to maintain their key genetic processes in the deep sea.

Besides low temperatures, deep-sea organisms are exposed to high hydrostatic pressure that can cause DNA chain breakage and damage, and thus it is suspected that they would need to repair their DNA more frequently to maintain DNA integrity [55,56,57,58]. In the present study, two important genes involved in repairing DNA damage are positively selected in A. affinis. One is DNA repair endonuclease XPF (ERCC4) (Table 2, Fig. 5) that can contribute to repairing abnormal nucleotide excision and helping recombinant DNA remove cross-links during the homologous recombination stage [59]. The other gene is replication factor C subunit 1 (RFC1) (Table 2, Fig. 5). In the hadal amphipod H. gigas, replication factor A1 (RFA1) is positively selected [14]. Both RFC1 and RFA1 help repair DNA damaged by environmental stress [14, 60]. In the deep sea, animals cannot avoid exposure to high hydrostatic pressure, and their fundamental genetic information is vulnerable in such an extreme environment. Thus, deep-sea animals probably require stronger DNA repairing mechanisms to protect their genetic information from high hydrostatic pressure. The positive selection of genes required for DNA repair may be one of the reasons that deep-sea vertebrates and invertebrates can adapt to high hydrostatic pressure.

Fig. 5 Partial alignment of positively selected genes. Double asterisks indicate that the amino acids in Aldrovandia affinis have a BEB posterior probability higher than 95% and a single asterisk indicates that the sites have a posterior probability between 90% and 95%. (Aa: Aldrovandia affinis; Am: Astyanax mexicanus; Gm: Gadus morhua and Xm: Xiphophorus maculatus)

In contrast to the hadal amphipod H. gigas [14], a set of genes involved in cytoskeleton reorganization, especially microtubule regulation, are positively selected in the deep-sea fish A. affinis. The assembly of microtubules can be inhibited under high hydrostatic pressure [2, 9]. Microtubules determine the extension of axon formation and neuronal polarity [61], and thus one possible reason that genes associated with the microtubule cytoskeleton are under positive selection in A. affinis is that this fish has a much more developed nervous system than H. gigas. Genes involved in maintaining microtubules may be subjected to higher positive selection pressure to sustain the function of the nervous system under high hydrostatic pressure. Such positively selected genes include CDK5 regulatory subunit-associated 2 (CDK5RAP2), cytoskeleton-associated 5 (CKAP5) and CLIP-associating protein 2 (CLASP2) (Table 2, Fig. 5). The proteins encoded by these genes bind to the plus-end of microtubules to regulate the dynamics of their assembly [62,63,64,65]. Furthermore, CDK5RAP2 can promote microtubule nucleation in axons [66, 67]. Dystonin (DST) is a key protein linking F-actin and neurofilaments to maintain neuronal cytoskeleton organization. The positive selection (Table 2, Fig. 5) of this protein may help protect the nervous system of deep-sea fish from the effects of high hydrostatic pressure [2, 68]. Even though the results obtained in the present study are based on one individual A. affinis, they still reflect the genetics of the entire species. This study has thoroughly compared the positive selection between deep-sea vertebrates and deep-sea invertebrates, which sheds light on the molecular adaptation of deep-sea animals.


A set of positively selected genes related to cytoskeleton structures, DNA repair and genetic information processing were identified in the present study. These genes point to the molecular adaptation of animals to the deep sea. Deep-sea organisms appear to rely on amino acid substitutions in these positively selected genes as a main resource for adapting to and surviving in such an environment. Furthermore, the present study provides a high-quality deep-sea transcriptome that can serve as a reference for future deep-sea studies.



BEB: Bayesian Empirical Bayes
CDK5RAP2: CDK5 regulatory subunit-associated 2
CKAP5: Cytoskeleton-associated 5
CLASP2: CLIP-associating protein 2
DEAD: Asp-Glu-Ala-Asp motif
ERCC4: DNA repair endonuclease XPF
GO: Gene ontology
JAMSTEC: Japan Agency for Marine-Earth Science and Technology
KEGG: Kyoto Encyclopedia of Genes and Genomes
LRT: Likelihood ratio tests
RFA1: Replication factor A1
RFC1: Replication factor C subunit 1
ROV: Remotely operated vehicle
RSCU: Relative synonymous codon usage


  1. Jamieson AJ, Fujii T, Mayor DJ, Solan M, Priede IG. Hadal trenches: the ecology of the deepest places on earth. Trends Ecol Evol. 2010;25(3):190–7.

  2. Somero GN. Adaptations to high hydrostatic pressure. Annu Rev Physiol. 1992;54(1):557–77.

  3. Ohmae E, Miyashita Y, Kato C. Thermodynamic and functional characteristics of deep-sea enzymes revealed by pressure effects. Extremophiles. 2013;17(5):701–9.

  4. Saad-Nehme J, Silva JL, Meyer-Fernandes JR. Osmolytes protect mitochondrial F0F1-ATPase complex against pressure inactivation. Biochim Biophys Acta. 2001;1546(1):164–70.

  5. Nishiguchi Y, Abe F, Okada M. Different pressure resistance of lactate dehydrogenases from hagfish is dependent on habitat depth and caused by tetrameric structure dissociation. Mar Biotechnol. 2011;13(2):137–41.

  6. Crenshaw HC, Allen JA, Skeen V, Harris A, Salmon ED. Hydrostatic pressure has different effects on the assembly of tubulin, actin, myosin II, vinculin, Talin, vimentin, and cytokeratin in mammalian tissue cells. Exp Cell Res. 1996;227(2):285–97.

  7. Feller G, Gerday C. Psychrophilic enzymes: hot topics in cold adaptation. Nat Rev Microbiol. 2003;1(3):200–8.

  8. Morita T. Structure-based analysis of high pressure adaptation of α-actin. J Biol Chem. 2003;278(30):28060–6.

  9. Bourns B, Franklin S, Cassimeris L, Salmon ED. High hydrostatic pressure effects in vivo: changes in cell morphology, microtubule assembly, and actin organization. Cell Motil Cytoskeleton. 1988;10(3):380–90.

  10. Ishii A, Sato T, Wachi M, Nagai K, Kato C. Effects of high hydrostatic pressure on bacterial cytoskeleton FtsZ polymers in vivo and in vitro. Microbiology. 2004;150(6):1965–72.

  11. Morita T. Comparative sequence analysis of myosin heavy chain proteins from congeneric shallow- and deep-living rattail fish (genus Coryphaenoides). J Exp Biol. 2008;211(9):1362–7.

  12. Brindley AA, Pickersgill RW, Partridge JC, Dunstan DJ, Hunt DM, Warren MJ. Enzyme sequence and its relationship to hyperbaric stability of artificial and natural fish lactate dehydrogenases. PLoS One. 2008;3(4):e2042.

  13. Lemaire B, Karchner SI, Goldstone JV, Lamb DC, Drazen JC, Rees JF, et al. Molecular adaptation to high pressure in cytochrome P450 1A and aryl hydrocarbon receptor systems of the deep-sea fish Coryphaenoides armatus. Biochim Biophys Acta. 2018;1866(1):155–65.

  14. Lan Y, Sun J, Tian R, Bartlett DH, Li R, Wong YH, et al. Molecular adaptation in the world’s deepest-living animal: insights from transcriptome sequencing of the hadal amphipod Hirondellea gigas. Mol Ecol. 2017;26(14):3732–43.

  15. Yancey PH, Blake WR, Conley J. Unusual organic osmolytes in deep-sea animals: adaptations to hydrostatic pressure and other perturbants. Comp Biochem Physiol A Mol Integr Physiol. 2002;133(3):667–76.

  16. Yancey PH, Siebenaller JF. Co-evolution of proteins and solutions: protein adaptation versus cytoprotective micromolecules and their roles in marine organisms. J Exp Biol. 2015;218(12):1880–96.

  17. The World Register of Marine Species (WoRMS) database. Accessed 30 Apr 2018.

  18. Fujikura K, Okutani T, Maruyama T. Deep-sea life biological observations using research submersibles. 2nd ed. Kanagawa: Tokai University press; 2012.

  19. Baldo L, Santos ME, Salzburger W. Comparative transcriptomics of eastern African cichlid fishes shows signs of positive selection and a large contribution of untranslated regions to genetic diversity. Genome Biol Evol. 2010;3:443–55.

  20. Yang L, Wang Y, Zhang Z, He S. Comprehensive transcriptome analysis reveals accelerated genic evolution in a Tibet fish, Gymnodiptychus pachycheilus. Genome Biol Evol. 2014;7(1):251–61.

  21. Hongo JA, Castro GM, Cintra LC, Zerlotini A, Lobo FP. POTION: an end-to-end pipeline for positive Darwinian selection detection in genome-scale data through phylogenetic comparison of protein-coding genes. BMC Genomics. 2015;16(1):567.

  22. Yang Z, Nielsen R. Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages. Mol Biol Evol. 2002;19(6):908–17.

  23. Yang Z, Dos Reis M. Statistical properties of the branch-site test of positive selection. Mol Biol Evol. 2011;28(3):1217–28.

  24. Nakamura K, Kawagucci S, Kitada K, Kumagai H, Takai K, Okino K. Water column imaging with multibeam echo-sounding in the mid-Okinawa trough: implications for distribution of deep-sea hydrothermal vent sites and the cause of acoustic water column anomaly. Geochem J. 2015;49(6):579–96.

  25. Miyazaki J, Makabe A, Matsui Y, Ebina N, Tsutsumi S, Ishibashi JI, et al. WHATS-3: an improved flow-through multi-bottle fluid sampler for deep-sea geofluid research. Front Earth Sci. 2017;5:45.

  26. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30(15):2114–20.

  27. Haas BJ, Papanicolaou A, Yassour M, Grabherr M, Blood PD, Bowden J, et al. De novo transcript sequence reconstruction from RNA-Seq: reference generation and analysis with trinity. Nat Protoc. 2013;8(8):1494–512.

  28. Li W, Godzik A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006;22(13):1658–9.

  29. Sun J, Zhang Y, Xu T, Zhang Y, Mu H, Zhang Y, et al. Adaptation to deep-sea chemosynthetic environments as revealed by mussel genomes. Nat Ecol Evol. 2017;1(5):121.

  30. Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31(19):3210–2.

  31. TransDecoder software. Accessed 20 Oct 2017.

  32. Huson DH, Beier S, Flade I, Górska A, El-Hadidi M, Mitra S, et al. MEGAN community edition – interactive exploration and analysis of large-scale microbiome sequencing data. PLoS Comput Biol. 2016;12(6):e1004957.

  33. Conesa A, Götz S, García-Gómez JM, Terol J, Talón M, Robles M. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005;21(18):3674–6.

  34. Moriya Y, Itoh M, Okuda S, Yoshizawa AC, Kanehisa M. KAAS: an automatic genome annotation and pathway reconstruction server. Nucleic Acids Res. 2007;35:W182–5.

  35. Ye J, Fang L, Zheng H, Zhang Y, Chen J, Zhang Z, et al. WEGO: a web tool for plotting GO annotations. Nucleic Acids Res. 2006;34:W293–7.

  36. CodonW software. Accessed 20 Oct 2017.

  37. Li L, Stoeckert CJ, Roos DS. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 2003;13(9):2178–89.

  38. Alexeyenko A, Tamas I, Liu G, Sonnhammer EL. Automatic clustering of orthologs and inparalogs shared by multiple proteomes. Bioinformatics. 2006;22(14):e9–15.

  39. O'Brien KP, Remm M, Sonnhammer EL. Inparanoid: a comprehensive database of eukaryotic orthologs. Nucleic Acids Res. 2005;33:D476–80.

  40. Stamatakis A, Hoover P, Rougemont J. A rapid bootstrap algorithm for the RAxML web servers. Syst Biol. 2008;57(5):758–71.

  41. Jaillon O, Aury JM, Brunet F, Petit JL, Stange-Thomann N, Mauceli E, et al. Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotype. Nature. 2004;431(7011):946.

  42. Taylor JS, Braasch I, Frickey T, Meyer A, Van de Peer Y. Genome duplication, a trait shared by 22,000 species of ray-finned fish. Genome Res. 2003;13(3):382–90.

  43. Zhang J, Nielsen R, Yang Z. Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level. Mol Biol Evol. 2005;22(12):2472–9.

  44. Yang Z, Wong WS, Nielsen R. Bayes empirical Bayes inference of amino acid sites under positive selection. Mol Biol Evol. 2005;22(4):1107–18.

  45. Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32(5):1792–7.

  46. Zhang Z, Xiao J, Wu J, Zhang H, Liu G, Wang X, et al. ParaAT: a parallel tool for constructing multiple protein-coding DNA alignments. Biochem Biophys Res Commun. 2012;419(4):779–81.

  47. Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007;24(8):1586–91.

  48. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Stat Soc B. 1995;57(1):289–300.

  49. Areal H, Abrantes J, Esteves PJ. Signatures of positive selection in toll-like receptor (TLR) genes in mammals. BMC Evol Biol. 2011;11:368.

  50. Raj T, Kuchroo M, Replogle JM, Raychaudhuri S, Stranger BE, De Jager PL. Common risk alleles for inflammatory diseases are targets of recent positive selection. Am J Hum Genet. 2013;92(4):517–29.

  51. Thieringer HA, Jones PG, Inouye M. Cold shock and adaptation. BioEssays. 1998;20(1):49–57.

  52. Lim J, Thomas T, Cavicchioli R. Low temperature regulated DEAD-box RNA helicase from the Antarctic archaeon, Methanococcoides burtonii. J Mol Biol. 2000;297(3):553–67.

  53. Gualerzi CO, Giuliodori AM, Pon CL. Transcriptional and post-transcriptional control of cold-shock genes. J Mol Biol. 2003;331(3):527–39.

  54. Wemekamp-Kamphuis HH, Karatzas AK, Wouters JA, Abee T. Enhanced levels of cold shock proteins in Listeria monocytogenes LO28 upon exposure to low temperature and high hydrostatic pressure. Appl Environ Microbiol. 2002;68(2):456–63.

  55. Abe F, Kato C, Horikoshi K. Pressure-regulated metabolism in microorganisms. Trends Microbiol. 1999;7(11):447–53.

  56. Rothschild LJ, Mancinelli RL. Life in extreme environments. Nature. 2001;409(6823):1092–101.

  57. Aertsen A, Van Houdt R, Vanoirbeek K, Michiels CW. An SOS response induced by high pressure in Escherichia coli. J Bacteriol. 2004;186(18):6133–41.

  58. Dixon DR, Pruski AM, Dixon LR. The effects of hydrostatic pressure change on DNA integrity in the hydrothermal-vent mussel Bathymodiolus azoricus: implications for future deep-sea mutagenicity studies. Mutat Res. 2004;552(1–2):235–46.

  59. Kornguth DG, Garden AS, Zheng Y, Dahlstrom KR, Wei Q, Sturgis EM. Gastrostomy in oropharyngeal cancer patients with ERCC4 (XPF) germline variants. Int J Radiat Oncol Biol Phys. 2005;62(3):665–71.

  60. Friedberg EC, Walker GC, Siede W, Wood RD. DNA repair and mutagenesis. 2nd ed. Washington DC: American Society for Microbiology Press; 2006.

  61. van Beuningen SF, Hoogenraad CC. Neuronal polarity: remodeling microtubule organization. Curr Opin Neurobiol. 2016;39:1–7.

  62. Mimori-Kiyosue Y, Grigoriev I, Lansbergen G, Sasaki H, Matsui C, Severin F, et al. CLASP1 and CLASP2 bind to EB1 and regulate microtubule plus-end dynamics at the cell cortex. J Cell Biol. 2005;168(1):141–53.

  63. Tsvetkov AS, Samsonov A, Akhmanova A, Galjart N, Popov SV. Microtubule-binding proteins CLASP1 and CLASP2 interact with actin filaments. Cell Motil Cytoskeleton. 2007;64(7):519–30.

  64. Fong KW, Hau SY, Kho YS, Jia Y, He L, Qi RZ. Interaction of CDK5RAP2 with EB1 to track growing microtubule tips and to regulate microtubule dynamics. Mol Biol Cell. 2009;20(16):3660–70.

  65. Cappell KM, Larson B, Sciaky N, Whitehurst AW. Symplekin specifies mitotic fidelity by supporting microtubule dynamics. Mol Cell Biol. 2010;30(21):5135–44.

  66. Choi YK, Liu P, Sze SK, Dai C, Qi RZ. CDK5RAP2 stimulates microtubule nucleation by the γ-tubulin ring complex. J Cell Biol. 2010;191(6):1089–95.

  67. Sánchez-Huertas C, Freixo F, Viais R, Lacasa C, Soriano E, Lüders J. Non-centrosomal nucleation mediated by augmin organizes microtubules in post-mitotic neurons and controls axonal microtubule polarity. Nat Commun. 2016;7:12187.

  68. Dalpé G, Leclerc N, Vallée A, Messer A, Mathieu M, De Repentigny Y, et al. Dystonin is essential for maintaining neuronal cytoskeleton organization. Mol Cell Neurosci. 1998;10(5–6):243–57.


The authors thank Dr. Hiroyuki Yamamoto (JAMSTEC), who served as the chief scientist of research cruise KR15-17. The authors greatly appreciate the tireless support of the captain and crew of R/V Kairei, the technical team of ROV Kaiko, and all scientists on board during the research cruise.


This study and the research cruise KR15-17 were supported by the Council for Science, Technology, and Innovation (CSTI) through the Cross-ministerial Strategic Innovation Promotion Program (SIP), Next-generation Technology for Ocean Resource Exploration. This study was also financially supported by a grant from the Strategic Priority Research Program of the Chinese Academy of Sciences (project number: XDB06010102) awarded to PYQ.

Availability of data and materials

All sequencing reads were submitted to the Sequence Read Archive (SRA) database of the National Center for Biotechnology Information (SRA accession number: SRR5306085). The non-redundant assembled transcripts were deposited in the Transcriptome Shotgun Assembly (TSA) database (TSA accession number: GFJD00000000).

Author information

PYQ conceived the experiments and led the project. JS and CC collected the samples. TX extracted the RNA. YL performed the bioinformatics analysis and wrote the manuscript. RMT contributed to the bioinformatics analysis. JWQ contributed to the text. All authors contributed to and approved the manuscript for submission and publication.

Corresponding author

Correspondence to Pei-Yuan Qian.

Ethics declarations

Ethics approval and consent to participate

All animal experiments were approved by the Ethics Committee of the Hong Kong University of Science and Technology.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional file

Additional file 1:

Table S1. Codon usage among Aldrovandia affinis, Astyanax mexicanus, Gadus morhua and Xiphophorus maculatus. Table S2. A complete list of positively selected genes in Aldrovandia affinis. Figure S1. Statistics of assembly contigs length. (DOCX 68 kb)

Rights and permissions

Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Lan, Y., Sun, J., Xu, T. et al. De novo transcriptome assembly and positive selection analysis of an individual deep-sea fish. BMC Genomics 19, 394 (2018).
