- Open Access
Transcriptome sequencing based annotation and homologous evidence based scaffolding of Anguilla japonica draft genome
BMC Genomics volume 17, Article number: S13 (2016)
Anguilla japonica (Japanese eel) is currently one of the most important research subjects in eastern Asia aquaculture. Enigmatic life cycle of the organism makes study of artificial reproduction extremely limited. Henceforth genomic and transcriptomic resources of eels are urgently needed to help solving the problems surrounding this organism across multiple fields. We hereby provide a reconstructed transcriptome from deep sequencing of juvenile (glass eels) whole body samples. The provided expressed sequence tags were used to annotate the currently available draft genome sequence. Homologous information derived from the annotation result was applied to improve the group of scaffolds into available linkage groups.
With the transcriptome sequence data combined with publicly available expressed sequence tags evidences, 18,121 genes were structurally and functionally annotated on the draft genome. Among them, 3,921 genes were located in the 19 linkage groups. 137 scaffolds covering 13 million bases were grouped into the linkage groups in additional to the original partial linkage groups, increasing the linkage group coverage from 13 to 14 %.
This annotation provide information of the coding regions of the genes supported by transcriptome based evidence. The derived homologous evidences pave the way for phylogenetic analysis of important genetic traits and the improvement of the genome assembly.
Abundance of Japanese eel, as well as other freshwater eels belongs to genus Anguilla, has been radically shrinking in the past decade . Catadromous eels’ enigmatic life cycle makes their reproduction affected deeply by anthropogenic impacts. Lack of an economical method to artificial reproduce makes this organism extremely vulnerable to overconsumption. Mature eels migrate thousands of kilometers into the open ocean to spawn eggs. Exact spawning locations of Japanese eels were hard to pinpoint and remained unknown until recently when they were found near Western Mariana Ridge. What prohibits the research from further improvement is that, eggs and larvae of Anguilla japonica are spread by the Kuroshio Current, making the habitats spans a wide area of Eastern Asia . Cylindrical shape larva develops into transparent color leptocephalus larvae, which eventually metamorphosis into glass eels. Glass eels migrate back into the freshwater through estuaries, sometimes traveling within wet sands into the inner continent, where they spend years going through pigmentation into yellow eels, and then silver eels . Such wide area of habitation potentially makes effect of pollution and diseases to be accumulated. Long life cycle and the spawning habit through migration make wild Anguilla eels hard to recover from the damage caused by overfishing.
Physical linkage map of Japanese eel were constructed in 2011 . High throughput Sequencing technology was rarely applied upon this issue before. However, with the advances of sequencing technologies bringing down the cost and time consuming of DNA and RNA sequencing, plus the approaching extinction of fresh water eels, the field began to change. In 2010, mRNA-Seq study of deep sequencing and de novo reconstruction of European glass eel were reported as well as the hox genes of the specie , 2 years following that, draft genome sequence of European were also published . The incorporative research of genomic and transcriptomics information from the deep sequencing should have major impacts in multiple fields. Expression profiling of both transcriptome of European eels response to environmental pollution were reported in 2012 . The first draft genome of Japanese eel was assembled , proving that the hox genes and genomic distance of European and Japanese eels were conserved. By 2014, a ddRAD-based linkage map was published, providing 13 % coverage of the draft genome . Such results left plenty of space for improvement.
Hereby, we provide a reconstructed transcriptome from whole body samples of juvenile (glass eels). The high throughput sequencing provides unprecedented amount of transcriptomic information. Instead of focusing only on certain types of tissues or organs, full transcriptome of the entire organism was sequenced. For future experimental design and guidance on ecological, physiological, artificially breeding and even toxicity resistance study of Japanese eel, such transcriptome can provide additional guidance. What‘s more, the massive amount of evidence provided by the transcriptome helps the complete of draft genome structural annotation. Combining transcriptome sequence data with publicly available expressed sequence tags evidences, 18,121 genes were structurally and functionally annotated on the draft genome. The structural annotation was performed through an established pipeline, MAKER . Functional annotation was based on sequence alignment. The acquired homologous evidences were further used to improve the draft genome scaffolding. Applying an improved version of scaffolding algorithm developed by Aganezov et al. , synteny of Anguilla japonica was compared to the genome of Fugu, Stickleback, Medaka, Tetraodon, Coelacanth and Zebra fish. Obtained results were integrated with previously published linkage map , putting 3,921 genes into the 19 linkage groups, which represent chromosomes of Anguilla japonica. 137 scaffolds were grouped into the linkage groups in addition to the original partial linkage groups. Phylogenetic analysis of the gene clusters correlation to thyroid hormone receptors and pigmentation were performed with MEGA 6.0 .
Results and discussion
Sequencing through Illumina HiSeqTM 2000 generated total 85,233,812 reads, with length of 101 nucleotides. After quality control, low quality reads were trimmed and left 77,939,562 reads were left with an average length of 99.575 nucleotides (Additional file 1: Figure S1). Quality control of the sequence reads is summarized in Additional file 1: Figure S2 and Additional file 1: Table S3. Assembly were assessed through average length of unigenes, as well as quality score N50 and N90. As the result shows in Additional file 1: Table S2, average length, N50 and N90 of clustered unigenes are significantly higher than results of single De Novo assembly tools. Composition of assembled unigenes showed in Additional file 1: Table S3 demonstrates that clustered unigenes tends to have higher composition of longer nucleotides. Hence assembles generated through clustering were considered to have higher accuracy and were used for further annotation. In Additional file 1: Figure S2, we demonstrate that expression level measured with FPKM, frequency of reads per kilo base per million, distributes through all different length of assembles.
Summary of the functional annotation is listed in Additional file 1: Table S4. From the total 32,210 assembled unigenes, 16,106 were found to be aligned to known proteins in NCBI Non Redundant protein database. 10,848 of the unigenes were found to contain functional domains through RPSBLAST against NCBI Conserved Domain Database. 5641 transcripts were found to involve with system biological pathway in Kyoto Encyclopedia of Genes and Genomes (KEGG) . 13,434 transcripts were annotated to certain Gene Ontology terms. Up to top 5 blast hits per query were considered in the process. Distribution of homologs belonging to other organisms were illustrated in Additional file 1: Figure S3.
Expression level of the transcripts was examined together with their frequency to be assigned to certain Gene Ontology terms. As demonstrated in Additional file 1: Figure S4, transcripts regarding enzyme regulator activity express level exceeded the total average expression level despite the fact that only four of them were found. Distribution of assembled transcripts through different KEGG pathways categories (Additional file 1: Figures S5 and S6) were also observed alongside with average expression level. As shows in Additional file 1: Figure S5, despite only few transcripts found in some pathway categories such as Circulatory system and Reaction module maps, average expression level of transcripts within these categories demonstrates potential rich activities of these pathways.
Protein functional domains found on the assembled transcripts were also viewed in their distribution alongside with expression level, demonstrated in Additional file 1: Figures S7 and S8. Protein functional domains with extreme expression level can provide guidance to the future protein- protein interaction study.
Catadromous eels’ reproduction is limited by their long life cycle and migration spawning. To produce enough supply without consuming wild glass eels, development of technology that would shorten the period of time for eels to mature, and would artificially induced spawning of healthy larvae is inevitable. Hence, revealing the mechanisms of metamorphosis from leptocephalus larvae into glass eels, as well as fermentation from glass eels to mature silver eels is the key to successful artificial reproduction to supply commercial demands and keep wild eels from extinction.
In the past, transcriptomic studies of eels mainly relied on classical molecular biological experimental methods. Studies of various mechanisms were performed with classical molecular methods such as cloning and protein purification. Cloning and protein purification provide only partial view of the transcripts . To fully capture all protein coding transcripts, a combination of next generation sequencing and new transcript assembly algorisms is necessary . In 2014, a study of mRNA expression profile through RT-PCR of prolactin, growth hormone, and somatolactin of Japanese eel was reported . However, researches utilizing genome information of Japanese eel, and the respective resources available for the experimental design are still limited. On the other hand, hybrids of European and American eels were found occurred naturally in Iceland , import of European glass eels into East Asia could trigger interspecific hybridization of Anguilla eel, inducing further anthropogenic impacts to this species near extinction . Proven possibilities of hybrid reproduction , as well as the successful artificial hybrid of European and Japanese eels also bring new possibilities to the artificially reproduction . However, the transcriptomic information is still limited. In 2013, the first transcriptomic study through 454 deep sequencing was performed on gill of Anguilla japonica . Utilization of proteomic approaches and transcriptomic sequencing gave insights into the osmoregulation mechanism, providing transcriptomic view of Anguilla japonica’s catadromous behavior. However the study  didn’t correlate with the currently available draft genome.
On eel sexualize mechanism, several surveys have been conducted on ovarian steroid genesis . Expression level of several genes were also found to be related to the ovarian development. Through the attempts of artificial reproduction of glass eels has been attempted since 1930s in Europe , only until 2003, first successful artificially induced spawning of Japanese eel was achieved through injection of salmon pituitary extracts into the female eel and human chorionic gonadotropin into male eel . An unpublished successful F2 generation was declaimed in 2011 . However current technology is not sufficient for large scale reproduction. Mortality of artificially cultured eels is still high.
Under current circumstance, all main stream studies of Japanese eels should inevitably focus on how to successfully improve life cycle of eels under the artificial environment to suit the existing demand. Such studies would take into consideration with all kinds of mechanisms. Metamorphosis, pigmentation and sexualize mechanisms of eels are all deeply correlated to their catadromous spawning activities, especially the metamorphosis mechanisms including reorganization of the entire body plan. Complete genome structure and transcriptome is essential for future study.
To suite such a purpose, we provide the first complete transcriptome of glass eels. Application of deep sequencing provides not only the information of homologs, but also the potential novel genes of Japanese eels. Clustering of De Novo assembled transcripts from different tools through overlapping successfully increase the assembly quality. Distribution of assembled transcripts through different species, GO terms, system biological pathway and protein functional domains of found genes were examined and demonstrated. In addition, we further provide expression level alongside the distributions. Such demonstration successfully provides the hidden information about pathways with few genes but extreme expression levels.
Without structural and functional annotation, draft genome  provides only limited information since Anguilla japonica is not an established model organism . The massive amount of information provided by RNA Sequencing for our experiment makes transcriptome evidence sufficient enough to perform complete structure annotation, which identifies genes and their intron-exon structure. Before doing so, due to the fact that Eukaryotic genomes are rich with repeat sequences, a process of repeat masking needs to be carried out. Usually such a process would need a well-established repeat sequence library to serve as guideline. However such a library doesn’t exist for Anguilla japonica. In this case , de novo constructing a new library  from the draft genome is a better choice than using established model organism data base such as Repbase . Structural annotation supported by RNA-seq could be done by directly assembling reads on to the genome with tool like Cufflinks. We further improved the de novo assembly into a genome based assembly following the convention guideline . An overview of the annotation process can be found in Fig. 1. The automatic pipeline we used, MAKER , performs multiple ab initio gene predictions, and cross verifies with the evidence driven prediction. Such prediction, backed up by the rich evidences, increase the accuracy of the structural annotation.
Many assemblies of the bone fish draft genomes were built upon SNP base linkage groups. With the homologs synteny information derived from the functional annotation, we successfully allocated 137 scaffolds (13 Mb) into the established linkage group. The relative position of the homologs and clusters on these linkage map are illustrated in Figs. 2, 3, 4 and 5. The plots were generated with Mapchart .
We hypothesis that metamorphosis mechanism of Anguilla japonica, like other Teleost fishes, conserve with amphibians , based on the fact that development and growth of fish correlates to thyroid hormone had been widely accepted . Such metamorphosis of vertebrate tends to be triggered by environmental control on hypothalamo-pituitary-thyroid axis in the brain , and regulated by the thyroid hormone receptor on cell membrane. The environment control, in this case, would be the catadromous activity from ocean to inland. In certain developmental stages of the larval, the brain senses the environmental stress urging the metamorphosis to glass eel, and releases corticotropin release factor (CRF), which correlated with genes crh and CRHBP. CRF binds on receptors of anterior pituitary, forcing it produce thyrotrophin (TSH) and adrenocorticotrophic hormone (ACTH). TSH triggers thyroids to secret thyroid hormone, while ACTH triggers adrenals or Interrenals to secret corticoids. From our functional annotation, we observed that the trha homolog of Japanese eel is correlated with TSH, and pomcb homolog is correlated with ACTH of Anguilla japonica. Finally, thyroid hormone receptors on the cell membrane as well as nuclear receptors within the cell regulate whether the metamorphosis would be triggered. We believe that thyroid hormone receptors of Anguilla japonica are regulated by the TRIP4, THRAP3, TRIP11, TRIP12, TRIP13 genes homologs, while the nuclear receptors are regulated by the genes NR4A3, NR2C2, NCOA3 and NR2F6 homologs respectfully. Phylogenetic tree of thyroid hormone receptor interactor (TRIP) family homologs is illustrated in Figs. 6 and 7. Through the analysis we found that the thyroid hormone receptor interactor family genes of Japanese eel are homologous to Asian arowana, Northern Pike, Rainbow trout, Marbled rockcod, Atlantic herring and Spotted gar. Phylogenetic tree of nuclear receptor subfamily 2 is illustrated in Figs. 8 and 9. From the graph we observed that the receptors were also closely related to homologs of same six fishes, Asian arowana, Northern Pike, Rainbow trout, Marbled rockcod, Atlantic herring and Spotted gar.
Among the homolog clusters we have found, lysosomal trafficking regulator LYST cluster in linkage group 10 (LG10) was related to the GO term pigmentation GO:0043473 . Correlated scaffolds in the linkage group and phylogenetic analysis of the found homologs are illustrated in Fig. 10. From the phylogenetic tree we found that LYST homolog of Japanese eel is also close related to Asian arowana, a kind of freshwater East Asia fish with high commercial value.
While such a full transcriptome from whole body is proven to be effective on functional annotation, scope of this annotation is still limited by the sample. Since the messenger RNAs were isolated from glass eels before sexualization, certain types of hormones from gender specific tissues of silver eels such as ovarian and testicular cannot be found in such samples. On the other hand, satellite sequences and mRNAs from eggs and larval might not necessary been expressed in our samples. Despite some of a portion of this information is available on NCBI, which we used in our annotation, these factors could still limit the completeness of the annotation.
Materials and methods
Isolated whole RNA from the five glass eels were prepare for the RNA sequencing. Libraries for the RNA-Seq were sequenced through Illumina HiSeqTM 2000 following the manufacturer’s manual. Pair end libraries were sequenced in 101 X 2 nucleotides length reads, with 120 nucleotides adaptors. The entire fragment length was 357 nucleotides. The base calling and image analysis were done following Illumina standard pipeline. Raw reads of deep sequencing went through quality control procedure done by using FASTX-Toolkit: FASTQ/A short reads pre-processing tools, with only Quality value Phred score over 20 nucleotides remain, which means only reads with per base accuracy over 99 % were kept. Also, we trimmed the length of the reads down to 70 nucleotides for low quality reads. The transcriptome were first reconstructed through de novo assembly. To achieve maximum accuracy, we applied three different main stream de novo assembly tools : Trinity , Oases , and SOAPdenovo-Trans . Quality controlled reads were assembled into three separate sets of contigs, with the three different tools. Trinity was applied with default settings, while Oases and SOAPdenovo-Trans were applied with multiple-kmers strategy. To further eliminate overlapping contigs, we clustered the three sets of contigs with CD-HIT-EST  into three sets of unigenes. Finally, we clustered unigenes with high similarity together with the tool CAP3 . Quality of the assembly was estimated mainly through an average length of unigenes, as well as quality score N50 and N90. N50 represents the length of the longest unigene among the collection of unigenes equal to a half of the sum of all unigenes, while N90 means the length of the shortest unigene among the collection of unigenes equal to ninety percent of the sum of all unigenes. Maximum and minimum length of assembled unigene also serves as an index for the assessment.
Abundance of the assembled unigenes were estimated through RSEM pipeline . Quantities of the transcripts were estimated through FPKM value. FPKM, frequency of reads per kilo base per million ,value was calculated through aligning reads onto assembled transcripts with Bowtie . A de novo repeat library of Anguilla japonica was built from the draft genome  through RepeatScout . Then, de novo assembly of the RNA-Seq data were pooled with the complete and partial CDS, EST and previously done gill RNA-seq assembly  of Anguilla japonica from NCBI as EST evidence. The EST evidence includes the public available RNA-seq data sets SRX482728, SRX247092, SRX115953 and the whole body transcriptome sequence data. Together with the known proteins from NCBI, genome structural annotation was performed through the pipeline MAKER . The pipeline firstly masked the repeat sequences according to the previously build library with Repeatmasker (http://repeatmasker.org), and then perform ab initio prediction through repeat training of SNAP  and polished with Exonerate .
To find the assembled transcripts coding proteins, unigenes were blasted against NCBI non-redundant protein data base, TrEMBL and Swiss-Port  with BLASTX. Hits with an e value lower than 10 to negative 5, filtered by penalty estimation through the credibility of the protein, would be considered as homologs. Next, available Gene Ontology  terms were found listed. On the other hand, potential protein conserved domain were found through RPSBLAST against Pfam  and NCBI COG . To help the system biological analysis in the future, available KEGG pathways  were also annotated.
For scaffolding purposes, we utilized an improved version (to be described elsewhere) of the gene order based scaffolding method developed by Aganezov et al. . Since this method relies on gene orders of multiple genomes, we preprocessed genomic sequences of Fugu , Stickleback , Medaka , Tetraodon , Coelacanth  and Zebra fish  in addition to Anguilla japonica to represent them as sequences of homologous gene (decided by respective scaffolds boundaries). Scaffolds with no homologous genes were filtered out from genomes during the preprocessing. In contrast to the original method described in Aganezov et al. , the improved method accounts for gene insertions/deletions and duplications and thus no filtration for unique gene content was needed. We utilized the phylogenetic tree in Fig. 11 a. While provided with 7190 scaffolds with homologous genes on them, scaffolding method identified 525 links and assembled scaffolds respectively. These scaffolds were then mapped into male and female linkage maps provided by the Kai et al. study . Order of the combined scaffolds was determined by an application of topological sort. With the scaffolds grouped into male and female linkage groups overlapped with each other, order of the scaffolds on the chromosomes can be sorted with topological sort algorithm. Phylogenetic tree of the fishes and the process of topological sort are illustrated in Fig. 11.
Conclusions and prospective works
We provide a reconstructed transcriptome from whole body samples of juvenile (glass eels). The high throughput sequencing provides unprecedented amount of transcriptomic information. For future experimental design and guidance on ecological, physiological, artificially breeding and even toxicity resistance study of Japanese eel, the transcriptome provide guidance. For example, expression of specific genes shows extreme patterns in glass eels and can be further compared with larvae as well as silver eels through QPCR to provide further reevaluation of the metamorphosis and pigmentation mechanism.
Link to a gtf file of the annotation, fasta files of the protein and transcripts and genetic linkage map would be available in Additional file 1.
Minegishi Y, Henkel CV, Dirks RP, van den Thillart GEEJM: Genomics in Eels - Towards Aquaculture and Biology. Marine Biotechnol. 2012, 14 (5): 583-590. 10.1007/s10126-012-9444-5.
Tsukamoto K, Aoyama J, Miller MJ: Migration, speciation, and the evolution of diadromy in anguillid eels. Can J Fish Aquat Sci. 2002, 59 (12): 1989-1998. 10.1139/f02-165.
Henkel CV, Burgerhout E, de Wijze DL, Dirks RP, Minegishi Y, Jansen HJ, et al: Primitive Duplicate Hox Clusters in the European Eel's Genome. PloS One. 2012, 7 (2): e32231-10.1371/journal.pone.0032231.
Nomura K, Ozaki A, Morishima K, Yoshikawa Y, Tanaka H, Unuma T, et al: A genetic linkage map of the Japanese eel (Anguilla japonica) based on AFLP and microsatellite markers. Aquaculture. 2011, 310 (3–4): 329-342. 10.1016/j.aquaculture.2010.11.006.
Coppe A, Pujolar JM, Maes GE, Larsen PF, Hansen MM, Bernatchez L, et al: Sequencing, de novo annotation and analysis of the first Anguilla anguilla transcriptome: EeelBase opens new perspectives for the study of the critically endangered european eel. BMC Genomics. 2010, 11: 635-10.1186/1471-2164-11-635.
Pujolar JM, Marino IA, Milan M, Coppe A, Maes GE, Capoccioni F, et al: Surviving in a toxic world: transcriptomics and gene expression profiling in response to environmental pollution in the critically endangered European eel. BMC Genomics. 2012, 13: 507-10.1186/1471-2164-13-507.
Henkel CV, Dirks RP, de Wijze DL, Minegishi Y, Aoyama J, Jansen HJ, et al: First draft genome sequence of the Japanese eel, Anguilla japonica. Gene. 2012, 511 (2): 195-201. 10.1016/j.gene.2012.09.064.
Kai W, Nomura K, Fujiwara A, Nakamura Y, Yasuike M, Ojima N, et al: A ddRAD-based genetic map and its integration with the genome assembly of Japanese eel (Anguilla japonica) provides insights into genome evolution after the teleost-specific genome duplication. BMC Genomics. 2014, 15 (1): 233-10.1186/1471-2164-15-233.
Cantarel BL, Korf I, Robb SM, Parra G, Ross E, Moore B, et al: MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res. 2008, 18 (1): 188-196. 10.1101/gr.6743907.
Aganezov S, Sitdykova N, Alekseyev MA, Consortium A: Scaffold assembly based on genome rearrangement analysis. Comput Biol Chem. 2015, 57: 46-53. 10.1016/j.compbiolchem.2015.02.005.
Tamura K, Stecher G, Peterson D, Filipski A, Kumar S: MEGA6: molecular evolutionary genetics analysis version 6.0. Mol Biol Evol. 2013, 30 (12): 2725-2729. 10.1093/molbev/mst197.
Kanehisa M, Goto S: KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000, 28 (1): 27-30. 10.1093/nar/28.1.27.
Martin JA, Wang Z: Next-generation transcriptome assembly. Nat Rev Genet. 2011, 12 (10): 671-682. 10.1038/nrg3068.
Sudo R, Suetake H, Suzuki Y, Aoyama J, Tsukamoto K: Profiles of mRNA expression for prolactin, growth hormone, and somatolactin in Japanese eels, Anguilla japonica: The effect of salinity, silvering and seasonal change. Comp Biochem Physiol A Mol Integr Physiol. 2013, 164 (1): 10-16. 10.1016/j.cbpa.2012.09.019.
Albert V, Jonsson B, Bernatchez L: Natural hybrids in Atlantic eels (Anguilla anguilla, A. rostrata): evidence for successful reproduction and fluctuating abundance in space and time. Mol Ecol. 2006, 15 (7): 1903-1916. 10.1111/j.1365-294X.2006.02917.x.
Okamura A, Zhang H, Utoh T, Akazawa A, Yamada Y, Horie N, et al: Artificial hybrid between Anguilla anguilla and A. japonica. J Fish Biol. 2004, 64 (5): 1450-1454. 10.1111/j.0022-1112.2004.00409.x.
Tse WK, Sun J, Zhang H, Law AY, Yeung BH, Chow SC, et al: Transcriptomic and iTRAQ proteomic approaches reveal novel short-term hyperosmotic stress responsive proteins in the gill of the Japanese eel (Anguilla japonica). J Proteomics. 2013, 89: 81-94. 10.1016/j.jprot.2013.05.026.
Kazeto Y, Tosaka R, Matsubara H, Ijiri S, Adachi S: Ovarian steroidogenesis and the role of sex steroid hormones on ovarian growth and maturation of the Japanese eel. J Steroid Biochem Mol Biol. 2011, 127 (3–5): 149-154. 10.1016/j.jsbmb.2011.03.013.
Tanaka H, Kagawa H, Ohta H, Unuma T, Nomura K: The first production of glass eel in captivity: fish reproductive physiology facilitates great progress in aquaculture. Fish Physiol Biochem. 2003, 28 (1–4): 493-497. 10.1023/B:FISH.0000030638.56031.ed.
Yandell M, Ence D: A beginner's guide to eukaryotic genome annotation. Nat Rev Genet. 2012, 13 (5): 329-342. 10.1038/nrg3174.
Price AL, Jones NC, Pevzner PA: De novo identification of repeat families in large genomes. Bioinformatics. 2005, 21 (Suppl 1): i351-358. 10.1093/bioinformatics/bti1018.
Kapitonov VV, Jurka J: A universal classification of eukaryotic transposable elements implemented in Repbase. Nat Rev Genet. 2008, 9 (5): 411-412.
Voorrips R: MapChart: software for the graphical presentation of linkage maps and QTLs. J Hered. 2002, 93 (1): 77-78. 10.1093/jhered/93.1.77.
Laudet V: The origins and evolution of vertebrate metamorphosis. Curr Biol. 2011, 21 (18): R726-737. 10.1016/j.cub.2011.07.030.
Power D, Llewellyn L, Faustino M, Nowell MA, Björnsson BT, Einarsdottir I, et al: Thyroid hormones in growth and development of fish. Comp Biochem Physiol C Toxicol Pharmacol. 2001, 130 (4): 447-459. 10.1016/S1532-0456(01)00271-X.
Sturm RA: Molecular genetics of human pigmentation diversity. Hum Mol Genet. 2009, 18 (R1): R9-R17. 10.1093/hmg/ddp003.
Schulz MH, Zerbino DR, Vingron M, Birney E: Oases: robust de novo RNA-seq assembly across the dynamic range of expression levels. Bioinformatics. 2012, 28 (8): 1086-1092. 10.1093/bioinformatics/bts094.
Luo R, Liu B, Xie Y, Li Z, Huang W, Yuan J, et al: SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. GigaScience. 2012, 1 (1): 18-10.1186/2047-217X-1-18.
Fu L, Niu B, Zhu Z, Wu S, Li W: CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics. 2012, 28 (23): 3150-3152. 10.1093/bioinformatics/bts565.
Huang X, Madan A: CAP3: A DNA sequence assembly program. Genome Res. 1999, 9 (9): 868-877. 10.1101/gr.9.9.868.
Li B, Dewey CN: RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics. 2011, 12: 323-10.1186/1471-2105-12-323.
Langmead B, Trapnell C, Pop M, Salzberg SL: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009, 10 (3): R25-10.1186/gb-2009-10-3-r25.
Korf I. Gene finding in novel genomes. BMC Bioinformatics. 2004;5(1):59.
Slater GS, Birney E: Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics. 2005, 6: 31-10.1186/1471-2105-6-31.
Bairoch A, Apweiler R: The SWISS-PROT protein sequence data bank and its supplement TrEMBL in 1999. Nucleic Acids Res. 1999, 27 (1): 49-54. 10.1093/nar/27.1.49.
Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, et al: Gene Ontology: tool for the unification of biology. Nat Genet. 2000, 25 (1): 25-29. 10.1038/75556.
Punta M, Coggill PC, Eberhardt RY, Mistry J, Tate J, Boursnell C, et al: The Pfam protein families database. Nucleic Acids Res. 2012, 40 (Database issue): D290-301. 10.1093/nar/gkr1065.
Kanehisa M, Goto S, Sato Y, Furumichi M, Tanabe M: KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res. 2012, 40 (Database issue): D109-114. 10.1093/nar/gkr988.
Aparicio S, Chapman J, Stupka E, Putnam N, Chia J-M, Dehal P, et al: Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes. Science. 2002, 297 (5585): 1301-1310. 10.1126/science.1072104.
Jaillon O, Aury J-M, Brunet F, Petit J-L, Stange-Thomann N, Mauceli E, et al: Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotype. Nature. 2004, 431 (7011): 946-957. 10.1038/nature03025.
Kasahara M, Naruse K, Sasaki S, Nakatani Y, Qu W, Ahsan B, et al: The medaka draft genome and insights into vertebrate genome evolution. Nature. 2007, 447 (7145): 714-719. 10.1038/nature05846.
Amemiya CT, Alföldi J, Lee AP, Fan S, Philippe H, MacCallum I, et al: The African coelacanth genome provides insights into tetrapod evolution. Nature. 2013, 496 (7445): 311-316. 10.1038/nature12027.
Postlethwait JH, Yan Y-L, Gates MA, Horne S, Amores A, Brownlie A, et al: Vertebrate genome evolution and the zebrafish gene map. Nat Genet. 1998, 18 (4): 345-349. 10.1038/ng0498-345.
The authors would like to thank the Ministry of Science and Technology, for financially supporting this research. We also thank the UST-UCSD International Center of Excellence in Advanced Bioengineering and Veterans General Hospitals and University System of Taiwan. This work was funded by Ministry of Science and Technology, [MOST 101-2311-B-009-005-MY3, MOST 103-2628-B-009-001-MY3, MOST104-2319-B-400-002, MOST 103-2319-B-010-002 and MOST103-3111-Y-001-027], UST-UCSD International Center of Excellence in Advanced Bioengineering. sponsored by the Ministry of Science and Technology I-RiCE Program [MOST 103-2911-I-009-101], Veterans General Hospitals and University System of Taiwan (VGHUST) Joint Research Program [VGHUST103-G5-11-2], MOE ATU and the funding for open access charge: Ministry of Science and Technology of the Republic of China, Taiwan [103-2628-B-009-001-MY3] and [MOHW104-TDU-B-212-124-005]. The work of S.A. and M.A.A. was supported by the National Science Foundation under the grant No. IIS-1462107.
The authors declare no financial competing interests.
Y.C.L. carried out the draft genome based annotation, constructed the linkage maps, performed the phylogenetic analysis and drafted the manuscript. S.D.H., C.H.C., W.Y.H., Y.H.C., C.Y.L., G.J.L., and S.Z.H. participated in the quality control of the sequence reads, construct De Novo assembly of the transcriptome and participated in the sequence alignment based functional annotation. S.A.A. and M.A.A. conducted the comparative genomic analysis and helped the revision of the manuscript. C.D.H. prepared the samples, provided the transcriptome sequence data. C.D.H. and H.D.H. supervised the progress of the study, provided guidance in its design and coordination and helped to draft and revise the manuscript. All authors read and approved the final manuscript.
Electronic supplementary material
Additional file 1: Contains a link to the gtf file of the annotation, fasta files of the protein and transcripts and genetic linkage map. Additional file 1 also includes Figure S1 to Figure S8, Table S1 to Table S4. (DOCX 429 kb) (DOCX 430 KB)
Authors’ original submitted files for images
Below are the links to the authors’ original submitted files for images.