Deep sequencing for de novo construction of a marine fish (Sparus aurata)transcriptome database with a large coverage of protein-coding transcripts
© Calduch-Giner et al.; licensee BioMed Central Ltd. 2013
Received: 29 November 2012
Accepted: 8 March 2013
Published: 15 March 2013
The gilthead sea bream (Sparus aurata) is the main fish species cultured in the Mediterranean area and constitutes an interesting model of research. Nevertheless, transcriptomic and genomic data are still scarce for this highly valuable species. A transcriptome database was constructed by de novo assembly of gilthead sea bream sequences derived from public repositories of mRNA and collections of expressed sequence tags together with new high-quality reads from five cDNA 454 normalized libraries of skeletal muscle (1), intestine (1), head kidney (2) and blood (1).
Sequencing of the new 454 normalized libraries produced 2,945,914 high-quality reads and the de novo global assembly yielded 125,263 unique sequences with an average length of 727 nt. Blast analysis directed to protein and nucleotide databases annotated 63,880 sequences encoding for 21,384 gene descriptions, that were curated for redundancies and frameshifting at the homopolymer regions of open reading frames, and hosted at http://www.nutrigroup-iats.org/seabreamdb. Among the annotated gene descriptions, 16,177 were mapped in the Ingenuity Pathway Analysis (IPA) database, and 10,899 were eligible for functional analysis with a representation in 341 out of 372 IPA canonical pathways. The high representation of randomly selected stickleback transcripts by Blast search in the nucleotide gilthead sea bream database evidenced its high coverage of protein-coding transcripts.
The newly assembled gilthead sea bream transcriptome represents a progress in genomic resources for this species, as it probably contains more than 75% of actively transcribed genes, constituting a valuable tool to assist studies on functional genomics and future genome projects.
KeywordsSparus aurata Next-generation sequencing De novo assembly Transcriptome Database
The gilthead sea bream (Sparus aurata) is a member of the Sparidae family widely and successfully cultured in the Mediterranean region. During the last decade more than 1,200 scientific papers have focused on gilthead sea bream nutrition, immune response, physiology and genetics. This high valuable fish for aquaculture industry becomes, thereby, an interesting animal model, and the development of molecular and genomic tools for research is highly desirable. Previous attempts have been made in this way and the assembly and annotation of more than 40,000 expressed sequence tags (ESTs) allowed the development of specific gilthead sea bream microarrays that have been used in transcriptomic studies of crowding stress  and parasite and nutritional challenges [2, 3]. Microarray approaches have also been done in this species to assess transcriptome differences between two early larval stages , to underline the liver transcriptomic response after cortisol administration [5, 6] or to identify the key genes involved in the re-epithelialization process after scale removal . Nevertheless, genomic tools still remain relatively scarce in gilthead sea bream and need to be further improved in comparison to other cultured fish, such as Atlantic salmon (Salmo salar) , common carp (Cyprinus carpio) , Nile tilapia (Oreochromis niloticus)  or turbot (Scophthalmus maximus) , for which large ESTs collections generated by Sanger sequencing of cDNA libraries are currently available.
With the advent of the next generation sequencing technologies, the gathering of large amounts of sequence data for a given organism at affordable costs is more feasible , and high-throughput 454 pyrosequencing has been used to explore the transcriptome of rainbow trout (Oncorhynchus mykiss) [13, 14], Atlantic cod (Gadus morhua)  and turbot . In the case of gilthead sea bream, two deep sequencing studies have been reported from whole larval tissues  and skeletal muscle , yielding 68,289 and 43,461 assembled sequences, respectively. Nevertheless, the assembly of high-throughput sequences from one unique tissue usually results in relatively short sequences. As the annotation success of a sequencing project is highly dependent on the size of the assembled sequences, approaches conducted to obtain longer sequences become desirable. At this respect, it has been proved in some animal models, including rainbow trout [13, 14], that the use of two or more tissues for sequencing and assembly increases the number of annotated genes. On this basis, the primary goal of the present study was to generate a large amount of gilthead sea bream transcriptomic reads from metabolically and immunologically relevant tissues by means of the construction of five 454 pyrosequencing libraries, combining them with Sanger sequences from public repositories and our own published data. It comprised nine previously constructed suppression subtractive hybridization (SSH) libraries that were obtained from a variety of tissues (liver, gills, brain, intestine, head kidney, adipose tissue) from animals exposed to confinement stress, parasite infection or a nutritional stress (essential fatty acid deficiencies) [1–3]. Having this into account, tissues selected for the improvement of the current knowledge on gilthead sea bream transcriptome by means of high-throughput sequencing were related to animal growth (skeletal muscle), nutrient digestion (intestine) and immune response (head kidney at two stages of parasitic infection). Because of the importance of the development of non-lethal diagnostic methods, blood was also considered for 454 sequencing. The second goal of this work was to build a reliable assembly database, with a high confidence of functional annotation by means of similarity searches, gene ontology (GO) terms and detailed insights on pathway analysis for its use as a practical tool by the scientific community. This will assist gene discovery and studies on functional genomics, as well as future genome projects of this important fish species.
Results and discussion
Statistics for 454 pyrosequencing libraries
Average read length (bp)
Number of contigs
Average contig length (bp)
Number of singletons
Average singleton length (nt)
Total consensus Megabases
Average sequences coverage
De novo assembly statistics
Number of contigs
Average contig length (bp)
Number of singletons
Average singleton length (nt)
Total consensus Megabases
Average sequences coverage
With independence of a high coverage of protein-coding transcripts, more than 60,000 assemblies still remain without annotation. This relatively high number of unknown sequences could correspond to more divergently evolved genes through vertebrate evolution, though we cannot exclude that some of these sequences may result from assembly artifacts. Alternatively, some of these sequences might correspond to long non-coding RNAs (lncRNAs), which are now emerging as an important class of regulatory transcripts with an extent that increases much more with organism complexity [24, 25]. Otherwise, high-throughput sequencing is prone to sequencing errors at homopolymer regions, even when assembled at high coverage, which often give rise to a range of artificial sequences [26–28]. To overcome this issue, an in silico correction step was introduced in the pipeline procedure, which allowed to obtain continuous open reading frames for annotated sequences avoiding frameshifting by edition (insertion or deletion) of single nucleotides at homopolymer regions. With this newly developed tool, up to 34% of annotated sequences (21,748 out of 63,880 assembled sequences) were detected to carry one or more frameshifts. Among them, 21,105 were satisfactorily corrected with the pipeline and only 643 needed a manual curation.
Blast2GO analysis of the different annotations (Figure 4) reveals that the most abundant GO terms related to biological processes were transport (GO:0006810; 2,091 different genes), protein modification process (GO:0036211; 1,558 genes), response to stress (GO:0006950; 1,468 genes), regulation of biological quality (GO:0065008; 1,430 genes) and negative regulation of cellular process (GO:0048523; 1,428 genes). Other highly represented terms are immune system process (GO:0002376; 865 different genes), haemopoiesis (GO:0030097; 289 genes) and coagulation (GO:0050817; 290 genes), which is not surprising given that 3 out of five 454 libraries were derived from blood and head kidney. Regarding skeletal muscle, tissue-specific biological processes like muscle structure development (GO:0061061; 302 genes), muscle cell differentiation (GO:0042692; 200 genes) or muscle contraction (GO:0006936; 156 genes) were also highly represented in the annotated gilthead sea bream transcriptome. Likewise, the intestine participates in nutrient digestion and absorption with also an important role in xenobiotic metabolism , and accordingly genes related to the molecular functions hydrolase activity (GO:0016787; 1,821 genes) and oxidoreductase activity (GO:0016491; 637 genes) were highly abundant in our database.
Most significant canonical pathways determined by Ingenuity Pathway Analysis and represented in the assembled gilthead sea bream transcriptome
Regulation of eIF4 and p70S6K Signaling
Protein Ubiquitination Pathway
Induction of Apoptosis by HIV1
Glucocorticoid Receptor Signaling
Molecular Mechanisms of Cancer
Role of NFAT in Cardiac Hypertrophy
FLT3 Signaling in Hematopoietics
Inositol Phosphate Metabolism
Fcγ Receptor-mediated Phagocytosis in
Signaling by Rho Family GTPases
Macrophages and Monocytes
Estrogen Receptor Signaling
Acute Phase Response Signaling
Small Cell Lung Cancer Signaling
Breast Cancer Regulation by Stathmin1
T Cell Receptor Signaling
Type II Diabetes Mellitus Signaling
B Cell Receptor Signaling
Huntington’s Disease Signaling
Production of Nitric Oxide and Reactive
Oxygen Species in Macrophages
LPS-stimulated MAPK Signaling
Insulin Receptor Signaling
Cyclins and Cell Cycle Regulation
PI3K Signaling in B Lymphocytes
Tight Junction Signaling
Chronic Myeloid Leukemia Signaling
NRF2-mediated Oxidative Stress Response
Prostate Cancer Signaling
Actin Cytoskeleton Signaling
Nicotinate and Nicotinamide Metabolism
Role of Tissue Factor in Cancer
Hereditary Breast Cancer Signaling
Pentose Phosphate Pathway
Pancreatic Adenocarcinoma Signaling
NF-κB Activation by Viruses
Mitotic Roles of Polo-Like Kinase
Acute Myeloid Leukemia Signaling
Cardiac Hypertrophy Signaling
Endometrial Cancer Signaling
Germ Cell-Sertoli Cell Junction Signaling
Melanocyte Development and Pigmentation
Protein Kinase A Signaling
Aldosterone Signaling in Epithelial Cells
Colorectal Cancer Metastasis Signaling
RANK Signaling in Osteoclasts
IL-17A Signaling in Airway Cells
G Beta Gamma Signaling
fMLP Signaling in Neutrophils
Phospholipase C Signaling
Role of BRCA1 in DNA Damage Response
P2Y Purigenic Receptor Signaling Pathway
Death Receptor Signaling
Leukocyte Extravasation Signaling
Starch and Sucrose Metabolism
Virus Entry via Endocytic Pathways
Non-Small Cell Lung Cancer Signaling
Growth Hormone Signaling
Role of NFAT in Regulation of the Immune
Lymphotoxin β Receptor Signaling
In order to construct a reliable database of protein-coding transcripts with a minimum of frameshift errors or redundancies, a strict filtering was applied to the annotated sequences after the in silico homopolymer correction step. Sequences with a significant hit against Swissprot were selected, and then sequences encoding for a same annotation regardless of the species were clustered. Filtering selected for each annotation non-overlaping sequences and putative complete open reading frames, that made a total of 17,809 sequences that were uploaded in our nucleotide database (http://www.nutrigroup-iats.org/seabreamdb). This online resource is intended to become a dynamic and useful tool for scientific community. With this interface, data can be queried using different strategies, such as several Blast options or direct word search for annotation or GO terms. Search results provide additional information for each sequence, like its length, depth or the related Swissprot accession for the assigned annotation, among others.
A gilthead sea bream transcriptome database has been constructed by de novo assembly of five normalized 454 libraries from metabolically and immunologically relevant tissues combined with public sequences that included nine SSH libraries. This approach yielded 125,263 different sequences, and for 63,880 a reliable annotation was found, resulting in 21,384 different gene descriptions that comprised a vast array of functional categories and biological pathways. This constitutes the largest and most complete transcriptome reported to date for gilthead sea bream, having a size and depth equivalent to those reported in the Ensembl genome database for Atlantic cod and other cultured fish species. The information of annotated contigs has been semi-automatically corrected and filtered for redundancies, and is stored in a web database (http://www.nutrigroup-iats.org/seabreamdb) that has been provided with Blast and other search options for the scientific community. This represents a valuable tool to assist fish phenotyping and the concomitant development of molecular biomarker panels (microarrays or process-focused PCR-arrays) of prognostic and diagnostic value to cope with developmental-, nutritional-, environmental- and disease-related stressors.
Experimental setup and tissue sampling
Juveniles of gilthead sea bream were maintained under intensive rearing conditions in the indoor experimental facilities of the Institute of Aquaculture Torre de la Sal (IATS), following standard conditions of photoperiod and temperature at our latitude (40º5′N, 0º10′E). Animals were fed with conventional diets and culture densities remained lower than 15 Kg/m3. Naïve stock fish were sampled after overnight fasting for blood, skeletal muscle and intestine. Blood was taken from the caudal vein with EDTA-treated syringes and 150 μl were disposed in cooled eppendorf tubes with 500 μl of lysis solution until RNA extraction (Real Total RNA Spin Blood Kit, Durviz). Prior to tissue collection, fish were killed by cervical section. Skeletal muscle and intestine samples were then rapidly excised, frozen in liquid nitrogen and stored at −80°C until RNA extraction. Additionally, another stock of fish was infected with the myxosporean parasite Enteromyxum leei by exposure to contaminated effluent, following the infection procedure previously described , and samples of head kidney, the equivalent to mammalian bone marrow, were taken after 38 (HK-38, early immune response) and 105 days (HK-105, chronic immune response) post exposure. All procedures were carried out according to the national and institutional regulations on animal experimental handling (IATS-CSIC Review Board).
Total RNA from each individual tissue sample was isolated by means of the Ambion MagMax-96 for Microarray kit (Applied Biosystems) after tissue homogenization in TRI reagent at a concentration of 100 mg/ml following the manufacturers’ instructions. Purification of total RNA from fish blood was performed according to the procedure of Real Total RNA Spin Blood Kit (Durviz). RNA quantity and purity was determined by Nanodrop (Thermo Scientific) and Agilent 2100 bioanalyzer (Agilent Technologies). Tissue samples of higher quality (absorbance ratios at 260 nm/280 nm above 1.9 and RNA integrity numbers between 9.2 and 10) were selected for high-throughput sequencing.
cDNA synthesis and normalization
For each tissue and condition (skeletal muscle, intestine, blood, HK-38, HK-105), a single RNA sample (700 ng) was taken for polyA cDNA synthesis using the MINT kit (Evrogen). To increase the rate recovery of rare and unique transcripts, amplified cDNAs were normalized by duplex-specific nuclease with the Trimmer kit (Evrogen)  following manufacturer’s indications. Normalized cDNAs samples were measured with Quant-iT PicoGreen dsDNA quantification Kit (Life Technologies) using a VersaFluorTM Fluorometer system (Bio-Rad).
Libraries construction, pyrosequencing and assembly
For each normalized cDNA, 500 ng were sheared into small fragments (250–600 nt) by nebulization with compressed nitrogen. Then, sequencing adapters were ligated to the blunt ends of fragments, and an emulsion PCR (emPCR) was performed. After emPCR, beads with the cloned amplicons were enriched, loaded onto the 454 microtiter plate and sequenced with a Titanium GS FLX 454 platform (Roche). The sff files containing all reads for each library have been deposited to NCBI Short Read Archive under accession ERP001436.
The quality of the reads was assessed with PERL scripts developed at Lifesequencing S.L. (Valencia, Spain) for trimming of adaptors and validation of high quality sequences. Only high-quality reads (q-value > 25) passed the filter for further assembly and they were assembled using Newbler version 2.5.3 program (454 Life Science-Roche) with the parameters by default with the cDNA option.
De novoassembly and annotation
According to the presented scheme pipeline (Figure 1), the set of nucleotide sequences for de novo gilthead sea bream transcriptome assembly was composed of i) unassembled high-quality reads from five 454 libraries (2,945.914 reads), ii) available mRNA sequences from GenBank database (1,733 sequences) and iii) EST collections made available by the Consortium of Marine Genomics Europe  and the AQUAFIRST  and AQUAMAX  EU projects (80,956 ESTs). All sequences were edited to remove vector and adaptor sequences, and cleaned and filtered before assembly and annotation by the SIGENAE information system (INRA Toulouse, France). Cleaning involved masking of poor quality bases and low complexity sequences such as polyA tails. Filtering removed contaminating sequences (bacteria, yeast) and only high quality sequences with more than 100 bases in length were retained. The global assembly was performed by means of the MIRA software version 3.2.0 [35, 36].
Assembled contigs and singletons were annotated searching sequence homologies against following databases: UniProtKB/Swiss-Prot, UniProtKB/TrEMBL, RefSeq Protein, Pfam, RefSeq RNA Index Blast, TIGR Fugu, TIGR Medaka, TIGR Salmon, TIGR Trout, TIGR ZebraFish, UniGene FatheadMinnow, UniGene Fugu, UniGene Human, UniGene KilliFish, UniGene Medaka, UniGene Salmon, UniGene Trout, UniGene ZebraFish, Ensembl Fugu transcripts, Ensembl Human transcripts, Ensembl Medaka transcripts, Ensembl Tetraodon transcripts and Ensembl ZebraFish transcripts. The e-value threshold value to determine similarities was set to 1e-5, and the Uniprot entry to which they received the highest similarity was usually assigned as the gene identity.
Database quality control
Transcriptome sequences of the fish three-spined stickleback (Gasterosteus aculeatus) were retrieved from the Ensembl genome database (http://www.ensembl.org/Gasterosteus_aculeatus). Then, randomly selected stickleback sequences (n=200) were compared in a similarity search by BlastX (E-value < 1e-9) into the newly developed gilthead sea bream assembled transcriptome database. For annotated stickleback transcripts, hit results were only considered positive when the most similar gilthead sea bream sequence shared the same annotation.
Automated frameshift correction and redundancy filtering
To correct the 454 sequencing errors due to frameshifting at homopolymer regions , a pipeline based on the combination of the NCBI-Blast package  with HMMER [http://hmmer.janelia.org], ClustalW , HMM-FRAME , PFAM  and GyDB  databases of HMMs was designed. Additional file 3 implements a most extensive detail of this pipeline and the corrections (punctual insertions/deletions) performed on each sequence.
For more extensive information about the statistics composing the algorithm please refer the Blast user guide at NCBI [http://www.ncbi.nlm.nih.gov/books/NBK1763/]. The software reads in csv format the annotation file of the whole transcriptome and then lets to state one or more classificatory filters to run the algorithm. When various sequences shared the same SwissProt description and a nucleotide identity higher than 95%, they were grouped in the same cluster, retaining within the same categorization the sequences covering non-overlapping regions of the mapped Refseq protein or complete sequences sharing the same description.
Gene ontology annotation was made from the nucleotide sequences of the most representative contig/singleton for each gene identity by means of the Blast2GO software  with a threshold cutoff set at 1e-3. Pathway analysis of annotated sequences was performed using the IPA software (http://www.ingenuity.com). The dynamic canonical pathways contained in IPA are well-characterized metabolic and cell-signaling pathways that come from specific journal articles, review articles, textbooks and the Kyoto encyclopedia of genes and genomes (KEGG). The IPA canonical pathways display genes/proteins involved, their interactions and the cellular and metabolic reactions in which the pathway is involved. To provide analysis, IPA must be supplied with the Uniprot accession of genes belonging to one of the following model species: Human (Homo sapiens), house mouse (Mus musculus), rat (Rattus norvegicus), fruit fly (Drosophila melanogaster), thale cress (Arabidopsis thaliana), the nematode Caenorhabditis elegans, the bacteria Escherichia coli or the yeast Saccharomyces cerevisiae. Hence, for each annotated sea bream sequence the protein equivalence for one of the three higher vertebrates IPA model species was searched in Uniprot, and the corresponding accession number was included in the analysis.
Fisher’s exact test was used in IPA analysis to estimate the significance of the incidence of different canonical pathways. This method calculates the probability that the association between experimental gene set and the reference gene set associated with a canonical pathway is due to random chance. A P-value ≤ 0.05 was considered statistically significant and indicated a nonrandom enrichment of an experimental dataset by members of a specific pathway.
This work was funded under EU seventh Framework Programme by projects ARRAINA (Advanced Research Initiatives for Nutrition & Aquaculture, FP7/2007-2013; grant agreement nº 288925) and AQUAEXCEL (Aquaculture Infrastructures for Excellence in European Fish Research, FP7/2007-2012; grant agreement nº 262336). The views expressed in this work are the sole responsibility of the authors and do not necessarily reflect the views of the European Commission. Additional funding was obtained by Generalitat Valenciana (research grant PROMETEO 2010/006) and Spanish Government through AQUAGENOMICS (Ingenio-2010 Programme), AQUAFAT (AGL2009-07797) and ENTEROMYXCONTROL (AGL2009-13282-C02-01) projects.
- Calduch-Giner JA, Davey G, Saera-Vila A, Houeix B, Talbot A, Prunet P, Cairns MT, Pérez-Sánchez J: Use of microarray technology to assess the time course of liver stress response after confinement exposure in gilthead sea bream (Sparus aurata L.). BMC Genomics. 2010, 11: 193-10.1186/1471-2164-11-193.PubMed CentralView ArticlePubMed
- Davey GC, Calduch-Giner JA, Houeix B, Talbot A, Sitjà-Bobadilla A, Prunet P, Pérez-Sánchez J, Cairns MT: Molecular profiling of the gilthead sea bream (Sparus aurata L.) response to chronic exposure to the myxosporean parasite Enteromyxum leei. Mol Immunol. 2011, 48: 2102-2112. 10.1016/j.molimm.2011.07.003.View ArticlePubMed
- Calduch-Giner JA, Sitjà-Bobadilla A, Davey GC, Cairns MT, Kaushik S, Pérez-Sánchez J: Dietary vegetable oils do not alter the intestine transcriptome of gilthead sea bream (Sparus aurata), but modulate the transcriptomic response to infection with Enteromyxum leei. BMC Genomics. 2012, 13: 470-10.1186/1471-2164-13-470.PubMed CentralView ArticlePubMed
- Ferraresso S, Vitulo N, Mininni AN, Romualdi C, Cardazzo B, Negrisolo E, Reinhardt R, Canario AVM, Patarnello T, Bargelloni L: Development and validation of a gene expression oligo microarray for the gilthead sea bream (Sparus aurata). BMC Genomics. 2008, 9: 580-10.1186/1471-2164-9-580.PubMed CentralView ArticlePubMed
- Sarropoulou E, Kotoulas G, Power DM, Geisler R: Gene expression profiling of gilthead sea bream during early development and detection of stress-related genes by the application of cDNA microarray technology. Physiol Genomics. 2005, 23: 182-191. 10.1152/physiolgenomics.00139.2005.View ArticlePubMed
- Teles M, Boltaña S, Reyes-López F, Santos MA, Mackenzie S, Tort L: Effects of chronic cortisol administration of GR and the liver transcriptome in Sparus aurata. Mar Biotechnol. 2013, 15: 104-114. 10.1007/s10126-012-9467-y.View ArticlePubMed
- Vieira FA, Gregório SF, Ferraresso S, Thorne MAS, Costa R, Milan M, Bargelloni L, Clark MS, Canario AVM, Power DM: Skin healing and scale regeneration in fed and unfed sea bream Sparus auratus. BMC Genomics. 2011, 12: 490-10.1186/1471-2164-12-490.PubMed CentralView ArticlePubMed
- Adzhubei AA, Vlasova AV, Hagen-Larsen H, Ruden TA, Laerdahl JK, Hoyheim B: Annotated expressed sequence tags (ESTs) from pre-smolt Atlantic salmon (Salmo salar) in a searchable data resource. BMC Genomics. 2007, 8: 209-10.1186/1471-2164-8-209.PubMed CentralView ArticlePubMed
- Gonzalez S, Chatziandreou N, Nielsen M, Li W, Rogers J, Taylor R, Santos Y, Cossins A: Cutaneous immune responses in the common carp detected using transcript analysis. Mol Immunol. 2007, 44: 1675-1690.View Article
- Lee BY, Howe AE, Conte MA, D’Cotta H, Pepey E, Baroiller JF, di Palma F, Carleton KL, Kocher TD: An EST resource for tilapia based on 17 normalized libraries and assembly of 116,899 sequence tags. BMC Genomics. 2010, 11: 278-10.1186/1471-2164-11-278.PubMed CentralView ArticlePubMed
- Pardo BG, Fernández C, Millán A, Bouza C, Vázquez-López A, Vera M, Alvarez-Dios JA, Calaza M, Gómez-Tato A, Vázquez M, Cabaleiros S, Magariños B, Lemos ML, Leiro JM, Martínez P: Expressed sequence tags (ESTs) from immune tissues of turbot (Scophthalmus maximus) challenged by pathogens. BMC Vet Res. 2008, 4: 37-10.1186/1746-6148-4-37.PubMed CentralView ArticlePubMed
- Ansorge WJ: Next-generation DNA sequencing techniques. New Biotechnology. 2009, 25: 195-203. 10.1016/j.nbt.2008.12.009.View ArticlePubMed
- Salem M, Rexroad CE, Wang J, Thorgaard GH, Yao J: Characterization of the rainbow trout transcriptome using Sanger and 454-pyrosequencing approaches. BMC Genomics. 2010, 11: 564-10.1186/1471-2164-11-564.PubMed CentralView ArticlePubMed
- Le Cam A, Bobe J, Bouchez O, Cabau C, Kah O, Klopp C, Lareyre JJ, Le Guen I, Lluch J, Montfort J, Moreews F, Nicol B, Prunet P, Rescan PY, Servili A, Guiguen Y: Characterization of rainbow trout gonad, brain and gill deep cDNA repertoires using a Roche 454-Titanium sequencing approach. Gene. 2012, 500: 32-39. 10.1016/j.gene.2012.03.053.View ArticlePubMed
- Johansen SD, Karlsen BO, Furmanek T, Andreassen M, Jɵ rgensen TE, Bizuayehu TT, Breines R, Emblem Å, Kettunen P, Luukko K, Edvardsen RB, Nordeide JT, Coucheron DH, Moum T: RNA deep sequencing of the Atlantic cod transcriptome. Comp Biochem Physiol part D. 2011, 6: 18-22.
- Pereiro P, Balseiro P, Romero A, Dios S, Forn-Cuni G, Fuste B, Planas JV, Beltran S, Novoa B, Figueras A: High-throughput sequence analysis of turbot (Scophthalmus maximus) transcriptome using 454-pyrosequencing for the discovery of antiviral immune genes. PLoS One. 2012, 7: e35369-10.1371/journal.pone.0035369.PubMed CentralView ArticlePubMed
- Yúfera M, Halm S, Beltran S, Fusté B, Planas JV, Martínez-Rodríguez G: Transcriptomic characterization of the larval stage in gilthead seabream (Sparus aurata) by 454 pyrosequencing. Mar Biotechnol. 2012, 14: 423-435. 10.1007/s10126-011-9422-3.View ArticlePubMed
- Garcia de la Serrana Castillo D, Estevez A, Andree K, Johnston IA: Fast skeletal muscle transcriptome of the gilthead sea bream (Sparus aurata) determined by next generation sequencing. BMC Genomics. 2012, 13: 181-10.1186/1471-2164-13-181.View Article
- Meyer E, Aglyamova GA, Wang S, Buchanan-Carter J, Abrego D, Colbourne JK, Willis BL, Matz MV: Sequencing and de novo analysis of a coral larval transcriptome using 454 GSFlx. BMC Genomics. 2009, 10: 219-10.1186/1471-2164-10-219.PubMed CentralView ArticlePubMed
- Gao Z, Luo W, Liu H, Zeng C, Liu X, Yi S, Wang W: Transcriptome analysis and SSR/SNP markers information of the blunt snout bream (Megalobrama amblycephala). PLoS One. 2012, 7: e42637-10.1371/journal.pone.0042637.PubMed CentralView ArticlePubMed
- Fu B, He S: Transcriptome analysis of silver carp (Hypophthalmichthys molitrix) by paired-end DNA sequencing. DNA Res. 2012, 19: 131-142. 10.1093/dnares/dsr046.PubMed CentralView ArticlePubMed
- Roest Crollius H, Weissenbach J: Fish genomics and biology. Genome Res. 2005, 15: 1675-1682. 10.1101/gr.3735805.View ArticlePubMed
- Sato Y, Hashiguchi Y, Nishida M: Temporal pattern of loss/persistence of duplicate genes involved in signal transduction and metabolic pathways after teleost-specific genome duplication. BMC Evol Biol. 2009, 9: 127-10.1186/1471-2148-9-127.PubMed CentralView ArticlePubMed
- Mattick JS: The central role of RNA in human development and cognition. FEBS Lett. 2011, 585: 1600-1616. 10.1016/j.febslet.2011.05.001.View ArticlePubMed
- Kapranov P, St Laurent G: Dark matter RNA: existence, function, and controversy. Front Genet. 2012, 3: 57-PubMed CentralPubMed
- Huse SM, Huber JA, Morrison HG, Sogin ML, Welch DM: Accuracy and quality of massively parallel DNA pyrosequencing. Genome Biol. 2007, 8: R143-10.1186/gb-2007-8-7-r143.PubMed CentralView ArticlePubMed
- Balzer S, Malde K, Lanzén A, Sharma A, Jonassen I: Characteristics of 454 pyrosequencing data – enabling realistic simulation with flowsim. Bioinformatics. 2010, 26: i420-i425. 10.1093/bioinformatics/btq365.PubMed CentralView ArticlePubMed
- Loman N, Misra RV, Dallman TJ, Constantinidou C, Gharbia SE, Wain J, Pallen MJ: Performance comparison of benchtop high-throughput sequencing platforms. Nat Biotechnol. 2012, 30: 434-439. 10.1038/nbt.2198.View ArticlePubMed
- Van Herwaarden AE, Van Waterschoot RAB, Schinkel AH: How important is intestinal cytochrome P450 3A metabolism?. Trends Pharmacol Sci. 2009, 30: 223-227. 10.1016/j.tips.2009.02.003.View ArticlePubMed
- Khatri P, Sirota M, Butte AJ: Ten years of pathway analysis: current approaches and outstanding challenges. PLoS Comput Biol. 2012, 8: e1002375-10.1371/journal.pcbi.1002375.PubMed CentralView ArticlePubMed
- Li S, Pozhitkov A, Ryan RA, Manning CS, Brown-Peterson N, Brouwer M: Constructing a fish metabolic network model. Genome Biol. 2010, 11: R115-10.1186/gb-2010-11-11-r115.PubMed CentralView ArticlePubMed
- Sitjà-Bobadilla A, Diamant A, Palenzuela O, Alvarez-Pellitero P: Effect of host factors and experimental conditions on the horizontal transmission of Enteromyxum leei (Myxozoa) to gilthead sea bream (Sparus aurata L.) and European sea bass (Dicentrarchus labrax L.). J Fish Dis. 2007, 30: 243-250. 10.1111/j.1365-2761.2007.00804.x.View ArticlePubMed
- Zhulidov PA, Bogdanova EA, Shcheglov AS, Vagner LL, Khaspekov GL, Kozhemyako VB, Matz MV, Meleshkevitch E, Moroz LL, Lukyanov SA, Shagin DA: Simple cDNA normalization using kamchatka crab duplex-specific nuclease. Nucleic Acids Res. 2004, 32: e37-10.1093/nar/gnh031.PubMed CentralView ArticlePubMed
- Louro B, Passos ALS, Souche EL, Tsigenopoulos C, Beck A, Lagnel J, Bonhomme F, Cancela L, Cerdà J, Clark M, Lubzens E, Magoulas A, Planas JV, Volckaert FAM, Reinhardt R, Canario AVM: Gilthead sea bream (Sparus auratus) and European sea bass (Dicentrarchus labrax) expressed sequence tags: Characterization, tissue-specific expression and gene markers. Marine Genomics. 2010, 3: 179-191. 10.1016/j.margen.2010.09.005.View ArticlePubMed
- Chevreux B, Wetter T, Suhai S: Genome sequence assembly using trace signals and additional sequence information computer science and biology. Proceedings of the German Conference on Bioinformatics. 1999, 45-56.
- Chevreux B, Pfisterer T, Drescher B, Driesel AJ, Müller WE, Wetter T, Suhai S: Using the miraEST Assembler for Reliable and Automated mRNA Transcript Assembly and SNP Detection in Sequenced ESTs. Genome Res. 2004, 14: 1147-1159. 10.1101/gr.1917404.PubMed CentralView ArticlePubMed
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25: 3389-3402. 10.1093/nar/25.17.3389.PubMed CentralView ArticlePubMed
- Thompson JD, Gibson TJ, Higgins DG: Multiple sequence alignment using ClustalW and ClustalX. Curr Protoc Bioinformatics. 2002, 23: 2.3.1-2.3.22.
- Zhang Y, Sun Y: HMM-FRAME: accurate protein domain classification for metagenomic sequences containing frameshift errors. BMC Bioinformatics. 2011, 12: 198-10.1186/1471-2105-12-198.PubMed CentralView ArticlePubMed
- Punta M, Coggill PC, Eberhardt RY, Mistry J, Tate J, Boursnell C, Pang N, Forslund K, Ceric G, Clements J, Heger A, Holm L, Sonnhammer ELL, Eddy SR, Bateman A, Finn RD: The Pfam protein families database. Nucleic Acids Res. 2012, 40: D290-D301. 10.1093/nar/gkr1065.PubMed CentralView ArticlePubMed
- Llorens C, Futami R, Covelli L, Dominguez-Escriba L, Viu JM, Tamarit D, Aguilar-Rodríguez J, Vicente-Ripolles M, Fuster G, Bernet GP, Maumus F, Munoz-Pomer A, Sempere JM, Latorre A, Moya A: The Gypsy Database (GyDB) of mobile genetic elements: release 2.0. Nucleic Acids Res. 2011, 39: D70-D74. 10.1093/nar/gkq1061.PubMed CentralView ArticlePubMed
- Futami R, Muñoz-Pomer L, Viu JM, Dominguez-Escriba L, Covelli L, Bernet GP, Sempere JM, Moya A, Llorens C: GPRO The professional tool for annotation, management and functional analysis of omic databases. Biotechvana Bioinformatics. 2011, SOFT3
- Conesa A, Götz S, García-Gómez JM, Perol J, Talón M, Robles M: Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005, 21: 3674-3676. 10.1093/bioinformatics/bti610.View ArticlePubMed
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.