The juvenile gilthead sea bream (Sparus aurata L.) used in the present study originated from a fish farm brood stock kept at the Institute de Recerca i Tecnologia Agroalimentàries (IRTA) at St Carles de la Ràpita (IRTA-SCR, Spain) and were reared from the larval to juvenile stages according to the standard production procedures of this research facility. After thirteen months, two hundred juvenile gilthead sea bream, weighing 88.1 ± 7.3 g (mean ± SD, n = 35), were selected and maintained in two 400 litre tanks (22.5 kg m−3) in a temperature-controlled seawater re-circulation system (IRTAmarTM) at a mean temperature of 21°C (20.7-21.4°C) and natural photoperiod (13 L:11D). Fish were fed a commercial diet (OptiBreamTM, Skretting; pellet size: 2.6 mm; proximate biochemical composition: 46% protein, 18% fat, 7% ash) at a ration level of 3% (m/m) d−1. An adult female of 2 kg body mass that had been held at ambient temperature (annual range: 10-26°C) and natural photoperiod for several years at IRTA-SCR facilities and fed 3% body mass d-1 was also sampled.
In order to obtain the widest possible range of expressed transcript sub-sets, fish were exposed to different water temperatures and fasting. Experiments were conducted in 400 litres cylindrical tanks connected to a re-circulation unit in order to maintain constant water temperature and dissolved oxygen over 85% saturation. Fish (n = 5) were transferred from 21°C to 11°C or 33°C over 48 h. During the treatments fish were fed as previously described, however those maintained at 11°C, stopped feeding after their transfer to low temperature. Additionally, another group of fish maintained at 21°C were fasted for 5 days.
Since transcripts concentration will change over time with treatment fish were sampled at day 3 (n = 2) and day 5 (n = 3) following attainment of the new environmental conditions in order to obtain a broader representation of expressed genes. Fish were sacrificed using an overdose of 1:5,000 (m/v) of bicarbonate-buffered tricaine methanesulphonate (MS222, Sigma, Madrid, Spain) in seawater followed by spinal cord transection. Pure samples of fast skeletal muscle were dissected from dorsal epaxial myotomes at ~ 0.5 fork length (FL) on a pre-chilled glass plate maintained at 0–4°C. Muscle samples were flash frozen in liquid nitrogen and stored at −80°C until further analysis. Fish handling and trials were conducted in September 2009 in accordance with EC Directive 86/609/EEC for animal experimentation.
RNA extraction and dsDNA synthesis
RNA was extracted using QIAzol (QIAGEN, Crawley - West Sussex, UK) following the manufacturer’s recommendations. The integrity of the RNA was confirmed by ethidium bromide gel electrophoresis. RNA concentration, 260/280 and 260/230 ratios were evaluated using a NanoDrop 1000 spectrophotometer (Thermo Fischer Scientific, Waltman, MA). All RNA samples extracted had a 260/280 ratio higher than 1.9 and 260/230 above 2.2. Samples from each experimental condition were pooled in equal concentrations and the RNA integrity, concentration and ratios evaluated again. The pooled RNA samples were used for the following steps.
The dsDNA synthesis was performed using a MINT cDNA synthesis kit (Evrogen, Moscow, Russia) using cDNA synthesis primer described by Meyer et al., 2009  with a broken poly-T to avoid 454 sequencing problems in mono-nucleotide regions (5′-AAGCAGTGGTATCAACGCAGAGTCGCAGTCGGTACTTTTTTCTTTTTTV-3′). For an accurate evaluation of the dsDNA concentration Quati-IT™ PicoGreen® (Invitrogen, Pailey, UK) was used. PicoGreen® fluorescence was detected by a MSPx3000 qPCR machine as previously described .
The transcriptome for each physiological condition was determined using Roche 454 GS FLX Titanium pyrosequencing using the service run by Genepool, University of Edinburgh, School of Biological Sciences. Each physiological condition was sequenced using a half 454-plate generating around 390,000–490,000 reads with an average length of 400 bp. Because of a technical problem an initial run of the fasted sampled yielded reads with an average length of only 300 bp and therefore this plate was repeated. Both plates yielded high quality reads and were therefore used in the subsequent global assembly.
454 assembly and annotation
Around 2,700,000 reads were used to generate the sea bream transcriptome. For the partial assemblies we used the reads generated from each experimental condition. For the fasted treatment partial assembly reads from the 454 plate that yielded average read lengths (400 bp) were used. Reads were assembled using Newbler 2.5 software (Roche, 454 Life-sciences) which performs well for de novo assembly of 454 transcriptome data . Assemblies were run in a Debain Linux system, IBM x3755 8877, with 8 CPU cores (4 x dual-core AMD Opteron), 64-bit, 2.8GHz processor with 128 Gb of RAM maintained by the University of St Andrews.
To avoid assembly problems caused by the reads from highly expressed genes we trimed them using the –vs against a fasta file with the available sequences for these genes in gilthead sea bream (adapters and genes sequences used from trimming are in Additional file 16). Isotigs generated by the Newbler software are contigs that are consistently connected by subsets of reads. Isotigs are longer than contigs and were used for the annotation and transcriptome analysis.
Isotigs were Blasted and annotated using Blast2GO software . Sequences were blasted using Blastx against the NCBI non-redundant protein collection (nr) database with a threshold of 10-3. Annotation was done with an E-value Hit Filter of 10-6 combined with an Annotation Cutoff of 55 and GO weighting of 5. Blast2GO also annotated sequences for functional domains using InterProScan.
NGS and Sanger sequencing comparisons
Known sea bream sequences produced by the SANGER sequencing method were downloaded from GenBank  and blasted (blastn) against the sea bream transcriptome using a BLAST server  generated by the Genepool group. The best hits isotig/GeneBank were aligned using ClustalW  to determine the nature and number of differences.
Successfully annotated isotigs were introduced in the KEGG Automatic Annotation Server (KAAS) . The SBH method, optimized for ESTs annotation, was used against human, chimpanzee, orang-utan, rhesus, mouse, rat, dog, giant panda, cow, pig, horse, opossum, platypus, chicken, clawed frog, zebrafish, fruit fly and nematode pathway databases. For a more detailed reconstruction of the pathway components the PPT-Toolkit-Cell-Biology from motifolio.com was used.
Identification of full-length cDNAs
Annotated isotigs were translated to the longest amino acids sequence possible using the ORF translator tool in Blast2GO package (no longer available). Sequences with more than 150 amino acids that started with a methionine or had a methionine in the first 50 amino acids were manually blasted using NCBI Blast server against nr/nt database . Blast results were analysed to confirm that the translated isotig covered, at least 90% of the sequence with best hits and that cover the whole CDS.
Isotigs successfully annotated were used for microsatellite repeats search using msatcommander-1.0.2-alpha . An isotig was considered to contain a microsatellite if contain any of the following repeated motifs: at least 10 repeated mononucleotides (other than A), 8 repeated di- or trinucleotides, or 6 repeated tetra-, penta- or hexanucleotid motifs. Their position outside coding sequences was confirmed in those microsatellites linked to annotated isotigs by analysing the translated sequences.
Identification of splice variants
For splice variant identification we screened the list of isogroups generated during Newbler assembly. Each isogroup represents a collection of isotigs containing reads that imply connections between the isotigs. Different isotigs from a given isogroup can be used to infer splice variants. Isogroups with non-annotated isotigs were discarded. The screening was focused on detecting splice variants affecting the coding sequence. The isotigs translated sequences from each isogroup were aligned with ClustalW to detect changes in peptide sequence.
Potential splice variants were filtered a second time by blasting them against the stickleback (Gasterosteus aculeatus) genome where possible, or otherwise the green puffer fish (Tetraodon nigroviridis) genome using the Ensembl webpage BLAT algorithm . Loci positive alignments were retrieved. Splice variants sequences and loci were aligned using the Spidey mRNA/genome analyser  to predict changes in the exon composition. Splice variants with potential changes in exon composition were submitted to InterProScan annotation to detect changes in functional domains. Genes with domain annotation that were altered by splicing were experimentally confirmed using conventional PCR.
Identification of transcription factors (TF)
For the detection of transcription factors and molecules associated with transcription such as methyl transferases, histone acetyl transferases and others we screened isotigs annotated with GO levels related to transcription: GO:0006355 (regulation of cellular transcription), GO:0003700 (modulate transcription), GO:0003677 (interacts selectively with DNA), GO:0008134 (TF binding), GO:0033276 (protein complex able to transcription regulation), GO:0043425 (basic Helix-Loop-Helix interactive elements), GO:0016563 (any activity required for initiation or upregulation of transcription) and GO:0045941 (any transcription regulator activity). IDs were checked against a Transcription Factor database to confirm a role in transcription regulation and to categorize them into families  and against the Uniprot database .
Identification of gene paralogues
Because no formal software has been developed specifically for paralogue screening in assemblies from Next Generation Sequencing we used an indirect approximation using the translated isotigs. A list of protein sequences of known genes from mouse (Mus musculus) was downloaded using BioMart tool from ESEMBL . We also downloaded a list of known paralogues from different teleost species: Takifugu rubripes
Oryzias latipes and Danio rerio. Comparisons between proteins groups were performed using Inparanoid 4.0 . Comparisons were performed using the gilthead sea bream translated transcriptome against one of the datasets at time. When at least two different isotigs were identified to represent the same transcript matched with a single mouse gene they were consider as potential paralogues. In addition, if two or more teleost known paralogues matched with two different isotigs they were also considered as potential paralogues. Other relations between transcripts can give similar output from Inparanoid and be included as paralogues: redundant transcripts, splice variants, sequence fragments and wrongly translated isotigs by insertions/deletions. Inparanoid output was explored by aligning translated sequences of paralogues against each other using ClustalW. This exploration allowed us to detect and trim these “False positives” from the list of potential paralogues.
The amino acids sequences of potential paralogues were blasted against the zebrafish (Danio rerio), stickleback (Gasterosteus aculeatus), takifugu (Takifugu rubripes), medaka (Oryzias latipes), green pufferfish (Tetraodon nigroviridis), chicken (Gallus gallus), frog (Xenopus laevis) and human (Homo sapiens) genomes using Essembl . The sequences from the best hits were downloaded. Alignment of the potential paralogues and their orthologues was performed using the GUIDANCE web tool . Only fragments with an alignment confidence score over 0.93 were used for the phylogenetic analysis. The best evolutionary model was estimated for each alignment using MEGA5 software . Maximum Likelihood phylogenetic analysis was constructed, with the best evolutionary model, using the online pipeline from PhylM .
Reads from each experimental condition were mapped against the total isotigs from the global assembly using GS Reference Mapper (Roche, 454 Life Sciences). The number of reads per contig from each condition was extracted using the R statistical package . Chi-square statistic was applied to detect significant differences in the number of reads per condition per isotig. Isotigs with less than 10 reads were excluded from the analysis. A FDR correction was applied to all p-values below 0.05. Plot graphs comparing the contribution of reads from each experimental condition to the isotig formation were constructed using R package.