The aim of the present work was to develop an integrated platform for mRNA expression profiling in the gilthead sea bream. The first step was the construction of a database of unique transcripts, clustering all publicly available mRNA sequences and >50,000 expressed sequence tags (ESTs) originating from a recently completed medium-scale EST sequencing project carried out within the framework of the Network of Excellence Marine Genomics Europe. The number of unique clusters obtained is similar to that reported for comparable EST collections in other fish species/stages (stickleback, Japanese medaka, channel catfish, Atlantic halibut, Atlantic salmon, Atlantic cod, fathead minnow, and largemouth bass). Approximately 40% of these unique transcripts showed significant similarity with at least one annotated gene/protein present in public databases (see Methods), in agreement with the percentage of annotated clusters for the largemouth bass (46%), and slightly lower than the values observed for the pre-smolt Atlantic salmon (50.3%), the Atlantic halibut (60%), and the channel catfish (51%). On the other hand, a sufficiently high number of sea bream transcripts could be associated with a GO entry, potentially allowing for the functional analysis of differentially expressed genes. The relatively low number of annotated expressed sequences appears to be a major limitation of most EST sequencing projects in commercial fish, even in those species where the transcriptome has been characterized in greater depth. However, the percentage of annotated transcripts is expected to increase substantially in the near future, when additional draft sequences of fish genomes (e.g. Nile tilapia, Atlantic salmon) become available. Further sequence information for comparative analysis will also arise from the application of ultra-high throughput DNA sequencing technologies to EST production in non-model species.
The relatively small number of ESTs available for S. aurata did not seem to significantly affect the efficiency of probe design, as two non-overlapping probes could be successfully designed for most clusters. Moreover, for most target sequences a strong correlation was observed between the two probes of each pair. Probe_1 and Probe_2 showed a negative correlation for only 385 transcripts (3%). Several factors could account for this observation. First, alternative splicing could produce differentially expressed transcripts for the same gene; such a difference can then be revealed by the use of two independent probes per gene. Second, a greater stability of the 3'-end of some transcripts might reduce the signal for the 5'-end probe. However, this does not seem to be a general phenomenon, because no significant bias was observed between 3'-end and 5'-end probes. Finally, high sequence similarity across different genes (e.g. recently duplicated loci) might lead to the widely documented problem of probe cross-hybridization or to spurious EST clusters as a consequence of assembly errors.
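The probe-pair comparison described above can be illustrated with a minimal sketch. It assumes that each probe yields one normalized log-intensity per sample and that agreement is measured with the Pearson coefficient; the function name `pearson` and the example values are hypothetical and are not taken from the study.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)

# Hypothetical log2 intensities for the two probes of one transcript
probe_1 = [8.1, 8.4, 9.0, 10.2, 10.5]
probe_2 = [7.9, 8.6, 9.1, 10.0, 10.8]

r = pearson(probe_1, probe_2)
# A negative r would place the transcript among the discordant probe pairs
is_discordant = r < 0
```

Transcripts flagged in this way would then be candidates for the explanations listed above (alternative splicing, differential transcript stability, cross-hybridization, or assembly errors).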
Before normalization and statistical analysis, data for 12% of the total number of probes were removed, following a very stringent criterion (a maximum of two missing spots was allowed for each probe across five biological replicates). This filtering step was performed to maximize the probability of detecting real differences in gene expression at the expense of some loss of information. Detailed analysis of the filtered-out probes shows that 60% of the probes excluded in Stage 1 were detected in Stage 4 and, vice versa, 65% of the missing spots in Stage 4 were present in Stage 1. This observation suggests that differential expression between ontogenetic phases, rather than poor probe quality, might explain why a relatively large number of probes were excluded. It should also be noted that the experimental samples represented two early larval stages, in which a certain number of "adult-only" genes might not be expressed at all. Finally, both probes (1 and 2) were excluded from the analysis for fewer than 4% of all genes (769). For the majority of transcripts, either one probe (3,308 genes) or two probes (15,638 genes) yielded a positive signal in all experiments. This clearly suggests that a "safe" approach in microarray design should incorporate at least two probes per gene.
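The filtering criterion described above can be sketched as follows. This is a minimal illustration, assuming that a missing spot is encoded as `None` and that the threshold is applied to one group of five biological replicates at a time; the function names and the example data are hypothetical.

```python
def keep_probe(signals, max_missing=2):
    """Return True if a probe has at most `max_missing` missing spots.

    `signals` holds one value per biological replicate;
    None marks a missing (undetected or flagged) spot.
    """
    missing = sum(1 for s in signals if s is None)
    return missing <= max_missing

def filter_probes(data, max_missing=2):
    """Keep only the probes that pass the missing-spot criterion."""
    return {
        probe: values
        for probe, values in data.items()
        if keep_probe(values, max_missing)
    }

# Hypothetical signals for three probes across five biological replicates
example = {
    "probe_A": [120.0, 135.5, 128.2, 140.1, 131.7],  # no missing spots
    "probe_B": [None, 88.4, None, 91.0, 85.3],       # two missing: kept
    "probe_C": [None, None, 45.2, None, 50.9],       # three missing: removed
}

retained = filter_probes(example)
```

Comparing the probes removed in one stage with those retained in the other, as done above, helps to distinguish stage-specific expression from poor probe quality.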
Repeatability of microarray data, across either technical or biological replicates, appeared to be quite high and was not influenced by the presently limited knowledge of the sea bream transcriptome. Good repeatability for the Agilent and other oligo-array platforms was already reported in a large initiative on microarray quality (the MicroArray Quality Control project, MAQC). The results obtained here further confirm this evidence. In the MAQC evaluation, one- and two-color designs were compared. This comparison indicated that data quality is essentially equivalent between the one- and two-color approaches and strongly suggested that this variable need not be a primary factor in decisions regarding experimental microarray design. Repeatability was also extremely good in the case of the gilthead sea bream array (correlation coefficient > 0.99 across technical replicates). The use of just one dye (Cy3) allows for a simplified experimental design and easier comparison across different experiments. At the same time, labeling with only Cy3 is less expensive and reduces the risk of ozone-mediated dye degradation, as Cy5 is more sensitive to this ubiquitous contaminant. A single-color scheme, however, requires highly efficient signal normalization across experiments. Based on the comparison of spike-in probe signals between arrays after normalization, cyclic lowess was found to be superior to quantile normalization, and to outperform normalization by the median fluorescence value, which is the method suggested by Agilent for one-color array experiments (data not shown). This result is in agreement with evidence reported for other array platforms. In the Agilent array technology, the simplicity and economy of a single-color design is coupled with the flexibility of programmable in-situ synthesis of oligonucleotide probes. This feature is extremely important, especially for non-model species, where knowledge of the transcriptome is often substantially incomplete.
A flexible array design can accommodate the elimination of unsuitable probes and, more importantly, the subsequent inclusion of additional probes as soon as novel unique transcripts are identified.
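To make the between-array normalization step more concrete, the following is a minimal sketch of quantile normalization, one of the methods compared above. It is shown only because it is compact; the study found cyclic lowess to perform better, and this sketch does not implement that method. The function name is hypothetical, and ties are broken by position, a simplification relative to production implementations.

```python
def quantile_normalize(arrays):
    """Quantile-normalize several arrays of probe intensities.

    `arrays` is a list of lists, one per array, all of the same length
    (same probes, same order). After normalization every array shares
    the same empirical distribution.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must contain the same number of probes")

    # Mean intensity at each rank, computed across the sorted arrays
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [
        sum(s[i] for s in sorted_arrays) / len(arrays) for i in range(n)
    ]

    normalized = []
    for a in arrays:
        # Probe indices ordered by increasing intensity in this array
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized

# Hypothetical intensities for three probes on two arrays
raw = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
norm = quantile_normalize(raw)
```

As in the comparison described above, the quality of any such method could then be judged by how closely the spike-in probe signals agree between arrays after normalization.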
The quality of the gilthead sea bream oligo-microarray data was also confirmed by qRT-PCR validation of expression results for selected target genes. The use of qRT-PCR for cross-validation of microarray data is generally limited to the most significantly differentially expressed genes. In the present study, genes were selected for validation across the entire range of absolute signal intensity and fold-change. Although this approach cannot substitute for the systematic qPCR analysis of all target genes reported in other studies, it should provide a less biased comparison between microarray and qRT-PCR technologies. In the case of the gilthead sea bream oligo-array, a highly significant positive correlation was observed when comparing individual expression values, further confirming the reliability of the platform. PGK1 was the only exception. For this gene, a positive but non-significant correlation was observed, and only between the results of Probe_1 and the qRT-PCR data. This is likely due to the small difference in expression between the two sample groups (the mean fold-change estimated from array data is 1.1–1.2). Lack of correlation between microarray and qRT-PCR results for genes exhibiting low levels of change (<1.4-fold) has been commonly reported. Indeed, a two-fold change is usually considered the cut-off below which microarray and qRT-PCR data begin to lose correlation. Plotting microarray-estimated fold-changes against qRT-PCR results (see Figure 4) also showed the occurrence of fold-change compression for differences in expression value above one order of magnitude. This is, however, a well-known phenomenon, due to various technical limitations of microarrays, including limited dynamic range, signal saturation, and cross-hybridization.
As mentioned above, the main focus of the present study was the construction and validation of a microarray platform for the gilthead sea bream. Nevertheless, significant results on the biological process of gilthead sea bream early development were obtained. It should be remarked here that the expression levels of target genes obtained in the present work reflect a mixture of cell types and tissues, as whole larvae were analyzed. Thus, the variation in expression observed in the comparison between Stage 1 and Stage 4 might represent changes in the proportion of different tissues during development rather than changes in the specific levels of transcription of target genes. Furthermore, absence of variation in expression may represent the cancelling out of variations of opposite sign in different tissues. Indeed, genes down-regulated in the transition between 1-day-old and 4-day-old larvae mainly belong to "essential" (housekeeping) biological processes such as DNA replication, cell cycle, and protein synthesis or catabolism. It is therefore likely that, as tissue and cell differentiation proceeds, cell-type and tissue-type specific transcripts start to be produced, leading to a "dilution" of mRNAs encoding housekeeping proteins. A similar effect might cause the observed down-regulation of transcripts encoding proteins involved in lipid metabolism, which is essential for cellular and sub-cellular membrane biosynthesis. On the other hand, in Stage 4 larvae the yolk sac is reduced to one-eighth of its original size, with a corresponding reduction in the contribution of yolk lipids as nutrients. Thus, the reduced abundance of mRNAs encoding proteins associated with lipid metabolic processes could actually reflect a transition toward autonomous feeding. In 4-day-old larvae, mouth opening is initiated and the digestive system is formed, with a lengthened intestine and a pancreatic gland anlage.
In keeping with this evidence, digestive enzymes such as elastase, as already reported by Sarropoulou and colleagues, and two different isoforms of chymotrypsinogen [see Additional file 2] begin to appear in the list of up-regulated transcripts. Four-day-old larvae also start to show a pigmented eye, as mirrored by the expression of green-sensitive opsin and other eye-specific genes (retinal cone arrestin-3, which is thought to bind photo-activated opsins, or cathepsin L2, involved in corneal development).
Myogenesis is well underway in early larval stages. The differentiation of embryonic and larval muscle fibres involves a complex temporal sequence of gene activation [35–37] that includes structural and contractile proteins (e.g. myosin, tropomyosin) as well as soluble muscle proteins and enzymes (e.g. parvalbumin, muscle creatine kinase). Unfortunately, little is known about the temporal and spatial organization of gene expression during the maturation and diversification of fish embryonic muscle cells.
In the present study, high expression levels of the myogenic regulatory factor MyoD were detected in both Stage 1 and Stage 4 larvae. Similarly, transcripts encoding proteins involved in muscle contraction, such as myosin light chain 1, parvalbumin, tropomyosin, and sarcomeric creatine kinase (ckm), are abundantly expressed. The latter shows strong up-regulation in Stage 4, thus confirming previous findings on the constant increase of ckm expression from the embryo to the adult. Differences in gene expression were also detected for tropomyosin, which increases in expression as the embryos get older, while myosin and parvalbumin show a weak up-regulation (< 4-fold) in Stage 1, when the larvae have just hatched, compared to Stage 4, as already reported by Sarropoulou and colleagues. Finally, stromal cell-derived factor, a molecule promoting early myogenic differentiation of external cell precursors, appears to be down-regulated in Stage 4 compared to Stage 1.
More generally, signal transduction is a well-represented biological process among up-regulated genes, indicating an increasing importance of intra-cellular signaling pathways in parallel with tissue and cell differentiation. In some cases, the appearance of specific pathways seems to precede that of the corresponding anatomical organs. For instance, the glucocorticoid receptor is up-regulated, which agrees with a functional hypothalamus-pituitary-interrenal axis at an early stage and suggests a role of glucocorticoids in early development. The shift from "essential" transcripts toward tissue- and cell-specific ones might also explain the highly significant bias in the percentage of annotated/unknown transcripts between up-regulated and down-regulated genes. A low frequency (21%) of annotated clusters was observed among transcripts up-regulated in Stage 4 larvae when compared to down-regulated ones (80%). Cluster annotation was based essentially on sequence similarity; therefore, sea bream transcripts from highly conserved genes are more likely to find a significant match with known sequences from other taxa. A correlation between sequence conservation and protein function/tissue-distribution/expression has been the focus of several studies [40–44]. It seems, at least in mammals, that essential genes (defined on the basis of gene-ablation studies in mice) or housekeeping genes (ubiquitously expressed genes) evolve significantly more slowly than non-essential or tissue-specific genes. These two categories do not necessarily coincide, but there is a substantial overlap. In the case of gilthead sea bream expression data, the transition between Stage 1 and Stage 4 larvae represents an increase in tissue and cell types, with a correspondingly larger proportion of tissue- and cell-specific transcripts.
This likely translates into a higher share of essential/housekeeping genes in Stage 1 than in Stage 4, as already evident from GO biological process entries associated with up-regulated and down-regulated genes. Since a significantly higher number of down-regulated transcripts shows a meaningful similarity with putative homologs in other species, it seems likely that essential/housekeeping genes evolve more slowly in the gilthead sea bream as well. Thus, similar selective processes appear to shape the evolution of protein-encoding genes in both lower and higher vertebrates.