Profiling of infection specific mRNA transcripts of the European seabass Dicentrarchus labrax

Background: The European seabass (Dicentrarchus labrax), one of the most extensively cultured species in European aquaculture, is, along with the gilthead sea bream (Sparus aurata), a prospective model species for the Perciformes, an order that includes several other commercially important species. In intensive aquaculture production, bacterial or viral infections may cause massive mortalities. Identifying transcripts involved in the immune response and studying their relative expression improves our understanding of the immune response mechanism and, consequently, supports vaccine development. The analysis of expressed sequence tags (ESTs) is an efficient and easy approach for gene discovery, comparative genomics and the qualitative and quantitative examination of gene expression in specific tissues.

Results: Here we describe the construction, analysis and comparison of a total of ten cDNA libraries: six from different tissues infected with Vibrio anguillarum (liver, spleen, head kidney, gill, peritoneal exudate and intestine) and four from different tissues infected with Nodavirus (liver, spleen, head kidney and brain). In total, 9605 sequences, representing 3075 (32%) unique sequences (the set of sequences obtained after clustering), were obtained and analysed. Among them, several immune-related proteins were identified for the first time in the order Perciformes as well as in Teleostei.

Conclusion: The present study provides new information for the Gene Index of seabass. It yields a unigene set that will make a significant contribution to functional genomic studies and to studies of differential gene expression in relation to the immune system. In addition, some of the potentially interesting genes identified by in silico analysis and confirmed by real-time PCR are putative biomarkers for bacterial and viral infections in fish.


Background
The European seabass Dicentrarchus labrax is one of the most extensively cultured fish species in the Mediterranean, resulting in steadily increasing pressure on producers. It is therefore important to acquire new techniques and knowledge to improve aquaculture practices. Detailed information on growth, health, disease resistance and flesh quality, from both the molecular and the physiological point of view, can provide new findings leading to improved aquaculture techniques. Several efforts have been made to date to enrich the genomic resources for aquaculture in the Mediterranean (chiefly for the gilthead sea bream Sparus aurata and the European seabass Dicentrarchus labrax), e.g. Marine Genomics Europe (Network of Excellence) (CT-2003-505403) [1][2][3][4][5], as well as in the Atlantic (e.g. Atlantic halibut Hippoglossus hippoglossus and Atlantic salmon Salmo salar) [6][7][8][9][10][11][12][13]. These studies focused mainly on non-challenged tissues in order to obtain a first unigene catalogue. Aquaculture production, however, is affected by viral and bacterial pathogens, particularly in the case of D. labrax, which has been shown to be the species most sensitive to pathogenic bacteria such as Vibrio anguillarum [14] and to viral infections such as Nodavirus [15,16]. Several commercial vaccines provide protection against V. anguillarum infection, although the mechanism of the immune response still remains unknown. Nodavirus can cause massive mortalities [17] and cannot yet be controlled, because the development of commercial vaccines against it is still in its infancy. In the present study we generated a collection of EST sequences from tissues of European seabass infected with V. anguillarum and Nodavirus. Within this collection we were able to isolate immune-relevant genes, and we went on to compare gene expression in different tissues after viral and bacterial infection. 
In addition, we determined in silico differential expression between the two infections. In this context, the construction and analysis of a total of ten cDNA libraries are described: six cDNA libraries from tissues of European seabass infected with V. anguillarum (liver, spleen, head kidney, peritoneal exudate, gill and intestine), of which peritoneal exudate, gill and intestine are target organs of V. anguillarum infection, and four cDNA libraries from tissues of European seabass infected with Nodavirus (liver, spleen, head kidney and brain), the brain being the target organ of the virus. The predicted European seabass peptide data set was compared with the zebrafish, medaka, stickleback, tetraodon and human proteomes. Genes showing in silico differential expression between Nodavirus and V. anguillarum infection were further analysed by real-time PCR.

Summary of ESTs from the cDNA libraries infected with Nodavirus and V. anguillarum
The amplified libraries contained inserts of approximately 0.5 to 2.0 kb. Single-pass sequencing yielded 9605 high-quality sequences. All sequences were submitted to the EST database (dbEST, http://www.ncbi.nlm.nih.gov/projects/dbEST/) under accession numbers FK939975-FK944381, FL484477-FL488763 and FL501471-FL502381. A set of 3075 unique sequences was generated [see Additional file 1], of which 371 (12%) contained Simple Sequence Repeats (SSR) [see Additional file 2]. Cluster analyses performed for each library separately (Table 1a and Table 1b) revealed redundancy rates ranging from 72% (28% unique sequences) in the intestine cDNA library infected with V. anguillarum to 37% (63% unique sequences) in the spleen cDNA library infected with V. anguillarum (Table 1a). The set of unique EST sequences was annotated with Blast2GO, which carries out BLASTX searches and attempts to assign function and GO classification. Of the 3075 unique EST sequences submitted to Blast2GO for annotation and GO classification, 1521 fell into 14 biological process categories at GO annotation level 2 (Fig. 1), with two categories, cellular process and metabolic process, predominant. The category "immune system process" was represented by 79 transcripts.
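The redundancy and unique-sequence percentages quoted above are complementary (each pair sums to 100%). A minimal Python sketch, assuming redundancy is defined as one minus the fraction of unique sequences, which matches the paired figures in the text; the function name `redundancy_stats` is hypothetical:

```python
def redundancy_stats(total_reads: int, unique_seqs: int) -> tuple[float, float]:
    """Return (percent unique, percent redundant) for a set of ESTs.

    Assumption: redundancy = 1 - unique/total, so that e.g. a library with
    28% unique sequences has 72% redundancy, as in Table 1a.
    """
    if total_reads <= 0 or not 0 <= unique_seqs <= total_reads:
        raise ValueError("need total_reads > 0 and 0 <= unique_seqs <= total_reads")
    pct_unique = 100.0 * unique_seqs / total_reads
    return pct_unique, 100.0 - pct_unique


# Whole data set reported in the text: 9605 high-quality reads, 3075 unique sequences.
unique_pct, redundant_pct = redundancy_stats(9605, 3075)
print(f"{unique_pct:.1f}% unique, {redundant_pct:.1f}% redundant")
```

Applied to the whole data set, it reproduces the 32% unique fraction given in the abstract; the same function can be applied per library using the counts in Table 1a and Table 1b.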

EST matches with known function
Of the 3075 unique EST sequences, 1246 (~41%) had a positive hit in the BLASTX database search. Among these EST sequences of known function, 128 homologues were found to be involved in the immune response: 79 of these were grouped into the GO category "immune system process", and the remaining 49 were manually determined to be involved in the immune response (see Additional file 3). In total, 115 immune-related transcripts were isolated for the first time for seabass (Table 2). Among the transcripts of interest was one encoding hepcidin, an important antimicrobial protein. Aligning EST sequences grouped into one contig can provide additional information: in the case of hepcidin, different isoforms are probably grouped together, whereas alignments of other cDNA sequences showed either alternative polyadenylation or in silico polymorphism of microsatellite DNA, as for instance in the transcript coding for cysteine-rich protein 1-I (see Additional file 4).

Expression analysis
Using the test for comparing multiple cDNA libraries [22], genes with a test statistic R > 6 can be confidently considered to show true variation; the fitted slope of 1.081 is not significantly different from -1 at the 5% level (see Additional file 6). In total, 109 of the 2234 contigs assembled from EST sequences of liver, spleen and head kidney infected with Nodavirus and V. anguillarum had R > 6; these transcripts and their putative homologues are listed in Additional file 5. Although most transcripts were abundant in both bacterially and virally infected tissues, some appeared to be specific to one infection. For example, fructose-1,6-biphosphate aldolase A, hepcidin, apolipoprotein A1 precursor, ferritin heavy chain and chemokine receptor 4 transcripts were found in V. anguillarum-infected tissues but only rarely in Nodavirus-infected tissues (see Additional file 5). Conversely, fructose-1,6-biphosphate aldolase B and 14 kDa apolipoprotein transcripts were observed more frequently in Nodavirus-infected than in V. anguillarum-infected tissues. These results were further validated by determining the expression of putative markers for each infection in key tissues by real-time PCR; control tissues were also included to determine expression in untreated fish. Real-time PCR confirmed the in silico results for the selected genes, and once the fold inductions measured by qPCR are taken into account, the in silico and qPCR results agree consistently. For instance, in silico analysis showed high expression of the hepcidin precursor transcript (R = 298.16) only in liver infected with V. anguillarum, whereas real-time PCR showed higher expression in all three tissues infected with V. anguillarum. 
However, the fold induction in liver is 20,000 times that in spleen, so in theory 20,000 times as many cDNA clones would have to be sequenced to detect the hepcidin precursor in spleen infected with V. anguillarum. This correspondence between high fold induction and in silico detection was observed for every transcript examined in this study. Specifically, whereas hepcidin mRNA levels increased considerably 24 h post-infection in the liver, spleen and head kidney of V. anguillarum-infected fish, they increased only slightly in Nodavirus-infected fish (Fig. 4). Notably, the mRNA levels of transferrin and ferritin, both involved in iron metabolism, for which spleen and liver are the two main organs, increased in the liver after infection with either pathogen, but in the spleen they increased only in V. anguillarum-infected animals (Figs. 5a and 5b).
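The 20,000-fold argument follows from random sampling of clones. A minimal sketch, assuming clones are drawn independently from a non-normalized library: the probability of seeing a transcript of relative abundance p at least once in n clones is 1 - (1 - p)^n, so the number of clones needed for a given detection probability scales roughly as 1/p. The abundance values and the helper `clones_needed` are hypothetical, not measured values from this study.

```python
import math


def clones_needed(abundance: float, detection_prob: float = 0.95) -> int:
    """Clones to pick so that a transcript is seen at least once with detection_prob.

    Assumes independent random sampling from a non-normalized library in which
    the transcript makes up the fraction `abundance` of all clones:
    P(seen in n clones) = 1 - (1 - abundance) ** n.
    """
    if not 0 < abundance < 1 or not 0 < detection_prob < 1:
        raise ValueError("abundance and detection_prob must lie in (0, 1)")
    # Solve 1 - (1 - p)^n >= P for n; log1p keeps precision for tiny p.
    return math.ceil(math.log(1 - detection_prob) / math.log1p(-abundance))


# Hypothetical abundances: 1 in 1000 clones in liver, 20,000-fold lower in spleen.
liver = clones_needed(1e-3)
spleen = clones_needed(1e-3 / 20_000)
print(liver, spleen, round(spleen / liver))
```

The printed ratio is close to 20,000: for rare transcripts the required sequencing effort grows almost in proportion to the drop in abundance.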
The mRNA levels of chemokine receptor 4 were unaffected or slightly reduced in the head kidney and spleen of Nodavirus-infected fish but increased considerably in these two tissues after V. anguillarum infection (Fig. 6). In contrast, the mRNA levels of the 14 kDa apolipoprotein increased in the livers of fish infected with either pathogen, but at both 4 h and 24 h post-infection with Nodavirus and at 4 h post-infection with V. anguillarum (Fig. 7). Expression was studied in the liver because it is the major organ of apolipoprotein production. Finally, although the mRNA lev-

Discussion
Although viral and bacterial infections are among the key challenges in fish aquaculture, the immune response of fish against V. anguillarum and Nodavirus remains largely unknown. Identifying genes involved in the immune response and detecting genes differentially expressed between the two infections can contribute significantly to a better understanding of the fish immune response to infection. In the present study ten cDNA libraries, six from tissues infected with V. anguillarum and four from tissues infected with Nodavirus, were analysed. EST sequences from infected tissues will support the construction of an immune-specific microarray containing both transcripts already known to be involved in immune-related biological processes, such as the immune response, and transcripts for which no annotation is yet available. Furthermore, transcripts showing higher expression in one of the infections can be selected for future functional studies at the RNA, DNA or protein level.
Over the past 30 years, cDNA cloning for gene discovery and transcriptome analysis has become a very important molecular technique. Various techniques have been developed to address specific issues such as the cloning of rare transcripts or the construction of libraries with a wider cloning range (for a review, see [26]). The non-normalized libraries constructed in the present study gave a first insight into tissue-specific transcript abundance. Besides allowing highly expressed transcripts to be identified, they also allow the percentage of unique sequences to be assessed. In this study, the redundancy of the liver, spleen and head kidney cDNA libraries was similar for Nodavirus and V. anguillarum infection in each of the three tissues (~33%, ~63% and 38%, respectively). This is in line with cDNA libraries of other fish species, where redundancy ranges between 40% and 60% depending on the tissue of origin [e.g. [12,13]]. Besides the identification and characterization of ESTs for components of the immune system, the detection of microsatellite sequences will help to complete the quantitative trait locus (QTL) scans currently being performed. Microsatellite sequences, also called Simple Sequence Repeats (SSR), are frequent in non-coding regions and are used as molecular markers. Detecting SSRs within ESTs (exonic microsatellites or EST-SSRs) is a shortcut to obtaining microsatellite markers. Because EST-SSRs are exonic, they have two advantages over intergenic microsatellites. First, their flanking regions are expected to be more conserved, so that primers can be used even in related species; second, they are assumed to be in strong linkage disequilibrium with functionally important sites. 
They are therefore frequently used in population genomics and in mapping genes of economic significance identified as candidate markers for QTL and/or quantitative trait nucleotides (QTN). For the EST similarity search in the present study, a homologue of a known gene is defined as a cDNA whose similarity to a gene of any other organism in the database exceeds a fixed threshold; the identification of orthologues is outside the scope of this study. In total, 1246 sequences (41%) were assigned to a known transcript, 79 (6%) of which were assigned to the GO category "immune system process". Separate GO annotation of these 79 transcripts revealed their involvement in 11 other categories of biological process (Fig. 9), with response to stimulus, cellular process and biological regulation as the three dominant categories. This collection should provide base material for further research into the immune response of the European seabass and for the isolation of putative biomarkers.

Similarity relationships
Comparison of the predicted seabass peptides with the proteomes of zebrafish, medaka, tetraodon, stickleback and human (Fig. 2) showed that the majority of putative proteins were located in the centre of the triangular plots. Separate examination of the different triads revealed a bias towards the top and right sections. This bias is not unexpected, as seabass is more closely related to medaka, zebrafish, stickleback and tetraodon than to human. It is worth noting, however, that seabass cytochrome b appears more similar to human cytochrome b than to tetraodon and medaka cytochrome b (Fig. 2); this was not the case for stickleback and zebrafish cytochrome b. Interestingly, comparisons of putative Atlantic halibut (Hippoglossus hippoglossus) proteins with the human, zebrafish and tetraodon protein databases showed that halibut cytochrome c oxidase subunit 3 (Cox3) is more similar to human COX3 than to zebrafish and tetraodon Cox3 [13]. Comparison of predicted proteins with the fish protein databases only shows a slight bias towards medaka and stickleback both in the triad medaka, stickleback and tetraodon (Fig. 3A) and in the triad stickleback, medaka and zebrafish (Fig. 3B). These results give a first insight into the evolution of immune-related genes, as the relatively even distribution indicates that sequence variation within the clade Percomorpha is comparable to that between Percomorpha and Ostariophysi.

Expression analysis
Figure 9. GO categorization of only the EST sequences grouped into the category of immune system process.

For in silico expression analysis, transcripts appearing more than once in the cDNA libraries were selected and their relative abundances were analysed following Stekel et al. [22]. The in silico analysis was validated by qPCR. Because a pooling strategy was chosen for the qPCR experiments, individual variation may be masked in this approach. The differentially expressed transcripts detected in the present study can subsequently be analysed for individual expression patterns, although this will require an extended sampling frame. In the present study, pooling was chosen for qPCR in order to test cross-method consistency. Since the results of the two approaches are consistent, the influence of between-individual variability in response to infection has been addressed to some extent. In addition, total RNA for qPCR analysis was extracted from different individuals than those used for cDNA library construction, and the patterns appear consistent between these samples for all selected candidate genes, which reflects the robustness of the approach and the small bias, if any, contributed by individual outliers. In silico expression analysis revealed a number of genes with R > 6 that lie considerably above the exponential curve (see Additional file 6). Genes with R > 6 can be considered significant and are thus candidates for further studies. Several of these transcripts, including those involved in iron metabolism such as ferritin and transferrin, have also been reported as differentially expressed in the catfish Ictalurus punctatus and Ictalurus furcatus infected with the Gram-negative bacterium Edwardsiella ictaluri [27,28]. One of the main mechanisms by which Gram-negative bacterial pathogens such as V. anguillarum obtain iron is the use of free heme or heme proteins from host tissues [29], and heme uptake mechanisms are considered to contribute to V. anguillarum virulence in fish [29]. 
However, it is surprising that Nodavirus infection also up-regulated transferrin and ferritin expression, especially within 24 h of infection. The abundance of transferrin transcripts in Nodavirus-infected tissues may be related not to alteration of iron metabolism by the pathogen but rather to the ability of enzymatically cleaved forms of this protein to activate fish macrophages [30]. The specific alteration of iron metabolism by V. anguillarum infection is also supported by the higher abundance of transcripts coding for hepcidin, a major homeostatic regulator of iron metabolism [31], and for the α and β chains of hemoglobin in V. anguillarum- than in Nodavirus-infected livers (255 clones vs. 1 clone, respectively). In this study the qPCR experiments confirmed the up-regulation of hepcidin in D. labrax after infection with V. anguillarum and additionally showed that hepcidin expression might be considered an excellent marker of bacterial infection, since it was up-regulated in all examined tissues of V. anguillarum-infected fish but unaffected in Nodavirus-infected tissues. Another interesting observation from the in silico gene expression analysis is the differential abundance of transcripts encoding isoforms A and B of the glycolytic/gluconeogenic enzyme fructose-1,6-biphosphate aldolase in bacterially and virally infected tissues. Although the role played by this enzyme in the outcome of these infections is difficult to anticipate owing to its dual role in glucose metabolism, these results suggest that the expression ratio between the two isoforms may be a good indicator of the type of infection in the European seabass. Thus, the up-regulation of the B isoform in the spleen exclusively by V. anguillarum might be considered another potential marker for this bacterial infection. 
Similarly, apolipoprotein A1 and 14 kDa apolipoprotein, two major components of high-density lipoproteins (HDL) synthesized in the fish liver [41], also showed differential expression over the time course in the liver of fish infected with V. anguillarum and Nodavirus, and may therefore also be good candidate indicators of fish health status and/or the type of infection. Real-time PCR confirmed the observations of the in silico expression analysis and also revealed that the expression of the 14 kDa apolipoprotein and of aldolase B in the spleen are appropriate markers of Nodavirus and V. anguillarum infection, respectively. Previous studies in carp and medaka have also shown the involvement of apolipoproteins in the immune response [42,43]. Finally, the differential expression of one of the clearly immune-related genes, chemokine receptor 4, also makes it a good putative marker for V. anguillarum infection. To assess the variability of these putative markers, further studies of individual fish exposed to other environmental or pathogenic conditions are needed to rule out that their variability arises from causes other than these infections.

Conclusion
In this study we generated a collection of EST sequences from tissues of the European seabass infected with V. anguillarum and Nodavirus, and compared gene expression in different tissues after viral and bacterial infection. A collection of 3075 unigenes was generated and candidate microsatellite sequences were detected. Furthermore, D. labrax transcripts were compared with those of zebrafish, human, tetraodon, medaka and stickleback. The majority of putative proteins were located in the centre of the triangular plots with a bias towards the right sections, D. labrax being, as expected, more closely related to the other fish species than to human. Putative D. labrax proteins were also compared among fish species only; here a slight bias towards stickleback and medaka was observed both in the triad medaka, stickleback and tetraodon and in the triad medaka, stickleback and zebrafish. In silico analysis of differential gene expression between the two infections based on EST sequences yielded a list of genes with a presumed function in the immune response of D. labrax, revealing the importance of looking at "non-classical" immune host proteins and emphasizing the value of EST sequences generated from cDNA libraries of infected fish tissues. In addition, by performing real-time PCR for transcripts with high, medium and low R values, we show the power of cDNA sequencing for expression analysis. With new high-throughput sequencing techniques, detecting differential expression by measuring the abundance of each transcript in silico is expected to contribute significantly to functional genomics. Finally, in silico analysis followed by real-time PCR confirmation of potentially interesting genes has revealed some of them as potential biomarkers for bacterial and viral infections in fish.

Experimental conditions and tissue collection
Two infections, one with Nodavirus strain 475-9/99 isolated from diseased sea bass (Istituto Zooprofilattico Sperimentale delle Venezie, Italy [16]) and one with V. anguillarum strain R-82 (serogroup O1; University of Santiago, Spain [14]), were performed in seabass as previously described [14,16]. Tissues were sampled 4 and 24 h post-infection. Three tissue types (spleen, liver and head kidney) from each experimental condition, as well as peritoneal exudate, gill and intestine from the V. anguillarum infection and brain from the Nodavirus infection, were selected and immediately frozen in liquid nitrogen. The experiments described comply with the guidelines of the European Union Council (86/609/EEC) for the use of laboratory animals and were approved by the Bioethical Committee of the University of Murcia (Spain) and the CSIC National Committee on Bioethics.
In brief, for Nodavirus infection, fish were injected intramuscularly with 100 μl of a Nodavirus suspension in Minimum Essential Medium (MEM) (5.9 × 10⁶ TCID₅₀ ml⁻¹) and maintained at 25 °C. Mock-infected control fish were injected with medium alone and kept under the same experimental conditions. Three fish from each experimental and control group were sampled 4 and 24 h post-infection. Animals were sacrificed by an overdose of anaesthetic (MS-222) and dissected. For the present study, brain, spleen, head kidney and liver were sampled.

RNA extraction
Total RNA was extracted using the NucleoSpin RNA II extraction kit (Macherey-Nagel, Düren, Germany). RNA quality was checked on EtBr-stained agarose gels, and RNA concentration and purity were measured using a NanoDrop spectrophotometer. For library construction, equal amounts of total RNA extracted from infected tissues (4 h and 24 h) were pooled. For qPCR experiments, total RNA was freshly extracted from infected tissues (liver, spleen and head kidney, 4 h and 24 h post-infection); tissues from three different individuals were pooled prior to RNA extraction.
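Pooling equal amounts of total RNA means taking a different volume from each extract, depending on its measured concentration. A minimal sketch; the sample names, concentrations and the 2000 ng target are hypothetical illustration values, not taken from the study:

```python
def pooling_volumes(conc_ng_per_ul: dict[str, float], ng_per_sample: float) -> dict[str, float]:
    """Volume (µl) of each RNA extract needed to contribute ng_per_sample ng to a pool."""
    volumes = {}
    for sample, conc in conc_ng_per_ul.items():
        if conc <= 0:
            raise ValueError(f"non-positive concentration for {sample}")
        # mass = concentration * volume, so volume = mass / concentration
        volumes[sample] = ng_per_sample / conc
    return volumes


# Hypothetical spectrophotometer readings (ng/µl) for 4 h and 24 h liver extracts.
vols = pooling_volumes({"liver_4h": 250.0, "liver_24h": 400.0}, ng_per_sample=2000.0)
for sample, vol in vols.items():
    print(f"{sample}: {vol:.1f} µl")
```

Each extract then contributes the same RNA mass, so neither time point dominates the pooled library.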

cDNA library construction
All libraries were constructed from total RNA with the Creator SMART cDNA library construction kit (BD Biosciences Clontech, Mountain View, CA, USA) using the LD PCR-based method. Between 20 and 22 PCR cycles were performed before size separation of inserts. cDNA fragments > 600 bp were selected and directionally ligated at the SfiI restriction sites of the pDNR-lib vector (BD Clontech) or the pal 32 vector. Plasmids were transformed into E. coli strain DH10B (Invitrogen) by electroporation. The libraries were tested for the presence and size of inserts by PCR using two primer pairs: for the libraries constructed with the pal 32 vector, pal 32 FOR (5'-CTCGGGAAGCGCGCCATT-3') and pal 32 REV (5'-TAATACGACTCACTATAGGGC-3'); for the libraries constructed with the pDNR-lib vector, pDNR FOR (5'-TAAAACGACGGCCAGTA-3') and pDNR REV (5'-GAAACAGCTATGACCATGTTC-3'). The products were run on an EtBr-stained agarose gel.

DNA sequencing
After plasmid preparation, dideoxy-termination DNA cycle sequencing was performed using BigDye 3.1 chemistry and the pDNR FOR primer (5'-TAAAACGACGGCCAGTA-3'). The sequences were run on an ABI 3730 XL sequencer at MPI Molecular Genetics, Berlin.

Sequence analysis
The raw sequence reads were quality-trimmed and vector- and poly(A)-clipped using PREGAP4 [18]. Clustering (grouping of clones related to one another by sequence homology) was performed using the software SeqManII (DNAstar Inc.). After clustering, the term 'contig' describes the sequence obtained from one cluster (the sequences of a cluster can be collapsed into a single, non-redundant sequence), and the term 'singleton' describes a sequence appearing only once in the entire dataset. The set of sequences obtained by merging contigs and singletons is referred to as the unique sequences.
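The distinction between contigs, singletons and unique sequences can be illustrated by counting cluster sizes from a read-to-cluster assignment. This is a minimal sketch; the input format (a plain dictionary) and the helper name `summarize_clusters` are hypothetical, and SeqManII output would first have to be parsed into such a mapping.

```python
from collections import Counter


def summarize_clusters(read_to_cluster: dict[str, str]) -> dict[str, int]:
    """Count contigs (clusters of >= 2 reads), singletons and unique sequences."""
    sizes = Counter(read_to_cluster.values())  # cluster id -> number of reads
    contigs = sum(1 for n in sizes.values() if n > 1)
    singletons = sum(1 for n in sizes.values() if n == 1)
    return {
        "reads": len(read_to_cluster),
        "contigs": contigs,
        "singletons": singletons,
        "unique": contigs + singletons,  # unique sequences = contigs + singletons
    }


# Hypothetical assignment of five reads to three clusters.
example = {"r1": "c1", "r2": "c1", "r3": "c2", "r4": "c3", "r5": "c3"}
print(summarize_clusters(example))
```

Here two clusters hold several reads and become contigs, one read stays a singleton, and the three resulting sequences form the unique set.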

Simple Sequence Repeats (SSR) in EST sequences
In silico mining for repeat motifs within the unique sequences was performed with the programme Msatfinder (http://www.genomics.ceh.ac.uk/msatfinder/) [19].
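To illustrate the basic idea of SSR mining, the sketch below detects perfect tandem repeats of 1 to 6 bp motifs with regular expressions. It is not Msatfinder's algorithm: the minimum repeat numbers in `MIN_REPEATS` are hypothetical, and compound or imperfect repeats are not handled.

```python
import re

# Hypothetical minimum number of motif copies per motif length (not Msatfinder defaults).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif: str) -> bool:
    """True unless the motif is itself a tandem copy of a shorter unit (e.g. 'ATAT')."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(seq: str) -> list[tuple[int, str, int]]:
    """Return (0-based start, motif, copy number) for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # skip e.g. 'AA' runs already reported as 'A'
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


example = "GGC" + "CA" * 8 + "TTG" + "A" * 12 + "C"
print(find_ssrs(example))
```

In the example, a (CA)8 dinucleotide repeat and an (A)12 mononucleotide run are reported; the same A run is not reported again as (AA)6 because that motif is not primitive.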

Homology search and GO annotation
Gene Ontology (GO) categories (biological process) were assigned with Blast2GO after a BLASTX search of the 3075 unique EST sequences, using an E-value cutoff of 1e-3 and a minimum alignment length of 33 amino acids (aa).
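The two thresholds can be illustrated by filtering tabular BLASTX output and keeping the best remaining hit per query. This is a minimal sketch, assuming BLAST+ tabular output (`-outfmt 6` with its default 12 columns, where column 4 is the alignment length, in amino acids for BLASTX, and columns 11 and 12 are the E-value and bit score). Blast2GO applies its own filtering internally, so this only illustrates the thresholds; the hit lines and the helper name `best_hits` are made up.

```python
def best_hits(lines, max_evalue=1e-3, min_aln_len=33):
    """Best hit (subject, bit score) per query passing E-value and length cutoffs."""
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        query, subject = f[0], f[1]
        aln_len, evalue, bitscore = int(f[3]), float(f[10]), float(f[11])
        if evalue > max_evalue or aln_len < min_aln_len:
            continue  # fails the E-value or alignment-length threshold
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


# Hypothetical hits in outfmt 6 column order.
rows = [
    ["contig_1", "subjA", "85.0", "120", "18", "0", "1", "360", "5", "124", "1e-50", "210.0"],
    ["contig_1", "subjB", "60.0", "110", "44", "1", "1", "330", "3", "112", "1e-20", "95.0"],
    ["contig_2", "subjC", "40.0", "25", "15", "0", "1", "75", "1", "25", "1e-5", "40.0"],
    ["contig_3", "subjD", "35.0", "80", "52", "2", "1", "240", "10", "89", "0.01", "38.0"],
]
print(best_hits("\t".join(r) for r in rows))
```

Only contig_1 keeps an annotation here: contig_2 fails the 33 aa length cutoff and contig_3 fails the E-value cutoff.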

Similarity relationships
The unique sequences from all seabass libraries were submitted to BLASTX similarity searches [20] against the zebrafish, tetraodon, stickleback, medaka and human predicted proteomes (downloadable from http://www.ensembl.org/index.html). For each database, the highest BLAST score (bit score) was retained if it exceeded 50. Relative similarities within each triad of databases were visualized as a triangular plot generated with the SimiTri software [21].
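A triangular plot places each query inside a triangle whose vertices represent the three databases of a triad. The sketch below shows one way to compute such a position, assuming the three bit scores are normalized to proportions and used as barycentric weights, and that scores not exceeding 50 are treated as no hit. SimiTri's exact normalization may differ; the helper name `triangle_point` is hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple

# Vertices of an equilateral triangle, one per database in the triad.
VERTICES = ((0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2))


def triangle_point(scores: Sequence[float], min_bits: float = 50.0) -> Optional[Tuple[float, float]]:
    """Barycentric (x, y) position from three bit scores; None if no score passes."""
    kept = [s if s > min_bits else 0.0 for s in scores]
    total = sum(kept)
    if total == 0:
        return None
    # Each database pulls the point towards its vertex in proportion to its score.
    x = sum(s / total * vx for s, (vx, _) in zip(kept, VERTICES))
    y = sum(s / total * vy for s, (_, vy) in zip(kept, VERTICES))
    return x, y


# Equal similarity to all three databases lands on the centroid of the triangle.
print(triangle_point([100.0, 100.0, 100.0]))
```

Under this scheme a protein equally similar to all three proteomes sits at the centroid, which is consistent with most seabass proteins appearing in the centre of the plots, while a hit to only one database sits on that database's vertex.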

Expression analysis
In silico
All sequences of each cDNA library were submitted to BLASTX and BLASTN searches [20]. Transcripts appearing more than once in the cDNA libraries were selected for in silico expression analysis following Stekel et al. [22]. In brief, this method allows gene expression to be compared across any number of libraries in order to identify differentially expressed genes. It uses a single statistical test, based on a log-likelihood ratio statistic R, to describe the extent to which a gene is differentially expressed between libraries; this statistic tends asymptotically to a χ2 distribution [22]. For real-time PCR experiments, transcripts with high, medium and low R values were selected.
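The R statistic can be sketched in a few lines. Following the log-likelihood ratio form described by Stekel et al. [22], for a transcript with counts x_i in libraries of sizes N_i, R = Σ_i x_i ln(x_i / (N_i f)), where f = Σ_i x_i / Σ_i N_i is the pooled frequency and libraries with x_i = 0 contribute nothing. This is our reading of the method, not the authors' code; by standard likelihood-ratio theory, 2R approaches a χ2 distribution with m - 1 degrees of freedom for m libraries. The transcript names and counts are hypothetical; only the R > 6 cut-off is taken from the text.

```python
import math


def stekel_r(counts, library_sizes):
    """Log-likelihood ratio R for one transcript across several cDNA libraries."""
    if len(counts) != len(library_sizes):
        raise ValueError("one count per library is required")
    if any(n <= 0 for n in library_sizes):
        raise ValueError("library sizes must be positive")
    total = sum(counts)
    if total == 0:
        return 0.0
    f = total / sum(library_sizes)  # pooled frequency under "no difference"
    r = 0.0
    for x, n in zip(counts, library_sizes):
        if x > 0:  # x * ln(x / expected) -> 0 as x -> 0
            r += x * math.log(x / (n * f))
    return r


# Hypothetical clone counts in two libraries (e.g. bacterially vs virally infected liver).
library_sizes = [1200, 1000]
counts = {"transcript_A": [40, 1], "transcript_B": [12, 10]}
scores = {name: stekel_r(c, library_sizes) for name, c in counts.items()}
print(sorted(name for name, r in scores.items() if r > 6))  # -> ['transcript_A']
```

transcript_B occurs in exact proportion to library size, so its R is zero, whereas the skewed counts of transcript_A give R of about 20, well above the cut-off.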