Deep sequencing-based transcriptome profiling analysis of bacteria-challenged Lateolabrax japonicus reveals insight into the immune-relevant genes in marine fish

Background Systematic research on fish immunogenetics is indispensable in understanding the origin and evolution of immune systems. This has long been a challenging task because of the limited number of deep sequencing technologies and genome backgrounds of non-model fish available. The newly developed Solexa/Illumina RNA-seq and Digital gene expression (DGE) are high-throughput sequencing approaches and are powerful tools for genomic studies at the transcriptome level. This study reports the transcriptome profiling analysis of bacteria-challenged Lateolabrax japonicus using RNA-seq and DGE in an attempt to gain insights into the immunogenetics of marine fish. Results RNA-seq analysis generated 169,950 non-redundant consensus sequences, among which 48,987 functional transcripts with complete or various length encoding regions were identified. More than 52% of these transcripts are possibly involved in approximately 219 known metabolic or signalling pathways, while 2,673 transcripts were associated with immune-relevant genes. In addition, approximately 8% of the transcripts appeared to be fish-specific genes that have never been described before. DGE analysis revealed that the host transcriptome profile of Vibrio harveyi-challenged L. japonicus is considerably altered, as indicated by the significant up- or down-regulation of 1,224 strong infection-responsive transcripts. Results indicated an overall conservation of the components and transcriptome alterations underlying innate and adaptive immunity in fish and other vertebrate models. Analysis suggested the acquisition of numerous fish-specific immune system components during early vertebrate evolution. Conclusion This study provided a global survey of host defence gene activities against bacterial challenge in a non-model marine fish. Results can contribute to the in-depth study of candidate genes in marine fish immunity, and help improve current understanding of host-pathogen interactions and evolutionary history of immunogenetics from fish to mammals.


Background
Since it is a representative population of lower vertebrates serving as an essential link to early vertebrate evolution, fish is believed to be an important model in various developmental and comparative evolutionary studies [1][2][3]. Fish immunogenetics has received considerable attention due to its essential role in understanding the origin and evolution of immune systems [4][5][6].
Further, it is also beneficial in the creation of immunebased therapy of severe fish diseases. Great progress in bioinformatics and genome projects in model organisms, including human (Homo sapiens), mouse (Mus musculus), frog (Xenopus laevis), chicken (Gallus gallus), and zebrafish (Danio rerio), has led to the emergence of studies focusing on the identification and characterization of immune-related genes in teleost fish based on comparative genomics. These have provided preliminary observations on fish immunogenetics and evolutionary history of immune systems from lower vertebrates to mammals [7,8]. However, large-scale identification of immune-related genes at the genome or transcriptome levels in fish was seen in limited species (e.g. Danio rerio) due to the inadequate number of high-throughput deep sequencing technologies available [9,10]. This is an even more difficult problem in non-model fish species with totally unknown genome sequences.
Recently developed RNA deep sequencing technologies, such as Solexa/Illumina RNA-seq and Digital gene expression (DGE), have dramatically changed the way immune-related genes in fish are identified because these technologies facilitate the investigation of the functional complexity of transcriptomes [11,12]. RNA-Seq refers to whole transcriptome shotgun sequencing wherein mRNA or cDNA is mechanically fragmented, resulting in overlapping short fragments that cover the entire transcriptome. DGE is a tag-based transcriptome sequencing approach where short raw tags are generated by endonuclease. The expression level of virtually all genes in the sample is measured by counting the number of individual mRNA molecules produced from each gene. Compared with DGE analysis, the RNA-Seq approach is more powerful for unraveling transcriptome complexity, and for identification of genes, structure of transcripts, alternative splicing, non-coding RNAs, and new transcription units. In contrast, the DGE protocol is more suitable and affordable for comparative gene expression studies because it enables direct transcript profiling without compromise and potential bias, thus allowing for a more sensitive and accurate profiling of the transcriptome that more closely resembles the biology of the cell [9,[13][14][15][16][17]. These two technologies have been used in transcriptome profiling studies for various applications, including cellular development, cancer, and immune defence of various organisms [10,[18][19][20][21][22][23][24][25][26][27][28][29]. However, they have not been used in immunogenetic analysis of marine fish species.
Japanese sea bass (Lateolabrax japonicus) is an economically important marine species widely cultured in fisheries worldwide. Various diseases caused by bacterial and viral pathogens plague this species [30]. High mortality is associated with infection with Vibrio harveyi, a typical gram-negative pathogen of a wide range of marine animals. Infection results in a variety of vibriosis, a common aquatic animal disease associated with high mortality throughout the world [31]. In L. japonicus, V. harveyi infection leads to bacterial septicaemia with muscle ulcer as well as subcutaneous and gastroenteritic haemorrhage.
The present study is the first to conduct a transcriptome profiling analysis of V. harveyi-challenged L. japonicus using RNA-seq and DGE to gain deep insight into the immunogenetics of marine fish. Bacteria-challenged L. japonicus is expected to be a model system for studying bacterial immunity in marine fish. Further, a global survey of anti-bacterial immune defence gene activities in marine fish can contribute to the in-depth investigation of candidate genes in fish immunity. Results are also expected to improve current understanding of host-pathogen interactions and evolutionary history of immunogenetics from fish to mammals.

Results
Aligning raw sequencing reads to non-redundant consensus Approximately 34.59 and 33.03 million 75-bp pair end (PE) raw reads (submitted to GEO database, Association No. GSE21721) from the head kidney and spleen tissues of bacteria-and mock-challenged fish, respectively, were generated using Solexa/Illumina RNA-seq deep sequencing analysis. Repetitive, low-complexity, and low-quality reads were filtered out prior to assembly of sequence reads for non-redundant consensus. Using Grape software, reliable reads were assembled into contigs, which were then compared with all PE reads. Overlap of PE reads with two contigs was taken to indicate that the contigs are short segments of a scaffold. Reads were used for gap-filling of these scaffolds to generate final scaffold sequences. Using tgicl and cap3 software programs, scaffold sequences were assembled into clusters that were then analysed for consensus. A total of 150,125 and 140,330 non-redundant consensus sequences, ranging from 100 to 2,000-bp, were generated from each group. Then, consensus sequences were merged for DGE analysis. Removal of partial overlapping sequences yielded 169,950 non-redundant consensus sequences (Table 1). These sequences provide abundant data on healthy and infected conditions, thus allowing for better reference of immune-relevant genes.
Annotation of all non-redundant consensus sequences BLASTX and ESTscan software analysis of 169,950 nonredundant consensus sequences revealed that about  [32,33]. CD-containing consensus sequences have high potential for translation into functional proteins and most of them translated to proteins with more than 100 aa. Comparison with the Nr and Swissprot databases revealed that 44,842 consensus sequences had good comparability with known gene sequences in existing species. Annotation of the 44,842 sequences using GO and COG databases yielded good results for approximately 16,469 consensus sequences and 9,545 putative proteins [34,35]. GO-annotated consensus sequences belonged to the biological process, cellular component, and molecular function clusters and distributed among more than 50 categories (Figure 1), including biochemistry, metabolism, growth, development, apoptosis, and immune defence. Similarly, COG-annotated putative proteins were classified functionally into at least 25 molecular families, including cellular structure, biochemistry metabolism, molecular processing, signal transduction, gene expression, and immune defence, that correspond to the categories observed in GO analysis ( Figure 2). The KEGG database was used to analyse potential involvement of the consensus sequences in cellular metabolic pathways ( Figure 3) [36]. Among the 44,842 consensus sequences, 24,496 can be grouped into seven categories comprised of 219 known metabolic or signalling pathways, including cellular growth, differentiation, apoptosis, migration, endocrine, and various immune-relevant signalling or metabolic pathways ( Table 2).

Annotation of immune-relevant genes and pathways
To gain deep insight into the molecular biology of immune systems in L. japonicus, the immune-relevant genes, metabolic and signalling pathways were analysed. Approximately 2,673 consensus sequences were found to be homologous to known immune-relevant genes in other species (Table 3, Additional file 1, Table S1), including the most important elements of innate and adaptive immunity, such as pattern recognition receptors (mannose binding proteins, scavenger receptors, Toll-like receptors, and lipopolysaccharide-binding proteins), inflammatory cytokines and receptors (IL-1β, IL-10, IL-12, IL-6, IL-8, TNF-α, IFN-γ, TGF-β, VEGF, IL-1, and TNF receptors), immunoglobins (IgM and IgD), transcriptional factors (NF-kB, Ik-B, AP-1, transcription factor 4, chorion specific transcription factor, and transcriptional enhancer factor 5), complement components (C1q, C1r, C1 s, C2, C4, MASP, Bf, Df, and CR1-5), leukocyte differentiation antigens (CD3, CD4, CD8, and CD80/86), antigen presenting and processing molecules (MHC I, MHC II, TAP1, TAP2, TAP2B, proteosome subunit, PA28α/β, and constant chain protein li), regulatory molecules involved in immune cellular proliferation, differentiation, and apoptosis (growth differentiation factor 11, proliferation associated protein 2G4, apoptotic peptidase activating factor 1, and anti-apoptotic protein NR13), and other molecules involved in immune response (hepcidin, lectin, lysozyme, antimicrobial peptide NK-lysin, chemokines, ferritin, RAG1, and RAG2). KEGG analysis revealed that approximately 2,082 consensus sequences were significantly enriched in various known immune-relevant metabolic or signalling pathways ( Figure 3). These suggest a considerable conservation of immune-relevant genes and pathways between L. japonicus and mammals. Conserved genes and pathway members might include Toll-like receptors (TLR1-9 and TLR13) and corresponding adaptors in mammals ; key molecules involved in antigen presenting and processing pathways (MHC class I and II, Hsp70, Hsp90, and Calnexin); members of complement and coagulation pathways (classical pathway members C1q, C1r, C1 s, C2, and C4, MBL pathway members MBL and MASP, alternative pathway members Bf and Df, and complement receptor type 1); and members involved in FcεR I signalling pathway (PI3K and SYK), leukocyte migration (lymphotactin, Rho GTPase-activating protein 5, RhoH, integrin alpha-M, and leukaemia inhibitory factor receptor), and natural killer-mediated cytotoxicity (natural killer cell receptor 2B4, natural killer cell protease 1, natural killer cell enhancement factor, natural killer-tumour recognition sequence, and natural killer cell stimulatory factor 2). Furthermore, a number of consensus genes involved in cellular adhesion, energy production, and amino acid (arginine and praline) metabolisms were also conserved between fish and mammals. These genes are indirectly related to immune responses in mammals. For instance, L-arginine metabolism has been proven to be related to phagocytosis of macrophages, which eventually led to the discovery of NO signalling molecule [37]. Thus, the involvement of these consensus genes in metabolic pathways provides the basis for further identification of the biological functions of candidate genes in fish immune responses.

Digital gene expression profile analysis after bacterial challenge
Solexa/Illumina DGE analysis was performed to identify the genes involved in L. Japonicus response to bacterial challenge [12]. A total of 3.44 and 3.22 million raw tags (submitted to GEO database, Association No. GSE21712) of the mRNAs extracted from head kidney and spleen of the mock-and bacteria-challenged groups, respectively, were identified by base calling [38,39]. After transformation of raw sequences into clean tags by data-processing steps using bio-perl scripts, approximately 0.33 and 0.27 million high quality non-redundant tags were obtained in both groups ( Figure 6). Gene annotation was performed by tag mapping analysis using the 169,950 non-redundant consensus sequences from RNA-seq-based transcriptome analysis as reference transcript database. Results showed that 71.41% and 74.53% of all distinct tags can be mapped to the entire reference database (sense or anti-sense) in both groups. Out of the 26,394 sense strands and 23,790 anti-sense strands detected in the mock-challenged group, about 36,782 (75.1%) sense or anti-sense strands were mapped by the tags. In contrast, about 34,840 (71.1%) sense or anti-sense genes were mapped out of the 23,359 sense Figure 3 KEGG categories of non-redundant consensus sequences. All non-redundant consensus sequences were annotated using KEGG Automatic Annotation Server for pathway information, and about 24,496 consensus sequences were annotated. The categories GIP and EIP stand for genetic information processing and environmental information processing, respectively.    Figure 4 Putative TLR signal pathway. Putative TLR signal pathway of L. japonicus was constructed based on knowledge of TLR signalling in mammalian species. However, most interactions have to be confirmed experimentally. strands and 21,046 anti-sense strands in the infected group ( Figure 6). Among the detectable expressed consensus sequences, 9,643 genes had successful annotations. Mapping results are summarized in Additional file 2, Table S2 and Additional file 3, Figure S1. Strict Bayesian algorithm was used in differential DGE analysis in order to consider the differences in library size for differential selection between the two differentially expressed gene libraries [40]. Soap2 software was used to map all measured tags to the corresponding assembled consensus sequences [41]. P ≤ 0.01 and absolute value of log 2 Ratio≥1 were used as the threshold of significant differences in gene expression. Approximately 1,224 CD-containing consensus sequences mapped by 19,548 differentially expressed tags exhibited significant differences after bacterial challenge. Among them, about 376 consensus sequences were significantly up-regulated (including 61 most overexpressed sequences that increased by more than 100fold), while 848 consensus sequences were significantly down-regulated (including 10 most down-regulated sequences that decreased by 10-100-times) (Figure 7, Additional file 4, Table S3). The distribution of the significant changes detected is illustrated in a volcano plot (Figure 8), where the statistical significance of each consensus was plotted against fold change. Sequences with the highest average differences between the bacteriaand mock-challenged groups (far right and left of the plot) also had the smallest false discovery rate (FDR) values. Analysis of the CD-containing consensus sequences revealed that they all had FDR values less than 0.1, with the highest value being 0.063.

Putative novel immune/stress response genes
Among the differentially expressed transcripts, more than 1,183 transcripts were well annotated, whereas  (Figures A and B, respectively). 'Not DETs' and 'Not DEGs' indicate 'not detected expression tags' and 'not detected expression genes', respectively. For Figures A and B, the x-axis contains Log 10 of transcript per million of the bacteria-challenged group and the y-axis indicates Log 10 of transcript per million of the mock-challenged group. Limitations are based on P < 0.01, and the absolute value of Log 2 (B/A) is greater than 1. approximately 41 transcripts had low sequence homology to known sequences in public databases, suggesting that they might be putative novel immune-relevant genes in L. japonicus involved in the response to bacterial challenge. Among these novel sequences, 13 were differentially regulated by more than 100-fold, implying that they were strongly infection-responsive genes. Pro-Dom analysis identified one HSP domain-and one protein kinase domain-containing sequence [42]. SignalP and TMHMM programs distinguished 24 sequences with signal peptides or transmembrane domains, suggesting that they are cytokines and transmembrane proteins, respectively [43,44]. Observations presented in Additional file 5, Table S4 can provide guidance for further identification. In-depth functional studies of these novel sequences may benefit the exploration of potential marine fish-specific immune-relevant genes for application in the control of fish diseases.

Experimental validation of consensus sequences
To validate the integrity of RNA-seq results, representative consensus sequences with complete encoding regions, such as hepcidin, Myf5, SNARE, and two IL-8like CXC chemokine family members, were selected for experimental cloning and sequencing analyses by RT-PCR (Additional file 6, Table S5). All experimentally examined genes matched the RNA-seq-generated sequences perfectly. One of the two IL-8-like CXC chemokines was newly discovered by this study. The two IL-8-like CXC chemokine family members were identified through phylogenetic analysis. Both sequences conserved the four cysteine residues (C 46 -C 48 -C 74 -C 91 ) that are the hallmarks of IL-8 CXC chemokines and can be found throughout the vertebrate IL-8 family (Additional file 7, Figure S2 and Additional file 8, Figure S3). This demonstrates the reliability of RNA-seq results and indicates the necessity for further identification of immunerelated genes in L. japonicus.

Discussion
The transcriptome is the complete repertoire of expressed RNA transcripts in a cell. Its characterization is essential in deciphering the functional complexity of the genome and in obtaining a better understanding of cellular activities in organisms, including growth, development, disease, and immune defence. The definition of the transcriptome has long been a challenging task. Traditionally, global gene expression analysis has relied mostly on several approaches, including RNA hybridisation on high-density arrays, whole-genome tiling arrays, expressed sequence tag (EST), serial analysis of gene expression (SAGE), and SAGE-derived technologies, which include massively parallel signature sequencing (MPSS) and polony multiplex analysis of gene   expression. However, these approaches have several inherent limitations. For example, the array-based approaches allow detection of specific sequences only and capture the transcriptome while ignoring splicejunction information or alternative splicing events. The EST approach provides only partial sequences of individual cDNA clones, is sensitive to cloning biases, and is associated with high costs and difficulties in data analysis. SAGE and MPSS are also costly and cannot be used for splicing events [9]. Thus, the newly developed Solexa/Illumina RNA-seq and DGE high-throughput deep sequencing approaches have dramatically changed how functional complexity of the transcriptome can be studied. These approaches overcome many of the inherent limitations of traditional systems, making the detection of alternative splicing events and low-abundance transcripts possible. They have been applied recently to several species, such as yeast, Arabidopsis, Chlamydomonas, Zebrafish, Drosophila, Caenorhabditis, and human, for different purposes [9,10,[18][19][20][21][22][23][24][25][26][27][28][29].
In this study, the transcriptome profile analysis of bacteria-challenged L. japonicus was conducted through these two approaches in an attempt to gain deep insights into the immunogenetics of a marine species. As expected, a large set of transcriptional sequences with complete or differing lengths of encoding regions was generated. KEGG analysis showed that more than 52% of transcripts are enrichment factors involved in approximately 219 known metabolic or signalling pathways, including cellular growth, differentiation, apoptosis, migration, endocrine, and immune system processes. Further, more than 8% of transcripts represent novel fish-specific genes that have never been described previously. Detailed analysis of immune-relevant genes and pathways showed that more than 2,673 transcripts are homologous to known immune-relevant genes, whereas approximately 2,082 transcripts can be enriched in various immune-relevant metabolic or signalling pathways. Challenging the fish with V. harveyi resulted in large alterations of the host transcriptome profile, including significant up-or down-regulation of 1,224 transcripts, among which 41 sequences might be novel immunerelevant genes in fish. In addition, several other biological processes that have not been linked to host immunity before, such as the metabolism of carbohydrates, amino acids, and lipids; activation of ATPase, NADH dehydrogenase, NAD kinase, and tyrosine protein kinase; and up-regulation of nuclear receptors, replication initiators, and ribosomal proteins, were found to be dramatically involved in host immune response. These significantly regulated transcripts might represent strong infection-responsive genes in L. japonicus, and reflect a number of immune activities during fish defence against bacterial challenge. The transcriptome profiling data sets obtained in this study provide strong basis for future genetic research in marine fish and support further in-depth genome annotation in vertebrates. Future molecular and functional characterisation of infectionresponsive genes could lead to global identification of immune-relevant genes and infection markers in marine fish.
At present, transcriptome analysis in fish relies mainly on the EST approach [45,46]. Although there have been an increasing number of ESTs sequenced in a large number of libraries in various fish species, including rainbow trout (Oncorhynchus mykiss), Atlantic salmon (Salmo salar), medaka (Oryzias latipes), and zebrafish (Danio rerio), the immune-relevant transcriptional profiling data sets obtained from fish are still insufficient. Recently, DGE-and microarray-based transcriptome profiling studies performed in zebrafish revealed that zebrafish and its developing embryo are useful in vivo Limitations of all differentially expressed genes are based on P ≤ 0.01. The absolute value of Log 2 Ratio≥1 and FDR≤0.05 showed that the genes were significantly altered after bacterial challenge. The cutoff E-value was set to 1e-10 during the gene annotation process. The absolute value of "Fold Change" means magnitude of up-or down-regulation for each gene/homolog after bacteria challenge; "+" indicates up-regulation, "-" indicates down-regulation.
models for the identification of host determinants of responses to bacterial infection [9,10]. However, transcriptional information on immune responses to infection in a non-model marine fish remains elusive. Therefore, the large set of immune-relevant genes and their role in responses to bacterial challenge in L. japonicus presented in this study may largely improve knowledge on fish immunogenetics in other analytical systems. The present study also demonstrates the advantages of new deep sequencing approaches for gene discovery, thus providing new leads for functional studies of candidate genes involved in host-bacteria interactions. The RNA-Seq and DGE analyses conducted in this study were found to complement each other well. RNA-Seq was very effective in unravelling transcriptome complexity, and can detect a large set of genes, including numerous low-expressing genes or novel genes. DEG data can be merged with RNA-Seq data sets, indicating an affordable method for comparative gene expression study. Thus, RNA-Seq was initially performed in this study to provide strong reference transcriptome database for subsequent DGE analysis. Emerging hallmark components and the cells necessary for innate and adaptive immunity in higher vertebrates have been identified in fish [47,48]. This was the basis for the widely accepted notion that innate and adaptive immunity was established in teleosts about 470 million years ago. However, the exact molecular and cellular basis of immune systems in teleosts remains poorly understood. The precise regulatory mechanisms underlying the innate and adaptive immunity of teleosts remain vague due to the limited immune-relevant genetic information available in fish. The present work on the definition of high-throughput transcriptome data set of the immune system of L. japonicus may contribute greatly to better understanding of the molecular and cellular activities involved in fish immunity. Results unexpectedly showed that the fish immune system is more complex than previously thought. On one hand, the substantial amount of immune-relevant genes involved in metabolic and signalling pathways and the induction of genes encoding cell surface receptors, signalling intermediates, transcription factors, and inflammatory mediators show a clear conservation of mechanisms detected in other vertebrate models, including humans. On the other hand, a large set of novel immune response genes and infection markers that have never been linked previously to immune responses in other vertebrate systems was identified in L. japonicus, indicating the existence of numerous fish-specific immune activities during early vertebrate evolution.
For instance, the TLR family is the most important class of pattern recognition receptors that play crucial roles in mediating immune responses to pathogenic microorganisms [8,49,50]. Triggering of TLRs by ligands leads to the recruitment of adaptor proteins, resulting in the activation of a range of transcription factors, such as NF-B, activator protein 1 (AP-1), and IFN regulatory factors (IRFs), through distinct signalling pathways. This eventually leads to the downstream activation of proinflammatory cytokines and receptors, such as IFN-α/β, TNF-α, IL-2, IL-6, IL-8, IL10, CD40, CD86, and MIP1α. To date, 13 TLRs (TLR1-13), at least five adaptor proteins (MyD88, Mal/TIRAP, TIR domain-containing adaptor protein, TRIF/TICAM1, TRAM/TICAM2, and SARM), and numerous downstream effectors have been described in mammals and humans. In the present study, a series of TLRs and corresponding adaptor proteins and downstream effectors were identified in L. japonicus. The identified TLRs include the majority seen in mammals and humans (TLR1-13), and four TLRs (TLR14, TLR18, TLR21, and TLR23) seen in fish species. Adaptor proteins and downstream effectors identified include the majority known in mammals and humans, including MYD88, BTK, TOLLIP, FADD, HMGB1, HRAS, HSPD1, CASP8, MAPK8IP3, PELI1, RIPK2, SARM1, TICAM2, TIRAP, EIF2AK2, IRAK1, IRAK2, MAP3K7, MAP3K7IP1, NR2C2, PPARA, PRKRA, TRAF6, UBE2N, and UBE2V1. These adaptor proteins and downstream effectors have been found to be well enriched in various known TLR signalling pathways. Downstream transcriptional factors and proinflammatory cytokines mediated by these pathways, including NF-B, JNK/p38, NF/IL6, IRF, IFN-α/β, TNFα, IL-2, IL-6, IL-8, and IL-10, was also be identified successfully. These suggest that TLR mechanisms are conserved from fish to mammals throughout vertebrate evolution. A putative draft of TLR signalling pathways in L. japonicus based on knowledge of TLR signalling in mammalian species was constructed ( Figure 4). However, TLR signalling pathways in fish might be more intricate compared with those in mammalian species because of the novel TLRs (TLR14, TLR18, TLR21, and TLR23). An in-depth study of novel TLRs will improve understanding of fish-specific innate immunity in early vertebrates and even the complete evolutionary history of TLR-based innate immunity. DGE analysis revealed that TLR-1, -3, -13, -18, -21 and their signalling intermediates (Rac1, AKT, CASP8, IRAK1, TRAM, IRAK1, IKK alpha/beta, IRF7, and STAT1) were up-or downregulated dramatically at different levels in the pathway upon bacterial challenge (P ≤ 0.01). This provides evidence that both conserved (TLR-1, -3, -13) and fish-specific (TLR-18, -21) TLR-based immunity participates in fish defence against bacterial challenge. The innate immune system is generally believed to represent the evolutionarily ancient aspect of vertebrate immunity. As a representative of lower vertebrates, fish is suggested to possess stronger innate immune responses. However, fish adaptive immunity might be more primitive because of limited immunoglobulins and hallmark components necessary for adaptive immunity identified in this species [7]. In recent years, several hallmarks for T and B cells (TCR, BCR, CD3, CD4, and CD8), antigen presenting and processing molecules (MHCI, MHCII, and DC-SIGN/CD209), co-stimulatory factors (CD80/86, CD83, CD154, and CD40), and immunoglobulins (IgM, IgD, and IgZ/T) have been identified in teleost fish, thus providing preliminary evidence that the adaptive immune system might also be well-established in fish. However, the precise molecular and cellular bases and mechanisms underlying teleost adaptive immunity are still uncharacterised and require further immunogenetic studies. The present study successfully identified a large number of adaptive immunerelevant components homologous to those in higher vertebrates, providing abundant data sets for insights into the characterisation and origin of adaptive immunity in early vertebrates. Data sets imply that adaptive immunity in teleost fish seems to be much more complicated than previously believed. The basic components and signalling pathways necessary for adaptive immunity exist in fish, and a majority showed clear conservation between fish and mammals. For instance, T cell receptor (TCR) signalling pathways regulate T cell activation, one of the most important processes in adaptive immunity [51]. Majority of the four types of TCRs (TCR-α/β, -γδ) and numerous signalling transducers (Zap70, Lyn, LCK, SHP1, CD3, ITAM, LAT, Fyb, SLP-76, CBL, NCK, LAT, GRB2, CARMA1, NFAT, AP1, MALT1, and GRB2) discovered in humans and mammals can be identified in L. japonicus. DGE analysis showed that a number of TCR signalling pathway members, including TCR beta chain, Zap70, LCK, SHP1, CARMA1, Vav, NFAT, GRB2, MALT1, NCK, and Raf1, are induced significantly after bacterial challenge (P ≤ 0.01). These pathway members largely contribute to the proliferation and activation of T cells in mammals, thus suggesting that TCR signalling mechanisms underlying the T cell activation might be conserved between teleost fish and mammals. A putative draft of TCR signalling pathways based on knowledge of pathways known in mammals was constructed ( Figure  5). Future studies on these pathways are expected to not only enrich current knowledge on fish immunology but also contribute to better understanding of the evolutionary history of adaptive immunity.

Conclusions
This study investigated the transcriptome profile of bacteria-challenged L. japonicus using Solexa/Illumina RNA-seq and DGE deep sequencing technologies. The substantial amount of transcripts obtained provides a strong basis for future genomic research on marine fish and supports in-depth genome annotation in vertebrates. Globally identified immune-candidate genes, infection markers, and putative signalling pathways in L. japonicus revealed that the immune system of fish may be much more complex than previously believed. A considerable amount of immune-relevant genes and pathways in fish showed significant similarity to vertebrate models, suggesting that mechanisms underlying the innate and adaptive immunity in fish may be conserved in higher vertebrates. In addition, a large set of novel immune response genes that have never been linked previously to immune responses in other vertebrate systems indicate the existence of numerous fishspecific immune events during early evolution. This suggests that innate and adaptive immunity might be well established in teleost marine fish. Findings provide deep insight into the immunogenetics of fish species, which can be clinically applied in the therapy of fish diseases. They also contribute to a better understanding of the evolutionary history of innate and adaptive immunity from fish to mammals.

Experimental fish
One-year-old Japanese sea bass of both sexes, weighing 48.6 ± 2.5 g, were obtained from the fishery institute of Zhejiang, China. They were kept in running aerated seawater (salinity 30) at 25°C and fed with commercial pellet food at a daily ration of 0.7% body weight. All fish were maintained in the laboratory for at least two weeks prior to experimental use to allow for acclimatisation and evaluation of overall fish health. Only healthy fish, as determined by general appearance and level of activity, were used in the experiment.

Bacterial strain
Wild-type marine fish-virulent V harveyi strain (96-915), a pathogen for bacterial septicaemia in L. japonicas, was maintained in the laboratory. It was cultured in Thiosulfate Citrate Bile Salts Sucrose at 27°C overnight. The desired number of cells was adjusted to 5 × 108 CFU/ ml. Cells were inactivated with 5% formalin at 27°C overnight before thorough washing with sterile PBS (pH 7.0). They were re-suspended in PBS prior to use.

Bacterial challenge and RNA preparation
Fish in the experimental groups were inoculated intraperitoneally with 0.2 ml of V harveyi at 1 × 10 8 CFU per fish. In parallel, fish in the control groups were administrated with 0.2 ml of mock PBS. Both groups were kept under conditions as described above. At seven days post-challenge, fish were sacrificed after anaesthesia, and tissues from the head kidney and spleen were collected. Tissue samples from 15 fishes were mixed for RNA preparation. Total RNA was isolated using a TRIzol reagent (Gibco BRL) following the manufacturer's instructions and treated with RNase free DNase I (Qiagen). RNA concentrations were measured using a spectrophotometer and integrity was ensured through analysis on a 1.5% (w/v) agarose gel.

Sample Preparation for RNA-seq
After RNA extraction, poly-A-containing mRNAs were purified using oligo-dT-attached magnetic beads and fragmented into small pieces using divalent cations under elevated temperature. Cleaved RNA fragments were copied into first strand cDNA using reverse transcriptase and random primers. This was followed by second strand cDNA synthesis using DNA polymerase I and RNase H. These cDNA fragments underwent end repair process, addition of a single 'A' base, and ligation of adapters. Products were subsequently purified and amplified through PCR to create the final cDNA libraries.

Transcriptome analysis
Transcriptome sequencing was conducted using Solexa/ Illumina RNA-seq. Four fluorescently labelled nucleotides and a specialised polymerase were used to determine the clusters base by base in parallel. The 75-bp raw PE reads (submitted to GEO database, Association No. GSE21721) were generated by the Illumina Genome Analyzer II system. Raw reads were then assembled into non-redundant consensus sequences using Grape, tgicl, and CAP3 softwares [52,53]. All sequences were examined for possible sequencing errors. Adaptor sequences were trimmed using the Cross_Match software in the Phrap package http://www.phrap.org/ [54]. Short sequences (< 100 bp in length) were removed using custom Perl program [55]. The resulting high-quality sequences were assembled into sequence contigs with the TGICL program, which creates an assembly using CAP3. Sequence homology searches were performed using local BLASTall programs against sequences in NCBI non-redundant (nr) protein database and the Swissprot database (E-value < 1e-10). Genes were tentatively identified according to the best hits against known sequences. Assembled consensus sequences were used to determine the GO term, COG term, and were analyzed further using KEGG.

DGE-tag profiling
DGE analysis included sample preparation and sequencing. Sequence tag preparation was performed using the Digital Gene Expression Tag Profile Kit (Illumina) according to the manufacturer's instructions. Briefly, 6 μg total RNA was used for mRNA purification using oligo-dT magnetic bead adsorption and oligo-dT was used to guide reverse transcription for double-stranded cDNA synthesis. The generation of 5' ends of tags was done using endonuclease NlaIII, which recognizes and cuts off the CATG sites on cDNA. cDNA fragments with 3' ends were purified through magnetic bead precipitation, and Illumina adapter 1 was added to the 5' ends. The junction of Illumina adapter 1 and CATG site was the recognition site of MmeI, which cuts 17 bp downstream of the CATG site, producing tags with adapter 1. After removal of 3' fragments with magnetic bead precipitation, the 21-bp unique tags with adaptor 1 were purified and ligated to adaptor 2 to form a cDNA tag library. These adapter-ligated cDNA tags were enriched after 15 cycles of linear PCR amplification. The resulting 85-bp fragments were purified by 6% TBE polyacrylamide gel electrophoresis. Fragments were then digested and the single-chain molecules were fixed onto the Solexa Sequencing Chip (flowcell). Sequencing by synthesis was performed using the Illumina Genome Analyzer II system according to the manufacturer's protocols. Image analysis, base calling, generation of raw 17-bp tags, and tag counting were performed using the Illumina pipeline. Raw data (tag sequences) were deposited in the GEO database under submission number GSE21712.
Aligning DGE tags to reference transcriptome data set Clean tags and count number of DGE libraries from bacteria-and mock-challenged groups were collected and summarised using custom Bio-perl scripts. All tags were mapped to the reference transcriptome generated by RNA-seq. To monitor mapping events on both strands, both sense and complementary antisense sequences were included in the mapping process. Only perfect matches over the entire 21-bp length of the 17bp tag plus the 4-bp NlaIII recognition site were allowed. This study was limited to tags that mapped to ORFs only and cannot show tags that mapped to mRNA with long 3'UTRs.

Identification of differentially expressed genes
Rigorous algorithms were developed to identify differentially expressed genes between two samples. The correlation of the detected count numbers between parallel libraries was assessed statistically by calculating the Pearson correlation. In addition to the P value, FDR was manipulated to determine differentially expressed genes [56]. Assuming that R differentially expressed genes have been selected, S genes really show differential expression, whereas the other V genes are false positives. If the error ratio Q = V/R must remain below a cutoff (5%), FDR should not exceed 0.05. In this research, P ≤ 0.01, FDR ≤ 0.1, and the absolute value of Authors' contributions LXX and DH conceived and designed the study, participated in the bioinformatics analysis, and drafted the manuscript. WRD and YWZ performed the experiments and designed the tables. JZS conceptualized the project, reviewed the manuscript, and provided guidance. All authors read and approved the final manuscript.