Systematic sequencing of mRNA from the Antarctic krill (Euphausia superba) and first tissue specific transcriptional signature

Background: Little is known about the genome sequences of Euphausiacea (krill), although these crustaceans are abundant components of the pelagic ecosystems in all oceans and are used in aquaculture and the pharmaceutical industry. This study reports the results of an expressed sequence tag (EST) sequencing project from different tissues of Euphausia superba (the Antarctic krill).
Results: We have constructed and sequenced five cDNA libraries from different Antarctic krill tissues: head, abdomen, thoracopods and photophores. We have identified 1,770 high-quality ESTs, which were assembled into 216 overlapping clusters and 801 singletons, resulting in a total of 1,017 non-redundant sequences. Quantitative RT-PCR analysis was performed to quantify and validate the expression levels of ten genes presenting different EST counts in krill tissues. In addition, bioinformatic screening of the non-redundant E. superba sequences identified 69 microsatellite-containing ESTs. Clusters, consensus sequences and related similarity and gene ontology searches were organized in a dedicated E. superba database.
Conclusion: We defined the first tissue transcriptional signatures of E. superba based on functional categorization among the examined tissues. The analyses of annotated transcripts showed a higher similarity to genes from insects than to genes from Malacostraca, possibly as an effect of the limited number of Malacostraca sequences in the public databases. Our catalogue provides for the first time a genomic tool to investigate the biology of the Antarctic krill.



Background
Euphausiacea (krill) are small shrimplike crustaceans that are abundant in the pelagic ecosystems of all oceans. There are about 85 species of Euphausiacea, making this one of the smallest orders in the class of Malacostraca [1]. Phylogenetic analysis of the Eumalacostraca orders based on 28S rDNA sequences suggests that Euphausiacea are more closely related to Mysida than to the Decapoda [2].
In the Southern Ocean, krill is a critical link between primary productivity and most of the predators at higher trophic levels such as birds, fish, seals, squid and whales [3]. The krill biomass in the Southern Ocean has been estimated at 400-1550 million tons with sustainable annual harvest at around 70-200 million tons. Therefore, the krill biomass that could be available for human food is comparable to the biomass of all the other aquatic species currently fished by humans, but only six species of krill are at present harvested commercially [4,5]. Commercial fishing of krill is done in the Southern Ocean and around Japan. The global annual production amounts to 150,000-200,000 tons, most of it from the Scotia Sea [6,7]. Most of the fished krill is used as aquaculture and aquarium feed, as bait in sport fishing, or in the pharmaceutical industry.
The Antarctic krill (Euphausia superba, Dana 1852) has a circumpolar distribution with the highest concentrations in the Atlantic sector of the Southern Ocean. It is a key species of the Antarctic ecosystem and plays an important role both as a feeder on algae, bacteria and micro-zooplankton and as prey of vertebrates [8]. E. superba displays a large daily vertical migration that generally occurs within the upper 200 m of the water column, making a significant amount of biomass available as food for predators near the surface at night and in deeper waters during the day [9]. Basic knowledge of crustacean biology is limited by the lack of information about their genomes. Considering all orders in the class Malacostraca, no genome has yet been fully sequenced. At present GenBank carries 116,640 nucleotide and 11,932 protein sequences (Table 1), with a high rate of redundancy. Currently only 434 nucleotide and 310 protein sequences have been identified in Euphausiacea (GenBank source, release of November 2007). Specifically for E. superba, only 69 nucleotide and 17 amino acid sequences have been obtained; they identify key proteins and enzymes of oxidative phosphorylation (NADH dehydrogenase subunit 1, 2, 3, 4, 4L, 5; cytochrome oxidase subunit I, II, III; ATP synthase subunit 6; cytochrome b; cytochrome b apoenzyme and cytochrome c oxidase subunit I) and of phototransduction (opsin). In the subphylum Crustacea there are 33 complete (or nearly complete) mitochondrial DNA sequences: 4 from Branchiopoda, 8 from Maxillopoda, 18 from Malacostraca, and one each from Ostracoda, Cephalocarida and Remipedia (Table 2). In a previous investigation Machida et al. [10] determined the nearly complete DNA sequence of the mitochondrial genome of E. superba (14,606 bp).
The identification of novel shrimp genes by systematic sequencing of genomic DNA is hindered by the dispersion of the genes among large non-coding regions and by the presence of introns within genes. Current genomics technologies, like SAGE [11], differential display [12] and systematic sequencing of expressed sequence tags [13,14], are very useful approaches to identify protein coding genes rapidly on a large scale. Moreover, the frequency of a given sequence in the SAGE or cDNA libraries can be related to the relative abundance of the corresponding mRNA, giving an indication of the level of gene expression [15].
The aim of our study was to significantly increase the number of krill genes in the public databases and to discover tissue-specific genes. For this purpose we have produced and sequenced five cDNA libraries from different Antarctic krill tissues: head, abdomen, thoracopods and photophores. We have developed special cDNA libraries optimized for directional cloning of full-length cDNA in plasmid vectors without enzymatic digestion. We have identified 1,770 high-quality EST clones that have been grouped into 1,017 different clusters. Of these, 309 clusters were successfully annotated, while 708 did not show significant similarity with known genes from other organisms. Clusters, consensus sequences and related similarity and gene ontology searches were organized in a dedicated E. superba database [16].

EST assembly and construction of an Antarctic krill transcript catalogue
A total of 2,046 ESTs were initially analyzed for sequence quality, and vector sequences were recognized and deleted. Two hundred seventy-six low-quality ESTs were removed and 1,770 (86.5%) high-quality ESTs were further processed. These ESTs were assembled by similarity into 216 clusters and 801 singletons, resulting in a total of 1,017 non-redundant (consensus) sequences. The sequencing trend for each cDNA library is presented in Table 3. Interestingly, we obtained a low percentage of clusters composed of ≥ 2 ESTs from cDNA libraries prepared from the

Figure 1. Electropherograms of E. superba tissue-specific total RNAs. (A-D) Electropherograms resulting from Agilent 2100 bioanalyzer analysis of total RNA extracted from head, abdomen, photophores and thoracopods. X-axis: time of ribosomal RNA peak appearance, corresponding to the size of the fragment; Y-axis: fluorescence of the peak, corresponding to its concentration. The size and the concentration of the sample peaks are calculated by the software via comparison with an RNA ladder of known concentration (E). E. superba RNA samples showed some products with a migration time between 22 and 35 seconds (from 200 bp to 1,000 bp), indicating partial RNA degradation.

Each non-redundant sequence was searched against the nucleotide database and the UniProtKB database using Blast-N and Blast-X with e-value cut-offs of < e-40 and < e-10, respectively. These values were empirically chosen considering the small amount of sequence data available for Euphausiacea and similar shrimp species and the need for stringency in providing a reliable catalogue of Antarctic krill genes. All annotations were further manually examined in order to assign the best descriptive text to the corresponding cluster.
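The counts reported for the assembly can be cross-checked with a few lines of arithmetic. The sketch below uses only figures stated in the text (including the 309 annotated clusters mentioned elsewhere in the paper); the variable names and the helper `percent` are introduced here for illustration only.

```python
# Counts taken from the text; names are illustrative.
total_ests = 2046      # ESTs initially analyzed
low_quality = 276      # ESTs removed for low quality
clusters = 216         # clusters of >= 2 ESTs
singletons = 801       # ESTs that did not cluster
annotated = 309        # non-redundant sequences with a significant match

high_quality = total_ests - low_quality
non_redundant = clusters + singletons
unannotated = non_redundant - annotated


def percent(part, whole):
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


print(high_quality, percent(high_quality, total_ests))    # 1770 86.5
print(non_redundant)                                       # 1017
print(unannotated, percent(unannotated, non_redundant))    # 708 69.6
```

The last line is consistent with the statement that over 65% of the sequences had no significant BLAST match.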
Since very few abundant transcripts were found that could hamper the identification of rare transcripts, it seems plausible that random sequencing of our Antarctic krill libraries would continue to represent an effective strategy for identifying novel E. superba mRNAs.

Functional categorization of E. superba ESTs
In order to facilitate functional genomic studies in Antarctic krill, 309 consensus sequences showing similarity with known genes or proteins were grouped into 14 functional categories (Table 4) according to Gene Ontology [18] and other resources developed for gene functional annotation [19]. A list of all annotated transcripts is shown in Additional file 1. The largest fraction of ESTs (20.51%), displaying putative identity with ribosomal sequences and genes for the translation machinery, was grouped in the translation functional category, characterized by 8% of all known transcripts. We found genes with regulatory functions in translational initiation, such as translation factor SUI1 and initiation factors 4A and 3, and in translational elongation, such as elongation factors 1α, 1β and 2 and a specific tail muscle elongation factor 1γ. Other abundant E. superba sequences fall into gene categories related to cell structure, cell motility and functional homeostasis. For instance, genes involved in the mechanisms of DNA transcription (1.47%), transport (1.97%), skeletal muscle contraction (3.4%), and amino acid, fatty acid and carbohydrate metabolism (4.03%) are included in this category. This class also includes SEC61 β-subunit, an important transport protein that plays a crucial role in the insertion of secretory and membrane polypeptides into the endoplasmic reticulum, and cellular retinoic acid/retinol binding protein (RBP1), involved in the transport of retinol from the digestive gland to peripheral tissues [20]. A transcript included in this class (ID: KRC00589) shows good similarity with hemocyanin, the main oxygen carrier molecule in arthropods and molluscs [21].
Other interesting krill transcripts that we were able to annotate are those involved in stress responses, proteolysis and immune response (3.24%), such as Hsp90, chaperones, cathepsin L-like cysteine protease, a lysosomal cysteine proteinase [22], peptidyl-prolyl cis-trans isomerase A1 (cyclophilin A1) and peptidylprolyl isomerase B (cyclophilin B).
Cyclophilins are members of the immunophilin protein family, which play a role in immunoregulation and basic cellular processes involving protein folding and trafficking. ESTs with good similarity to hemocyanin are present in our collection: this protein has been recently reported to have antifungal and antiviral activities [23,24].
Some krill ESTs identify histone 2A (KRC00431) and histone 3.3A (KRC00024), indicating the presence of unexpected polyadenylated histone transcripts displaying the polyadenylation signal and tail. In vertebrates, these evolutionarily conserved housekeeping mRNAs are not polyadenylated, and this has been related to the high turnover of these transcripts in dividing cells. Interestingly, polyadenylated H2A and H3 histone sequences were also detected in the systematic sequencing of 3'-end cDNA libraries obtained from brain and kidney of the channel catfish Ictalurus punctatus [25,26] and from various tissues (haemolymph, gills, digestive glands, mantles and adductor muscles) of the mussel Mytilus galloprovincialis [27]. The presence of polyadenylation signals in E. superba histone transcripts deserves a more detailed analysis. In fact, Eirin-Lopez et al. [28] have recently shown that all histone genes in the repetitive unit are characterized by two different mRNA termination signals in their 3' UTR: the typical stem-loop or hairpin-loop signal followed by a purine-rich element, and a polyadenylation signal AATAAA located downstream of this last element. The presence of a double mRNA termination signal is unique to histone genes and common to other invertebrates such as Chironomus thummi [29], D. melanogaster [30], Chaetopterus variopedatus [31], M. galloprovincialis [32] and Crustacea [33]. Although in some invertebrates core histone transcripts (H2A, H2B, H3 and H4) include polyA tails, these sequences are among the most evolutionarily conserved eukaryotic proteins [34].

Transcriptional signature of E. superba tissues
Gene expression profiling depends on the functional specificity of the cells composing different tissues. Thus, the systematic sequencing of ESTs from unbiased cDNA libraries is a suitable approach for analyzing the gene expression profile of a given tissue [35]. In fact, the frequency of a given EST in the cDNA library can be related to the relative abundance of the corresponding mRNA in the source tissue.
To define tissue transcriptional signatures of E. superba, annotated ESTs obtained from the four tissue-specific cDNA libraries (head, abdomen, photophores, thoracopods) were separately grouped into 13 functional categories (Table 5); a further abundant category was created for those ESTs to which no function can yet be assigned. Fig. 2 shows four diagrams representing the EST distribution among functional categories in each cDNA library. The presence of highly represented functional categories is characteristic of strictly committed tissues such as abdomen and thoracopods, in which transcripts involved in striated muscle contraction are very abundant (about 26% in abdomen and 7% in thoracopods). In the abdomen library, we were able to recognize the principal structural components of the sarcomeric contractile machinery (myosin heavy chain, myosin light chain 1, myosin-2, actin, alpha-tubulin, tropomyosin) and two subunits of the troponin complex (troponin T, troponin I), a key regulator of muscle contraction. About 10% of the sequences produced from the head and thoracopods libraries fall into functional categories related to metabolic processes (amino acid, fatty acid and carbohydrate metabolism). Interestingly, about 6% and 4% of the ESTs sequenced from the head and thoracopods libraries, respectively, identified structural constituents of cuticle (arthrodial cuticle protein AMP16.3, arthrodial cuticle protein AMP1A, calcification-associated peptide-1 precursor). This reflects the presence of cuticle traces in the head and thoracopods samples. In photophores and thoracopods, transcripts displaying putative identity with ribosomal sequences are more abundant than in other tissues (55% and 46%, respectively), indicating a relevant activity of the translation machinery.
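The idea that EST frequency in an unbiased library approximates relative transcript abundance can be illustrated with a short sketch. The function name `category_frequencies` and the toy library below are hypothetical and are not the paper's data; the toy counts were simply chosen so that muscle-contraction ESTs make up 26% of the library, as reported for abdomen.

```python
from collections import Counter


def category_frequencies(est_categories):
    """Return the percentage of ESTs per functional category in one library."""
    counts = Counter(est_categories)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: 100.0 * n / total for category, n in counts.items()}


# Hypothetical toy library: one functional category label per sequenced EST.
toy_library = (
    ["muscle contraction"] * 13
    + ["translation"] * 17
    + ["unknown function"] * 20
)

freqs = category_frequencies(toy_library)
for category, pct in sorted(freqs.items(), key=lambda kv: -kv[1]):
    print(f"{category}: {pct:.1f}%")
```

Comparing such per-library percentages across tissues is what the text refers to as a tissue transcriptional signature; with small libraries the estimates are coarse, which is one reason the authors validate selected genes by quantitative RT-PCR.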
We have also identified from the head cDNA library a novel opsin sequence (ID ESTs: KRC00735, KRC00802), a light-sensitive, membrane-bound G protein-coupled receptor that mediates the conversion of a photon of light into an electrochemical signal in the visual transduction cascade. In insects there are at least four main spectral classes: long-wavelength-sensitive (LWS), middle-wavelength-sensitive (MWS) and two short-wavelength-sensitive (SWS) groups. The opsin sequences available for E. superba (GenBank accession no. DQ852576-DQ852580) show a spectral sensitivity with short wavelength (496-501 nm, λ max = 487) and cannot be aligned with our consensus [36].
Quantitative RT-PCR analysis was performed to quantify and validate the expression level of some genes presenting different EST counts in krill tissues. We selected ten genes (compound eye opsin BCRH1, myosin light chain, myosin heavy chain, arthrodial cuticle protein AMP16.3, tail muscle elongation factor 1 gamma, cellular retinoic acid/retinol binding protein, eukaryotic initiation factor 4A, transport protein SEC61 subunit gamma, chromodomain helicase DNA binding protein and voltage-dependent calcium channel) representative of different levels of transcript abundance.
The housekeeping gene 18S rRNA was used as endogenous control. As reported in Additional file 2, the expression values obtained with quantitative RT-PCR for the tested transcripts were in agreement with the EST counts in the four libraries. In particular, we have demonstrated that the compound eye opsin is strongly expressed in head compared to other tissues, and that myosin light chain and myosin heavy chain are highly expressed in abdomen and thoracopods, confirming their key role in the contractile machinery. In contrast, the eukaryotic initiation factor 4A is expressed at about the same level in all tested tissues.

Identification of microsatellite-containing ESTs
Among the 1,017 non-redundant sequences examined in this study, 41 (4%) microsatellite-containing consensus sequences were identified by using MISA software. Twelve of these consensus sequences present 2 distinct simple sequence repeats interrupted by more than 100 bp, for a total of 69 identified microsatellites (SSRs). The majority of these sequences (72%) fall into the 3 bp repeat type class, with a preponderance of GAA and GAT. After a manual inspection of redundancy, raw sequence, data quality and the presence of sufficient flanking sequences, we designed 9 pairs of specific PCR primers. We obtained successful amplifications for 6 of these 9 pairs of primers. Assessment of polymorphism information content (PIC), observed and expected heterozygosity and other population genetics analyses will be performed in the near future. These markers will add to the currently available Euphausiacea SSR markers. In fact, only five microsatellite loci isolated from the northern krill Meganyctiphanes norvegica have been reported so far [37]. Since our novel microsatellite markers were developed on the basis of expressed sequences and are presumably conserved across other Euphausiacea species, they could also be useful for comparative mapping and for a molecular approach to Antarctic krill ecology.

Conclusion
Since genome sequencing and BAC libraries of Antarctic krill are not yet available, EST sequencing from randomly selected cDNA clones represents a powerful approach to identify large numbers of transcripts that could be used in gene expression and functional genomics studies [38]. The systematic sequencing of four cDNA libraries prepared from different E. superba tissues has allowed us to establish an EST database containing 1,017 unique sequences. Over 65% of the Antarctic krill sequences resulted in no BLAST matches with published sequences, and they probably represent novel genes that could be functionally characterized. We have defined the transcriptional signatures of krill tissues and performed qRT-PCR to validate the level of expression of ten representative genes. All sequencing data have been deposited in the E. superba EST database available from our web site [16]. In addition, the EST collection is a potential source for the development of genetic markers, including microsatellites and single nucleotide polymorphisms. Among the 1,017 unique sequences, 41 (4%) unique microsatellite-containing ESTs were identified by using MISA software. Moreover, we have designed and successfully tested 6 pairs of specific PCR primers for microsatellite loci.
Our EST catalogue could provide a source for the design of microarray platform that will allow the study of the

Tissues samples, RNA extraction and quality control
Antarctic krill (Euphausia superba) were fished from the Ross Sea (longitude: 167°28'81" E - 179°54'68" W, latitude: 68°40'54" S - 77°01'81" S) in January 2004 during the XIX Italian Antarctic Expedition. Specimens were collected at different times of the day (01:00, 06:00, 10:00, 15:00, 18:00), over a complete 24-hour cycle. Samples were frozen at -40°C in RNA stabilization solution (RNAlater, Ambion). For each catch, selected tissues (head including compound eyes and brain, abdomen, thoracopods and photophores) from five animals were dissected individually in RNAlater-ICE solution (Ambion). After dissection, tissues were rapidly rinsed in sterile water, weighed, frozen in Trizol reagent (Invitrogen) and stored at -80°C. A large excess of Trizol (15 ml for 0.5-1.5 g of sample) was used in order to prevent RNA degradation by endogenous RNases. Frozen tissues were minced and homogenized for 3-5 min using an ultra-turrax-T8.01 blender (IKA-Werke). Total RNA was isolated using the Trizol reagent (Invitrogen) following the manufacturer's instructions and further purified with LiCl in order to remove carbohydrate contaminants. All RNA samples were checked for quality by capillary electrophoresis (RNA 6000 Nano LabChip, Agilent Bioanalyzer 2100, Agilent Technologies). For each tissue (head, abdomen, thoracopods and photophores), equal amounts of total RNA (2 μg) extracted from every collection were pooled. See Table 5 for more details.
We have developed a new method using a combination of SMART protocol (Clontech), ensuring almost full-length cDNA, and Gateway technology (Invitrogen), allowing unidirectional cloning without enzymatic digestion. In this protocol, only fully-transcribed first strand cDNA (ss cDNA) is tagged with a short sequence complementary to a modified SMART oligo (Fig. 3).

Computer Management of Data
Trace2dbest and Partigene [41] were used to process chromatograms, cluster sequences, and build an annotation database. Trace2dbest extracts sequences and quality information from traces (Phred algorithm), removes vector contamination and poly(A), and trims low-quality sequences. Sequences shorter than 100 bp were discarded. Partigene reads all sequence files and performs the assembly in two steps: 1) CLOBB software [42]
Each cluster annotation in our database was further manually examined to assign the best descriptive text to the corresponding cluster: matches with expectation values greater than e-10 for protein (Blast-X) and e-40 for nucleotide (Blast-N) were considered poorly informative. Moreover, for each UniProt ID taken from the Blast-X description field, we associated a specific Gene Ontology annotation that integrates information about process, function, and component. Clusters, consensus sequences and related similarity and gene ontology searches were electronically organized and stored in a dedicated PostgreSQL database.
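The two numeric filters described above (minimum sequence length and e-value cut-offs) can be sketched as follows. This is not the Trace2dbest or Partigene code; the function names, the dictionary layout and the toy hits are assumptions made for illustration, and "e-10" and "e-40" are read here as 1e-10 and 1e-40.

```python
MIN_LENGTH = 100  # sequences shorter than 100 bp were discarded

# Assumed reading of the cut-offs in the text: hits with an e-value above
# these thresholds are treated as poorly informative.
EVALUE_CUTOFF = {"blastx": 1e-10, "blastn": 1e-40}


def keep_sequence(sequence):
    """True if a trimmed sequence is long enough to be retained."""
    return len(sequence) >= MIN_LENGTH


def is_informative(program, evalue):
    """True if a hit meets the cut-off for the given BLAST program."""
    return evalue <= EVALUE_CUTOFF[program]


# Hypothetical hits: (cluster ID, program, e-value); IDs are made up.
toy_hits = [
    ("cluster_A", "blastx", 3e-25),
    ("cluster_B", "blastx", 2e-5),
    ("cluster_C", "blastn", 1e-60),
    ("cluster_D", "blastn", 1e-20),
]

informative = [cid for cid, prog, ev in toy_hits if is_informative(prog, ev)]
print(informative)  # ['cluster_A', 'cluster_C']
```

Using a much stricter threshold for nucleotide matches than for protein matches reflects the stated aim of a reliable catalogue given the few related sequences available.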

Identification of microsatellite containing ESTs
The unique sequences were screened for microsatellites by using the MISA software [46]. Only di-, tri-, tetra-, penta- and hexanucleotide repeats were targeted, since mononucleotide repeats are not useful for mapping or population genetics due to difficulties in their genotyping. The following minimum repeat numbers were used to search for microsatellites: 6 repeats for dinucleotides and 5 repeats for tri-, tetra-, penta- and hexanucleotides. Primers were designed for the flanking regions of the SSRs using the web-based software Primer3 [47], based on the criteria of 50% GC content, a minimum melting temperature of 55°C, and absence of secondary structure. Primers ranged from 18 to 27 nucleotides in length and amplified products of 150-390 bp. The primers were synthesized with a 5'-KS-tail (KS sequence: 5'-cgaggtcgacggtatcg-3'), allowing amplification of microsatellite alleles in combination with a 5'-fluorescent-labeled KS primer [48].

Figure 3. Schematic diagram of the method used for the construction of the cDNA libraries from different krill tissues.
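The repeat thresholds described above can be expressed as a small regular-expression search. This is a simplified sketch, not the MISA program itself: the function names are hypothetical, only perfect repeats are reported, and compound microsatellites are not merged.

```python
import re

# Minimum number of repeats per motif length, as stated in the text.
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_repeat_of_shorter_unit(motif):
    """True if the motif is itself a repetition of a shorter unit (e.g. 'AA', 'ATAT')."""
    return re.fullmatch(r"(.+?)\1+", motif) is not None


def find_ssrs(sequence):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    sequence = sequence.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # A motif of unit_len bases followed by at least min_rep - 1 copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if is_repeat_of_shorter_unit(motif):
                continue  # skips mononucleotide runs and e.g. 'ATAT' counted as a 4-mer
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


# Hypothetical sequence with a (GAA)6 and an (AT)7 repeat.
toy_sequence = "CTGCT" + "GAA" * 6 + "CTTGC" + "AT" * 7 + "GGC"
print(find_ssrs(toy_sequence))  # [(5, 'GAA', 6), (28, 'AT', 7)]
```

Note that the motif reported depends on where the leftmost match starts, so the same repeat may be reported as GAA, AAG or AGA depending on the flanking bases.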

Quantitative RT-PCR
Quantitative RT-PCR was conducted for some genes using the same tissues tested (head, abdomen, thoracopods and photophores) to confirm the integrity and robustness of EST sequencing.
Three μg of total RNA from each tissue was used to perform three independent cDNA syntheses in a final volume of 10 μl, using random decamers and SuperScript II reverse transcriptase (Invitrogen). A 1 μl aliquot of diluted first-strand cDNA was PCR amplified in a 10 μl volume using SYBR Green chemistry, according to the manufacturer's recommendations (Finnzymes). Gene-specific primers were designed using Primer Express ® Software (Applera) to amplify fragments of 120-180 bp in length, close to the 3'-end of the transcript. To avoid the amplification of contaminant genomic DNA, we treated total RNA samples with DNase I (Qiagen). The dissociation curve was used to confirm the specificity of the amplicon. PCR reactions were performed in a 7500 Real-Time PCR System (Applied Biosystems). Thermal cycling conditions were as follows: 15 min denaturation at 95°C, followed by 40 cycles of a 30 sec denaturation step at 95°C and annealing and elongation steps for 1 min each at 60°C, and a final 3 min elongation at 72°C. To evaluate differences in gene expression, a relative quantification method was chosen in which the expression of the target gene is standardized by a non-regulated reference gene; consequently, three replicates of each sample and endogenous control were amplified. 18S rRNA was used as an endogenous control because the level of rRNA remains essentially constant from sample to sample (QuantumRNA™ 18S Internal Standards, Ambion). To calculate the relative expression ratio, the 2^(-ΔΔCt) (RQ, relative quantification) method implemented in the 7500 Real Time PCR System software [49] was used. This method determines the change in expression of a nucleic acid sequence (target) in a test sample relative to the same sequence in a calibrator sample.
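The relative quantification described above can be written out explicitly. The sketch below applies the standard 2^(-ΔΔCt) formula and is not the instrument software's implementation; it assumes equal (100%) amplification efficiency for target and reference, and the function name and all Ct values are hypothetical.

```python
from statistics import mean


def relative_quantity(ct_target_sample, ct_ref_sample,
                      ct_target_calibrator, ct_ref_calibrator):
    """Relative expression (RQ) of a target by the 2^(-ddCt) method.

    Each argument is a list of replicate Ct values. The reference is the
    endogenous control (18S rRNA in the text); the calibrator is the tissue
    against which the test sample is compared.
    """
    delta_ct_sample = mean(ct_target_sample) - mean(ct_ref_sample)
    delta_ct_calibrator = mean(ct_target_calibrator) - mean(ct_ref_calibrator)
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical triplicate Ct values for one target in a test tissue
# and in the calibrator tissue.
rq = relative_quantity(
    ct_target_sample=[22.0, 22.5, 21.5],
    ct_ref_sample=[10.0, 10.5, 9.5],
    ct_target_calibrator=[25.0, 25.5, 24.5],
    ct_ref_calibrator=[10.0, 10.0, 10.0],
)
print(rq)  # 8.0: the target is 8-fold more abundant than in the calibrator
```

In this toy case ΔCt is 12 in the test tissue and 15 in the calibrator, so ΔΔCt = -3 and RQ = 2^3 = 8.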
In our experiments, the expression of ten targets was tested (compound eye opsin BCRH1, myosin light chain, myosin heavy chain, arthrodial cuticle protein AMP16.3, tail muscle elongation factor 1 gamma, cellular retinoic acid/retinol binding protein, eukaryotic initiation factor 4A, transport protein SEC61 subunit gamma, chromodomain helicase DNA binding protein and voltage-dependent calcium channel) which displayed differential expression in the head, thoracopods and photophores, compared with the abdomen.

Availability and requirements
Project name: Systematic sequencing of mRNA from the Antarctic krill (Euphausia superba); Project home page: http://krill.cribi.unipd.it; Operating system(s): Debian GNU/Linux; Programming language: PHP; Licence: none; Any restrictions to use by non-academics: none.

Authors' contributions
CDP performed total RNA sample preparation, construction and systematic sequencing of the cDNA libraries, annotation of ESTs and qRT-PCR, and drafted the manuscript. CB conceived the study, carried out EST analysis and drafted the manuscript. GMM and GR participated in the systematic sequencing of cDNA libraries, in the design of the study and in the revision of the manuscript. FB performed the bioinformatic analysis of cDNA library sequence data, the clustering and annotation of ESTs and the identification of microsatellite-containing ESTs. BDN and AP participated in the development of the cDNA library production method and in the identification of microsatellite-containing ESTs. GL supervised the study, participating in the design and coordination of the work, the interpretation of the results and the revision of the manuscript. RC supervised the study, participating in the design and coordination of the work, the interpretation of data and manuscript writing. All authors read and approved the final version of the manuscript, declaring that they have no potential conflicts of interest.

Additional file 1
This