Comprehensive EST analysis of Atlantic halibut (Hippoglossus hippoglossus), a commercially relevant aquaculture species
© Douglas et al. 2007
Received: 07 March 2007
Accepted: 04 June 2007
Published: 04 June 2007
An essential first step in the genomic characterisation of a new species, in this case Atlantic halibut (Hippoglossus hippoglossus), is the generation of EST information. This forms the basis for subsequent microarray design, SNP detection and the placement of novel markers on genetic linkage maps.
Normalised directional cDNA libraries were constructed from five different larval stages (hatching, mouth-opening, midway to metamorphosis, premetamorphosis, and post-metamorphosis) and eight different adult tissues (testis, ovary, liver, head kidney, spleen, skin, gill, and intestine). Recombination efficiency of the libraries ranged from 91–98% and insert size averaged 1.4 kb. Approximately 1000 clones were sequenced from the 5'-end of each library and after trimming, 12675 good sequences were obtained. Redundancy within each library was very low and assembly of the entire EST collection into contigs resulted in 7738 unique sequences of which 6722 (87%) had matches in Genbank. Removal of ESTs and contigs that originated from bacteria or food organisms resulted in a total of 7710 unique halibut sequences.
A Unigene collection of 7710 functionally annotated ESTs has been assembled from Atlantic halibut. These have been incorporated into a publicly available, searchable database and form the basis for an oligonucleotide microarray that can be used as a tool to study gene expression in this economically important aquacultured fish.
Atlantic halibut is a cold-water flatfish native to the North Atlantic that shows excellent potential for production in aquaculture due to its highly prized white flesh. Flatfish have long been a choice food fish, with many members of the group (e.g., halibuts, flounders, soles, turbot, and plaice) having great commercial value, especially in Asia. With the general worldwide decline in the wild fishery, and the predicted global collapse of all currently fished species by the year 2048, it is crucial that alternatives such as aquaculture be pursued. Investigations into producing flatfish by aquaculture have been underway for the last fifteen to twenty years. Aquaculture production of Japanese flounder, turbot, Atlantic halibut and others has now been successfully achieved, although improvements in efficiency are still clearly required.
Production of Atlantic halibut is relatively recent and is currently underway in Norway, Iceland, Scotland and Canada. Significant hurdles must still be overcome, particularly with regard to judging when to spawn females, reproduction and sex determination, nutrition [3, 4], and enhancing disease resistance. The application of genomics technologies to thoroughly characterize the biological processes of reproduction, development, nutrition, and immunity promises to improve our knowledge of this poorly understood fish and provide for long-term enhancements in aquaculture production.
Flatfish, members of the order Pleuronectiformes, comprise a biologically interesting group of fish. During development, in a process known as metamorphosis, these fish reorient themselves to lie on one side, the body flattens, and the eye migrates to the other side of the body. This settling of the fish on the side vacated by the migrating eye requires a complex reorganization of skeletal, nervous and muscle tissues. Significant losses due to mortality in the early larval stages, as well as developmental abnormalities such as malpigmentation [6, 7], bone deformities, and incomplete eye migration, have hampered the successful production of flatfish. A better understanding of these processes at the molecular level and the impact that rearing conditions have on survival, metamorphosis, and growth will improve the commercial feasibility of flatfish aquaculture.
Expressed sequence tag (EST) surveys of species of interest provide a great deal of information-rich data on the expressed portion of an organism's genome and are invaluable in comparative genomics. In addition, they can be an important source of microsatellites and single nucleotide polymorphisms (SNPs) that can be used for genetic mapping. EST surveys in teleosts have been performed from both non-normalised cDNA libraries and libraries generated by suppression subtractive hybridisation in order to elucidate genes involved in immunity [13–28], muscle formation, endocrinology [30–32], and toxin production. The most highly represented teleost in dbEST is the zebrafish, Danio rerio, with over 1.3 million ESTs. Salmonids are also well represented, with over half a million ESTs. Economically important fish species such as catfish, cod, Japanese flounder, sea bream and sea bass have increasingly been the subject of genomic studies and these species are now represented by several thousand or even tens of thousands of ESTs.
Among the flatfish, considerable effort has been made to determine ESTs for Japanese flounder [13, 17, 22]. There are currently 8842 ESTs for this species, although most are unannotated. Furthermore, the majority of these sequences arise from non-normalised cDNA libraries or those constructed from immune-stimulated fish; few ESTs have been sequenced from non-immune tissues or different developmental stages of flatfish. For turbot, the other main commercially relevant flatfish, there are no published EST studies and most of the sequences from this species in GenBank are microsatellites or rRNA.
As a first step towards developing genomics tools for Atlantic halibut, a large-scale EST survey was performed, annotations undertaken where possible, and a searchable database set up. Considerable effort has been made to annotate these ESTs and associate them with Gene Ontology (GO) terms to facilitate subsequent microarray analyses. The species-independent structured GO vocabulary  is widely accepted and used in most large scale genome annotation projects.
Characteristics of Atlantic halibut normalised cDNA libraries
Avg. insert size (kb)
5 to 15
Midway to Metamorphosis
Most highly represented clones found in each cDNA library
14 kDa apolipoprotein
ribosomal protein L3
fish eggshell protein
NAD(P)H dehydrogenase quinone 1
myosin heavy chain
14 kDa apolipoprotein
apolipoprotein AI precursor
alpha actin 1
myosin light chain 2
Screening of EST sequences for short tandem repeats (2–5 bp) identified 129 that contained microsatellite sequences. Of these, 60 had 2 bp repeats (mostly GT or GA), 58 had 3 bp repeats, 7 had 4 bp repeats and 4 had 5 bp repeats. Sixty of these loci were polymorphic and were incorporated into our halibut linkage map (D. Reid, C. Smith, D. Martin-Robichaud, M. Reith, unpublished). All EST data have been deposited in GenBank (accession numbers EB029285-EB041700 and EB080851-EB080975), and preliminary annotations are available on the Pleurogene website.
Classification of Atlantic halibut unique sequences
Number of Sequences
No BLAST hit
BLAST hit >e-10 to unknown protein
BLAST hit >e-10 to unknown EST
Functionally annotated protein
BLAST hit >e-10 to known protein
Domain name-containing protein
Gene Ontology (107)
We also performed a similar analysis on the 4110 partially annotated Atlantic halibut sequences deposited in GenBank by other groups. After assembly, a unigene set of 2337 sequences was obtained, of which 40 were rRNA, 781 were unclassified, 80 matched unassigned proteins and 1436 had informative annotations. However, of the informative annotations, 531 fell into five clusters: nucleoside diphosphate kinase B (73), cytochrome c oxidase subunit III (109), cytochrome c oxidase subunit II (182), cytochrome c oxidase subunit I (62) and cytochrome b (105). Similarly, of the 788 sequences that received COG annotations, over half (487) were associated with energy production and conversion, and were predominantly of mitochondrial origin.
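The breakdown above can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the counts quoted in the text; the variable names are illustrative and not part of the original analysis pipeline.

```python
# Counts as reported in the text for the 2337-sequence unigene set
# assembled from ESTs deposited by other groups.
unigene_total = 2337
classes = {
    "rRNA": 40,
    "unclassified": 781,
    "unassigned protein": 80,
    "informative annotation": 1436,
}

# The five large clusters named in the text, with their sizes.
large_clusters = {
    "nucleoside diphosphate kinase B": 73,
    "cytochrome c oxidase subunit III": 109,
    "cytochrome c oxidase subunit II": 182,
    "cytochrome c oxidase subunit I": 62,
    "cytochrome b": 105,
}

# The four classes should account for every unigene.
assert sum(classes.values()) == unigene_total

clustered = sum(large_clusters.values())
remaining_informative = classes["informative annotation"] - clustered
print(clustered, remaining_informative)  # 531 905
```

The 905 informative annotations left after setting aside the five clusters correspond to the "slightly over 900" figure quoted for these sequences in the discussion.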
When the complete EST set of 12675 sequences was searched for GO terms using the GOSLIM classification, two libraries were substantially more enriched in GOSLIM terms than the others: the ovary library had 4496 hits and the testis library had 3371 hits. Some of this enrichment can be explained by the increased number of ESTs sequenced from these libraries (50%), but even taking this into account, there are still many more GOSLIM terms in each of these libraries than in the other tissue-specific libraries, which have between 1017 (gill) and 1871 (liver) terms. Of the larval libraries, that from the mouth-opening stage had 2415 hits to GOSLIM terms. Interestingly, the library constructed from larvae midway to metamorphosis had only 443 hits to GOSLIM terms. The remainder of the libraries had between 1444 and 1788 GOSLIM terms.
Classification of Atlantic halibut unique sequences according to COG
Amino acid transport and metabolism
Carbohydrate transport and metabolism
Coenzyme transport and metabolism
Inorganic ion transport and metabolism
Lipid transport and metabolism
Nucleotide transport and metabolism
Secondary metabolites biosynthesis, transport and catabolism
Energy production and conversion
Total metabolism and energy
Cell cycle control, cell division, chromosome partitioning
Chromatin structure and dynamics
Replication, recombination and repair
RNA processing and modification
Translation, ribosomal structure and biogenesis
Posttranslational modification, protein turnover, chaperones
Total nucleic acid processes
Cell wall/membrane/envelope biogenesis
Total cell structure
Intracellular trafficking, secretion, and vesicular transport
Signal transduction mechanisms
General function prediction only
Most commonly represented KEGG classifications of Atlantic halibut unique sequences
Purine & pyrimidine metabolism
SNARE interactions and vesicular transport
Arginine and proline metabolism
Complement and coagulation cascades
Arachidonic acid metabolism
To improve our understanding of flatfish biology and the problems associated with their development and rearing, a comparative genomics program focusing on Atlantic halibut and Senegal sole (Solea senegalensis) has been initiated (see ). As a prelude to construction of a DNA microarray, the EST survey reported here has been carried out.
Two previous EST surveys have been conducted in Atlantic halibut: one from a study of the effect of vaccination, resulting in approximately 1000 sequences, and a second from an investigation of muscle somite formation, resulting in approximately 4250 sequences. The study reported here greatly enriches the genomic resources for this commercially important flatfish by adding more than 12,000 ESTs to the partially annotated sequences that had already been deposited.
Over 5000 of the 7710 unique transcripts represented by the ESTs have been functionally annotated. These annotations and the development of a searchable database containing all of the information associated with each EST add enormous value to such a study. The main categories of genes represented in the ESTs are involved in binding, catalytic activity, transport, metabolism, response to stimuli, signal transduction, nucleic acid processes, and cellular biogenesis. Again, this adds substantially to the Atlantic halibut sequences from other research groups, many of which were of mitochondrial origin, and which contained slightly over 900 informative AutoFACT annotations.
With only modest resources for EST sequencing available, we chose to normalize our cDNA libraries so as to maximize the number of different ESTs sequenced. The normalization method used (Evrogen Trimmer kit) was very effective at reducing the number of highly expressed cDNAs. Since the libraries are well normalised (redundancy factor of only 1.5), it is not possible to gain insights into the actual abundance of different types of transcripts. However, the enrichment in GO terms in the reproductive tissues and one of the larval libraries indicates that these libraries represent a broad diversity of transcripts, consistent with the high metabolic and proliferative activity of ovary and testis tissues. Larvae at the early mouth-opening stage of development are also undergoing tremendous metabolic changes as they transition from the yolk-sac to first-feeding stages. On the other hand, the library made from larvae midway to metamorphosis has very few GO terms associated with it, possibly because a large number of genes associated with the unusual metamorphic process in flatfish have not yet been described.
A number of ESTs were restricted to only a single tissue library and as such, may be good tissue-specific markers for in situ hybridisation and aid in tracking the appearance of different tissues during development [39, 40]. For example, ESTs for a renal organic ion transporter and nephrosin are only found in the head kidney library. Several transporters (for amino acids and various solutes), binding proteins (for lipid, sterol and lipoproteins) as well as digestive enzymes (elastases, peptidase, aminopeptidase N, carboxypeptidase B, triglyceride lipase) are only found in the intestine library. Various complement components, apolipoproteins, fatty acid binding protein, alpha-2-macroglobulin and biliverdin reductase A are only present in the liver library. A single EST unique to spleen was identified - metaxin 2, similar to the von Willebrand clotting factor. Unique to the ovary library were a number of ESTs specific to reproduction - zona pellucida protein, vitelline envelope protein, chorion protein, choriolytic enzyme, aquaporin, alveolin, estrogen receptor binding protein and luteinizing hormone beta. Similarly, the testis library uniquely contained ESTs specific to reproduction - spindlin protein C, testis intracellular mediator protein, a cysteine and glycine-rich protein, and periostin. The gill library uniquely contained two ESTs involved in chloride transport and the skin uniquely contained keratin, epithelial membrane protein-3, epiplakin and dermatopontin.
The liver, head kidney and spleen libraries proved to be an excellent source of genetic information concerning hematopoiesis and immune function in this fish. The head kidney is the major site of hematopoiesis in fish, and an EST survey of zebrafish kidney revealed many insights into this process in fish. From our Atlantic halibut EST survey, many complement components, immune type receptors, lectins, defense proteins, MHC I and II components, cytokines, chemokines as well as signal transduction molecules and transcription factors involved in expression of immune genes were identified. It should be noted that ESTs for components of the immune system were also found in other tissue libraries, particularly those exposed to the environment such as skin and gill; these arose from circulating or resident immune cells in these tissues. This new sequence information will greatly enhance our understanding of the immune system of flatfish and provide molecular tools for further studying disease resistance.
The identification of microsatellite sequences in the Atlantic halibut ESTs will aid in the completion of a genetic linkage map of Atlantic halibut that is currently being constructed. Since these microsatellites are linked to genes, they are useful as Type I markers.
The addition of over 7700 ESTs, of which 5040 are functionally annotated, significantly enhances the genomic tools available for non-model fish species. Given the high degree of sequence similarity between flatfish species, the Atlantic halibut ESTs will be of great interest to flatfish researchers in general, as well as to the halibut aquaculture research community. The publicly accessible, searchable database also adds substantial value to the genomic data generated in this study. This EST survey has provided a number of microsatellite markers that have been placed on the Atlantic halibut genetic linkage map (Reith, pers. comm.) as well as probes for cellular localisation studies by in situ hybridisation. It has also laid the groundwork for the design and construction of a microarray for studying gene expression, to better understand the impact of nutrition, stress, and environmental conditions on aquaculture production.
Larvae were reared at Scotian Halibut Limited (Clarks Harbour, NS, Canada) in constant light (approximately 1000 lux at the surface) in 7 m³ tanks with flow-through salt water (32 ppt) maintained at 11 ± 0.2°C using a heat exchanger. Larvae were fed Artemia until weaning onto artificial feed at 65 days post-hatch (dph). The ages and sizes of the larvae at the different stages were as follows: hatching (1 dph; 10 mm), mouth-opening (21 dph; 15 mm), midway to metamorphosis (64 dph; 20 mm), premetamorphosis (91 dph; 25 mm), and post-metamorphosis (104 dph; 30–35 mm).
Livers were sampled from one male (104 cm; 18 kg) and one female (130 cm; 31 kg) adult broodstock maintained at St. Andrews Biological Station, St. Andrews, NB, Canada. All other tissues were sampled from 3 male and 3 female immature fish (672–945 g) reared at the Institute for Marine Biosciences Marine Research Station (Sambro, NS, Canada). Immediately before sampling, fish were transferred to a bucket containing an overdose of TMS-Aqua MS-222 (Syndel, Vancouver, BC, Canada). Ovaries from 3 females were pooled, as were testes from 3 males. Gill, head kidney, intestine, skin, and spleen samples from each group of 6 fish were pooled, and all tissue samples preserved in RNALater (Ambion, Austin, TX, USA), and stored at -80°C until use. Larvae were pooled and preserved in RNALater, and stored at -80°C until use. All animal procedures were approved by the NRC Institute for Marine Biosciences Animal Care Committee.
For liver, mRNA was extracted from one female and one male halibut using the FastTrack kit (Invitrogen, Burlington, ON, Canada) and equal amounts of mRNA were combined. For spleen, total RNA was extracted from pooled tissues using Trizol Reagent (Invitrogen). For other tissues, mRNA was extracted from pooled tissues using the Micro-FastTrack kit (Invitrogen). For larval libraries, RNA was extracted from pooled samples (~20 larvae each for post-hatch and mouth-opening stages, 15 larvae for midway to metamorphosis, and 5 larvae each for pre- and post-metamorphosis) using Trizol Reagent. All RNA isolation kits were used according to the manufacturer's protocols.
First strand cDNA was prepared from 0.25–0.4 μg mRNA or 2 μg total RNA using the Creator SMART cDNA method (Clontech, Palo Alto, CA, USA) and PowerScript reverse transcriptase (Clontech). The CDS-3M adaptor, included in the TRIMMER-DIRECT kit (Evrogen, Moscow, Russia), was used instead of the SMART CDSIII primer. cDNA was amplified by LD-PCR according to the Creator SMART cDNA method (Clontech) using the 5' PCR primer as the forward and reverse primer. The optimal number of cycles to yield sufficient cDNA for normalisation, but remain within the exponential phase of amplification, was determined by analysing aliquots of the PCR reaction after every second cycle on agarose gels. In all cases, sufficient cDNA was obtained in 18 cycles or fewer, ensuring even the rarest messages were represented. Amplified cDNA was purified using the QIAquick PCR purification kit (Qiagen, Valencia, CA, USA), quantitated using a NanoDrop® ND-1000 spectrophotometer (NanoDrop® Technologies Inc, Wilmington, DE, USA) and normalized using the TRIMMER-DIRECT protocol (Evrogen). After digestion with SfiI, products smaller than 500 bp were removed using the Chroma Spin-400 column as described in the Creator SMART protocol.
The resulting cDNAs were directionally cloned into the SfiI sites of pDNR-LIB (Clontech) and transformed into ElectroMAX DH10B T1 phage-resistant cells (Invitrogen) by electroporation using the Cell Porator and Voltage Booster system (Gibco BRL). The Cell Porator settings were 400 V, 330 μF capacitance, low Ω impedance and fast charge rate, and the Voltage Booster was set at 4 kΩ. For each library, 10⁶ primary transformants were amplified by the semi-solid amplification method described in Stratagene's pBluescript XR cDNA library construction kit manual. Randomly picked clones (96 from each library) were screened for insert size by protoplasting or by PCR using the M13 forward and reverse primers flanking the multiple cloning site of the vector.
Individual bacterial colonies were picked into 96- or 384-well plates containing LB/glycerol using the QPix colony picker (Genetix Ltd., New Milton, Hampshire, UK). A 96-well test plate was prepared from each library for sequencing and if the quality of the library was good, clones were sequenced from two additional 96-well and two 384-well plates, giving a total of 1056 reads. Two additional 384-well plates were sequenced from each of the ovary and testis libraries. Plates were incubated overnight at 37°C. The resulting bacterial suspensions were inoculated into lysis buffer and denatured at 95°C for 5 minutes. DNA from each clone was amplified using TempliPhi™ DNA polymerase (GE Healthcare, Baie d'Urfe, QC, Canada) according to manufacturer's instructions. DNA sequencing was performed using ET terminator chemistry (GE Healthcare) in the 5' direction (primer sequence GGCCGCATAACTTCGTATAGC). Reactions were processed using Sera-Mag™ magnetic carboxylate-modified microparticles (Seradyn™, Indianapolis, IN, USA) to remove excess fluorescent terminators before loading onto GE Healthcare MegaBACE 4000 capillary DNA sequencers. Clones from each library were replicated into glycerol stocks and stored at -80°C.
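The read totals follow directly from the plate layout described above. As a minimal sketch (the constant and variable names are illustrative only), the per-library figures can be reproduced as:

```python
# Plate sizes used for colony picking.
WELLS_96 = 96
WELLS_384 = 384

# Standard library: one 96-well test plate, two further 96-well plates
# and two 384-well plates.
standard_reads = 3 * WELLS_96 + 2 * WELLS_384

# Ovary and testis libraries: two additional 384-well plates each.
extended_reads = standard_reads + 2 * WELLS_384

print(standard_reads, extended_reads)  # 1056 1824
```

These are attempted reads per library; the number of usable ESTs is lower once low-quality and vector sequence has been trimmed.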
ESTs were clustered using Paracel Transcript Assembler 3.0 (Paracel Inc., Pasadena, CA), which is based on the CAP4 clustering algorithm. Annotation was performed using AutoFACT and the default parameters with UniProt's UniRef90, NCBI's nr, KEGG, COG, PFAM, LSU, SSU and, for contigs, est_others. AutoFACT summary results are stored in our database. Each sequence has one or more AutoFACT results associated with it. Each AutoFACT result is related to the most informative BLAST hit from each of the queried set of databases. Those hits are also stored in the Pleurogene database. GO annotations associated with either the contributing hits or the AutoFACT results are stored in a separate table.
Due to the low level of GO annotation obtained with AutoFACT (1640 sequences out of 7710), we chose to run the unannotated sequences through Goblet using the vertebrate database and a BLAST cutoff of e-10. This increased the annotation level by 1736 sequences. Because the default criteria for a match in Goblet are less stringent than those in AutoFACT, and these sequences had already passed through AutoFACT without receiving a GO annotation, these GO annotations should be regarded as less reliable. GO terms were also identified for an additional 502 sequences by searching InterPro. Functionally annotated ESTs with GO annotations were classified using GOSlim and each category with more than 5 hits was plotted. Individual ESTs in each library (approximately 13,000 in total) were also searched for GOSLIM terms and compiled by library.
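The cumulative effect of the three annotation passes can be tallied as follows. This is a minimal sketch that assumes the three sets do not overlap, as the wording ("unannotated sequences", "additional") suggests; the names are illustrative.

```python
unique_sequences = 7710

# Sequences gaining GO annotation at each step, in the order applied.
go_annotated = {
    "AutoFACT": 1640,
    "Goblet": 1736,    # run only on sequences AutoFACT left unannotated
    "InterPro": 502,   # described as additional sequences
}

# Assumption: the three sets are disjoint, so the counts can be summed.
total_go = sum(go_annotated.values())
coverage = 100 * total_go / unique_sequences

print(total_go)                  # 3878
print(f"{coverage:.1f}%")        # 50.3%
```

Under that assumption, roughly half of the unique sequences carry at least one GO term, with the Goblet-derived portion being the least stringent.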
Short tandem repeats (2–5 bp) were detected using Tandem Repeats Finder.
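To illustrate what such a screen looks for, the sketch below finds perfect tandem repeats with 2 to 5 bp units using a regular expression. It is not the algorithm used by Tandem Repeats Finder, which also detects approximate repeats, and the minimum copy numbers in MIN_COPIES are hypothetical thresholds chosen for illustration, not parameters taken from this study.

```python
import re

# Hypothetical thresholds: minimum tandem copies required per unit length.
MIN_COPIES = {2: 6, 3: 5, 4: 4, 5: 4}


def find_microsatellites(seq):
    """Return sorted (start, unit, copies) for perfect 2-5 bp tandem repeats."""
    seq = seq.upper()
    hits = []
    for unit_len, min_copies in MIN_COPIES.items():
        # ([ACGT]{k}) captures one unit; \1{n,} requires n or more further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # Skip units that are just a shorter motif repeated (e.g. GTGT, AAA).
            if any(
                unit == unit[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            hits.append((m.start(), unit, len(m.group(0)) // unit_len))
    return sorted(hits)


# Toy sequence: eight GT copies and five CAG copies in non-repetitive flanks.
example = "AACC" + "GT" * 8 + "CCATT" + "CAG" * 5 + "TTA"
print(find_microsatellites(example))  # [(4, 'GT', 8), (25, 'CAG', 5)]
```

Start positions are 0-based. A production screen would normally also tolerate mismatches within the repeat and report flanking sequence for primer design.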
This project, entitled "PLEUROGENE: Flatfish genomics - Enhancing commercial culture of Atlantic halibut and Senegal sole", was funded by the Genome Canada-Genome España joint program. We thank Debbie Martin-Robichaud (St. Andrews Biological Station) and Scotian Halibut Ltd., Clark's Harbour, Nova Scotia for tissue and larval samples of Atlantic halibut and Jeff Gallant and Harry Murray for storage and inventory of samples. Sequencing by The Atlantic Genome Centre, Halifax, Nova Scotia, a partnership between Genome Atlantic and the Canadian National Research Council Institute for Marine Biosciences, is gratefully acknowledged. This is NRC publication number 2007-42694.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.