Generation of a large scale repertoire of Expressed Sequence Tags (ESTs) from normalised rainbow trout cDNA libraries
© Govoroun et al; licensee BioMed Central Ltd. 2006
Received: 19 April 2006
Accepted: 03 August 2006
Published: 03 August 2006
Within the framework of a genomics project on livestock species (AGENAE), we initiated a high-throughput DNA sequencing program of Expressed Sequence Tags (ESTs) in rainbow trout, Oncorhynchus mykiss.
We constructed three cDNA libraries including one highly complex pooled-tissue library. These libraries were normalized and subtracted to reduce clone redundancy. EST sequences were produced, and 96 472 ESTs corresponding to high quality sequence reads were released in the international database, currently representing 42.5% of the overall sequence knowledge in this species. All these EST sequences and other publicly available ESTs in rainbow trout have been included on a publicly available Website (SIGENAE) and have been clustered into a total of 52 930 clusters of putative transcript groups, including 24 616 singletons. 57.1% of these 52 930 clusters are represented by at least one Agenae EST and 14 343 clusters (27.1%) are composed only of Agenae ESTs. Sequence analysis also reveals that normalization and especially subtraction were effective in decreasing redundancy, and that the pooled-tissue library was representative of the initial tissue complexity.
Owing to the present work on the construction of rainbow trout normalized cDNA libraries and their extensive sequencing, along with other large scale sequencing programs, rainbow trout is now one of the major fish models in terms of EST sequences available in a public database, just after zebrafish, Danio rerio. This information is now used for the selection of a non-redundant set of clones for producing DNA micro-arrays in order to examine global gene expression.
Rainbow trout, Oncorhynchus mykiss, is an important fish species for aquaculture and has been introduced throughout the world. It is also probably one of the most widely studied fish species, with a long history of research carried out in physiology, nutrition, ecology, genetics, pathology, carcinogenesis and toxicology (reviewed in [1]). Its relatively large size compared to model fish like zebrafish or medaka makes rainbow trout a particularly well-suited alternative model for biochemical and molecular studies on specific tissues or cells that cannot be carried out in small fish models. The genomic resources in rainbow trout are now being extensively developed and a few high-throughput DNA sequencing programs of ESTs have been recently initiated [2, 3]. AGENAE (Analyse du GENome des Animaux d'Elevage) [4] is a project led by the French National Institute for Agricultural Research (INRA) that focuses on genomics of several livestock species (cattle, pigs, chickens and rainbow trout). The objectives of this program are the identification and characterization of the expressed part of genomes, the mapping of entire genomes, and the study of genetic diversity in animal populations. As a first step for the characterization of the expressed part of the genome of rainbow trout, we initiated a high-throughput EST sequencing program. Among other interests, this resource will allow large scale expression profiling experiments using microarrays based on a well characterized cDNA clone collection.
Results and discussion
cDNA libraries construction and characterization
We constructed three directionally cloned rainbow trout cDNA libraries: two from reproductive tissues, i.e., ovarian (previtellogenesis) and testicular (gonial proliferations) tissues, and one highly complex pooled-tissue cDNA library. The pooled-tissue library was made in order to be as representative as possible of the entire expressed genome of rainbow trout. For this purpose, mRNA from 14 different tissues (liver, kidney, adipose tissue, gills, intestine, pituitary, brain, ovary, testes, differentiating male and female gonads, muscle, interrenal and blood cells), sampled at different developmental stages or in different physiological conditions, and mRNA from entire eyed-stage embryos and hatching larvae, were used for this pooled-tissue library construction. The three resulting libraries displayed a high initial clone complexity (>1 × 10^6 colony-forming units). Approximately 98% of the cDNA inserts were larger than 450 bp and the average insert size ranged between 1.3 and 1.5 kb depending on the library. Each of the 3 libraries was normalized according to previously described protocols [5, 6], in order to decrease the representation of abundant mRNA. All normalized libraries were subsequently subjected to one (testis library) or two (pooled-tissue library) runs of subtraction with the already sequenced clones in order to decrease redundancy.
Table: Summary of the numbers of sequenced and released ESTs in the different AGENAE trout cDNA libraries. Columns: number of sequenced clones; number of sequences.
In order to provide an important set of well annotated clones, the 5' end sequencing strategy was favoured. However, due to the use of an excess of oligo(dT) during the first reverse transcription of the library construction, the polyA sequence remained short enough to allow sequencing of the cDNA 3' ends. A 3' end sequencing strategy was therefore carried out on 23 040 (27.1%) of the sequenced clones with a good success rate (83.1% of good quality released sequences). This 3' strategy is a useful way to distinguish genes in a closely related family using the more divergent 3' end non coding region.
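The figures quoted for the 3' end strategy can be turned into approximate absolute numbers. The short Python sketch below uses only the values given in the paragraph above; the variable names are introduced here for illustration, and because the percentages are rounded in the text the results are estimates rather than reported counts.

```python
# Figures quoted in the text for the 3' end sequencing strategy.
clones_3prime = 23_040    # clones sequenced from the 3' end
share_of_clones = 0.271   # fraction of all sequenced clones (27.1%)
success_rate = 0.831      # fraction giving good quality released sequences (83.1%)

# Approximate values implied by the rounded percentages (assumption:
# both percentages refer to the clone counts exactly as stated above).
total_clones_estimate = round(clones_3prime / share_of_clones)
released_3prime_estimate = round(clones_3prime * success_rate)

print(f"estimated total sequenced clones: {total_clones_estimate}")
print(f"estimated released 3' sequences:  {released_3prime_estimate}")
```

Under these assumptions, the total number of sequenced clones is on the order of 85 000 and roughly 19 000 released sequences come from 3' ends.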
Influence of normalization/subtraction on the pooled-tissue library
Redundancy and quality of the libraries
Table: Top 20 most redundant EST clusters. Columns: best Swiss-Prot hit; over-expressed in Agenae libraries.
Best Swiss-Prot hits:
- Zona pellucida sperm-binding protein 3 precursor
- Actin, alpha sarcomeric/cardiac (Actin alpha 2)
- Prolactin precursor (PRL)
- Hemoglobin beta-4 subunit
- ES1 protein homolog, mitochondrial precursor (Protein KNP-I)
- Trypsin I precursor
- Somatotropin 2 precursor (Growth hormone 2)
- 60S ribosomal protein L12
- Myosin regulatory light chain 2, skeletal muscle isoform (G2)
- Somatotropin precursor (Growth hormone)
- Sarcoplasmic/endoplasmic reticulum calcium ATPase 1
- 60S ribosomal protein L11
- 60S ribosomal protein L13
- Nitrogen regulation protein NR(II)
- Glutathione S-transferase P
- 60S ribosomal protein L18a
- 60S ribosomal protein L13a (Transplantation antigen P198 homolog)
- Collagen alpha 1(I) chain precursor
Table: Top 20 most redundant Agenae specific EST clusters. Columns: best Swiss-Prot hit; over-expressed in a specific Agenae library.
Best Swiss-Prot hits:
- Zinc finger protein 318 (Testicular zinc finger protein)
- NADH-ubiquinone oxidoreductase 51 kDa subunit
- VEG136 protein (Fragment)
- Very low-density lipoprotein receptor precursor
- Complement C1q-like protein 3 precursor (Gliacolin)
- Chondroitin beta-1, 4-N-acetylgalactosaminyltransferase 2
- Sodium- and chloride-dependent creatine transporter 1
- Carnitine O-palmitoyltransferase I, mitochondrial liver isoform
- Baculoviral IAP repeat-containing protein 6
- Metastasis-associated protein MTA2
- Regulator of G-protein signaling 2 (RGS2)
- Endonuclease III (DNA-(apurinic or apyrimidinic site) lyase)
Contribution of the Agenae EST collections
Table: Tissue representation in the pooled-tissue cDNA library.
Tissue-specific proteins listed:
- Testis Creatine kinase
- Factor in germ line alpha (Figa)
- Liver-basic fatty acid binding protein
- Growth Hormone factor 1 (Pit-1)
- Brain cell membrane protein 1 (BCMP1)
- Intestinal mucin-like peptide
- Muscle LIM protein
Although cDNA library construction and EST sequencing are time-consuming and costly, the most common strategy still consists of sequencing numerous tissue-specific libraries in order to provide a large number of clusters. For instance, in the medaka, Oryzias latipes, 26 689 clusters were generated from 147 802 ESTs obtained from 29 different tissue-specific cDNA libraries (TIGR gene Index, Release 5.0, May 17, 2004). In trout, with slightly more ESTs (157 116) coming mainly from 2 pooled-tissue libraries (AGENAE and 1RT-NCCCWA USDA), the last TIGR clustering (Release 5.0, January 31, 2005) detected nearly twice as many clusters (50 773). The pooled-tissue library strategy combined with normalization/subtraction methods may thus be a better approach for enrichment of different transcripts. Indeed, some recent EST projects rely on pooled-tissue libraries [2, 12, 13]. However, the pooled-tissue strategy suffers from a lack of information concerning mRNA tissue origin, so it is not possible to carry out in silico analysis of tissue differential expression [14]. A strategy based on a pooled-tissue library with a tissue-specific DNA identification tag has recently been proposed [13]. This would combine the advantages of the pooled-tissue library with retention of the information on the tissue origin of each EST.
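The medaka and trout comparison can be expressed as a cluster yield per EST. The following Python sketch uses only the TIGR figures quoted in the paragraph above; the function name `clusters_per_est` is hypothetical and introduced here for illustration.

```python
# Cluster yield per EST, using only the TIGR Gene Index figures quoted above.

def clusters_per_est(n_clusters: int, n_ests: int) -> float:
    """Return the number of clusters obtained per sequenced EST."""
    return n_clusters / n_ests

medaka = clusters_per_est(26_689, 147_802)  # 29 tissue-specific libraries
trout = clusters_per_est(50_773, 157_116)   # mainly 2 pooled-tissue libraries

print(f"medaka: {medaka:.3f} clusters per EST")
print(f"trout:  {trout:.3f} clusters per EST")
print(f"trout/medaka yield ratio: {trout / medaka:.2f}")
```

On these numbers the trout collection yields roughly 1.8 times more clusters per EST than the medaka collection. This is only a crude indicator, since the two clusterings were produced at different dates and the species differ in genome content.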
In conclusion, our rainbow trout cDNA libraries provided a large set of well characterized clones for future studies. The Agenae sequencing project, together with ongoing collaborative efforts of the ARS-USDA program [2] and the Genome BC project [3], now places rainbow trout among the major fish models in terms of ESTs present in public databases, just after the zebrafish, Danio rerio (for instance 24 466 clusters in UniGene Build #17 09 Feb 2006 for rainbow trout and 32 400 clusters for zebrafish in UniGene Build #89, 05 Dec 2005). We are now using this important sequence information and our corresponding clone collections for producing DNA arrays in order to examine global gene expression in rainbow trout [15, 16]. A micro-array, containing 9 000 well annotated and unique cDNAs chosen for their informative annotation and their low redundancy, is now produced in large numbers in our resource centre (CRB-GADIE) [17], and used for gene expression profiling by several research teams.
Tissue samples and RNA preparation
Research involving animal experimentation has been approved by the authors' institution (authorization number 35-14) and conforms to the principles for the use and care of laboratory animals in compliance with French and European regulations on animal welfare. Rainbow trout were obtained from the Drennec experimental farm (Drennec, France). For the pooled-tissue cDNA library, more than 30 different individual fish of both sexes, derived from 3 different strains (autumnal, spring and winter spawning strains), were used; these strains themselves originated from at least 3 different French or Belgian regions. The following tissues, several of them obtained at different stages of their development, were sampled and stored at -80°C before RNA purification: liver, kidney, adipose tissue, gills, intestine, pituitary, brain, ovary, testes, early differentiating male and female gonads, muscle, interrenal, leucocytes, blastula embryos, eyed-stage and hatching larvae, skin and blood cells. For the testis and ovary libraries, testes contained only spermatogonia (Stage I and II according to Billard's classification [18]), and the ovary was at the previtellogenesis stage. Total RNA was extracted from each frozen tissue using TRIzol® reagent (Gibco BRL, Gaithersburg, MD). The quality of total RNA was first checked by electrophoresis on a 1% agarose gel, then by a reverse transcription test using trace amounts of [α-32P] dCTP . The radioactive cDNA obtained was analyzed by autoradiography after electrophoresis on a denaturing alkaline agarose gel. Some total RNA samples (originating from blastula embryos, leucocytes, and skin) were found to be unsuitable for oligo(dT) primed reverse transcription and were not incorporated into the pool of total RNA used for the final construction of the pooled-tissue cDNA library.
Total RNAs from the 14 tissues (liver, kidney, adipose tissue, gills, pituitary, brain, ovary, testes, differentiating gonads, muscles, intestine, interrenal and blood cells) plus RNAs from entire eyed-stage embryos and hatching larvae were pooled in equal proportions. Poly-A-selected mRNA was prepared by purification of pooled total RNA on an oligo(dT)-cellulose column as previously described . Quality of mRNA purification was checked by electrophoresis of a small aliquot on 1% agarose and by a reverse transcription test using trace amounts of [α-32P] dCTP.
cDNA libraries were constructed in the pT7T3-Pac vector as initially described by B. Soares, M. Bonaldo and collaborators [5, 6, 19]. Briefly, starting from the mRNA, cDNA synthesis was carried out with a NotI-dT18 primer to allow directional cloning. After size selection chromatography (≥ 500 bp), the double-stranded cDNAs were ligated to EcoRI adapters, digested with NotI, and directionally cloned into the NotI and EcoRI digested pT7T3-Pac vector. The library was electroporated and then amplified in DH10B competent cells (Invitrogen). Normalization and subtraction were carried out according to . Briefly, single-stranded DNA circles were produced from the directional cDNA libraries (tester DNA). These single-stranded DNA circles were also used to produce double-stranded DNA (driver DNA) corresponding to the inserts, by PCR using vector primers T7 and T3, flanking the insertion sites. Tester DNA was then melted and reannealed with an excess of driver DNA, and the remaining single-stranded tester DNA (normalized or subtracted library) was then purified by hydroxyapatite chromatography. These single-stranded DNA molecules were then converted to partial duplexes by random priming and electroporated into bacteria to produce the final normalized or subtracted library (see  for additional details).
The mean cDNA insert sizes of the libraries were estimated from 50 individual clones by PCR using T3 and T7 as primers flanking the inserts.
The libraries were plated onto 2xYT medium and arrayed robotically into 96-well plates at the INRA National Biological Resources Centre for Animal Genomics (CRB GADIE) [17]. Plates were then sent to a sequencing company [20], and bacterial clones were sequenced following plasmid DNA purification with T7 primer for 5' end sequencing and T3 primer for 3' end sequencing.
Sequence analysis and EST clustering
EST sequences were cleaned of vector and adaptor sequences, and sequences containing contaminants such as E. coli, yeast, mitochondria, ribosome or UniVec were removed from the analysis. Only sequences with a PHRED score over 20 on at least 100 bp were released in the EST division of the EMBL-EBI (European Molecular Biology Laboratory – European Bioinformatics Institute) Nucleotide Sequence Database. The calculation of the redundancy and proportion of clusters generated by the different EST sequencing projects was carried out using the SIGENAE trout clustering version V3 [11]. The percentage of novelty in the pooled-tissue cDNA library was calculated as follows. Because some clones were sequenced at both ends (5' and 3'), one representative sequence was selected for each clone (the 3' end if it existed). The clones were then ordered by name and processed in blocks of 400 clones, and the number of clusters was counted for each block. The figures shown in the graph are therefore the number of new clusters generated by the next 400 sequenced clones. This work was done using a home-made R [21] routine extracting data from a PostgreSQL database. Sequences corresponding to putative "tissue specific" proteins in the pooled-tissue cDNA library were found using a best blast hit strategy to approximate orthologous rainbow trout ESTs. Tissue-specific proteins were chosen according to their description as "tissue specific" in the literature, and their amino-acid sequences were used to search at NCBI [22] for putative orthologs in rainbow trout using the TblastN algorithm on the "EST-others" database, with the query limited by the term "Oncorhynchus" and other parameters set to default. The best hit sequence was then double checked by a blastx query on a non-redundant database. For already known tissue-specific genes in rainbow trout, a blastn query was carried out and EST sequences showing 100% identity were selected.
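The novelty calculation described above can be sketched as follows. This is a minimal Python illustration and not the authors' R routine: the input layout (tuples of clone name, sequenced end and cluster identifier) and the function names `pick_representatives` and `new_clusters_per_block` are assumptions introduced here, and the cluster assignment of each read is taken as already known.

```python
def pick_representatives(reads):
    """Keep one cluster assignment per clone, preferring the 3' read.

    `reads` is an iterable of (clone_name, end, cluster_id) tuples,
    where `end` is "5" or "3" (hypothetical input layout).
    """
    chosen = {}
    for clone, end, cluster in reads:
        # Take the first read seen for a clone, but let a 3' read override it.
        if clone not in chosen or end == "3":
            chosen[clone] = cluster
    return chosen


def new_clusters_per_block(reads, block_size=400):
    """Count clusters seen for the first time in each block of clones."""
    chosen = pick_representatives(reads)
    clones = sorted(chosen)  # clones ordered by name, as in the text
    seen = set()
    counts = []
    for start in range(0, len(clones), block_size):
        block = {chosen[c] for c in clones[start:start + block_size]}
        counts.append(len(block - seen))  # clusters not met in earlier blocks
        seen |= block
    return counts


# Toy example with a block size of 2 instead of 400.
toy_reads = [
    ("c001", "5", "A"), ("c001", "3", "B"),
    ("c002", "5", "A"),
    ("c003", "5", "C"),
    ("c004", "3", "B"),
]
toy_counts = new_clusters_per_block(toy_reads, block_size=2)
print(toy_counts)
```

In the toy example the first block of two clones contributes two new clusters and the second block contributes one, so the printed list is `[2, 1]`. A falling count across successive blocks indicates increasing redundancy as sequencing proceeds.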
The authors would like to thank Dr MF. Bonaldo and Dr MB. Soares (University of Iowa) for their help in the construction and normalization of the cDNA libraries. We also thank all the colleagues who contributed tissues and vectors for the library construction. This work was part of the French national program AGENAE. All steps from clone picking to plate storage were carried out at the INRA Resources Centre for Animal Genomics (CRB GADIE, Jouy en Josas, France). All sequence analysis was conducted in fruitful collaboration with the AGENAE bioinformatics team (SIGENAE). This program was supported by INRA (National Institute for Agricultural Research) funds, the French Ministry of Research, and a European Community IFOP grant INRA/CIPA/OFIMER. Specific requests for clones should be addressed to Karine.Hugot@jouy.inra.fr, and specific requests for EST sequence chromatograms should be addressed to firstname.lastname@example.org.
1. Thorgaard GH, Bailey GS, Williams D, Buhler DR, Kaattari SL, Ristow SS, Hansen JD, Winton JR, Bartholomew JL, Nagler JJ, Walsh PJ, Vijayan MM, Devlin RH, Hardy RW, Overturf KE, Young WP, Robison BD, Rexroad C, Palti Y: Status and opportunities for genomics research with rainbow trout. Comp Biochem Physiol B Biochem Mol Biol. 2002, 133 (4): 609-46. 10.1016/S1096-4959(02)00167-7.
2. Rexroad CE, Lee Y, Keele JW, Karamycheva S, Brown G, Koop B, Gahr SA, Palti Y, Quackenbush J: Sequence analysis of a rainbow trout cDNA library and creation of a gene index. Cytogenet Genome Res. 2003, 102 (1–4): 347-54. 10.1159/000075773.
3. Rise ML, von Schalburg KR, Brown GD, Mawer MA, Devlin RH, Kuipers N, Busby M, Beetz-Sargent M, Alberto R, Gibbs AR, Hunt P, Shukin R, Zeznik JA, Nelson C, Jones SR, Smailus DE, Jones SJ, Schein JE, Marra MA, Butterfield YS, Stott JM, Ng SH, Davidson WS, Koop BF: Development and application of a salmonid EST database and cDNA microarray: data mining and interspecific hybridization characteristics. Genome Res. 2004, 14 (3): 478-90. 10.1101/gr.1687304.
4. AGENAE. [http://www.inra.fr/agenae/]
5. Soares MB, Bonaldo MF, Jelene P, Su L, Lawton L, Efstratiadis A: Construction and characterization of a normalized cDNA library. Proc Natl Acad Sci USA. 1994, 91 (20): 9228-32. 10.1073/pnas.91.20.9228.
6. Bonaldo MF, Lennon G, Soares MB: Normalization and subtraction: two approaches to facilitate gene discovery. Genome Res. 1996, 6 (9): 791-806.
7. Conner SJ, Hughes DC: Analysis of fish ZP1/ZPB homologous genes – evidence for both genome duplication and species-specific amplification models of evolution. Reproduction. 2003, 126: 347-352. 10.1530/rep.0.1260347.
8. Zeng S, Gong Z: Expressed sequence tag analysis of expression profiles of zebrafish testis and ovary. Gene. 2002, 294 (1–2): 45-53. 10.1016/S0378-1119(02)00791-6.
9. Chang H, Gilbert W: A Novel Zebrafish Gene Expressed Specifically in the Photoreceptor Cells of the Retina. Biochem Biophys Res Commun. 1997, 237: 84-89. 10.1006/bbrc.1997.7081.
10. Davey GC, Caplice NC, Martin SA, Powell R: A survey of genes in the Atlantic salmon (Salmo salar) as identified by expressed sequence tags. Gene. 2001, 263: 121-130. 10.1016/S0378-1119(00)00587-4.
11. SIGENAE. [http://www.sigenae.org/]
12. Smith TP, Grosse WM, Freking BA, Roberts AJ, Stone RT, Casas E, Wray JE, White J, Cho J, Fahrenkrug SC, Bennett GL, Heaton MP, Laegreid WW, Rohrer GA, Chitko-McKown CG, Pertea G, Holt I, Karamycheva S, Liang F, Quackenbush J, Keele JW: Sequence evaluation of four pooled-tissue normalized bovine cDNA libraries and construction of a gene index for cattle. Genome Res. 2001, 11 (4): 626-30. 10.1101/gr.170101.
13. Gavin AJ, Scheetz TE, Roberts CA, O'Leary B, Braun TA, Sheffield VC, Soares MB, Robinson JP, Casavant TL: Pooled library tissue tags for EST-based gene discovery. Bioinformatics. 2002, 18 (9): 1162-6. 10.1093/bioinformatics/18.9.1162.
14. Brown AC, Kai K, May ME, Brown DC, Roopenian DC: ExQuest, a novel method for displaying quantitative gene expression from ESTs. Genomics. 2004, 83 (3): 528-39. 10.1016/j.ygeno.2003.09.012.
15. Baron D, Houlgatte R, Fostier A, Guiguen Y: Large-scale temporal gene expression profiling during gonadal differentiation and early gametogenesis in rainbow trout. Biol Reprod. 2005, 73: 959-966. 10.1095/biolreprod.105.041830.
16. Mazurais D, Montfort J, Delalande C, Le Gac F: Transcriptional analysis of testis maturation using trout cDNA macroarrays. Gen Comp Endocrinol. 2005, 142: 143-154. 10.1016/j.ygcen.2005.02.018.
17. GADIE Biologicals Resources Centre. [http://w3.jouy.inra.fr/unites/lreg/CRB/BRC/index.html]
18. Billard R: Spermatogenesis and spermatology of some teleost fish species. Reproduction Nutrition Development. 1986, 26: 877-920.
19. Soares M, Bonaldo M: Constructing and screening normalized cDNA libraries. Genome analysis: a laboratory manual: detecting genes. Edited by: Birren B, Green E, Klapholz S, Myers R, Roskams A. 2000, Cold Spring Harbor Laboratory Press, 49-157.
20. Millegen. [http://www.millegen.com/]
21. The Comprehensive R Archive Network. [http://cran.r-project.org/]
22. The National Center for Biotechnology Information Basic Local Alignment Search Tool. [http://www.ncbi.nlm.nih.gov/BLAST/]
23. Garber AT, Winkfein RJ, Dixon GH: A novel creatine kinase cDNA whose transcript shows enhanced testicular expression. Biochim Biophys Acta. 1990, 1087 (2): 256-8.
24. Liang L, Soyal SM, Dean J: FIGalpha, a germ cell specific transcription factor involved in the coordinate expression of the zona pellucida genes. Development. 1997, 124 (24): 4939-4947.
25. Nishiu J, Tanaka T, Nakamura Y: Isolation and chromosomal mapping of the human homolog of perilipin (PLIN), a rat adipose tissue-specific gene, by differential display method. Genomics. 1998, 48 (2): 254-7. 10.1006/geno.1997.5179.
26. Zhang H, Wada J, Hida K, Tsuchiyama Y, Hiragushi K, Shikata K, Wang H, Lin S, Kanwar YS, Makino H: Collectrin, a collecting duct-specific transmembrane glycoprotein, is a novel homolog of ACE2 and is developmentally regulated in embryonic kidneys. J Biol Chem. 2001, 276 (20): 17132-9. 10.1074/jbc.M006723200.
27. Marchand O, Govoroun M, D'Cotta H, McMeel O, Lareyre J, Bernot A, Laudet V, Guiguen Y: DMRT1 expression during gonadal differentiation and spermatogenesis in the rainbow trout, Oncorhynchus mykiss. Biochim Biophys Acta. 2000, 1493 (1–2): 180-7.
28. Denovan-Wright EM, Pierce M, Sharma MK, Wright JM: cDNA sequence and tissue-specific expression of a basic liver-type fatty acid binding protein in adult zebrafish (Danio rerio). Biochim Biophys Acta. 2000, 1492 (1): 227-232.
29. Mistry AC, Kato A, Tran YH, Honda S, Tsukada T, Takei Y, Hirose S: FHL5, a novel actin fiber-binding protein, is highly expressed in gill pillar cells and responds to wall tension in eels. Am J Physiol Regul Integr Comp Physiol. 2004, 287 (5): R1141-54.
30. Ono M, Takayama Y: Structures of cDNAs encoding chum salmon pituitary-specific transcription factor, Pit-1/GHF-1. Gene. 1992, 116 (2): 275-279. 10.1016/0378-1119(92)90525-T.
31. Xu H, Foltz L, Sha Y, Madlansacay MR, Cain C, Lindemann G, Vargas J, Nagy D, Harriman B, Mahoney W, Schueler PA: Cloning and characterization of human erythroid membrane-associated protein, human ERMAP. Genomics. 2001, 76 (1–3): 2-4. 10.1006/geno.2001.6600.
32. Christophe-Hobertus C, Szpirer C, Guyon R, Christophe D: Identification of the gene encoding brain cell membrane protein 1 (BCMP1), a putative four-transmembrane protein distantly related to the peripheral myelin protein 22/epithelial membrane proteins and the claudins. BMC Genomics. 2001, 2 (3): 1471-2164.
33. Xu G, Huan LJ, Khatri IA, Wang D, Bennick A, Fahim RE, Forstner GG, Forstner JF: cDNA for the carboxyl-terminal region of a rat intestinal mucin-like peptide. J Biol Chem. 1992, 267 (8): 5401-5407.
34. Arber S, Halder G, Caroni P: Muscle LIM protein, a novel essential regulator of myogenesis, promotes myogenic differentiation. Cell. 1994, 79 (2): 221-231. 10.1016/0092-8674(94)90192-9.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.