An EST resource for tilapia based on 17 normalized libraries and assembly of 116,899 sequence tags

Lee, Bo-Young; Howe, Aimee E; Conte, Matthew A; D'Cotta, Helena; Pepey, Elodie; Baroiller, Jean-Francois; di Palma, Federica; Carleton, Karen L; Kocher, Thomas D

doi:10.1186/1471-2164-11-278

Research article
Open access
Published: 30 April 2010

An EST resource for tilapia based on 17 normalized libraries and assembly of 116,899 sequence tags

Bo-Young Lee¹,
Aimee E Howe¹,
Matthew A Conte¹,
Helena D'Cotta²,
Elodie Pepey²,
Jean-Francois Baroiller²,
Federica di Palma³,
Karen L Carleton¹ &
…
Thomas D Kocher¹

BMC Genomics volume 11, Article number: 278 (2010) Cite this article

5716 Accesses
40 Citations
3 Altmetric
Metrics details

Abstract

Background

Large collections of expressed sequence tags (ESTs) are a fundamental resource for analysis of gene expression and annotation of genome sequences. We generated 116,899 ESTs from 17 normalized and two non-normalized cDNA libraries representing 16 tissues from tilapia, a cichlid fish widely used in aquaculture and biological research.

Results

The ESTs were assembled into 20,190 contigs and 36,028 singletons for a total of 56,218 unique sequences and a total assembled length of 35,168,415 bp. Over the whole project, a unique sequence was discovered for every 2.079 sequence reads. 17,722 (31.5%) of these unique sequences had significant BLAST hits (e-value < 10^-10) to the UniProt database.

Conclusion

Normalization of the cDNA pools with double-stranded nuclease allowed us to efficiently sequence a large collection of ESTs. These sequences are an important resource for studies of gene expression, comparative mapping and annotation of the forthcoming tilapia genome sequence.

Background

Tilapia is the common name for a group of 40-50 nominal species of cichlid fishes (Order Perciformes) native to Africa and the Middle East. Tilapias have been an important aquaculture species for thousands of years. Sometimes called the 'aquatic chicken', tilapia are sturdy and adaptable fish, and are now cultured in more than 100 countries in Asia and the Americas [1]. Tilapias are a cornerstone of future aquaculture production, and genetic improvement of this species is needed to increase growth rate and improve disease resistance. Cichlid fishes are also important models for research on vertebrate physiology, behavior, and evolutionary biology. Because of their close evolutionary relationship, genetic resources developed for tilapia are also useful for studying the extraordinary radiation of haplochromine cichlids in the lakes of East Africa [2].

Genetic resources for tilapia are relatively well developed, and include a microsatellite-based genetic map [3] and a physical map based on BAC fingerprints [4]. In contrast, EST resources for these species are limited. Modest EST projects have been published for some haplochromine cichlids [5–7]. Although several labs have constructed cDNA libraries for tilapia, and a few thousand ESTs have been characterized [8–10], until now large-scale sequencing of ESTs has not been pursued in tilapia.

The applications of a high-quality EST resource are manifold. Many researchers are interested in constructing microarrays to study changes in gene expression during development [11] and in response to environmental stressors including handling [12], temperature [13], salinity [14], disease [15, 16] and environmental contaminants [17–19]. ESTs will provide an important resource for annotating the forthcoming tilapia genome sequence. A major effort is needed to enlarge the database of cichlid ESTs to facilitate analysis of gene expression and contribute to genome annotation. Here we describe the sequencing of almost 117,000 ESTs from highly normalized libraries constructed from 16 tissues of the Nile tilapia, Oreochromis niloticus.

Results

cDNA libraries and EST sequences

We constructed a total of two non-normalized and 17 normalized libraries from 16 tissues or developmental time points (Table 1). Three normalized libraries captured gene expression during early development (pooled whole animals at 0-4 days, 5-15 days, and 16-40 days post fertilization). Normalized libraries were also constructed for brain, heart, kidney, liver, olfactory epithelium, skin, skeletal muscle, spleen and stomach. Normalized libraries were constructed separately for adult ovary and testis, along with a normalized mixed library of differentiating ovary and testis from 14 - 56 days post-fertilization. For gill and retina, paired libraries (non-normalized and normalized) were compared.

Table 1 cDNA libraries from Oreochromis niloticus.

Full size table

The normalization procedure was highly effective. Figure 1 presents Southern blots for non-normalized and normalized retinal libraries probed for rhodopsin. Each panel contains approximately 9,000 clones. The non-normalized panel shows approximately 360 clones positive for rhodopsin. The normalized panel shows only about 9 positive clones. For this gene then, the number of duplicate clones has been reduced approximately 40-fold.

We sequenced approximately 96 clones from each library to check the size and quality of the inserts. We then submitted approximately 5,000 clones from each library for high-throughput sequencing. After analysis of these sequences, we selected some libraries for additional sequencing. We obtained at least partial sequence data from 137,654 clones. The number of sequences obtained from each library is shown in Table 1. The sequences were then passed through the EST2uni analysis pipeline [20]. This pipeline includes a preprocessing step that includes vector trimming and masking of repetitive and low complexity sequences. The total number of reads passing the trimming step was 116,899 (84.9%).

Clustering of tilapia ESTs

We performed separate Cap3 assemblies to assess the rate of sequence discovery for each library (Table 2). After sequencing 5,000 clones from each library, the rate of discovery ranged from 1.1 to 1.6 reads/discovery (Figure 2). This quantification allowed us to select the least redundant libraries for further sequencing.

Table 2 Clustering statistics for each library.

Full size table

As expected, the number of reads required to discover a new sequence increased over time. Figure 2 plots the discovery statistic for the brain3 and retina4 libraries at different levels of sequence coverage. Both libraries showed a gradual increase in the number of reads required to discover a new sequence.

Final assembly

After sequencing of all the libraries was complete, we analyzed all of the sequences together to produce a single assembly consisting of 56,218 unigenes averaging 625.6 bp in length, for a total assembled length of 35,168,415 bp. The distribution of contig lengths is shown in Figure 3.

The average number of ESTs per unique sequence (contigs + singletons) was 2.1. The average number of ESTs per contig was 4.0, and the maximum number of ESTs in a contig was 404. The nine contigs with more than 80 sequences each were mostly genes that we expect to be highly expressed, including hemoglobin, rhodopsin, parvalbumin, MHC class I, and mitochondrial cytochrome c oxidase subunits 1, 2 and 3. Two of these exceptionally large contigs appeared to be artifacts representing assembly of repetitive sequences.

Figure 4 shows the number of libraries that contributed sequences to each contig. The modal value is two, and very few contigs contain sequences from more than five libraries. This is partly explained by the small number of sequences contributing to each contig, but also indicative of the low redundancy across libraries.

Sequence similarity and functional annotation

41,129 (35%) of the unassembled ESTs and 17,722 (32%) of the assembled unigenes found a significant (e^-10) blast match in the UniProt database. We calculated the fraction of ESTs and Unigenes that were complete on the 5' and 3' ends by scoring the number of sequences that matched to within 10 amino acids of the 5' and 3' ends of each UniProt entry (Figure 5). This proportion varied with the length of the UniProt entry (Table 3). Sixty-five percent of the ESTs matching UniProts <250aa were complete on the 5' end. For UniProts between 251 and 500 amino acids this fraction dropped to 36%. Similarly, 68% of the unigenes matching UniProts <250aa were complete on the 5' end, and the fraction dropped to 36% for UniProts between 251 and 500 amino acids. Overall 13.9% of the ESTs and 13.1% of the unigenes were complete on both the 5' and 3' ends. 17,505 (31%) of the unigenes were mapped to candidate Gene Ontology (GO) terms. Following the application of the annotation rules, 12,792 (22.8%) of the unigenes were annotated with GO terms. The proportion of unigenes annotated with GO terms can be seen for each of the three GO functional categories (biological process, molecular function, and cellular component) in Additional file 1: Figure S1. 10,527 (18.7%) of the unigenes had a single-directional best hit to the KEGG pathway database.

Table 3 Distribution of ESTs and Unigenes on Uniprot entries. The criteria for completeness of the EST was whether it reached within 10 amino acids of the end of the Uniprot entry. Values indicate the proportions within each size class.

Full size table

Microsatellites and SNPs

We identified 1,108 microsatellites in the assembled sequences. These include 592 dinucleotide, 319 trinucleotide and 197 tetranucleotide repeat loci. The most abundant repeats were AC dinucleotides (511 instances) followed by AAT trinucleotides (99 instances). Most of the microsatellites identified were in the 3'UTR (46%) or the 5'UTR (30%). Only 24% were found within the open reading frame.

The EST2uni pipeline identified 48,309 candidate SNPs in the assembled contigs. For 2,542 of these candidate SNPs each of the alternative bases was represented at least twice. Of these, 1329 (52%) represented transition substitutions and 1213 (48%) represented transversions.

Discussion

Effectiveness of normalization

Normalization of copy number during library construction produces a more even distribution of clone frequencies, increasing the rate of gene discovery [21]. When very large numbers of ESTs are needed, serial-subtractive hybridization is an effective, but technically demanding, method for removing duplicate clones [22, 23]. An alternative is cherry-picking of unique clones identified by filter hybridization [24].

We normalized the cDNA population before cloning using a duplex-specific nuclease from the Kamchatka crab [25]. This enzyme can be used to degrade rapidly hybridizing components of a cDNA mixture, reducing the representation of highly expressed genes. This method proved highly effective, reducing the representation of the more abundant transcripts by 40-fold. We further increased our chances of detecting rare transcripts by constructing libraries from a large number of different tissues and developmental stages. The overall rate of gene discovery for this project compares favorably with projects of similar scope.

To further study the effectiveness of normalization, we used BLAST to screen the normalized retinal library for genes expected from rod and cone phototransduction. We identified rhodopsin as well as 6 of the 7 cone opsins in these libraries (all but RH2B). We further identified numerous genes from the rod phototransduction pathway, including all three subunits of the G protein (GNAT1, GNB1, GNGT1), phosphodiesterase subunits (PDE6a, PDE6b, PDE6g), the sodium/calcium exchanger (NCKX1) and G protein kinase (GRK1). For the cone pathway, we were able to find arrestin (Arr3), but did not identify other phototransduction proteins. In most vertebrates, rod photoreceptors are more prevalent than cone photoreceptors. In fish the ratio or rods to cones is typically 50:1 [26, 27]. The expression of the various phototransduction genes is less than opsin by factors of 10 (G protein) to 100 (PDE) or more [28]. Our normalization scheme reduced the representation of the most highly expressed genes by a factor of 40. If expression of genes in the cone pathway is 50× less frequent than that of the rod, and if the components of the rest of the phototransduction pathway are 10× less frequent, it is perhaps not surprising that we only observed cone opsins and cone arrestin from the cone pathway. While this confirms the effectiveness of our normalization methods, it suggests that additional genes would be discovered if we continued to sequence these highly normalized libraries.

Anchors for comparative mapping

Microsatellites associated with genes provide an efficient means to construct comparative maps among species [29]. In channel catfish, as many as 12% of ESTs contain microsatellite sequences [30]. Using roughly the same search parameters, we found microsatellites in only 870 of 56,218 (1.5%) of tilapia unigenes. Nevertheless, these 870 microsatellites provide a useful starting point for comparative mapping [31]. The subset of high confidence SNPs is another starting point for constructing a comparative genetic map of tilapia with other fish species.

Conclusion

Our project has significantly enhanced the EST resources available for cichlid fishes. While the total number of ESTs for tilapia still lags behind that for other fish species, our use of multiple highly normalized libraries has contributed to a high level of gene discovery. The number of UniProt hits for our EST collection rivals that for other model fish species [32, 33]. The total length of the assembled transcriptome (35 Mb) is similar to the total length of the annotated regions of the Tetraodon (33.9 Mb) and Takifugu (29.1 Mb) genomes [34]. This annotated collection of clustered ESTs will be a key resource for annotation of cichlid genome sequences for studies of cichlid physiology, development and evolution.

Methods

Construction of the 19 tilapia cDNA libraries

cDNA library construction

Total and poly(A)+ RNA were extracted from homogenized tilapia tissues using the RNAeasy and Oligotex mRNA kits (Qiagen) respectively. Full-length cDNAs were prepared using the SMART technology [35]. First strand and second strand cDNA was synthesized from 0.5 to 1 μg of mRNA using the "Creator SMART cDNA Library Construction Kit" (Clontech) according to the manufacturers instructions. The terminal transferase activity of MMLV reverse transcriptase was used to add untemplated C residues at the 3' end of the first strand cDNA. The second strand is primed with a SMART oligo sequence ending with three G residues (dAAGCAGTGGTATCAACGCAGAGTGGCCATTACGGCCGGG) which pair with the added C's. The SMART anchor oligo, together with a modified oligo(dT) primer (dATTCTAGAGGCCGAGGCGGCCGACATG(T)₃₀NN) or (dAAGCAGTGGTATCAACGCAGAGTGGCCGAGGCGGCC(T)₂₀VN) was then used to amplify double-stranded cDNA by long distance (LD) PCR. Only those cDNAs with the SMART anchor at the 5' end should be amplified.

DSN normalization of SMART amplified cDNA

Normalization of the SMART amplified double-stranded cDNA was achieved using the Kamchatka crab duplex-specific nuclease [28]. In brief, double-stranded cDNA was denatured and allowed to reassociate under second order reaction kinetics. The ds cDNA fraction formed by abundant transcripts during reassociation is then degraded by DSN which displays a strong preference for cleaving ds DNA in both DNA-DNA and DNA-RNA hybrids compared to ssDNA or RNA. The resulting equalized single-stranded fraction was then amplified using long-distance PCR. Enzyme dilution was optimized and number of PCR cycles was minimized for each tissue in order to preserve the quality of the equalized cDNA.

Ligation and cloning

The incorporation of asymmetrical SfI sites in the multiple cloning site of the pDNR-LIB cloning vector allowed directional cloning of SMART SfI restricted dscDNA. Since SfI sites are extremely rare in vertebrate DNA, almost all SMART cDNAs remain intact after SfI digestion. Double stranded SfI restricted cDNA was size fractionated using CHROMA SPIN-400 columns (Clonetech, Mountain View, CA). Sixteen fractions were collected and analyzed by agarose gel electrophoresis. Those fractions containing the highest molecular weight cDNAs (longer than 0.5 kb) were cloned into the pDNR-lib plasmid vector.

EST sequencing

The majority of the single pass 5' Sanger sequence reads were contracted to Agencourt Bioscience (Beverly MA). Bacterial clones were inoculated and grown overnight at 37°C. High-copy plasmid templates were purified using a streamlined SprintPrep™ SPRI^® protocol [36, 37]. This procedure harvests plasmid DNA directly from lysed bacterial cultures by trapping both plasmid and genomic DNA to the functionalized bead particles and selectively eluting only the plasmid DNA. DNA templates were sequenced in 384-well format using BigDye^® Version 3.1 reactions and purified using Agencourt's CleanSeq^® dye-terminator removal kit before separation on ABI3730xl instruments (Applied Biosystems, Foster City, CA). All reads were processed using Phred base calling software. A passing read was defined as an average high quality PHRED score of 20 or higher for at least 100 bases.

For the normalized and non-normalized retinal libraries, both 5' and 3' reads were contracted to SymBio Corporation (Menlo Park CA). Clones were amplified by Templiphi (GE Healthcare Life Sciences, Piscataway, NJ) and then sequenced in 384 well format on a MegaBACE sequencer (Molecular Dynamics now GE Healthcare). Base calling and quality scores were generated with PHRED.

Data extraction and quality filtering

Trimming

Components of the EST2uni analysis pipeline were compiled and run on an Intel Mac OS X server. Default parameters provided by each of the EST2uni components were used except where noted henceforth. Base calling was performed by Phred [38], known vector trimming was performed by Lucy [39], repeat masking was performed with RepeatMasker, version open-3.2.7, with database version 20090120 [40]. Unknown vector and low complexity masking was performed by SeqClean. Default settings of the vector and quality-trimming program Lucy were modified in order to adequately trim our reads. The max_avg_error defined in the Lucy manual corresponds to the quality score of a base and was lowered from 0.02 (Q17) to 0.001 (Q30) for the -bracket setting which controls the removal of low quality bases from both ends of a sequence. The -window setting uses a sliding window algorithm to check if regions of a specified window size are within a max_avg_error. Multiple windows can be specified and the default window of length 10 and max_avg_error of 0.3 was changed to a max_avg_error of 0.063 (while keeping the length of 10) to increase the stringency of quality trimming.

Redundancy

After each round of sequencing we performed a statistical analysis to estimate the complexity and coverage of each library. The statistical methodology for this analysis was developed by Susko and Roger [40]. For a given number of reads from a single library, we can estimate the depth of sequencing coverage of the library as:

where n₁ is the number of genes that appear just once in the sample and n is the total number of reads. From this estimate of coverage we can also calculate the number of additional reads required to discover a new gene:

Assembly

CAP3 [41] was used to perform assembly of the processed ESTs. The -p setting, representing overlap percent identity cut-off, was set to 90. The -d setting, specifying max qscore sum at differences, was set to 110. The est_clustering.pm script in the EST2uni package had to be slightly modified to parse the assembly output correctly.

Annotation

BLAST

Assembled ESTs were annotated by running BLASTX against UniProt release 15.2 with an E value cutoff of e^-10 to designate similar sequences and e^-25 to designate highly similar sequences.

Position of ESTs along Uniprot entries

Perl scripts utilizing BioPerl modules were written to parse the BLASTX output produced in the EST2uni pipeline to determine the completeness of ESTs relative to matching Uniprot sequences (Table 3). Sequences were considered complete on their 5' or 3' end if the EST matched within ten amino acids of the corresponding end of the Uniprot entry.

Identification of microsatellites

The EST2uni pipeline identifies di-, tri- and tetra-nucleotide microsatellites using a modified version of Sputnik (C. Abajian, http://espressosoftware.com/sputnik/index.html. The minimum repeat unit length was set to 2 and the maximum repeat unit length to 4. The minimum microsatellite length was set to 24.

Functional Classification and Pathway Annotation

The Blast2GO [42] program was used to assign functional Gene Ontology (GO) annotations [43]. Assembled unigenes that had significant (e^-5) BLASTX matches to the nr database were mapped to candidate GO terms. Annotation assignment is described in the Blast2GO paper and in the tutorial provided with the software. An e-value-hit-filter of 1.0e^-6, an annotation cutoff of 55, a GO Weight of 5, and a Hsp-Hit Coverage Cutoff of 0 were used as Blast2GO annotation parameters.

Using the single-directional best hit (SBH) method as recommended by the KEGG Automatic Annotation Server (KAAS) [44] for sets of ESTs allowed us to assign unigene pathway annotations.

Mapping onto fish genome sequences

Assembled ESTs were mapped onto the genomes of tetraodon, medaka and stickleback using BLAT [45]. The BLAT manual suggests that for mapping ESTs to a genome across species one use a translated query against a translated database (-q = dnax and -t = dnax). The BLAT results were saved as blast tabular with comment lines (-out = blast9). The tabular blast output was then converted to a GFF3 formatted file using a slightly modified version of blast92gff3.pl (D. Gilbert, http://eugenes.org/gmod/genogrid/scripts/blast92gff3.pl). The GFF3 file is then loaded into a MySQL database, which can then be viewed as a track in GBrowse [46].

Online access to the resource

Individual ESTs have been deposited with GenBank (Acc# GR588780-705678). The assembled contigs can be viewed through a variety of interfaces at http://www.bouillabase.org.

References

Coward K, Little DC: Culture of the 'aquatic chicken': present concerns and future prospects. Biologist (London). 2001, 48: 12-16.
CAS Google Scholar
Kocher TD: Adaptive evolution and explosive speciation: the cichlid fish model. Nat Rev Genet. 2004, 5: 288-98. 10.1038/nrg1316.
Article CAS PubMed Google Scholar
Lee BY, Lee WJ, Streelman JT, Carleton KL, Howe AE, Hulata G, Slettan A, Stern JE, Terai Y, Kocher TD: A second-generation genetic linkage map of tilapia (Oreochromis spp.). Genetics. 2005, 170: 237-44. 10.1534/genetics.104.035022.
Article CAS PubMed Central PubMed Google Scholar
Katagiri T, Kidd C, Tomasino E, Davis JT, Wishon C, Stern JE, Carleton KL, Howe AE, Kocher TD: A BAC-based physical map of the Nile tilapia genome. BMC Genomics. 2005, 6: 89-10.1186/1471-2164-6-89.
Article PubMed Central PubMed Google Scholar
Watanabe M, Kobayashi N, Shin-i T, Horiike T, Tateno Y, Kohara Y, Okada N: Extensive analysis of ORF sequences from two different cichlid species in Lake Victoria provides molecular evidence for a recent radiation event of the Victoria species flock: identity of EST sequences between Haplochromis chilotes and Haplochromis sp. "Redtailsheller". Gene. 2004, 343: 263-9. 10.1016/j.gene.2004.09.013.
Article CAS PubMed Google Scholar
Salzburger W, Renn SC, Steinke D, Braasch I, Hofmann HA, Meyer A: Annotation of expressed sequence tags for the East African cichlid fish Astatotilapia burtoni and evolutionary analyses of cichlid ORFs. BMC Genomics. 2008, 9: 96-10.1186/1471-2164-9-96.
Article PubMed Central PubMed Google Scholar
Kobayashi N, Watanabe M, Horiike T, Kohara Y, Okada N: Extensive analysis of EST sequences reveals that all cichlid species in Lake Victoria share almost identical transcript sets. Gene. 2009, 441: 187-91. 10.1016/j.gene.2008.11.023.
Article CAS PubMed Google Scholar
Shiue YL, Wang LH, Chao TY, Lin CH, Tsai CL: EST-based identification of genes expressed in the hypothalamus of adult tilapia, Oreochromis mossambicus. Biochem Biophys Res Commun. 2004, 316: 523-7. 10.1016/j.bbrc.2004.02.079.
Article CAS PubMed Google Scholar
Tsai CL, Wang LH, Shiue YL, Chao TY: Influence of temperature on the ontogenetic expression of neural development-related genes from developing tilapia brain expressed sequence tags. Mar Biotechnol (NY). 2007, 9: 243-61. 10.1007/s10126-006-6089-2.
Article CAS Google Scholar
Tine M, de Lorgeril J, D'Cotta H, Pepey E, Bonhomme F, Baroiller JF, Durand JD: Transcriptional responses of the black-chinned tilapia Sarotherodon melanotheron acclimatised to salinity exremes. MarineGenomicsl. 2008, 1: 37-46.
Google Scholar
Watanabe M, Kobayashi N, Shin-i T, Horiike T, Tateno Y, Kohara Y, Okada N: Extensive analysis of ORF sequences from two different cichlid species in Lake Victoria provides molecular evidence for a recent radiation event of the Victoria species flock: identity of EST sequences between Haplochromis chilotes and Haplochromis sp. "Redtailsheller". Gene. 2004, 343: 263-9. 10.1016/j.gene.2004.09.013.
Article CAS PubMed Google Scholar
Pepels PP, Balm PH: Ontogeny of corticotropin-releasing factor and of hypothalamic-pituitary-interrenal axis responsiveness to stress in tilapia (Oreochromis mossambicus ; Teleostei). Gen Comp Endocrinol. 2004, 139: 251-65. 10.1016/j.ygcen.2004.09.013.
Article CAS PubMed Google Scholar
Chen WH, Sun LT, Tsai CL, Song YL, Chang CF: Cold-stress induced the modulation of catecholamines, cortisol, immunoglobulin M, and leukocyte phagocytosis in tilapia. Gen Comp Endocrinol. 2002, 126: 90-100. 10.1006/gcen.2001.7772.
Article CAS PubMed Google Scholar
Fiol DF, Kultz D: Rapid hyperosmotic coinduction of two tilapia (Oreochromis mossambicus) transcription factors in gill cells. Proc Natl Acad Sci USA. 2005, 102: 927-32. 10.1073/pnas.0408956102.
Article CAS PubMed Central PubMed Google Scholar
Cnaani A, Zilberman N, Tinman S, Hulata G, Ron M: Genome-scan analysis for quantitative trait loci in an F2 tilapia hybrid. Mol Genet Genomics. 2004, 272: 162-72. 10.1007/s00438-004-1045-1.
Article CAS PubMed Google Scholar
Pepels PP, Bonga SE, Balm PH: Bacterial lipopolysaccharide (LPS) modulates corticotropin-releasing hormone (CRH) content and release in the brain of juvenile and adult tilapia (Oreochromis mossambicus; Teleostei). J Exp Biol. 2004, 207: 4479-88. 10.1242/jeb.01316.
Article CAS PubMed Google Scholar
Almeida JA, Diniz YS, Marques SF, Faine LA, Ribas BO, Burneiko RC, Novelli EL: The use of the oxidative stress responses as biomarkers in Nile tilapia (Oreochromis niloticus) exposed to in vivo cadmium contamination. Environ Int. 2002, 27: 673-9. 10.1016/S0160-4120(01)00127-1.
Article CAS PubMed Google Scholar
Oruc EO, Uner N: Combined effects of 2,4-D and azinphosmethyl on antioxidant enzymes and lipid peroxidation in liver of Oreochromis niloticus. Comp Biochem Physiol C Toxicol Pharmacol. 2000, 127: 291-6.
CAS PubMed Google Scholar
Wang YC, Chaung RH, Tung LC: Comparison of the cytotoxicity induced by different exposure to sodium arsenite in two fish cell lines. Aquat Toxicol. 2004, 69: 67-79. 10.1016/j.aquatox.2004.04.007.
Article CAS PubMed Google Scholar
Forment J, Gilabert F, Robles A, Conejero V, Nuez F, Blanca JM: EST2uni: an open, parallel tool for automated EST analysis and database creation, with a data mining web interface and microarray expression data integration. BMC Bioinformatics. 2008, 9: 5-10.1186/1471-2105-9-5.
Article PubMed Central PubMed Google Scholar
Carninci P, Shibata Y, Hayatsu N, Sugahara Y, Shibata K, Itoh M, Konno H, Okazaki Y, Muramatsu M, Hayashizaki Y: Normalization and subtraction of cap-trapper-selected cDNAs to prepare full-length cDNA libraries for rapid discovery of new genes. Genome Res. 2000, 10: 1617-30. 10.1101/gr.145100.
Article CAS PubMed Central PubMed Google Scholar
Soares MB, Bonaldo MF, Jelene P, Su L, Lawton L, Efstratiadis A: Construction and characterization of a normalized cDNA library. Proc Natl Acad Sci USA. 1994, 91: 9228-32. 10.1073/pnas.91.20.9228.
Article CAS PubMed Central PubMed Google Scholar
Bonaldo MF, Lennon G, Soares MB: Normalization and subtraction: two approaches to facilitate gene discovery. Genome Res. 1996, 6: 791-806. 10.1101/gr.6.9.791.
Article CAS PubMed Google Scholar
Goldby P, Stephens SK, Rast JP, Burke JF: Identification of low-abundance differentially expressed transcripts using arrayed cDNA clones. Comparative Biochemistry and Physiology Part B. 2002, 133: 537-42. 10.1016/S1096-4959(02)00146-X.
Article Google Scholar
Zhulidov PA, Bogdanova EA, Shcheglov AS, Vagner LL, Khaspekov GL, Kozhemyako VB, Matz MV, Meleshkevitch E, Moroz LL, Lukyanov SA, Shagin DA: Simple cDNA normalization using Kamchatka crab duplex-specific nuclease. Nucleic Acids Res. 2004, 32: e37-10.1093/nar/gnh031.
Article PubMed Central PubMed Google Scholar
Walls GL: The vertebrate eye and its adaptive radiation. 1963, New York: Hafner Publishing
Google Scholar
Tachibanaki S, Tsushima S, Kawamura S: Low amplification and fast visual pigment phosphorylation as mechanisms characterizing cone photoresponses. Proc Natl Acad Sci USA. 2001, 98: 14044-9. 10.1073/pnas.241396898.
Article CAS PubMed Central PubMed Google Scholar
Rodieck RW: The first steps in seeing. 1998, Sunderland: Sinauer Associates
Google Scholar
Kucuktas H, Wang S, Li P, He C, Xu P, Sha Z, Liu H, Jiang Y, Baoprasertkul P, Somridhivej B, Wang Y, Abernathy J, Guo X, Liu L, Muir W, Liu Z: Construction of genetic linkage maps and comparative genome analysis of catfish using gene-associated markers. Genetics. 2009, 181: 1649-60. 10.1534/genetics.108.098855.
Article CAS PubMed Central PubMed Google Scholar
Serapion J, Kucuktas H, Feng J, Liu Z: Bioinformatic mining of type I microsatellites from expressed sequence tags of channel catfish (Ictalurus punctatus). Mar Biotechnol (NY). 2004, 6: 364-77. 10.1007/s10126-003-0039-z.
Article CAS Google Scholar
Sarropoulou E, Nousdili D, Magoulas A, Kotoulas G: Linking the genomes of nonmodel teleosts through comparative genomics. Mar Biotechnol (NY). 2008, 10: 227-33. 10.1007/s10126-007-9066-5.
Article CAS Google Scholar
Rise ML, von Schalburg KR, Brown GD, Mawer MA, Devlin RH, Kuipers N, Busby M, Beetz-Sargent M, Alberto R, Gibbs AR, Hunt P, Shukin R, Zeznik JA, Nelson C, Jones SR, Smailus DE, Jones SJ, Schein JE, Marra MA, Butterfield YS, Stott JM, Ng SH, Davidson WS, Koop BF: Development and application of a salmonid EST database and cDNA microarray: data mining and interspecific hybridization characteristics. Genome Research. 2004, 14: 478-90. 10.1101/gr.1687304.
Article PubMed Central PubMed Google Scholar
Li P, Peatman E, Wang S, Feng J, He C, Baoprasertkul P, Xu P, Kucuktas H, Nandi S, Somridhivej B, Serapion J, Simmons M, Turan C, Liu L, Muir W, Dunham R, Brady Y, Grizzle J, Liu Z: Towards the ictalurid catfish transcriptome: generation and analysis of 31,215 catfish ESTs. BMC Genomics. 2007, 8: 177-10.1186/1471-2164-8-177.
Article PubMed Central PubMed Google Scholar
Jaillon O, Aury JM, Brunet F, Petit JL, Stange-Thomann N, Mauceli E, Bouneau L: Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotype. Nature. 2004, 431: 946-57. 10.1038/nature03025.
Article PubMed Google Scholar
Zhu YY, Machleder EM, Chenchik A, Li R, Siebert PD: Reverse transcriptase template switching: a SMART approach for full-length cDNA library construction. Biotechniques. 2001, 30: 892-897.
CAS PubMed Google Scholar
DeAngelis MM, Wang DG, Hawkins TL: Solid-phase reversible immobilization for the isolation of PCR products. Nucleic Acids Res. 1995, 23: 4742-3. 10.1093/nar/23.22.4742.
Article CAS PubMed Central PubMed Google Scholar
Hawkins TL, McKernan KJ, Jacotot LB, MacKenzie JB, Richardson PM, Lander ES: A magnetic attraction to high-throughput genomics. Science. 1997, 276: 1887-9. 10.1126/science.276.5320.1887.
Article CAS PubMed Google Scholar
Ewing B, Hillier L, Wendl M, Green P: Basecalling of automated sequencer traces using phred. I. Accuracy assessment. Genome Research. 1998, 8: 175-185.
Article CAS PubMed Google Scholar
Chou HH, Holmes MH: DNA sequence quality trimming and vector removal. Bioinformatics. 2001, 17: 1093-104. 10.1093/bioinformatics/17.12.1093.
Article CAS PubMed Google Scholar
Susko E, Roger AJ: Estimating and comparing the rates of gene discovery and expressed sequence tag (EST) frequencies in EST surveys. Bioinformatics. 2004, 20: 2279-87. 10.1093/bioinformatics/bth239.
Article CAS PubMed Google Scholar
Huang X, Madan A: CAP 3: A DNA sequence assembly program. Genome Res. 1999, 9: 868-77. 10.1101/gr.9.9.868.
Article CAS PubMed Central PubMed Google Scholar
Conesa A, Götz S, García-Gómez JM, Terol J, Talón M, Robles M: Blast2GO: A universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005, 21: 3674-3676. 10.1093/bioinformatics/bti610.
Article CAS PubMed Google Scholar
Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature Genetics. 2000, 25: 25-29. 10.1038/75556.
Article CAS PubMed Central PubMed Google Scholar
Moriya Y, Itoh M, Okuda S, Yoshizawa AC, Kanehisa M: an automatic genome annotation and Pathway reconstruction server. Nucleic Acids Res. 2007, 35: W182-W185. 10.1093/nar/gkm321.
Article PubMed Central PubMed Google Scholar
Kent JW: BLAT - The BLAST-Like Alignment Tool. Genome Res. 2002, 12: 656-64.
Article CAS PubMed Central PubMed Google Scholar
Stein LD, Mungall C, Shu S, Caudy M, Mangone M, Day A, Nickerson E, Stajich JE, Harris TW, Arva A, Lewis S: The generic genome browser: a building block for a model organism system database. Genome Res. 2002, 12: 1599-610. 10.1101/gr.403602.
Article CAS PubMed Central PubMed Google Scholar

Download references

Acknowledgements

Thanks to Kelly O'Quin for help with the R statistical package. This work was supported by grants from the NIH (1R15EY016721-01) to K.L.C. and from the USDA-NRICGP (#2006-04830) and US-Israeli BARD (IS-3995-07) to T.D.K.

Author information

Authors and Affiliations

Department of Biology, University of Maryland, College Park, Maryland, 20742, USA
Bo-Young Lee, Aimee E Howe, Matthew A Conte, Karen L Carleton & Thomas D Kocher
CIRAD-PERSYST, Aquaculture Research Unit, TA B-20/A, Campus International de Baillarguet, 34398, Montpellier cedex 5, France
Helena D'Cotta, Elodie Pepey & Jean-Francois Baroiller
Broad Institute, Cambridge, Massachusetts, 02142, USA
Federica di Palma

Authors

Bo-Young Lee
View author publications
You can also search for this author in PubMed Google Scholar
Aimee E Howe
View author publications
You can also search for this author in PubMed Google Scholar
Matthew A Conte
View author publications
You can also search for this author in PubMed Google Scholar
Helena D'Cotta
View author publications
You can also search for this author in PubMed Google Scholar
Elodie Pepey
View author publications
You can also search for this author in PubMed Google Scholar
Jean-Francois Baroiller
View author publications
You can also search for this author in PubMed Google Scholar
Federica di Palma
View author publications
You can also search for this author in PubMed Google Scholar
Karen L Carleton
View author publications
You can also search for this author in PubMed Google Scholar
Thomas D Kocher
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Thomas D Kocher.

Additional information

Authors' contributions

BYL, HD, EP and FD constructed the various libraries. AEH prepared the libraries for sequencing. MC, JFB, KLC and TDK performed the bioinformatic analyses and prepared the manuscript. All authors read and approved the final paper.

Electronic supplementary material

12864_2009_2872_MOESM1_ESM.PDF

Additional file 1: Figure S1: GO annotation. Proportion of unigenes annotated with GO terms for each of the three GO functional categories (biological process, molecular function, and cellular component). (PDF 164 KB)

Authors’ original submitted files for images

Below are the links to the authors’ original submitted files for images.

Authors’ original file for figure 1

Authors’ original file for figure 2

Authors’ original file for figure 3

Authors’ original file for figure 4

Authors’ original file for figure 5

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article is distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Reprints and permissions

About this article

Cite this article

Lee, BY., Howe, A.E., Conte, M.A. et al. An EST resource for tilapia based on 17 normalized libraries and assembly of 116,899 sequence tags. BMC Genomics 11, 278 (2010). https://doi.org/10.1186/1471-2164-11-278

Download citation

Received: 26 October 2009
Accepted: 30 April 2010
Published: 30 April 2010
DOI: https://doi.org/10.1186/1471-2164-11-278

An EST resource for tilapia based on 17 normalized libraries and assembly of 116,899 sequence tags

Abstract

Background

Results

Conclusion

Background

Results

cDNA libraries and EST sequences

Clustering of tilapia ESTs

Final assembly

Sequence similarity and functional annotation

Microsatellites and SNPs

Discussion

Effectiveness of normalization

Anchors for comparative mapping

Conclusion

Methods

Construction of the 19 tilapia cDNA libraries

cDNA library construction

DSN normalization of SMART amplified cDNA

Ligation and cloning

EST sequencing

Data extraction and quality filtering

Trimming

Redundancy

Assembly

Annotation

BLAST

Position of ESTs along Uniprot entries

Identification of microsatellites

Functional Classification and Pathway Annotation

Mapping onto fish genome sequences

Online access to the resource

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Authors' contributions

Electronic supplementary material

12864_2009_2872_MOESM1_ESM.PDF

Authors’ original submitted files for images

Authors’ original file for figure 1

Authors’ original file for figure 2

Authors’ original file for figure 3

Authors’ original file for figure 4

Authors’ original file for figure 5

Rights and permissions

About this article

Cite this article

Share this article

Keywords

BMC Genomics

Contact us