Characterization of full-length sequenced cDNA inserts (FLIcs) from Atlantic salmon (Salmo salar)
© Andreassen et al; licensee BioMed Central Ltd. 2009
Received: 10 December 2008
Accepted: 30 October 2009
Published: 30 October 2009
Sequencing of the Atlantic salmon genome is now being planned by an international research consortium. Full-length sequenced inserts from cDNAs (FLIcs) are an important tool for correct annotation and clustering of the genomic sequence in any species. The large amount of highly similar duplicate sequences caused by the relatively recent genome duplication in the salmonid ancestor represents a particular challenge for the genome project. FLIcs will therefore be an extremely useful resource for the Atlantic salmon sequencing project. In addition to be helpful in order to distinguish between duplicate genome regions and in determining correct gene structures, FLIcs are an important resource for functional genomic studies and for investigation of regulatory elements controlling gene expression. In contrast to the large number of ESTs available, including the ESTs from 23 developmental and tissue specific cDNA libraries contributed by the Salmon Genome Project (SGP), the number of sequences where the full-length of the cDNA insert has been determined has been small.
High quality full-length insert sequences from 560 pre-smolt white muscle tissue specific cDNAs were generated, accession numbers [GenBank: BT043497 - BT044056]. Five hundred and ten (91%) of the transcripts were annotated using Gene Ontology (GO) terms and 440 of the FLIcs are likely to contain a complete coding sequence (cCDS). The sequence information was used to identify putative paralogs, characterize salmon Kozak motifs, polyadenylation signal variation and to identify motifs likely to be involved in the regulation of particular genes. Finally, conserved 7-mers in the 3'UTRs were identified, of which some were identical to miRNA target sequences.
This paper describes the first Atlantic salmon FLIcs from a tissue and developmental stage specific cDNA library. We have demonstrated that many FLIcs contained a complete coding sequence (cCDS). This suggests that the remaining cDNA libraries generated by SGP represent a valuable cCDS FLIc source. The conservation of 7-mers in 3'UTRs indicates that these motifs are functionally important. Identity between some of these 7-mers and miRNA target sequences suggests that they are miRNA targets in Salmo salar transcripts as well.
Atlantic salmon (Salmo salar) is an important aquaculture species, and there is also a considerable commercial harvesting of wild salmon. As a consequence of the economic interest in salmon, various genomic resources have been developed to identify genes and genomic mechanisms responsible for commercially important traits. These resources include a BAC library, the corresponding physical map and several linkage maps [1–5]. In addition several cDNA libraries have been constructed [6–10] and at present 494,094 ESTs have been submitted to GenBank .
The salmon genome is complex due to a relatively recent genome duplication believed to have occurred between 25 and 120 million years ago in the common salmonid ancestor . Analysis of segregation ratios in salmonids has revealed disomic inheritance in females while there is a mixture of disomic and tetrasomic inheritance in males [12, 13]. Divergence of loci duplicated via tetraploidy is depending on complete reestablishment of disomic inheritance, and present salmonids appear to have retained more than 50% of loci as duplicates . This suggests that salmonids are in the process of re-establishing disomic inheritance. Studies of the salmon genome might therefore contribute important biological knowledge on the evolution of ohnologs (duplicate sequences that originate from a whole genome duplication) .
Transcript sequences that represent the coding regions of genes may be predicted based on consensus assemblies of overlapping ESTs such as the gene indices in TIGR database . These clusters of tentative consensus sequences (TCs) serve as a valuable resource for putative gene products. However, these reconstructions are prone to error caused by low quality of single-pass sequences, alternative splice forms, expressed pseudogenes and sequence similarities between transcripts within gene families. One would expect that the large number of almost identical ohnologous sequences would make such gene transcript predictions particularly challenging in salmonid species. In agreement with this, results from studies using salmon EST data as a source for SNP discovery show that when clustering EST-sequences into consensus sequences there is a high frequency of "SNPs" with heterozygote excess. This indicates that a large amount of the ESTs in such clusters are derived from different loci [5, 17].
The most useful transcript sequences are derived from high quality full-length sequencing of inserts from cDNA clones (FLIcs) that contain the complete protein coding sequence (cCDS). By determining the cCDS from one single clone the errors caused by incorrect clustering of non-allelic sequences are omitted. High quality sequences based on multi-pass reads of the CDS from FLIcs are therefore the most reliable source for transcript prediction. In addition to representing the most suitable mean to predict protein sequences, the data from FLIcs might also be used to identify splice variants as well as to differ between closely related paralogs. Complete CDS FLIcs are also important in genome clustering and annotation. Genomic sequencing of Atlantic salmon is being organised by an international consortium and due to the problems related to the recent duplication of the salmon genome clustering and annotation of the sequence might prove difficult. Thus, a large number of high quality cCDS FLIcs would therefore be of great value in a salmon genome sequencing project. Finally, in full-length insert sequences, where the boundaries of the coding sequences are defined, the additional transcript sequences provide sequence information from 5' and 3'UTRs. Within these non-coding mRNA segments there are sequence motifs that are important in regulation of gene expression. Access to reliable sequence information from UTRs is a precondition to identify such functional motifs. Together, the above mentioned use of cCDS FLIcs and their source cDNA clones has led to large scale sequencing of full-length inserts in several species [18–23]. Despite the apparent usefulness of cCDS FLIcs few salmon FLIcs were available in public databases at the time this study was initiated.
The aim of this study has been to determine the full sequence of the inserts in a set of selected Atlantic salmon cDNA clones to provide a larger amount of high quality sequenced transcripts with complete CDSs from a single tissue and developmental stage. Clones were selected from a white muscle tissue specific library from pre-smolt developmental stage since our research at the time this study was initiated also focused on discovering genes that might be important to the ability of depositing dietary cartenoid pigments in muscle tissue. The results from the annotation of full-length sequenced inserts and identification of cDNA transcripts likely to contain complete CDSs are presented. We also describe some general characteristics of Salmo salar transcripts such as the Kozak consensus sequence, polyadenylation signal variation and we identify conserved, putatively functional, elements in the UTRs.
Results and Discussion
Full- length insert cDNA sequences and gene ontology annotation
The cDNA clones chosen for full-length insert sequencing were from a white muscle and pre-smolt developmental stage specific library. The library consisted originally of 6,656 ESTs that were assembled in 567 contigs and 1,091 singlets . One representative clone from each of the contigs as well as all singlets was initially included in our material. The 5' single pass sequences from the selected cDNA clones were then used as queries in similarity searches against publicly available databases (see methods). The purpose of this exercise was to remove cDNA clones not likely to contain inserts with a complete coding sequence. The exercise reduced the number of clones with putative cCDSs to 560 (273 selected from contigs and 287 singlets). The full-length inserts (FLIcs) from these cDNA clones were sequenced using a primer walking strategy. Manual inspection of all assembled sequences was performed to ensure a high quality on the consensus sequence data from each of the FLIcs. All sequences within the FLIcs likely to represent coding sequences (see below) were covered by at least two reads. The average number of reads (total number of bases sequenced/total number of bases of all FLIcs) in the finished sequences was 4.1. The FLIc consensus sequences from the 560 cDNA clones were submitted to GenBank Aug. 20th, 2008 [GenBank: BT043497 - BT044056].
Summary of complete CDS FLIcs (cCDS FLIcs)
Number of cCDS FLIcs
Average length (bp)1
Average length (bp)2
Size range (bp) of cCDS FLIcs
Hundred and five of the remaining FLIcs are, based on results from the BLASTX analysis, likely to contain partial CDSs of protein coding genes. They revealed E-values below the threshold, but lacked the 5' coding sequence corresponding to the first few amino acids of the matching protein. The putative function of a few of the FLIcs remains unknown. The Blast2GO annotation tool was used to assign GO terms to the FLIcs with partial CDSs. The complete dataset with GenBank accession numbers, sequence descriptions, cDNA clone reference numbers, GO terms and enzyme codes assigned to each of the 105 partial CDS FLIcs is given in additional file 2. The insert sequences from the partial CDS FLIcs lacked some of the 5' coding sequences, but they are high quality sequences and with a mean average FLIc size of 1,450 bp. Each of the sequences are derived from a single clone and thus, they represent more valuable data for transcript prediction than data originating from low quality and short EST sequences.
Paralogous sequence, splice variants and gene prediction from TCs
Several cCDS FLIcs were, based on sequence similarities in the CDS as well as on GO annotation, likely to be closely related members of gene families (paralogs). There was a total of 54 cases were two or more transcripts were likely to represent paralogs. A summary of these sequences, including GenBank numbers, is given in additional file 3. Some of these FLIcs, e.g. the two transcripts annotated as psmd13 genes [GenBank: BT043856 and BT043857], were highly similar duplicate sequences (98.5% sequence identity). The similarity is, however, lower than expected from allelic sequences . Such highly similar transcripts may represent duplicate sequences originating from the salmonid specific genome duplication. Identification of single locus polymorphisms in this kind of sequences followed by linkage mapping could provide further evidence that these sequences are ohnologs. Sequence analysis of a number of ohnologs could contribute important knowledge on how genes, which originate from genome duplications evolve.
Pairs of splice variants
Number of gaps
bp size of gap
Prediction of transcript sequence based on clustering overlapping EST data is potentially more error prone in salmon than most other species given the large amount of almost identical duplicate sequences present in the salmon genome. The cCDS FLIcs offers a first opportunity to evaluate the robustness of such salmon transcript consensus sequences using high quality reference data. In particular, we wanted to examine the degree of erroneous tentative consensus sequences in databases caused by clustering highly similar ESTs originating from different loci. To test this we used the cCDS FLIc data from the group of closely related genes in our material as reference sequences. This material was compared to the tentative consensus sequence (TC) in the DFCI salmon gene index  that contained the EST from the same source clone as our reference sequence. A direct comparison of the two sequences could then reveal the amount of sequence difference. Thirty-five such pairwise comparisons were performed (see methods). The mean size of the sequences compared was 1,200 bp. All comparisons that revealed more than four inconsistencies in every 100 bp compared were defined as non-matches. There were eleven such non-matches in the 35 comparisons. Measurements of SNP density in the salmon genome suggest that there is less than one SNP in every 600 bp in introns, and less than one in every 1,400 bp in exons [27, 29]. Thus, the amount of sequence difference in the pairs of non-matches was far higher than expected if the references and the TCs were allelic sequences. Some of the sequence differences observed in the pairs defined as non-matches might be explained by the low quality of EST data, however, we do not believe that the sequence differences in non-matches are caused by low quality of the EST data alone. In support of this, a measurement of EST quality by comparing the full-length insert sequence, not to the complete TC, but to the original single pass EST sequence within the TC that where from the FLIc source clone, shows that the errors (a different base in the ESTs when aligned with the FLIc) were less than 1 in every 200 bp. Together this indicate that the TCs defined as non-matches consist of one or more ESTs that are from another locus than that of the FLIc source clone. Approximately one third of the TCs were non-matches when compared to the references. Thus, the results confirm suspicions that highly similar duplicate sequences represent a substantial problem when predicting tentative consensus transcripts from salmon ESTs.
Kozak motif and polyadenylation signal (PAS) variation in Salmo salar transcripts
In 13 cCDS FLIcs the 3'end of the insert sequences terminated in a polyadenylation tail (poly A-tail) of less than 10 adenosine residues. These cases may represent FLIcs with incomplete 3'UTRs due to mispriming from internal sites in the library construction. In the remaining 427 cCDS FLIcs there was a poly A-tail consisting of ten or more consecutive adenosine residues adjacent to the 3' vector sequence. We considered these cCDS FLIcs as transcripts with complete 3'UTRs. The size of 3'UTRs in transcripts annotated as ribosomal mRNA (all less than 80 bp) was considerably shorter than the 3'UTRs in the other transcripts. The average size of the 3'UTR in the cCDS FLIcs, not including ribosomal mRNAs, was 517 bp.
Most common hexamers1
Conserved candidate regulatory motifs in Salmo salar transcripts
UTRs are usually rich in motifs (target sequences) that contribute in modification of gene expression by binding proteins or microRNAs (miRNAs). The UTRs from this study provides a first opportunity to identify candidate functional motifs in salmon transcripts. To find such motifs we searched the salmon UTRs for matches to evolutionary conserved motifs. The UTR database collection (UTRdb) is a collection of functional sequence patterns located in 5' or 3' UTR sequences . By use of the pattern matcher program UTRscan we searched the UTRdb using the 5' and 3'UTRs as queries to identify putative regulatory motifs in the salmon transcripts. The analysis of the 5'UTRs revealed two transcripts [GenBank: BT043818 and BT043819] with motifs that were matches to the iron responsive element (IRE). IRE is a particular hairpin structure located in 5'UTRs of mRNAs involved in regulation of cellular iron metabolism . The salmon transcripts were annotated as two variants of the ferritin heavy polypeptide. The observation that this motif, known to be evolutionary conserved and present in the ferritin genes of vertebrates , were located in the 5'UTR of these salmon ferritin transcripts supports that the matches were not by chance, but that the motifs are true functional IREs. In the analysis of 3'UTRs one transcript with a sequence that matched the SECIS element was revealed. The SECIS element is a 60 bp stem-loop structure located in the 3'UTR that directs the cell to translate the UGA codons as selenocystein instead of terminating the translation . The finding that the salmon transcript with the match to this motif encodes a glutathione peroxidase [GenBank: BT044014], a gene that in other species has been shown to be a selenium-containing protein that translate UGA codons, supports the fact that also this motif match reveals a true functional element.
Conserved motifs in salmon 3'UTRs
39 +/- 6
36 +/- 4
20 +/- 6
miR-101, miR-199, miR-144
32 +/- 4
34 +/- 5
33 +/- 5
31 +/- 6
37 +/- 7
28 +/- 4
16 +/- 6
24 +/- 4
miR-425-5, miR-731, miR-489
Conservation of particular sequences could itself indicate some kind of function. To elucidate whether any of these motifs were miRNA target sequences we compared the sequence of the eleven conserved motifs to the seven first nucleotides (the seed) of miRNAs in the miRBase as well as the seeds from miRNAs reported in a recent study of rainbow trout [44–46]. This comparison revealed that four of the motifs (motif 2,3,5 and 11 in table 4) were complementary sequences to seeds of one or more miRNAs in the miRBase. The miRNAs with complementary seeds to a given motif are listed in table 4. The finding that four of these conserved 7-mers are identical to target sequences of evolutionary conserved miRNAs further suggested that they were true functional miRNA target sequences. The putative target sequences were present in more than hundred of the 3'UTRs and if they are true target sequences, this indicates that the expression of more than 20% of the genes in this material may be regulated by miRNAs. Further experimental validation studies of these genes could confirm that they are true miRNA targets.
Two other conserved 7-mers (motif 6 and 7 in table 4) are matches to the PUF consensus sequence. The Puf family of proteins is an evolutionary conserved family of proteins that, in a similar manner to miRNAs, bind target sequences in 3'UTRs and repress gene expression . The complete consensus PUF sequence is an 8-mer (TGTAHATA), and a search for this motif revealed 63 matches in the 3'UTRs. This suggests that these conserved motifs represent binding sites for homologous Puf proteins in salmon. There were five conserved 7-mers with no match to a 7-mer (or larger) sequence with a known function. We note that two 7-mers (motif 8 and 9) are matches to the shorter CFI target sequence TGTAN that participate in polyadenylation site recognition . We also note that the four first bases of motif 4 in table 4 is a match to the common PAS, and in 16 sequences the motif is located in overlap with the PAS. This could possibly explain why these three motifs are conserved. The putative causes of conservation of the remaining two motifs are unknown. Experimental validation of the motifs could provide further knowledge about their function. A complete list of transcripts with the different putative target sequences is available from author upon request.
The results from this study demonstrate that a large number of the full-length sequenced cDNA clones from the white muscle tissue and developmental stage specific library contained complete coding sequences. We estimated that at least 79% (440 out of 560) of the clones selected for sequencing represent transcripts with complete CDSs. This represents about a quarter of all different contigs and singlets in the library. The selection and validation methods used in this study are conservative since they filter out clones with ORFs that has not been discovered in other species. Thus, the percentage of clones with complete cCDSs in the library may be slightly larger due to the possible presence of clones with cCDSs not yet described in other species or unique to salmon. The salmon genome resources produces by the Salmon Genome Project [9, 10] include another 23 tissue and developmental stage specific cDNA libraries that were constructed by use of a non full-length enriched method. We have shown by this study that, although our library was constructed with the same method such cDNA libraries represent a valuable source for attaining transcripts with complete coding sequences. This is important knowledge since the additional libraries are likely to include transcripts from genes, or particular splice variants of genes, that are tissue and/or developmental stage specific expressed.
The analysis of salmon UTRs revealed several motifs that were significantly over-represented in the 3'UTRs. The fact that a subset of these motifs was identical to miRNA target sequences suggests that they are true miRNA targets. Although further experimental validation are needed to confirm the putative function of these motifs, the results indicate that the salmon UTR sequences retrieved from full-length insert sequencing of cDNA clones may be used to identify functional motifs important in regulation of gene expression. Salmon has been suggested as a model species to study the evolution of coding sequence divergence in ohnologs. If salmon miRNAs and their target transcripts are identified the salmon may also represent a model species for studying the evolution of divergent gene expression in ohnologs.
The cDNA clones selected for full-length sequencing were from a non full-length enriched white muscle tissue specific library constructed as described in Adzhubei et al. . All 6,656 cDNA clones in this library were from the pre-smolt developmental stage. After sequence processing and clustering the 6,656 cDNA clones had been assembled in 567 contigs and 1,091 singlets .
Selection of cDNA clones by homology searches in databases
To select which cDNA clones from 567 contigs and 1,091 singlets to undergo full-length sequencing we used homology analysis. Lack of similarity to the N-terminal end of proteins in the database was used as a mean for removing cDNA clones not likely to contain complete coding sequences. Consensus sequences from the contigs and the 5'EST sequences from the singlets were used as input (query) and compared to data in PDB, Swiss-Prot and nr protein sequence databases using BLASTX . Threshold for a significant match was defined as expectation value < 10-15. Only cDNAs that contributed input-sequences providing a match to the N-terminal end of proteins in the databases were considered for full-length sequencing. In the cases where the query was a contig consensus sequence one cDNA clone contributing a sequence that aligned at the 5' end of the contig was chosen for full-length sequencing. Based on these simple selection criteria's 560 cDNA clones, 273 selected from contigs and 287 singlets, were chosen for full-length sequencing.
DNA sequencing of cDNA clones
Sequencing was performed as described in Adzhubei et al. 2007 using T3 and -21 m13F sequencing primers for sequencing from 5' and 3' end of the transcripts, respectively . Additional custom primers for primer walking were designed using MacVector software. All chromatograms were manually inspected, and sequences from individual clones were assembled into full-length contigs using Sequencher 4.7 software. The sequence data representing the CDS in each of the insert sequences were generated from at least two reads. The finished full-length sequenced inserts were covered by an average of 4.1 reads (total number of bases sequenced/total number of bases in all FLIcs).
Characterization of coding sequences (CDSs) and UTRs
To evaluate whether a FLIc contained a complete CDS we performed BLASTX similarity searches against the RefSeq protein database . Expectation value <10-15 and a positive score of at least 60% were used as threshold values for a significant match. Significant matches were further validated by manual inspection of the alignments. We considered a FLIc to contain a complete CDS if the ORF from the FLIc revealed a startcodon and stopcodon in agreement with the match in the database. The start and stop codons of CDSs were used to define the boundary between the coding sequence and the 5' and 3'UTRs. If a significant match did not contain a startcodon in the 5' end of the coding sequence and the pairwise alignment indicated that the transcript lacked some 5' coding sequence it was considered to be a FLIc with a partial coding sequence.
Gene ontology annotation
FLIcs were assigned gene ontology terms by use of Blast2GO . The annotation was performed with default Blast2GO conditions.
Comparison of cCDS FLIcs with TCs in the DFCI salmon gene index
Transcripts (two or more) that were likely to be members of the same gene family (based on CDS sequence similarity and annotation) were grouped together. There were a total of 54 such small groups in our material. Thirty-five randomly chosen FLIc from each of these small groups, not including any of the small FLIcs from groups annotated as ribosomal proteins, was used in the comparison with a "matching" (see below) tentative consensus sequence (TC) from the Dana-Farber Cancer Institute (DFCI) salmon gene index . The salmon TCs in the DFCI gene index are constructed by use of a large number of ESTs (432,243 EST, October 2008) including the ESTs submitted from the Salmon Genome Project [9, 10]. This allowed a comparison between the cCDS FLIcs and the TCs containing the EST from the same source clone as the cCDS FLIcs. The aligned pairs of TC sequences and cCDS FLIcs were manually inspected. All comparisons that showed four or more differences in every 100 bp of the aligned sequences were classified as non-matches.
Search for frequently occurring motifs
Searches for most frequently occurring motifs in the UTRs were performed by use of the TEIRESIAS-based pattern discovery tool . A search for frequently occurring 6-mers (putative PAS motifs) was performed in the 29 transcripts without the two most common PAS (AAUAAA, AUUAAA). The 35 bp immediate upstream of the poly A-tail from these transcripts was used as input. Pattern discovery tool conditions were: exact discovery, L = 6 and W = 6. A search for most frequently occurring 7-mer (putative target sequences) was performed in 3'UTRs from 469 FLIcs (427 cCDS FLIcs and 42 partial CDS FLIcs) with conditions: exact discovery, L = 7 and W = 7.
Testing for statistical over-representation of motifs in 3'UTRs
To test whether the 7-mers identified by use of the TEIRESIAS-based pattern discovery tool were significantly over-represented in 3'UTRs we compared the number of times a given motif was present at least once in a 3'UTR sequence to the number of times this motif was observed in three different kinds of reference sequences (controls). One control was a genomic DNA sequence from a salmon BAC clone [GenBank: DQ156149]. The size of this sequence was 219,899 bp. A second control was generated by pooling all 3'UTR sequences into one large continuous sequence (226,643 bp), and then create ten separate randomized controls using the shuffle DNA tool at the sequence manipulation site http://www.bioinformatics.org/sms/. The mean number of occurrences of each of the motifs from these ten shufflings was used for comparison. A third control was the reverse strand of all 3'UTR sequences. The significance of the difference in number of a given 7-mer observed between 3'UTRs and the "random shuffle control" was examined using a 2 × 2 contingency table. A chi-square test was used to test the null hypothesis that motifs were evenly distributed among the two groups. There are more intricate methods to test for significant over-representation of motifs . However, since we revealed a significant difference between 3'UTRs and the control in all eleven cases using the simple method described here, more advanced methods was not applied.
Identification of miRNAs with "seeds" that was complementary to conserved 3'UTR motifs
The reverse complementary sequence of the eleven most common 7-mer motifs in the 3'UTRs was compared to the seed region (first seven bases in the miRNAs) of known miRNAs. The miRNA sequences were from the miRBase version 5 http://microrna.sanger.ac.uk/targets/v5/. Only perfect matches to miRNA seed regions were reported in table 4.
This work was supported by grant 177036/S10 CIGENE and 139617/140 "Salmon Genome Project (SGP)" by the Research Council of Norway.
- Thorsen J, Zhu B, Frengen E, Osoegawa K, de Jong PJ, Koop BF, Davidson WS, Hoyheim B: A highly redundant BAC library of Atlantic salmon (Salmo salar): an important tool for salmon projects. BMC Genomics. 2005, 6 (1): 50-10.1186/1471-2164-6-50.PubMed CentralView ArticlePubMedGoogle Scholar
- Ng SH, Artieri CG, Bosdet IE, Chiu R, Danzmann RG, Davidson WS, Ferguson MM, Fjell CD, Hoyheim B, Jones SJ, et al: A physical map of the genome of Atlantic salmon, Salmo salar. Genomics. 2005, 86 (4): 396-404. 10.1016/j.ygeno.2005.06.001.View ArticlePubMedGoogle Scholar
- Gilbey J, Verspoor E, McLay A, Houlihan D: A microsatellite linkage map for Atlantic salmon (Salmo salar). Anim Genet. 2004, 35 (2): 98-105. 10.1111/j.1365-2052.2004.01091.x.View ArticlePubMedGoogle Scholar
- Moen T, Hoyheim B, Munck H, Gomez-Raya L: A linkage map of Atlantic salmon (Salmo salar) reveals an uncommonly large difference in recombination rate between the sexes. Anim Genet. 2004, 35 (2): 81-92. 10.1111/j.1365-2052.2004.01097.x.View ArticlePubMedGoogle Scholar
- Moen T, Hayes B, Baranski M, Berg PR, Kjoglum S, Koop BF, Davidson WS, Omholt SW, Lien S: A linkage map of the Atlantic salmon (Salmo salar) based on EST-derived SNP markers. BMC Genomics. 2008, 9: 223-10.1186/1471-2164-9-223.PubMed CentralView ArticlePubMedGoogle Scholar
- Davey GC, Caplice NC, Martin SA, Powell R: A survey of genes in the Atlantic salmon (Salmo salar) as identified by expressed sequence tags. Gene. 2001, 263 (1-2): 121-130. 10.1016/S0378-1119(00)00587-4.View ArticlePubMedGoogle Scholar
- Martin SA, Caplice NC, Davey GC, Powell R: EST-based identification of genes expressed in the liver of adult Atlantic salmon (Salmo salar). Biochem Biophys Res Commun. 2002, 293 (1): 578-585. 10.1016/S0006-291X(02)00263-2.View ArticlePubMedGoogle Scholar
- Rise ML, von Schalburg KR, Brown GD, Mawer MA, Devlin RH, Kuipers N, Busby M, Beetz-Sargent M, Alberto R, Gibbs AR, et al: Development and application of a salmonid EST database and cDNA microarray: data mining and interspecific hybridization characteristics. Genome Res. 2004, 14 (3): 478-490. 10.1101/gr.1687304.PubMed CentralView ArticlePubMedGoogle Scholar
- Hagen-Larsen H, Laerdahl JK, Panitz F, Adzhubei A, Hoyheim B: An EST-based approach for identifying genes expressed in the intestine and gills of pre-smolt Atlantic salmon (Salmo salar). BMC Genomics. 2005, 6: 171-10.1186/1471-2164-6-171.PubMed CentralView ArticlePubMedGoogle Scholar
- Adzhubei AA, Vlasova AV, Hagen-Larsen H, Ruden TA, Laerdahl JK, Hoyheim B: Annotated expressed sequence tags (ESTs) from pre-smolt Atlantic salmon (Salmo salar) in a searchable data resource. BMC Genomics. 2007, 8: 209-10.1186/1471-2164-8-209.PubMed CentralView ArticlePubMedGoogle Scholar
- NCBI: Database of expressed sequence tags. [http://www.ncbi.nlm.nih.gov/dbEST/dbEST_summary.html]
- Allendorf FW, Thorgaard GH: Tetraploidy and the evolution of salmonid fishes. 1984, New York Plenum PressView ArticleGoogle Scholar
- Allendorf FW, Danzmann RG: Secondary tetrasomic segregation of MDH-B and preferential pairing of homeologues in rainbow trout. Genetics. 1997, 145 (4): 1083-1092.PubMed CentralPubMedGoogle Scholar
- Bailey GS, Poulter RT, Stockwell PA: Gene duplication in tetraploid fish: model for gene silencing at unlinked duplicated loci. Proc Natl Acad Sci USA. 1978, 75 (11): 5575-5579. 10.1073/pnas.75.11.5575.PubMed CentralView ArticlePubMedGoogle Scholar
- Wolfe KH: Yesterday's polyploids and the mystery of diploidization. Nat Rev Genet. 2001, 2 (5): 333-341. 10.1038/35072009.View ArticlePubMedGoogle Scholar
- Quackenbush J, Cho J, Lee D, Liang F, Holt I, Karamycheva S, Parvizi B, Pertea G, Sultana R, White J: The TIGR Gene Indices: analysis of gene transcript sequences in highly sampled eukaryotic species. Nucleic Acids Res. 2001, 29 (1): 159-164. 10.1093/nar/29.1.159.PubMed CentralView ArticlePubMedGoogle Scholar
- Hayes B, Laerdahl JK, Lien S, Moen T, Berg P, Hindar K, Davidson WS, Koop BF, Adzhubei A, Hoyheim B: An extensive resource of single nucleotide polymorphism markers associated with Atlantic salmon (Salmo salar) expressed sequences. Aquaculture. 2007, 265 (1-4): 82-90. 10.1016/j.aquaculture.2007.01.037.View ArticleGoogle Scholar
- Gerhard DS, Wagner L, Feingold EA, Shenmen CM, Grouse LH, Schuler G, Klein SL, Old S, Rasooly R, Good P, et al: The status, quality, and expansion of the NIH full-length cDNA project: the Mammalian Gene Collection (MGC). Genome Res. 2004, 14 (10B): 2121-2127. 10.1101/gr.2596504.View ArticlePubMedGoogle Scholar
- Castelli V, Aury JM, Jaillon O, Wincker P, Clepet C, Menard M, Cruaud C, Quetier F, Scarpelli C, Schachter V, et al: Whole genome sequence comparisons and "full-length" cDNA sequences: a combined approach to evaluate and improve Arabidopsis genome annotation. Genome Res. 2004, 14 (3): 406-413. 10.1101/gr.1515604.PubMed CentralView ArticlePubMedGoogle Scholar
- Stapleton M, Carlson J, Brokstein P, Yu C, Champe M, George R, Guarin H, Kronmiller B, Pacleb J, Park S, et al: A Drosophila full-length cDNA resource. Genome Biol. 2002, 3 (12): RESEARCH0080-10.1186/gb-2002-3-12-research0080.PubMed CentralView ArticlePubMedGoogle Scholar
- Uenishi H, Eguchi-Ogawa T, Shinkai H, Okumura N, Suzuki K, Toki D, Hamasima N, Awata T: PEDE (Pig EST Data Explorer) has been expanded into Pig Expression Data Explorer, including 10 147 porcine full-length cDNA sequences. Nucleic Acids Res. 2007, D650-653. 10.1093/nar/gkl954. 35 Database
- Harhay GP, Sonstegard TS, Keele JW, Heaton MP, Clawson ML, Snelling WM, Wiedmann RT, Van Tassell CP, Smith TP: Characterization of 954 bovine full-CDS cDNA sequences. BMC Genomics. 2005, 6: 166-10.1186/1471-2164-6-166.PubMed CentralView ArticlePubMedGoogle Scholar
- Li P, Peatman E, Wang S, Feng J, He C, Baoprasertkul P, Xu P, Kucuktas H, Nandi S, Somridhivej B, et al: Towards the ictalurid catfish transcriptome: generation and analysis of 31,215 catfish ESTs. BMC Genomics. 2007, 8: 177-10.1186/1471-2164-8-177.PubMed CentralView ArticlePubMedGoogle Scholar
- Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, Church DM, DiCuccio M, Edgar R, Federhen S, Helmberg W: Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2005, D39-45. 33 Database
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000, 25 (1): 25-29. 10.1038/75556.PubMed CentralView ArticlePubMedGoogle Scholar
- Conesa A, Gotz S, Garcia-Gomez JM, Terol J, Talon M, Robles M: Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005, 21 (18): 3674-3676. 10.1093/bioinformatics/bti610.View ArticlePubMedGoogle Scholar
- Ryynanen HJ, Primmer CR: Single nucleotide polymorphism (SNP) discovery in duplicated genomes: intron-primed exon-crossing (IPEC) as a strategy for avoiding amplification of duplicated loci in Atlantic salmon (Salmo salar) and other salmonid fishes. BMC Genomics. 2006, 7: 192-10.1186/1471-2164-7-192.PubMed CentralView ArticlePubMedGoogle Scholar
- DFCI gene index. Atlantic salmon 4.0. [http://compbio.dfci.harvard.edu/tgi/tgipage.html]
- Rengmark AH, Slettan A, Skaala O, Lie O, Lingaas F: Genetic variability in wild and farmed Atlantic salmon (Salmo salar) strains estimated by SNP and microsatellites. Aquaculture. 2006, 253 (1-4): 229-237. 10.1016/j.aquaculture.2005.09.022.View ArticleGoogle Scholar
- Kozak M: An Analysis of 5'-Noncoding Sequences from 699 Vertebrate Messenger-Rnas. Nucleic Acids Research. 1987, 15 (20): 8125-8148. 10.1093/nar/15.20.8125.PubMed CentralView ArticlePubMedGoogle Scholar
- Pesole G, Gissi C, Grillo G, Licciulli F, Liuni S, Saccone C: Analysis of oligonucleotide AUG start codon context in eukariotic mRNAs. Gene. 2000, 261 (1): 85-91. 10.1016/S0378-1119(00)00471-6.View ArticlePubMedGoogle Scholar
- Crooks GE, Hon G, Chandonia JM, Brenner SE: WebLogo: a sequence logo generator. Genome Res. 2004, 14 (6): 1188-1190. 10.1101/gr.849004.PubMed CentralView ArticlePubMedGoogle Scholar
- MacDonald CC, Redondo JL: Reexamining the polyadenylation signal: were we wrong about AAUAAA?. Mol Cell Endocrinol. 2002, 190 (1-2): 1-8. 10.1016/S0303-7207(02)00044-8.View ArticlePubMedGoogle Scholar
- Graber JH, Cantor CR, Mohr SC, Smith TF: In silico detection of control signals: mRNA 3'-end-processing sequences in diverse species. Proc Natl Acad Sci USA. 1999, 96 (24): 14055-14060. 10.1073/pnas.96.24.14055.PubMed CentralView ArticlePubMedGoogle Scholar
- Beaudoing E, Freier S, Wyatt JR, Claverie JM, Gautheret D: Patterns of variant polyadenylation signal usage in human genes. Genome Res. 2000, 10 (7): 1001-1010. 10.1101/gr.10.7.1001.PubMed CentralView ArticlePubMedGoogle Scholar
- Liu D, Brockman JM, Dass B, Hutchins LN, Singh P, McCarrey JR, MacDonald CC, Graber JH: Systematic variation in mRNA 3'-processing signals during mouse spermatogenesis. Nucleic Acids Res. 2007, 35 (1): 234-246. 10.1093/nar/gkl919.PubMed CentralView ArticlePubMedGoogle Scholar
- Legendre M, Ritchie W, Lopez F, Gautheret D: Differential repression of alternative transcripts: a screen for miRNA targets. PLoS Comput Biol. 2006, 2 (5): e43-10.1371/journal.pcbi.0020043.PubMed CentralView ArticlePubMedGoogle Scholar
- Rigoutsos I, Floratos A: Combinatorial pattern discovery in biological sequences: The TEIRESIAS algorithm. Bioinformatics. 1998, 14 (1): 55-67. 10.1093/bioinformatics/14.1.55.View ArticlePubMedGoogle Scholar
- Mignone F, Grillo G, Licciulli F, Iacono M, Liuni S, Kersey PJ, Duarte J, Saccone C, Pesole G: UTRdb and UTRsite: a collection of sequences and regulatory motifs of the untranslated regions of eukaryotic mRNAs. Nucleic Acids Res. 2005, D141-146. 33 Database
- Kaldy P, Menotti E, Moret R, Kuhn LC: Identification of RNA-binding surfaces in iron regulatory protein-1. EMBO J. 1999, 18 (21): 6073-6083. 10.1093/emboj/18.21.6073.PubMed CentralView ArticlePubMedGoogle Scholar
- Thomson AM, Rogers JT, Leedman PJ: Iron-regulatory proteins, iron-responsive elements and ferritin mRNA translation. Int J Biochem Cell Biol. 1999, 31 (10): 1139-1152. 10.1016/S1357-2725(99)00080-1.View ArticlePubMedGoogle Scholar
- Walczak R, Westhof E, Carbon P, Krol A: A novel RNA structural motif in the selenocysteine insertion element of eukaryotic selenoprotein mRNAs. RNA. 1996, 2 (4): 367-379.PubMed CentralPubMedGoogle Scholar
- He L, Hannon GJ: MicroRNAs: small RNAs with a big role in gene regulation. Nat Rev Genet. 2004, 5 (7): 522-531. 10.1038/nrg1379.View ArticlePubMedGoogle Scholar
- Brennecke J, Stark A, Russell RB, Cohen SM: Principles of microRNA-target recognition. PLoS Biol. 2005, 3 (3): e85-10.1371/journal.pbio.0030085.PubMed CentralView ArticlePubMedGoogle Scholar
- Griffiths-Jones S, Saini HK, van Dongen S, Enright AJ: miRBase: tools for microRNA genomics. Nucleic Acids Res. 2008, D154-158. 36 Database
- Ramachandra RK, Salem M, Gahr S, Rexroad CE, Yao J: Cloning and characterization of microRNAs from rainbow trout (Oncorhynchus mykiss): their expression during early embryonic development. BMC Dev Biol. 2008, 8: 41-10.1186/1471-213X-8-41.PubMed CentralView ArticlePubMedGoogle Scholar
- Galgano A, Forrer M, Jaskiewicz L, Kanitz A, Zavolan M, Gerber AP: Comparative analysis of mRNA targets for human PUF-family proteins suggests extensive interaction with the miRNA regulatory system. PLoS ONE. 2008, 3 (9): e3164-10.1371/journal.pone.0003164.PubMed CentralView ArticlePubMedGoogle Scholar
- Venkataraman K, Brown KM, Gilmartin GM: Analysis of a noncanonical poly(A) site reveals a tripartite mechanism for vertebrate poly(A) site recognition. Genes Dev. 2005, 19 (11): 1315-1327. 10.1101/gad.1298605.PubMed CentralView ArticlePubMedGoogle Scholar
- Frith MC, Fu Y, Yu L, Chen JF, Hansen U, Weng Z: Detection of functional DNA motifs via statistical over-representation. Nucleic Acids Res. 2004, 32 (4): 1372-1381. 10.1093/nar/gkh299.PubMed CentralView ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.