Quality assessment parameters for EST-derived SNPs from catfish
© Wang et al; licensee BioMed Central Ltd. 2008
Received: 19 June 2008
Accepted: 30 September 2008
Published: 30 September 2008
SNPs are abundant, codominantly inherited, and sequence-tagged markers. They are highly adaptable to large-scale automated genotyping, and therefore, are most suitable for association studies and applicable to comparative genome analysis. However, discovery of SNPs requires genome sequencing efforts through whole genome sequencing or deep sequencing of reduced representation libraries. Such genome resources are not yet available for many species including catfish. A large resource of ESTs is to become available in catfish allowing identification of large number of SNPs, but reliability of EST-derived SNPs are relatively low because of sequencing errors. This project was designed to answer some of the questions relevant to quality assessment of EST-derived SNPs.
wo factors were found to be most significant for validation of EST-derived SNPs: the contig size (number of sequences in the contig) and the minor allele sequence frequency. The larger the contigs were, the greater the validation rate although the validation rate was reasonably high when the contigs contain four or more EST sequences with the minor allele sequence being represented at least twice in the contigs. Sequence quality surrounding the SNP under test is also crucially important. PCR extension appeared to be limited to a very short distance, prohibiting successful genotyping when an intron was present, a surprising finding.
Stringent quality assessment measures should be used when working with EST-derived SNPs. In particular, contigs containing four or more ESTs should be used and the minor allele sequence should be represented at least twice. Genotyping primers should be designed from a single exon, completely avoiding introns. Application of such quality assessment measures, along with large resources of ESTs, should provide effective means for SNP identification in species where genome sequence resources are lacking.
Most performance traits of agricultural relevance are complex traits that are governed by multiple genes. Due to the large number of genes underlying a single trait and their complex interactions, direct genetic analysis of such traits has been difficult. In the past decade, genetic mapping has demonstrated great promise for the analysis of complex traits. In particular, wide applications of microsatellite markers in animal genome studies have allowed major progress in understanding of genes underlying major performance traits [1, 2]. However, as larger genome datasets have become available recently, it is clear that microsatellites are not sufficiently dense to provide the genome coverage necessary for the dissection of many of the highly complex traits such as disease resistance, feed conversion efficiency, growth, and carcass traits. In addition, large-scale automated genotyping of microsatellites has not been possible. Recently, much excitement was generated with the ability to analyze complex traits with new types of polymorphic markers. Efforts have shifted to single nucleotide polymorphisms (SNPs).
SNPs are the most abundant type of genetic variation. Theoretically, SNPs can have four alleles, but they have been regarded as bi-allelic as most often, only two alleles have been observed at a given position . SNPs are estimated to occur once every 500 to 1,000 bp in humans when any two chromosomes are compared [4–6] and their frequencies have been estimated to be higher in other organisms [7, 8]. This would make it possible to construct genetic maps with extremely high marker densities allowing identification of haplotypes using SNPs, especially for the species with a draft genome sequence . In addition, SNPs offer several other advantages over other molecular markers. First, SNPs themselves are the fundamental causes of the genetic variation and their mapping would provide potential for the identification of the "causing" SNPs as well as the "associated" SNPs with specific and complex traits [10–13]. Second, many technologies have been developed to genotype SNPs cost-effectively in an automated fashion [14–16]. Third, SNPs are sequence-tagged markers with codominant inheritance, suitable for comparative genome analysis [8, 17]; and finally, SNPs are highly stable genetic markers compared to tandem repeat markers where the high mutation rates can confound genetic analysis in populations [18, 19].
In most cases, genome-wide SNP discovery has relied on the availability of a draft genome sequence where SNPs can be detected from sequences of the two chromosomes of a diploid organism in the sequence assembly. This approach was initially feasible only for humans and model species. However, as the cost of genome sequencing decreases, now draft genome sequences have become available for several agriculturally important species including bovine, chickens, horses, and soon the swine and tilapia. However, for most aquaculture species, it may take a while for the generation of entire genome draft sequences. Facing the equal challenge, alternative approaches must be sought. Hayes et al. (2007) was able to identify a large number of SNPs from EST resources in Atlantic salmon, and they recently demonstrated mapping of EST-derived SNPs to genetic linkage map [20, 21]. Their pioneering work with an aquaculture species set a great model for use of ESTs for the identification of SNPs, especially in non-model species [20–22]. In addition, BAC end sequences (BES) can also serve as sources for the identification of SNPs, and the combination of EST and BES could improve the SNP discovery accuracy comparing using only EST sequences .
Catfish is the most important aquaculture species in the United States representing over 60% US aquaculture production. Much progress has been made  in its genome resource development including a large number of polymorphic markers [25–27], construction of genetic linkage maps [28, 29], construction and characterization of BAC libraries [30, 31], and construction of BAC contig-based physical maps [32, 33]. Particularly, in relation to SNP discovery, a number of genome resources have been developed including approximately 60,000 BAC end sequences [26, 27], over 55,000 ESTs  and over 400,000 ESTs are being generated by the Joint Genome Institute of the Department of Energy. Such EST sequences will provide an enormous resource for SNP identification. However, as most researchers have experienced, identification of SNPs using ESTs is not without problems. The most frequent problem is the high rate of sequencing errors that lead to the identification of pseudo-SNPs, leading to subsequently great efforts and expense. The objective of this project was to develop a strategy for rapid and reliable identification and evaluation of qualities of EST-derived SNPs to reduce the rate of pseudo-SNPs resulted from sequence errors typically found in single-pass EST datasets, especially those deposited in NCBI where sequence trace files may or may not be available.
EST clustering and contig assembly
All catfish EST sequences were downloaded from NCBI dbEST database, including those of blue catfish and channel catfish. CAP3 was used to assemble the contigs with the parameters set at "minmatch 50, overlap similarity 0.95", to have a minimal overlap of 50 bases and a minimal similarity of 95% . For each contig generated from the CAP3 assembly, BLASTX was conducted against the non-redundant NR database to assist identification of any related ESTs in different contigs. A significant hit was defined as having an E-value below e-10 and 100 minimum of alignment length for all sequences. Following initial gene identification, related ESTs were further evaluated by manual inspection of the alignments.
SNP identification using EST resources
The autoSNP program was used to detect putative SNPs from the EST sequences . The program utilized the CAP3 output files as input to detect SNPs based on the base redundancy in the sequence alignments. The autoSNP program generated two text files, a contig file including contig ID, consensus length, number of sequences in the contig, and the number of SNPs, a SNP file including Contig ID, SNP position, minor allele frequency, SNP allele, mutation type, and base alignment in the SNP position. The program also generated an html file for each contig, including the alignment information and SNP information. With the autoSNP program, the parameters for minimum minor allele frequency for SNP detection varied with the contig size (the number of sequences in the contig): 1) a sequence variation is declared as a SNP whenever a mismatch is identified within contigs with four or fewer sequences; 2) a sequence variation is declared as a SNP when the minor allele sequence existed at least twice within contigs with 5–6 sequences; 3) a sequence variation is declared as a SNP when the minor allele sequence existed at least three times within contigs with 7–8 sequences; 4) similarly a sequence variation is declared as a SNP when the minor allele sequence existed at least four times within contigs with 9–12 sequences, and 5) when the minor allele sequence existed at least five times within contigs with 13–16 sequences and so on.
Selection of SNPs for this project
To evaluate the effect of contig size and minor allele sequence frequency on SNP reliability, the SNPs with different contig sizes and minor allele frequencies were selected for SNP validation. After initial submission of a set of SNPs to Illumina, GoldenGate assay functionality and designability scores were given by Illumina. SNPs with a range of functionality and designability scores were chosen for evaluation in this project. A total of 384 SNPs were selected for this project. Hot spots of SNP occurrence that may have been caused by low sequence quality were selected to test how sequence quality affects SNP genotyping and validation rates. In addition to sequence quality, the effect of intron presence on genotyping and validation rates was tested by including SNPs with four known genomic sequences.
Fish samples used for validation of SNPs
A panel of 192 samples were used for genotyping and validation of SNPs including 66 fish from our interspecific mapping resource family F1-2 × Channel catfish-6 (64 backcross progenies plus their two parents), and 21 fish each from three wild channel catfish populations and three domestic channel catfish populations .
SNP genotyping assay
Genomic DNA (250 ng per sample) was used as template for SNP genotyping using the Illumina's bead array technology according to the manufacturer's protocol for GoldenGate assay . Briefly, two allele-specific primers labeled with Cy3 (P1) or Cy5 (P2) and a third locus-specific primer (P3) with an address sequence were first hybridized to the template and allele-specific primers were extended to cross the SNP site to reach the locus-specific primer. After this allele-specific extension, ligation was conducted between allele-specific primer(s) and the locus-specific primer, creating a PCR template. PCR reaction was conducted using both allele-specific primers and the locus-specific primer. The PCR reaction products were hybridized onto a chip (Illumina Inc., San Diego) containing bead types coated with oligo-nucleotides complementary to the locus-specific primer address on the PCR product. Each bead type is represented with an average redundancy of 30× on the array to optimize the accuracy of the final genotype signal. Following hybridization, the bead array signal was determined using a bead array reader, which could convert images to intensity data. The intensity data for each SNP for each sample was normalized and assigned a cluster position (and resulting genotype), and a quality score for each genotype was generated. Final genotyping results were automatically generated for downstream analysis using the BeadStudio software (Illumina Inc., San Diego).
The BeadStudio software was used to analyze the SNPs data. The dye intensities are examined by the software to determine the genotype of each sample for that locus. A locus returning predominantly signal from Cy3 is AA, Cy5 is BB and an equal signal of Cy3 and Cy5 represents a heterozygous individual. Data is returned with the allele call for each locus as well as a Gentrain score, a measure that represents the reliability of that genotyping call. GenTrain scores was used to measure the reliability of SNP detection based on the distribution of genotypic classes, and the calling frequency was used to measure the successful SNP calling rate from all samples . For this study, GenTrain score of 0.4, call rate of 90%, and minor allele frequency of 0.05 was used. After removing failed SNPs, the remaining SNPs were identified as successful SNPs in genotyping. Successful genotypes were used further for the analysis of minor allele frequencies, and for the calculation of SNP validation rate. Heterozygosity is defined with the formula H = 1-(pa2+pb2) where Pa is the allele frequency of the major allele and Pb is the allele frequency of the minor allele .
Summary of the EST Assembly
Number of sequences for assembly
Number of contigs
Number of singletons
Number of putative transcripts
Average contig size
Average contig length (bp)
No of contig with:
> 50 ESTs
Putative SNP discovery
Initial identification of SNPs as detected by AutoSNP software
No. of Sequences in each contig
No. of contigs with SNPs
No. of total SNPs
Total Consensus Length (bp)
SNP frequency (per 100 bp)
Validation of SNPs
The Illumina's Quality Scores of SNPs did not affect SNP validation rates
Overall summary of the EST-derived SNP genotyping using the Illumina Bead Array technology
Number of SNPs
Average Quality Score
Successful genotype calling
Total number of loci tested
Contig size and minor sequence allele frequency were the major determinants on SNP validation rates
SNP polymorphic rates as a function of contig size and minor sequence allele frequency
# of sequences in the contig
# Successful Loci
Minimal Minor Sequence Frequency
Polymorphic rate (%)
2:3 & 2:4 & 3:3
3:4 & 3:5 & 4:4
4:5 & 4:6 & 4:7 & 4:8 & 5:5 & 5:6 & 5:7 & 6:6
5:7 & 6:6 & 5:8 & 6:7......... & 12:57
Quality of sequences flanking SNPs is important
Flanking sequence quality greatly affected the SNP success rate. Among the contigs with SNPs, we identified 28 contigs with hot spots of SNP occurrence where a region of sequence was highly variable with many "SNPs" detected. Sequence quality examination suggested low quality scores in the sequencing reactions. We intentionally included these SNPs in this project. Of the 28 SNPs tested, 14 (50%) failed in genotyping, suggesting that high sequence quality is required in the SNP region as they are involved in the genotyping primer binding regions (data not shown).
The presence of intron(s) was the major cause for SNP genotyping failures
The presence of introns greatly reduced the SNP genotyping success rate. Among the contigs containing SNPs, 4 had genomic DNA information that allowed us to test if the involvement of introns has any effect on SNP genotyping and validation rates. All four SNPs failed to provide genotypes. Clearly, The Bead Array technology depends on very short extension and subsequent ligation for success.
Effect of low sequence quality (as defined by the presence of hot spots of SNP occurrence) and the presence of predicted intron on success rate of SNP genotyping
Number of loci with SNP located in regions containing low quality sequences
Number of loci with known introns
Number of failed loci without gene information
With Significant Blast hits
SNP positions can be located by similarity comparisons with zebrafish genome
Number of Loci with SNP predicted to be positioned at exon-intron border
Total number of loci potentially with SNP positioned at exon-intron border
ESTs were proven to be efficient resources for putative SNP identification [20, 22, 39–41]. This study provides an assessment of nucleotide diversity in available catfish EST resources for putative SNP identification. Since our goal was to make quality assessment for the EST-derived SNPs, we designed this project to provide some answers as to how the sequence context (Illumina's Quality Score), contig size, minor sequence allele frequency, sequence quality flanking SNPs, and the distance between the SNP genotyping primers affect the SNP validation rates.
As compared with SNPs identified from genomic sequences, EST-derived SNPs have several advantages. Since ESTs are transcribed sequences, EST-derived SNPs are associated with actual genes allowing use of gene-associated SNPs for mapping and subsequent use in comparative genome studies . This is particularly important for species without a genome sequence such as aquaculture species. In addition to be used as markers for mapping, SNPs are also considered a rich source of candidate polymorphisms underlying important traits leading to the identification of causative genes or quantitative trait nucleotide (QTN) . However, there are several important factors needed to be considered when using EST-derived SNPs. The major issue for development of SNPs from EST resources is not whether SNPs can readily be identified, but to what degree these SNPs would be reliable, because parameters for quality assessment of EST-derived SNPs simply do not exist. This reliability issue was mostly due to sequence errors; assembled contigs with sequence variation could simply be sequence errors. Additionally, since SNPs derived from ESTs can only be identified from EST contigs where the same gene transcripts were sequenced at least twice and sequencing frequency of ESTs is not random, large scale sequencing is required to identify SNP's from rarely expressed genes. Moreover, SNP rates could be lower in coding regions because of evolutionary restraints of selection pressure.
In this study, over 33,000 putative SNPs were identified from 55,000 catfish ESTs and 384 of these SNPs were tested using 192 catfish samples. We have found that the contig size (number of sequences in the contig) and minor sequence allele frequency were the two major factors affecting the validation rates of EST-derived SNPs. Small contigs had much lower SNP validation rates. Obviously, in small contigs with 2 or 3 sequences, the alternative base is represented only once, and this could be due to sequencing errors. Similarly, in contigs with 4 sequences when the minor sequence allele is represented only once, it is highly likely that the minor allele is due to sequencing errors. We cannot determine the quality of these SNPs without the sequence trace files. Contigs of 4 or more sequences with the minor sequence allele frequency being present at least twice in the contig provided high levels of SNP validation rates (average 70.9% and up to 89.2%). This makes good sense because it is highly unlikely that sequencing errors of two independently sequenced ESTs to occur at the same base location. When at least two ESTs exhibit an alternative base at the putative SNP sites, it is highly likely that such sequence variations are real. All these findings were not unexpected, but for the first time, we provide experimental data to demonstrate the importance of contig size and minor sequence allele frequency. It is noteworthy that even though the larger contigs provided even greater SNP validation rates, contigs of four sequences with even sequence allele distribution (2:2) provided similarly high validation rates. A minimum of two sequences in the contigs representing the minor allele was required to provide a high SNP validation rate [20, 22].
Another advantage of the SNP identification from EST sequences is its ability to identify the uncommon sequence variants . The monomorphic SNP rate was highly related to the number of samples tested, since the uncommon sequence variants possess very low minor allele frequency, which required a large number of samples. According to our results, the monomorphic SNPs accounted for 28% of tested SNPs. However, these monomorphic SNPs could be false SNPs caused by sequencing errors. The SNPs derived from contigs with four sequences with only one minor allele sequence had the highest monomorphic rate, and the SNPs derived from more than 10 sequences contigs had the lowest monomorphic rate, suggesting that most of the monomorphic SNPs could have also derived from false SNPs, not uncommon sequences variants. In addition, much smaller fish samples (10 fish) were used to construct the EST libraries than the number fish samples used here to validate the SNPs, further supporting the possibility of sequencing errors related to monomorphic SNPs.
Sequence quality flanking the SNP sites was found to be important for successful SNP genotyping using Illumina's Bead Array technology, but not the flanking sequence context as referred to as the Quality Score by Illumina when above 0.5. It is probably true that SNP genotyping primers would have worked properly for the most part even if the sequence context was somewhat simple or A/T-rich, or G/C-rich. However, sequence errors in the SNP region could directly affect the base pairing of the SNP genotyping primers. Low quality sequences could easily generate false SNPs, especially at the beginning or at the end of the sequence. Therefore, sequence quality surrounding the SNP site should be used as one parameter to identify reliable SNPs. However, many EST sequences retrieved from NCBI do not have quality scores or trace files. In such cases, greater caution should be exercised. In particular, hot spot of SNP occurrence should be avoided if possible.
In this project, we demonstrated that ESTs are powerful resources for SNP identification. In spite of the development of highly efficient methods for SNP identification from genomic sequences, such as using deep sequencing of reduced representation libraries , SNP discovery from EST resources does not require any additional bench work. These existing resources need to be enabled by bioinformatics analysis. EST-derived SNPs are from transcribed genes, and therefore they are applicable to the identification of quantitative trait nucleotide (QTN) as well as to comparative genome analysis .
The key to the success of SNP genotyping using EST-derived SNPs is the avoidance of introns. The most important factors for high rate of SNP validation are the contig size (number of sequences in the contig) and the minor sequence allele frequency. Presence of minor sequence allele at least twice in the contig is crucial for EST-derived SNP validation. Use of contigs with at least four sequences, when coupled to the presence of minor sequence allele at least twice in a contig should provide a high level of confidence for the validation of EST-derived SNPs. Quality assessment measures for the EST-derived SNPs presented here should be applicable to EST-derived SNPs from all species. Application of such guidelines, along with the availability of large numbers of ESTs, should make it effective to identify and apply EST-derived SNPs for genome-scale analysis.
This project was supported by a grant from USDA NRI Animal Genome Basic Genome Reagents and Tools Program (USDA/NRICGP award # 2006-35616-16685), and partially by an AAES Ag Initiatives grant.
- Ron M, Weller JI: From QTL to QTN identification in livestock – winning by points rather than knock-out: a review. Anim Genet. 2007, 38 (5): 429-439.PubMedView ArticleGoogle Scholar
- Rothschild MF: Porcine genomics delivers new tools and results: this little piggy did more than just go to market. Genet Res. 2004, 83 (1): 1-6.PubMedView ArticleGoogle Scholar
- Krawczak M: Informativity assessment for biallelic single nucleotide polymorphisms. Electrophoresis. 1999, 20 (8): 1676-1681.PubMedView ArticleGoogle Scholar
- Cooper DN, Smith BA, Cooke HJ, Niemann S, Schmidtke J: An estimate of unique DNA sequence heterozygosity in the human genome. Hum Genet. 1985, 69 (3): 201-205.PubMedView ArticleGoogle Scholar
- Harding RM, Fullerton SM, Griffiths RC, Bond J, Cox MJ, Schneider JA, Moulin DS, Clegg JB: Archaic African and Asian lineages in the genetic ancestry of modern humans. Am J Hum Genet. 1997, 60 (4): 772-789.PubMedPubMed CentralGoogle Scholar
- Li WH, Sadler LA: Low nucleotide diversity in man. Genetics. 1991, 129 (2): 513-523.PubMedPubMed CentralGoogle Scholar
- Kasahara M, Naruse K, Sasaki S, Nakatani Y, Qu W, Ahsan B, Yamada T, Nagayasu Y, Doi K, Kasai Y, Jindo T, Kobayashi D, Shimada A, Toyoda A, Kuroki Y, Fujiyama A, Sasaki T, Shimizu A, Asakawa S, Shimizu N, Hashimoto S, Yang J, Lee Y, Matsushima K, Sugano S, Sakaizumi M, Narita T, Ohishi K, Haga S, Ohta F: The medaka draft genome and insights into vertebrate genome evolution. Nature. 2007, 447 (7145): 714-719.PubMedView ArticleGoogle Scholar
- Lindblad-Toh K, Wade CM, Mikkelsen TS, Karlsson EK, Jaffe DB, Kamal M, Clamp M, Chang JL, Kulbokas EJ, Zody MC, Mauceli E, Xie X, Breen M, Wayne RK, Ostrander EA, Ponting CP, Galibert F, Smith DR, DeJong PJ, Kirkness E, Alvarez P, Biagi T, Brockman W, Butler J, Chin CW, Cook A, Cuff J, Daly MJ, DeCaprio D, Gnerre S: Genome sequence, comparative analysis and haplotype structure of the domestic dog. Nature. 2005, 438 (7069): 803-819.PubMedView ArticleGoogle Scholar
- Salisbury BA, Pungliya M, Choi JY, Jiang R, Sun XJ, Stephens JC: SNP and haplotype variation in the human genome. Mutat Res. 2003, 526: 53-61.PubMedView ArticleGoogle Scholar
- Butcher LM, Davis OS, Craig IW, Plomin R: Genome-wide quantitative trait locus association scan of general cognitive ability using pooled DNA and 500K single nucleotide polymorphism microarrays. Genes Brain Behav. 2007Google Scholar
- Kiyohara C, Yoshimasu K: Genetic polymorphisms in the nucleotide excision repair pathway and lung cancer risk: a meta-analysis. Int J Med Sci. 2007, 4: 59-71.PubMedPubMed CentralView ArticleGoogle Scholar
- Lazarus R, Vercelli D, Palmer LJ, Klimecki WJ, Silverman EK, Richter B, Riva A, Ramoni M, Martinez FD, Weiss ST, Kwiatkowski DJ: Single nucleotide polymorphisms in innate immunity genes: abundant variation and potential role in complex human disease. Immunol Rev. 2002, 190: 9-25.PubMedView ArticleGoogle Scholar
- Rafalski A: Applications of single nucleotide polymorphisms in crop genetics. Curr Opin Plant Biol. 2002, 5 (2): 94-100.PubMedView ArticleGoogle Scholar
- Barbazuk WB, Emrich SJ, Chen HD, Li L, Schnable PS: SNP discovery via 454 transcriptome sequencing. Plant J. 2007, 51 (5): 910-918.PubMedPubMed CentralView ArticleGoogle Scholar
- Fan JB, Oliphant A, Shen R, Kermani BG, Garcia F, Gunderson KL, Hansen M, Steemers F, Butler SL, Deloukas P, Galver L, Hunt S, McBride C, Bibikova M, Rubano T, Chen J, Wickham E, Doucet D, Chang W, Campbell D, Zhang B, Kruglyak S, Bentley D, Haas J, Rigault P, Zhou L, Stuelpnagel J, Chee MS: Highly parallel SNP genotyping. Cold Spring Harb Symp Quant Biol. 2003, 68: 69-78.PubMedView ArticleGoogle Scholar
- Shen R, Fan JB, Campbell D, Chang W, Chen J, Doucet D, Yeakley J, Bibikova M, Wickham Garcia E, McBride C, Steemers F, Garcia F, Kermani BG, Gunderson K, Oliphant A: High-throughput SNP genotyping on universal bead arrays. Mutat Res. 2005, 573: 70-82.PubMedView ArticleGoogle Scholar
- Moreno-Vazquez S, Ochoa OE, Faber N, Chao S, Jacobs JM, Maisonneuve B, Kesseli RV, Michelmore RW: SNP-based codominant markers for a recessive gene conferring resistance to corky root rot (Rhizomonas suberifaciens) in lettuce (Lactuca sativa). Genome. 2003, 46 (6): 1059-1069.PubMedView ArticleGoogle Scholar
- Hastbacka J, de la Chapelle A, Kaitila I, Sistonen P, Weaver A, Lander E: Linkage disequilibrium mapping in isolated founder populations: diastrophic dysplasia in Finland. Nat Genet. 1992, 2 (3): 204-211.PubMedView ArticleGoogle Scholar
- Marshall B, Leelayuwat C, Degli-Esposti MA, Pinelli M, Abraham LJ, Dawkins RL: New major histocompatibility complex genes. Hum Immunol. 1993, 38 (1): 24-29.PubMedView ArticleGoogle Scholar
- Hayes B, Laerdahl JK, Lien S, Moen T, Berg P, Hindar K, Davidson WS, Koop BF, Adzhubei A, Hoyheim B: An extensive resource of single nucleotide polymorphism markers associated with Atlantic salmon (Salmo salar) expressed sequences. Aquaculture. 2007, 265 (1–4): 82-90.View ArticleGoogle Scholar
- Moen T, Hayes B, Baranski M, Berg P, Kjoglum S, Koop B, Davidson W, Omholt S, Lien S: A linkage map of the Atlantic salmon (Salmo salar) based on EST-derived SNP markers. BMC Genomics. 2008, 9 (1): 223-PubMedPubMed CentralView ArticleGoogle Scholar
- Pavy N, Pelgas B, Beauseigle S, Blais S, Gagnon F, Gosselin I, Lamothe M, Isabel N, Bousquet J: Enhancing genetic mapping of complex genomes through the design of highly-multiplexed SNP arrays: application to the large and unsequenced genomes of white spruce and black spruce. BMC Genomics. 2008, 9 (1): 21-PubMedPubMed CentralView ArticleGoogle Scholar
- Guryev V, Koudijs MJ, Berezikov E, Johnson SL, Plasterk RH, van Eeden FJ, Cuppen E: Genetic variation in the zebrafish. Genome Res. 2006, 16 (4): 491-497.PubMedPubMed CentralView ArticleGoogle Scholar
- Liu ZJ: A review of catfish genomics: progress and perspectives. Comparative and Functional Genomics. 2003, 4: 259-265.PubMedPubMed CentralView ArticleGoogle Scholar
- Serapion J, Kucuktas H, Feng J, Liu ZJ: Bioinformatic mining of type I microsatellites from expressed sequence tags of channel catfish (Ictalurus punctatus). Mar Biotechnol (NY). 2004, 6 (4): 364-377.View ArticleGoogle Scholar
- Somridhivej B, Wang S, Sha Z, Liu H, Quilang J, Xu P, Li P, Hu Z, Liu ZJ: Characterization, polymorphism assessment, and database construction for microsatellites from BAC end sequences of channel catfish (Ictalurus punctatus): A resource for integration of linkage and physical maps. Aquaculture. 2008, 275 (1–4): 76-80.View ArticleGoogle Scholar
- Xu P, Wang S, Liu L, Peatman E, Somridhivej B, Thimmapuram J, Gong G, Liu ZJ: Channel catfish BAC-end sequences for marker development and assessment of syntenic conservation with other fish species. Anim Genet. 2006, 37 (4): 321-326.PubMedView ArticleGoogle Scholar
- Liu ZJ, Karsi A, Li P, Cao D, Dunham R: An AFLP-based genetic linkage map of channel catfish (Ictalurus punctatus) constructed by using an interspecific hybrid resource family. Genetics. 2003, 165 (2): 687-694.PubMedPubMed CentralGoogle Scholar
- Waldbieser GC, Bosworth BG, Nonneman DJ, Wolters WR: A microsatellite-based genetic linkage map for channel catfish, Ictalurus punctatus. Genetics. 2001, 158 (2): 727-734.PubMedPubMed CentralGoogle Scholar
- Quiniou SM, Katagiri T, Miller NW, Wilson M, Wolters WR, Waldbieser GC: Construction and characterization of a BAC library from a gynogenetic channel catfish Ictalurus punctatus. Genet Sel Evol. 2003, 35 (6): 673-683.PubMedPubMed CentralView ArticleGoogle Scholar
- Wang S, Xu P, Thorsen J, Zhu B, de Jong PJ, Waldbieser G, Kucuktas H, Liu ZJ: Characterization of a BAC library from channel catfish Ictalurus punctatus: indications of high levels of chromosomal reshuffling among teleost genomes. Mar Biotechnol (NY). 2007, 9 (6): 701-711.View ArticleGoogle Scholar
- Quiniou SM, Waldbieser GC, Duke MV: A first generation BAC-based physical map of the channel catfish genome. BMC Genomics. 2007, 8: 40-PubMedPubMed CentralView ArticleGoogle Scholar
- Xu P, Wang S, Liu L, Thorsen J, Kucuktas H, Liu ZJ: A BAC-based physical map of the channel catfish genome. Genomics. 2007, 90 (3): 380-388.PubMedView ArticleGoogle Scholar
- Li P, Peatman E, Wang S, Feng J, He C, Baoprasertkul P, Xu P, Kucuktas H, Nandi S, Somridhivej B, Simmons M, Turan C, Liu L, Muir W, Dunham R, Brady Y, Grizzle J, Liu ZJ: Towards the ictalurid catfish transcriptome: generation and analysis of 31,215 catfish ESTs. BMC Genomics. 2007, 8: 177-PubMedPubMed CentralView ArticleGoogle Scholar
- Huang X, Madan A: CAP3: A DNA sequence assembly program. Genome Res. 1999, 9 (9): 868-877.PubMedPubMed CentralView ArticleGoogle Scholar
- Barker G, Batley J, H OS, Edwards KJ, Edwards D: Redundancy based detection of sequence polymorphisms in expressed sequence tag data using autoSNP. Bioinformatics. 2003, 19 (3): 421-422.PubMedView ArticleGoogle Scholar
- Simmons M, Mickett K, Kucuktas H, Li P, Dunham R, Liu ZJ: Comparison of domestic and wild channel catfish (Ictalurus punctatus) populations provides no evidence for genetic impact. Aquaculture. 2006, 252 (2–4): 133-146.View ArticleGoogle Scholar
- Liu ZJ: Microsatellite markers and assessment of marker utility. Aquaculture Genome Technologies. Edited by: Liu ZJ. 2007, Blackwell Publishing, Ames, IA, Chapter 5: 43-58.View ArticleGoogle Scholar
- Hayes BJ, Nilsen K, Berg PR, Grindflek E, Lien S: SNP detection exploiting multiple sources of redundancy in large EST collections improves validation rates. Bioinformatics. 2007, 23: 1692-1693.PubMedView ArticleGoogle Scholar
- He C, Chen L, Simmons M, Li P, Kim S, Liu ZJ: Putative SNP discovery in interspecific hybrids of catfish by comparative EST analysis. Anim Genet. 2003, 34 (6): 445-448.PubMedView ArticleGoogle Scholar
- Picoult-Newberg L, Ideker TE, Pohl MG, Taylor SL, Donaldson MA, Nickerson DA, Boyce-Jacino M: Mining SNPs from EST databases. Genome Res. 1999, 9 (2): 167-174.PubMedPubMed CentralGoogle Scholar
- Sarropoulou E, Nousdili D, Magoulas A, G K: Linking the genomes of nonmodel teleosts through comparative genomics. Mar Biotechnol (NY). 2008, 10 (3): 227-233.View ArticleGoogle Scholar
- Jalving R, van't Slot R, BA vO: Chicken single nucleotide polymorphism identification and selection for genetic mapping. Poult Sci. 2004, 83 (12): 1925-1931.PubMedView ArticleGoogle Scholar
- Van Tassell CP, Smith TP, Matukumalli LK, Taylor JF, Schnabel RD, Lawley CT, Haudenschild CD, Moore SS, Warren WC, Sonstegard TS: SNP discovery and allele frequency estimation by deep sequencing of reduced representation libraries. Nat Methods. 2008, 5 (3): 247-252.PubMedView ArticleGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.