Generation of genome-scale gene-associated SNPs in catfish for the construction of a high-density SNP array
© Liu et al; licensee BioMed Central Ltd. 2011
Received: 29 October 2010
Accepted: 21 January 2011
Published: 21 January 2011
Single nucleotide polymorphisms (SNPs) have become the marker of choice for genome-wide association studies. In order to provide the best genome coverage for the analysis of performance and production traits, a large number of relatively evenly distributed SNPs are needed. Gene-associated SNPs may fulfill these requirements of large numbers and genome wide distribution. In addition, gene-associated SNPs could themselves be causative SNPs for traits. The objective of this project was to identify large numbers of gene-associated SNPs using high-throughput next generation sequencing.
Transcriptome sequencing was conducted for channel catfish and blue catfish using Illumina next generation sequencing technology. Approximately 220 million reads (15.6 Gb) for channel catfish and 280 million reads (19.6 Gb) for blue catfish were obtained by sequencing gene transcripts derived from various tissues of multiple individuals from a diverse genetic background. A total of over 35 billion base pairs of expressed short read sequences were generated. Over two million putative SNPs were identified from channel catfish and almost 2.5 million putative SNPs were identified from blue catfish. Of these putative SNPs, a set of filtered SNPs was identified, including 342,104 intra-specific SNPs for channel catfish, 366,269 intra-specific SNPs for blue catfish, and 420,727 inter-specific SNPs between channel catfish and blue catfish. These filtered SNPs are distributed within 16,562 unique genes in channel catfish and 17,423 unique genes in blue catfish.
For aquaculture species, transcriptome analysis of pooled RNA samples from multiple individuals using Illumina sequencing technology is both technically efficient and cost-effective for generating expressed sequences. Such an approach is most effective when coupled to existing EST resources generated using traditional sequencing approaches because the reference ESTs facilitate effective assembly of the expressed short reads. When multiple individuals with different genetic backgrounds are used, RNA-Seq is very effective for the identification of SNPs. The SNPs identified in this report will provide a much needed resource for genetic studies in catfish and will contribute to the development of a high-density SNP array. Validation and testing of these SNPs using SNP arrays will form the material basis for genome association studies and whole genome-based selection in catfish.
Single nucleotide polymorphisms (SNPs) are alternative bases at any given position of DNA. They are among the most abundant types of genetic variation and are widely distributed within genomes. Theoretically, SNPs can have four alleles in the population, but they most often exist as bi-allelic markers. Because of their potential for high genotyping efficiency, automation, data quality, genome-wide coverage and analytical simplicity, SNPs have rapidly become the marker of choice for many applications in genetics and genomics. In particular, SNPs are most suitable for whole genome association studies because linkage disequilibrium can be detected with high density SNP coverage of the genome when working with performance and production traits. For instance, simultaneous analysis of thousands of SNPs has enabled genome-wide association studies for complex traits in chicken, pig [3, 4], cattle [5–7], horse and sheep [9, 10]. However, such studies have not been possible with most aquaculture species including catfish because large numbers of SNPs have not been available.
In species where the whole genome has been sequenced, SNPs have been identified from genome sequencing efforts. In most cases, SNPs were identified by sequence variations between the two alleles of a single diploid individual whose genome was sequenced. More recently, the identification of SNPs in non-model species has been fuelled by mining large numbers of expressed sequence tags (ESTs) available in many species. Likewise, gene-associated SNPs derived from ESTs have been identified in several fish species, including Atlantic salmon, Atlantic cod and catfish [14–16]. In spite of being relatively effective, SNP identification from ESTs is limited by sequence coverage and depth. For instance, of the 303,000 putative SNPs identified from catfish ESTs, only 48,594 were identified from contigs containing at least four ESTs and at least two sequences bearing the minor allele. A majority of the catfish EST contigs (56% of 45,306) contain only two or three sequences. Putative SNPs identified from such contigs would have the minor alleles represented by only one sequence. Such SNPs could represent sequence errors and are therefore not reliable.
To identify larger numbers of gene-associated SNPs, higher throughput expressed sequence reads are needed to increase coverage and depth and ensure sequence accuracy. Next generation sequencing technologies such as Roche/454, Illumina/Solexa, and ABI/SOLiD sequencing platforms are particularly adapted to producing high coverage of expressed sequences within contigs. Transcriptome analysis using next generation sequencing with multiple individuals has been demonstrated to be very effective for SNP identification. Recently, 454 sequencing was applied for the identification of gene-derived SNPs in a number of species such as Eucalyptus grandis, pine tree, butterfly, lake sturgeon and coral.
While the 454 sequencing technology has been widely used for transcriptome analysis, Illumina sequencing technology is being gradually accepted for its dramatically improved sequencing throughput and quality [23, 24]. Paired-end sequencing technology along with the longer sequence reads make it possible to assemble contigs of transcripts from Illumina short reads. Such assemblies are aided by the presence of reference genome and/or reference transcriptome sequences [19, 25]. In this context, a large number of ESTs of catfish are available. The objective of this study is to conduct transcriptome sequencing from multiple individuals of both channel catfish (Ictalurus punctatus) and blue catfish (I. furcatus) in order to identify gene-associated SNPs for the development of SNP arrays in catfish.
Generation of expressed short reads
Summary of Illumina expressed short reads production and filtration
Fields: No. of tissues; No. of fish; Bases sequenced (×10^9); Reads after trimming (×10^6); Bases after trimming (×10^9)
Assembly of the expressed short reads
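Summary of reference assembly of expressed short reads of channel catfish and blue catfish
Fields: No. of reads used; No. of reads; No. of contigs
Channel catfish: No. of reads used, 218.8 × 10^6; No. of reads, 152.6 × 10^6
Blue catfish: No. of reads used, 274.6 × 10^6; No. of reads, 183.8 × 10^6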
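Summary of de novo assembly of the unassembled expressed short reads from reference assembly of channel catfish and blue catfish
Fields: No. of reads used for assembly; No. of reads assembled; % sequences assembled; No. of contigs; Average contig length (bp); Average contig size*
Channel catfish: No. of reads used for assembly, 66.2 × 10^6; No. of reads assembled, 46.8 × 10^6
Blue catfish: No. of reads used for assembly, 90.8 × 10^6; No. of reads assembled, 64.3 × 10^6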
Summary of assembly of all catfish expressed short reads
Fields: No. of reads used for assembly; No. of reads assembled; % sequences assembled; No. of contigs; Avg. contig length; No. of large contigs (>1 kb); Avg. contig size*
No. of reads used for assembly, 493.4 × 10^6; No. of reads assembled, 336.0 × 10^6
De novo 2: No. of reads used for assembly, 157.4 × 10^6; No. of reads assembled, 107.2 × 10^6
Putative gene identity and annotation
Summary of BLASTX searches to annotated protein databases
Fields: Contigs hit Uniprot; % contigs with hits; Unique protein hits; Contigs hit zebrafish Refseq; % contigs with hits; Unique zebrafish Refseq hits
To assess the coverage of the catfish transcriptome achieved by our sequencing effort, the distribution of gene ontology (GO) annotations in catfish was compared with that of zebrafish.
Summary of putative SNP identification from the catfish expressed short reads assembly
Fields: Contigs under analysis; Quality SNPs (selected from the putative SNPs with a set of criteria as described in the Methods section); No. of contigs with SNPs; No. of contigs with Uniprot hits & SNPs; No. of unique known genes containing SNPs
While the number of SNPs is important, their distribution across contigs and genes within the genome also matters when they are used for genetic analysis. A total of 168,458 channel catfish contigs and 190,197 blue catfish contigs were found to contain putative filtered SNPs, of which 13,414 contigs contain SNPs at the same positions in both channel catfish and blue catfish. The number of unique Uniprot accessions hit by contigs containing SNPs was 16,562 for channel catfish and 17,423 for blue catfish, suggesting that putative filtered SNPs were identified from the vast majority of catfish genes.
One important aspect of using the inter-specific hybrid system is to identify inter-specific SNPs. From this work, a total of 232,972 contigs were identified to contain 420,727 inter-specific SNPs, i.e., sequence variations between the two species, channel catfish and blue catfish. These SNPs were from at least 18,085 distinct genes as determined by unique hits to the Uniprot protein database (Table 7).
Microsatellite markers identification
Summary of microsatellite markers identification from the all catfish expressed short reads assembly
Fields: Number of contigs of sequences surveyed; Number of contigs containing microsatellites; Total number of microsatellites identified; Number of microsatellites with sufficient flanking sequences; Number of contigs containing microsatellites with sufficient flanking sequences
Assessment of SNP distribution
In this work, we have conducted RNA-Seq analysis with pooled RNA samples from multiple individuals of both channel catfish and blue catfish to develop large numbers of high-quality SNPs. A total of 493.4 million reads yielded over 35 billion base pairs of expressed sequences. Prior to this report, a total of approximately 290 million base pairs of expressed sequences of catfish had been generated using traditional Sanger sequencing. This work therefore represents more than 100 times the amount of transcript sequence previously submitted to GenBank. Our results demonstrate the efficiency and cost-effectiveness of next generation sequencing technologies in generating expressed sequences.
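The throughput comparison above can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the variable names are ours, and the inputs are the rounded figures quoted in this article (15.6 Gb and 19.6 Gb of sequence, 218.8 and 274.6 million reads, and about 290 million bp of earlier Sanger-derived sequence).

```python
# Rounded figures quoted in the article; variable names are illustrative.
channel_gb = 15.6        # Gb of expressed short reads, channel catfish
blue_gb = 19.6           # Gb of expressed short reads, blue catfish
sanger_bp = 290e6        # approx. bp of catfish expressed sequence from Sanger sequencing

channel_reads_m = 218.8  # million reads, channel catfish
blue_reads_m = 274.6     # million reads, blue catfish

# Total bases generated in this study, in bp.
total_bp = (channel_gb + blue_gb) * 1e9

# Fold increase over the earlier Sanger-derived resource.
fold_increase = total_bp / sanger_bp

# Total reads across both species, in millions.
total_reads_m = channel_reads_m + blue_reads_m

print(f"total bases: {total_bp / 1e9:.1f} Gb")
print(f"fold increase over Sanger resource: {fold_increase:.0f}x")
print(f"total reads: {total_reads_m:.1f} million")
```

With these rounded inputs the ratio comes out at roughly 121-fold, consistent with the statement that the new data exceed the earlier resource by more than 100 times.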
One great challenge of using Illumina sequencing for transcriptome analysis is the short read length. In this study, we have used both the Illumina GA-II and HiSeq 2000 sequencing platforms that generated read lengths of 36 bp or 100 bp. De novo assembly of the expressed short reads proved to be problematic even with gene-associated sequences. For instance, a total de novo assembly of the 218.8 million short reads from channel catfish would lead to over 800,000 contigs. Similarly, de novo assembly of 274.6 million short reads from blue catfish would lead to over 1,000,000 contigs. Such large numbers of short contigs may make subsequent applications of the EST or SNP resources less effective. However, such challenges are significantly alleviated when a large EST resource is available, as demonstrated by drastic reduction of contig numbers with the reference assembly in this study.
Pooling of RNA samples from multiple individuals followed by transcriptome analysis using next generation sequencing is among the most efficient methods for SNP identification. Through many years of effort, a total of approximately 303,000 putative catfish SNPs were previously identified. However, this study alone allowed identification of over 2 million SNPs from channel catfish and almost 2.5 million SNPs from blue catfish. This efficiency is even more obvious when considering filtered (high-quality) SNPs. While only 48,594 filtered SNPs were identified among all catfish ESTs, this work resulted in 342,104 filtered SNPs within channel catfish and 366,269 within blue catfish. In addition, more than 420,000 filtered SNPs were identified as inter-specific SNPs, and are valuable in genetics and breeding studies involving hybrid catfish.
One major challenge for SNPs is the problem caused by paralogous sequence variants (PSVs) and multisite sequence variants (MSVs). Putative SNPs detected may be false positives, potentially arising from sequencing errors or misassembly of PSVs or MSVs. Paralogs that share high levels of sequence similarity may have been assembled in the same contig due to the short read length of Illumina reads. A higher stringency of assembly may better discriminate between paralogs, but complete discrimination may prove to be difficult due to the lack of a reference genome sequence. On the other hand, a higher stringency of assembly would lead to the separate assembly of haplotypes from highly polymorphic genes. Therefore, in order to select SNPs with high confidence, putative SNPs were screened based on several factors including surrounding sequence quality, absence of additional SNPs in the flanking regions, sequence depth and minor allele frequency. SNPs detected within contigs or regions of high sequence depth are more likely to be false positives, because at high depth even rare sequencing errors can recur in enough reads to meet a fixed minimum read count. Therefore, setting a minimum minor allele frequency (e.g. 10%) for larger contigs may help reduce false SNP calling based on sequence errors. Additionally, multiple SNPs located close to one another (<15 bp) often represent sequence errors and prevent the design of primers and probes for SNP genotyping. A requirement of no additional SNPs in the 15-bp flanking region around a putative SNP was therefore applied.
Given the large numbers of SNPs generated that meet these minimal requirements, more stringent parameters can be applied in picking SNP sets for different applications. Average depth at putative SNP positions is greater than 100 sequences, providing high confidence in the accuracy of identified SNPs within the pooled samples. Re-sequencing or limited validation of these samples by low-throughput SNP genotyping is costly and is unlikely to generate additional information. Ultimately, SNPs need to be validated by genotyping in a variety of reference mapping families and trait-selected populations using a high-density screening array. In catfish, the use of homozygous gynogenetic catfish as controls will allow detection of false positives caused by PSVs or MSVs.
Sampling animals of diverse genetic backgrounds and sequencing to sufficient depth for reliable SNP identification made it possible to detect many common SNPs across the entire genome. We have demonstrated that transcriptome analysis of pooled RNA samples from multiple individuals using Illumina sequencing technology is both technically efficient and cost-effective for generating expressed sequences. Such an approach is most effective when coupled to existing EST resources generated using traditional sequencing approaches because the reference ESTs facilitate effective assembly of the expressed short reads. The SNPs identified in this report will provide a much needed resource for genetic studies in the catfish scientific community and will contribute to the development of high density, cost-effective genotyping platforms. Validation and testing of SNPs using high-density arrays will subsequently lead to the production of a SNP array with well-spaced SNPs, providing a powerful genotyping tool for the study of performance and production traits in catfish.
Sample and RNA isolation
Forty-seven channel catfish from five different aquaculture populations/fingerling sources (8 Marion Select, 10 Pearson, 11 Moyer, 10 Holland, 8 Noble) and 19 blue catfish from two different strains (7 Rio Grande and 12 D&B) were used for this study. Samples of 11 tissues including brain, gill, head kidney, intestine, liver, muscle, skin, spleen, stomach, heart, and trunk kidney were collected. The fish were euthanized with tricaine methanesulfonate (MS 222) at 300 mg/l before tissue collection. Tissue samples from each species were collected, pooled, immediately placed in 5 ml RNAlater™ (Ambion, Austin, TX, USA) and kept at 4°C for 2-4 days until RNA extraction. Equal weights of each tissue from individuals of each species were combined, ground to a fine powder with mortar and pestle in the presence of liquid nitrogen and thoroughly mixed. A fraction of the tissue samples was used for RNA isolation. Total RNA was isolated using the RNeasy plus Mini Kit (Qiagen, Valencia, CA, USA) with DNase I (Invitrogen, USA) treatment following the manufacturer's protocol.
Sequencing was conducted commercially in HudsonAlpha Genomic Services Lab (Huntsville, AL, USA). Briefly, 100 ng of total RNA was used to prepare amplified cDNA using Ovation RNA-seq, a commercially available kit optimized for RNA sequencing (NuGEN Technologies, San Carlos, CA). The resulting double-stranded cDNA was subsequently used as the input to the Illumina library preparation protocol starting with the standard end-repair step. The end-repaired DNA with a single 'A'-base overhang was ligated to the adaptors in a standard ligation reaction using T4 DNA ligase and 2 μM-4 μM final adaptor concentration, depending on the DNA yield following purification after the addition of the 'A'-base. Following ligation, the samples were purified and subjected to size selection via gel electrophoresis to isolate 350 bp fragments for ligation-mediated PCR (LM-PCR). Twelve cycles of LM-PCR were used to amplify the ligated material in preparation for cluster generation. For each species (channel catfish and blue catfish), the prepared cDNA library was sequenced with 36-bp paired-end reads on one flow cell lane of the Illumina Genome Analyzer II platform and with 100-bp paired-end reads on one flow cell lane of the HiSeq 2000 platform. The image analysis, base calling and quality score calibration were processed using the Illumina Pipeline Software v1.4.1 according to the manufacturer's instructions. Reads were exported in FASTQ format and have been deposited at the NCBI Sequence Read Archive (SRA) under accession number SRA025099.
Assembly of expressed short reads
Gene identification and annotation
Unique consensus sequences from the all catfish assembly were compared against the Uniprot database and the zebrafish Refseq protein database (NCBI) using BLASTX (cutoff E-value of 1E-10) to obtain the putative gene identity. To estimate the proportion of annotated contigs that matched to unique genes in the known protein database, all BLASTX hits were filtered for redundancy in protein accessions. Assignment of Gene Ontology terms to annotated unique sequences was conducted using the program Blast2GO. Ontology was categorized with respect to Biological Process, Molecular Function, and Cellular Component.
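The E-value cutoff and the redundancy filter described above can be sketched as follows. This is a minimal illustration, not the pipeline used in this study: it assumes BLASTX results in the standard 12-column tabular layout (query ID in column 1, subject ID in column 2, E-value in column 11), and keeping only the lowest-E-value hit per contig is our assumption. The function names `best_hits` and `unique_accessions` are hypothetical.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-10  # cutoff stated in the text


def best_hits(blast_tabular, cutoff=E_VALUE_CUTOFF):
    """Map each contig to its best (lowest E-value) protein hit.

    Assumes 12-column BLAST tabular text: column 1 is the query (contig),
    column 2 the subject (protein accession), column 11 the E-value.
    Hits with an E-value above the cutoff are discarded.
    """
    best = {}
    for row in csv.reader(io.StringIO(blast_tabular), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        contig, accession, evalue = row[0], row[1], float(row[10])
        if evalue > cutoff:
            continue
        if contig not in best or evalue < best[contig][1]:
            best[contig] = (accession, evalue)
    return best


def unique_accessions(best):
    """Collapse contig-level hits to the set of distinct protein accessions."""
    return {accession for accession, _ in best.values()}
```

Counting the entries of `best_hits(...)` gives the number of annotated contigs, while the size of `unique_accessions(...)` gives the number of distinct proteins hit, which is the quantity used here as a proxy for unique genes.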
SNP and microsatellite markers identification
Assembled contigs were scanned for SNPs utilizing SNP detection software included in CLC Genomics Workbench (CLC bio, Aarhus, Denmark). The central base quality score of ≥25 and average surrounding base quality score of ≥20 were set to assess the quality of reads at positions for SNP detection. Under the criteria of minimum coverage (read depth) of four and the minimum variant frequency of two, the variations compared to the reference sequence were counted as SNPs. Three lists of SNPs were generated from the channel catfish, blue catfish and all catfish assemblies, respectively. The identification of intra-specific SNPs for both channel and blue catfish, and of inter-specific SNPs between channel and blue catfish, was achieved by comparing these three lists of SNPs. Inter-specific SNPs were defined as those that have sequence variations between channel catfish and blue catfish, but no sequence variations within channel catfish or within blue catfish; similarly, intra-specific SNPs were identified within channel catfish or within blue catfish; and intra-specific SNPs for both channel catfish and blue catfish were identified within both channel catfish and blue catfish at the same SNP position.
All the unique sequences were used to search for microsatellite markers using Msatfinder with a repeat threshold of eight di-nucleotide repeats or five tri-, tetra-, penta-, or hexa-nucleotide repeats. The presence of at least 50 bp of sequence on both sides of the microsatellite repeats was considered sufficient for primer design [32, 33].
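The comparison of the three SNP lists can be expressed with set operations. The sketch below is one possible reading of the definitions above, not the procedure actually used: it assumes each SNP is represented as a (contig, position) pair on the shared reference, and it treats a position as inter-specific when it appears in the all-catfish list but in neither single-species list. The function name `classify_snps` and the category keys are hypothetical.

```python
def classify_snps(channel, blue, combined):
    """Partition SNP positions by comparing three call sets.

    Each argument is an iterable of (contig, position) pairs:
    channel  - SNPs called from channel catfish reads only
    blue     - SNPs called from blue catfish reads only
    combined - SNPs called from the all-catfish assembly

    Assumption: a position variable in the combined data but in neither
    single-species list differs between species while being fixed within
    each species, i.e. it is inter-specific.
    """
    channel, blue, combined = set(channel), set(blue), set(combined)
    return {
        "intra_channel": channel,                  # variable within channel catfish
        "intra_blue": blue,                        # variable within blue catfish
        "intra_both": channel & blue,              # same position variable in both
        "inter_specific": combined - channel - blue,
    }
```

For example, a position present only in the all-catfish list is reported under `inter_specific`, whereas a position present in both single-species lists is reported under `intra_both`.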
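The repeat thresholds and the flanking-sequence requirement can be illustrated with a small regular-expression scan. This is a simplified stand-in written for illustration and is not Msatfinder: it only finds perfect repeats, and the function names and the returned dictionary keys are our own.

```python
import re

# Minimum number of repeat units per motif length, as stated in the text.
MIN_REPEATS = {2: 8, 3: 5, 4: 5, 5: 5, 6: 5}
MIN_FLANK = 50  # bp required on each side for primer design


def is_primitive(unit):
    """True if the motif is not itself a repeat of a shorter motif (e.g. not 'ACAC')."""
    n = len(unit)
    return not any(n % d == 0 and unit == unit[:d] * (n // d) for d in range(1, n))


def find_microsatellites(seq):
    """Return perfect di- to hexa-nucleotide repeats meeting the thresholds."""
    seq = seq.upper()
    found = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One captured unit followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            if not is_primitive(unit):
                continue  # reported under its shorter motif instead
            found.append({
                "motif": unit,
                "repeats": len(m.group(0)) // unit_len,
                "start": m.start(),
                "end": m.end(),
                "sufficient_flanks": (m.start() >= MIN_FLANK
                                      and len(seq) - m.end() >= MIN_FLANK),
            })
    return sorted(found, key=lambda hit: hit["start"])
```

Only hits with `sufficient_flanks` set to true would be carried forward to primer design under the 50-bp criterion.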
Quality SNP screening
In order to identify quality SNPs, putative SNPs identified as mentioned above were further screened following specific criteria based on the read depth, minor allele frequency, the quality of flanking regions and absence of other SNPs within 15-bp flanking regions: only those SNPs with minor allele sequences representing no less than 10% of the reads aligned at the polymorphic loci were declared as quality SNPs; no extra SNPs or indels within 15-bp flanking regions were allowed; SNPs located in repetitive regions were also not considered. Potential repetitive elements were detected by RepeatMasker, and SNPs located in repetitive regions were checked and ruled out using custom scripts. For practical application in SNP genotyping assays, only bi-allelic SNPs were considered in this study. To get a snapshot of the SNP distribution across the catfish genome, SNP-containing contigs with BLAST hits to the Ensembl zebrafish transcripts database were plotted along the zebrafish chromosomes.
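The screening criteria listed above can be summarised as a single predicate. The following is a minimal sketch under stated assumptions, not the custom scripts used in this study: the `PutativeSNP` record, its field names and the `is_quality_snp` function are hypothetical, and repeat status is assumed to have been determined beforehand (for example from RepeatMasker output).

```python
from dataclasses import dataclass

MIN_MINOR_ALLELE_FRACTION = 0.10  # minor allele in no less than 10% of reads
FLANK = 15                        # bp on each side that must be free of other variants


@dataclass(frozen=True)
class PutativeSNP:
    contig: str
    position: int
    allele_counts: tuple      # read counts per observed allele, e.g. (90, 12)
    in_repeat: bool = False   # assumed to come from a prior repeat-masking step


def is_quality_snp(snp, variant_positions):
    """Apply the quality criteria to one putative SNP.

    variant_positions: positions of all putative SNPs and indels on the
    same contig (the SNP's own position may be included).
    """
    if len(snp.allele_counts) != 2:
        return False  # only bi-allelic SNPs are considered
    depth = sum(snp.allele_counts)
    if depth == 0:
        return False
    if min(snp.allele_counts) / depth < MIN_MINOR_ALLELE_FRACTION:
        return False  # minor allele too rare, possibly a sequencing error
    if snp.in_repeat:
        return False  # SNPs in repetitive regions are excluded
    for pos in variant_positions:
        if pos != snp.position and abs(pos - snp.position) <= FLANK:
            return False  # another variant lies within the 15-bp flank
    return True
```

A SNP with 90 and 12 supporting reads passes when its nearest neighbouring variant is 100 bp away, but fails if another variant lies 10 bp away or if the minor allele is supported by only 5 of 100 reads.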
This project was supported by a grant from USDA AFRI Animal Genome Basic Genome Reagents and Tools Program (USDA/NRICGP award# 2009-35205-05101). We wish to thank Greg Whitis, Alabama Fish Farming Center (Greensboro, AL) for help in collecting representative fish from commercial populations. We are grateful to Donghong Niu, Tingting Feng, Hao Zhang, Jiaren Zhang, Yu Zhang, Chao Li, Ruijia Wang, Parichart Ninwichian and Lilian Wong in our lab for their help in collecting samples. Thanks also go to Dr. Shawn Levy and team at HudsonAlpha Institute for sequencing efforts. Many thanks are given to Dr. Zhiliang Hu and Dr. Thomas Parchman for helpful discussion and assistance during the data analysis. We would also like to thank the anonymous reviewers for their insightful comments. SL is supported by a scholarship from the China Scholarship Council (CSC) for studying abroad.
- Morin PA, Luikart G, Wayne RK, Grp SW: SNPs in ecology, evolution and conservation. Trends Ecol Evol. 2004, 19 (4): 208-216. 10.1016/j.tree.2004.01.009.
- Abasht B, Lamont SJ: Genome-wide association analysis reveals cryptic alleles as an important factor in heterosis for fatness in chicken F-2 population. Animal Genetics. 2007, 38 (5): 491-498. 10.1111/j.1365-2052.2007.01642.x.
- Duijvesteijn N, Knol EF, Merks JWM, Crooijmans RPMA, Groenen MAM, Bovenhuis H, Harlizius B: A genome-wide association study on androstenone levels in pigs reveals a cluster of candidate genes on chromosome 6. BMC Genetics. 2010, 11: 42. 10.1186/1471-2350-11-42.
- Du ZQ, Zhao X, Vukasinovic N, Rodriguez F, Clutter AC, Rothschild MF: Association and Haplotype Analyses of Positional Candidate Genes in Five Genomic Regions Linked to Scrotal Hernia in Commercial Pig Lines. PLoS One. 2009, 4 (3): e4837. 10.1371/journal.pone.0004837.
- Meuwissen THE, Hayes BJ, Goddard ME: Prediction of total genetic value using genome-wide dense marker maps. Genetics. 2001, 157 (4): 1819-1829.
- Khatkar MS, Nicholas FW, Collins AR, Zenger KR, Cavanagh JA, Barris W, Schnabel RD, Taylor JF, Raadsma HW: Extent of genome-wide linkage disequilibrium in Australian Holstein-Friesian cattle based on a high-density SNP panel. BMC Genomics. 2008, 9: 187. 10.1186/1471-2164-9-187.
- Kim ES, Berger PJ, Kirkpatrick BW: Genome-wide scan for bovine twinning rate QTL using linkage disequilibrium. Animal Genetics. 2009, 40 (3): 300-307. 10.1111/j.1365-2052.2008.01832.x.
- Brooks SA, Gabreski N, Miller D, Brisbin A, Brown HE, Streeter C, Mezey J, Cook D, Antczak DF: Whole-Genome SNP Association in the Horse: Identification of a Deletion in Myosin Va Responsible for Lavender Foal Syndrome. PLoS Genet. 2010, 6 (4): e1000909. 10.1371/journal.pgen.1000909.
- Kijas JW, Townley D, Dalrymple BP, Heaton MP, Maddox JF, McGrath A, Wilson P, Ingersoll RG, McCulloch R, McWilliam S, et al: A genome wide survey of SNP variation reveals the genetic structure of sheep breeds. PLoS One. 2009, 4 (3): e4668. 10.1371/journal.pone.0004668.
- Becker D, Tetens J, Brunner A, Burstel D, Ganter M, Kijas J, Drogemuller C, Consortium ISG: Microphthalmia in Texel Sheep Is Associated with a Missense Mutation in the Paired-Like Homeodomain 3 (PITX3) Gene. PLoS One. 2010, 5 (1): e8689. 10.1371/journal.pone.0008689.
- Eck SH, Benet-Pages A, Flisikowski K, Meitinger T, Fries R, Strom TM: Whole genome sequencing of a single Bos taurus animal for single nucleotide polymorphism discovery. Genome Biology. 2009, 10 (8): R82. 10.1186/gb-2009-10-8-r82.
- Moen T, Hayes B, Baranski M, Berg PR, Kjoglum S, Koop BF, Davidson WS, Omholt SW, Lien S: A linkage map of the Atlantic salmon (Salmo salar) based on EST-derived SNP markers. BMC Genomics. 2008, 9: 223. 10.1186/1471-2164-9-223.
- Hubert S, Higgins B, Borza T, Bowman S: Development of a SNP resource and a genetic linkage map for Atlantic cod (Gadus morhua). BMC Genomics. 11: 191. 10.1186/1471-2164-11-191.
- He C, Chen L, Simmons M, Li P, Kim S, Liu ZJ: Putative SNP discovery in interspecific hybrids of catfish by comparative EST analysis. Anim Genet. 2003, 34 (6): 445-448. 10.1046/j.0268-9146.2003.01054.x.
- Wang S, Sha Z, Sonstegard TS, Liu H, Xu P, Somridhivej B, Peatman E, Kucuktas H, Liu Z: Quality assessment parameters for EST-derived SNPs from catfish. BMC Genomics. 2008, 9: 450. 10.1186/1471-2164-9-450.
- Wang S, Peatman E, Abernathy J, Waldbieser G, Lindquist E, Richardson P, Lucas S, Wang M, Li P, Thimmapuram J, et al: Assembly of 500,000 inter-specific catfish expressed sequence tags and large scale gene-associated marker development for whole genome association studies. Genome Biol. 2010, 11 (1): R8. 10.1186/gb-2010-11-1-r8.
- Mardis ER: The impact of next-generation sequencing technology on genetics. Trends Genet. 2008, 24 (3): 133-141.
- Novaes E, Drost DR, Farmerie WG, Pappas GJ, Grattapaglia D, Sederoff RR, Kirst M: High-throughput gene and SNP discovery in Eucalyptus grandis, an uncharacterized genome. BMC Genomics. 2008, 9: 312. 10.1186/1471-2164-9-312.
- Parchman TL, Geist KS, Grahnen JA, Benkman CW, Buerkle CA: Transcriptome sequencing in an ecologically important tree species: assembly, annotation, and marker discovery. BMC Genomics. 11: 180. 10.1186/1471-2164-11-180.
- Vera JC, Wheat CW, Fescemyer HW, Frilander MJ, Crawford DL, Hanski I, Marden JH: Rapid transcriptome characterization for a nonmodel organism using 454 pyrosequencing. Mol Ecol. 2008, 17 (7): 1636-1647. 10.1111/j.1365-294X.2008.03666.x.
- Hale MC, McCormick CR, Jackson JR, DeWoody JA: Next-generation pyrosequencing of gonad transcriptomes in the polyploid lake sturgeon (Acipenser fulvescens): the relative merits of normalization and rarefaction in gene discovery. BMC Genomics. 2009, 10: 203. 10.1186/1471-2164-10-203.
- Meyer E, Aglyamova GV, Wang S, Buchanan-Carter J, Abrego D, Colbourne JK, Willis BL, Matz MV: Sequencing and de novo analysis of a coral larval transcriptome using 454 GSFlx. BMC Genomics. 2009, 10: 219. 10.1186/1471-2164-10-219.
- Surget-Groba Y, Montoya-Burgos JI: Optimization of de novo transcriptome assembly from next-generation sequencing data. Genome Res. 20 (10): 1432-1440. 10.1101/gr.103846.109.
- Cirulli ET, Singh A, Shianna KV, Ge DL, Smith JP, Maia JM, Heinzen EL, Goedert JJ, Goldstein DB, Immunology CHAV: Screening the human exome: a comparison of whole genome and whole transcriptome sequencing. Genome Biology. 2010, 11 (5): R57.
- Trick M, Long Y, Meng JL, Bancroft I: Single nucleotide polymorphism (SNP) discovery in the polyploid Brassica napus using Solexa transcriptome sequencing. Plant Biotechnol J. 2009, 7 (4): 334-346. 10.1111/j.1467-7652.2008.00396.x.
- Conesa A, Gotz S, Garcia-Gomez JM, Terol J, Talon M, Robles M: Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005, 21 (18): 3674-3676. 10.1093/bioinformatics/bti610.
- Lomax J: Get ready to GO! A biologist's guide to the Gene Ontology. Brief Bioinform. 2005, 6 (3): 298-304. 10.1093/bib/6.3.298.
- Gut IG, Lathrop GM: Duplicating SNPs. Nature Genetics. 2004, 36 (8): 789-790. 10.1038/ng0804-789.
- Waldbieser GC, Bosworth BG, Quiniou SM: Production of viable homozygous, doubled haploid channel catfish (Ictalurus punctatus). Mar Biotechnol (NY). 12 (4): 380-385. 10.1007/s10126-009-9221-2.
- Zhang K, Sun F: Assessing the power of tag SNPs in the mapping of quantitative trait loci (QTL) with extremal and random samples. BMC Genet. 2005, 6: 51. 10.1186/1471-2156-6-51.
- Thurston MI, Field D: Msatfinder: detection and characterisation of microsatellites. [http://www.genomics.ceh.ac.uk/msatfinder/]
- Somridhivej B, Wang SL, Sha ZX, Liu H, Quilang J, Xu P, Li P, Hue ZL, Liu ZJ: Characterization, polymorphism assessment, and database construction for microsatellites from BAC end sequences of channel catfish (Ictalurus punctatus): A resource for integration of linkage and physical maps. Aquaculture. 2008, 275 (1-4): 76-80. 10.1016/j.aquaculture.2008.01.013.
- Rozen S, Skaletsky H: Primer3 on the www for general users and for biologist programmers. Methods Mol Biol. 2000, 132: 365-386.
- Smit AHR, Green P: RepeatMasker Open-3.2.2. [http://www.repeatmasker.org]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.