A pilot study for channel catfish whole genome sequencing and de novo assembly
© Jiang et al; licensee BioMed Central Ltd. 2011
Received: 30 June 2011
Accepted: 22 December 2011
Published: 22 December 2011
Recent advances in next-generation sequencing technologies have drastically increased throughput and significantly reduced sequencing costs. However, the average read lengths of next-generation sequencing technologies are short compared with those of traditional Sanger sequencing. Short reads pose great challenges for de novo sequence assembly. As a pilot project for whole genome sequencing of the catfish genome, here we attempt to determine the proper sequence coverage, the proper assembly software, and the various parameters used for the assembly of a BAC physical map contig spanning approximately one million base pairs.
A combination of low sequence coverage of 454 and Illumina sequencing appeared to provide effective assembly, as reflected by a high N50 value. Using 454 sequencing alone, a sequencing depth of 18 X was sufficient to obtain a good-quality assembly, whereas with Illumina sequencing alone a depth of 70 X appeared to be sufficient. Additional sequencing coverage beyond 18 X of 454 or 70 X of Illumina sequencing did not significantly improve the assembly. Considering the cost of sequencing, 2 X 454 sequencing coupled with 70 X Illumina sequencing provided an assembly of reasonably good quality. Among the software packages tested, Newbler with a seed length of 16 and ABySS with a K-value of 60 appeared to be appropriate for the assembly of 454 reads alone and Illumina paired-end reads alone, respectively. Using both 454 and Illumina paired-end reads, the best assembly of the physical map contig came from a hybrid strategy: Newbler for initial 454 sequence assembly and Velvet for initial Illumina sequence assembly, followed by a second assembly step using MIRA. This strategy produced 193 contigs with an N50 value of 13,123 bp.
A hybrid sequencing strategy using low sequencing depth of 454 and high sequencing depth of Illumina provided a good-quality assembly with a high N50 value at relatively low cost. A combination of Newbler, Velvet, and MIRA can be used to assemble 454 reads and Illumina reads effectively. The assembled sequence can serve as a resource for comparative genome analysis. Additional long reads from third-generation sequencing platforms are needed to sequence through repetitive genome regions, which should further enhance the sequence assembly.
Channel catfish, Ictalurus punctatus, is the major aquaculture species in the United States, accounting for over 60% of all U.S. aquaculture production. Channel catfish is regarded as one of the best-characterized species serving as a model for teleost immune studies, and as an important model species for the study of toxicology and reproductive physiology. The channel catfish genome is estimated to be 1 Gb in size (http://www.genomesize.com) and is highly AT-rich, with 60.7% A+T. The catfish genome contains one main type of tandem repeat, named Xba elements, and several types of dispersed repetitive elements. The mariner/Tc1 DNA transposons are the leading type of dispersed repetitive element (4-5% of the genome). They are followed by retrotransposons (3-4% of the genome), Mermaid, Merman, and other SINE elements (~1.5% of the genome), LINEs (~1.5% of the genome), and various types of short sequence repeats such as microsatellites (~3% of the genome) [3, 5–8].
At present, a number of genomic tools and resources have been developed in catfish, including bacterial artificial chromosome (BAC) libraries [9, 10], BAC-based physical maps [11, 12], genetic linkage maps [13–15], a large number of ESTs [2, 16], over 1700 unique full-length cDNAs, over 60,000 BAC end sequences [3, 7], and a large number of identified molecular markers such as microsatellites and single nucleotide polymorphisms [2, 18]. Whole genome sequencing of catfish is underway, and this project was conducted as a pilot study to define the parameters important for generating the whole genome sequence assembly.
A major limitation of eukaryotic genome sequencing is its cost. In recent years, however, advances in sequencing technologies have drastically reduced sequencing costs. Among the many sequencing platforms, second-generation technologies such as 454 sequencing, Illumina sequencing, and SOLiD sequencing are the most commonly used. A common feature of these sequencers is their relatively short reads, which make subsequent sequence assembly a great challenge. Such challenges become even more significant for large and complex eukaryotic genomes. Teleost genomes, known to have gone through a third round of whole genome duplication, pose an additional challenge when coupled with short sequencing reads.

In consideration of such complexities, Quinn et al. conducted a pilot study on the Atlantic salmon genome. They used 454 GS FLX pyrosequencing on eight pooled BAC clones covering approximately 1 Mb, and concluded that it was difficult to achieve good levels of genome sequence assembly with 454 sequencing for this tetraploid genome. However, for the diploid European sea bass, Kuhl et al. were able to generate large superscaffolds (13.2-17.5 Mb) with 17-39 X pyrosequencing coverage of pooled BACs. Apparently, genome complexity as well as existing genome resources can influence the outcome of assembling sequences generated by next-generation sequencing.

In this study, based on the existing catfish BAC-based physical map, 24 pooled BAC clones covering around 1 Mb of the catfish genome were sequenced using both pyrosequencing and Illumina sequencing technologies. Together with the existing BAC end sequences, we aim to take full advantage of multiple sequencing technologies and existing genetic resources for the upcoming catfish whole genome sequencing.
Several de novo assembly software packages have been developed for assembling genome sequences generated by next-generation sequencing platforms. Assemblers are built on two basic graph-based algorithms.

The first is based on overlap-layout-consensus graphs, as in Newbler and MIRA. In these packages, the algorithm computes all pairwise overlaps between reads to build an overlap graph. The overlap graph is then used to compute a layout of reads and the consensus sequence of contigs.

The second is based on de Bruijn graphs, as in Velvet, ABySS, and several other packages. Here, sequence reads are broken into shorter sequences of DNA, referred to as K-mers, where K denotes the length of these shorter sequences. The de Bruijn graph is built from the overlaps of length K-1 between these K-mers rather than from the actual reads. The assembly algorithms and their implementations typically become complex with large volumes of sequence data.
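The difference between the two graph types can be illustrated with a toy Python sketch. It assumes error-free reads from a single strand, with no reverse complements and no sequencing errors, and it uses exact suffix-prefix matching. It is meant only to show which relationships each graph records, not how Newbler, MIRA, Velvet, or ABySS implement them. The function names and the `min_len` threshold are illustrative.

```python
from collections import defaultdict


def suffix_prefix_overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that exactly matches a prefix of `b`,
    or 0 if there is no such overlap of at least `min_len` bases."""
    if len(b) < min_len:
        return 0
    start = 0
    while True:
        # Candidate start positions: occurrences of b's first min_len bases in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def overlap_graph(reads: list[str], min_len: int) -> dict[tuple[int, int], int]:
    """Overlap (OLC) view: compare every ordered pair of distinct reads and
    record an edge i -> j weighted by the overlap length."""
    edges = {}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                olen = suffix_prefix_overlap(a, b, min_len)
                if olen > 0:
                    edges[(i, j)] = olen
    return edges


def de_bruijn_graph(reads: list[str], k: int) -> dict[str, list[str]]:
    """De Bruijn view: split reads into K-mers; each K-mer becomes an edge from
    its (K-1)-base prefix to its (K-1)-base suffix, so K-mers are linked via
    their K-1 overlap rather than by comparing whole reads."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)
```

Take the reads `["ACGTTGCA", "TTGCATGA", "CATGACCT"]` with `min_len=3`. The overlap graph links read 0 to read 1 and read 1 to read 2, each by a 5-base overlap. This requires comparing all read pairs, which is the cost that grows quickly with data volume. The de Bruijn graph never compares reads directly; it only chains K-mers through shared (K-1)-mers.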
The objectives of this pilot study were:

- to determine the framework for proper levels of genome coverage using Illumina sequencing, 454 sequencing, and a combination of 454 and Illumina sequencing, considering both technical and economic factors;
- to compare the assembly of sequences using different software packages with various parameters;
- to develop a cost-effective de novo assembly strategy for the upcoming catfish whole genome sequencing.

Here we report sequencing and assembly of twenty-four BAC clones of the largest contig of the catfish physical map using 454 sequencing, Illumina sequencing, and a combination of these sequencing technologies. We also compare their assembly using several de novo assembly software packages. Based on the assembly, we attempted to identify conserved syntenies between catfish and several fish species whose genome sequences are available, using BLAST sequence similarity comparisons.
Results and discussion
Generation of short sequencing reads from pooled catfish BAC clones
Sequencing statistics using 454 and Illumina sequencing platforms.
Primary sequence data
Total number of reads
Total number of reads after trimming
Average trimmed read length (bp)
Total accumulated length of all trimmed reads (Mb)
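The depth figures used throughout this paper (2 X, 18 X, 70 X) are the total length of trimmed reads divided by the length of the target region. The following is a minimal Python sketch under one assumption: the ~1 Mb target contig is treated as exactly 1,000,000 bp. It also adds the textbook Lander-Waterman estimate, not part of the original analysis. That estimate gives the expected fraction of bases covered at depth c as 1 - e^(-c), assuming reads are placed uniformly at random.

```python
import math


def sequencing_depth(total_bases: float, target_size: float) -> float:
    """Average depth (X): total trimmed bases divided by the target length."""
    return total_bases / target_size


def bases_needed(depth: float, target_size: float) -> float:
    """Trimmed bases required to reach a given average depth."""
    return depth * target_size


def lander_waterman_covered_fraction(depth: float) -> float:
    """Idealized Lander-Waterman expectation of the fraction of target bases
    covered by at least one read, assuming uniformly random read positions."""
    return 1.0 - math.exp(-depth)


TARGET = 1_000_000  # assumption: the ~1 Mb BAC contig treated as exactly 1 Mb

for label, depth in [("454", 18), ("Illumina", 70), ("454 low coverage", 2)]:
    print(f"{label}: {depth} X needs ~{bases_needed(depth, TARGET) / 1e6:.0f} Mb "
          f"of trimmed reads; ideal covered fraction "
          f"{lander_waterman_covered_fraction(depth):.4f}")
```

Under this idealization, 2 X leaves roughly 13.5% of bases uncovered (1 - e^(-2) ≈ 0.865). This suggests why low-coverage 454 reads are most useful alongside deep Illumina coverage. Real coverage is neither uniform nor repeat-free, so the model is only a rough baseline. In particular, it does not predict the plateau observed beyond 18 X and 70 X.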
Comparison of de novo assemblers for 454 reads and Illumina reads
A number of sequence assemblers have been developed recently to cope with sequences generated by next-generation sequencers. We wanted to determine the most appropriate assembler and, where applicable, its K-value for the de novo assembly of 454 reads and Illumina reads. To do so, we compared several existing assemblers: Newbler, Velvet, ABySS, MIRA, and a commercial package, CLC Genomics Workbench (http://www.clcbio.com). The Celera Assembler v6.1 was also considered, but our Illumina reads, which were generated earlier, were too short for this assembler. The metrics used to evaluate the assembly were N50, average contig size, maximum contig size, and number of contigs.
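All four evaluation metrics can be computed from a list of contig lengths. The following is a minimal Python sketch. N50 follows the common convention: sort contigs from longest to shortest and report the length at which the running total first reaches at least half of all assembled bases. Individual tools may differ slightly in how they handle ties.

```python
from statistics import mean


def n50(lengths: list[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of all
    assembled bases (contigs summed from longest to shortest)."""
    if not lengths:
        raise ValueError("no contigs")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # always reached by the last contig
            return length


def assembly_metrics(lengths: list[int]) -> dict[str, float]:
    """The four summary statistics used to compare assemblies."""
    return {
        "number_of_contigs": len(lengths),
        "n50": n50(lengths),
        "average_contig_size": mean(lengths),
        "maximum_contig_size": max(lengths),
    }
```

For contigs of 100, 200, 300, 400, and 500 bp, the N50 is 400 bp because 500 + 400 = 900 bp is the first running total to reach half of the 1,500 bp total. Unlike the average contig size, N50 is not pulled down by many tiny contigs, which is why it is the headline metric here.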
Comparison of assembly output with 454 reads and Illumina reads using different assemblers with different K-values.
Maximum contig size (bp)
Average contig size (bp)
Number of contigs
CLC Genomics Workbench
CLC Genomics Workbench
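Choosing a K-value (or a Newbler seed length) from a comparison like this is a parameter sweep: run one assembly per value, summarize each assembly, and then choose one. The Python sketch below assumes the choice is "highest N50, fewer contigs on ties". That is one plausible reading of the criteria above, not a rule stated by the authors. `fasta_lengths`, `best_parameter`, and the example values are hypothetical.

```python
def fasta_lengths(lines) -> list[int]:
    """Contig lengths from FASTA text (any iterable of lines, e.g. an open file)."""
    lengths, current, in_record = [], 0, False
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if in_record:
                lengths.append(current)
            current, in_record = 0, True
        elif line:
            current += len(line)
    if in_record:
        lengths.append(current)
    return lengths


def n50(lengths: list[int]) -> int:
    """Shortest contig among the longest contigs covering half the total."""
    total, running = sum(lengths), 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contigs")


def best_parameter(assemblies: dict[int, list[int]]) -> int:
    """Map of parameter value (K or seed length) -> contig lengths. Returns the
    value with the highest N50, breaking ties in favour of fewer contigs."""
    if not assemblies:
        raise ValueError("no assemblies to compare")
    return max(assemblies, key=lambda p: (n50(assemblies[p]), -len(assemblies[p])))
```

In practice, each dictionary value would come from an assembler's contig file, e.g. `with open(path) as fh: lengths = fasta_lengths(fh)`. The toy call `best_parameter({29: [100, 200, 300], 31: [500, 100], 33: [500, 50, 50]})` returns 31, because K = 31 and K = 33 tie on N50 (500) and K = 31 has fewer contigs.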
Assembly strategy for the combination of 454 and Illumina reads
The hybrid assembly was attempted by combining all reads from both the 454 and Illumina platforms. Using the combination of the two types of sequencing reads, we tested Velvet, ABySS, Newbler, and CLC for hybrid assembly. With a single assembler, ABySS with a K-value of 60 provided a good-quality assembly, with an N50 value of approximately 9.1 Kb and a maximum contig size of approximately 26 Kb. Newbler v2.6 performed better on the same sequence data, generating an N50 value of approximately 12.5 Kb and a maximum contig size of 39 Kb. However, assembling 454 plus Illumina reads with a single assembler offered little advantage over assembling 454 reads alone. Rather than relying on a single assembler for the hybrid assembly, it is therefore crucial to develop a strategy that uses several assemblers to assemble the combination of different types of sequencing reads.
Several studies have used a two-step strategy for the assembly of Illumina and 454 sequences. In an effort to sequence the whole genome of the model fungus Sordaria macrospora, Nowrousian et al. first pre-assembled the 454 reads. They then constructed an assembly from the combined raw data of both Illumina and 454 reads using the Velvet assembler. In a different study, of the whole genome of the bacterium Geobacter sulfurreducens, the Illumina reads alone were pre-assembled first. The resulting contigs, plus the singletons and the 454 reads, were then assembled using Newbler.
Depth of sequence coverage required for effective de novo assembly
Comparison of assembly with Illumina paired-end and single-end reads and scaffolding
Comparison of assemblies from Illumina paired-end reads (PE) and single-end reads (SE).
Number of contigs
The importance of paired reads is not limited to contig assembly. They are even more important in scaffolding, where they bring separate contigs together into larger scaffolds. For instance, when the paired-end reads of Illumina sequences were used, the 193 contigs were joined into 68 scaffolds. This study underestimates the scaffolding capacity of paired-end reads because we used a library with a small insert size of 350 bp; the capacity should be much greater with larger-insert libraries.

The BAC end sequences helped in scaffolding, but were less effective than expected because few mate-paired BAC end sequences were available within this contig. Of the 75 BAC end sequences falling within this contig, only 29 were mate-paired reads. Together with the paired-end reads of Illumina sequences, these BAC end sequences allowed the 193 contigs to be assembled into 61 scaffolds, the largest being 684,936 bp.
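Paired-end scaffolding rests on simple arithmetic. Suppose both reads of a pair map near the facing ends of two contigs. The library insert size, minus the parts of the insert that lie inside each contig, then estimates the gap between them. The following is a simplified Python sketch, not the scaffolder used in this study. It assumes both contigs are already forward-oriented with contig A before contig B, a single fixed insert size (350 bp here), and a hypothetical minimum of two supporting pairs per join. Real scaffolders also model insert-size variance and orientation.

```python
from collections import defaultdict
from statistics import median


def estimate_gaps(links, insert_size: int, min_links: int = 2):
    """Estimate gaps between contig pairs linked by read pairs.

    Each link is (contig_a, contig_b, span_a, span_b). span_a is the number
    of insert bases inside contig_a (forward read start to the end of
    contig_a). span_b is the number inside contig_b (start of contig_b to the
    reverse read's outer end). Both contigs are assumed forward-oriented,
    a before b. Returns {(contig_a, contig_b): (n_links, median_gap)} for joins
    supported by at least `min_links` pairs. A negative gap suggests overlap.
    """
    gaps = defaultdict(list)
    for contig_a, contig_b, span_a, span_b in links:
        # Insert size = part inside A + unknown gap + part inside B.
        gaps[(contig_a, contig_b)].append(insert_size - span_a - span_b)
    return {
        pair: (len(values), median(values))
        for pair, values in gaps.items()
        if len(values) >= min_links
    }
```

The sketch also shows why a 350 bp insert limits scaffolding. A pair can only bridge a gap shorter than the insert size minus the bases each read needs to anchor inside its contig. Longer gaps and repeats therefore cannot be spanned.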
Comparison of assembly statistics of fully sequenced fish genomes¹ and some pilot studies².
Number of contigs
Maximum contig/scaffold³ size (bp)
Average contig size (bp)
Atlantic cod¹
Atlantic salmon²
Repetitive content analysis and annotation of the assembled region
Summary of repetitive elements in the assembled region of catfish genome.
Low complexity repeats
% of sequences
Overall, 98.5% of all reads were assembled; that is, 1.5% of the generated sequences were not assembled. Many of the unassembled sequences were very short reads, but they are probably also repetitive in nature, which would prevent their assembly.

To assess the nature of the unassembled sequences, we searched them with BLAST against the nr database with an E-value cutoff of 1e-5. Most hits were repetitive elements, such as:

- transposable elements (~18% of unassembled reads);
- retrotransposon reverse transcriptase-like sequences (~8%);
- zinc-finger protein-like sequences (~1%);
- recombinase-like proteins (~1%);

among others. In addition, the unassembled sequences were searched with BLAST against themselves to determine whether they are highly repetitive. Approximately 79% of the unassembled reads hit more than 10 other reads, and 55% of these hit more than 100 other reads. Taken together, these results indicate that the unassembled reads are located in repetitive regions of the genome, resulting in gaps in the de novo assembly. Long reads are therefore crucially important for spanning such repetitive regions of the genome.
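The self-search statistic ("hit more than 10 other reads") amounts to counting, for each read, the distinct other reads it matches. The following Python sketch assumes the search was written in BLAST's standard 12-column tabular format (BLAST+ `-outfmt 6`, or legacy `-m 8`), where column 2 is the subject ID and column 11 is the E-value. The text states its E-value cutoff only for the nr search, so reusing 1e-5 for the self-search is an assumption. The function names are hypothetical.

```python
from collections import defaultdict


def hits_per_read(tabular_lines, max_evalue: float = 1e-5) -> dict[str, int]:
    """Count distinct *other* reads hit by each query in 12-column BLAST
    tabular output (qseqid, sseqid, ..., evalue, bitscore)."""
    partners = defaultdict(set)
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if query != subject and evalue <= max_evalue:  # ignore self-hits
            partners[query].add(subject)
    return {query: len(subjects) for query, subjects in partners.items()}


def fraction_exceeding(counts: dict[str, int], all_reads: list[str], threshold: int) -> float:
    """Fraction of all unassembled reads that hit more than `threshold` other
    reads; reads absent from `counts` (no hits) count as zero."""
    return sum(1 for read in all_reads if counts.get(read, 0) > threshold) / len(all_reads)
```

`fraction_exceeding(counts, reads, 10)` corresponds to the "more than 10 other reads" figure. The 55% figure is relative to that subset, so it would be computed over only the reads that pass the first threshold.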
After gene annotation, we attempted to identify conserved syntenies between the assembled region and the genomes of four model fish species: medaka, stickleback, Tetraodon, and zebrafish. The catfish genes were ordered and oriented by anchoring the scaffolds to the physical map using BAC end sequences. Two syntenic blocks are highly conserved across most of these fish species (Figure 5). The first contains seven genes (designated J-K-L-M-N-I-H in Figure 5) and the second contains four genes (designated O-P-Q-R in Figure 5). Several other, smaller syntenic blocks containing 2-4 genes were also partially conserved (Figure 5). However, the genomic distances spanned by these genes differ. All 18 genes were found in the catfish physical map contig of approximately 1 Mb. In the other species, these genes were located in genomic regions of approximately 7 Mb in stickleback, 8 Mb in Tetraodon, over 24 Mb in medaka, and 44 Mb in zebrafish. Nonetheless, gene arrangements in the catfish genome are most similar to those in zebrafish, consistent with their phylogenetic relationships.
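The notion of a conserved syntenic block can be made concrete with a small Python sketch. A block is a maximal run of consecutive catfish genes whose homologs are also adjacent, in either direction, in the other species' gene order along one chromosome. A second helper measures the genomic span that a set of homologs occupies. This is a deliberately strict, hypothetical rule that tolerates no intervening genes; it is not the procedure used to draw Figure 5. The gene order in the test is a toy example, not data from the figure.

```python
def conserved_blocks(query_order: list[str], target_order: list[str], min_genes: int = 2):
    """Maximal runs of consecutive query genes that are also adjacent
    (forward or reverse) in target_order; keep runs of >= min_genes genes."""
    position = {gene: i for i, gene in enumerate(target_order)}
    blocks, current = [], []
    for gene in query_order:
        extends = (gene in position and current
                   and abs(position[gene] - position[current[-1]]) == 1)
        if extends:
            current.append(gene)
            continue
        if len(current) >= min_genes:
            blocks.append(current)
        # Start a new run, or none if the gene has no homolog in the target.
        current = [gene] if gene in position else []
    if len(current) >= min_genes:
        blocks.append(current)
    return blocks


def genomic_span(coords: list[tuple[int, int]]) -> int:
    """Distance spanned by homologous genes given (start, end) coordinates
    on one chromosome."""
    return max(end for _, end in coords) - min(start for start, _ in coords)
```

Applying `genomic_span` to the coordinates of the same genes in each species would give region sizes like the ~1 Mb versus 7-44 Mb contrast reported above, provided the genes lie on one chromosome.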
The results of this study demonstrate that, for the catfish genome, the hybrid sequencing strategy using both 454 and Illumina is more effective than either alone, especially when sequencing costs are also considered, as was also shown in the potato genome project. Initially, the effectiveness of sequence assembly was almost linearly correlated with the depth of sequence coverage. Beyond a certain level (18 X 454 and 70 X Illumina), however, additional sequencing provided no further improvement in assembly. Newbler appears to be the best assembly software for 454 reads, as its assemblies had the lowest number of contigs and the highest contig sizes and N50 values. By the same criteria, both ABySS and Velvet appear to be suitable for the assembly of Illumina reads. A two-step strategy appeared to provide highly effective sequence assembly: Newbler for 454 reads and Velvet for Illumina reads, followed by MIRA. Sequencing of this genomic region allowed the identification of 18 protein-encoding genes, whose genomic arrangements are highly conserved among catfish, zebrafish, medaka, Tetraodon, and stickleback.
It should be noted that this work dealt with only a one-million-base-pair region of the catfish genome. This work provides a framework for planning whole genome sequencing of the catfish genome, but extending its technical parameters to the whole genome requires additional work. The combination of 454 and Illumina sequencing may not be effective for fish whose genomes are polyploid or contain complex and long repeat structures. Even for the catfish genome, repeats hampered sequence assembly. An appropriate level of long reads that span the repeat regions, e.g., those produced by PacBio sequencing [35, 39], may therefore prove very useful.
DNA preparation and sequencing
Twenty-four clones covering the minimum tiling path of contig0241 from the CHORI-212 BAC library were selected for sequencing. BAC DNA isolation was conducted as previously described, with modifications. Briefly, BAC clones were transferred from 384-well plates to a 96-well culture block containing 1.5 ml of 2X YT medium with 12.5 μg/ml chloramphenicol, and grown at 37°C overnight with shaking at 300 rpm. The block was centrifuged at 2000 × g for 10 min in an Eppendorf 5804R benchtop centrifuge to collect the bacteria. The culture supernatant was decanted, and the block was inverted and tapped gently on paper towels to remove the remaining liquid. BAC DNA was isolated using the Perfectprep™ BAC 96 kit (Eppendorf North America, Westbury, NY) according to the manufacturer's specifications. Equal amounts (1 μg) of DNA from each BAC clone were pooled and then purified with phenol/chloroform. This DNA was used for 454 and Illumina sequencing at the Genomic Services Lab of the HudsonAlpha Institute for Biotechnology (Huntsville, AL).
All raw 454 and Illumina reads were trimmed of BAC vector sequences, and low-quality reads were filtered out before assembly. CLC Genomics Workbench was used to trim raw sequences with a quality score limit of 0.01 (Q20). Trimmed reads (Illumina 72-bp paired-end reads and 454 reads) shorter than 15 bp were discarded.

Assembly of 454 reads alone was performed using Newbler v. 2.6, Velvet v. 1.0.01, MIRA v. 3.0.0, and CLC Genomics Workbench v. 4.0.0 (CLC Bio, Cambridge, MA). Newbler was tested with seed lengths from 10 to 16, and Velvet with K-values from 25 to 95. Assembly of Illumina reads alone was performed using Velvet, ABySS v. 1.2.1, and CLC Genomics Workbench. Velvet was tested with K-values from 21 to 61, and ABySS with K-values from 30 to 64.

The hybrid assembly with both 454 and Illumina data included three major steps:

1. The 454 reads alone were pre-assembled by Newbler with a seed length of 16, a minimum overlap length of 40, and a minimum overlap identity of 95%.
2. The Illumina reads alone were pre-assembled using Velvet with a K-value of 29. Although Velvet can generate scaffolded contigs, this function was turned off in this step so that scaffolded contigs would not influence the subsequent hybrid assembly step.
3. The pre-assembled 454 and Illumina contigs, along with singletons, were assembled using MIRA with a minimum overlap of 40.
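The quality limit of 0.01 is an error probability: a Phred score Q corresponds to p = 10^(-Q/10), so Q20 gives p = 0.01. The following Python sketch shows a maximum-scoring-segment trim in the spirit of the modified-Mott algorithm that CLC documents. Each base scores (limit - p), and the read is cut to the segment with the highest running total. It has not been checked against CLC's output, and the function names are hypothetical. The 15 bp minimum matches the text.

```python
def phred_to_error(q: int) -> float:
    """Phred quality Q to error probability: p = 10 ** (-Q / 10); Q20 -> 0.01."""
    return 10 ** (-q / 10)


def mott_trim(qualities: list[int], limit: float = 0.01) -> tuple[int, int]:
    """Half-open interval [start, end) of the highest-scoring segment, where
    each base scores (limit - error probability). Bases better than the limit
    score positive; the running sum restarts whenever it drops to 0 or below."""
    running = best = 0.0
    start = best_start = best_end = 0
    for i, q in enumerate(qualities):
        running += limit - phred_to_error(q)
        if running <= 0:
            running = 0.0
            start = i + 1  # a new candidate segment begins after this base
        elif running > best:
            best, best_start, best_end = running, start, i + 1
    return (best_start, best_end) if best > 0 else (0, 0)


def trim_read(seq: str, qualities: list[int], limit: float = 0.01, min_length: int = 15):
    """Trim one read; return None (discard) if it ends up shorter than min_length."""
    start, end = mott_trim(qualities, limit)
    trimmed = seq[start:end]
    return trimmed if len(trimmed) >= min_length else None
```

For qualities `[2, 2, 30, 30, 30, 30, 2, 2]`, only the four Q30 bases score positively, so `mott_trim` keeps positions 2 to 5. With the default 15 bp minimum, such a short read would then be discarded.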
Comparison of assembly of paired-end reads and single-end reads
To compare the effects of paired-end and single-end reads on de novo assembly, the Illumina paired-end data set was assembled using ABySS with a K-value of 60. For comparison, the same data set was also assembled as single-end reads by ignoring the paired-end information.
Gene prediction and comparative analysis
Repetitive elements were masked using RepeatMasker against the zebrafish repeat database. Repeat-masked sequences were used for gene identification and synteny analysis, using both gene prediction algorithms and sequence similarity searches. The GENSCAN gene model prediction algorithm was used to predict introns and exons. The resulting predictions were searched against the NCBI nr database using BLASTX with an E-value cutoff of 1e-10. The identified gene sequences were used for TBLASTX searches against the medaka, stickleback, Tetraodon, and zebrafish peptide databases with an E-value cutoff of 1e-10. The chromosomal positions of the homologous genes were obtained from the Ensembl database.
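Filtering similarity-search results at an E-value cutoff and keeping one best match per query can be sketched in Python as follows. The sketch assumes the searches were saved in BLAST's 12-column tabular format (BLAST+ `-outfmt 6` or legacy `-m 8`), where column 2 is the subject ID, column 11 the E-value, and column 12 the bit score. It also assumes "best" means highest bit score among hits passing the 1e-10 cutoff. The text does not state how a single homolog was chosen, so this rule and the function name are hypothetical.

```python
def best_hits(tabular_lines, max_evalue: float = 1e-10) -> dict[str, str]:
    """For each query, the subject with the highest bit score among hits whose
    E-value is <= max_evalue (12-column BLAST tabular output)."""
    best: dict[str, tuple[str, float]] = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # fails the significance cutoff
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}
```

Ranking by bit score rather than E-value avoids ties among very small E-values, which BLAST often reports as 0.0.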
This project was supported by Agriculture and Food Research Initiative Competitive Grant no. 2010-65205-20356, and Grant no. 2009-35205-05101 from the USDA National Institute of Food and Agriculture.
- Bengten E, Clem LW, Miller NW, Warr GW, Wilson M: Channel catfish immunoglobulins: repertoire and expression. Dev Comp Immunol. 2006, 30 (1-2): 77-92. 10.1016/j.dci.2005.06.016.
- Wang S, Peatman E, Abernathy J, Waldbieser G, Lindquist E, Richardson P, Lucas S, Wang M, Li P, Thimmapuram J, Liu L, Vullaganti D, Kucuktas H, Murdock C, Small BC, Wilson M, Liu H, Jiang Y, Lee Y, Chen F, Lu J, Wang W, Xu P, Somridhivej B, Baoprasertkul P, Quilang J, Sha Z, Bao B, Wang Y, Wang Q, et al: Assembly of 500,000 inter-specific catfish expressed sequence tags and large scale gene-associated marker development for whole genome association studies. Genome Biol. 2010, 11 (1): R8-10.1186/gb-2010-11-1-r8.
- Xu P, Wang S, Liu L, Peatman E, Somridhivej B, Thimmapuram J, Gong G, Liu Z: Channel catfish BAC-end sequences for marker development and assessment of syntenic conservation with other fish species. Anim Genet. 2006, 37 (4): 321-326. 10.1111/j.1365-2052.2006.01453.x.
- Liu Z, Li P, Dunham R: Characterization of an A/T-rich family of sequences from the channel catfish (Ictalurus punctatus). Mol Mar Biol Biotechnol. 1998, 7: 232-9.
- Kim S, Karsi A, Dunham R, Liu Z: The skeletal muscle alpha-actin gene of channel catfish (Ictalurus punctatus) and its association with piscine specific SINE elements. Gene. 2000, 252: 173-181. 10.1016/S0378-1119(00)00198-0.
- Nandi S, Peatman E, Xu P, Wang S, Li P, Liu Z: Repeat structure of the catfish genome: a genomic and transcriptomic assessment of Tc1-like transposon elements in channel catfish (Ictalurus punctatus). Genetica. 2007, 131 (1): 81-90. 10.1007/s10709-006-9115-4.
- Liu H, Jiang Y, Wang S, Ninwichian P, Somridhivej B, Xu P, Abernathy J, Kucuktas H, Liu Z: Comparative analysis of catfish BAC end sequences with the zebrafish genome. BMC Genomics. 2009, 10: 592-10.1186/1471-2164-10-592.
- Liu Z: Development of genomic resources in support of sequencing, assembly, and annotation of the catfish genome. Comp Biochem Physiol, Part D, Genomics and Proteomics. 2011, 6: 11-17. 10.1016/j.cbd.2010.03.001.
- Quiniou SM, Katagiri T, Miller NW, Wilson M, Wolters WR, Waldbieser GC: Construction and characterization of a BAC library from a gynogenetic channel catfish Ictalurus punctatus. Genet Sel Evol. 2003, 35 (6): 673-683. 10.1186/1297-9686-35-7-673.
- Wang S, Xu P, Thorsen J, Zhu B, de Jong PJ, Waldbieser G, Kucuktas H, Liu Z: Characterization of a BAC library from channel catfish Ictalurus punctatus: indications of high levels of chromosomal reshuffling among teleost genomes. Mar Biotechnol (New York, NY). 2007, 9 (6): 701-711.
- Quiniou SM, Waldbieser GC, Duke MV: A first generation BAC-based physical map of the channel catfish genome. BMC Genomics. 2007, 8: 40-10.1186/1471-2164-8-40.
- Xu P, Wang S, Liu L, Thorsen J, Kucuktas H, Liu Z: A BAC-based physical map of the channel catfish genome. Genomics. 2007, 90 (3): 380-388. 10.1016/j.ygeno.2007.05.008.
- Waldbieser GC, Bosworth BG, Nonneman DJ, Wolters WR: A microsatellite-based genetic linkage map for channel catfish, Ictalurus punctatus. Genetics. 2001, 158 (2): 727-734.
- Liu Z, Karsi A, Li P, Cao D, Dunham R: An AFLP-based genetic linkage map of channel catfish (Ictalurus punctatus) constructed by using an interspecific hybrid resource family. Genetics. 2003, 165 (2): 687-694.
- Kucuktas H, Wang S, Li P, He C, Xu P, Sha Z, Liu H, Jiang Y, Baoprasertkul P, Somridhivej B, Wang Y, Abernathy J, Guo X, Liu L, Muir W, Liu Z: Construction of genetic linkage maps and comparative genome analysis of catfish using gene-associated markers. Genetics. 2007, 181 (4): 1649-1660.
- Li P, Peatman E, Wang S, Feng J, He C, Baoprasertkul P, Xu P, Kucuktas H, Nandi S, Somridhivej B, Serapion J, Simmons M, Turan C, Liu L, Muir W, Dunham R, Brady Y, Grizzle J, Liu Z: Towards the ictalurid catfish transcriptome: generation and analysis of 31,215 catfish ESTs. BMC Genomics. 2007, 8: 177-10.1186/1471-2164-8-177.
- Chen F, Lee Y, Jiang Y, Wang S, Peatman E, Abernathy J, Liu H, Liu S, Kucuktas H, Ke C, Liu Z: Identification and characterization of full-length cDNAs in channel catfish (Ictalurus punctatus) and blue catfish (Ictalurus furcatus). PLoS One. 2010, 5 (7): e11546-10.1371/journal.pone.0011546.
- Liu S, Zhou Z, Lu J, Sun F, Wang S, Liu H, Jiang Y, Kucuktas H, Kaltenboeck L, Peatman E, Liu Z: Generation of genome-scale gene-associated SNPs in catfish for the construction of a high-density SNP array. BMC Genomics. 2011, 12: 53-10.1186/1471-2164-12-53.
- Meyer A, Van de Peer Y: From 2R to 3R: evidence for a fish-specific genome duplication (FSGD). Bioessays. 2005, 27 (9): 937-945. 10.1002/bies.20293.
- Quinn NL, Levenkova N, Chow W, Bouffard P, Boroevich KA, Knight JR, Jarvie TP, Lubieniecki KP, Desany BA, Koop BF, Harkins TT, Davidson WS: Assessing the feasibility of GS FLX Pyrosequencing for sequencing the Atlantic salmon genome. BMC Genomics. 2008, 9: 404-10.1186/1471-2164-9-404.
- Kuhl H, Tine M, Beck A, Timmermann B, Kodira C, Reinhardt R: Directed sequencing and annotation of three Dicentrarchus labrax L. chromosomes by applying Sanger- and pyrosequencing technologies on pooled DNA of comparatively mapped BAC clones. Genomics. 2011, 98 (3): 202-212. 10.1016/j.ygeno.2011.06.004.
- Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, Braverman MS, Chen YJ, Chen Z, Dewell SB, Du L, Fierro JM, Gomes XV, Godwin BC, He W, Helgesen S, Ho CH, Irzyk GP, Jando SC, Alenquer MLI, Jarvie TP, Jirage KB, Kim JB, Knight JR, Lanza JR, Leamon JH, Lefkowitz SM, Lei M, Li J, et al: Genome sequencing in microfabricated high-density picolitre reactors. Nature. 2005, 437 (7057): 376-380.
- Chevreux B, Wetter T, Suhai S: Genome Sequence Assembly Using Trace Signals and Additional Sequence Information. Computer Science and Biology: Proceedings of the German Conference on Bioinformatics (GCB). 1999, 45-56.
- Zerbino DR, Birney E: Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 2008, 18 (5): 821-829. 10.1101/gr.074492.107.
- Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJ, Birol I: ABySS: a parallel assembler for short read sequence data. Genome Res. 2009, 19 (6): 1117-1123. 10.1101/gr.089532.108.
- Miller JR, Koren S, Sutton G: Assembly algorithms for next-generation sequencing data. Genomics. 2010, 95 (6): 315-327. 10.1016/j.ygeno.2010.03.001.
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215 (3): 403-10.
- Lu J, Peatman E, Yang Q, Wang S, Hu Z, Reecy J, Kucuktas H, Liu Z: The catfish genome database cBARBEL: an informatic platform for genome biology of ictalurid catfish. Nucleic Acids Res. 2010, 39 (Database): D815-821.
- Lin Y, Li J, Shen H, Zhang L, Papasian CJ, Deng HW: Comparative Studies of de novo Assembly Tools for Next-generation Sequencing Technologies. Bioinformatics. 2011, doi: 10.1093/bioinformatics/btr319
- Nowrousian M, Stajich JE, Chu M, Engh I, Espagne E, Halliday K, Kamerewerd J, Kempken F, Knab B, Kuo HC, Osiewacz HD, Pöggeler S, Read ND, Seiler S, Smith KM, Zickler D, Kück U, Freitag M: De novo assembly of a 40 Mb eukaryotic genome from short sequence reads: Sordaria macrospora, a model organism for fungal morphogenesis. PLoS Genet. 2010, 6 (4): e1000891-10.1371/journal.pgen.1000891.
- Nagarajan H, Butler JE, Klimes A, Qiu Y, Zengler K, Ward J, Young ND, Methe BA, Palsson BO, Lovley DR, Barrett C: De Novo assembly of the complete genome of an enhanced electricity-producing variant of Geobacter sulfurreducens using only short reads. PLoS One. 2010, 5 (6): e10922-10.1371/journal.pone.0010922.
- Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown CG, Hall KP, Evers DJ, Barnes CL, Bignell HR, Boutell JM, Bryant J, Carter RJ, Cheetham RK, Cox AJ, Ellis DJ, Flatbush MR, Gormley NA, Humphray SJ, Irving LJ, Karbelashvili MS, Kirk SM, Li H, Liu X, Maisinger KS, Murray LJ, Obradovic B, Ost T, Parkinson ML, Pratt MR, et al: Accurate whole human genome sequencing using reversible terminator chemistry. Nature. 2008, 456 (7218): 53-59. 10.1038/nature07517.
- Schatz MC, Delcher AL, Salzberg SL: Assembly of large genomes using second-generation sequencing. Genome Res. 2010, 20 (9): 1165-1173. 10.1101/gr.101360.109.
- Li R, Fan W, Tian G, Zhu H, He L, Cai J, Huang Q, Cai Q, Li B, Bai Y, Zhang Z, Zhang Y, Wang W, Li J, Wei F, Li H, Jian M, Li J, Zhang Z, Nielsen R, Li D, Gu W, Yang Z, Xuan Z, Ryder O, Leung FC, Zhou Y, Cao J, Sun X, Fu Y, et al: The sequence and de novo assembly of the giant panda genome. Nature. 2009, 463 (7279): 311-317.
- Schadt EE, Turner S, Kasarskis A: A window into third-generation sequencing. Hum Mol Genet. 2010, 19 (R2): R227-40. 10.1093/hmg/ddq416.
- Burge C, Karlin S: Prediction of complete gene structures in human genomic DNA. J Mol Biol. 1997, 268 (1): 78-94. 10.1006/jmbi.1997.0951.
- Steinke D, Salzburger W, Meyer A: Novel relationships among ten fish model species revealed based on a phylogenomic analysis using ESTs. J Mol Evol. 2006, 62: 772-784. 10.1007/s00239-005-0170-8.
- Potato Genome Sequencing Consortium, Xu X, Pan S, Cheng S, Zhang B, Mu D, Ni P, Zhang G, Yang S, Li R, Wang J, Orjeda G, Guzman F, Torres M, Lozano R, Ponce O, Martinez D, De la Cruz G, Chakrabarti SK, Patil VU, Skryabin KG, Kuznetsov BB, Ravin NV, Kolganova TV, Beletsky AV, Mardanov AV, Di Genova A, Bolser DM, Martin DM, Li G, et al: Genome sequence and analysis of the tuber crop potato. Nature. 2011, 475 (7355): 189-195. 10.1038/nature10158.
- McCarthy A: Third generation DNA sequencing: pacific biosciences' single molecule real time technology. Chem Biol. 2010, 17 (7): 675-6. 10.1016/j.chembiol.2010.07.004.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.