Genomic insight into the common carp (Cyprinus carpio) genome by sequencing analysis of BAC-end sequences
- Peng Xu†1,
- Jiongtang Li†1,
- Yan Li1, 2,
- Runzi Cui1, 3,
- Jintu Wang1, 2,
- Jian Wang1,
- Yan Zhang1,
- Zixia Zhao1 and
- Xiaowen Sun1
© Xu et al; licensee BioMed Central Ltd. 2011
Received: 30 December 2010
Accepted: 14 April 2011
Published: 14 April 2011
Common carp is one of the most important aquaculture teleost fish in the world. Common carp and other closely related Cyprinidae species provide over 30% of aquaculture production in the world. However, common carp genomic resources are still relatively underdeveloped. BAC end sequences (BES) are important resources for genome research, including BAC-anchored genetic marker development, integration of linkage and physical maps, and whole genome sequence assembly and scaffolding.
To develop such resources for common carp (Cyprinus carpio), a total of 40,224 BAC clones were sequenced on both ends, generating 65,720 clean BES with an average read length of 647 bp after sequence processing, representing 42,522,168 bp or 2.5% of the common carp genome. The first survey of the common carp genome was conducted with various bioinformatics tools. The common carp genome contains over 17.3% repetitive elements, with a GC content of 36.8% and 518 transposon ORFs. To identify and develop BAC-anchored microsatellite markers, a total of 13,581 microsatellites were detected from 10,355 BES. The coding regions of 7,127 genes were recognized from 9,443 BES on 7,453 BACs, with 1,990 BACs having genes on both ends. To evaluate the similarity to the genome of the closely related zebrafish, common carp BES were aligned against the zebrafish genome. A total of 39,335 common carp BES have conserved homologs in the zebrafish genome, which demonstrates the high similarity between the zebrafish and common carp genomes and indicates that comparative mapping between the two species will be feasible once a physical map of common carp is available.
BAC end sequences are valuable resources for the first genome-wide survey of common carp. Repetitive DNA was estimated to be approximately 28% of the common carp genome, indicating the high complexity of the genome. Comparative analysis mapped around 40,000 BES to the zebrafish genome and established over 3,100 microsyntenies, covering over 50% of the zebrafish genome. Common carp BES are useful tools for comparative mapping between the two closely related species, zebrafish and common carp, which should facilitate both structural and functional genome analysis in common carp.
Cyprininae carps are the most important cultured species, accounting for over 30% of aquaculture production in the world. Common carp (Cyprinus carpio) is currently one of the top three cultured carps in China. Because of its importance, genetic studies of the cellular and molecular components of the carp genome have been conducted over the last several decades. The common carp genome is composed of 100 chromosomes. It is believed to be a tetraploid species with a physical size of approximately 1700 Mbp (2n).
Teleosts are widely believed to have gone through an additional round of whole genome duplication, i.e., the 3R hypothesis, as compared to mammals. Common carp is believed to have had another round of genome duplication (4R) and become an evolutionarily recent tetraploid fish. As such, it is widely used as a model species for evolutionary studies such as fish-specific genome duplication, gene loss after whole genome duplication, and functional partitioning of duplicated genes [2–4]. Much research effort has been devoted to understanding the carp genome, including development of polymorphic markers [5–7], linkage mapping [8, 9], and quantitative trait loci (QTL) analysis [10, 11]. However, such research has been limited by the lack of large-scale genomic resources.
Analysis of BES has proven to be an effective approach for developing markers that are useful not only for linkage mapping, but also for the integration of genetic linkage and physical maps [12, 13]. In teleost fish, large sets of BES data have been developed for several economically important species, including catfish [13, 14], rainbow trout [15, 16], Atlantic salmon, tilapia and European sea bass. Here we report the generation and analysis of 80,000 BAC end sequences (BES). The aims were to provide initial insight into the carp genome, to generate a large number of polymorphic markers for genetic and genomic analysis, to assess the repeat structure of the carp genome as information for whole genome sequencing, and to provide paired reads of large genomic clones for whole genome assembly [13, 19–23].
Results and Discussion
Generation of BAC-end sequences
Sequence statistics of the BES of common carp
BAC end sequencing reads
BES after trimming b
Redundant BES c
BES after filtering redundancy
Average read length (bp)
BES mate-pairs d
Total bases sequenced
Vertebrates Repeat masked bases
Assessment of the repetitive elements in the carp genome
The proportion of repetitive elements in the common carp genome was assessed using RepeatMasker with the Vertebrates Repeat Database. Repeat masking of the 42,522,168 bp of carp BES detected 7,357,899 bp (17.3%) of repeated sequences. The classification and respective proportions of the identified repetitive elements are shown in Additional File 1. The most abundant type of repetitive element in the common carp genome was DNA transposons (6.67%), mostly hobo-Activator (2.25%), followed by retroelements (4.52%), including LINEs (2.33%), LTR elements (1.98%), and SINEs (0.2%). Various satellite sequences, low complexity repeats and simple sequence repeats accounted for 2.46%, 1.98% and 1.64% of the base pairs, respectively. The divergence rate of DNA transposons (percentage of substitutions in the matching region compared with consensus repeats in constructed libraries) showed a nearly normal distribution with a peak at 24%. A fraction of LTR retrotransposons, LINEs and SINEs had nearly the same divergence rates as DNA transposons (peaks at 30%, 28% and 22%, respectively), indicating a relatively old origin (Additional File 2). An additional 518 BES that had not been masked by RepeatMasker were identified as homologs of proteins encoded by diverse families of transposable elements using TransposonPSI.
To identify novel repetitive elements in the common carp genome, repeat libraries were constructed using multiple de novo methods and then combined into a non-redundant repeat library containing 1,940 sequences. The repeat library was then used for repeat annotation of the common carp BES. An additional 4,499,836 bp, representing approximately 10.6% of the BES, were identified as de novo repeats.
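The repeat fractions quoted above can be checked with a few lines of arithmetic. The sketch below uses only the base-pair counts given in the text; the variable and function names are illustrative. It assumes that the known and de novo repeat bases do not overlap, which seems reasonable because the de novo library was applied to BES not already masked, and under that assumption the sum is consistent with the figure of approximately 28% given in the abstract.

```python
# Base-pair counts quoted in the text.
TOTAL_BES_BP = 42_522_168       # total clean BES bases
KNOWN_REPEAT_BP = 7_357_899     # masked with the vertebrate repeat library
DE_NOVO_REPEAT_BP = 4_499_836   # additionally masked with the de novo library


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


known_pct = percent(KNOWN_REPEAT_BP, TOTAL_BES_BP)
de_novo_pct = percent(DE_NOVO_REPEAT_BP, TOTAL_BES_BP)
# Assumption: the two sets of masked bases are disjoint, so they can be summed.
combined_pct = percent(KNOWN_REPEAT_BP + DE_NOVO_REPEAT_BP, TOTAL_BES_BP)

print(f"known repeats:   {known_pct:.1f}%")
print(f"de novo repeats: {de_novo_pct:.1f}%")
print(f"combined:        {combined_pct:.1f}%")
```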
Identification of microsatellites from BES
Identification of protein-coding sequences and functional annotation
After masking of repeats and transposon ORFs, 65,202 BES had more than 50 bp of contiguous non-repetitive sequence. Protein-coding sequences were identified by homology searches with BLASTX against the non-redundant (NR) protein database. A total of 9,443 BES had significant hits at an e-value cutoff of 1e-5, with 7,127 distinct gene hits. As expected, the vast majority of the best hits, 5,146 (72.2%), were zebrafish genes, indicating high levels of sequence similarity between the zebrafish and carp genomes.
Anchoring of carp BES to the Zebrafish Genome
Zebrafish is the species most closely related to common carp among teleost fishes with a draft whole genome sequence; both belong to the family Cyprinidae. The large set of common carp BES generated in this study made it possible to conduct an initial comparative genome analysis between zebrafish and common carp. To map common carp BES to zebrafish chromosomes, BLASTN searches of the common carp BES against the zebrafish zv8 assembly were conducted, which resulted in significant hits (e-value cutoff of 1e-5) for 39,335 query BES, of which 16,267 had unique hits to the zebrafish genome. The ratio of unique hits was much lower than that in the cattle-human comparative analysis, which indicates that many common carp BES have more than one homolog in the zebrafish genome, implying the genome duplication status of Cyprinidae fish.
Using annotated protein-coding gene regions in the zebrafish genome, we found that carp BES were located in exon regions of 5,857 zebrafish protein-coding genes, considerably more than the 5,146 zebrafish genes identified from the NR database with BLASTX as reported above. Most likely, some BES are homologous to the UTRs of zebrafish genes, which cannot be detected by BLASTX searches against the NR protein database.
Summary of BES mapping
Number of BAC clones
Number of BES
Long BES pairsb
Estimated coverage of chromosomes by the common carp BACs. Zebrafish genome assembly 8 (zv8) was used for the calculation.
Number of clones*
Five categories of conserved microsyntenies
Number of BAC clones
Type1 (protein-coding vs protein-coding)
Type2 (protein-coding vs non-coding)
Type3 (non-coding vs non-coding)
Type4 (protein-coding vs intergenic)
Type5 (non-coding vs intergenic)
BAC end sequences are an important resource for many genomic studies, especially for whole genome sequencing and assembly of a large and complex genome. To better understand the common carp genome, large-scale BAC end sequencing was conducted on over 40,000 BAC clones. The first survey of the common carp genome and the first genome-wide comparative analysis of the common carp and zebrafish genomes were accomplished.
Information on repetitive elements in the carp genome is needed for the upcoming whole genome sequencing and genome assembly. Multiple bioinformatic approaches were employed, and the known repetitive DNA similar to that of other vertebrates was estimated to be approximately 17.3% of the common carp genome, which is lower than in another tetraploid teleost fish, Atlantic salmon (30-35%), but higher than in catfish.
A total of 7,127 distinct homologous genes were identified from the surveyed common carp BES. The vast majority were zebrafish genes, suggesting high similarity between the zebrafish and carp genomes. Further comparative analysis mapped around 40,000 BES to the zebrafish genome. With mate-paired BES, over 3,100 microsyntenies were constructed between the common carp and zebrafish genomes, covering over 50% of the zebrafish genome. As part of the "Common Carp Genome Project", construction of both a fingerprint-based physical map and a high-density linkage map of the common carp genome is ongoing, with completion expected in 2011. Once the two maps are available, these BES and microsyntenies will be a valuable resource for constructing a genome-scale, fine comparative map between zebrafish and common carp, for whole genome assembly, and for localizing important traits in common carp.
The common carp BAC library, constructed with genomic DNA from a female individual and containing 92,160 BAC clones with an average insert size of 141 kb, was used for generating BAC-end sequences.
BAC Culture and End Sequencing
BAC clones were inoculated from 384-well stock plates into deep 96-well culture blocks containing 1.2 ml 2 × YT medium and 12.5 μg/ml chloramphenicol using a 96-pin replicator (V&P Scientific, Inc., San Diego, CA). The culture blocks were sealed with an air permeable seal (Excel Scientific, Wrightwood, CA) and shaken at 37°C for 20 hours at 300 rpm. The bacteria were then collected by centrifugation at 2000 g for 10 min in a Beckman Avanti J-26 XP centrifuge. After carefully removing all liquid from the culture blocks, bacterial pellets were used for BAC DNA extraction with an alkaline lysis protocol, modified at the lysate clarification step. Fritted filter plates (NUNC, Roskilde, Denmark) were used for lysate filtration, which significantly increased the BAC DNA quality for BAC end sequencing. BAC DNA was precipitated with isopropanol and washed twice with 70% ethanol. BAC DNA was then eluted into 40 μl milliQ water, collected in 96-well plates and stored at -20°C before use.
Sanger sequencing reactions were conducted in 96-well semi-skirt plates using the following ingredients: 2 μl 5X Sequencing Buffer, 2 μl sequencing primer (3 pmol/μl), 1 μl BigDye v3.1 Dye Terminator (Life Technology, Foster City, CA), and 5 μl BAC DNA. The sequencing reactions were conducted in ABI 9700 Thermal Cyclers (Life Technology) under the following conditions: initial 95°C for 5 min; then 99 cycles of 95°C for 30 sec, 55°C for 10 sec, 60°C for 4 min. The T7 and PIBRP primers were used for sequencing reactions (T7 primer: TAATACGACTCACTATAGGG; PIBRP primer: CTCGTATGTTGTGTGGAATTGTGAGC). The sequencing reactions were then precipitated with pre-chilled 100% ethanol and cleaned up with 70% ethanol. The samples were then analyzed with an ABI 3730 XL (Life Technology).
Clone Tracking and Quality Control
To avoid plate orientation mistakes, eight clones were re-sequenced from each 384-well plate, from positions A1, A2, B1, B2, C1, C2, D1, and D2. These quality control sequences were then searched against all collected BAC end sequences with BLAST. A re-sequenced read that hits the BES from the same well position confirms the correct plate orientation.
Phred [30, 31] was used for base calling of the BAC end sequences, with a quality score of Q20 as the cutoff. Seqclean, from the DFCI Gene Indices Software Tools, was used for vector trimming against the UniVec database with default parameter values. The trimmed BES were searched against themselves with BLASTN, and BES that had >95% identity with another BES and were covered over their full length in the alignment were excluded from the subsequent analysis.
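For readers unfamiliar with Phred scores, a quality value Q corresponds to an estimated base-calling error probability of 10^(-Q/10), so Q20 means an expected error rate of 1 in 100. The sketch below illustrates that relationship, together with one simple way a Q20 cutoff could be applied to a read by trimming low-quality bases from both ends. The trimming function is a hypothetical illustration only; the text does not describe how the cutoff was applied in practice, and this is not how Phred or Seqclean work internally.

```python
def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to an estimated error probability."""
    return 10 ** (-q / 10)


def trim_low_quality_ends(seq: str, quals: list, cutoff: int = 20) -> str:
    """Hypothetical end-trimming: drop bases below `cutoff` from both ends.

    Internal low-quality bases are left in place; this is a simplification.
    """
    start, end = 0, len(seq)
    while start < end and quals[start] < cutoff:
        start += 1
    while end > start and quals[end - 1] < cutoff:
        end -= 1
    return seq[start:end]


read = "ACGTAC"
qualities = [5, 10, 30, 40, 25, 8]
trimmed = trim_low_quality_ends(read, qualities)
print(phred_to_error_prob(20), trimmed)
```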
To detect known repeats in carp BES, we screened and masked BES using RepeatMasker against the Vertebrates Repeat library with default parameter values. Next, BES homologous to proteins encoded by diverse families of transposable elements were identified using TransposonPSI, a program that performs tBLASTn searches using a set of position specific scoring matrices (PSSMs) specific for different transposable element families.
Two de novo software packages, PILER-DF and RepeatScout, were used to search for de novo repeat sequences within carp BES and to build one repeat library each. The repeat sequences in one library were compared with those in the other using BLASTN. When two repeats aligned with identity ≥ 95% and coverage ≥ 95% of full length, the shorter sequence was removed. A non-redundant de novo repeat library of common carp was then constructed from the remaining distinct repeat sequences. The BES that were neither masked with the known vertebrate repeat library nor similar to transposable elements were then searched against the de novo repeat library with RepeatMasker.
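The redundancy rule used to merge the two libraries can be sketched as follows. This is a minimal illustration rather than the pipeline actually used: the input structures (`lengths`, `hits`) and the function name are hypothetical, and coverage is assumed here to be measured relative to the shorter of the two sequences, which the text does not state explicitly.

```python
def merge_repeat_libraries(lengths, hits, min_identity=95.0, min_coverage=0.95):
    """Return the names of repeats kept after removing redundant entries.

    lengths: dict mapping repeat name -> sequence length (both libraries pooled)
    hits:    iterable of (name_a, name_b, percent_identity, alignment_length)
             taken from a BLASTN comparison of one library against the other
    """
    removed = set()
    for name_a, name_b, identity, aln_len in hits:
        if name_a == name_b:
            continue
        # Order the pair by length; ties are broken by name so the result
        # does not depend on the order in which hits are listed.
        shorter, _longer = sorted((name_a, name_b), key=lambda n: (lengths[n], n))
        # Assumption: coverage is computed over the shorter sequence.
        coverage = aln_len / lengths[shorter]
        if identity >= min_identity and coverage >= min_coverage:
            removed.add(shorter)
    return sorted(name for name in lengths if name not in removed)


lengths = {"piler_1": 500, "rscout_1": 480, "rscout_2": 300}
hits = [
    ("piler_1", "rscout_1", 97.0, 470),  # near-identical, nearly full length
    ("piler_1", "rscout_2", 96.0, 150),  # similar, but only a partial overlap
]
kept = merge_repeat_libraries(lengths, hits)
print(kept)
```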
Identification of Microsatellites
Microsatellites were identified in non-redundant BES using the Perl script Msatfinder, which is specifically designed to identify and characterize microsatellites. Only microsatellites with motifs of 2-6 nucleotides and at least 5 repeat units were collected.
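The selection criteria (motifs of 2-6 nucleotides, at least 5 tandem repeat units) can be illustrated with a short regular-expression search. This is a simplified stand-in for Msatfinder, not a reimplementation of it: the function name is hypothetical, only perfect repeats are reported, and runs of a single nucleotide are skipped because they would otherwise match as a dinucleotide motif.

```python
import re


def find_microsatellites(seq, min_units=5):
    """Return (start, motif, units) for perfect tandem repeats in `seq`.

    Motifs are 2-6 nt long; the lazy quantifier makes the regex prefer the
    shortest motif that explains a repeat run.
    """
    pattern = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (min_units - 1))
    found = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            # Mononucleotide run such as AAAA...; outside the 2-6 nt criterion.
            continue
        units = len(match.group(0)) // len(motif)
        found.append((match.start(), motif, units))
    return found


example = "GGTT" + "AC" * 6 + "GGCC" + "GAT" * 5 + "CC" + "A" * 12
microsatellites = find_microsatellites(example)
print(microsatellites)
```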
BLASTX searches of the repeat-masked BES were conducted against the Non-Redundant Protein database. A cutoff e-value of 1e-5 was used as the significance threshold for the comparison. The top BLASTX result of each BES query was collected.
To assess the similarity of the common carp and zebrafish genomes and anchor common carp BACs to the zebrafish genome, we assumed that the zebrafish genome assembly is correct. Carp BES masked for repeats and transposons were searched against zebrafish genome assembly 8 (zv8) using BLASTN with an e-value cutoff of 1e-5. The top hit of each BES was analyzed further.
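Selecting the top hit per BES at a given e-value cutoff can be sketched as below, assuming the BLASTN results are available in the standard 12-column tabular format (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score). The function name is hypothetical, "top hit" is taken to mean the highest bit score, and a query is counted here as having a unique hit when exactly one alignment passes the cutoff; the text does not spell out either definition.

```python
def top_hits(lines, max_evalue=1e-5):
    """Return (best, n_hits) from BLAST tabular lines.

    best:   query -> (subject, sstart, send, evalue, bitscore) of the top hit
    n_hits: query -> number of alignments passing the e-value cutoff
    """
    best = {}
    n_hits = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        sstart, send = int(fields[8]), int(fields[9])
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue
        n_hits[query] = n_hits.get(query, 0) + 1
        # Assumption: the top hit is the alignment with the highest bit score.
        if query not in best or bitscore > best[query][4]:
            best[query] = (subject, sstart, send, evalue, bitscore)
    return best, n_hits


sample = [
    "bes1\tchr1\t90.0\t500\t50\t0\t1\t500\t1000\t1499\t1e-100\t400",
    "bes1\tchr5\t85.0\t300\t45\t0\t1\t300\t2000\t2299\t1e-50\t200",
    "bes2\tchr2\t80.0\t60\t12\t0\t1\t60\t10\t69\t0.01\t30",
    "bes3\tchr3\t88.0\t400\t48\t0\t1\t400\t900\t501\t1e-80\t350",
]
best, n_hits = top_hits(sample)
unique_queries = sorted(q for q, n in n_hits.items() if n == 1)
print(best["bes1"][0], unique_queries)
```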
Conserved microsyntenies were defined as alignment regions where the two ends of a carp BAC clone mapped ≤ 300 kb apart on the same zebrafish chromosome and with the same orientation. Conserved microsyntenies were then divided into five categories based on transcriptional signals in the zebrafish genomic regions homologous to carp BES. Zebrafish RefSeq genes, used as transcriptional signals, were downloaded from the UCSC database and divided into protein-coding genes and non-coding genes according to their annotation.
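The microsynteny criterion and the five categories listed in the results can be expressed compactly. The sketch below is illustrative only: the `EndHit` structure, the feature labels, and the function names are hypothetical, and the orientation check is passed in as a precomputed flag because the text does not describe how orientation was evaluated. Pairs where both ends fall in intergenic regions return `None`, since no category is listed for them.

```python
from typing import NamedTuple, Optional

MAX_SPAN_BP = 300_000  # the two BAC ends must map no more than 300 kb apart


class EndHit(NamedTuple):
    chrom: str    # zebrafish chromosome of the top hit
    pos: int      # hit position on that chromosome (bp)
    feature: str  # "protein-coding", "non-coding" or "intergenic"


def is_microsynteny(a: EndHit, b: EndHit, orientation_ok: bool,
                    max_span: int = MAX_SPAN_BP) -> bool:
    """Apply the same-chromosome, distance and orientation criteria."""
    return a.chrom == b.chrom and abs(a.pos - b.pos) <= max_span and orientation_ok


_CATEGORIES = {
    frozenset(["protein-coding"]): "Type1",
    frozenset(["protein-coding", "non-coding"]): "Type2",
    frozenset(["non-coding"]): "Type3",
    frozenset(["protein-coding", "intergenic"]): "Type4",
    frozenset(["non-coding", "intergenic"]): "Type5",
}


def categorize(a: EndHit, b: EndHit) -> Optional[str]:
    """Map the features hit by the two ends to one of the five categories."""
    return _CATEGORIES.get(frozenset([a.feature, b.feature]))


end1 = EndHit("chr7", 1_200_000, "protein-coding")
end2 = EndHit("chr7", 1_350_000, "intergenic")
print(is_microsynteny(end1, end2, orientation_ok=True), categorize(end1, end2))
```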
This study was supported by the grants from National Department Public Benefit Research Foundation (No. 200903045), China Ministry of Science and Technology 863 Hi-Tech Research and Development Program (No. 2009AA10Z105), China Ministry of Agriculture "948" Program (No. 2010-Z11) and Research Foundation of Chinese Academy of Fishery Sciences (No. 2009B002 and No. 2011C016).
- Danzmann R, Davidson E, Ferguson M, Gharbi K, Koop B, Hoyheim B, Lien S, Lubieniecki K, Moghadam H, Park J, et al: Distribution of ancestral proto-Actinopterygian chromosome arms within the genomes of 4R-derivative salmonid fishes (Rainbow trout and Atlantic salmon). BMC Genomics. 2008, 9 (1): 557. 10.1186/1471-2164-9-557.
- Santini F, Harmon LJ, Carnevale G, Alfaro ME: Did genome duplication drive the origin of teleosts? A comparative study of diversification in ray-finned fishes. BMC Evol Biol. 2009, 9: 194. 10.1186/1471-2148-9-194.
- Meyer A, Van de Peer Y: From 2R to 3R: evidence for a fish-specific genome duplication (FSGD). Bioessays. 2005, 27 (9): 937-945. 10.1002/bies.20293.
- David L, Blum S, Feldman MW, Lavi U, Hillel J: Recent duplication of the common carp (Cyprinus carpio L.) genome as revealed by analyses of microsatellite loci. Mol Biol Evol. 2003, 20 (9): 1425-1434. 10.1093/molbev/msg173.
- Zhang Y, Liang L, Jiang P, Li D, Lu C, Sun X: Genome evolution trend of common carp (Cyprinus carpio L.) as revealed by the analysis of microsatellite loci in a gynogentic family. J Genet Genomics. 2008, 35 (2): 97-103. 10.1016/S1673-8527(08)60015-6.
- Zhou J, Wu Q, Wang Z, Ye Y: Genetic variation analysis within and among six varieties of common carp (Cyprinus carpio L.) in China using microsatellite markers. Genetika. 2004, 40 (10): 1389-1393.
- Li D, Kang D, Yin Q, Sun X, Liang L: Microsatellite DNA marker analysis of genetic diversity in wild common carp (Cyprinus carpio L.) populations. J Genet Genomics. 2007, 34 (11): 984-993. 10.1016/S1673-8527(07)60111-8.
- Sun X, Liang L: A genetic linkage map of common carp (Cyprinus carpio L.) And mapping of a locus associated with cold tolerance. Aquaculture. 2004, 238 (1-4): 8. 10.1016/S0044-8486(03)00445-9.
- Cheng L, Liu L, Yu X, Wang D, Tong J: A linkage map of common carp (Cyprinus carpio) based on AFLP and microsatellite markers. Anim Genet. 2010, 41 (2): 191-198. 10.1111/j.1365-2052.2009.01985.x.
- Mao RX, Liu FJ, Zhang XF, Zhang Y, Cao DC, Lu CY, Liang LQ, Sun XW: [Studies on quantitative trait loci related to activity of lactate dehydrogenase in common carp (Cyprinus carpio)]. Yi Chuan. 2009, 31 (4): 407-411.
- Zhang Y, Liang LQ, Chang YM, Hou N, Lu CY, Sun XW: [Mapping and genetic effect analysis of quantitative trait loci related to body size in common carp (Cyprinus carpio L.)]. Yi Chuan. 2007, 29 (10): 1243-1248.
- Xu P, Wang S, Liu L, Thorsen J, Kucuktas H, Liu Z: A BAC-based physical map of the channel catfish genome. Genomics. 2007, 90 (3): 380-388. 10.1016/j.ygeno.2007.05.008.
- Liu H, Jiang Y, Wang S, Ninwichian P, Somridhivej B, Xu P, Abernathy J, Kucuktas H, Liu Z: Comparative analysis of catfish BAC end sequences with the zebrafish genome. BMC Genomics. 2009, 10: 592. 10.1186/1471-2164-10-592.
- Xu P, Wang S, Liu L, Peatman E, Somridhivej B, Thimmapuram J, Gong G, Liu Z: Channel catfish BAC-end sequences for marker development and assessment of syntenic conservation with other fish species. Animal Genetics. 2006, 37 (4): 321-326. 10.1111/j.1365-2052.2006.01453.x.
- Palti Y, Luo MC, Hu Y, Genet C, You FM, Vallejo RL, Thorgaard GH, Wheeler PA, Rexroad CE: A first generation BAC-based physical map of the rainbow trout genome. BMC Genomics. 2009, 10: 462. 10.1186/1471-2164-10-462.
- Genet C, Dehais P, Palti Y, Gavory F, Wincker P: Generation of BAC-end sequences for rainbow trout genome analysis. Plant and Animal Genome Conference. 2009, P580.
- Lorenz S, Brenna-Hansen S, Moen T, Roseth A, Davidson WS, Omholt SW, Lien S: BAC-based upgrading and physical integration of a genetic SNP map in Atlantic salmon. Anim Genet. 2010, 41 (1): 48-54. 10.1111/j.1365-2052.2009.01963.x.
- Shirak A, Grabherr M, Di Palma F, Lindblad-Toh K, Hulata G, Ron M, Kocher TD, Seroussi E: Identification of repetitive elements in the genome of Oreochromis niloticus: tilapia repeat masker. Mar Biotechnol (NY). 2010, 12 (2): 121-125. 10.1007/s10126-009-9236-8.
- Kuhl H, Beck A, Wozniak G, Canario AV, Volckaert FA, Reinhardt R: The European sea bass Dicentrarchus labrax genome puzzle: comparative BAC-mapping and low coverage shotgun sequencing. BMC Genomics. 2010, 11: 68. 10.1186/1471-2164-11-68.
- Chapus C, Edwards SV: Genome evolution in Reptilia: in silico chicken mapping of 12,000 BAC-end sequences from two reptiles and a basal bird. BMC Genomics. 2009, 10 (Suppl 2): S8. 10.1186/1471-2164-10-S2-S8.
- Terol J, Naranjo MA, Ollitrault P, Talon M: Development of genomic resources for Citrus clementina: characterization of three deep-coverage BAC libraries and analysis of 46,000 BAC end sequences. BMC Genomics. 2008, 9: 423. 10.1186/1471-2164-9-423.
- Saini N, Shultz J, Lightfoot DA: Re-annotation of the physical map of Glycine max for polyploid-like regions by BAC end sequence driven whole genome shotgun read assembly. BMC Genomics. 2008, 9: 323. 10.1186/1471-2164-9-323.
- Dalloul RA, Long JA, Zimin AV, Aslam L, Beal K, Ann Blomberg L, Bouffard P, Burt DW, Crasta O, Crooijmans RPMA, et al: Multi-Platform Next-Generation Sequencing of the Domestic Turkey (Meleagris gallopavo): Genome Assembly and Analysis. PLoS Biol. 2010, 8 (9): e1000475. 10.1371/journal.pbio.1000475.
- RepeatMasker. [http://www.repeatmasker.org]
- TransposonPSI. [http://transposonpsi.sourceforge.net/]
- Larkin DM, Everts-van der Wind A, Rebeiz M, Schweitzer PA, Bachman S, Green C, Wright CL, Campos EJ, Benson LD, Edwards J, et al: A cattle-human comparative map built with cattle BAC-ends and human genome sequence. Genome Res. 2003, 13 (8): 1966-1972.
- Davidson WS, Koop BF, Jones SJ, Iturra P, Vidal R, Maass A, Jonassen I, Lien S, Omholt SW: Sequencing the genome of the Atlantic salmon (Salmo salar). Genome Biol. 2010, 11 (9): 403.
- Li Y, Xu P, Zhao Z, Wang J, Zhang Y, Sun X: Construction and Characterization of the BAC Library for Common Carp Cyprinus Carpio L. and Establishment of Microsynteny with Zebrafish Danio Rerio. Marine Biotechnology. 2010,
- Sambrook J, Russell DW: Molecular cloning: a laboratory manual. 2001, Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory Press, 3
- Ewing B, Hillier L, Wendl MC, Green P: Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 1998, 8 (3): 175-185.
- Ewing B, Green P: Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 1998, 8 (3): 186-194.
- Seqclean. [http://www.tigr.org/tdb/tgi/software]
- Univec. [http://www.ncbi.nlm.nih.gov/VecScreen/UniVec.html]
- Edgar RC, Myers EW: PILER: identification and classification of genomic repeats. Bioinformatics. 2005, 21 (Suppl 1): i152-158. 10.1093/bioinformatics/bti1003.
- Price AL, Jones NC, Pevzner PA: De novo identification of repeat families in large genomes. Bioinformatics. 2005, 21 (Suppl 1): i351-358. 10.1093/bioinformatics/bti1018.
- Msatfinder. [http://www.genomics.ceh.ac.uk/msatfinder]
- Fujita PA, Rhead B, Zweig AS, Hinrichs AS, Karolchik D, Cline MS, Goldman M, Barber GP, Clawson H, Coelho A: The UCSC Genome Browser database: update 2011. Nucleic Acids Res. 2011, 39 (Database): D876-82. 10.1093/nar/gkq963.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.