A bacterial artificial chromosome library for the Australian saltwater crocodile (Crocodylus porosus) and its utilization in gene isolation and genome characterization
© Shan et al; licensee BioMed Central Ltd. 2009
Published: 14 July 2009
Crocodilians (Order Crocodylia) are an ancient vertebrate group of tremendous ecological, social, and evolutionary importance. They are the only extant reptilian members of Archosauria, a monophyletic group that also includes birds, dinosaurs, and pterosaurs. Consequently, crocodilian genomes represent a gateway through which the molecular evolution of avian lineages can be explored. To facilitate comparative genomics within Crocodylia and between crocodilians and other archosaurs, we have constructed a bacterial artificial chromosome (BAC) library for the Australian saltwater crocodile, Crocodylus porosus. This is the first BAC library for a crocodile and only the second BAC resource for a crocodilian.
The C. porosus BAC library consists of 101,760 individually archived clones stored in 384-well microtiter plates. Not I digestion of random clones indicates an average insert size of 102 kb. Based on a genome size estimate of 2778 Mb, the library affords 3.7 fold (3.7×) coverage of the C. porosus genome. To investigate the utility of the library in studying sequence distribution, probes derived from CR1a and CR1b, two crocodilian CR1-like retrotransposon subfamilies, were hybridized to C. porosus macroarrays. The results indicate that there are a minimum of 20,000 CR1a/b elements in C. porosus and that their distribution throughout the genome is decidedly non-random. To demonstrate the utility of the library in gene isolation, we probed the C. porosus macroarrays with an overgo designed from a C-mos (oocyte maturation factor) partial cDNA. A BAC containing C-mos was identified and the C-mos locus was sequenced. Nucleotide and amino acid sequence alignment of the C. porosus C-mos coding sequence with avian and reptilian C-mos orthologs reveals greater sequence similarity between C. porosus and birds (specifically chicken and zebra finch) than between C. porosus and squamates (green anole).
We have demonstrated the utility of the Crocodylus porosus BAC library as a tool in genomics research. The BAC library should expedite complete genome sequencing of C. porosus and facilitate detailed analysis of genome evolution within Crocodylia and between crocodilians and diverse amniote lineages including birds, mammals, and other non-avian reptiles.
Crocodilians (Order Crocodylia) are a group of reptiles that originated roughly 200 million years ago [1, 2]. They are apex predators in the marine and freshwater habitats in which they reside, and they play a major role in warm-water ecosystems throughout the world. There are 23 extant species grouped into three families: Crocodylidae (crocodiles), Alligatoridae (alligators and caimans), and Gavialidae (gharials) [3, 4]. As evidenced by their frequent appearances in video documentaries and television programs, crocodilians are a subject of considerable human curiosity. Moreover, these reptiles have been common subjects and characters in mythology, folk tales, art (including cave paintings and hieroglyphics), and literature, suggesting that they have considerable symbolic and practical significance in the lives of humans, past and present.
For roughly 15 years, bacterial artificial chromosome (BAC) libraries have been the principal molecular substrate used in physical mapping and complete eukaryote genome sequencing. Gridding of ordered BAC libraries (i.e., libraries in which each clone is stored in its own microtiter well) onto macroarrays and multiplex screening techniques have facilitated rapid gene isolation. The utility of BAC clones as substrates for end sequencing, in conjunction with advanced DNA fingerprinting techniques and macroarray analysis, has permitted construction of robust physical maps and selection of minimum tiling paths (i.e., sets of minimally overlapping BAC clones spanning entire chromosomes or chromosomal regions) for accurate genome sequencing and assembly. Recent advances in sequencing technologies (e.g., 454 pyrosequencing and Illumina sequencing) have created powerful opportunities in which ordered BAC libraries play a critical role. A particularly promising strategy for simultaneous physical mapping and sequencing of large eukaryotic genomes involves sequencing pools of sheared, individually "bar coded" BAC clones. After sequencing, those reads sharing a bar code (i.e., corresponding to the same BAC) are grouped together and assembled in silico, and physical maps are constructed by identifying overlapping assembled or partially assembled BAC sequences.
To expedite genome research in crocodilians, we have constructed a BAC library for the Australian saltwater crocodile (Crocodylus porosus). The C. porosus library is only the second large-insert DNA library for a crocodilian (a 10× library exists for Alligator mississippiensis) and the only BAC library for Crocodylidae, the largest of the crocodilian families. C. porosus is the largest living crocodilian and, along with A. mississippiensis, the only crocodilian species to be commercially farmed. Here we describe generation and characterization of the C. porosus BAC library and demonstrate its utility as a tool for gene isolation, genome characterization, and comparative genomics.
Preparation of nuclei agarose plugs from crocodile blood sample
Whole blood was obtained from Errol, a male C. porosus from the Darwin Crocodile Farm near Darwin, Australia. Blood was suspended in citrate buffer (250 mM sucrose, 40 mM trisodium citrate, pH 7.6) containing 5% v/v dimethylsulfoxide, aliquoted into 1.5 ml polypropylene tubes, flash frozen in liquid nitrogen, and shipped to the Mississippi Genome Exploration Laboratory. One of the tubes was thawed on ice and centrifuged at 4,000 rpm in a microcentrifuge for 4 min. The supernatant was decanted, the pellet was gently re-suspended in 1 ml of STEX buffer (100 mM NaCl, 100 mM Tris-HCl, 100 mM EDTA, pH 8.0), and the mixture was centrifuged as described above. The blood cell pellet was re-suspended in 500 μl STEX buffer and placed in a water bath at 45°C. After 15 min, the blood suspension was mixed with an equal volume of 45°C 2% w/v Cambrex (Rockland, ME) SeaPlaque Agarose (cat. no. 50100) in STEX buffer. The mixture was poured into a small Petri dish so that the depth of the solution was roughly 2 mm. After 20 min at 4°C, the resulting gel was cut into 10 × 5 mm rectangles, and these "plugs" were transferred into a conical 50 ml polypropylene tube containing 40 ml of lysis buffer (STEX buffer containing 1% w/v N-lauroylsarcosine and 300 mg/ml proteinase K). The capped tube was incubated at 37°C overnight with gentle agitation. Plugs were transferred into 0.5 M EDTA (pH 8.0) containing 0.1 M phenylmethylsulfonyl fluoride and incubated at 4°C for one hour. Plugs were washed in 0.5 M EDTA (pH 8.0) and then stored at 4°C in this buffer.
Preparation of high-molecular-weight insert DNA
A few test DNA plugs were exposed to different Hind III concentrations to determine conditions providing the largest number of fragments between 100 and 500 kb. The optimal enzyme concentration as determined in the test digests was used in a large-scale partial digest. Plugs used in the mass digestion were macerated and placed in a slot well of a 1% w/v Cambrex SeaKem Gold Agarose (cat. no. 50150) gel in 0.25 × TBE buffer (22.5 mM Tris, 22.5 mM boric acid, 0.5 mM EDTA, pH 8.0). Size selection of partially digested DNA was performed using pulsed-field gel electrophoresis (PFGE) according to Chalhoub et al. Size-selected Hind III fragments between 100 and 500 kb were recovered from agarose by electroelution according to Peterson et al.
BAC library construction
BAC library construction was performed as described in Peterson et al. using the pIndigoBAC-5 vector (Epicentre, Madison, WI) and ElectroMAX DH10B T1 Phage-Resistant Competent Cells (Invitrogen, Carlsbad, CA). Clone picking and library replication were performed using a Genetix QPixII robot (New Milton, Hampshire, UK). To monitor the quality of the BAC library and determine mean insert size, 96 BAC clones from every fiftieth 384-well plate were evaluated by Not I digestion and PFGE. For these analyses BAC DNA was isolated using an AutoGenprep 960 robot (AutoGen, Holliston, MA).
High density macroarrays were prepared using a Genetix QPixII robot. Each array consisted of 18,432 double-spotted BAC clones stamped onto a 22.5 cm2 Hybond N+ filter (GE Healthcare, Piscataway, NJ). There were enough C. porosus BAC clones to produce five complete macroarrays (101,760 clones ÷ 18,432 clones/macroarray = 5.52). Stamped arrays were placed clone-side up on LB (Luria-Bertani) agar containing 12.5 mg/L chloramphenicol and incubated at 37°C overnight. Each macroarray was fixed via incubation in 0.5 N NaOH, 1.5 M NaCl for 7 min followed by incubation in 1.5 M NaCl, 0.5 M Tris Cl for 7 min. The membranes were allowed to air dry for 1 h, treated with 0.4 N NaOH for 20 min, and washed in 5× SSPE (0.75 M NaCl, 50 mM Na2HPO4, 5 mM EDTA, pH 7.4) for 7 min. Macroarrays were air dried and stored in sealed plastic bags.
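The plate and macroarray arithmetic above can be checked with a short sketch. The clone counts come from the text; the function and constant names are illustrative only and are not part of the original protocol.

```python
# Figures quoted in the text.
CLONES_IN_LIBRARY = 101_760
WELLS_PER_PLATE = 384
CLONES_PER_MACROARRAY = 18_432


def library_layout(clones=CLONES_IN_LIBRARY,
                   wells=WELLS_PER_PLATE,
                   per_array=CLONES_PER_MACROARRAY):
    """Return plate and macroarray counts implied by the library size."""
    return {
        # Number of 384-well plates needed to archive the library.
        "plates": clones / wells,
        # Number of 384-well plates represented on one macroarray.
        "plates_per_array": per_array // wells,
        # Fractional number of macroarrays the library could fill.
        "arrays": clones / per_array,
        # Number of completely filled macroarrays.
        "complete_arrays": clones // per_array,
        # Clones left over after filling the complete macroarrays.
        "leftover_clones": clones % per_array,
    }


if __name__ == "__main__":
    for key, value in library_layout().items():
        print(f"{key}: {value}")
```

Under these figures the library fills five complete macroarrays, with the remaining clones covering roughly half of a sixth.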
Probe design and BAC library screening
Table 1. Overgo and primer sequences
C-mos overgo, forward
C-mos overgo, reverse
CR1a overgo, forward
CR1a overgo, reverse
CR1b overgo, forward
CR1b overgo, reverse
C-mos forward primer 1 (CMF1)
C-mos reverse primer 1 (CMR1)
C-mos forward primer 2 (CMF2)
C-mos reverse primer 2 (CMR2)
Subcloning and sequencing of the C. porosus C-mos gene
A BAC clone containing the C. porosus C-mos gene was digested with Bam HI and Hind III at 37°C for 1 hr followed by heating to 65°C for 10 min. The cloning vector pCRII-TOPO (Invitrogen, Carlsbad, CA) was likewise double-digested, purified by electrophoresis on a 1% w/v agarose gel, and isolated from agarose using a Qiagen (Valencia, CA) QiaQuick Gel Extraction kit. Ligation was performed at 16°C for 16 hrs. The ligation mixture was used to transform chemically competent Invitrogen (Carlsbad, CA) One Shot TOP10 cells according to the manufacturer's instructions. Subclones were plated and 40 were screened by PCR using the C-mos primers CMF1 and CMR1 (Table 1). One positive subclone, which was shown by gel electrophoresis to contain a 3.7 kb insert, was sent to SeqWright (Houston, TX) for cycle sequencing using the CMF1 and CMR1 primers and two additional primers (CMF2 and CMR2; Table 1). Use of a combination of primers was intended to extend the target area so that it would encompass the entire C-mos coding sequence and several hundred bases 5' and 3' of the coding region. Base-calling and assembly of sequence reads were performed using Phred and Phrap, respectively [21–23]. Trimming of the assembled C-mos sequence was conducted using Cross_Match. The 3576 bp product was submitted to GenBank and assigned the accession FJ011695.
Results and discussion
BAC library coverage
The exact genome size of C. porosus is unknown. However, measurements made for two other Crocodylus species (C. siamensis and C. niloticus) are both 2778 Mb. Assuming the C. porosus genome size is similar to those of these closely allied taxa, we estimate that the library affords 3.7× coverage (i.e., 10.2 Gb ÷ 2778 Mb = 3.7) of the C. porosus genome. Theoretically, this level of coverage affords 98% probability of finding any given genomic sequence at least once in the library.
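The coverage and recovery-probability figures can be reproduced with a minimal sketch. It assumes the widely used Clarke-Carbon style estimate, P = 1 - (1 - I/G)^N, where I is the mean insert size, G the genome size, and N the number of clones; the text does not state which formula was used, so this choice and all function names are assumptions.

```python
import math


def library_coverage(n_clones, insert_bp, genome_bp):
    """Genome equivalents represented by the library."""
    return n_clones * insert_bp / genome_bp


def recovery_probability(n_clones, insert_bp, genome_bp):
    """Probability that a given sequence occurs at least once in the library.

    Assumes clones sample the genome uniformly and independently.
    """
    fraction = insert_bp / genome_bp  # share of the genome in one clone
    return 1.0 - (1.0 - fraction) ** n_clones


def clones_needed(probability, insert_bp, genome_bp):
    """Number of clones required to reach a target recovery probability."""
    fraction = insert_bp / genome_bp
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - fraction))


if __name__ == "__main__":
    # Figures from the text: 101,760 clones, 102 kb inserts, 2778 Mb genome.
    n, insert, genome = 101_760, 102_000, 2_778_000_000
    print(f"coverage: {library_coverage(n, insert, genome):.1f}x")
    print(f"P(recovery): {recovery_probability(n, insert, genome):.2f}")
```

Multiplying the clone count by the mean insert size gives about 10.4 Gb, slightly above the 10.2 Gb quoted in the text; both values round to 3.7× coverage.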
Survey of CR1 elements in crocodile genome
CR1 elements are non-LTR retrotransposons existing in high copy numbers in bird and reptile genomes [26–28]; there are about 100,000 CR1 elements in the chicken genome. CR1 retrotransposons are considered excellent markers for molecular phylogenetic and population genetic studies [29, 30]. Initial studies on the sequences from the 21 BAC clones of Alligator mississippiensis available in GenBank [AC164519.3, AC154087.3, AC161341.3, AC165215.2, AC162159.2, AC155801.3, AC155802.2, AC154170.2, AC155800.2, AC155799.2, AC154169.2, AC154945.2, AC154088.2, AC149028.2, AC148923.3, AC149025.3, AC148578.2, AC149029.2, AC149026.2, AC148964.2, and AC149027.1] revealed that at least two CR1 subfamilies, referred to here as CR1a and CR1b, have recently been active in crocodilian genomes. This observation is consistent with Shedlock et al. (2007), in which the authors suggested that multiple CR1 lineages may have been active in alligators. In addition to A. mississippiensis, CR1a and CR1b have been identified in Crocodylus moreletii and Osteolaemus tetraspis (D. Ray, unpublished), and consensus sequences for conserved regions of these elements have been generated (see Additional File 1, Table S1).
Statistical analysis of the macroarray data indicates that CR1a/b elements are not randomly distributed throughout the C. porosus genome. The macroarray contains 18,432 individual clones with an average insert size of 102 kb. Consequently, a single macroarray represents roughly 0.68 genome equivalents, i.e., (18,432 × 102 kb) ÷ 2778 Mb. If the crocodile genome contains 19,754 copies of CR1a/b, then we would expect approximately 13,369 copies of CR1a/b per macroarray, and if these were distributed randomly we would expect, on average, 0.73 copies of CR1a/b per clone. However, only 8.9% of clones on the macroarray show hybridization to the CR1a/b probes. To test whether such a distribution is likely by chance, we can formulate the problem as a statistical "urn model." Suppose we have 18,432 urns, and we drop 13,369 balls into them at random. In such a case, classical statistical asymptotic theory describes the distribution of the number of occupied (or empty) urns. In this experiment we found 1,203 occupied and 17,229 empty urns. The null and alternative hypotheses are as follows:
H0: The allocation is completely random;
HA: Not H0.
Under H0, the number of empty urns is (approximately) normally distributed with mean 8,924 and standard deviation ≈ 40 (verified both theoretically and by simulation), so the observed number of empty urns (i.e., 17,229) is more than 200 standard deviations above the mean, and hence is almost impossible under H0 (the formal p-value is zero to 32 decimal places). We therefore conclude that the distribution of CR1a/b is non-random.
A project is underway that will involve sequencing some CR1 positive BAC clones from C. porosus so that we may (among other things) compare the structure and distribution of CR1 elements in the Alligator and Crocodylus genomes. The sequence data can then be used for evolutionary analyses among crocodilians, birds, and non-archosaur reptiles.
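The urn-model figures above can be recomputed with a minimal sketch. It uses the exact mean and variance of the number of empty urns under uniform random allocation, which is an assumption about how the theoretical values were derived; the original analysis may have used the asymptotic expressions from the cited theory instead. All names are illustrative.

```python
import math

# Figures quoted in the text.
N_URNS = 18_432            # clones on one macroarray
INSERT_KB = 102            # mean insert size (kb)
GENOME_KB = 2_778_000      # assumed genome size, 2778 Mb
GENOME_COPIES = 19_754     # CR1a/b copies assumed for the whole genome
OBSERVED_EMPTY = 17_229    # clones showing no hybridization


def expected_balls():
    """CR1a/b copies expected on one macroarray."""
    genome_equivalents = N_URNS * INSERT_KB / GENOME_KB
    return round(GENOME_COPIES * genome_equivalents)


def empty_urn_moments(urns, balls):
    """Exact mean and standard deviation of the number of empty urns."""
    p_one_empty = (1 - 1 / urns) ** balls   # a given urn stays empty
    p_two_empty = (1 - 2 / urns) ** balls   # two given urns both stay empty
    mean = urns * p_one_empty
    variance = urns * (urns - 1) * p_two_empty + mean - mean ** 2
    return mean, math.sqrt(variance)


def z_score(observed, mean, sd):
    """Number of standard deviations between observation and expectation."""
    return (observed - mean) / sd


if __name__ == "__main__":
    balls = expected_balls()
    mean, sd = empty_urn_moments(N_URNS, balls)
    print(f"balls: {balls}, mean empty: {mean:.0f}, sd: {sd:.1f}")
    print(f"z: {z_score(OBSERVED_EMPTY, mean, sd):.0f}")
```

The exact standard deviation comes out slightly below the rounded value of 40 quoted in the text, which makes the observed count even more extreme under H0.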
Identification of a crocodile C-mos gene containing BAC clone
A randomly selected macroarray was hybridized with the C-mos overgos (Table 1), and fortuitously one of the 18,432 double-spotted clones on the array exhibited a positive signal (Figure 3). The plate and well address of the clone was determined based upon the macroarray number, the location of the positive signal on the macroarray, and the spatial relationship between the two spots. To make sure that the correct clone was identified, a hand-held plate gridding/replicating device was used to stamp two nylon filters with the clones in the 384-well plate believed to contain the clone of interest. Blot hybridization using the C-mos overgo probes verified that the plate and well address obtained from the filter were correct (Figure 4). The duplicate filter was probed with the CR1b overgos. Of note, the C-mos positive clone shows no visible hybridization with the CR1b sequence (Figure 4). PCR with the CMF1 and CMR1 primers was used to independently verify the presence of the C-mos locus in the positive BAC clone.
The C. porosus C-mos gene
Similarity of complete coding sequences of four C-mos genes (table columns: GenBank accession or reference to database from which sequence was mined; nucleotide identities (%) with respect to C. porosus; amino acid identities (%) with respect to C. porosus)
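Percent identities of this kind are typically computed from a pairwise alignment. The sketch below shows one common convention, counting identical columns over all aligned columns in which at least one sequence has a residue. The text does not say which convention or software was used, so this is an assumption, and the two short sequences are hypothetical toy data, not C-mos sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where both sequences carry a gap are ignored; a gap opposite a
    residue counts as a mismatch (one of several conventions in use).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue  # uninformative column
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns in the alignment")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Hypothetical toy alignment, for illustration only.
    seq_a = "ATGGCC-TTA"
    seq_b = "ATGACCGTTA"
    print(f"{percent_identity(seq_a, seq_b):.1f}% identity")
```

The same function works for amino acid alignments, since it only compares characters column by column.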
Crocodilian sequence and comparative genomics
Comparative genomics research is a burgeoning field with high potential to increase our understanding of the structure, function, and evolution behind the diversity of life. However, the primary focus of most efforts over the past several years has been on comparisons among mammals. For example, Miller et al. recently created a 28-way alignment of available vertebrate genomes in which only eight taxa (Gallus, Anolis, Xenopus, and five fish) represent the entirety of non-mammalian vertebrates. Understanding the evolution and interrelationships among all amniotes will be severely hindered by this lack of diversity. Ongoing projects to sequence the green anole (Squamata) and the painted turtle (Chelonia) will help correct the disparity, but one significant lineage of the amniote tree remains to be addressed: Crocodylia. It is our hope that the generation of this library will facilitate genomics research in this critical lineage.
We have constructed a high quality 3.7× BAC library for Crocodylus porosus and demonstrated the library's utility as a genomics tool. We are currently screening the library with other genes and repeat sequences as a means of investigating the structure of the Australian saltwater crocodile genome and facilitating comparative genomics research among archosaurs. Copies of the BAC library, individual clones, and macroarrays can be obtained from the Mississippi Genome Exploration Laboratory.
We thank Sally Isberg, Lee Miles, and Travis Glenn for helping us obtain the crocodile blood and Lauren Dembeck for conducting the initial CR1 analyses of A. mississippiensis, O. tetraspis, and C. moreletii. This research was supported, in part, by USDA award ARS-58-6402-7-241 (DGP) and National Science Foundation award DBI-0421717 (DGP). DAR was supported by the Eberly College of Arts and Sciences at WVU.
This article has been published as part of BMC Genomics Volume 10 Supplement 2, 2009: Proceedings of the Avian Genomics Conference and Gene Ontology Annotation Workshop. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2164/10?issue=S2
- Janke A, Arnason U: The complete mitochondrial genome of Alligator mississippiensis and the separation between recent archosauria (birds and crocodiles). Mol Biol Evol. 1997, 14: 1266-1272.
- Kumar S, Hedges SB: A molecular timescale for vertebrate evolution. Nature. 1998, 392: 917-920.
- Dessauer HC, Glenn TC, Densmore LD: Studies on the molecular evolution of the Crocodylia: footprints in the sands of time. J Exp Zool. 2002, 294: 302-311.
- Willis RE, McAliley LR, Neeley ED, Densmore LD: Evidence for placing the false gharial (Tomistoma schlegelii) into the family Gavialidae: inferences from nuclear gene sequences. Mol Phylogenet Evol. 2007, 43: 787-794.
- Benton MJ, Donoghue PC: Paleontological evidence to date the tree of life. Mol Biol Evol. 2007, 24: 26-53.
- Iwabe N, Hara Y, Kumazawa Y, Shibamoto K, Saito Y, Miyata T, Katoh K: Sister group relationship of turtles to the bird-crocodilian clade revealed by nuclear DNA-coded proteins. Mol Biol Evol. 2005, 22: 810-813.
- International Chicken Genome Sequencing Consortium: Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature. 2004, 432: 695-716.
- Slate J, Hale MC, Birkhead TR: Simple sequence repeats in zebra finch (Taeniopygia guttata) expressed sequence tags: a new resource for evolutionary genetic studies of passerines. BMC Genomics. 2007, 8: 52.
- Shedlock AM, Botka CW, Zhao S, Shetty J, Zhang T, Liu JS, Deschavanne PJ, Edwards SV: Phylogenomics of nonavian reptiles and the structure of the ancestral amniote genome. Proc Natl Acad Sci USA. 2007, 104: 2767-2772.
- Zhang H-B, Wu C: BACs as tools for genome sequencing. Plant Physiol Biochem. 2001, 39: 195-209.
- Sundquist A, Ronaghi M, Tang H, Pevzner P, Batzoglou S: Whole-genome sequencing and assembly with high-throughput, short-read technologies. PLoS ONE. 2007, 2: e484.
- Miyake T, Amemiya CT: BAC libraries and comparative genomics of aquatic chordate species. Comp Biochem Physiol C Toxicol Pharmacol. 2004, 138: 233-244.
- Darwin Crocodile Farm. [http://www.crocfarm.com.au]
- Mississippi Genome Exploration Laboratory (MGEL). [http://www.mgel.msstate.edu]
- Peterson DG, Tomkins JP, Frisch DA, Wing RA, Paterson AH: Construction of plant bacterial artificial chromosome (BAC) libraries: An illustrated Guide. J Agric Genomics. 2000, 5: [http://wheat.pw.usda.gov/jag/]
- Chalhoub B, Belcram H, Caboche M: Efficient cloning of plant genomes into bacterial artificial chromosome (BAC) libraries with larger and more uniform insert size. Plant Biotechnol J. 2004, 2: 181-188.
- NCBI Probe Database. [http://www.ncbi.nlm.nih.gov/projects/genome/probe/doc/TechOvergo.shtml]
- McPherson JD, Marra M, Hillier L, et al: A physical map of the human genome. Nature. 2001, 409: 934-941.
- CHORI BACPAC Resources Center. [http://bacpac.chori.org/overgohyb.htm]
- Peterson DG, Schulze SR, Sciara EB, et al: Integration of Cot analysis, DNA cloning, and high-throughput sequencing facilitates genome characterization and gene discovery. Genome Res. 2002, 12: 795-807.
- Ewing B, Hillier L, Wendl MC, Green P: Base-calling of automated sequencer traces using Phred. I. Accuracy assessment. Genome Res. 1998, 8: 175-185.
- Dresser ME, Ewing DJ, Conrad MN, Dominguez AM, Barstead R, Jiang H, Kodadek T: DMC1 functions in a Saccharomyces cerevisiae meiotic pathway that is largely independent of the RAD51 pathway. Genetics. 1997, 147: 533-544.
- Phred, Phrap, Consed. [http://www.phrap.org]
- Capriglione T, Olmo E, Odierna G, Improta B, Morescalchi A: Cytofluorometric DNA base determination in vertebrate species with different genome sizes. Basic Appl Histochem. 1987, 31: 119-126.
- Plomion C, Chagné D, Pot D, et al: The Pines. Genome Mapping and Molecular Breeding in Plants, Forest Trees. Edited by: Kole CR. 2007, Heidelberg, Berlin, New York, Tokyo: Springer, 7: 29-78.
- Chen ZQ, Ritzel RG, Lin CC, Hodgetts RB: Sequence conservation in avian CR1: an interspersed repetitive DNA family evolving under functional constraints. Proc Natl Acad Sci USA. 1991, 88: 5814-5818.
- Vandergon TL, Reitman M: Evolution of chicken repeat 1 (CR1) elements: evidence for ancient subfamilies and multiple progenitors. Mol Biol Evol. 1994, 11: 886-898.
- Shedlock AM: Phylogenomic investigation of CR1 LINE diversity in reptiles. Syst Biol. 2006, 55: 902-911.
- Ray DA, Xing J, Salem AH, Batzer MA: SINEs of a nearly perfect character. Syst Biol. 2006, 55: 928-935.
- Ray DA, Walker JA, Batzer MA: Mobile element-based forensic genomics. Mutat Res. 2007, 616: 24-33.
- Roos J, Aggarwal RK, Janke A: Extended mitogenomic phylogenetic analyses yield new insight into crocodylian evolution and their survival of the Cretaceous-Tertiary boundary. Mol Phylogenet Evol. 2007, 45: 663-673.
- Kolchin VF, Sevastianov BA, Chistyakov VP, Balakrishnan AV: Random Allocations. 1978, New York: Halsted Press.
- Holst L: Limit Theorems for Some Occupancy and Sequential Occupancy Problems. Annals Mathemat Stat. 1971, 42: 1671-1680.
- Clemson University Genomics Institute (CUGI) – Filter Illustration. [http://www.genome.clemson.edu/protocols.shtml]
- Saint KM, Austin CC, Donnellan SC, Hutchinson MN: C-mos, a nuclear marker useful for squamate phylogenetic analysis. Mol Phylogenet Evol. 1998, 10: 259-263.
- Brochu CA: Progress and future directions in archosaur phylogenetics. J Paleontology. 2001, 75: 1185-1201.
- Washington University Genome Sequencing Center. [http://genome.wustl.edu]
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.