A bacterial artificial chromosome library for the Australian saltwater crocodile (Crocodylus porosus) and its utilization in gene isolation and genome characterization
© Shan et al. 2009
Published: 14 July 2009
Crocodilians (Order Crocodylia) are an ancient vertebrate group of tremendous ecological, social, and evolutionary importance. They are the only extant reptilian members of Archosauria, a monophyletic group that also includes birds, dinosaurs, and pterosaurs. Consequently, crocodilian genomes represent a gateway through which the molecular evolution of avian lineages can be explored. To facilitate comparative genomics within Crocodylia and between crocodilians and other archosaurs, we have constructed a bacterial artificial chromosome (BAC) library for the Australian saltwater crocodile, Crocodylus porosus. This is the first BAC library for a crocodile and only the second BAC resource for a crocodilian.
The C. porosus BAC library consists of 101,760 individually archived clones stored in 384-well microtiter plates. NotI digestion of random clones indicates an average insert size of 102 kb. Based on a genome size estimate of 2778 Mb, the library affords 3.7 fold (3.7×) coverage of the C. porosus genome. To investigate the utility of the library in studying sequence distribution, probes derived from CR1a and CR1b, two crocodilian CR1-like retrotransposon subfamilies, were hybridized to C. porosus macroarrays. The results indicate that there are a minimum of 20,000 CR1a/b elements in C. porosus and that their distribution throughout the genome is decidedly non-random. To demonstrate the utility of the library in gene isolation, we probed the C. porosus macroarrays with an overgo designed from a C-mos (oocyte maturation factor) partial cDNA. A BAC containing C-mos was identified and the C-mos locus was sequenced. Nucleotide and amino acid sequence alignment of the C. porosus C-mos coding sequence with avian and reptilian C-mos orthologs reveals greater sequence similarity between C. porosus and birds (specifically chicken and zebra finch) than between C. porosus and squamates (green anole).
We have demonstrated the utility of the Crocodylus porosus BAC library as a tool in genomics research. The BAC library should expedite complete genome sequencing of C. porosus and facilitate detailed analysis of genome evolution within Crocodylia and between crocodilians and diverse amniote lineages including birds, mammals, and other non-avian reptiles.
Crocodilians (Order Crocodylia) are a group of reptiles that originated roughly 200 million years ago [1, 2]. They are apex predators in the marine and freshwater habitats in which they reside, and they play a major role in warm-water ecosystems throughout the world. There are 23 extant species grouped into three families – Crocodilidae (crocodiles), Alligatoridae (alligators and caimans), and Gavialidae (gharials) [3, 4]. As evidenced by their frequent appearances in video documentaries and television programs, crocodilians are a subject of considerable human curiosity. Moreover, these reptiles have been common subjects/characters in mythology, folk tales, art (including cave paintings and hieroglyphics), and literature suggesting that they have considerable symbolic and practical significance in the lives of humans, past and present.
For roughly 15 years, bacterial artificial chromosome (BAC) libraries have been the principal molecular substrate used in physical mapping and complete eukaryote genome sequencing. Gridding of ordered BAC libraries (i.e., libraries in which each clone is stored in its own microtiter well) onto macroarrays and multiplex screening techniques have facilitated rapid gene isolation. The utility of BAC clones as substrates for end sequencing, in conjunction with advanced DNA fingerprinting techniques and macroarray analysis, has permitted construction of robust physical maps and selection of minimum tiling paths (i.e., sets of minimally overlapping BAC clones spanning entire chromosomes or chromosomal regions) for accurate genome sequencing and assembly. Recent advances in sequencing technologies (e.g., 454 pyrosequencing, Illumina sequencing, etc.) have created powerful opportunities in which ordered BAC libraries play a critical role. A particularly promising strategy for simultaneous physical mapping and sequencing of large eukaryotic genomes involves sequencing pools of sheared, individually "bar coded" BAC clones. After sequencing, those reads sharing a bar code (i.e., corresponding to the same BAC) are grouped together and assembled in silico, and physical maps are constructed by identifying overlapping assembled or partially assembled BAC sequences.
To expedite genome research in crocodilians, we have constructed a BAC library for the Australian saltwater crocodile (Crocodylus porosus). The C. porosus library is only the second large-insert DNA library for a crocodilian – a 10× library exists for Alligator mississippiensis  – and the only BAC library for Crocodilidae, the largest of the crocodilian families. C. porosus is the largest living crocodilian and, along with A. mississippiensis, the only crocodilian species to be commercially farmed. Here we describe generation and characterization of the C. porosus BAC library and demonstrate its utility as a tool for gene isolation, genome characterization, and comparative genomics.
Whole blood was obtained from Errol, a male C. porosus from the Darwin Crocodile Farm near Darwin, Australia. Blood was suspended in citrate buffer (250 mM sucrose, 40 mM trisodium citrate, pH 7.6) containing 5% v/v dimethylsulfoxide, aliquoted into 1.5 ml polypropylene tubes, flash frozen in liquid nitrogen, and shipped to the Mississippi Genome Exploration Laboratory. One of the tubes was thawed on ice and centrifuged at 4,000 rpm in a microcentrifuge for 4 min. The supernatant was decanted, the pellet was gently re-suspended in 1 ml of STEX buffer (100 mM NaCl, 100 mM Tris-HCl, 100 mM EDTA, pH 8.0), and the mixture was centrifuged as described above. The blood cell pellet was re-suspended in 500 μl STEX buffer and placed in a water bath at 45°C. After 15 min, the blood suspension was mixed with an equal volume of 45°C 2% w/v Cambrex (Rockland, ME) SeaPlaque Agarose (cat. no. 50100) in STEX buffer. The mixture was poured into a small Petri dish so that the depth of the solution was roughly 2 mm. After 20 min at 4°C, the resulting gel was cut into 10 × 5 mm rectangles, and these "plugs" were transferred into a conical 50 ml polypropylene tube containing 40 ml of lysis buffer (STEX buffer containing 1% w/v N-lauroylsarcosine and 300 mg/ml proteinase K). The capped tube was incubated at 37°C overnight with gentle agitation. Plugs were transferred into 0.5 M EDTA (pH 8.0) containing 0.1 M phenylmethylsulfonyl fluoride and incubated at 4°C for one hour. Plugs were washed in 0.5 M EDTA (pH 8.0) and then stored at 4°C in this buffer.
A few test DNA plugs were exposed to different HindIII concentrations to determine the conditions that yielded the largest number of fragments between 100 and 500 kb. The optimal enzyme concentration determined in the test digests was used in a large-scale partial digest. Plugs used in the mass digestion were macerated and placed in a slot well of a 1% w/v Cambrex SeaKem Gold Agarose (cat. no. 50150) gel in 0.25 × TBE buffer (22.5 mM Tris, 22.5 mM boric acid, 0.5 mM EDTA, pH 8.0). Size selection of partially digested DNA was performed using pulsed-field gel electrophoresis (PFGE) according to Chalhoub et al. Size-selected HindIII fragments between 100 and 500 kb were recovered from agarose by electroelution according to Peterson et al.
BAC library construction was performed as described in Peterson et al.  using the pIndigoBAC-5 vector (Epicentre, Madison WI) and ElectroMAX DH10B T1 Phage-Resistant Competent Cells (Invitrogen, Carlsbad, CA). Clone picking and library replication were performed using a Genetix QPixII robot (New Milton, Hampshire, UK). To monitor the quality of the BAC library and determine mean insert size, 96 BAC clones from every fiftieth 384-well plate were evaluated by NotI digestion and PFGE. For these analyses BAC DNA was isolated using an AutoGenprep 960 robot (AutoGen, Holliston, MA).
High density macroarrays were prepared using a Genetix QPixII robot. Each array consisted of 18,432 double-spotted BAC clones stamped onto a 22.5 cm2 Hybond N+ filter (GE Healthcare, Piscataway, NJ). There were enough C. porosus BAC clones to produce five complete macroarrays (101,760 clones ÷ 18,432 clones/macroarray = 5.52). Stamped arrays were placed clone-side up on LB (Luria-Bertani) agar containing 12.5 mg/L chloramphenicol and incubated at 37°C overnight. Each macroarray was fixed via incubation in 0.5 N NaOH, 1.5 M NaCl for 7 min followed by incubation in 1.5 M NaCl, 0.5 M Tris Cl for 7 min. The membranes were allowed to air dry for 1 h, treated with 0.4 N NaOH for 20 min, and washed in 5× SSPE (0.75 M NaCl, 50 mM Na2HPO4, 5 mM EDTA, pH 7.4) for 7 min. Macroarrays were air dried and stored in sealed plastic bags.
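The macroarray arithmetic above can be reproduced with a short sketch. The clone counts come from the text; the function name, the dictionary keys, and the assumption that each array is filled from whole 384-well plates are illustrative only and not part of the original methods.

```python
# Illustrative arithmetic only; the three constants are taken from the text.
TOTAL_CLONES = 101_760      # clones in the C. porosus library
CLONES_PER_ARRAY = 18_432   # double-spotted clones per macroarray
WELLS_PER_PLATE = 384       # wells per microtiter plate


def macroarray_layout(total_clones, clones_per_array, wells_per_plate):
    """Summarize how a clone collection divides into macroarrays.

    Hypothetical helper: assumes arrays are filled from whole plates.
    """
    complete_arrays, leftover_clones = divmod(total_clones, clones_per_array)
    return {
        "plates_in_library": total_clones // wells_per_plate,
        "plates_per_array": clones_per_array // wells_per_plate,
        "complete_arrays": complete_arrays,
        "leftover_clones": leftover_clones,
        "arrays_fractional": total_clones / clones_per_array,
    }


layout = macroarray_layout(TOTAL_CLONES, CLONES_PER_ARRAY, WELLS_PER_PLATE)
for key, value in layout.items():
    print(f"{key}: {value}")
```

Under these assumptions the fractional value (about 5.52) matches the figure quoted in the text, and the remainder corresponds to clones that do not fill a sixth complete array.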
Table 1. Overgo and primer sequences
C-mos overgo, forward
C-mos overgo, reverse
CR1a overgo, forward
CR1a overgo, reverse
CR1b overgo, forward
CR1b overgo, reverse
C-mos forward primer 1
C-mos reverse primer 1
C-mos forward primer 2
C-mos reverse primer 2
A BAC clone containing the C. porosus C-mos gene was digested with BamHI and HindIII at 37°C for 1 hr followed by heating to 65°C for 10 min. The cloning vector pCRII-TOPO (Invitrogen, Carlsbad, CA) was likewise double-digested, purified by electrophoresis on a 1% w/v agarose gel, and isolated from agarose using a Qiagen (Valencia, CA) QiaQuick Gel Extraction kit. Ligation was performed at 16°C for 16 hrs. The ligation mixture was used to transform chemically competent Invitrogen (Carlsbad, CA) One Shot TOP10 cells according to the manufacturer's instructions. Subclones were plated and 40 were screened by PCR using the C-mos primers CMF1 and CMR1 (Table 1). One positive subclone, which was shown by gel electrophoresis to contain a 3.7 kb insert, was sent to SeqWright (Houston, TX) for cycle sequencing using the CMF1 and CMR1 primers and two additional primers (CMF2 and CMR2 – Table 1). Use of a combination of primers was intended to extend the target area so that it would encompass the entire C-mos coding sequence and several hundred bases 5' and 3' of the coding region. Base-calling and assembly of sequence reads were performed using Phred and Phrap, respectively [21–23]. Trimming of the assembled C-mos sequence was conducted using Cross_Match . The 3576 bp product was submitted to GenBank and assigned the accession FJ011695.
The exact genome size of C. porosus is unknown. However, measurements made for two other Crocodylus species (C. siamensis and C. niloticus) are both 2778 Mb. Assuming the C. porosus genome size is similar to those of these closely allied taxa, we estimate that the library affords 3.7× coverage (i.e., 10.2 Gb ÷ 2778 Mb = 3.7) of the C. porosus genome. Theoretically, this level of coverage affords 98% probability of finding any given genomic sequence at least once in the library.
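The coverage and probability figures can be checked with a minimal sketch. It assumes the commonly used random-sampling model for clone libraries, P = 1 − (1 − I/G)^N, where I is the mean insert size, G the genome size, and N the number of clones; the source does not state which formula was used, and the function names below are hypothetical.

```python
import math

GENOME_BP = 2_778_000_000   # genome size estimate from the text (2778 Mb)
INSERT_BP = 102_000         # mean insert size from the text (102 kb)
N_CLONES = 101_760          # clones in the library
CLONED_BP = 10_200_000_000  # total cloned DNA quoted in the text (10.2 Gb)


def fold_coverage(cloned_bp, genome_bp):
    """Genome equivalents represented by the cloned DNA."""
    return cloned_bp / genome_bp


def prob_sequence_present(n_clones, insert_bp, genome_bp):
    """Probability that a given sequence occurs in at least one clone,
    assuming clones sample the genome uniformly at random."""
    return 1.0 - (1.0 - insert_bp / genome_bp) ** n_clones


def clones_needed(target_prob, insert_bp, genome_bp):
    """Number of clones required to reach a target probability
    under the same random-sampling assumption."""
    return math.ceil(math.log(1.0 - target_prob)
                     / math.log(1.0 - insert_bp / genome_bp))


coverage = fold_coverage(CLONED_BP, GENOME_BP)
probability = prob_sequence_present(N_CLONES, INSERT_BP, GENOME_BP)
print(f"coverage: {coverage:.1f}x")
print(f"P(sequence present at least once): {probability:.2f}")
print(f"clones for 99%: {clones_needed(0.99, INSERT_BP, GENOME_BP)}")
```

With the values from the text, the sketch gives roughly 3.7× coverage and a probability that rounds to 98%, in line with the estimates above.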
CR1 elements are non-LTR retrotransposons existing in high copy numbers in bird and reptile genomes [26–28]; there are about 100,000 CR1 elements in the chicken genome. CR1 retrotransposons are considered excellent markers for molecular phylogenetic and population genetic studies [29, 30]. Initial studies on the sequences from the 21 BAC clones of Alligator mississippiensis available in GenBank [AC164519.3, AC154087.3, AC161341.3, AC165215.2, AC162159.2, AC155801.3, AC155802.2, AC154170.2, AC155800.2, AC155799.2, AC154169.2, AC154945.2, AC154088.2, AC149028.2, AC148923.3, AC149025.3, AC148578.2, AC149029.2, AC149026.2, AC148964.2, and AC149027.1] revealed that at least two CR1 subfamilies, referred to here as CR1a and CR1b, have recently been active in crocodilian genomes. This observation is consistent with Shedlock et al. (2007), in which the authors suggested that multiple CR1 lineages may have been active in alligators. In addition to A. mississippiensis, CR1a and CR1b have been identified in Crocodylus moreletii and Osteolaemus tetraspis (D. Ray, unpublished), and consensus sequences for conserved regions of these elements have been generated (see Additional file 1, Table S1).
Statistical analysis of the macroarray data indicates that CR1a/b elements are not randomly distributed throughout the C. porosus genome. The macroarray contains 18,432 individual clones with an average insert size of 102 kb. Consequently, a single macroarray represents roughly 0.68 genome equivalents, i.e., (18,432 × 102 kb) ÷ 2778 Mb. If the crocodile genome contains 19,754 copies of CR1a/b, then we would expect approximately 13,369 copies of CR1a/b per macroarray, and if these were distributed randomly we would expect, on average, 0.73 copies of CR1a/b per clone. However, only 8.9% of clones on the macroarray show hybridization to the CR1a/b probes. To test whether such a distribution is likely by chance, we can formulate the problem as a statistical "urn model." Suppose we have 18,432 urns, and we drop 13,369 balls into them at random. In such a case, classical statistical asymptotic theory describes the distribution of the number of occupied (or empty) urns. In this experiment we found 1,203 occupied and 17,229 empty urns. The null and alternative hypotheses are as follows:
H0: The allocation is completely random;
HA: Not H0.
Under H0, the number of empty urns is (approximately) normally distributed with mean 8,924 and standard deviation ≈ 40 (verified both theoretically and by simulation), so the observed number of empty urns (i.e., 17,229) is more than 200 standard deviations above the mean, and hence is almost impossible under H0 (the formal p-value is zero to 32 decimal places). We therefore conclude that the distribution of CR1a/b is non-random.
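The urn-model figures can be reproduced with a short sketch. It uses the standard exact expressions for the mean and variance of the number of empty urns when m balls are dropped uniformly at random into n urns, together with a normal approximation for the tail probability. The function names are hypothetical, and this is not necessarily the procedure the authors followed.

```python
import math

N_URNS = 18_432          # clones on one macroarray
N_BALLS = 13_369         # expected CR1a/b copies per macroarray
OBSERVED_EMPTY = 17_229  # clones showing no hybridization


def empty_urn_mean_sd(n_urns, n_balls):
    """Exact mean and standard deviation of the number of empty urns
    when n_balls are dropped independently and uniformly into n_urns."""
    p_one_empty = (1.0 - 1.0 / n_urns) ** n_balls   # a given urn is empty
    p_two_empty = (1.0 - 2.0 / n_urns) ** n_balls   # two given urns are empty
    mean = n_urns * p_one_empty
    variance = (n_urns * p_one_empty
                + n_urns * (n_urns - 1) * p_two_empty
                - mean ** 2)
    return mean, math.sqrt(variance)


def upper_tail_normal(z):
    """One-sided p-value under a standard normal approximation."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


mean_empty, sd_empty = empty_urn_mean_sd(N_URNS, N_BALLS)
z_score = (OBSERVED_EMPTY - mean_empty) / sd_empty
p_value = upper_tail_normal(z_score)

print(f"expected empty urns: {mean_empty:.0f}")
print(f"standard deviation:  {sd_empty:.1f}")
print(f"z-score:             {z_score:.0f}")
print(f"p-value (normal approximation): {p_value}")
```

Under these assumptions the expected number of empty urns is close to 8,924, the standard deviation is a little under 40, and the observed count lies more than 200 standard deviations above the mean, consistent with the values reported above. The p-value underflows to zero in double precision.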
A project is underway to sequence some CR1-positive BAC clones from C. porosus so that we may, among other things, compare the structure and distribution of CR1 elements in the Alligator and Crocodylus genomes. The sequence data can then be used for evolutionary analyses among crocodilians, birds, and non-archosaur reptiles.
A randomly selected macroarray was hybridized with the C-mos overgos (Table 1), and fortuitously one of the 18,432 double-spotted clones on the array exhibited a positive signal (Figure 3). The plate and well address of the clone was determined based upon the macroarray number, the location of the positive signal on the macroarray, and the spatial relationship between the two spots. To make sure that the correct clone was identified, a hand-held plate gridding/replicating device was used to stamp two nylon filters with the clones in the 384-well plate believed to contain the clone of interest. Blot hybridization using the C-mos overgo probes verified that the plate and well address obtained from the filter were correct (Figure 4). The duplicate filter was probed with the CR1b overgos. Of note, the C-mos positive clone shows no visible hybridization with the CR1b sequence (Figure 4). PCR with the CMF1 and CMR1 primers was used to independently verify the presence of the C-mos locus in the positive BAC clone.
Similarity of complete coding sequences of four C-mos genes
Columns: GenBank accession or source database | Nucleotide identity with C. porosus (%) | Amino acid identity with C. porosus (%)
Comparative genomics research is a burgeoning field with high potential to increase our understanding of the structure, function, and evolution behind the diversity of life. However, the primary focus of most efforts over the past several years has been on comparisons among mammals. For example, Miller et al. recently created a 28-way alignment of available vertebrate genomes in which only eight taxa (Gallus, Anolis, Xenopus, and five fish) represent the entirety of non-mammalian vertebrates. Understanding the evolution and interrelationships among all amniotes will be severely hindered by this lack of diversity. Ongoing projects to sequence the green anole (Squamata) and the painted turtle (Chelonia) will help correct the disparity, but one significant lineage of the amniote tree remains to be addressed – Crocodylia. It is our hope that the generation of this library will facilitate genomics research in this critical lineage.
We have constructed a high quality 3.7× BAC library for Crocodylus porosus and demonstrated the library's utility as a genomics tool. We are currently screening the library with other genes and repeat sequences as a means of investigating the structure of the Australian saltwater crocodile genome and facilitating comparative genomics research among archosaurs. Copies of the BAC library, individual clones, and macroarrays can be obtained from the Mississippi Genome Exploration Laboratory.
We thank Sally Isberg, Lee Miles, and Travis Glenn for helping us obtain the crocodile blood and Lauren Dembeck for conducting the initial CR1 analyses of A. mississippiensis, O. tetraspis, and C. moreletii. This research was supported, in part, by USDA award ARS-58-6402-7-241 (DGP) and National Science Foundation award DBI-0421717 (DGP). DAR was supported by the Eberly College of Arts and Sciences at WVU.
This article has been published as part of BMC Genomics Volume 10 Supplement 2, 2009: Proceedings of the Avian Genomics Conference and Gene Ontology Annotation Workshop. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2164/10?issue=S2
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.