
A clone-free, single molecule map of the domestic cow (Bos taurus) genome

Abstract

Background

The cattle (Bos taurus) genome was originally selected for sequencing due to its economic importance and unique biology as a model organism for understanding other ruminants, or mammals. Currently, there are two cattle genome sequence assemblies (UMD3.1 and Btau4.6) from groups using dissimilar assembly algorithms, which were complemented by genetic and physical map resources. However, past comparisons between these assemblies revealed substantial differences. Consequently, such discordances have engendered ambiguities when using reference sequence data, impacting genomic studies in cattle and motivating construction of a new optical map resource--BtOM1.0--to guide comparisons and improvements to the current sequence builds. Accordingly, our comprehensive comparisons of BtOM1.0 against the UMD3.1 and Btau4.6 sequence builds tabulate large-to-immediate scale discordances requiring mediation.

Results

The optical map, BtOM1.0, spanning the B. taurus genome (Hereford breed, L1 Dominette 01449) was assembled from an optical map dataset consisting of 2,973,315 (439 X; raw dataset size before assembly) single molecule optical maps (Rmaps; 1 Rmap = 1 restriction mapped DNA molecule) generated by the Optical Mapping System. The BamHI map spans 2,575.30 Mb and comprises 78 optical contigs assembled by a combination of iterative (using the reference sequence: UMD3.1) and de novo assembly techniques. BtOM1.0 is a high-resolution physical map featuring an average restriction fragment size of 8.91 Kb. Comparisons of BtOM1.0 vs. UMD3.1, or Btau4.6, revealed that Btau4.6 presented far more discordances (7,463) than UMD3.1 (4,754). Overall, we found that Btau4.6 presented almost double the number of discordances as UMD3.1 across most of the 6 categories of sequence vs. map discrepancies, which are: COMPLEX (misassembly), DELs (extraneous sequences), INSs (missing sequences), ITs (Inverted/Translocated sequences), ECs (extra restriction cuts) and MCs (missing restriction cuts).

Conclusion

Alignments of UMD3.1 and Btau4.6 to BtOM1.0 reveal discordances commensurate with previous reports, and affirm the NCBI’s current designation of UMD3.1 sequence assembly as the “reference assembly” and the Btau4.6 as the “alternate assembly.” The cattle genome optical map, BtOM1.0, when used as a comprehensive and largely independent guide, will greatly assist improvements to existing sequence builds, and later serve as an accurate physical scaffold for studies concerning the comparative genomics of cattle breeds.

Background

Cattle are the most common type of large domesticated animal and have consequently played an important role in the recent history of humankind since their domestication 8,000 to 10,000 years ago [1]. Cattle have enhanced human civilizations through their varied uses as livestock for meat, milk, and draft power. Accordingly, there are ~1.3 billion cattle in the world today providing a significant source of nutrition and livelihood to the human population. Domestic cattle comprise more than 800 breeds and are grouped taxonomically into two species, Bos taurus (taurine) and B. indicus (indicine), which evolved from the ancestral species B. primigenius. Given this large and venerable resource of cattle breeds, cattle research efforts have also greatly contributed to our knowledge of genetics, endocrine function, fertilization, growth, lactation and mammalian biology. Even so, there are still many unsolved questions regarding cattle adaptation to diverse terrestrial environments since domestication that center on how cattle convert low-grade forage to energy-rich fat, milk and meat, and, more fundamentally, how genetic underpinnings define economically important traits. The cattle genome was originally selected for sequencing due to its unique biology and economic importance, virtues that are also strengthened by its role as a model organism for understanding other ruminants, or mammals.

The Bovine Genome Sequencing and Analysis Consortium published the first draft sequence for the Bos taurus genome in 2009--a sizable effort costing $53 million and involving nearly 300 investigators from 25 countries [2, 3]. The initial sequence assembly (Btau4.0) was constructed by the Baylor College of Medicine Human Genome Sequencing Center using ~7.1-fold Sanger sequencing coverage of the genome. Their genome assembly approach combined a BAC (Bacterial Artificial Chromosome) clone-by-clone approach with whole genome shotgun (WGS) reads, and yielded an N50 contig size of 48.7 Kb and an N50 scaffold size of 1.9 Mb (Btau4.0; 135,743 contigs; 13,388 scaffolds; total mass: 2.77 Gb). 89 % of these assembled contigs and scaffolds were anchored onto the 29 bovine autosomes and the X chromosome based on the integrated FPC physical map [4], which combined a series of complementary mapping resources: 290,797 fingerprinted BACs, the human-cattle comparative map, the genetic map, and the radiation hybrid (RH) map [2–10]. The Center for Bioinformatics and Computational Biology, University of Maryland, using a different strategy, constructed another bovine assembly in 2009 based on the same raw sequence and map data (UMD2; 44,433 contigs; total mass: 2.86 Gb; contig N50: 93.56 Kb). Their strategy leveraged paired-end BAC sequence information, mapping data and, most notably, syntenic relationships to the human genome that allowed 91 % of the UMD2 contigs to be anchored to bovine chromosomes, based on the integrated bovine genome map [4, 11]. Comparisons between these two assemblies revealed substantial differences that appear as assembly errors, genome segmental inversions, chromosomal placements, sequence gap numbers, and discrepancies of the sequence coverage across the bovine genome [11–13].
Two updated bovine genome sequence assemblies (Btau4.6 and UMD3.1) were released from these groups featuring additional BAC sequence data, corrected assembly errors and additional gap filling. Although comprehensive analyses of these recent releases have yet to be done, significant differences between these updated assemblies are generally expected to be encountered. Indeed, this article reports on notable disparities. Consequently, discrepancies between these assemblies engender ambiguities when using reference sequence data, which significantly impacts almost any type of genomic study in cattle.

The cattle genome, as discussed, enjoys a broad range of map resources that include: genetic linkage maps using microsatellite markers; markers comprising AFLP, EST, and BAC end sequences; a radiation hybrid map, and a BAC physical map [4, 5, 7, 9, 14–17]. Despite this, these resources fall a bit short in several ways. The genetic linkage and radiation hybrid maps lack sufficient levels of unambiguous markers, but, more troubling, the linkage map is a composite constructed across many separate bovine populations and thus doesn’t reflect a single bovine genome. The bovine BAC physical map is a composite map that was constructed from three different BAC libraries developed from three different cattle breeds (Hereford CHORI-240, Holstein RPCI-42, and Angus TAMBT) [4, 7]. Understandably, such haplotype and/or breed-specific variability in these map resources could translate into ambiguities evidenced by sequence-map comparisons, which may have impacted the fine-scale assembly, or previous validations of the bovine reference sequence.

We constructed a comprehensive optical map spanning the bovine genome, using genomic DNA from just one animal (L1 Dominette 01449; the same Hereford animal that was originally sequenced) in order to circumvent this array of issues. This new resource will provide the bovine community with a highly accurate and comprehensive physical map that enables direct and independent comparisons amongst sequence builds, with goals pointed at sequence finishing and discovery of genomic differences. Briefly, Optical Mapping is a single-molecule system that constructs high-resolution physical scaffolds, covering entire genomes to guide many stages of genome sequence assembly and validation [18–24]. Since it assembles genome-wide ordered restriction maps from massive datasets comprising randomly sheared genomic DNA molecules (~400 kb), artefacts associated with cloning and amplification are completely obviated. Furthermore, very long DNA molecules span complex genomic regions that are rife with repeats that generally hinder accurate sequence assembly without Optical Mapping analysis. As such, our optical map offers a uniquely effective means for resolving and mediating the differences between the two different bovine genome sequence assemblies in several ways: 1) recruiting new orphan sequence contigs that fill sequence gaps; 2) providing an independent resource that potentiates finishing through sequence gap characterization, and 3) enabling independent validations of sequence assemblies.

Results

Optical map dataset

Genomic DNA was prepared from L1 Dominette 01449 blood samples, after separation of white blood cells, and then BamHI restriction mapped using our Optical Mapping pipeline (Materials and Methods). This raw dataset holds 1,908,396 Rmaps (1 Rmap = 1 single molecule restriction map) ≥ 300 Kb, with an average size of 397.49 Kb (300–2,515.20 Kb) and a total mass of 758,574.97 Mb (~270 X coverage, before alignment, assuming a ~2.8 Gb genome). One Rmap is the restriction map of a single genomic DNA molecule; it represents the most fundamental unit of map data in functional ways akin to a sequence read. A second map dataset was contributed by Prof. Juan F. Medrano; after size filtering (≥ 300 Kb) it added another 1,064,919 Rmaps, bringing the total raw dataset up to 2,973,315 Rmaps (439 X coverage, before alignment).
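The coverage figures quoted above follow from simple arithmetic: total Rmap mass divided by an assumed genome size. The short sketch below reproduces that calculation from the numbers reported in this section; the function name `raw_coverage` is ours, introduced only for illustration, and the 2.8 Gb genome size is the assumption stated in the text.

```python
# Illustrative arithmetic only: raw (pre-alignment) coverage is the total
# Rmap mass divided by the assumed genome size.

def raw_coverage(total_mass_mb: float, genome_size_mb: float) -> float:
    """Return fold coverage given total Rmap mass and genome size (both in Mb)."""
    return total_mass_mb / genome_size_mb

first_dataset_mass_mb = 758_574.97  # total mass reported for the first dataset
assumed_genome_mb = 2_800.0         # ~2.8 Gb genome size assumed in the text

coverage = raw_coverage(first_dataset_mass_mb, assumed_genome_mb)
print(f"first dataset: ~{coverage:.0f} X")  # close to the ~270 X quoted

# The combined dataset is the sum of the two size-filtered Rmap sets.
rmaps_first = 1_908_396
rmaps_second = 1_064_919
total_rmaps = rmaps_first + rmaps_second
print(total_rmaps)
```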

Initial evaluation of the genome builds UMD3.1 and Btau4.6 via pairwise alignments of the Rmap dataset

The UMD3.1 and Btau4.6 references were first evaluated for large-scale errors by inspection of the pairwise alignments [25, 26] of the entire Rmap dataset against BamHI in silico restriction maps (constructed in the computer) of both sequence builds (Materials and Methods). These map vs. reference alignments produce files, similar to sequence SAM/BAM files, which note the location of each aligned Rmap (Additional file 1: Figure S1 and Additional file 2: Figure S2). Such alignments allow us to quickly filter out marginal Rmaps from the raw dataset and provide an initial assessment of the completeness of a given sequence build [27]. The average Rmap coverage after alignment varies considerably between the two builds, with 42 X for Btau4.6, while UMD3.1 boasts 70 X. Additional file 1: Figure S1 and Additional file 2: Figure S2 also show a specific example of disparate rates of Rmap coverage, focusing on a 3.3 Mb region on chromosome 8, highlighted by a green box, where 8 Rmaps (~1 X coverage) are aligned to Btau4.6, compared to 527 Rmaps (~64 X) aligned to UMD3.1. Given these vastly different overall alignment rates and patterns, we chose UMD3.1 to serve as our reference sequence build for assembling the optical map.
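To make the notion of an in silico restriction map concrete, the following minimal sketch digests a sequence string at BamHI sites (recognition sequence GGATCC, cut after the first G) and returns the ordered fragment sizes. It is a toy illustration, not the pipeline used here: the function name is hypothetical, and details such as handling of sequence gaps (runs of N) or minimum resolvable fragment size are ignored.

```python
import re

BAMHI_SITE = "GGATCC"  # BamHI recognition sequence
CUT_OFFSET = 1         # BamHI cuts between the first G and GATCC (G^GATCC)

def in_silico_digest(sequence: str, site: str = BAMHI_SITE,
                     cut_offset: int = CUT_OFFSET) -> list[int]:
    """Return ordered fragment lengths (bp) from cutting `sequence` at `site`."""
    seq = sequence.upper()
    # A lookahead pattern reports every occurrence of the site, even if
    # occurrences of some other enzyme's site were to overlap.
    cuts = [m.start() + cut_offset
            for m in re.finditer(f"(?={re.escape(site)})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]

# Toy sequence with two BamHI sites.
toy = "AAAAGGATCCTTTTTTGGATCCAA"
fragments = in_silico_digest(toy)
print(fragments)
```

The ordered list of fragment sizes is the in silico map; an Rmap is the corresponding list measured from a single DNA molecule, which is what makes the two directly comparable by alignment.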

Optical map assembly

Our optical map assembly strategy used a two-pronged approach involving iterative assembly, requiring a sequence reference [26], and de novo assembly for dealing with large-scale discordances (sequence vs. map) and gaps in the UMD3.1 build (Fig. 1). Many of these problematic regions are sparsely populated by Rmaps as evidenced from inspection of Additional file 1: Figure S1 (see regions highlighted by purple boxes). Accordingly, the workflow (Fig. 1) shows how iterative assembly selectively shunts uncontiged Rmaps, mostly originating from these problematic regions, into “Germinate and Grow” (G & G) for de novo assembly. Resulting optical map contigs from both sides of the workflow were then curated and combined for finishing the optical map. Details follow in the next two subsections.

Fig. 1

Strategy and workflow used for assembling the optical map-- BtOM1.0. Left side describes reference-based iterative map assembly: exhaustive pairwise alignments of the complete Rmap dataset (purple lettering) against the UMD3.1 sequence reference maps (red lettering) generated in silico produce “piles” of Rmaps. Such alignments are then divided into overlapping bins (1 Mb bins; 500 Kb overlap), which are then independently assembled into updated reference maps (optical contigs bearing consensus maps) that are used for 8 subsequent cycles (blue circular arrows) of alignment (entire Rmap dataset) and assembly, all performed without using the sequence reference. Right side depicts de novo map assembly, using those Rmaps from the entire dataset not recruited for optical map contig formation during iterative map assembly, which are used to construct a de Bruijn graph via k-mer hashing. “Seed” maps are then assembled from Rmaps in each confident node within the graph and used as an optical reference for pairwise alignments (piles) of Rmaps during iterative assembly (8 times; blue circular arrows). Bottom shows the merged assembly of updated optical contigs from each bin (iterative assembly) and contigs assembled from “seed” maps (de novo assembly) into the final optical map-- BtOM1.0, which was used to tabulate map vs. sequence discordances

Iterative assembly

We published a workflow in 2010 [26], termed “iterative assembly” (Fig. 1), which embedded genome assembly algorithms [28–31], originally designed to deal with small bacterial or fungal genomes, within a new pipeline. This pipeline supports the assembly and analysis of large mammalian and plant optical maps by distributing the computation into large numbers of independent jobs that can be executed on a high-throughput computing network. Briefly, iterative assembly uses an in silico restriction map of available genome sequence resources--contigs, scaffolds, pseudomolecules, etc.--as a reference for exhaustive pairwise alignment [25] of entire Rmap datasets. Both sequence data (UMD3.1) and actual genomic DNA molecules (Rmaps) are “cut” with the same restriction enzyme. Rmaps placed in this way, termed “piles,” covering an entire genome, are then divided into 1 Mb overlapping bins along each chromosome for assembly; each bin is independently assembled into contigs. Each optical contig bears a consensus map, which now becomes the updated, independent reference; sequence information is no longer used in the assembly process. Repeatedly iterating this workflow increases optical contig length, number and depth.
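The binning step can be sketched in a few lines. The example below tiles a chromosome with 1 Mb windows that overlap by 500 Kb (the values given in the Fig. 1 caption) so that each window can be handed to an independent assembly job. The function name and the handling of the final, shorter window are assumptions made for illustration, not details of the published pipeline.

```python
def overlapping_bins(chrom_length_bp: int,
                     bin_size_bp: int = 1_000_000,
                     overlap_bp: int = 500_000) -> list[tuple[int, int]]:
    """Tile [0, chrom_length_bp) with half-open bins that overlap their neighbors."""
    if chrom_length_bp <= 0:
        raise ValueError("chromosome length must be positive")
    if not 0 <= overlap_bp < bin_size_bp:
        raise ValueError("overlap must be non-negative and smaller than the bin size")
    step = bin_size_bp - overlap_bp
    bins = []
    start = 0
    while True:
        end = min(start + bin_size_bp, chrom_length_bp)
        bins.append((start, end))
        if end >= chrom_length_bp:  # last bin reaches the chromosome end
            break
        start += step
    return bins

# A hypothetical 2.6 Mb chromosome yields five bins, the last one truncated.
bins = overlapping_bins(2_600_000)
for start, end in bins:
    print(f"{start:>9,} - {end:>9,}")
```

Because adjacent bins share half their span, a contig assembled in one bin overlaps the contig from the next bin, which is what allows per-bin results to be merged later.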

Eight iteration cycles were performed using a BamHI in silico map constructed from the UMD3.1 sequence build as the initial reference and with a minimum depth of 20 Rmaps. The first iteration produced 3,048 contigs ranging from 404 to 2,943 Kb in size and averaging 1,826 Kb. After 8 iterations, the number of contigs increased to 3,321, and their average span was boosted to 3,545 Kb (421–6,456 Kb). Contigs presenting very long tandem repeats were removed from this process. These 3,321 optical contigs were then grouped by chromosome, using alignments to UMD3.1, and each grouping was independently assembled into a total of just 79 contigs spanning 96.71 % of the UMD3.1 build.

de novo assembly

We have previously reported on the Map Assembler, a de novo optical map assembler capable of assembling bacterial maps [32]. However, the Map Assembler algorithm has polynomial complexity (degree >2) and exceeds feasible memory and time constraints for genomes of size >10 Mb. To face the challenge of assembling larger genomes, we’ve implemented Germinate and Grow (G & G), a new de novo assembly algorithm that will be fully described elsewhere. The conceptual basis for G & G is an extension of the de Bruijn graph approach to sequence assembly [33, 34]. Simply put, a whole genome optical map can be represented by the traversals of a certain graph, and the assembly problem is to discover those traversals from the input data set of Rmaps. Specifically, we use geometric k-mer hashing [35] to identify nodes in the de Bruijn graph that are very likely error-free and then traverse the “read” paths implied by the Rmaps containing instances of those nodes. This traversal allows us to localize the assembly; we then use the Map Assembler on the subsets of Rmaps that are near each other on this graph. We call the resulting consensus maps seed maps. The seed maps typically cover most of the genome and they reliably approximate highly confident paths in the graph.
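As a rough intuition for how k-mer hashing can pick out well-supported stretches of restriction map, the sketch below quantizes fragment sizes into coarse bins and indexes every run of k consecutive fragments by the Rmaps that contain it. This is a deliberately simplified stand-in, not the G & G algorithm or the geometric hashing scheme of [35]: the bin width, the value of k, the support threshold, and all function names are assumptions chosen for illustration, and real data would also require handling of sizing error near bin boundaries and of missing or extra cuts.

```python
from collections import defaultdict

def kmer_index(rmaps: dict[str, list[float]], k: int = 3,
               bin_kb: float = 2.0) -> dict[tuple[int, ...], set[str]]:
    """Map each quantized k-mer of fragment sizes to the Rmap ids containing it."""
    index: dict[tuple[int, ...], set[str]] = defaultdict(set)
    for rmap_id, fragments_kb in rmaps.items():
        # Coarse quantization absorbs small sizing errors (assumed bin width).
        codes = [int(size // bin_kb) for size in fragments_kb]
        for i in range(len(codes) - k + 1):
            index[tuple(codes[i:i + k])].add(rmap_id)
    return dict(index)

def confident_kmers(index, min_support: int = 2):
    """Keep k-mers seen in at least `min_support` distinct Rmaps."""
    return {kmer: ids for kmer, ids in index.items() if len(ids) >= min_support}

# Hypothetical Rmaps (fragment sizes in Kb); r1 and r2 overlap on three fragments.
toy_rmaps = {
    "r1": [8.9, 12.4, 5.1, 20.3],
    "r2": [12.1, 5.6, 20.9, 7.7],
    "r3": [3.0, 30.0, 9.5],
}
shared = confident_kmers(kmer_index(toy_rmaps))
print(shared)
```

In this toy case only the k-mer shared by r1 and r2 survives the support filter, which mirrors the idea of selecting graph nodes that are likely error-free before assembling the Rmaps that pass through them.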

The seed maps are then extended and refined using the iterative assembly engine (Fig. 1), producing another set of consensus maps. The error rate for these consensus maps is sufficiently low for resolving the corresponding Euler path and assembling all but the most repetitive regions of the genome. We then fill gaps in the assembly by repeating the process, generating another set of seed maps and extending and refining them. For this set of seed maps, we use a lower stringency (smaller value of k) and use only those Rmaps not already represented in the genome reference-based iterative assembly.

We used G & G to assemble just those Rmaps (2,448,748) that escaped assembly within the iterative assembly pipeline, which yielded 1,500 optical contigs, with most recapitulating those constructed by iterative assembly. As such, these de novo optical maps were largely used to augment and cross-validate optical map assemblies constructed by iterative assembly. The final bovine optical map, termed “BtOM1.0,” comprises 78 contigs spanning 2,575.30 Mb across the genome (alignments to UMD3.1 are found in Fig. 4), at an average depth of 77 Rmaps and an average contig size of 33.02 Mb (659.71 Kb–140.22 Mb; Table 1).

Table 1 BtOM1.0 contigs and their chromosome assignments

The haploid bovine genome harbors 29 acrocentric, autosomal chromosomes, and one sex chromosome, or 60 telomeric ends. Accordingly, 20/78 optical contigs (BtOMcontig_6, 8, 11, 16, 21, 23, 24, 34, 39, 40, 42, 45, 46, 47, 49, 50, 55, 57, 67, 69) present sharply demarcated ends (Figs. 2a and 3; Tables 1 and 2), which indicate that they’ve spanned into the repetitive sequences near telomeres. The remaining 40/60 chromosome ends are either not spanned or only partially spanned by optical maps because the short arms of these acrocentric chromosomes are densely populated by repeats, making them intractable to our analysis. Interestingly, we find that BtOMcontig_4 has ~6 Kb tandem repeats at one end, which also shows alignment to chromosome 11. In addition, BtOMcontig_2 presents tandem repeats with a repeat unit consisting of multiple BamHI fragments with a total unit mass of ~290 Kb and is anchored on bovine chromosome 6 (Fig. 2b; Table 1). Lastly, an additional 5 optical map contigs, spanning 19.43 Mb, show evidence of heterozygosity, manifested as indels (40 Kb, 173 Kb, 248 Kb, 348 Kb, and 418 Kb) on chromosomes (6, 14, 15, and X) within BtOM1.0 as illustrated by the examples in Fig. 2c. All the contiged Rmaps for each chromosome, and all the optical map contig consensus maps, are available at GitHub (https://github.com/schwartz-lab/BovineGenomeOMdata/).

Fig. 2

Examples of optical contigs, within BtOM1.0, reveal a telomeric region, complex tandem repeats and heterozygous structural variants. a: Telomeric regions are defined through assembly when Rmap contigs present sharply defined edges (green arrow); example shown is chr14, BtOMcontig_8 (Fig. 3); horizontal tracks depict Rmaps (single molecule restriction maps) with boxes representing individual restriction fragments, color keyed as: (gold—agreement; red—extra cut; cyan—missing cut; and purple—compound events) based on comparisons against the consensus map (ConMap; blue track). b: An optical map contig, showing the structure of a complex tandem repeat. c: Consensus maps of optical contigs (gold tracks) aligned to BtOM1.0 (blue track) showing heterozygous structural variants distinguished by deletions and insertions; numbers on each fragment bar show restriction fragment size (Kb). The first haplotype deletion/insertion is in the region corresponding to UMD3.1_chr6 62,804,591–62,981,784, and this region encoded an ATPase and an aminophospholipid transporter (APLT). The second case is corresponding to UMD3.1 chr14 35,756,042–35,808,353. There is a ~27 Kb missing sequence in UMD3.1 chr14 at this region, and the other part of this region encoded a solute carrier organic anion transporter

Fig. 3

BtOM1.0 optical map (78 optical contigs) comprehensively aligned to UMD3.1. Optical contigs (consensus maps: BtOMcontig_0–77; gold tracks) extensively span across the UMD3.1 sequence assembly (cyan tracks) revealing a minimal number of gaps in the optical map (BtOM1.0)

Table 2 Alignment statistics for BtOM1.0 vs. UMD3.1 and Btau4.6

Construction of chromosome-wide optical maps and their comparison to UMD3.1 and Btau4.6

The optical contigs generated by iterative and de novo map assembly (Fig. 1) were merged through assembly of their consensus maps into 78 final optical contigs. They were then ordered and oriented through alignment against a BamHI in silico restriction map constructed from the UMD3.1 sequence build (Fig. 3). Chromosome-wide optical maps, BtOM1.0_chr1-29 and BtOM1.0_chrX, were constructed with 500 Kb gaps inserted between any two optical map contigs anchored on the UMD3.1 sequence assembly (BtOM1.0 available at GitHub: https://github.com/schwartz-lab/BovineGenomeOMdata/). This workflow constructed 30 chromosome-wide optical maps that were aligned to both Btau4.6 and UMD3.1 sequences using local alignment (Fig. 4). A series of contiguous restriction fragments that align between BtOM1.0 and the in silico maps of a sequence build is called a “map segment.” Tabulations describing these aligned map segments are listed in Table 2. In total, 135 map segments (over 78 optical contigs) present a total aligned sequence segment mass of 2,297.08 Mb, with an average size of 17.02 Mb; that is, ~86 % of UMD3.1 is covered by optical maps. Map coverage of UMD3.1 ranges from 73 to 95 %. 50/60 chromosome ends, 19.14 Mb in total, within UMD3.1, are extended by optical maps (Fig. 5; Tables 1 and 2). For Btau4.6, 188 map segments align to BtOM1.0, with a total mass of 2,054.74 Mb (~78 % of Btau4.6) and an aligned map segment size averaging 10.93 Mb. The coverage of Btau4.6 by optical maps is less than that tabulated for UMD3.1, ranging from ~65 to 82 % across all 30 bovine chromosomes. Lastly, 55/60 of the Btau4.6 chromosome ends are extended by optical maps with a mass totalling 38.35 Mb (Table 2).
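The summary statistics in this paragraph can be recomputed from the reported totals. The sketch below derives the mean map segment size and the fraction of each build covered, taking the assembly sizes (2,660.90 Mb for UMD3.1 and 2,629.84 Mb for Btau4.6) from the values quoted in the Discussion; the function name is ours and serves only this illustration.

```python
def segment_summary(n_segments: int, aligned_mb: float,
                    assembly_mb: float) -> tuple[float, float]:
    """Return (mean segment size in Mb, percent of the assembly covered)."""
    mean_size_mb = aligned_mb / n_segments
    percent_covered = 100.0 * aligned_mb / assembly_mb
    return mean_size_mb, percent_covered

builds = {
    # build: (map segments, aligned mass in Mb, assembly size in Mb)
    "UMD3.1": (135, 2_297.08, 2_660.90),
    "Btau4.6": (188, 2_054.74, 2_629.84),
}

summary = {name: segment_summary(*values) for name, values in builds.items()}
for name, (mean_mb, pct) in summary.items():
    print(f"{name}: mean segment {mean_mb:.2f} Mb, ~{pct:.0f} % covered")
```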

Fig. 4

Genome-wide alignment of BtOM1.0 to UMD3.1 and Btau4.6 sequence assemblies. Tracks show BtOM1.0 (center) alignments to UMD3.1 (top) and Btau4.6 (bottom) for each chromosome. Red highlights (center track) restriction fragments aligned to both in silico maps of UMD3.1 and Btau4.6; cyan (center track) highlights alignment of BtOM1.0 to UMD3.1, or Btau4.6; and white or black highlights no alignments to either sequence build. Inset (green box) shows a zoomed view of chr27 detailing a large-scale discordance with the optical map: an inverted sequence assembly within Btau4.6. Also note transposed sequence assemblies, flagged by black lines running between separate chromosomes

Fig. 5

Examples of optical map coverage within telomeric, or sub-telomeric regions. a: BtOM1.0_chr11 aligned to chr11 of both UMD3.1 and Btau4.6. Lines show BamHI restriction sites with red highlighting those BtOM1.0_chr11 track (center) restriction fragments aligned to both UMD3.1_chr11 and Btau4.6_chr11. White highlights unaligned BtOM1.0_chr11 restriction fragments, which extend 517 kb past both the UMD3.1_chr11 and Btau4.6_chr11 sequences. b: BtOM1.0_chr22 aligned to chr22 of both UMD3.1 and Btau4.6. (Same color scheme as a.) The white region on BtOM1.0_chr22 is unaligned to both sequence builds and extends ~625 kb past UMD3.1_chr22

Discordance calling between optical maps and sequence assemblies

Discordances between BtOM1.0 and UMD3.1 were called based on the alignments between the consensus maps that were trimmed and stripped off from the last (8th) cycle of iterative assembly (Fig. 1; Methods; [26]) and then manually curated. Complex discordances required directed alignment and assembly steps, complemented by additional curation, for their complete characterization. In total, 4,754 discordances were called between BtOM1.0 and UMD3.1 based on only confident alignments, and these discordances are tabulated as six categories (Additional file 3: Table S1; Table 3; Figs. 5 and 6): (1) large segments of inverted/translocated sequence (55; involving 31.11 Mb of sequence; Fig. 7a), (2) COMPLEX-events/misassembly/inversion (1,374; involving 111.54 Mb of sequence; Fig. 7b and c), (3) INS-insertion/missing sequence (461; involving 15.38 Mb of sequence), (4) DEL-deletion/extra sequence included (1,207; involving 44.82 Mb of sequence), (5) EC-extra restriction site (1,320), and (6) MC-missing restriction site (337).

Table 3 Statistics for the six categories of discordances between sequence builds and BtOM1.0

Fig. 6

Circos plot of discordances found between BtOM1.0 (optical map) and UMD3.1 (sequence assembly). The tracks A, B, C, D, E, and F represent the six categories of discordances: insertions (INSs), implying missing sequence; deletions (DELs), implying extra sequence; compound multiple events including insertion/deletion/inversion etc. (COMPLEX); missing restriction sites (MCs); extra restriction sites (ECs); and large inversions or translocations (ITs)

Fig. 7

Examples of large-scale map vs. sequence discordances (IT discordances). a: GPS (Methods) viewer shows the BtOM1.0_chr1 optical map (center track) alignments to UMD3.1_chr1 and Btau4.6_chr1 revealing a 759 Kb translocated region (UMD3.1) and a 1,284 Kb inversion (Btau4.6). Color scheme is the same as Fig. 5, with cyan highlighting BtOM1.0_chr1 portions aligning to UMD3.1_chr1 or Btau4.6_chr1. b: ~1 Mb region of chromosome 17: the in silico maps (blue track) of UMD3.1_chr17 perfectly align to the BtOM1.0_chr17, but Btau4.6_chr17/BtOM1.0_chr17 alignment reveals two misassembled regions in the Btau4.6 sequence build: an inversion (blue and gold arrows) and extraneous sequence. Restriction fragments (undulating boxes) are color keyed as: gold (agreement with blue track, or reference); red (extra cuts in optical map); cyan (missing cuts in optical map); and purple (compound events). c: ~500 Kb region on chromosome 19: alignment of BtOM1.0_chr19 and Btau4.6_chr19 (blue track) suggests that the Btau4.6_chr19 sequence (blue arrow) was probably misplaced and should be inverted and placed nearby. BtOM1.0_chr19 and UMD3.1_chr19 (blue track) alignment suggests that sequence here also presents misassemblies similar to Btau4.6_chr19

Similarly, Btau4.6 discordances were tabulated as just described for UMD3.1, but relied on the same optical consensus maps created from UMD3.1. These efforts identified 7,463 discordances in the Btau4.6 sequence assembly (Additional file 4: Table S2; Table 3; Additional file 5: Figure S3). Such tabulations include 102 large segments of inverted/translocated (IT) sequence (involving 61.65 Mb; Fig. 7a), 2,331 COMPLEX-complex events/misassembly/inversion (involving 273.14 Mb; i.e., Fig. 7b and c), 782 INS-insertion/missing sequences (involving 82.71 Mb of sequence), 2,596 DEL-deletion/extraneous sequence (involving 99.48 Mb), 1,166 EC-extra restriction sites, and 486 MC-missing restriction sites.
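The per-category counts reported for the two builds can be cross-checked against the stated totals and compared directly. The sketch below does this with the counts given in the text; the dictionary layout and variable names are ours.

```python
# Discordance counts per category, as reported in the text for each build.
umd31 = {"IT": 55, "COMPLEX": 1_374, "INS": 461, "DEL": 1_207, "EC": 1_320, "MC": 337}
btau46 = {"IT": 102, "COMPLEX": 2_331, "INS": 782, "DEL": 2_596, "EC": 1_166, "MC": 486}

total_umd31 = sum(umd31.values())
total_btau46 = sum(btau46.values())
print(total_umd31, total_btau46)

# Ratio of Btau4.6 to UMD3.1 counts, category by category.
ratios = {category: btau46[category] / umd31[category] for category in umd31}
for category, ratio in ratios.items():
    print(f"{category:8s} {ratio:.2f}x")
```

The category totals sum to the reported 4,754 and 7,463. The ratios show that the excess in Btau4.6 is concentrated in the IT, COMPLEX, INS and DEL categories, whereas extra restriction cuts (ECs) are slightly fewer in Btau4.6 than in UMD3.1.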

Discussion and Conclusions

A whole genome optical map, BtOM1.0, of the B. taurus Hereford breed, L1 Dominette 01449, was constructed using the same animal employed for whole genome shotgun sequencing, which was also the daughter of the Hereford bull L1 Domino (registration number 41170496) used for the construction of the previously analyzed BAC library [4]. The optical map spans 2,575.30 Mb across the B. taurus genome and comprises 78 optical contigs, which provide accurate size estimations for 289,155 BamHI restriction fragments. Alignments between BtOM1.0 and in silico restriction maps of UMD3.1 and Btau4.6 revealed numerous discordances at genomic length scales reaching from a restriction site to portions of a chromosomal arm. On average, there is a BamHI site every 8.91 Kb, and such “marker” density is far greater than the bovine genetic map (7,000 markers [14]) and the composite map (17,000 markers) which combines linkage and radiation hybrid resources [8, 9]. The size of the B. taurus genome, estimated by Optical Mapping, is similar to those estimates provided by the UMD3.1 and Btau4.6 genome sequence assemblies (2,660.90 Mb and 2,629.84 Mb respectively), but is ~17 % smaller than the 3,088 Mb size estimated by BAC fingerprinting [4]. Genome size differences may stem, in part, from genome analysis efforts that employed DNA samples from three different cattle breeds: Hereford, Holstein and Angus [4], whose separate BAC libraries were used for previous mapping efforts.

Our analysis, through alignments to BtOM1.0, showed that Btau4.6 presented far more discordances as compared to UMD3.1. Overall, we found that Btau4.6 presented almost double the number of discordances across most of its 6 categories of sequence vs. map discrepancies. They include COMPLEX (misassembly) discordances as compared to UMD3.1 (2,331 vs. 1,374), which more than doubled the amount of affected sequence (273.14 Mb vs. 111.54 Mb). Further comparisons of Btau4.6 to UMD3.1 also showed a doubled count of DELs (extra sequence) discordances (2,596 vs. 1,207) and the amount of affected sequence (99.48 Mb vs. 44.81 Mb). Similarly, Btau4.6 presented a doubled rate of called INS (missing sequence) and Inverted/Translocated discordances (Table 3).

Comparisons of Btau4.6 and UMD3.1 to BtOM1.0 also revealed large-scale differences between these sequence assemblies. Such issues became most apparent through our analysis of optical map alignments at telomeric regions, or ends of chromosomes. Table 2 shows that Btau4.6 is missing more sequence at chromosome ends (38.35 Mb) as compared to UMD3.1 (19.14 Mb). Also, the Btau4.6 assembly of the X chromosome excluded ~60 Mb of sequence relative to BtOM1.0 (Fig. 8). Therefore, the results of our comparative analysis of UMD3.1 and Btau4.6, based on alignments to BtOM1.0, are in line with previous reports [11, 12], and affirm the NCBI’s current designation of the UMD3.1 sequence assembly as the “reference” assembly and the Btau4.6.1 assembly as the “alternate” assembly [36].

Fig. 8

Map alignments among BtOM1.0 chromosome X and UMD3.1 and Btau4.6. GPS (Methods) (color scheme as in Figs. 3 and 5) reveals numerous and very large-scale discordances as a series of misassemblies. Zoomed portion shows details of such misassembled regions

There are numerous sequence gaps in the two B. taurus genome sequence assemblies (74,425 in UMD3.1 and 66,276 in Btau4.6). However, most of the sequence gaps are small, in that there are only 606 sequence gaps ≥ 2 kb in UMD3.1 and 5,450 in Btau4.6. Importantly, greater than 96 % of these gaps in UMD3.1 and Btau4.6 were successfully bridged by BtOM1.0 (584 and 5,401, respectively). Accordingly, this analysis raises the question: Are the discordances called in UMD3.1 and Btau4.6, through alignments to BtOM1.0, largely due to sequence gaps inserted into the two sequence assemblies? We explored this question by intersecting the sequence gap and discordance coordinates from both sequence assemblies, and identified within UMD3.1 only 162/4,754 discordances (86 COMPLEXs, 46 DELs, 5 INSs, 15 ECs, 0 MCs, and 10 ITs), which intersected just 167/606 sequence gaps (≥ 2 kb). For Btau4.6 we found 3,801/7,463 discordances (1,720 COMPLEXs, 1,625 DELs, 254 INSs, 160 ECs, 2 MCs, and 40 ITs; Materials and Methods), which intersected 4,586/5,450 sequence gaps (≥ 2 kb). Thus, 27.6 % of the large sequence gaps in UMD3.1 contribute to only 3.4 % of the discordances called in UMD3.1, while 84.2 % of the large sequence gaps in Btau4.6 are responsible for 50.8 % of the called discordances in this assembly. As such, this simple analysis further substantiates the superior quality of UMD3.1 vs. Btau4.6, which, in part, is reflected by the high rate of parsimoniously inserted sequence gaps.
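The intersection analysis described above amounts to an interval-overlap test between two coordinate lists. The sketch below shows one simple way to do it, followed by a recomputation of the reported fractions. The interval representation (chromosome, start, end; half-open), the function names, and the toy coordinates are all hypothetical; only the counts in `reported` come from the text. A quadratic scan is used for clarity; a sorted sweep or interval tree would be the usual choice at genome scale.

```python
Interval = tuple[str, int, int]  # (chromosome, start, end), half-open

def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def intersect(discordances: list[Interval], gaps: list[Interval]):
    """Return (discordances touching any gap, gaps touched by any discordance)."""
    hit_discordances = [d for d in discordances if any(overlaps(d, g) for g in gaps)]
    hit_gaps = [g for g in gaps if any(overlaps(g, d) for d in discordances)]
    return hit_discordances, hit_gaps

# Hypothetical toy coordinates, for illustration only.
toy_discordances = [("chr1", 1_000, 5_000), ("chr1", 20_000, 30_000), ("chr2", 1_000, 2_000)]
toy_gaps = [("chr1", 4_000, 7_000), ("chr2", 5_000, 9_000)]
hit_d, hit_g = intersect(toy_discordances, toy_gaps)
print(hit_d, hit_g)

# Fractions recomputed from the counts reported in the text.
reported = {
    "UMD3.1 discordances at gaps": (162, 4_754),
    "UMD3.1 gaps with discordances": (167, 606),
    "Btau4.6 discordances at gaps": (3_801, 7_463),
    "Btau4.6 gaps with discordances": (4_586, 5_450),
}
percentages = {label: 100.0 * part / whole for label, (part, whole) in reported.items()}
for label, pct in percentages.items():
    print(f"{label}: {pct:.1f} %")
```

Recomputed this way, the fractions agree with the percentages quoted in the text to within about 0.2 percentage points.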

Our systematic tabulation and curation of discordances found through comparison of BtOM1.0 vs. UMD3.1, or Btau4.6, will greatly facilitate future improvements of B. taurus genome sequence assemblies in order to build a more accurate and unified version of the reference sequence. Because BtOM1.0 was constructed from DNA derived from the very same animal that was sequenced, this physical map provides direct comparisons to these other resources that are not affected by genotype differences manifested by other breeds, or even animals of the same breed. Although there are many map resources available for the B. taurus genome, which include genetic linkage maps [5, 6, 14–17, 37, 38], radiation hybrid maps [8, 9, 39], BAC physical maps [4, 7], cytogenetic maps [40, 41] and comparative maps between cattle and human [10, 42, 43], the resolution of these maps can be modest. Consider that the B. taurus composite map of integrated linkage/radiation hybrid maps [9, 39] and BAC physical maps [4, 7] features the greatest number of markers (17,254 markers), but with a density of only ~180 kb/marker. In comparison, BtOM1.0 boasts an average restriction fragment size of 8.91 Kb, which fostered resolution of difficult-to-discern errors in sequence assembly. For example, Fig. 9 shows a 79 Kb region that was inverted and misplaced based on alignment to BtOM1.0, which was also substantiated by new sequence data and PCR.

Fig. 9

Optical map (BtOM1.0_chr1) reveals a 79 Kb sequence assembly as being transposed and inverted in UMD3.1_chr1. (Top) Alignment shows an unaligned 79 Kb segment (pink bars) within the optical map and a corresponding gap in the sequence. (Middle) Same 79 Kb segment within the UMD3.1 build but apparently transposed to 60,578,754 bp and inverted. Accordingly, there is a 79 Kb gap in the optical map. (Bottom) Illumina paired-end Dominette L1 sequence data, aligned to UMD3.1, corroborate sequence misjoining points (red arrows) at 60,578,000 bp and 60,664,500 bp. PCR experiments confirm that the 79 Kb segment should be placed between 637,768 and 648,912 on chromosome 1 (data not shown). Paired-end reads mapped to UMD3.1 with both ends mapped in the correct orientation are termed intact pairs (blue tracks). When only one end is mapped, or mapping shows the wrong orientation or a discordant distance between mapped read pairs, these events are termed broken pairs. Reads of a broken pair that map to a unique location against the reference are colored green or red, according to whether they mapped in the forward or reverse orientation, respectively

During the course of writing this manuscript, a reviewer questioned the extent of map errors and possible biases that may be introduced through our selective use of UMD3.1 as the reference genome for BtOM1.0. Although our previous publications report a high degree of accuracy and minimal biases stemming from the iterative assembly pipeline [22, 23, 26], we compared the iterative assembly of optical maps constructed from UMD3.1 vs. Btau4.6 using chromosomes 27 and 28 (Fig. 10). Using Btau4.6 as the reference sequence, eight iterations (Fig. 1) and merging of optical contigs produced three optical map contigs for chromosome 27 and a single optical map contig was derived for chromosome 28. Alignments show that these new maps are essentially identical to BtOM1.0 except for a few restriction site differences (2 extra cuts, 5 missing cuts for chromosome 27; 4 missing cuts for chromosome 28). We attribute these minor differences to heterozygosity, since our calling of discordances uses a single representation of the physical map created by Optical Mapping. However, over an entire chromosome multi-Mb-scale differences are apparent. Fig. 10b shows three optical map contigs aligned to Btau4.6, chromosome 27, presenting a large gap (~5.3 Mb) between contig2 (8.80 Mb) and contig3 (1.89 Mb), and no gap between contig1 (30.52 Mb) and contig2, relative to BtOM1.0; while a single contig (45.27 Mb; Figs. 3 and 10) spans the same chromosome using UMD3.1 chromosome 27. In comparison, chromosome 28 shows a single optical map contig (43.20 Mb) generated from the Btau4.6 sequence as the starting reference, while two optical map contigs (contig1, 42.84 Mb; contig2, 3.01 Mb; Figs. 3 and 10) formed using the UMD3.1 sequence for chromosome 28 (Fig. 10a). The absence of the small contig2 (3.01 Mb) from the Btau4.6 derived optical contigs implies reduced coverage for chromosome 28. 
As such, our analysis shows that BtOM1.0 bears minimal local biases stemming from the choice of sequence build used for iterative assembly, but the overall optical map coverage varies. Fortunately, absent, or problematic genomic regions would then be covered, as required, by optical maps constructed by de novo techniques (Fig. 1). Consequently, the need for de novo assembly steps is minimized by judicious selection of a reference genome for iterative assembly of an optical map.

Fig. 10

Comparisons of optical map assemblies seeded by UMD3.1, or Btau4.6. (Cyan tracks show in silico restriction maps generated from sequence; orange tracks show optical maps. Striated black, or white-filled boxes flag unaligned map fragment(s). Black vertical lines demarcate restriction fragments, or map alignments.) a: Alignments of BtOM1.0_chrs 27, 28, against the in silico maps of UMD3.1 chrs 27 and 28 (also see Fig. 3). b: Iterative assembly results for chrs 27 and 28 using Btau4.6 as the reference sequence. Note the large inversion (far right) revealed within the Btau4.6 chr27 sequence, which is not present in UMD3.1 or the optical maps. c: Alignment of the optical maps shown in (a) and (b). Inset shows 3.5 Mb region; vertical lines locate BamHI restriction sites. Pink fragment (500.0 Kb) shows the gap between the two optical map contigs in BtOM1.0 chromosome 28

We conclude that BtOM1.0 will prove to be a valuable resource for advancing the state of current sequence assemblies, by serving as a largely independent physical scaffold, as shown in Figs. 5, 6, 7, 8, 9 and 10, but perhaps, more importantly, as a platform to support future comparative studies, focusing on structural variation amongst different cattle breeds, or within populations. Lastly, errors always accompany any ambitious effort pointed at comprehensive analysis of entire genomes. Accordingly, the true merits and accuracy of a new resource, such as BtOM1.0, will be comprehensively assessed over time by individual researchers in the bovine community.

Methods

DNA sample preparation

DNA was extracted from a peripheral blood sample (Dominette L1 014490; American Hereford Association registration number 42190680) provided by Leeson J. Alexander from Fort Keogh Livestock and Range Research Laboratory of USDA Agricultural Research Service; Miles City, Montana. Blood cells were pelleted at 350 g for 10 min at 4 °C and resuspended in red blood cell lysis solution (0.8 % NH4Cl, 0.1 mM EDTA, pH = 7.4) at room temperature for 10 min; white blood cells were then pelleted at 350 g at 4 °C for 10 min and then resuspended and washed in Dulbecco’s PBS (1.54 mM KH2PO4, 155.17 mM NaCl, 2.71 mM Na2HPO4, pH 7.2). High Molecular Weight (HMW) DNA for Optical Mapping was obtained by suspending white blood cells in 1 % low-melting point agarose in distilled water to form gel inserts [44] (1 million cells/ml) and lysed in modified NDSK (1 mg/ml proteinase K, 1 % lauroylsarcosine, 0.5 M EDTA, 1 M NaCl, pH 8.0) at 50 °C for two overnights with one switch of fresh NDSK solution after the first overnight; HMW DNA was extracted from prepared inserts for optical mapping as previously described [20, 24, 45].

Optical mapping

Optical mapping surfaces were prepared as previously described [19, 21, 45, 46]. Briefly, glass cover slips (22 × 22 mm, Fisher’s Finest) were cleaned by boiling in Nano-Strip (Cyantek Corp., Fremont, CA), followed by boiling in concentrated HCl, extensively rinsed with high purity water and sonicated until the pH of the wash reached 6.0 within 30 min, and then washed with ethanol twice with sonication. Cleaned glass cover slips were derivatized using trimethyl silane (N-trimethoxysilylpropyl-N,N,N-trimethylammonium chloride; 130 μl) and vinyl silane (vinyltris(trimethylsiloxy)silane; 15 μl) in 250 ml of distilled water to confer a positive charge and provide chemical moieties for covalent bonding of the acrylamide overlay to the surface.

DNA mapping, image acquisition, and processing

Bovine genomic DNA molecules (~400–500 kb) were premixed with lambda DASH II bacteriophage DNA (Stratagene, La Jolla, CA) as an internal sizing standard and then deposited on optical mapping surfaces using a silicone microchannel device [24]. A fully automated image acquisition microscope workstation (GenomeZephyr) with Mightex LED illumination (San Francisco, CA) acquired image data that was automatically processed by machine vision, within a pipeline, which compiled large files comprising ordered restriction maps for each imaged molecule (Rmap) [24].

Optical map assembly

Previous work [22] had confirmed that iterative assembly, which relies on a sequence reference map, constructs unbiased optical maps that are essentially equivalent to those crafted by a de novo method using a “divide and conquer” approach [22, 23, 26, 46]. Iterative assembly simply uses the reference sequence for anchoring Rmaps, which are then independently assembled into optical contigs (Fig. 1). These newly assembled optical contigs become the updated reference for 8 cycles of alignment and assembly, which increase their breadth and depth, all accomplished without further use of the sequence reference map. Accordingly, if a sequence reference suffers from many misassemblies, or gaps, de novo approaches are used to assemble across such regions. Because of sequence assembly issues, the B. taurus optical map incorporates both assembly strategies for efficient and comprehensive map assembly, with our G & G algorithm used for de novo assembly.

We first used reference-based iterative map assembly and then removed the Rmaps in these assembled optical map contigs from the whole Rmap dataset; the leftover, uncontiged Rmaps were then used for de novo map assembly via G & G. The combined map assembly strategy (Fig. 1) ensured the completeness of the final optical maps by maximizing the recovery of optical contigs from genomic regions not covered by, or heavily misassembled in, the reference maps (UMD3.1 and Btau4.6).
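The control flow of this combined strategy can be illustrated with a deliberately simplified toy model. In the sketch below, each Rmap is reduced to a `(start, end)` interval with known coordinates, "alignment" is reduced to interval overlap, and "assembly" to merging overlapping intervals. These are assumptions made purely for illustration: real Rmaps are aligned by their restriction fragment patterns, and the final merge is only a stand-in for G & G, whose internals are not described here. All function names are hypothetical.

```python
def merge_intervals(intervals):
    """Toy 'assembly': merge overlapping (start, end) intervals into contigs."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def overlaps_any(rmap, anchors):
    """Toy 'alignment': does this Rmap overlap any anchor interval?"""
    return any(rmap[0] < a_end and a_start < rmap[1] for a_start, a_end in anchors)

def iterative_assembly(rmaps, reference, cycles=8):
    """The reference only seeds cycle 1; afterwards the contigs are the anchors."""
    anchors, used = reference, set()
    for _ in range(cycles):
        aligned = {r for r in rmaps if overlaps_any(r, anchors)}
        if aligned == used:              # no new Rmaps recruited: converged
            break
        used = aligned
        anchors = merge_intervals(used)  # contigs replace the previous anchors
    return merge_intervals(used), used

# Hypothetical data: the reference covers only 0-100; 400-520 is absent from it.
reference = [(0, 100)]
rmaps = [(10, 60), (50, 120), (110, 180), (170, 240), (400, 460), (450, 520)]

contigs, used = iterative_assembly(rmaps, reference)
leftovers = sorted(set(rmaps) - used)         # Rmaps not placed in any contig
de_novo_contigs = merge_intervals(leftovers)  # stand-in for de novo assembly

print(contigs)          # [(10, 240)]
print(leftovers)        # [(400, 460), (450, 520)]
print(de_novo_contigs)  # [(400, 520)]
```

In this toy run the contig grows past the end of the reference over successive cycles, while the region the reference lacks is recovered only from the leftover Rmaps, mirroring the division of labor between the two strategies.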

Construction of chromosome-wide maps

Eight cycles of iterative assembly, using the UMD3.1 in silico map as the reference, produced thousands of overlapping optical contigs. Consensus maps of these contigs were merged using the map assembler into large-scale optical maps (Fig. 3). These large-scale optical maps were then further augmented and refined through additional merging operations using optical consensus maps generated from de novo assembly. After alignment to the UMD3.1 in silico BamHI restriction map, they were manually joined into chromosome-wide optical maps and viewed using GnomSpace, a map-centric genome viewer that facilitates inspection of alignments.
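An in silico restriction map of the kind referred to above is simply the ordered list of fragment lengths obtained by cutting the sequence at every recognition site. The sketch below is a minimal illustration, not the software used in this study. It relies on the known BamHI recognition sequence (GGATCC, cleaved after the first G); the function name, the assumption of a complete digest of a linear sequence, and the toy input are illustrative only.

```python
import re

BAMHI_SITE = "GGATCC"  # BamHI recognition sequence (palindromic)
CUT_OFFSET = 1         # BamHI cleaves after the first G: G^GATCC

def in_silico_digest(sequence, site=BAMHI_SITE, cut_offset=CUT_OFFSET):
    """Return ordered fragment lengths (bp) for a complete digest of a linear sequence."""
    sequence = sequence.upper()
    # A lookahead pattern reports every site start, even for self-overlapping sites.
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={site})", sequence)]
    bounds = [0] + cuts + [len(sequence)]
    return [right - left for left, right in zip(bounds, bounds[1:])]

# Hypothetical 26 bp toy sequence containing two BamHI sites.
toy = "AAAA" + "GGATCC" + "TTTTTTTT" + "GGATCC" + "CC"
print(in_silico_digest(toy))  # [5, 14, 7]
```

Because the recognition sequence is palindromic, scanning one strand is sufficient. Runs of N in a sequence gap never match the site, so a gap lengthens the enclosing in silico fragment by its estimated size.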

Calling discordances between the in silico maps of sequence assemblies and optical maps

As previously described [26], the iterative assembly pipeline automatically calls discordances, or structural variants, against a reference map (UMD3.1 or Btau4.6). These are classified into five categories: (1) missing restriction sites [MC], (2) extra restriction sites [EC], (3) extraneous sequences [DEL], (4) missing sequences, or gaps [INS], and (5) compound, or complex [COMPLEX].

Very large scale, or complex discordances involving apparent translocations of sequence assemblies between chromosomes required manual intervention. These discordances were flagged as ITs (Inverted or Translocated sequences) and curated using map viewers developed in our group: GPS (unpublished work) and GnomSpace [26].

Genome viewer: GPS

Genome Polysemy and Synonymy (GPS; unpublished) is a visualization platform for the analysis of alignments between optical maps, optical contigs, and in silico restriction maps created from sequence data. The software takes an XML file consisting of several optical maps and their alignments, and converts them into an interactive graphical representation using Scalable Vector Graphics (SVG). The SVG engine within GPS enables users to zoom in/out, pan, arbitrarily position optical maps, or contigs, and highlight selected features in ways designed to greatly enhance visual analysis of alignments. Such advantages allow users to more fully understand compound events involving translocations, inversions, and frank aberrations, or discordances. GPS visualization capabilities are based on an open source SVG manipulation library called Apache Batik (http://xmlgraphics.apache.org/batik/) and the latest version of Java (1.8). One of the most useful advantages of the software is its ability to efficiently process and render very large map alignments within sizable and complex genomes (~3 Gb). GPS source code is accessible here: https://github.com/schwartz-lab/genome-polysemy-and-synonymy
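The XML-to-SVG conversion at the core of such a viewer can be sketched in a few lines. The example below is written in Python with the standard library rather than Java and Batik, and it is not GPS code: the input schema (`<maps>`, `<map>`, `fragments_kb`) is invented for illustration, since the GPS file format is not described here. It draws each restriction fragment as a rectangle whose width is proportional to its size, using orange for the optical map and cyan for the in silico map.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; the real GPS XML schema is not documented in this article.
MAPS_XML = """<maps>
  <map name="optical" fragments_kb="12.5,8.0,30.2,5.1"/>
  <map name="in_silico" fragments_kb="12.1,8.3,35.0"/>
</maps>"""

def maps_to_svg(xml_text, px_per_kb=10.0, track_height=20, track_gap=10):
    """Render each <map> as a horizontal track of fragment rectangles."""
    tracks = ET.fromstring(xml_text).findall("map")
    svg = ET.Element("svg", xmlns="http://www.w3.org/2000/svg")
    widest = 0.0
    for row, track in enumerate(tracks):
        x = 0.0
        y = row * (track_height + track_gap)
        fill = "orange" if track.get("name") == "optical" else "cyan"
        for size_kb in map(float, track.get("fragments_kb").split(",")):
            width = size_kb * px_per_kb
            # Rectangle borders mark the restriction sites between fragments.
            ET.SubElement(svg, "rect", x=f"{x:.1f}", y=str(y), width=f"{width:.1f}",
                          height=str(track_height), fill=fill, stroke="black")
            x += width
        widest = max(widest, x)
    svg.set("width", f"{widest:.1f}")
    svg.set("height", str(len(tracks) * (track_height + track_gap)))
    return ET.tostring(svg, encoding="unicode")

svg_text = maps_to_svg(MAPS_XML)
with open("maps.svg", "w") as handle:  # open the file in any web browser
    handle.write(svg_text)
```

Interactive features such as zooming, panning and repositioning of maps are what a dedicated engine adds on top of this static rendering.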

DNA sequencing

One lane of 150 bp PE Illumina sequencing was performed from blood extracted genomic DNA from Dominette L1 014490 to generate 515 million reads (the SRA archive number in NCBI: SRP05124). Reads were mapped to assembly UMD3.1 using CLC Bio Genomics Workbench software (CLC Bio, Aarhus, Denmark; 85 % of the reads mapped to UMD3.1) using the following settings: mismatch cost = 2; linear gap cost for insertions and deletions = 3; length fraction = 0.6; similarity fraction = 0.9; auto detect pair distance and ignore non-specific matches.
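The distinction between intact and broken read pairs used in the Fig. 9 legend can be expressed as a short decision rule. The sketch below is an illustration, not the CLC implementation: the `ReadEnd` type, the inward-facing (forward/reverse) orientation expected of standard Illumina paired-end libraries, and the distance limits are assumptions, since the workbench detected the pair distance automatically.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ReadEnd:
    pos: int       # leftmost mapped coordinate (bp) on the reference
    forward: bool  # True if the read mapped to the forward strand

def classify_pair(end1: Optional[ReadEnd], end2: Optional[ReadEnd],
                  min_dist: int, max_dist: int) -> str:
    """Label a read pair 'intact' or 'broken' (both ends assumed on one chromosome)."""
    if end1 is None or end2 is None:
        return "broken"  # at most one end mapped
    left, right = sorted((end1, end2), key=lambda end: end.pos)
    if not (left.forward and not right.forward):
        return "broken"  # ends do not face each other
    if not (min_dist <= right.pos - left.pos <= max_dist):
        return "broken"  # discordant distance between the mapped ends
    return "intact"

# Hypothetical limits and coordinates, for illustration only.
LIMITS = dict(min_dist=100, max_dist=600)
print(classify_pair(ReadEnd(1_000, True), ReadEnd(1_300, False), **LIMITS))   # intact
print(classify_pair(ReadEnd(1_000, True), None, **LIMITS))                    # broken
print(classify_pair(ReadEnd(1_000, True), ReadEnd(1_300, True), **LIMITS))    # broken
print(classify_pair(ReadEnd(1_000, True), ReadEnd(81_000, False), **LIMITS))  # broken
```

A cluster of broken pairs at a single coordinate, such as those at the misjoining points in Fig. 9, is the read-level signature that corroborates a discordance called from the optical map.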

Ethics statement

The bovine blood sample used is the property of the ARS USDA; therefore, no specific permits were required for the described studies.

Abbreviations

BAC:

Bacterial Artificial Chromosome

BtOM1.0:

Bos taurus optical map version 1.0

Btau4.6:

Bos taurus genome sequence build 4.6 from Baylor College of Medicine Human Genome Sequence Center

COMPLEX:

Sequence misassembly based on optical map and sequence comparison

DEL:

Extraneous sequences based on optical map and sequence comparison

EC:

Extra restriction cuts based on optical map and sequence comparison

EDTA:

Ethylenediaminetetraacetic acid

FPC:

FingerPrint Contig

G & G:

“Germinate and Grow” de novo assembly algorithm

GPS:

Genome Polysemy and Synonymy software

HMW:

High Molecular Weight

INS:

Missing sequences based on optical map and sequence comparison

IT:

Inverted/Translocated sequences based on optical map and sequence comparison

MC:

Missing restriction cuts based on optical map and sequence comparison

UMD3.1:

Bovine genome sequence build 3.1 from the Center for Bioinformatics and Computational Biology at the University of Maryland

Rmap:

Raw single molecule optical Maps

SRA:

Sequence Read Archive

SVG:

Scalable Vector Graphics

WGS:

Whole Genome Shotgun

References

  1. Bollongino R, Burger J, Powell A, Mashkour M, Vigne JD, Thomas MG. Modern taurine cattle descended from small number of near-eastern founders. Mol Biol Evol. 2012;29(9):2101–4.
  2. Bovine Genome Sequencing and Analysis Consortium, Elsik CG, Tellam RL, Worley KC, Gibbs RA, et al. The genome sequence of taurine cattle: a window to ruminant biology and evolution. Science. 2009;324(5926):522–8.
  3. Liu Y, Qin X, Song XZ, Jiang H, Shen Y, Durbin KJ, et al. Bos taurus genome assembly. BMC Genomics. 2009;10:180.
  4. Snelling WM, Chiu R, Schein JE, Hobbs M, Abbey CA, Adelson DL, et al. A physical map of the bovine genome. Genome Biol. 2007;8(8):R165.
  5. Bishop MD, Kappes SM, Keele JW, Stone RT, Sunden SL, Hawkins GA, et al. A genetic linkage map for cattle. Genetics. 1994;136(2):619–39.
  6. Ihara N, Takasuga A, Mizoshita K, Takeda H, Sugimoto M, Mizoguchi Y, et al. A comprehensive genetic map of the cattle genome based on 3802 microsatellites. Genome Res. 2004;14(10A):1987–98.
  7. Schibler L, Roig A, Mahe MF, Save JC, Gautier M, Taourit S, et al. A first generation bovine BAC-based physical map. Genet Sel Evol. 2004;36(1):105–22.
  8. Itoh T, Watanabe T, Ihara N, Mariani P, Beattie CW, Sugimoto Y, et al. A comprehensive radiation hybrid map of the bovine genome comprising 5593 loci. Genomics. 2005;85(4):413–24.
  9. Jann OC, Aerts J, Jones M, Hastings N, Law A, McKay S, et al. A second generation radiation hybrid map to aid the assembly of the bovine genome sequence. BMC Genomics. 2006;7:283.
  10. Larkin DM, Everts-van der Wind A, Rebeiz M, Schweitzer PA, Bachman S, Green C, et al. A cattle-human comparative map built with cattle BAC-ends and human genome sequence. Genome Res. 2003;13(8):1966–72.
  11. Zimin AV, Delcher AL, Florea L, Kelley DR, Schatz MC, Puiu D, et al. A whole-genome assembly of the domestic cow, Bos taurus. Genome Biol. 2009;10(4):R42.
  12. Zimin AV, Kelley DR, Roberts M, Marcais G, Salzberg SL, Yorke JA. Mis-assembled “segmental duplications” in two versions of the Bos taurus genome. PLoS One. 2012;7(8):e42680.
  13. Burt DW. The cattle genome reveals its secrets. J Biol. 2009;8(4):36.
  14. Arias JA, Keehan M, Fisher P, Coppieters W, Spelman R. A high density linkage map of the bovine genome. BMC Genet. 2009;10:18.
  15. Kappes SM, Keele JW, Stone RT, McGraw RA, Sonstegard TS, Smith TP, et al. A second-generation linkage map of the bovine genome. Genome Res. 1997;7(3):235–49.
  16. Barendse W, Armitage SM, Kossarek LM, Shalom A, Kirkpatrick BW, Ryan AM, et al. A genetic linkage map of the bovine genome. Nat Genet. 1994;6(3):227–35.
  17. Fries R, Eggen A, Womack JE. The bovine genome map. Mamm Genome. 1993;4(8):405–28.
  18. Samad A, Huff EF, Cai W, Schwartz DC. Optical mapping: a novel, single-molecule approach to genomic analysis. Genome Res. 1995;5(1):1–4.
  19. Lai Z, Jing J, Aston C, Clarke V, Apodaca J, Dimalanta ET, et al. A shotgun optical map of the entire Plasmodium falciparum genome. Nat Genet. 1999;23(3):309–13.
  20. Zhou S, Deng W, Anantharaman TS, Lim A, Dimalanta ET, Wang J, et al. A whole-genome shotgun optical map of Yersinia pestis strain KIM. Appl Environ Microbiol. 2002;68(12):6321–31.
  21. Zhou S, Kvikstad E, Kile A, Severin J, Forrest D, Runnheim R, et al. Whole-genome shotgun optical mapping of Rhodobacter sphaeroides strain 2.4.1 and its use for whole-genome shotgun sequence assembly. Genome Res. 2003;13(9):2142–51.
  22. Zhou S, Bechner MC, Place M, Churas CP, Pape L, Leong SA, et al. Validation of rice genome sequence by optical mapping. BMC Genomics. 2007;8:278.
  23. Zhou S, Wei F, Nguyen J, Bechner M, Potamousis K, Goldstein S, et al. A single molecule scaffold for the maize genome. PLoS Genet. 2009;5(11):e1000711.
  24. Dimalanta ET, Lim A, Runnheim R, Lamers C, Churas C, Forrest DK, et al. A microfluidic system for large DNA molecule arrays. Anal Chem. 2004;76(18):5293–301.
  25. Valouev A, Li L, Liu YC, Schwartz DC, Yang Y, Zhang Y, et al. Alignment of optical maps. J Comput Biol. 2006;13(2):442–62.
  26. Teague B, Waterman MS, Goldstein S, Potamousis K, Zhou S, Reslewic S, et al. High-resolution human genome structure by single-molecule analysis. Proc Natl Acad Sci U S A. 2010;107(24):10848–53.
  27. Bradnam KR, Fass JN, Alexandrov A, Baranay P, Bechner M, Birol I, et al. Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species. Gigascience. 2013;2(1):10.
  28. Anantharaman TS, Mishra B, Schwartz DC. Genomics via optical mapping III: contiging genomic DNA and variations. Courant Technical Report. 1998;259:1–8.
  29. Anantharaman T, Mishra B, Schwartz D. Genomics via optical mapping. III: Contiging genomic DNA. Proc Int Conf Intell Syst Mol Biol. 1999:18–27.
  30. Valouev A, Schwartz DC, Zhou S, Waterman MS. An algorithm for assembly of ordered restriction maps from single DNA molecules. Proc Natl Acad Sci U S A. 2006;103(43):15770–5.
  31. Valouev A, Zhang Y, Schwartz DC, Waterman MS. Refinement of optical map assemblies. Bioinformatics. 2006;22(10):1217–24.
  32. Lin J, Qi R, Aston C, Jing J, Anantharaman TS, Mishra B, et al. Whole-genome shotgun optical mapping of Deinococcus radiodurans. Science. 1999;285:1558–62.
  33. Zerbino DR, Birney E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 2008;18(5):821–9.
  34. Butler J, MacCallum I, Kleber M, Shlyakhter IA, Belmonte MK, Lander ES, et al. ALLPATHS: de novo assembly of whole-genome shotgun microreads. Genome Res. 2008;18(5):810–20.
  35. Goldstein S, Briska A, Zhou S, Schwartz D. Sequences, Maps, Genomes and Graphs: Graph Compression Algorithms for Efficiently Comparing Genomes. UW Biostatistics and Medical Informatics Technical Report. 2004;181:1–11.
  36. The Bovine Genome Database. http://bovinegenome.org/ (Access date, 10/15/14)
  37. Ma RZ, Beever JE, Da Y, Green CA, Russ I, Park C, et al. A male linkage map of the cattle (Bos taurus) genome. J Hered. 1996;87(4):261–71.
  38. Barendse W, Vaiman D, Kemp SJ, Sugimoto Y, Armitage SM, Williams JL, et al. A medium-density genetic linkage map of the bovine genome. Mamm Genome. 1997;8(1):21–8.
  39. Williams JL, Eggen A, Ferretti L, Farr CJ, Gautier M, Amati G, et al. A bovine whole-genome radiation hybrid panel and outline map. Mamm Genome. 2002;13(8):469–74.
  40. Eggen A, Fries R. An integrated cytogenetic and meiotic map of the bovine genome. Anim Genet. 1995;26(4):215–36.
  41. Vaiman D, Schibler L, Oustry A, Schmitz A, Furet JP, Barendse W, et al. A cytogenetically anchored genetic map of bovine chromosome 1 obtained by integrating flow-sorted chromosome-derived microsatellite markers into the international bovine map. Cytogenet Cell Genet. 1997;79(3–4):204–7.
  42. Solinas-Toldo S, Lengauer C, Fries R. Comparative genome map of human and cattle. Genomics. 1995;27(3):489–96.
  43. Band MR, Larson JH, Rebeiz M, Green CA, Heyen DW, Donovan J, et al. An ordered comparative map of the cattle and human genomes. Genome Res. 2000;10(9):1359–68.
  44. Schwartz DC, Cantor CR. Separation of yeast chromosome-sized DNAs by pulsed field gradient gel electrophoresis. Cell. 1984;37(1):67–75.
  45. Lim A, Dimalanta ET, Potamousis KD, Yen G, Apodoca J, Tao C, et al. Shotgun optical maps of the whole Escherichia coli O157:H7 genome. Genome Res. 2001;11(9):1584–93.
  46. Zhou S, Kile A, Kvikstad E, Bechner M, Severin J, Forrest D, et al. Shotgun optical mapping of the entire Leishmania major Friedlin genome. Mol Biochem Parasitol. 2004;138(1):97–106.

Acknowledgements

We thank Leeson J. Alexander, Vicki Leesburg, Derek Bickhart and George Wiggans from ARS, USDA for providing the blood sample, and USDA and NHGRI for providing funding (USDA-2012-67015-91425; NHGRI-R01HG000225) to DCS.

Author information

Corresponding author

Correspondence to David C. Schwartz.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

SZ contributed to the design of the study, carried out partial data collection, performed optical map assemblies and comparisons between optical map and sequence, and drafted the manuscript. SG conceived and implemented Germinate and Grow and assisted with map assembly efforts and statistical analysis. MCB contributed to the data collection and manual curation of the discordances between optical map and sequence. MP contributed to data analysis and figure generation. KP was involved in data collection. LP developed new protocols. PR contributed to image analysis and figure generation. DP and JHO developed GPS software for viewing map alignments. GR and JFM contributed optical map and sequencing data, analysis, and manuscript revision. DCS conceived the study, contributed to experimental design and analysis, and edited and assisted in the drafting of the manuscript. All authors read and approved the final manuscript.

Additional files

Additional file 1: Figure S1.

Rmap alignments (“hits”) against UMD3.1 for each chromosome; colored hash marks represent aligned Rmaps and annotated by tallies of coverage (X) and total mass (Mb). Rmap alignment for each chromosome is shown at the end of each chromosome. Green box (21,500,000–24,800,000 bp) highlights a 3.3 Mb region harboring dense Rmap alignments. Purple boxes (chr7:7,800,000–22,500,000 bp; chr12:70,360,000–76,785,000 bp) show regions of diminished Rmap alignments, suggesting that the sequence assemblies here are likely problematic. (PDF 16691 kb)

Additional file 2: Figure S2.

Rmap alignments (“hits”) against Btau4.6 for each chromosome; colored hash marks represent aligned Rmaps and annotated by tallies of coverage (X) and total mass (Mb). Green box (21,500,000–24,800,000 bp) highlights a 3.3 Mb region with sparse Rmap alignments. (PDF 10574 kb)

Additional file 3: Table S1.

Tabulation of discordances between BtOM1.0 and UMD3.1. (XLSX 399 kb)

Additional file 4: Table S2.

Tabulation of discordances between BtOM1.0 and Btau4.6. (XLSX 625 kb)

Additional file 5: Figure S3.

Circos plot of the discordances between BtOM1.0 optical maps and the in silico maps of the Btau4.6 sequence assembly. Tracks A, B, C, D, E, and F represent the six categories of discordances: insertions (INSs), deletions (DELs), complex multiple events including insertion/deletion/inversion etc. (COMPLEXs), missing restriction sites (MCs), extra restriction sites (ECs), and large inversions or translocations (ITs). (EPS 3642 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Zhou, S., Goldstein, S., Place, M. et al. A clone-free, single molecule map of the domestic cow (Bos taurus) genome. BMC Genomics 16, 644 (2015). https://doi.org/10.1186/s12864-015-1823-7

Keywords

  • Sequence Assembly
  • Radiation Hybrid
  • Bovine Genome
  • Genome Sequence Assembly
  • Scalable Vector Graphic