A second generation radiation hybrid map to aid the assembly of the bovine genome sequence

Background Several approaches can be used to determine the order of loci on chromosomes and hence develop maps of the genome. However, all mapping approaches are prone to errors arising either from technical deficiencies or from a lack of statistical support to distinguish between alternative orders of loci. The accuracy of the genome maps could be improved, in principle, if information from different sources were combined to produce integrated maps. The publicly available bovine genomic sequence assembly with 6× coverage (Btau_2.0) is based on whole genome shotgun sequence data and limited mapping data; however, it is recognised that this assembly is a draft that contains errors. Correcting the sequence assembly requires extensive additional mapping information to improve the reliability of the ordering of sequence scaffolds on chromosomes. The radiation hybrid (RH) map described here has been contributed to the international sequencing project to aid this process.
Results An RH map for the 30 bovine chromosomes is presented. The map was built using the Roslin 3000-rad RH panel (BovGen RH map) and contains 3966 markers including 2473 new loci in addition to 262 amplified fragment-length polymorphisms (AFLP) and 1231 markers previously published with the first generation RH map. Sequences of the mapped loci were aligned with published bovine genome maps to identify inconsistencies. In addition to differences in the order of loci, several cases were observed where the chromosomal assignment of loci differed between maps. All the chromosome maps were aligned with the current 6× bovine assembly (Btau_2.0) and 2898 loci were unambiguously located in the bovine sequence. The order of loci on the RH map for BTA 5, 7, 16, 22, 25 and 29 differed substantially from the assembled bovine sequence. Of the 2898 loci unambiguously identified in the bovine sequence assembly, 131 mapped to different chromosomes in the BovGen RH map.
Conclusion Alignment of the BovGen RH map with other published RH and genetic maps showed higher consistency in marker order and chromosome assignment than with the current 6× sequence assembly. This suggests that the bovine sequence assembly could be significantly improved by incorporating additional independent mapping information.


Background
The global importance of cattle production has resulted in considerable efforts to detect the genes controlling variations in economically important traits. This task is greatly facilitated by the availability of molecular markers ordered along chromosomes. In the last decade a number of bovine genome maps have been published, many of them based on genetic linkage between markers [1][2][3]. A major disadvantage of linkage maps is that only polymorphic loci can be included, whereas RH maps can be constructed using sequence information from nonpolymorphic loci. Therefore, RH maps potentially contain more coding loci than linkage maps, which facilitates comparative mapping across species. In contrast to linkage maps, which exploit the frequency of natural recombination between markers to calculate distances and orders of markers, RH maps are constructed using the probability of breaks between markers induced by radiation. Several whole genome radiation hybrid (WGRH) panels are available for cattle that have been used to construct RH maps [4][5][6][7][8]. These RH maps have been used to create comparative maps between bovine and human chromosomes through the alignment of the loci derived from coding sequences [9][10][11][12][13][14][15]. The RH maps can also be integrated with other bovine physical maps, such as BAC maps constructed by fingerprinting methods, by identifying the marker loci within, for example, BAC end sequences [16,17]. This additional mapping information facilitates the ordering of fingerprint contigs and the construction of physical BAC maps covering whole chromosomes. Such physical BAC maps provide a valuable starting point for genome sequencing [18][19][20][21]. Fingerprint contig BAC maps have been constructed for cattle using clones from the INRA BAC library [22] and the CHORI-240 BAC library [23]. The ultimate map for a species is the correctly assembled genome sequence.
The bovine genome sequencing project started in 2003 and uses a combination of whole genome shotgun sequences (WGS) and sample sequencing of a minimum tiling path of BAC clones spanning the genome. The current, publicly available, bovine genomic sequence (Btau_2.0) has 6-fold genome coverage from WGS assembled into scaffolds and aligned on the chromosomes using limited mapping data. The use of RH and linkage map information [24] would greatly improve the genome sequence assembly [25].
Here we report a second generation RH map of the bovine genome which can be used to improve the construction of an integrated bovine genomic map. Sequences of the markers used to construct the map were aligned with the MARC 2004 linkage map and the Illinois-Texas (ILTX 2005) RH map [15] to investigate discrepancies. Loci that were unambiguously placed in all the maps were then aligned with the Btau_2.0 sequence to identify potential problems in the current sequence assembly.

Results
Radiation hybrid map
A total of 2735 markers were added to those on the first-generation RH map of Williams et al. [7], of which 2473 are newly mapped loci and 262 are previously reported AFLP markers [26], giving a final total of 3966 markers (Table 1). The majority of the new markers, 1999, are within genes, 1072 are microsatellite loci, 262 are AFLP markers, 376 are BAC end sequences and 257 are from EST sequences that do not show convincing similarity to the annotated bovine sequence. The RH chromosome maps constructed from these data can be viewed and information downloaded from the ArkDB database [27].
The total length of the whole genome RH map, including all bovine autosomes and the X chromosome, is 760 Rays (R). The map of BTA 28 is the shortest at 1141 cR and the longest is BTA 7 (4408 cR). The average marker interval over the whole genome is 19 cR, ranging from 12 cR (BTA 29) to 29 cR (BTA 20). Distance comparisons between common markers on the RH map, the MARC 2004 linkage map and the bovine sequence suggest that, on average, 1 cR on the BovGen RH map is equivalent to 0.04 cM and 23 Kbp, respectively, although this varies considerably across the genome.
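As a rough check on these summary figures, the arithmetic can be sketched as follows. The constants are the whole-genome averages quoted above; the function names are illustrative, and dividing the total length by the marker count (rather than by the exact number of intervals) is a simplifying assumption. Because the conversion factors vary considerably across the genome, the results are only order-of-magnitude guides.

```python
# Whole-genome averages quoted in the text; local values vary considerably.
TOTAL_LENGTH_RAYS = 760   # total RH map length in Rays (1 R = 100 cR)
N_MARKERS = 3966          # markers placed on the map
CM_PER_CR = 0.04          # approximate linkage distance per cR
KBP_PER_CR = 23           # approximate physical distance per cR


def rays_to_cr(rays):
    """Convert Rays to centiRays."""
    return rays * 100.0


def average_interval_cr(total_rays, n_markers):
    """Approximate mean spacing between markers in cR (assumption: length / markers)."""
    return rays_to_cr(total_rays) / n_markers


def cr_to_cm(cr):
    """Convert cR to cM using the genome-wide average factor."""
    return cr * CM_PER_CR


def cr_to_kbp(cr):
    """Convert cR to Kbp using the genome-wide average factor."""
    return cr * KBP_PER_CR


if __name__ == "__main__":
    interval = average_interval_cr(TOTAL_LENGTH_RAYS, N_MARKERS)
    print(f"average interval: {interval:.1f} cR")
    print(f"one average interval: {cr_to_cm(interval):.2f} cM, {cr_to_kbp(interval):.0f} Kbp")
```

The computed mean interval agrees with the 19 cR reported for the whole genome.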

Comparison with the ILTX 2005 RH map
There are 160 marker loci in common between the BovGen RH map described here and the Illinois-Texas (ILTX 2005) RH map [15]. All of these common loci were assigned to the same chromosomes on both maps (see Additional File 1).
Three chromosomes (BTA 19, 28 and 29) cannot be assessed for consistency of marker order between the ILTX 2005 and BovGen RH maps because they have no markers in common. Of the remaining 27 chromosomes, 19 have a marker order on the ILTX 2005 map that is consistent with the BovGen RH map. For example, the BovGen RH map of chromosome 14 has 9 markers in common with the ILTX 2005 map and the order agrees between maps (Figure 1). A further 7 of the 27 chromosomes each have one inconsistently positioned marker among their 3 to 9 corresponding loci. On BTA 8 there are marker order discrepancies between the maps involving 2 out of the 11 corresponding markers.
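One way to quantify statements such as "one marker inconsistently positioned" is to count the smallest number of common markers that must be set aside before the two maps agree in order. The sketch below does this with a longest-increasing-subsequence calculation. It is an illustration only, not the procedure used to produce the comparisons reported here; the function names and marker labels are hypothetical, and each marker is assumed to occur once per map.

```python
from bisect import bisect_left


def lis_length(seq):
    """Length of the longest strictly increasing subsequence of seq."""
    tails = []  # tails[k] = smallest tail of an increasing subsequence of length k + 1
    for x in seq:
        i = bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)
        else:
            tails[i] = x
    return len(tails)


def misplaced_markers(order_a, order_b):
    """Minimum number of common markers to drop so that both maps agree in order.

    Chromosome orientation is arbitrary, so map B is also tried in reverse.
    """
    common = set(order_a) & set(order_b)
    rank_in_a = {m: i for i, m in enumerate(x for x in order_a if x in common)}
    ranks = [rank_in_a[m] for m in order_b if m in common]
    best = max(lis_length(ranks), lis_length(ranks[::-1]))
    return len(ranks) - best


if __name__ == "__main__":
    map_a = ["M1", "M2", "M3", "M4", "M5"]   # hypothetical marker names
    map_b = ["M1", "M3", "M2", "M4", "M5"]   # M2 and M3 swapped
    print(misplaced_markers(map_a, map_b))   # one marker must be set aside
```

A count of zero corresponds to a chromosome whose common markers are in full agreement, including the case where one map is simply drawn in the opposite orientation.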

Comparison with MARC 2004 linkage map
There are 885 marker loci in common between the BovGen RH and the MARC 2004 linkage map [3], which allows a detailed comparison of map order and chromosome assignment. Inconsistencies in chromosomal assignment are found for 5 of these 885 loci (see Additional File 2). In all these cases only individual markers are involved. Figure 2). Despite the similarity, the marker order suggested by the MARC 2004 map is inconsistent with the multipoint BovGen RH data. If the RH data are forced into the order of the MARC 2004 map, a much lower probability for the map is obtained. Thus, additional information is required to determine the true order of these markers.
On a further 13 chromosomes, minor discrepancies between these maps were observed. On BTA 3, 5, 8, 9, 12, 17, 19, 22 and X the order of markers is essentially the same, but with a number of individual markers at different positions. For BTA 1, 2, 6 and 26 differences were observed involving the orientation of groups of markers, but with a conserved order of markers within the group. For example, on BTA 26 the marker order is in general consistent between the BovGen RH and the MARC 2004 linkage map; however, two small groups of linked markers, 26_A (BMS882, TGLA429, BMS2567 and BM6041) and 26_B (MAF36, ILSTS091, MAF92 and BM804), have the same marker order in both maps but are inverted, with only one marker (BM7237) at a divergent position (Figure 3).
On four chromosomes major inconsistencies are observed where groups of linked markers map to different chromosomal positions (BTA 7, 29) or where the order of markers differs within several marker groups (e.g. BTA 7, 15 and 20). On BTA 7, for example, the positions of two groups of linked markers, 7_A (limited by the markers CSKB071 and TGLA303) and 7_B (limited by the markers BM6105 and BM2607), are exchanged. In addition, the group 7_A is in a different orientation in the two maps, while the marker order in 7_B is inconsistent (see Figure 4). Nevertheless, these discrepancies only involve about a quarter of the chromosome, and 12 out of the 38 common markers. The map positions of the other 26 markers are in close agreement between the two maps. These differences could be further investigated by comparison with additional mapping information (see below).

Comparison with the 6× bovine assembly
Of the 3966 markers successfully included in the RH map, 2898 could be unequivocally assigned to a position in the Btau_2.0 bovine sequence. Of these, 2767 were assigned to the same chromosome on the BovGen RH map and in the sequence, but 131 mapped to different chromosomes (Additional File 3). On seven chromosomes inconsistent assignments involving groups of three or more markers were observed (Table 2).
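The bookkeeping behind such a comparison can be sketched as a simple tally of concordant and discordant chromosome assignments. The function and the input dictionaries below are hypothetical illustrations rather than the pipeline used here; the three discordant loci in the example are those described later for BTA 8 versus chromosome 5 of the assembly, and LOCUS_A and LOCUS_B are invented placeholders.

```python
from collections import Counter


def compare_assignments(rh_chrom, seq_chrom):
    """Tally loci by agreement between RH map and sequence assembly.

    rh_chrom, seq_chrom: dicts mapping locus name -> chromosome label.
    Loci absent from seq_chrom are counted as not placed in the sequence.
    """
    tally = Counter()
    discordant = {}
    for locus, rh in rh_chrom.items():
        seq = seq_chrom.get(locus)
        if seq is None:
            tally["unplaced"] += 1
        elif seq == rh:
            tally["same"] += 1
        else:
            tally["different"] += 1
            discordant[locus] = (rh, seq)
    return tally, discordant


if __name__ == "__main__":
    rh = {"PTK2B": "8", "BZ948637": "8", "B4GALT1": "8", "LOCUS_A": "14", "LOCUS_B": "3"}
    seq = {"PTK2B": "5", "BZ948637": "5", "B4GALT1": "5", "LOCUS_A": "14"}
    tally, discordant = compare_assignments(rh, seq)
    print(dict(tally))
    print(discordant)
```

Applied to the full dataset, the "same" and "different" counts would correspond to the 2767 and 131 loci reported above, which together make up the 2898 unambiguously placed loci.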
On all but two chromosomes (BTA 9 and 14) there were many differences between the map order and the sequence: on many chromosomes large discrepancies involving groups of linked markers and/or large numbers of individual loci were seen, particularly on chromosomes 5, 7

Discussion
The ability to determine the order of close markers on genome maps differs between approaches, and all approaches, including the assembly of a whole genome sequence, are prone to errors. In some cases insufficient information is available to assign the correct order or positioning of loci, while data errors can introduce distortions in the maps. The ultimate genome map of a species is the correctly ordered DNA sequence. Achieving the correct sequence assembly uses several sources of information. Sequence information from other species, including the human genome, could be used as a template, but this approach should be treated with extreme caution as species-specific variations are known [28]. Therefore, direct sequence information is used for the local assembly of shotgun sequence reads into contigs, and these contigs are then assembled into scaffolds using additional information, such as overlapping clones and sequences from paired clone ends. The ordering of these scaffolds on chromosomes and assembly of the final sequence relies on additional mapping information, including BAC fingerprint contig maps, linkage maps and RH maps. In this paper we describe an RH map with almost 4000 mapped loci which will contribute to the assembly of the bovine genome sequence.

Comparison with other linkage and RH maps
The consistency in ordering of common loci can be assessed across different maps; however, it is important that the information used when assembling the maps is independent, as circular arguments can give a false measure of agreement. The approach of, for example, Itoh et al. [8] was to use the linkage map as a template for their RH map; in contrast, we did not use any prior information to construct the RH map presented here. This was because the aim was to assemble the most likely map using completely independent data and so not to propagate potential errors across different maps. Resolving inconsistencies between maps often requires the use of additional independent evidence such as BAC FPC mapping data or cytogenetic (fluorescent in-situ hybridisation, FISH) information. We carried out an alignment of the BovGen RH map with the other available bovine genome maps and the Btau_2.0 sequence assembly, but only after the RH maps had been constructed. Because this approach relies on only one source of information, it may not result in the "best" possible map; however, it avoids bias, and the resulting independent map can then be used to develop a combined map which carries a measure of map confidence based on similarities and differences between maps.
The BovGen and ILTX 2005 RH maps appear to be more consistent with each other than with the MARC 2004 linkage map. Some inconsistencies between linkage and RH maps may be due to the different mapping approaches. However, the observation of the apparently higher consistency between the RH maps must be treated with care as the BovGen RH map has fewer loci in common with the ILTX 2005 map than with the MARC 2004 linkage map, and so fewer discrepancies could be detected. The ILTX 2005 map was constructed on the basis of the first-generation map (ILTX 2004) by increasing the marker density and a subsequent rigorous removal of markers which did not pass a quality control procedure [15]. In this process a significant number of markers common to both the BovGen RH and the ILTX 2004 map were removed and as a

Comparison with the sequence assembly
Sequence similarity search algorithms used to align maps with Btau_2.0 have a considerable risk of errors as they may also detect gene duplications or similar motifs in different genes. To minimize this problem we used very stringent parameters for minimum homology and maximized the required length of overlap between sequences. In addition, sequence matches were assessed manually. Thus the loci we aligned between the different maps and the bovine sequence carry a very high probability of correctly assigned homology. While the BovGen RH map is in general agreement with the ILTX map and the MARC 2004 map, there is poor agreement with the Btau_2.0 sequence at specific chromosomal regions. In such regions, e.g. those described above on chromosomes 7, 25 and 29, the assembled Btau_2.0 sequence is most consistent with the linkage map. This is not surprising because, among other sources of information, the MARC 2004 map was used to order the sequence scaffolds in Btau_2.0. Recalculating the BovGen data for these chromosomes and forcing the markers into the order in which they appear in the sequence assembly significantly increases the map length and reduces the probability, showing that our data are not consistent with the sequence order. Further information must be generated to resolve such inconsistencies.

Assignment of markers to different chromosomes
A problem in the genome assembly is that of erroneous assignments of sequence scaffolds. By comparing assignments among the different RH and linkage maps [1,39,40] and also using comparative human [30,31][41][42][43][44][45] or mouse [29] information, it seems likely that the assignment in the bovine assembly is most often at fault (Table 2). For example, the markers PTK2B, BZ948637 and B4GALT1 (Table 2, case 4) are closely linked on the BovGen RH map of BTA 8 and the linkage map of Barendse et al. [1], which also places these genes on BTA 8. This is also consistent with data from Fiedorek & Kay [29], who mapped PYK2B (alias PTK2B or Fadk) on murine chromosome 15, and Inazawa et al. [30], who mapped the gene on human chromosome 8, at positions which share conservation of synteny with BTA 8 [15]. However, these marker loci are placed on chromosome 5 in the Btau_2.0 sequence assembly. All three markers are located on a single sequence scaffold (chr5.80), suggesting that the chromosomal assignment of this scaffold is likely to be incorrect.
A group of neighbouring markers formed by KIAA0284, Q9Y4F5, KNS2 and BTBD6 was assigned to BTA 11 on the BovGen RH map; however, this assignment is not consistent with other mapping data (Table 2, case 5). The human homologues of these loci are located on human chromosome 14 [31], suggesting that this group is correctly assigned in the Btau_2.0 sequence to chromosome 21 and that in this case the BovGen RH assignment is incorrect. Nevertheless, the linkage of this group to other markers on BTA 11 is convincing, with LOD linkage values up to 13.8 between the extreme marker KIAA0284 and the neighbouring markers on the BovGen RH map. If this marker group is tested with the markers located on BTA 21 using the BovGen RH datasets, it shows no linkage. In the Btau_2.0 assembly this marker group is at an extreme telomeric position, which suggests that the statistical support for this assignment is weak. This chromosomal assignment may have been based on the expected position derived from the supposed conservation of synteny between human and cattle chromosomes and should be tested using independent evidence.
The markers BZ850749, CC517527 and CC471629 are assigned to BTA 14 on the BovGen RH map and to BTA 25 in the Btau_2.0 sequence assembly (Table 2, case 6). These markers are derived from BAC end sequences of clones from the CHORI-240 library and are not present on other maps. All these markers are assigned to the scaffold Chr25.84 and are in a chromosomal region of the assembly with a low density of corresponding markers. In contrast, on the BovGen RH map the markers in the same region are at a higher density. This suggests that these markers are more tightly linked on the BovGen RH map. No further information is available to resolve this inconsistency.
Independent information is essential to produce the best maps of the bovine genome and to assemble the most accurate sequence. In addition to the RH mapping approach and linkage mapping that have been discussed here the refinement of the sequence should use additional sources of information such as BAC FPC maps, comparative mapping, fluorescent in situ hybridization, and somatic cell hybrid mapping.

Conclusion
There is reasonable consistency between the RH map presented here, the MARC 2004 linkage map and the ILTX 2005 map. However, where the maps differ it is usually not possible to determine which order of markers is correct. Manipulating the data to make the different maps match is not productive. When the major discrepancies are removed, a number of inconsistencies with the Btau_2.0 bovine sequence assembly still remain. Using the various sources of mapping information it is possible to identify potential errors in the assembly of the current bovine genome sequence, which should be investigated further to aid the improvement of the next sequence build.
Using the information presented here it will not be possible to reach a final version of the sequence. The Btau_2.0 sequence assembly contains more than 100,000 scaffolds of which only 4409 are anchored to chromosomes using markers from the genetic map, and about half of the anchored scaffolds contain two mapped markers allowing them to be orientated. The data presented here will increase the number of scaffolds that can be assigned and orientated. Nevertheless, it will be necessary to use additional information such as fingerprinting or BAC skim data and physical maps, such as FISH based techniques, which in addition to comparative mapping data will help to finalize the assembly and yield a reliable sequence.

Methods
The primer design was carried out using dedicated software now in the public domain [32]. The software uses the nearest-neighbour method [33] to predict the complementarity of primers and secondary structures (dimers, hairpins etc.) and is able to process large numbers of sequences in batches, picking primers in designated regions. To minimize the amplification of hamster DNA contained within the RH panel cell lines, primer pairs were designed with one primer within an exon and the other within the adjacent intron or non-coding sequence.

Sequencing of ESTs
The primer design was standardized to achieve maximum uniformity in amplification conditions. Primer details are publicly available in the ArkDB database [27].

Screening of the Roslin RH panel
A total of 2473 marker loci were successfully typed on the 94 cell lines of a 3000-rad bovine/hamster RH panel as described by Williams et al. [7]. Vectors of 262 AFLP markers [26] were added to the dataset. The resulting vectors for the 3966 marker loci used (including 1231 previously mapped loci [7]) are available for download in Additional File 4.

RH data analysis
RH vectors were assigned to chromosomes by analysing two-point linkage with mapped loci [7] using RH mapper [34]. Multipoint maps were constructed using the default algorithm of the Carthagene software [35]. The initial multipoint map was improved by an iterative process of inspection of marker loci and removal and alternative addition of badly linked or disrupting loci. This process resulted in the removal of 122 loci that could not be reliably fitted into the chromosome maps with the highest probability. The best maps generated by this process were compared to the ComRad RH-map [7] and the MARC 2004 linkage map [3], and regions showing discrepancies were examined in detail to identify the presence of problem markers. Marker positions on the maps are available from the ArkDB database [27].
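To illustrate what a two-point analysis of RH vectors measures, the sketch below applies a commonly used moment estimator of the breakage frequency between two markers and converts it to a distance in cR. This is an assumption made for illustration: it is not the calculation performed by the RH mapper or Carthagene software, which use their own likelihood models, and the vector encoding ('1' retained, '0' absent, '?' untyped), function name and example vectors are hypothetical.

```python
import math


def two_point_distance(vec_a, vec_b):
    """Estimate breakage frequency (theta) and distance in cR between two markers.

    vec_a, vec_b: equal-length strings, one character per hybrid cell line:
    '1' = marker retained, '0' = marker absent, '?' = not typed (ignored).
    """
    pairs = [(a, b) for a, b in zip(vec_a, vec_b) if a in ("0", "1") and b in ("0", "1")]
    n = len(pairs)
    if n == 0:
        raise ValueError("no cell line typed for both markers")
    ret_a = sum(a == "1" for a, _ in pairs) / n   # retention frequency of marker A
    ret_b = sum(b == "1" for _, b in pairs) / n   # retention frequency of marker B
    discordant = sum(a != b for a, b in pairs)    # lines retaining exactly one marker
    denom = n * (ret_a + ret_b - 2.0 * ret_a * ret_b)
    if denom == 0:
        raise ValueError("markers are uninformative (retained in all or no lines)")
    theta = min(discordant / denom, 1.0)
    # Distance in Rays is -ln(1 - theta); multiply by 100 for centiRays.
    distance_cr = math.inf if theta >= 1.0 else -100.0 * math.log(1.0 - theta)
    return theta, distance_cr


if __name__ == "__main__":
    theta, dist = two_point_distance("1100110000", "1100100000")
    print(f"theta = {theta:.3f}, distance = {dist:.1f} cR")
```

Markers with a small estimated breakage frequency are rarely separated by radiation-induced breaks and are therefore likely to lie close together, which is the basis for assigning unmapped vectors to the chromosome of their most tightly linked mapped loci.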

Mapping of marker associated sequences against the bovine sequence assembly
EST sequences used to design the primers for mapped loci were aligned with the 6× bovine sequence assembly (Btau_2.0) using BLAST [36] and SPIDEY [37].
To filter out incorrect alignments the BLAST e-value was set to a maximum of 1e-20 and minimum percent identity to 90%. In addition, the relative length of the BLAST hit (i.e. coverage, or length of the hit divided by the length of the query sequence) had to be at least 80%. Where ambiguous alignments were observed higher stringency filters were applied (sequence similarity higher than 97.5% and coverage higher than 90%).
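The filtering thresholds above can be sketched as a two-tier filter. The thresholds are those stated in the text; the `Hit` record, the function names, the example hits and the rule that a locus is kept only when exactly one hit survives are assumptions made for illustration and do not reproduce the original pipeline, which also included manual inspection.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Minimal, hypothetical record of one BLAST hit."""
    query: str
    chromosome: str
    evalue: float
    identity: float      # percent identity
    hit_length: int      # length of the aligned hit
    query_length: int    # length of the query sequence

    @property
    def coverage(self):
        """Hit length relative to query length, in percent."""
        return 100.0 * self.hit_length / self.query_length


def passes_standard(hit):
    # e-value at most 1e-20, identity at least 90%, coverage at least 80%
    return hit.evalue <= 1e-20 and hit.identity >= 90.0 and hit.coverage >= 80.0


def passes_strict(hit):
    # higher stringency for ambiguous loci: identity > 97.5%, coverage > 90%
    return passes_standard(hit) and hit.identity > 97.5 and hit.coverage > 90.0


def unambiguous_placements(hits):
    """Return {query: hit} for loci with exactly one surviving hit (assumed rule)."""
    by_query = defaultdict(list)
    for hit in hits:
        if passes_standard(hit):
            by_query[hit.query].append(hit)
    placed = {}
    for query, kept in by_query.items():
        if len(kept) > 1:                      # ambiguous: apply the stricter filter
            kept = [h for h in kept if passes_strict(h)]
        if len(kept) == 1:
            placed[query] = kept[0]
    return placed


if __name__ == "__main__":
    hits = [
        Hit("locus_1", "chr8", 1e-50, 95.0, 450, 500),
        Hit("locus_2", "chr5", 1e-80, 99.0, 480, 500),
        Hit("locus_2", "chr12", 1e-30, 92.0, 420, 500),
    ]
    for query, hit in unambiguous_placements(hits).items():
        print(query, hit.chromosome)
```

Loci that still have more than one hit after the stricter filter are left unplaced in this sketch, which mirrors the conservative intent of only using loci that can be located unambiguously.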

Diagrammatic representation of chromosomal maps
Visual representation of map alignments was achieved using cMap [38].