Over the past two decades more than 20 substantial genetic maps have been published for different Brassica species, but little concerted effort has been made to align maps from different populations. We have collated both published and previously unpublished genome-wide genotype data for sequence-tagged RFLP and SSR markers scored on three widely used Brassica napus populations of doubled haploid lines (BnaSNDH, BnaSGDH and BnaDYDH).
Constituent genotype matrices for each of the 19 linkage groups (LGs) were first combined to generate a consolidated genetic map for each population. Integration of component genetic maps involved selection either of bridge markers shared between populations or of markers with the highest information content to represent each unique mapping locus (bin). The skeleton bin maps for the three populations were then combined to generate an integrated map for each LG, comparing two different approaches, one encapsulated in JoinMap and the other in MergeMap. JoinMap made use of the full set of available genotype scores whilst MergeMap made use of the marker orders and cM distances of the component maps. Although the performance of MergeMap depends on the quality and accuracy of marker order within component maps, this approach has been shown to outperform JoinMap in terms of both accuracy and running time based on simulated data, and has been used successfully to construct integrated maps in barley and cowpea.
In the present study, a relatively low proportion of marker loci (20.2%) were common to at least two populations. This may not provide sufficient information to overcome a few cases of uncertainty in locus order that were present in the component maps (e.g., between BnaSGDH and BnaDYDH for A04 and A07, and between BnaDYDH and the other two maps for C06, Additional File 1). However, for the purpose of map alignment/integration, the consistency of order among common markers between individual maps appeared to be more important than simply the number of shared loci. Our results demonstrate that the marker order was generally well conserved (i.e., a high level of collinearity) in the component maps, which provided a good foundation for the subsequent map integration analyses. Indeed, both JoinMap and MergeMap generated integrated maps with good consistency in marker order (measured by Spearman's rank correlation coefficient r) compared with component population-specific maps for most LGs (JoinMap, r > 0.90 for all three pairwise comparisons for 11 LGs; MergeMap, r > 0.95 for the three pairwise comparisons for all 19 LGs). MergeMap improved the marker order consistency for some LGs where JoinMap performed relatively poorly (e.g., A07, A08, C05 and C06).
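The order-consistency measure used above can be illustrated with a minimal sketch. It computes Spearman's rank correlation between the positions of markers shared by an integrated map and a component map. The function names, the dictionary layout (marker name mapped to cM position) and the marker data are assumptions made for illustration; they are not taken from the study or from either software package.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("all shared markers map to the same position")
    return cov / (vx * vy) ** 0.5


def spearman_shared(map_a, map_b):
    """Spearman's r over the markers present in both maps (name -> cM)."""
    shared = sorted(set(map_a) & set(map_b))
    if len(shared) < 3:
        raise ValueError("too few shared markers to compare marker order")
    pos_a = [map_a[m] for m in shared]
    pos_b = [map_b[m] for m in shared]
    return pearson(average_ranks(pos_a), average_ranks(pos_b))


# Hypothetical positions (cM) for one linkage group.
integrated = {"m1": 0.0, "m2": 5.0, "m3": 9.0, "m4": 14.0, "m5": 20.0}
component = {"m1": 0.0, "m2": 4.0, "m4": 8.0, "m3": 10.0, "m6": 15.0}
r = spearman_shared(integrated, component)  # m3 and m4 are swapped
```

Only shared markers contribute, so a high r reflects conserved order among bridge markers and says nothing about markers unique to one map.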
There may be several reasons why JoinMap appeared to perform relatively poorly for some LGs. These include the low number of shared 'bridge' markers between component maps, which may hide underlying conflicts in genotype ordering that are accessible to JoinMap but not used by MergeMap. Resolving such conflicts in marker order is relatively straightforward for MergeMap, as it represents the component maps as directed acyclic graphs (DAGs) and combines them into a single directed graph according to their shared vertices. Any ordering conflict between individual maps results in cycles in the combined graph. MergeMap then resolves the cycles (conflicts) by identifying and eliminating a small number of marker occurrences from some of the maps after weighting marker order differences. MergeMap requires only the marker order and cM distances of the component maps, rather than the original genotype scores of the individual populations. Thus it may be possible for consistent errors in the marker order or interval lengths in a majority of component maps to be incorporated into the integrated maps. However, in this study we can be reasonably confident that the component maps were a reliable representation of B. napus chromosomes, since maps generated from independent populations and in different laboratories showed similar marker order. MergeMap was therefore expected to produce a relatively reliable marker order in the integrated map. In contrast, JoinMap is constrained by its need to resolve a consistent marker order in the integrated map based on a limited number of mean recombination frequencies and combined LOD scores. For both methods, performance deteriorates as the degree of marker order inconsistency between individual maps increases. Establishing the thresholds of such inconsistencies will be important for more extensive map integration where larger numbers of maps and/or reduced numbers of bridge markers are available.
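The graph view of ordering conflicts can be illustrated with a short sketch. It is not MergeMap's algorithm: it only combines marker orders into one directed graph and reports markers that cannot be placed because they lie on, or downstream of, a cycle. MergeMap's weighted removal of marker occurrences is not reproduced. All function names and marker names are hypothetical.

```python
from collections import defaultdict, deque


def build_graph(orders):
    """Combine marker orders into one directed graph.

    Each component map contributes an edge between every pair of
    adjacent markers; shared marker names become shared vertices.
    """
    nodes = set()
    edges = defaultdict(set)
    for order in orders:
        nodes.update(order)
        for a, b in zip(order, order[1:]):
            edges[a].add(b)
    return nodes, edges


def topological_merge(orders):
    """Return (merged_order, unplaced_markers) using Kahn's algorithm.

    If the combined graph is acyclic, every marker is placed and
    unplaced_markers is empty. Markers on or downstream of a cycle
    (an ordering conflict between maps) are returned as unplaced.
    """
    nodes, edges = build_graph(orders)
    indegree = {n: 0 for n in nodes}
    for a in list(edges):
        for b in edges[a]:
            indegree[b] += 1
    queue = deque(sorted(n for n in nodes if indegree[n] == 0))
    merged = []
    while queue:
        n = queue.popleft()
        merged.append(n)
        for b in sorted(edges.get(n, ())):
            indegree[b] -= 1
            if indegree[b] == 0:
                queue.append(b)
    return merged, nodes - set(merged)


# Two hypothetical maps that agree: the graph is acyclic.
consistent = [["m1", "m2", "m4"], ["m1", "m3", "m4"]]
order_ok, unplaced_ok = topological_merge(consistent)

# Two hypothetical maps that disagree on m2/m3: the graph has a cycle.
conflicting = [["m1", "m2", "m3"], ["m1", "m3", "m2"]]
order_bad, unplaced_bad = topological_merge(conflicting)
```

In the second case m2 and m3 form a cycle, which is the situation that a conflict-resolution step would need to break by dropping one occurrence of a marker from one of the maps.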
Furthermore, one should note that there will always be conflicting markers among the component maps to be merged (Table 4). These conflicts in marker order could arise from genome structural variation (deletion, inversion and translocation) between populations for some LGs, or from mapping errors. Thus, low correlations between the integrated map and a particular population-specific map, along with good correlations between the integrated map and the other two component maps (Tables 3 and 4), could be indications of genome rearrangements in one of the populations. Further investigation of the dot plots (Figure 2 and Additional File 4) may identify the event(s) that create such marker order conflicts.
As part of the pre-processing of genotype data prior to map integration, we masked genotype scores by eliminating single data points where a single locus was flanked by a double crossover. This process provides more consistent genetic lengths for specific linkage groups, and more realistic lengths between adjacent crossovers that represent exchange of large chromosomal regions. This process may also eliminate some actual genetic exchanges. However, since these would be short, they will have only a small effect on the final map. Following this procedure a degree of map inflation still remained compared with the maps published previously for BnaSNDH [7, 16] and BnaDYDH [19, 25]; such inflation is often encountered when large numbers of markers are employed, due to the cumulative effect of the low background error rate. Any overestimation of genetic length is incorporated into integrated maps calculated by MergeMap. In contrast, JoinMap makes use of all available pairwise recombination frequencies and LOD scores, and so LG lengths were closer to expectation and appeared more reliable, with good agreement with previously published component maps. In addition, JoinMap was also able to resolve a greater number of unique marker loci across all LGs, increasing the number of loci by 22.8% compared with MergeMap.
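The masking step can be sketched as follows for a single doubled haploid line. This is an illustrative simplification, not the procedure used in the study: it assumes genotype scores are coded "A" or "B" with "-" for missing data, that markers are already in map order, and that only the immediately adjacent scored loci are inspected. All names are hypothetical.

```python
MISSING = "-"


def mask_singletons(scores):
    """Mask scores that imply a double crossover around a single locus.

    A score is set to missing when both immediate neighbours are scored,
    agree with each other, and differ from the locus in between.
    Decisions are based on the original scores, not on already masked ones.
    """
    masked = list(scores)
    for i in range(1, len(scores) - 1):
        left, mid, right = scores[i - 1], scores[i], scores[i + 1]
        if MISSING in (left, mid, right):
            continue  # cannot judge a locus without both scored neighbours
        if left == right and mid != left:
            masked[i] = MISSING
    return masked


def count_crossovers(scores):
    """Count allele switches between consecutive scored loci."""
    called = [s for s in scores if s != MISSING]
    return sum(1 for a, b in zip(called, called[1:]) if a != b)


# Hypothetical line: the lone "B" at the third marker is masked,
# the genuine switch to "B" near the end is kept.
line = list("AABAABBB")
cleaned = mask_singletons(line)
before = count_crossovers(line)
after = count_crossovers(cleaned)
```

Here the apparent crossover count drops from three to one, which is how this kind of masking shortens inflated map lengths while leaving longer exchanged segments untouched.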
The heuristic method employed in MergeMap greatly enhances the speed of map integration compared with the regression mapping algorithm employed in JoinMap, especially where large genotype matrices are used. Indeed JoinMap is limited by the matrix size for dense maps, and so the problem needs to be broken down into sub-problems, either by bin mapping as we have done here, or by taking overlapping sub-sections of LGs, which does not provide an ideal solution. Pragmatically, where accurate estimates of genetic distances are not the priority, MergeMap provides a rapid and relatively reliable solution, especially where component maps have been generated with consistently low error rates for marker scores. The MergeMap algorithm has been successfully applied for map integration where either a large number of genetic markers are involved, such as high-throughput SNP genotyping, or where genotyping data were not available for many published genetic maps. However, JoinMap still performed well in map integration based on our map construction procedure for the three B. napus DH populations.
Overall, the BnaWAIT_01_2010a integrated map generated by the JoinMap method included 5,162 markers, compared with 1,317 markers in the previous reference BnaSNDH map of Parkin et al. and 866 markers in the BnaDYDH map reported by Delourme et al. This increased the marker density by 3.3 and 5.8 fold, respectively. Furthermore, the nine LGs representing C genome chromosomes contain 11.6% more markers and 11.8% more loci than the ten LGs representing A genome chromosomes in the BnaWAIT_01_2010a map. This is in close agreement with the estimated 16% larger size of the C genome [53, 54].
The BnaWAIT_01_2010a integrated map enabled us to test existing models of collinearity between Arabidopsis and Brassica. This analysis was based on twice as many markers where sequence similarity to Arabidopsis could be identified, compared with the BnaSNDH map of Parkin et al. We identified 103 conserved collinearity blocks in B. napus relative to Arabidopsis. These corresponded to almost all 97 B. napus blocks reported in the BnaSNDH_03_2005a map, although we did not resolve 17 short blocks previously identified based solely on RFLP markers. Although the same homology hits were identified between the Arabidopsis genome and 50 RFLPs within these 17 short blocks, the criteria to define a collinearity block (i.e., four homologous loci with at least one shared locus within every 5 cM in B. napus and at least one shared locus within every 1 Mb in Arabidopsis) were not met in our study. Moreover, these short blocks represented <5.0% of the total mapped length of the BnaSNDH_03_2005a map. Five previously unreported collinearity blocks were identified in our study. However, these new blocks covered only 14.5 cM of genetic length in total, aligned to 7.0 Mb in Arabidopsis chromosomes 3 and 5. We further established that the synteny order of the 48 collinearity blocks within the A genome of B. napus in BnaWAIT_01_2010a is essentially the same as that established in B. juncea based on intron polymorphism (IP) markers. This indicates that synteny order is highly conserved in the A genomes of B. juncea and B. napus.
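The block criteria quoted above can be approximated with a simple chaining rule, sketched below. This is one possible reading of the criteria, not the procedure used in the study: consecutive loci on a linkage group are chained while they hit the same Arabidopsis chromosome, lie no more than 5 cM apart in B. napus and no more than 1 Mb apart in Arabidopsis, and chains of at least four loci are reported. The `Locus` type, the function name and all data values are hypothetical.

```python
from typing import NamedTuple


class Locus(NamedTuple):
    name: str
    cm: float      # position on the B. napus linkage group (cM)
    at_chr: int    # Arabidopsis chromosome of the homology hit
    at_mb: float   # Arabidopsis position of the homology hit (Mb)


def find_blocks(loci, max_cm_gap=5.0, max_mb_gap=1.0, min_loci=4):
    """Group loci on one linkage group into candidate collinearity blocks."""
    ordered = sorted(loci, key=lambda locus: locus.cm)
    blocks, current = [], []
    for locus in ordered:
        if current:
            prev = current[-1]
            linked = (
                locus.at_chr == prev.at_chr
                and locus.cm - prev.cm <= max_cm_gap
                and abs(locus.at_mb - prev.at_mb) <= max_mb_gap
            )
            if not linked:
                # Close the running chain; keep it only if long enough.
                if len(current) >= min_loci:
                    blocks.append(current)
                current = []
        current.append(locus)
    if len(current) >= min_loci:
        blocks.append(current)
    return blocks


# Hypothetical loci: a-d form one block; e is isolated by a 12 cM gap;
# f-h are collinear but too few to qualify.
loci = [
    Locus("a", 0.0, 3, 1.0), Locus("b", 2.0, 3, 1.5),
    Locus("c", 4.5, 3, 2.2), Locus("d", 8.0, 3, 3.0),
    Locus("e", 20.0, 3, 3.5),
    Locus("f", 21.0, 5, 10.0), Locus("g", 22.0, 5, 10.4),
    Locus("h", 23.0, 5, 10.9),
]
blocks = find_blocks(loci)
block_names = [[locus.name for locus in block] for block in blocks]
```

The three-locus chain f to h shows why short blocks supported by few markers can fail to be resolved under a four-locus minimum even when the homology hits themselves are reproduced.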
We attempted to align 3,837 primer sequence pairs for the SSR markers to the Arabidopsis chromosomes to identify homology with the resultant target 'virtual PCR product' of primers. However, <2% of the primer pairs had homology in Arabidopsis, of which only 50% agreed with those identified using the corresponding SSR clone sequences. This suggests that future comparative studies within the Brassicaceae based solely on SSR primer sequences are unlikely to provide useful information where sequences have diverged over similar time scales.
The increased marker density provided by the integrated map is a valuable resource that increases the availability of markers in regions of interest, thus assisting in fine mapping. It also provides additional information for comparative mapping studies, e.g., to detect potential genome rearrangements in some populations. Furthermore, the increase in density of sequence-tagged markers and the availability of draft genome sequence scaffolds enabled us to carry out a preliminary investigation of the relationship between genetic and physical distances in the Brassica A genome. This indicated that the chiasmata were not evenly distributed within chromosomes, and that there was considerable variation in the pattern of crossovers between chromosomes. Many studies have suggested that the distribution of meiotic crossover events along chromosomes in plants and other species is non-random [62–66]. Non-random distributions of crossover rates have been reported to be correlated with several chromosomal features, including chromosome size, gene density, presence of transposable elements or heterochromatin, and distance to centromeres [67–72]. However, the underlying mechanisms affecting chiasmata distribution may be taxon-specific, and so it is important to establish any relationships within or between Brassica chromosomes and species. Within the C genome of B. oleracea, a clear difference in relationship between genetic and physical distances has been established for IDBs on C6. The analysis we have carried out is preliminary and any mechanistic understanding will require more complete genome sequence scaffold data that include details of the distribution of repetitive DNA and of the degree of chromatin condensation. In addition, it may be necessary to select additional markers that represent the full length of individual chromosomes. Based on complete genome sequence data, Drouaud et al. have been able to resolve details of the non-random distribution of chiasmata in relation to heterochromatic knobs and other chromosomal features on Arabidopsis chromosome 4. Access to larger populations and more reliable sequence-tagged mapping methods (e.g., high-density SNP mapping) is likely to increase the resolution and understanding of the basis of variation in recombination frequency in Brassica.
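The relationship between genetic and physical distance can be summarised as a local recombination rate in cM per Mb between adjacent anchored markers. The sketch below assumes each marker has both a genetic position (cM) and a physical position (Mb) on the same chromosome; the function name and all values are hypothetical and do not come from the study.

```python
def recombination_rates(markers):
    """Return (left, right, cM_per_Mb) for adjacent anchored markers.

    markers: iterable of (name, cM, Mb) tuples for one chromosome.
    Markers are ordered by physical position; intervals with zero
    physical length are skipped because the rate is undefined there.
    """
    ordered = sorted(markers, key=lambda m: m[2])
    rates = []
    for (n1, cm1, mb1), (n2, cm2, mb2) in zip(ordered, ordered[1:]):
        span_mb = mb2 - mb1
        if span_mb <= 0:
            continue
        rates.append((n1, n2, (cm2 - cm1) / span_mb))
    return rates


# Hypothetical markers: the first interval recombines eight times
# more often per Mb than the second.
markers = [("m1", 0.0, 0.0), ("m2", 4.0, 1.0), ("m3", 5.0, 3.0)]
rates = recombination_rates(markers)
```

A negative rate in such output would indicate that genetic and physical orders disagree for that interval, which is worth checking against scaffold orientation before drawing any biological conclusion.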
We also attempted to anchor the remaining unanchored A genome scaffolds onto LGs based on the B. napus integrated map, and this anchored three additional scaffolds. Given the genome structure of Brassica, some scaffolds will lie in repeat-rich or duplicated regions, which makes their LG assignments difficult to resolve.