In the first linkage map integration across independent mapping populations in white clover, we present a comprehensive analysis of the white clover genome based on SSR and candidate gene markers aligned to the Medicago genome, with a set of mapped molecular markers made available to the research community. This map provides markers enabling homoeologue matching among populations, and thoroughly resolves all linkage groups. Furthermore, the integrated map is a robust composite assessment of the white clover genome, being derived from component linkage maps that are very similar in marker order, genome arrangement and map size, despite being based on dissimilar populations and distinct marker sources. This work complements prior genetic linkage maps [19–22] and recent trait-focused studies [24–26], and enriches prior macrosyntenic alignments of T. repens with Medicago [22, 34, 35].
This integrated map is anchored by gene-targeted SSR markers mined from white clover GeneThresher® (TrGT) genomic DNA sequence and from ESTs. EST-derived markers exhibit less polymorphism, but have a higher probability of being directly linked to a causative gene than genomic SSRs. Repeat number in EST-SSRs is usually low, and the predominance of trinucleotide motifs is explained by the fact that length changes in arrays of other common motif lengths cause frame shifts that disrupt coding sequence [3, 55]. The white clover EST-SSR source had a preponderance of trinucleotide motifs and a mode of four repeats per array, whereas the methyl-filtered TrGT source was predominantly dinucleotide motifs, with a mode of eight repeats. Only 71% of EST-derived SSRs produced PCR products, compared with 86% and 92% from array-targeted white clover genomic libraries and from TrGT-derived SSRs in this study, respectively. Intron presence may affect the efficiency of generating amplicons from SSRs sourced from expressed sequence, as well as influencing the observed versus predicted amplicon size. Mean observed amplicon size of white clover EST-derived SSRs was 128% of the size predicted in silico, compared with 103% for TrGT-derived SSRs. Reduced amplification efficiency attributed to M13(-21) primer-based fluorophore addition has been demonstrated [57, 58], suggesting that more than 92% of the TrGT-derived SSR primer pairs are viable.
Literature on the efficiency of SSR mining from GeneThresher® methyl-filtered sequence is scarce. Gill and co-authors reported that 0.8% of sequenced GeneThresher® clones from perennial ryegrass contained SSR arrays. This contrasts with 4.4% of white clover sequences in the present study, and 7% of EST-derived sequences in Barrett et al. A species-related difference in array density has not been noted in other libraries, and may be a unique feature of the interaction of the GeneThresher® system with genome size or other factors. SSR array density in both white clover GeneThresher® and EST sequences is higher than in genomic sequence from BAC end surveys.
While BLAST results suggest methyl-filtration enriched for genic regions of the white clover genome, 61% of the SSRs were dinucleotide motif repeats. This value agrees with genomic DNA surveys in which 48-67% of SSRs found among a range of species are dinucleotides. Inspection of the TrGT database revealed most dinucleotide motif SSRs to be near but outside open reading frames (data not presented), and therefore unlikely to disrupt coding sequences with changes in array length. The increase in SSR array length and polymorphism detected by TrGT-SSRs relative to EST-SSRs also suggests they are from non-coding sequence.
While SSR polymorphism reflects the breeding system and diversity of the subject species, previous studies have shown that genomic SSRs are more informative than EST-SSRs [13, 14]. This is supported by the contrast of white clover TrGT-SSRs with the EST-SSR resource of Barrett et al., where a greater proportion of TrGT-SSRs were polymorphic and more alleles per polymorphic primer were identified.
Linkage mapping and multi-population map integration
Development of this integrated genetic linkage map relied on parental consensus maps from two unrelated, independent full-sib populations. Furthermore, while these maps were based predominantly on discrete marker sources, and MP2 had 49% more marker loci and 39% greater marker density than MP1, both maps revealed largely similar views of the genome. There was only a 10% increase in map length, from 1144 cM to 1264 cM, for parental consensus maps of MP1 and MP2, respectively. This indicates most of the recombinogenic genome is mapped, and is supported by the high genome coverage calculations, which improved after map integration (Table 2). The IM increases estimated genome coverage to 97%, relative to the prior 95% (MP2) based mainly on GeneThresher®-derived SSRs, 94% (MP1) by Barrett et al. using EST-SSRs, and 87% by Zhang et al., which relied primarily on red clover (T. pratense) SSRs. Particular features of MP2 and the integrated map are improved resolution of group 5 (formerly G) and extension of homoeologous group 2 (formerly F) as compared to Barrett et al. Both TrGT and EST marker sets show generalised distribution through the genetic linkage space, indicating both are suitable sources for further marker enrichment of targeted map regions.
Further evidence of the robustness of the assessment of genomic structure provided by these linkage analyses is the consistency of map length and the relative positions of joining loci between the two source maps presented here, as well as unpublished maps developed in our laboratory for both T. repens and the diploid progenitor, T. occidentale. There are no markers in common between IM and the incomplete genome map of Jones et al., but the trait-focused parental maps of Casey et al. and Wang et al. exhibit regions of general marker order alignment with ‘ats’ and ‘prs’ markers common to IM. Map length is more difficult to compare due to the partial genome coverage of those maps. In contrast, comparative analysis and alignment of ‘ats’ and ‘prs’ markers shared between IM and the map presented by Zhang et al. indicates significant differences in marker placement both within and among linkage groups. Furthermore, the Zhang et al. map is distinguished by a 47% increase in map length to 1877 cM, relative to the 1274 cM of IM. The recent white clover linkage map, based on a combination of white clover, red clover and Medicago truncatula-derived SSRs, also exhibits a marked inflation (97%) in total map length to 2511 cM relative to IM. Comparative alignment based on common ‘ats’ and ‘prs’ markers also indicates regions on that map with notable divergence in marker placement relative to IM, particularly linkage groups 2a and 2b.
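As a quick arithmetic check on the map length comparisons quoted in this section, the following Python sketch recomputes the percentage increases from the stated lengths in cM. The helper name and dictionary labels are illustrative assumptions and are not part of the original analysis.

```python
# Map lengths in cM, exactly as quoted in the text.
IM_LENGTH_CM = 1274   # integrated map (IM)
MP1_LENGTH_CM = 1144  # parental consensus map, MP1
MP2_LENGTH_CM = 1264  # parental consensus map, MP2


def percent_increase(length_cm, reference_cm):
    """Return the increase of length_cm over reference_cm, rounded to a whole percent."""
    return round(100 * (length_cm - reference_cm) / reference_cm)


# Illustrative labels for the two longer published maps compared with IM.
other_maps_cm = {"Zhang et al.": 1877, "recent white clover map": 2511}

inflation_vs_im = {
    name: percent_increase(length, IM_LENGTH_CM)
    for name, length in other_maps_cm.items()
}
mp2_vs_mp1 = percent_increase(MP2_LENGTH_CM, MP1_LENGTH_CM)

print(inflation_vs_im)  # {'Zhang et al.': 47, 'recent white clover map': 97}
print(mp2_vs_mp1)       # 10
```

The rounded values match the 47%, 97% and 10% figures reported in the text.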
Care was taken in matching homoeologues between the consensus maps of MP1 and MP2 in the map integration, including use of homoeologue-specific SSR markers and allele size matching (Table 3). There is, however, insufficient information in marker and sequence resources to accurately assign linkage groups from this map to the progenitor genomes identified by Williams and colleagues and tentatively annotated O and P’. As additional sequence resources become available, this integrated map and marker resource is expected to accelerate the process of linkage group assignment into homoeologous sets, matching sets to progenitor genomes, and exploration of genome evolution within the genus Trifolium.
Mapping candidate genes places genes putatively associated with traits of interest on linkage maps. These mapped genes may provide functional markers associated with regions of the genome with a significant effect on trait phenotype, as has been shown in Medicago, and may be deployed in marker-assisted breeding. Markers derived from two introns of the SHATTERPROOF9 gene (TrSHP-2 and TrSHP-8) provided an internal control for the intron polymorphism methodology for candidate gene mapping, and mapped to the same locus (Table 1; Additional file 3; Figure 2). While TrPPD was the only candidate gene to be mapped in both homoeologues, many of the other genes exhibited additional amplicons that were not informative in the mapping population, suggesting they may have loci elsewhere in the genome, including other homoeologues and paralogues. Placement of candidate genes also enables comparative mapping; for example, LEAFY (marker TrLFY; Table 1) maps to a locus at similar positions in group 3 of our integrated Trifolium map and in Medicago.
Segregation distortion was confined to discrete regions of the genome in both MP1 and MP2, most of which were population-specific (Figure 3) and characterised by flanking markers exhibiting progressive distortion decay with distance from the peak. Zhang et al. also identified discrete regions of segregation distortion, although several individual distorted loci were closely flanked by non-distorted loci without the characteristic distortion decay. In contrast, Isobe et al. documented segregation distortion across much of the white clover genome. It is difficult to accurately align regions of segregation distortion in the parental consensus maps of MP1 and MP2 with the maps of Zhang et al. and Isobe et al. due to discrepancies in marker order where there are SSRs in common. Alignment with the map of Casey et al., in which the white clover S locus that regulates self-incompatibility was mapped to the top of a homoeologue of group 1, was straightforward as it contains marker loci in common order. In particular, a single-locus homoeologue-specific SSR (prs285) near the S locus enables homoeologue matching with MP2, and places the S locus at the top of MP2 1-2, which also exhibits strong segregation distortion in the same region (Figure 3). This highlights the value of sharing marker resources to facilitate correspondence of marker and phenotype information across populations, and localises the S locus to T. repens LG 1-2. MP1 has no segregation distortion on this homoeologue, which may be explained by MP1 parents having compatible S alleles at this locus. Both MP1 and MP2 share a region of segregation distortion on 4-1, and while white clover is regarded as having a single locus self-incompatibility system, the distribution of distortion raises the question of what other loci may influence segregation in these conditions for this species.
In silico genome alignment
The in silico alignment between T. repens and M. truncatula revealed a general case of co-linearity, and identified an inter-chromosomal rearrangement where Mt-2 and -6 were split across Tr-2 and -6, as first described by Griffiths et al. Furthermore, the orientation of T. repens relative to M. truncatula was clear: groups 2 (F), 3 (A), 4 (D), 5 (G), 7 (C), and 8 (B), as oriented in Barrett et al., were inverted relative to M. truncatula, reflecting the orientation reported by Griffiths et al. Groups 1 (E) and 6 (H) were correctly orientated relative to M. truncatula. Comparison with M. truncatula suggests short inversions compared with white clover groups 1, 4 and 8; however, it is not known whether these are authentic or are artefacts of constraints in linkage analysis or genome assembly. This is also the first in silico alignment of Tr-5 (G), based on the improved marker order and numbers in the integrated map compared with Barrett et al. Tr-2 (F) was the only T. repens linkage group with large regions with no in silico alignment to M. truncatula. This suggests either that Tr-2 has large regions without homology to M. truncatula, or that regions of M. truncatula with homology to actively transcribed regions of the T. repens genome have yet to be sequenced. Candidate genes, however, matched expected macrosyntenic sites between Trifolium and Medicago in all cases, including the group 2/6 translocation as annotated. When considered in totality, this in silico comparative analysis confirms a general state of co-linearity between T. repens and M. truncatula. This extent of alignment suggests the Medicago genome can be used as a reference to estimate genome locations of unmapped sequence, and is further supported by evidence of micro co-linearity.
While the split of Mt-2 across Tr-2 and -6 was clear, determining which of T. repens groups 2 (F) and 6 (H) had greatest co-linearity with Mt-2 was less so. Our data suggest that T. repens group H aligns more extensively with Mt-2, although this may only be resolved after development and alignment with a T. repens genome assembly. For consistency, however, the published syntenic assignments of Mt-2 = F and Mt-6 = H are maintained. This split of Medicago group 2 across T. repens groups 2 (F) and 6 (H) is a key feature of the in silico alignment. According to a phylogeny of the legume vicioid clade, three general groupings, one comprising Medicago and Ononis, another Trifolium and Melilotus, and another Pisum, Lathyrus, and Vicia, had diverged from a more ancestral Cicer arietinum (chickpea). Detailed comparative analysis of members of these groupings with Medicago shows the group 2 split is a feature of T. repens, Vicia faba, and Pisum sativum. In contrast, there is no such split between Medicago and Cicer arietinum, indicating that Medicago group 2 may represent the ancestral condition that has since undergone rearrangement during evolution of derived lineages including Trifolium.
In contrast to Mt-2 and the other M. truncatula pseudomolecules, determining alignment of Mt-6 with T. repens was more difficult. This was due to the paucity of matches between Mt-6 and T. repens: a total of 10 hits, compared to a mean of 52 hits each for other Medicago groups aligned with T. repens. While Mt-6 has approximately half the sequence data of other Medicago groups (http://www.medicago.org/genome/downloads/Mt3/), the very low number of in silico matches between Mt-6 and multiple T. repens sequence sources is not surprising for several reasons. Mt-6 is atypical of the Mt chromosomes as it contains an over-representation of resistance gene analogues and leucine rich repeats, the greatest proportion of heterochromatin, and a corresponding under-representation of randomly selected and mapped EST markers [65, 67]. Furthermore, comparative alignment with other legumes reveals Mt-6 to have reduced marker-based synteny. Since the T. repens alignment with M. truncatula was based predominantly on exome-derived sequence, reduced synteny with the low gene density Mt-6 is not unexpected and may explain the in silico alignment gap identified in Tr-2. The full relationship with Mt-6 may only be resolved after development and alignment with a T. repens genome assembly.
The in silico alignment in this and a previous study used an E-value threshold of 1e-20 for identifying significant BLASTN matches. Reducing stringency to <1e-5 in our analysis revealed numerous spurious matches, often to multiple regions in the Medicago genome (data not shown). A similar study by George et al., using a subset of the data from Griffiths et al., derived an in silico M. truncatula:T. repens alignment at the <1e-5 threshold. While the general patterns of alignment were conserved, the reduced data set and low threshold may have prevented George et al. from determining orientation of T. repens relative to M. truncatula for groups F (Mt-2), G (Mt-5), and H (Mt-6). Evidence was also presented for a translocation of a terminal segment of Mt-1 to Mt-3; however, there is no evidence for this translocation in the current or previous studies, which are augmented significantly by the full EST-SSR dataset and TrGT-SSRs. Furthermore, there is no evidence in our study of a general breakdown in group 1 synteny, as there is a well-supported macrosyntenic relationship along the length of the groups, with a short inversion of Tr-1 relative to Mt-1 at the top end that may be an artefact of linkage analysis. Again, the full relationship between these two species may only be resolved after development and alignment of a T. repens genome assembly with Medicago and other legume genomes.
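The effect of the E-value threshold on the number and ambiguity of matches can be illustrated with a minimal Python sketch. It assumes BLASTN results in the standard 12-column tabular layout (E-value in the eleventh column); the marker names, coordinates and scores below are hypothetical, and treating the threshold as a strict inequality is an assumption rather than a detail taken from this study.

```python
from collections import defaultdict


def filter_hits(lines, max_evalue=1e-20):
    """Keep tabular BLAST hits whose E-value is below max_evalue.

    Returns (query, subject, subject_start, evalue) tuples.
    """
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        evalue = float(fields[10])  # 11th column of the tabular layout
        if evalue < max_evalue:
            kept.append((fields[0], fields[1], int(fields[8]), evalue))
    return kept


def multi_region_queries(hits):
    """Return queries that match more than one subject sequence."""
    subjects = defaultdict(set)
    for query, subject, _start, _evalue in hits:
        subjects[query].add(subject)
    return sorted(q for q, s in subjects.items() if len(s) > 1)


# Hypothetical hits: marker_A has one strong match and one weak match.
rows = [
    ("marker_A", "Mt-1", "95.0", "200", "10", "0", "1", "200", "1000", "1199", "3e-42", "250"),
    ("marker_A", "Mt-3", "80.0", "60", "12", "0", "1", "60", "5000", "5059", "2e-07", "50"),
    ("marker_B", "Mt-4", "92.0", "180", "14", "0", "1", "180", "7000", "7179", "1e-30", "210"),
]
lines = ["\t".join(row) for row in rows]

strict = filter_hits(lines, max_evalue=1e-20)
relaxed = filter_hits(lines, max_evalue=1e-5)

print(len(strict), multi_region_queries(strict))    # 2 []
print(len(relaxed), multi_region_queries(relaxed))  # 3 ['marker_A']
```

In this toy example the relaxed threshold admits a weak second match for marker_A on a different pseudomolecule, which is the kind of ambiguous placement that the stricter 1e-20 cut-off excludes.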