Frequency and distribution of SSRs in carrot genomic and EST sequence
Microsatellite density in carrot genomic DNA was estimated by analysis of 1.74 Mbp of BAC end sequence (the GSSRs dataset was excluded from this analysis because it derived from an SSR-enriched library and its inclusion would therefore overestimate the SSR density in genomic sequence). Carrot had a rather low SSR density (134.5 SSRs/Mbp) compared to other species. SSR analyses of the complete genome sequences of four model species, using the same search parameters and program as for carrot, revealed SSR densities of 370, 507, 529, and 508 SSRs/Mbp in Arabidopsis thaliana, grapevine, rice, and poplar, respectively. The lower SSR density in carrot compared to these species cannot be attributed to differences in the source of genomic sequence (BAC ends versus whole genomes), since BAC end sequence (BES) datasets from these and other species were also much denser in microsatellites than carrot BES (data not presented). Similarly, transcript sequences of carrot (214.8 SSRs/Mbp), although denser in SSRs than their genomic counterparts, had a lower density of these repeats than ESTs of Arabidopsis (358 SSRs/Mbp), grapevine (247 SSRs/Mbp), poplar (425 SSRs/Mbp), soybean (403 SSRs/Mbp), rice (739 SSRs/Mbp), and sorghum (646 SSRs/Mbp).
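To make the density figures concrete, the following is a minimal sketch of how an SSRs/Mbp value can be computed from a set of sequences. It is not the program used in this study: the regular-expression search, the function names, and the minimum repeat thresholds in MIN_REPEATS are illustrative assumptions, and only perfect di-, tri- and tetranucleotide repeats are considered.

```python
import re

# Hypothetical minimum repeat counts per motif length (assumptions for
# illustration; not the search parameters used in the study).
MIN_REPEATS = {2: 6, 3: 4, 4: 3}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, n_repeats) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for motif_len, min_n in sorted(min_repeats.items()):
        # One motif of motif_len bases followed by at least min_n - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit, e.g. "AAA",
            # or "ATAT", which is really a dinucleotide repeat.
            if any(motif == motif[:k] * (motif_len // k)
                   for k in range(1, motif_len) if motif_len % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // motif_len))
    return hits


def ssr_density_per_mbp(sequences):
    """Number of SSRs per Mbp across a collection of sequences."""
    sequences = list(sequences)
    total_bp = sum(len(s) for s in sequences)
    total_ssrs = sum(len(find_ssrs(s)) for s in sequences)
    return total_ssrs / total_bp * 1_000_000


# Toy sequence with one (AT)7 and one (AAG)5 repeat.
toy = "GGCC" + "AT" * 7 + "CGCGTT" + "AAG" * 5 + "TC"
print(find_ssrs(toy))
print(ssr_density_per_mbp([toy]))
```

By the same arithmetic, the reported 134.5 SSRs/Mbp over 1.74 Mbp of BAC end sequence corresponds to roughly 234 SSRs. Absolute densities depend strongly on the chosen thresholds, which is why comparisons across species are only meaningful when the same search parameters are applied to every dataset.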
Carrot trinucleotides were more frequent in transcripts than in genomic DNA. In addition, within BSSRs, trinucleotide repeats occurred preferentially inside ORFs and accounted for ~50% of the total SSRs found in these protein-coding regions. The abundance of these repeats in ESTs and in ORFs is consistent with the notion that protein-coding sequences tolerate InDels of 3 bp, or multiples of 3 bp, better than InDels of other lengths, which cause frame-shift mutations. Thus, trinucleotide repeats within coding sequences may be translated into fully functional proteins with a few extra (or fewer) amino acids, whereas InDels of other lengths would result in abnormal, often deleterious, proteins. Consistent with our results, an overrepresentation of trinucleotides in protein-coding sequences has been reported previously in numerous plant species [20–24], as well as in other eukaryotes including humans, primates, rodents and insects [25, 26]. The relative abundance of trinucleotides over other SSR types has been attributed not only to negative selection against frame-shift mutations in the coding regions but also to positive selection for specific single amino-acid stretches.
DNA polymerase slippage is the main mutational mechanism leading to changes in microsatellite length. These changes in SSR size are most often gradual and step-wise, since polymerase slippage only generates gains or losses of one or a few repeat units. Thus, the fact that SSRs in carrot transcripts generally had fewer repeat units than SSRs in genomic sequence, even for trinucleotide repeats (trinucleotides were twice as frequent in ESTs compared to genomic data), suggests a negative selection pressure against microsatellite size increase in protein-coding sequences.
The non-random distributions of motif sequences among dinucleotide and trinucleotide SSRs of carrot included a higher than expected incidence of (AT)n repeats in genomic DNA (BAC ends), like that of several plant species including soybean, Arabidopsis and rice, but unlike the (AC)n predominant motif among dinucleotides in humans. In contrast, the (AT)n motif was less often observed in ESTs than expected, while (AG)n and (CT)n were more common than expected. This may suggest different constraints for repeat motifs across diverse organisms.
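The reading-frame argument can be illustrated with a short sketch. The coding sequence below is hypothetical (it is not a carrot gene), and the translate helper is an illustrative assumption that uses the standard genetic code; it compares the effect of gaining one trinucleotide repeat unit with that of a 2 bp insertion at the same position.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical ORF: start codon, (CAG)4 repeat, downstream codons, stop.
head, tail = "ATG", "GGTCTGAAATAA"
reference = head + "CAG" * 4 + tail
plus_one_unit = head + "CAG" * 5 + tail      # 3 bp gain: frame preserved
plus_two_bp = head + "CAG" * 4 + "AG" + tail  # 2 bp gain: frame shifted

print(translate(reference))      # MQQQQGLK
print(translate(plus_one_unit))  # MQQQQQGLK
print(translate(plus_two_bp))    # MQQQQRV
```

In this toy case the 3 bp gain adds a single glutamine and leaves the downstream residues intact, whereas the 2 bp insertion alters every downstream codon and introduces a premature stop.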
Marker development and analyses in F2 families
In this study, two different strategies were used for isolating and developing carrot SSR markers. The hybridization-based approach, as described by Glenn and Schable, yielded microsatellites (GSSRs) that were, on average, significantly longer (23.1 bp versus 13.9 bp) and had more repeat units (7.9 versus 4.4) than SSRs from BAC end sequences (BSSRs). These differences are most likely due to differences between the two strategies used. DNA library enrichment methods based on hybridization capture [28–31] are generally designed to yield a higher proportion of SSRs with a large number of repeat units, targeting mainly long perfect repeats. Under this system, long DNA stretches of perfect repeats hybridize more efficiently to the microsatellite probes and are retained at a higher rate than short repeats during the washing steps, thus increasing the relative proportion of long microsatellite sequences in cloned colonies. Conversely, the BSSRs set represents a random sample of genomic DNA, without enrichment for length, repeat type or sequence motif. Because of this, BSSRs provide a more reliable picture of the microsatellite distribution in the carrot genome. Longer and more repetitive SSRs have been obtained through hybridization-based methods compared to sequence searches in other plant species, regardless of the type of DNA examined (i.e., genomic or ESTs), including Brassica [32, 33], cotton, wheat and rice.
The differences in repeat number and length between GSSRs and BSSRs have important implications for marker potential, particularly with regard to polymorphism. In general, GSSRs were significantly more polymorphic than BSSRs, considering both the polymorphism index (PI) (23.6% versus 9.8%) and the percentage of polymorphic markers (77% versus 52%), and these differences were associated with a higher repeat number and length in the GSSRs group (as suggested by the significant positive correlations obtained between both variables and PI). Studies developing SSR markers in other plant species, including cotton, barley and pine, have also noted positive relationships between SSR polymorphism and number of repeat units. Together, these results are consistent with studies reporting that both SSR polymorphism and SSR mutation rate have a positive relationship with repeat number [38–40]. Concordantly, positive and significant relationships have also been found between repeat length and mutation rate in human, fruit fly and yeast microsatellites. These studies indicate that polymerase slippage, the main mutational mechanism in microsatellites, increases with higher repeat number and length, leading to a higher diversity in longer, more repetitive SSRs, as observed in the present study. However, contrary to these and our results, studies using markers developed from other plants, such as Brassica and pearl millet, have reported a lack of correlation between the size of the SSR, measured both by length (bp) and by repeat number, and the detection of polymorphic loci. As pointed out in the latter two studies, SSR evolutionary age is a key factor for SSR diversity (i.e., recently evolved microsatellites would have fewer polymorphisms because of fewer occasions for mutation, even if they are relatively long), and this may help explain the lack of association found by them.
In addition, most of the above studies (including ours) cannot rule out the possibility that InDels at regions other than the SSR motifs may account for some of the polymorphisms, thus influencing the expected relationship between length and polymorphism.
A major interest in evaluating the SSR markers in the carrot F2 populations was to assess their potential for mapping. Linkage maps using some of these F2s have already been constructed (see Table 4) and others are underway (Simon, personal communication). These maps include different phenotypic traits of interest (Table 4) and, before this study, they were mainly constructed using anonymous dominant markers, such as AFLPs and RAPDs, with very few markers, or none, in common, thus making their comparative analysis and/or integration difficult. The present work identified 123 SSRs (87 GSSRs and 36 BSSRs) that were polymorphic in two or more mapping populations, suggesting that these common markers may serve as anchoring points for merging carrot maps. Besides the inclusion of 56 SSR markers onto the carrot reference map (see below), work is underway in our lab to include these polymorphic SSRs in other maps with different genetic backgrounds (see Table 4). The integration of carrot linkage maps would enhance their usefulness for assisting breeding of this species, by increasing marker saturation near genes of interest and thereby facilitating applications like positional gene cloning, among others.
From our evaluation in seven carrot F2 families, 196 SSR markers (65%) were polymorphic in at least one mapping population. Because the PCR amplicons were size-separated using high-resolution agarose gel electrophoresis, which can only resolve fragments with size differences of at least 3 bp, a fraction of the markers evaluated in some populations generated ambiguous band patterns. Although these markers may have been polymorphic, the bands were too close together in the gel to score unambiguously, and they were classified as monomorphic (i.e., only unambiguously polymorphic and scorable markers were classified as "dominant" or "codominant" in Additional File 1 - Table S2). Thus, if fragment separation systems with better resolution are used, such as separation of fluorescently-labeled fragments through capillary electrophoresis, the number of polymorphic markers may be expanded in some populations, particularly in cases of dinucleotide SSR markers varying in a single repeat unit.
High PCR amplification efficiencies were found in the F2 families for both sets of markers, GSSRs (83%) and BSSRs (87%). Comparable amplification efficiencies have been found in other plant species with SSR markers developed using hybridization-based methods (~90%) and sequence-based searches (85%).
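The effect of gel resolution on scoring can be sketched as follows. The 3 bp threshold is the agarose limit stated above; the function names and the 1 bp value used for the higher-resolution comparison are hypothetical, chosen only for illustration.

```python
import math

AGAROSE_MIN_DIFF_BP = 3  # resolution limit of the agarose system, per the text


def resolvable(size_a_bp, size_b_bp, min_diff_bp=AGAROSE_MIN_DIFF_BP):
    """True if two amplicons differ enough in size to be scored as distinct."""
    return abs(size_a_bp - size_b_bp) >= min_diff_bp


def min_repeat_difference(motif_len_bp, min_diff_bp=AGAROSE_MIN_DIFF_BP):
    """Smallest difference in repeat units that yields resolvable alleles."""
    return math.ceil(min_diff_bp / motif_len_bp)


# Alleles of a dinucleotide SSR differing by one repeat unit (2 bp).
print(resolvable(150, 152))                  # False on agarose
print(resolvable(150, 152, min_diff_bp=1))   # True with an assumed 1 bp limit
# Repeat-unit difference needed on agarose for di-, tri-, tetranucleotides.
print([min_repeat_difference(k) for k in (2, 3, 4)])  # [2, 1, 1]
```

Under the 3 bp limit, only dinucleotide markers need alleles that differ by at least two repeat units to be scored, which is why these markers are the most likely to gain polymorphisms under a higher-resolution system.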
Transfer success of carrot SSRs across Apiaceae
The availability of SSR loci for economically important species has increased interest in primer transferability to related taxa, especially for species in which molecular resources are limited. In Apiaceae, only a few publicly available SSRs have been reported previously, and these were developed from carrot (9 SSRs) and celery (11 SSRs), the two most economically important crop species in the family. Results from this study indicate that a significant fraction of carrot SSRs transfer successfully across Apiaceae. Locus amplification success was detected in 91 to 224 markers across 15 non-carrot Apiaceae species, including economically important crops like parsley (131 SSRs), celery (133 SSRs) and cilantro (91 SSRs). Prospects of a broader utilization of these markers beyond carrot include their application in taxonomic, population, and conservation studies as well as for mapping and assisting breeding in crop species.
It is, however, important to bear in mind that when using SSR markers across distantly-related species the amplification of a PCR product does not necessarily imply locus conservation, since size homoplasy, i.e. convergence in size of non-homologous fragments, may occur. Considering the possibility of this source of confusion, verification of the PCR product identity by sequencing has been suggested previously, particularly when working across genera and if there is uncertainty regarding the size range of the amplicons obtained. However, verification through sequencing may not be necessary if working within the same genus as the species from which the SSR markers were developed. Thus, the use of carrot SSR markers for studies in non-Daucus Apiaceae should include verification, by sequencing, of the homology to the carrot SSR product sequence (see Additional File 1 - Table S1).
Transfer of carrot SSRs across Daucus species (carrot accessions excluded) was, in general, less successful than the SSR transfer rate at the subgenus level reported for other species, whereas transfer of carrot SSRs across genera was relatively higher than found in other plants. According to a previous review of SSR cross-transferability in plants, the average transferability across species in the same genus was 76.4%, and across related genera was 35.2%. We found these values to be 58.3% across Daucus species and 41% across the Apiaceae. However, it should be noted that SSR transfer success varied greatly across the different reports, both within the same genus (4.7 - 100%) and across different genera (0 - 71.4%). The huge variation found across these studies likely reflects differences in phylogenetic distance (and thus also in conservation of sequences at priming sites) between the source and target taxa within each family, as well as differences in the number of taxa and SSR loci analyzed and in the type of sequences used for marker development (for example, EST-derived SSRs are more conserved and thus transfer across genera more readily than genomic SSRs), among other factors.
Our data (Figure 3) suggest a generally higher rate of success in amplifying carrot SSRs in plants more closely related to carrot. This should not be surprising, since more closely related taxa have higher overall sequence homology, which translates to more conserved SSR flanking regions and, therefore, easier transferability of primer pairs. Negative relationships between SSR transfer success and phylogenetic distance between source and target taxa have been widely observed in many plant families [18, 46].
The potential usefulness of SSR markers for diversity and phylogenetic studies in Apiaceae will depend, to a great extent, on whether the markers successfully amplify across different species and on their ability to detect polymorphism among the taxa. To obtain a preliminary picture of how suitable the SSR markers developed in this work may be for these applications, we investigated interspecific SSR variation among non-carrot species by analysis of amplicon sizes in the agarose gel images. Thus, for each SSR, the total number of different alleles in the non-carrot species dataset was recorded (Additional File 1: Table S4). Only SSRs that successfully amplified products in at least 80% of the non-carrot species (i.e., SSRs that generated amplicons in at least 12 of the 15 non-carrot species used to assess marker transferability) were considered. Overall, our results revealed 88 SSRs that generated amplicons in most (> 80%) non-carrot species. Of these, 40 SSRs (29%) produced 3-9 different alleles (with an average of 4.9 alleles/SSR) in the non-carrot group. It should be noted that our calculation of 4.9 alleles/SSR in these selected markers is conservative, due to the low resolution of agarose gels, which do not allow discrimination of alleles varying in one or a few repeats. These results suggest that a significant proportion of the SSR markers developed herein may be suitable for addressing taxonomic or phylogenetic questions within Apiaceae.
Further analysis of the 88 SSRs that produced amplicons in the majority of the non-carrot taxa revealed interesting differences between the two SSR datasets. Although more BSSRs than GSSRs (52 and 36 markers, respectively) amplified successfully in most non-carrot taxa, GSSRs were much more polymorphic than BSSRs at the interspecific level. For example, among GSSRs 28 markers produced 3 or more different alleles (with a range of 3-9 and mean of 5.5), whereas only 12 BSSRs generated 3 or more alleles/SSR (with a range of 3-6 and mean of 3.6). It is likely that the generally higher polymorphism of GSSRs compared to BSSRs at the interspecific level, which is in agreement with our results for both sets of markers in the carrot F2s, may also be due to the higher number of repeat units present in GSSRs.
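As a consistency check, the per-group means reported above can be pooled into a weighted mean and compared with the overall average of 4.9 alleles/SSR. This is a minimal sketch using only the counts and means stated in the text; the variable names are illustrative.

```python
# (number of markers with >= 3 alleles, mean alleles per SSR), from the text.
groups = {"GSSR": (28, 5.5), "BSSR": (12, 3.6)}

total_markers = sum(n for n, _ in groups.values())
pooled_mean = sum(n * mean for n, mean in groups.values()) / total_markers

print(total_markers)          # 40
print(round(pooled_mean, 1))  # 4.9
```

The 28 GSSRs and 12 BSSRs add up to the 40 SSRs with 3 or more alleles, and their weighted mean reproduces the overall figure of 4.9 alleles/SSR.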
SSR linkage mapping
Prior to this work, important advances were made in the construction of carrot genetic maps with a range of molecular marker systems. Although some RFLPs and a few SCAR and gene-specific markers were mapped, the most extensive genetic mapping data in carrot have been generated mainly with dominant AFLP, RAPD and Transposon-display (TD) markers [4, 5, 8–10]. While RFLPs are useful for comparative mapping purposes, high throughput genotyping and probe handling are difficult. Similarly, the carotenoid genes mapped by Just et al. are not as easily transferred to other mapping backgrounds, since their analysis relied in most cases on SNPs, due to the lack of larger polymorphisms (e.g., InDels) in these genes that can be scored as easily as SSRs. On the other hand, AFLP, RAPD and TD markers, while providing a relatively large number of markers per assay and good genome coverage, have limited information content and are not of much use for comparative mapping purposes or for validating QTL across pedigrees [8–10]. The addition of 55 SSR markers to the carrot reference linkage map, together with detailed characterization of this novel set of 300 SSRs in subsets of six other mapping populations, should allow significant advances in carrot comparative mapping and map integration. The fact that most of the mapped SSRs were codominant (38 SSRs) in the B493 × QAL-derived population, with 2-8 informative markers per linkage group, together with the identification of putative codominant SSRs in other mapping populations, adds extra value to the data published here for pursuing these goals. The inclusion of SSRs in linkage maps with additional pedigrees is currently underway.
The parental B493 map has a slightly larger total map length than the QAL map. Although the higher mean recombination found in B493 may help explain its larger map length, other factors, e.g., related to the type of markers used, may also cause this effect. Different recombination frequencies can result simply from sampling different markers, as well as from errors in calculating genetic distances from dominant marker data.
In the current map, we have modified linkage group designations and orientations, in accordance with recent cytogenetic data concerning the integration of carrot LGs with actual chromosomes. Following standard conventions, consecutive numbers were assigned to the LGs in decreasing order of chromosome length (i.e., LG1 corresponds to the longest chromosome), and four LGs were inverted in their north-south orientations to agree with the standard short arm/long arm presentation of their corresponding chromosomes. It must, however, be noted that although all the LGs could be unequivocally associated with chromosomes, and thus their number designations are correct and complete, unambiguous LG orientations could only be defined for six of the nine LGs. Chromosomes 4, 6, and 9 in the current map, which correspond to former LGs 6, 3, and 7, respectively, could not be unequivocally oriented, because a single anchored BAC probe was used for LG-chromosome integrations. Therefore, their orientations were not modified from previous map versions [5, 8, 9]. However, the possibility remains that future cytogenetic data (for example by FISH analysis with several BAC probes SSR-anchored to these LGs) may reveal different orientations for these LGs. These modifications based on recent cytogenetic data, and the addition of 55 new SSR markers, add value to the updated reference carrot linkage map presented herein. Overall, the current maps involve 193-202 mapped loci, including 69 highly informative markers which consist of SSR, carotenoid gene and SCAR markers, spanning 1,121-1,273 cM, making it the most comprehensive genetic linkage map in the Apiaceae to date.
The SSR loci mapped across all 9 LGs in both parental maps, and they were distributed fairly evenly within most individual LGs, which supports their usefulness as anchor points for merging carrot maps. In addition, such dispersed map distribution of the SSR loci has allowed us to develop BAC FISH probes carrying SSR sequences mapped to specific LGs. These SSR-anchored probes were used for integrating some LGs of carrot with chromosomes by FISH mapping.
The positional association observed between SSRs and previously mapped genes suggests that these tandem repeats are frequent in genic regions of the genome. This is in agreement with results of Morgante et al., demonstrating higher microsatellite frequencies in the transcribed and non-repetitive fractions of plant genomes. One SSR (gssr112) and two SSRs (gssr12 and gssr119) in LG7 and LG5, respectively, were located in the vicinity of two highly-significant quantitative trait loci (QTL) for total root carotene accumulation. These correspond to the Y and Y2 loci, respectively, described by Buishand and Gabelman. Microsatellites gssr12 and gssr119, although not tightly linked to Y2, may be useful for marker assisted selection, either as a complement of, or as an alternative to the lack of amplification of other robust more-closely linked markers, such as Y2mark.