Frequency, distribution, and polymorphism
From the 11 391 unigenes obtained from 37 000 ESTs, 1692 microsatellite sequences were identified. Fourteen percent of the unigenes contain at least one microsatellite, as already reported for other citrus resources by Chen et al. This value can be considered quite high, given the selection pressure that acts on genes to maintain lower diversity in coding regions. It is also higher than the frequencies observed in dicotyledonous species, which range between 2.65% and 10.62% [41, 42]. The frequency depends on whether redundancy is present in the data set, and also on the parameters used for SSR screening during database mining. Varshney et al. reported that the frequency was about 5% when the minimum length for microsatellite detection was 20 nucleotides. In our study the detection criteria were less stringent: we set a minimum of 6 repeats for dinucleotide motifs (12 base pairs in length) and 5 repeats for the other motifs (15 base pairs in length for trinucleotides). This difference could explain our higher frequency of SSRs in ESTs, apparently without any effect on polymorphism (see below). Trinucleotide and dinucleotide repeats were the most common SSRs in clementine ESTs (53.9% and 37.6% respectively). These values reflect the predominance of trinucleotide and dinucleotide repeats in the ESTs of many plant species [23, 42–46], whereas a strong divergence was observed in the frequency of hexanucleotide repeats. In many crops these were abundant, with a frequency ranging between 13% and 26%. In clementine ESTs they represent only 2.4% of all SSRs.
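The detection thresholds described above can be illustrated with a minimal sketch. It is not the screening tool used in this study: the function names are hypothetical, only perfect repeats are considered, and the upper motif length of six nucleotides is an assumption based on the repeat classes discussed in the text.

```python
import re

# Thresholds from the text: at least 6 repeats for dinucleotide motifs and
# at least 5 repeats for longer motifs. The 3-6 nt range is an assumption.
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit (e.g. 'ATAT' or 'AA')."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(sequence):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in a DNA string."""
    sequence = sequence.upper()
    hits = []
    for motif_len, min_rep in sorted(MIN_REPEATS.items()):
        # One captured motif followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)


# Toy sequence (not real EST data): an (AG)6 run and a (CAT)5 run.
toy_est = "TTGC" + "AG" * 6 + "CCGA" + "CAT" * 5 + "GG"
print(find_ssrs(toy_est))
```

Raising the minimum length to 20 nucleotides, as in the criterion attributed to Varshney et al., would exclude both toy repeats, which illustrates how the screening parameters alone can change the apparent SSR frequency.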
Functional characterization of the ESTs, performed with GO annotation, showed that all the main functional categories were represented. This is in agreement with previous results [18, 19]. The EST-SSRs showed similar distributions in the GO Slim categories, and no functional group was overrepresented, indicating that the location of microsatellites shows no preference with respect to gene function.
The relationship between SSR polymorphism and phenotypic variation could therefore be investigated in any of the MIPS functional categories. Moreover, EST-SSRs could represent a convenient and inexpensive way to map genes compared with the RFLP technique and sequencing. Unfortunately, the frequency of genes containing an SSR sequence is relatively low (14%). Moreover, less than 66% of the analyzed SSRs were polymorphic. This means that no more than about 9% (14% × 66%) of the unigenes could be mapped with internal SSR markers. Seven of the 47 primer pairs amplified DNA fragments from clementine that were larger than expected, suggesting the presence of introns. It is possible that the lack of amplification for the 9 other primer pairs was also due to the presence of introns.
From the analysis of SSR positions in the ESTs, we found that the majority of SSRs were located in the UTRs, and mostly (75%) in the first hundred bases of the 5' end of the cDNA. This unequal distribution of SSRs along the cDNA sequence was also reported in other crops such as rice, wheat and barley, but with some differences. In barley, the majority of SSRs are present in the EST 3' sequences, with a high proportion of dimeric and tetrameric SSRs, whereas tetrameric SSRs are almost absent from clementine ESTs. As the clementine EST clones were single-pass sequenced from their 5' end and their mean size was about 800 nucleotides, the 3' end sequences of these ESTs were certainly underrepresented. The EST 3' end region is also known to be rich in microsatellite sequences [44, 45]. As a consequence of this method of EST production, we believe that we have introduced a bias in the general distribution of the different SSRs along the clementine transcribed sequences. Nevertheless, a few studies have described an abnormally high frequency of microsatellites in the 5' UTRs of plant genes, and a lower abundance in coding regions or 3' UTRs [43, 45]. Our results seem to confirm this feature. This heterogeneous distribution of SSRs could be explained by the effect of SSR variability on gene transcription and/or on protein structural integrity and function. In UTRs, microsatellites can be more variable without changing gene transcription and translation. The dominance of trimeric SSRs in TR can be explained by the suppression of non-trimeric SSRs in coding regions, owing to the risk of frameshift mutations that may occur when those microsatellites vary in size by one unit. It is worth noting that trimeric repeats were distributed homogeneously along the ESTs. It could be hypothesized that variation in trinucleotide SSRs has less impact on gene function than variation in dinucleotide SSRs.
Indeed, a change in the number of repeats of a trinucleotide motif does not affect the reading frame. Furthermore, dimeric SSRs seem to be more polymorphic than trimeric ones, particularly in UTRs, with a putatively higher allelic diversity combined with higher heterozygosity, which contributes to a strong discriminating capacity. These differences between repeat unit types were attenuated or disappeared when the SSRs were located in TR. However, the importance of this result has to be tempered, since the different situations were not equally represented and the sampled marker set was too small. Only 4 loci with di-SSRs in TR were detected, compared to 12 for trimeric SSRs, and the differences were therefore not statistically significant.
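The reading-frame argument can be stated as a simple length check. The sketch below is only an illustration of that reasoning under the assumption that the SSR lies entirely within a coding region; the function name is hypothetical.

```python
def shifts_reading_frame(motif, repeat_change):
    """True if gaining or losing `repeat_change` copies of `motif` alters the reading frame.

    A length change that is not a multiple of 3 nucleotides shifts the frame
    of everything downstream of the repeat.
    """
    return (len(motif) * repeat_change) % 3 != 0


print(shifts_reading_frame("AG", 1))    # dinucleotide, one unit gained
print(shifts_reading_frame("CAT", -2))  # trinucleotide, two units lost
```

Under this check, any change in the repeat number of a trinucleotide motif preserves the frame, whereas a dinucleotide motif preserves it only when the change is a multiple of three units.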
EST-SSR markers for citrus diversity
Genetic diversity analysis and systematics are classical applications of SSR markers. For such applications, the ability of a marker to differentiate germplasm accessions is an important characteristic. Owing to their higher polymorphism, markers located in UTRs are more useful than markers in TR. Moreover, a better rate of accession identification was obtained with dinucleotide markers (0.61) than with trinucleotide ones (0.29).
The organization of genetic diversity obtained with EST-SSRs is in agreement with the genetic relationships between Citrus species previously reported by studies using different markers for systematic analyses: morpho-physiological characters, biochemistry, isozymes [49, 50], genomic SSR markers [13, 14], CAPS markers, or RFLP and RAPD markers. Three major ancestral species, mandarins, pummelos and citrons, are at the origin of many cultivated hybrids. Likewise, the parental relationship of limes and lemons with citrons was clearly demonstrated by all these studies. This is in agreement with the strong differentiation we observed between the acidic citrus group (lime-lemon-citron) and the pummelo-mandarin group (and their hybrids). Lemon is thought to be a natural hybrid of a citron and a lime [47, 48], or a hybrid of citron and sour orange [51, 53]. Our results seem to support the participation of sour orange, because 15 alleles specific to this genotype were detected in lemon, whereas 10 from citron and only 3 from lime were observed. Nevertheless, we cannot confirm the parental combination, because in our sampling the lime and citron groups were each limited to a single variety. The diversity of these groups was not represented as described previously [13, 14], and so a few alleles from lemon (4) were still absent from the three putative parents in our study. Several hypotheses have also been proposed to explain the origin of Mexican limes: hybrids of citrons and papedas, a tri-hybrid cross of citron, pummelo, and Microcitrus, or a hybrid between citron and C. micrantha. As for lemon, the limited diversity in our analysis does not allow us to discuss these hypotheses. Sour orange is a natural hybrid of a mandarin and a pummelo, and in our analysis it is associated with the pummelo cluster. The participation of the two basic species, pummelo and mandarin, in the formation of sweet orange is attested by the citrus taxonomy literature.
However, some uncertainty remains concerning the number of crosses between these two basic species. Barkley et al. suggested that sweet orange was derived from one or more backcrosses to the mandarin, and that its genetic makeup was therefore derived from mandarin with a small proportion from pummelo. Nicolosi et al. proposed a single cross, based on equal proportions of alleles from mandarin and pummelo. Our results, with a common cluster of mandarin and sweet orange, support the first hypothesis, in which sweet orange has a higher proportion of alleles from mandarin.
Compared to the phylogeny built with genomic SSRs, a single difference was observed in our representation. It concerns the genetic diversity between citrus genera. The trifoliate orange (Poncirus trifoliata) joins the citron-lime-lemon cluster, while kumquat (Fortunella japonica) remains genetically distant from the other citrus. In previous work on genetic relationships based on genomic SSRs, the situation was reversed: Fortunella species were much more closely related to the four other Citrus groups (mandarins, pummelos, citrons and papedas), and the group of Poncirus accessions was very distant from all the others. This difference could be related to the overrepresentation of kumquat diversity in our study or to a real difference in polymorphism rate between genomic SSRs and EST-SSRs. A similar study on a larger citrus sample would help to resolve this question. We cannot compare the transferability of EST-SSRs and genomic SSRs, but a large majority of EST-SSR markers could be used to investigate the genetics of citrus relatives. Indeed, only 10% of these EST-SSR markers failed to amplify in box orange (Severinia buxifolia).
EST-SSR markers for citrus genome mapping
Citrus have a juvenile period of around 5 years, which limits the possibility of working on a second generation of hybrids. Consequently, many citrus genetic maps have been established on F1 progenies at the interspecific and intergeneric levels [26–31]. To evaluate the proportion of mappable EST-SSR markers, we calculated the percentage of heterozygous, informative markers for all combinations between 15 sexually compatible citrus genotypes that are currently used, or likely to be used, in citrus genetic programs. Table 4 provides a tool for selecting the sexual cross that offers the highest proportion of mappable markers together with the best situation for comparing both parental maps. Higher percentages of markers are available to map secondary species of cultivated citrus than to establish genetic maps of the three basic taxa (citron, mandarin, pummelo). As a result, a very low proportion of EST-SSR markers is usable for comparative genetic mapping between these three basic taxa: only 3% for Citron/Pummelo, 3% for Citron/Mandarin (cv Cleopatra) and around 9% for Pummelo (cv Pink)/Mandarin (cv Cleopatra).
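As an illustration of how such percentages might be derived for one cross, the sketch below assumes that a marker is mappable in a parent when that parent is heterozygous at the locus, and that markers heterozygous in both parents are the ones usable to compare the two parental maps. This criterion, the function name and the toy genotypes (allele sizes in base pairs) are assumptions made for the example; they are not taken from the data of this study.

```python
def mappable_summary(parent_a, parent_b):
    """Percentages of markers mappable in an F1 cross, under the stated assumption.

    Each parent is a dict: marker name -> (allele1, allele2).
    Only markers genotyped in both parents are considered.
    """
    markers = sorted(set(parent_a) & set(parent_b))
    het_a = {m for m in markers if parent_a[m][0] != parent_a[m][1]}
    het_b = {m for m in markers if parent_b[m][0] != parent_b[m][1]}
    n = len(markers)
    return {
        "parent_a": 100 * len(het_a) / n,         # mappable on the first parental map
        "parent_b": 100 * len(het_b) / n,         # mappable on the second parental map
        "common": 100 * len(het_a & het_b) / n,   # usable to compare both maps
        "total": 100 * len(het_a | het_b) / n,    # mappable on at least one map
    }


# Hypothetical genotypes for four markers (illustrative values only).
parent_1 = {"m1": (150, 156), "m2": (200, 200), "m3": (180, 186), "m4": (120, 120)}
parent_2 = {"m1": (150, 150), "m2": (200, 204), "m3": (183, 189), "m4": (120, 120)}
print(mappable_summary(parent_1, parent_2))
```

Applying such a function to every pair of genotypes would yield a cross-by-cross table comparable in layout to Table 4, although the actual criteria used to build that table may differ.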
It is clear that the best way to map the highest number of markers in a single progeny is to work on the segregation of interspecific or intergeneric crosses. Citrus × Poncirus progenies have been extensively investigated [11, 54–59]. EST genetic maps for Citrus sinensis and Poncirus trifoliata were recently published. For these maps, the authors studied the segregation of 300 primer pairs generating EST-SSR markers in the intergeneric progeny sweet orange × trifoliate orange. Among them, 141 markers (47%) were mapped and distributed as follows: 122 markers (40.7%) on the sweet orange map, 59 (19.7%) on the trifoliate orange map, and 40 (13.3%) common to both. These values are very similar to those proposed in our work (Tables 4 and 5), where for the same parental cross we estimated 52% of mappable EST-SSR markers, with 40%, 29% and 17% for the orange map, the trifoliate orange map and the common markers, respectively. This mapping work was done with a majority of SSR types that are not abundant in ESTs, such as compound, tetra-, penta- and hexanucleotide repeats. Di- and trinucleotide SSRs represent only 26.7% of the SSR markers studied.
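The figures quoted above are internally consistent: markers on either parental map, minus those counted twice because they are common to both, give the total number mapped. A short check, using only the numbers reported in the text (the function name is hypothetical):

```python
def mapped_on_either_map(map_a, map_b, common):
    """Number of distinct markers mapped on at least one parental map."""
    return map_a + map_b - common


# Published counts: 300 primer pairs, 122 on sweet orange, 59 on trifoliate orange, 40 common.
total_primer_pairs = 300
mapped = mapped_on_either_map(122, 59, 40)
print(mapped, round(100 * mapped / total_primer_pairs, 1))

# Same relation applied to the percentages estimated in our work (40%, 29%, 17%).
print(mapped_on_either_map(40, 29, 17))
```

Both sets of values satisfy the same relation, which is why the published counts and our estimates can be compared directly.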
On the basis of the genetic differentiation observed in our cluster analysis, it appears that, in this framework, interesting progenies should be obtained from F1 hybrids between citron and pummelo or citron and mandarin, as well as between poncirus or kumquat and citron, mandarin or pummelo. Such intrageneric progenies should probably be of greater interest for further QTL analysis of quality traits.