Diversity of microsatellite distribution in nematode genomes
In this study, we used Msatfinder to scan the recently assembled M. incognita, M. hapla, B. malayi and P. pacificus genomes for perfect microsatellites with 1-6 bp motifs. To validate our results, we performed a similar analysis of the C. elegans genome using the same bioinformatics tool and search parameters. The coverage of microsatellite loci in C. elegans has previously been estimated at 2139 bp/Mbp, and our own estimate for this nematode (~2059 bp/Mbp; Table 1) is consistent with that value. Such consistency between two independent studies supports the robustness of the global analysis.
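The notions of perfect microsatellite, coverage (bp/Mbp) and density (loci/Mbp) can be illustrated with a minimal Python sketch. This is not Msatfinder: the minimum repeat numbers in `MIN_REPEATS` are hypothetical placeholders rather than the thresholds used in this study, and the regex-based scanner is deliberately simplified (motifs are reported in the orientation found, and compound or imperfect repeats are not handled).

```python
import re

# Hypothetical minimum numbers of repeat units per motif length (1-6 bp).
# Illustrative placeholders only, not the Msatfinder settings used in the study.
MIN_REPEATS = {1: 8, 2: 5, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a tandem copy of a shorter unit (e.g. 'ATAT')."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_perfect_microsatellites(seq, min_repeats=MIN_REPEATS):
    """Return (start, end, motif) tuples for perfect tandem repeats, 0-based half-open."""
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A k-bp unit followed by at least n-1 further identical copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            if is_primitive(m.group(1)):  # skip non-primitive units such as AA or ATAT
                hits.append((m.start(), m.end(), m.group(1)))
    return hits


def covered_bp(hits):
    """Number of bases covered by at least one hit (overlapping hits are merged)."""
    total, cur_start, cur_end = 0, None, None
    for start, end, _ in sorted(hits):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def coverage_and_density(hits, genome_length_bp):
    """Coverage in bp/Mbp and density in loci/Mbp."""
    mbp = genome_length_bp / 1e6
    return covered_bp(hits) / mbp, len(hits) / mbp


seq = "GC" + "A" * 10 + "GC" + "CAG" * 6 + "TT"
hits = find_perfect_microsatellites(seq)  # [(2, 12, 'A'), (14, 32, 'CAG')]
coverage, density = coverage_and_density(hits, len(seq))
```

Overlapping hits are merged before coverage is computed so that no base is counted twice; density, by contrast, simply counts loci.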
It has long been considered that the structure and organization of the C. elegans genome would serve as an accurate model for most nematodes. Previous comparative studies of model eukaryotes have shown that C. elegans has the lowest frequency and coverage of microsatellites in its genome, even lower than Saccharomyces cerevisiae and other fungi [18, 20, 21]. Our own analysis reveals considerable variation among the five nematode genomes investigated in both the coverage and the density of microsatellites, independent of genome size. While P. pacificus has microsatellite density and coverage similar to those of C. elegans, the two RKN species, M. hapla and M. incognita, both exhibit a twofold lower abundance of microsatellite loci. Conversely, the animal parasite B. malayi contains five times more microsatellites per Mbp than C. elegans. Phylogenetic relationships within the phylum Nematoda were considered as a tentative explanation for such variability. Based on small subunit ribosomal DNA sequences, the phylogeny of nematodes identifies five major clades in the phylum. In this framework, C. elegans and P. pacificus belong to Clade V, M. incognita and M. hapla to Clade IV, and B. malayi to Clade III. Although a loose correlation may be seen here, data from many more nematode species from all five clades are required before any relationship between clade of origin and microsatellite density can be proposed. Nevertheless, these data clearly show variable patterns of microsatellite distribution in nematode genomes, indicating that the particular contribution of these short tandem repeats to the C. elegans genome may not be the rule for other nematodes.
Excluding mononucleotide repeats, dinucleotides are generally reported to be the most common microsatellites in many organisms [23, 24]. Here, dinucleotides appeared markedly underrepresented in both RKN species, accounting for only 3.5 and 4.0% of the total number of microsatellites in M. hapla and M. incognita, respectively. Conversely, trinucleotides were overrepresented in both RKN species compared with C. elegans, P. pacificus and, above all, B. malayi. A similar bias in favour of trinucleotides has been reported in some other eukaryotes, e.g., the fungus Neurospora crassa and the insect Tribolium castaneum. Also, a recent data-mining analysis of ESTs from plant-parasitic nematodes, including some RKN species, showed that trinucleotide repeats were the most abundant microsatellites in coding ESTs.
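Because class percentages such as those above depend on whether mononucleotides are included in the total, a small Python sketch can make the convention explicit. The motif list in the test is hypothetical and is not drawn from this study's data.

```python
from collections import Counter

CLASS_NAMES = {1: "mono", 2: "di", 3: "tri", 4: "tetra", 5: "penta", 6: "hexa"}


def class_percentages(motifs, include_mono=True):
    """Percentage of microsatellites falling in each motif-length class."""
    # Optionally drop mononucleotides before computing the denominator.
    lengths = [len(m) for m in motifs if include_mono or len(m) > 1]
    if not lengths:
        return {}
    counts = Counter(lengths)
    return {CLASS_NAMES[k]: 100.0 * counts[k] / len(lengths) for k in sorted(counts)}
```

The same set of loci thus yields different dinucleotide percentages depending on the `include_mono` setting, which matters when comparing figures across studies.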
Data-mining of 26 completed genomes showed that microsatellites with low GC content are predominant in most eukaryotic genomes. This trend also emerged from our survey, with most of the frequent 1-6 bp microsatellite motifs in nematodes being AT-rich. One notable exception, however, was polyG/polyC mononucleotide repeats in M. incognita and P. pacificus. Conversely, none of the most frequent di- to hexanucleotide repeats consisted exclusively of Cs or Gs. Among nematode dinucleotides, (AT)n motifs seemed to predominate, while (CG)n motifs were extremely rare, and even absent in the two RKN species. (CT)n motifs, which are the most abundant dinucleotides in insects and other invertebrates, were also frequently detected in nematode genomes. Likewise, trinucleotides were dominated by AT-rich motifs, with (ATT)n and (AAT)n always ranking among the four most common motifs in the five species investigated. The only exceptions were (CCT)n and (AGG)n, which were frequently detected in the P. pacificus genome. Tetra- to hexanucleotides were also AT-rich, with some remarkable exceptions, i.e., (AAGGG)n and (ACGGGG)n in M. incognita, or (ACCGGT)n in C. elegans. Overall, the diversity of microsatellite motifs gave each of the five nematode species a unique pattern of repeat distribution, even the two related RKN species tested here, suggesting that microsatellite profiles could help discriminate between these species.
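Several motifs mentioned above are strand or rotation variants of one another: (AAT)n on one strand reads as (ATT)n on the other, (AGG)n as (CCT)n, and (CT)n as (AG)n. Whether such variants are reported separately or pooled depends on the tool and the study. The sketch below shows one common convention (the lexicographically smallest rotation of either strand), together with the GC fraction used to call a motif AT- or GC-rich; it is an illustration, not the classification applied by Msatfinder.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical_motif(motif):
    """Lexicographically smallest rotation of the motif or of its reverse complement."""
    motif = motif.upper()
    rc = reverse_complement(motif)
    rotations = [s[i:] + s[:i] for s in (motif, rc) for i in range(len(s))]
    return min(rotations)


def gc_fraction(motif):
    """Fraction of G and C bases in the motif."""
    motif = motif.upper()
    return (motif.count("G") + motif.count("C")) / len(motif)


# (ATT)n and (AAT)n share the class AAT; (CCT)n and (AGG)n share AGG.
assert canonical_motif("ATT") == canonical_motif("AAT") == "AAT"
assert canonical_motif("CCT") == canonical_motif("AGG") == "AGG"
```

Under this convention, (ACGGGG)n has a GC fraction of 5/6, whereas (AT)n has none.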
Microsatellites in the protein-coding sequences of Meloidogyne incognita
A total of 1,094 perfect microsatellites were identified in the complete M. incognita CDS dataset, and 3.9% of protein-coding sequences contain such repeats. The vast majority (>99%) of them are trinucleotide repeats. This result is consistent with a recent report that trinucleotide repeats were the most abundant microsatellites in coding ESTs from 16 species of plant-parasitic nematodes belonging to seven genera, including Meloidogyne. Moreover, in agreement with these authors, we found that (AAC)n repeats were the most abundant in M. incognita CDS, while (ATG)n repeats, which contain the ATG start codon, and (TTA)n repeats, whose reverse complement TAA is a stop codon, were not detected. The proportion of trinucleotide repeats in M. incognita CDS decreased exponentially as the number of repeat units increased, with 14 repeats appearing as a critical threshold. Longer repeats are thought to be eliminated by selection acting on the nematode genome, since long amino acid repeats may be detrimental to protein function [29, 30]. In addition, our data showed that the hydrophilic amino acid Asn, and to a lesser extent Lys and Arg, are over-represented among the runs of amino acids in M. incognita proteins, which is consistent with the observation that stretches of hydrophilic amino acids are better tolerated in proteins [29, 30].
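The link between trinucleotide repeats in CDS and amino acid runs follows directly from the genetic code: depending on the reading frame, an (XYZ)n repeat encodes a run of one of three residues. The sketch below assumes the standard genetic code (NCBI translation table 1) and considers only the coding strand; it shows, for instance, which residues (AAC)n can encode and flags frames in which a repeat would produce a stop codon.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def repeat_frames(motif):
    """Amino acid ('*' = stop) encoded by a trinucleotide repeat in each forward frame."""
    if len(motif) != 3:
        raise ValueError("expected a trinucleotide motif")
    doubled = motif.upper() * 2  # two copies contain the codon for every frame
    return [CODON_TABLE[doubled[f:f + 3]] for f in range(3)]


print(repeat_frames("AAC"))  # ['N', 'T', 'Q']: poly-Asn, poly-Thr or poly-Gln
print(repeat_frames("ATG"))  # ['M', '*', 'D']: Met in one frame, TGA stop in another
```

On the coding strand, (TTA)n gives Leu, Tyr or Ile, whereas (TAA)n yields a stop codon in its first frame.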
A broad range of functions has been ascribed to amino acid repeats in proteins, including roles in intracellular protein-protein interactions, binding to host-cell receptors and polymerisation of their associated, non-repeated domains. In parasitic eukaryotes such as Trypanosoma brucei or Plasmodium falciparum, protein repeats are often implicated in antigenic recognition and evasion of the host immune response to infection. Mining the M. incognita predicted proteome revealed a set of 801 proteins containing stretches of amino acid repeats, of which 497 had at least one InterPro domain assigned, suggesting that these peculiar structures may also be important in parasitic nematodes. In addition, GO annotation revealed that binding was the molecular function preferentially assigned to these proteins. Given the hypothesis that amino acid repeats can mediate interactions between the parasite and its host, and thus potentially play a role in pathogenicity [33, 34], some of the proteins identified here in M. incognita may have a similar role. Our analysis therefore provides a large set of candidate proteins for further functional analysis in relation to pathogenicity in plant-parasitic nematodes.
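Proteins carrying amino acid repeats, such as the 801 reported above, can be retrieved from a predicted proteome with a simple scan. The sketch below is restricted to single-residue runs (homopeptides) and uses a hypothetical minimum run length of five residues; the criteria used to define stretches of amino acid repeats in this study may differ, and oligopeptide repeats are not covered. The proteome is assumed to be a dictionary of protein IDs to sequences, parsed beforehand from a FASTA file.

```python
import re

AMINO_ACID_ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_runs(protein, min_len=5):
    """Return (start, residue, length) for each homopeptide run of at least min_len residues."""
    # One residue followed by at least min_len - 1 further copies of the same residue.
    pattern = re.compile(r"([%s])\1{%d,}" % (AMINO_ACID_ALPHABET, min_len - 1))
    return [(m.start(), m.group(1), len(m.group(0))) for m in pattern.finditer(protein.upper())]


def proteins_with_runs(proteome, min_len=5):
    """Map protein IDs to their runs, keeping only proteins with at least one run."""
    result = {}
    for protein_id, seq in proteome.items():
        runs = amino_acid_runs(seq, min_len)
        if runs:
            result[protein_id] = runs
    return result
```

Tallying the residues in the returned runs would then show which amino acids (e.g. Asn, Lys, Arg) dominate.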
Microsatellites as genetic markers in Meloidogyne incognita
Because nematode genomes are relatively small, and many parasitic species have a high impact on animal/human health or agricultural production, one expected output of nematode genome projects is the development of reliable and informative molecular markers usable as genotyping tools in population studies. Except for the model organism C. elegans, few microsatellite markers have been developed for the nematode species included in this work, and their practical application has been limited. A significant output of the present survey is therefore the identification of a wealth of microsatellite motifs in each of the nematode genomes analyzed. In plant-parasitic nematodes, population genetics studies based on polymorphic microsatellite markers are very scarce. They mainly concern the cyst nematodes Globodera pallida [35–37] and Heterodera schachtii, as well as the pinewood nematode Bursaphelenchus xylophilus, the reniform nematode Rotylenchulus reniformis and the dagger nematode Xiphinema index. In contrast, owing to the lack of available microsatellite markers, no such study has been carried out on M. incognita, although this species has been described as 'the world's most damaging plant pathogen'. Indeed, only one microsatellite locus has been characterized so far in RKN, in M. artiellia, a species thought to reproduce both by amphimixis and by facultative meiotic parthenogenesis. With a total of 2,245 di- to hexanucleotide loci identified in the genome of M. incognita, the present study opens new perspectives for the development of a wide range of microsatellite markers in this nematode. Moreover, it provides useful information about possible physical linkage between microsatellite loci and identifies markers located in coding regions that may not be neutral.
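Flagging loci that fall within coding regions, and therefore may not behave as neutral markers, amounts to an interval-overlap test between microsatellite and CDS coordinates on the same contig. In the sketch below, the tuple layouts, the 0-based half-open coordinates and all identifiers are assumptions made for illustration; mononucleotide loci are dropped to keep the di- to hexanucleotide set.

```python
from collections import defaultdict


def flag_coding_loci(loci, cds):
    """Return {locus_id: True if the locus overlaps a CDS} for di- to hexanucleotide loci.

    loci: iterable of (locus_id, contig, start, end, motif)
    cds:  iterable of (contig, start, end)
    Coordinates are assumed to be 0-based and half-open.
    """
    cds_by_contig = defaultdict(list)
    for contig, s, e in cds:
        cds_by_contig[contig].append((s, e))

    flags = {}
    for locus_id, contig, start, end, motif in loci:
        if not 2 <= len(motif) <= 6:
            continue  # keep only di- to hexanucleotide loci
        # Two half-open intervals overlap if each starts before the other ends.
        flags[locus_id] = any(s < end and start < e for s, e in cds_by_contig[contig])
    return flags
```

This linear scan is adequate for a few thousand loci; for larger datasets, sorting the CDS intervals per contig and using binary search would avoid comparing every locus with every CDS.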
More generally, the microsatellites identified in this study may be useful for linkage analysis in the context of genetic mapping. Genetic maps are not available for the parthenogenetic RKN M. incognita, but they have been extensively developed in other nematode species, including M. hapla, C. elegans and P. pacificus, and these maps should benefit from the input of large batches of new markers. Additionally, in the two latter species, the location of microsatellites on sex chromosomes may further help kinship analysis. To our knowledge, only one study has reported the genome-wide chromosomal location of microsatellites in C. elegans, with no significant distortion in frequency and distribution on the X chromosome compared with autosomes. In contrast, no such information can be provided for RKNs, which lack sex chromosomes; in these species, sex determination is under environmental, epigenetic control [45, 46]. However, microsatellite loci could be located on the 2,995 contigs resulting from the genome assembly of M. incognita, and this information has been obtained for the 4,880 microsatellite loci identified here.
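Assigning loci to contigs also makes it possible to list pairs of markers that lie close together on the same contig and are therefore candidates for physical linkage. The sketch below assumes each locus is given as (locus ID, contig, start, end) with 0-based half-open coordinates; `max_gap` is a hypothetical distance cut-off, not a value used in this study. Physical proximity on a contig is only a proxy for linkage.

```python
from collections import defaultdict
from itertools import combinations


def loci_per_contig(loci):
    """Group loci by contig; each list holds (start, end, locus_id) sorted by start."""
    groups = defaultdict(list)
    for locus_id, contig, start, end in loci:
        groups[contig].append((start, end, locus_id))
    return {contig: sorted(items) for contig, items in groups.items()}


def close_pairs(loci, max_gap):
    """Pairs of loci on the same contig separated by at most max_gap bp."""
    pairs = []
    for contig, items in loci_per_contig(loci).items():
        for (s1, e1, a), (s2, _e2, b) in combinations(items, 2):
            gap = s2 - e1  # items are sorted by start; a negative gap means overlap
            if gap <= max_gap:
                pairs.append((contig, a, b, gap))
    return pairs
```

The pairwise comparison is quadratic per contig, which remains cheap because each contig carries relatively few loci.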
Because its cost is negligible, in silico mining of microsatellites is more convenient than bench screening of genomic libraries. Although further experimental work is needed to set up PCR protocols and select the polymorphic loci that will become usable as genetic markers, recent genetic mapping and DNA fingerprinting studies have demonstrated the success of this strategy. For example, in the insect T. castaneum, 509 new polymorphic markers were experimentally validated from 12,160 loci identified by bioinformatics analysis. Of the 2,245 di- to hexanucleotide microsatellite sequences discovered in M. incognita, 2,183 had sufficient flanking sequence to allow primer pair design. These data, including primer sequences, may prove useful for developing variable markers in M. incognita.
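Whether a locus has sufficient flanking sequence for primer design can be checked from its position on the contig. The sketch below uses a hypothetical minimum of 100 bp on each side (not the criterion applied in this study) and performs only this pre-filter; primer design itself would be delegated to a dedicated tool such as Primer3.

```python
def loci_with_flanks(loci, contig_lengths, min_flank=100):
    """Return IDs of loci with at least min_flank bp on both sides within their contig.

    loci: iterable of (locus_id, contig, start, end), 0-based half-open (assumed)
    contig_lengths: {contig: length_bp}
    min_flank: hypothetical minimum flank length in bp
    """
    keep = []
    for locus_id, contig, start, end in loci:
        left_flank = start                          # bases before the repeat
        right_flank = contig_lengths[contig] - end  # bases after the repeat
        if left_flank >= min_flank and right_flank >= min_flank:
            keep.append(locus_id)
    return keep
```

Loci near contig ends are the ones most likely to fail this filter, since one of their flanks is truncated by the assembly.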