Genome-wide survey and analysis of microsatellites in nematodes, with a focus on the plant-parasitic species Meloidogyne incognita

Background Microsatellites are the most popular source of molecular markers for studying population genetic variation in eukaryotes. However, few data are currently available about their genomic distribution and abundance across the phylum Nematoda. The recent completion of the genomes of several nematode species, including Meloidogyne incognita, a major agricultural pest worldwide, now opens the way for a comparative survey and analysis of microsatellites in these organisms. Results Using MsatFinder, the total numbers of 1-6 bp perfect microsatellites detected in the complete genomes of five nematode species (Brugia malayi, Caenorhabditis elegans, M. hapla, M. incognita, Pristionchus pacificus) ranged from 2,842 to 61,547 and covered from 0.09 to 1.20% of the genomes. Under our search criteria, the most common repeat motifs for each length class varied among nematode species, with no obvious relation to the AT-richness of their genomes. Overall, (AT)n, (AG)n and (CT)n were the three most frequent dinucleotide motifs in the five genomes. Except for two motifs in P. pacificus, all the most frequent trinucleotide motifs were AT-rich, and (AAT)n and (ATT)n were the only ones common to all five species. Particular attention was paid to the microsatellite content of the plant-parasitic species M. incognita. In this species, a repertoire of 4,880 microsatellite loci was identified, of which 2,183 appeared suitable for designing markers for population genetic studies. Interestingly, 1,094 microsatellites were identified in 801 predicted protein-coding regions, 99% of them being trinucleotides. When compared against the InterPro domain database, 497 of these CDS were successfully annotated and further assigned Gene Ontology terms. 
Conclusions Contrasting patterns of microsatellite abundance and diversity were characterized in five nematode genomes, even between two closely related Meloidogyne species. A total of 2,245 di- to hexanucleotide loci were identified in the genome of M. incognita, providing adequate material for the future development of a wide range of microsatellite markers in this major plant parasite.


Background
Microsatellites, also known as simple sequence repeats, are 1-6 base pair (bp) nucleotide motifs tandemly repeated in the genome of every eukaryotic organism analyzed so far. In contrast to non-repetitive DNA, microsatellite polymorphism is primarily due to variation in the number of repeated motifs rather than to substitutions. Microsatellite loci usually have a very high mutation rate, from 10⁻⁴ to 10⁻³ mutations per locus per generation [1], resulting in high heterozygosity and the presence of multiple alleles at a given locus [2]. These markers are co-dominant and abundant in non-coding regions of the genome; they are relatively easy to isolate, can be specifically amplified by PCR, and evolve according to well-described mutation models [1,2]. Microsatellites have therefore emerged as the most popular and versatile neutral markers for geneticists working on a wide range of topics including, among others, forensics, genome mapping, population structure, phylogeny, linkage and kinship relationships [3]. The most conventional procedure for isolating microsatellite markers, i.e., enrichment of genomic DNA for microsatellite motifs, cloning, screening of the resulting library and sequencing of the positive clones [4], is challenging, time-consuming and costly. Such enrichment methods also generally target one or a few specific repeated motifs, which are most often selected without prior knowledge of their abundance in the genome and may therefore produce suboptimal results. The recent availability of huge amounts of sequence data for a wide range of organisms, together with new methods for in silico mining of microsatellites, has greatly facilitated the characterization of these markers [5], and will certainly catalyze the study of the genomic distribution of microsatellites in eukaryotes.
The isolation of microsatellites as usable markers appears to be more difficult in some taxa than in others, and has proved difficult in many invertebrates, including nematodes [6,7]. Except for the model species Caenorhabditis elegans, whose genome was sequenced in the past decade [8], no genome-wide survey of microsatellites is available for nematodes. In addition, relatively few studies have isolated microsatellites in this phylum by conventional molecular biology approaches, compared with other eukaryotes such as insects or vertebrates. This unpopularity of microsatellites as genetic markers in nematodes has been attributed in part to the unusually high proportion of loci that fail to produce interpretable PCR patterns, possibly as a result of interlocus flanking sequence similarity [9]. The root-knot nematode (RKN) Meloidogyne incognita is a serious plant parasite characterized by both its worldwide distribution and its very large host range [10], which raise questions about the origin, the processes of dispersal and the resulting genetic structure of its populations. In recent years, studies of genetic diversity have been carried out in this mitotic parthenogenetic organism using neutral molecular markers such as RAPD or AFLP, and revealed rather unexpected levels of clonal diversity among populations [11]. Surprisingly, however, as in other nematodes, microsatellites, which are usually regarded as among the most appropriate tools for studying variation at the individual level, have been very poorly investigated in this taxon.
The recent completion of genome sequencing projects has provided new opportunities to evaluate and compare the distribution of microsatellites in nematodes. Besides the genome of C. elegans, whole-genome data are now available for nematodes with very different life styles, i.e., the necromenic species Pristionchus pacificus [12], the plant-parasitic species M. incognita and M. hapla [13,14], and the animal-parasitic species Brugia malayi [15]. Based on these genomic resources, we report here the first survey and comparative analysis of microsatellites in nematodes, which reveals variable patterns of microsatellite abundance and diversity in the genomes of these organisms. A more detailed focus on the genome of the RKN M. incognita allowed the characterization of 2,245 di- to hexanucleotide loci, providing a basis for the future development of a wide range of microsatellite markers in this plant parasite of major agronomic importance.

Relative abundance and diversity of microsatellites in nematode genomes
We examined the distribution of perfect 1-6 bp microsatellites using detection thresholds of 12, 8, 5, 5, 5 and 5 repeats for mono-, di-, tri-, tetra-, penta- and hexanucleotide motifs, respectively. For each motif type, these are the minimum numbers of repeats required for a microsatellite to be reported; they are the default parameters of the software, optimized to eliminate repeats that might occur by chance [16]. The results described here therefore apply only to microsatellites meeting these criteria. Under these criteria, the total numbers of microsatellites found in the five nematode genomes were highly variable, ranging from 2,842 to 61,547, and covered from 0.09 to 1.20% of the genomes (Table 1). Based on the density and coverage of microsatellites (i.e., number of loci and number of bp per Mbp of analysed sequence, respectively), three homogeneous groups could be defined among the five genomes (Table 1). The first comprised the two RKN species, M. incognita and M. hapla, which shared the lowest abundance of microsatellite loci. The second combined C. elegans and P. pacificus, with microsatellite density and coverage values about twice those of the RKN. The largest values were found in B. malayi (about five times those of C. elegans). The variation in microsatellite number among nematode genomes was well explained by genome size for M. incognita, M. hapla, C. elegans and P. pacificus (r² = 0.95, F(1,2) = 37.5, p = 0.03), but not when B. malayi was included (r² = 0.10, F(1,3) = 0.34, p = 0.6) (Table 1). Table 2 shows the relative abundance of the different microsatellite length classes (i.e., mono-, di-, tri- up to hexanucleotides). Overall, the distributions varied significantly among species (Fisher's exact test, p < 10⁻⁴). The B. malayi genome exhibited the highest density and coverage of microsatellites, except for 6-bp motifs, which were more frequent in C. elegans. 
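Density and coverage, as used in Table 1, are simple per-Mbp normalizations. The following is a minimal Python sketch, assuming detected loci are available as half-open (start, end) coordinates; this input format and the function name are hypothetical illustrations, not Msatfinder's output. Dividing coverage in bp per Mbp by 10,000 gives the percentage of the genome covered.

```python
from typing import Iterable, Tuple


def density_and_coverage(loci: Iterable[Tuple[int, int]], genome_bp: int) -> Tuple[float, float]:
    """Return (loci per Mbp, microsatellite bp per Mbp) for one genome.

    `loci` holds half-open (start, end) coordinates of detected microsatellites;
    overlapping loci are not merged in this sketch.
    """
    if genome_bp <= 0:
        raise ValueError("genome_bp must be positive")
    spans = list(loci)
    mbp = genome_bp / 1_000_000
    covered_bp = sum(end - start for start, end in spans)
    return len(spans) / mbp, covered_bp / mbp


# Hypothetical toy genome of 2 Mbp carrying two loci (24 bp and 15 bp long).
density, coverage = density_and_coverage([(0, 24), (100, 115)], genome_bp=2_000_000)
print(f"{density:.1f} loci/Mbp, {coverage:.1f} bp/Mbp, {coverage / 10_000:.5f}% of the genome")
```

Normalizing per Mbp rather than reporting raw counts is what allows genomes of very different assembly sizes to be placed in the same groups.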
Mononucleotide repeats notably outnumbered all other length classes in the five nematode genomes, ranging from 48.7% of microsatellites in M. hapla to 75.5% in B. malayi. The number of mononucleotide repeats per Mbp of genomic DNA varied widely between species and was markedly higher in B. malayi. After mononucleotides, trinucleotides were the next most abundant length class in all nematode genomes except B. malayi, in which di- and trinucleotide motifs each represented about 10% of all microsatellites. Quite surprisingly, dinucleotide motifs appeared underrepresented in the two RKN species, at 3.5 and 4% of all microsatellites in M. hapla and M. incognita, respectively. Further, longer motifs were less frequent in all five genomes. Exceptions were tetranucleotide motifs in M. hapla and hexanucleotide motifs in C. elegans, which comprised 9.3 and 2.6% of all microsatellites, respectively. The most common repeat motifs for each length class varied among the nematode species (Table 3). An and Tn repeats were the most frequent motifs in three genomes (M. hapla, C. elegans and B. malayi), while Cn and Gn repeats dominated in the other two (M. incognita and P. pacificus), with no obvious relation to the AT-richness of the genomes (r² = 0.32, F = 1.39, p = 0.37). Overall, (AT)n, (AG)n and (CT)n were the three most frequent dinucleotide motifs across the five genomes, although their relative rank varied among species. The (AT)n motif was particularly dominant in M. hapla and B. malayi. Together, these three motifs comprised from 69.4% (in C. elegans) to 94.7% (in P. pacificus) of all dinucleotide microsatellites in each genome. 
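As a quick consistency check on the regression statistics reported above, the F statistic of a simple linear regression on n points follows from r² alone: F(1, n - 2) = r²(n - 2)/(1 - r²). The sketch below recomputes it; the function name is ours, and n = 5 for the AT-richness regression is our assumption, since its degrees of freedom are not stated.

```python
def f_from_r2(r2: float, n: int) -> float:
    """F statistic, with (1, n - 2) degrees of freedom, of a simple linear regression on n points."""
    if not 0 <= r2 < 1:
        raise ValueError("r2 must lie in [0, 1)")
    if n < 3:
        raise ValueError("at least 3 observations are needed")
    return r2 * (n - 2) / (1 - r2)


# r2 values and F values as reported in the text; n = number of genomes in each regression.
reported = [
    ("genome size, 4 genomes", 0.95, 4, 37.5),
    ("genome size, 5 genomes", 0.10, 5, 0.34),
    ("AT-richness, 5 genomes (assumed)", 0.32, 5, 1.39),
]
for label, r2, n, f_reported in reported:
    print(f"{label}: F(1,{n - 2}) = {f_from_r2(r2, n):.2f} (reported {f_reported})")
```

The recomputed values (about 38.0, 0.33 and 1.41) agree with the reported ones to within the rounding of r².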
The frequency of tri- to hexanucleotide motifs was more variable, and the lists of most frequent motifs became quite specific to each nematode species. Except for two motifs in P. pacificus (i.e., (CCT)n and (AGG)n), all the most frequent trinucleotide motifs were AT-rich, and (AAT)n and (ATT)n were the only ones common to all five species. Tetra- to hexanucleotide repeats were much less common in all five genomes, except for tetranucleotide motifs in B. malayi and, to a lesser extent, hexanucleotide motifs in C. elegans, and no single motif of these length classes was shared by all five species. Among these three length classes, only three of the most common motifs had a GC content >50%, i.e., (AAGGG)n and (ACGGGG)n in M. incognita, and (ACCGGT)n in C. elegans. Notably, no such GC-rich motifs occurred among the most frequent in P. pacificus, although this genome has the highest GC content. The telomere-like hexanucleotide repeat (AACCCT)n [17] was abundant only in the B. malayi genome.
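The GC criterion above is simply the fraction of G and C bases in the repeat unit. A small Python helper (the name is ours, for illustration) makes the threshold concrete for the motifs cited.

```python
def gc_fraction(motif: str) -> float:
    """Fraction of G and C bases in a DNA motif (case-insensitive)."""
    motif = motif.upper()
    if not motif or set(motif) - set("ACGT"):
        raise ValueError(f"not a DNA motif: {motif!r}")
    return (motif.count("G") + motif.count("C")) / len(motif)


for motif in ["AAGGG", "ACGGGG", "ACCGGT", "AAT", "CCT"]:
    print(f"({motif})n: {gc_fraction(motif):.0%} GC")
```

(AAGGG)n, (ACGGGG)n and (ACCGGT)n come out at 60, 83 and 67% GC, respectively, whereas (AAT)n contains no G or C at all.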

Diversity of microsatellites in the genome of Meloidogyne incognita
We then focused specifically on the RKN M. incognita, because it is a plant-parasitic species of major economic importance for which molecular markers suitable for population genetics are not available. Using our search criteria, we identified a repertoire of 4,880 microsatellite loci in the genome of this nematode, represented by 58 different motifs that covered 0.09% of the total genomic DNA analysed. The genomic location of microsatellites (i.e., their position on the genome contigs), motif types and numbers of repeats are given in Additional file 1. The twenty most abundant microsatellite classes were Cn, Gn, (GTT)n, (AAC)n, Tn, An, (AAT)n, (ATT)n, (AAG)n, (CTT)n, (AGG)n, (CCT)n, (CT)n, (AAAT)n, (ATTT)n, (AT)n, (AATT)n, (ACC)n, (GGT)n and (ACT)n (Figure 1). Together, they comprised 95.6% of all microsatellites identified. Although mononucleotide motifs were the most frequent (Figure 1; Table 2), they were excluded from the following analyses because they are of little interest as putative microsatellite markers. For the same reason, dinucleotide microsatellites were taken into account only when they had more than eight repeated units (and more than five units for tri- to hexanucleotides).

Table 3. Relative frequency of the most frequent microsatellite motifs found in nematode genomes
Overall, 54 di- to hexanucleotide motifs were identified and contributed to the diversity of the microsatellite repertoire in the M. incognita genome (Table 4). Loci with a small number (≤10) of tandem repeats predominated, accounting for more than 97% of all loci. Abundance declined with the number of repeats for all motifs except the tetranucleotide (AGGT)n. Interestingly, this motif also formed the longest microsatellite locus, with 124 repeats. Trinucleotide repeats were by far the most abundant, representing 80% of the di- to hexanucleotides found in the nematode genome. They were followed by tetra- and dinucleotides, at relative frequencies of 9.5 and 8.7%, respectively, while penta- and hexanucleotides were quite rare (1.7 and 0.3%, respectively). Five different dinucleotide motifs were identified, with comparable frequencies (from 1 to 2.7%). Conversely, the frequencies of the 15 trinucleotide motifs characterised were extremely variable, ranging from 0.3 to 20.5%. The most frequent motifs were the AT-rich (GTT)n and (AAC)n, which together represented more than 41% of all di- to hexanucleotides identified in the M. incognita genome. In contrast, the only motif composed solely of C and G, (CGG)n, was very poorly represented (0.09%). Overall, 99 microsatellites (i.e., 66 di-, 13 tri- and 20 tetranucleotides) had a large number of repeats (≥10).

Distribution of microsatellites in the coding regions of the genome of Meloidogyne incognita and Gene Ontology annotation
Among the 20,365 predicted protein-coding loci (CDS) searched, a total of 1,094 microsatellites were identified in 801 CDS, including 181 CDS containing at least two microsatellites (Additional file 2). They represent 22.4% of all microsatellite loci found in the whole genome of this nematode. The overall density of microsatellites is 53.7 per Mb of coding sequence. The occurrence of the repeat motifs found is shown in Figure 2A. Only mono-, di- and trinucleotide motifs were observed, the last category representing more than 99% of all microsatellites. (AAC)n was by far the most frequent motif, accounting for 64.2% of repeats, followed by (AAG)n (9.9%), (AAT)n (8.6%) and (AGG)n (6.0%). Among trinucleotide motifs, GC-rich microsatellites represented only 13.6% of all loci. We further investigated the frequency of the amino acid repeats (AAR) encoded by the trinucleotide repeats in the CDS (Figure 2B). The ten amino acids that composed the AAR identified were, from the most to the least frequent, asparagine (Asn), lysine (Lys), threonine (Thr), arginine (Arg), serine (Ser), isoleucine (Ile), proline (Pro), leucine (Leu), glycine (Gly) and valine (Val). The longest AAR encoded by a trinucleotide motif was a run of 14 Asn residues. When scanned for InterPro domains, 497 of the 801 CDS (~62%) containing at least one microsatellite were found to harbour at least one known domain (Additional file 3), and were further assigned the corresponding Gene Ontology (GO) terms. Overall, Cellular Component, Molecular Function and Biological Process GO terms could be assigned to 218, 349 and 246 CDS, respectively. The relative distribution of GO terms in each category is represented in Figure 3. In the cellular component category, 40.8% of the CDS were assigned a nucleus GO term, followed by the cell (33.5%) and membrane (28%) terms. 
Conversely, the extracellular (0.5%) and extracellular matrix (1.4%) GO terms were poorly represented. Macromolecule metabolism (38.2%) was the most frequent GO term for CDS in the biological process category, closely followed by regulation of biological process (37.4%) and cellular process (33.3%). Notably, only 0.8% and 1.6% of the annotated CDS were assigned a secretion and a response to stimulus GO term, respectively. In the molecular function category, the binding GO term was by far the most frequent (89.1%), followed by transferase (28.1%) and hydrolase (21.8%) activities.
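To see how a trinucleotide repeat maps onto the amino acid repeats described above, the sketch below translates a perfect repeat in the three forward reading frames using the standard genetic code. It is an illustration only: the helper names are ours, and it ignores the reverse strand and the actual reading frame of each CDS, which in practice is given by the predicted gene model.

```python
from itertools import product
from typing import Dict

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str) -> str:
    """Translate complete codons from position 0; a trailing partial codon is ignored."""
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def repeat_translations(motif: str, n: int) -> Dict[int, str]:
    """Amino acid run encoded by (motif)n in each forward frame (0, 1, 2)."""
    if len(motif) != 3:
        raise ValueError("expects a trinucleotide motif")
    repeat = motif * n
    return {frame: translate(repeat[frame:]) for frame in range(3)}


for frame, peptide in repeat_translations("AAC", 5).items():
    print(frame, peptide)
```

With these definitions, (AAC)5 reads as a poly-Asn run (NNNNN) in frame 0 and as Thr or Gln runs in the two other frames, so the same motif class can encode different amino acid repeats depending on its frame within the CDS.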

Diversity of microsatellite distribution in nematode genomes
In this study, we used Msatfinder [16] to scan the recently assembled M. incognita, M. hapla, B. malayi and P. pacificus genomes for perfect microsatellites with 1-6 bp motifs. To validate our results, we performed a similar analysis of the C. elegans genome using the same bioinformatics tool and search parameters. The coverage of microsatellite loci in C. elegans has previously been estimated at 2,139 bp per Mbp [18], and our result for this nematode (~2,059 bp/Mbp; Table 1) is consistent with this estimate. Such consistency between two independent studies supports the robustness of the overall analysis. It has long been considered that the structure and organization of the C. elegans genome would serve as an accurate model for most nematodes [19]. Previous comparative studies of model eukaryotes have shown that C. elegans has the lowest frequency and coverage of microsatellites in its genome, even lower than Saccharomyces cerevisiae and other fungi [18,20,21]. Our own analysis reveals a large variation in both coverage and density of microsatellites among the five nematode genomes investigated, which is not explained by genome size when all five species are considered. While P. pacificus has microsatellite density and coverage similar to those of C. elegans, the two RKN species, M. hapla and M. incognita, both exhibit a twofold lower abundance of microsatellite loci. Conversely, the animal parasite B. malayi contains five times more microsatellites per Mbp than C. elegans. We considered phylogenetic relationships within the phylum Nematoda as a tentative explanation for this variability. Based on small subunit ribosomal DNA sequences, the phylogeny of nematodes identifies five major clades in the phylum [22]. In this framework, C. elegans and P. pacificus belong to Clade V, M. incognita and M. hapla to Clade IV, and B. malayi to Clade III. 
Although a loose correlation may be seen here, additional data from many other nematode species from all five clades are required before any relationship between the clade of origin and the microsatellite density of a given species can be hypothesized. Clearly, these data provide evidence of variable patterns of microsatellite distribution in nematode genomes, indicating that the contribution of these short tandem repeats to the genome of C. elegans may not be the rule for other nematodes.
Excluding mononucleotide repeats, dinucleotides are generally reported to be the most common microsatellites in many organisms [23,24]. Here, dinucleotides appeared significantly underrepresented in both RKN species, with as few as 3.5 and 4.0% of all microsatellites in M. hapla and M. incognita, respectively. Conversely, trinucleotides were overrepresented in both RKN species compared with C. elegans, P. pacificus and, above all, B. malayi. Such a distribution bias in favour of trinucleotides has been reported for some other eukaryotes, e.g., the fungus Neurospora crassa [25] or the insect Tribolium castaneum [26]. Also, a recent data-mining analysis of ESTs from plant-parasitic nematodes, including some RKN species, showed that trinucleotide repeats were the most abundant microsatellites in coding ESTs [27].
Data mining of 26 completed genomes showed that microsatellites with low GC content predominate in most eukaryotic genomes [5]. This trend also emerged from our survey, with the majority of the most frequent 1-6 bp microsatellite motifs in nematodes being AT-rich. A notable exception, however, was the polyG/polyC mononucleotide repeats in M. incognita and P. pacificus. Conversely, none of the most frequent di- to hexanucleotide repeats contain exclusively Cs or Gs. Among nematode dinucleotides, (AT)n motifs seem to predominate over other motifs, while (CG)n motifs were extremely rare, and even absent in the two RKN species. (CT)n motifs, which are the most abundant dinucleotides in insects [28] and other invertebrates [20], were also frequently detected in nematode genomes. Similarly, trinucleotides were dominated by AT-rich motifs, with (ATT)n and (AAT)n always among the four most common motifs in the five species investigated. The only exceptions were (CCT)n and (AGG)n, which were frequently detected in the P. pacificus genome. Tetra- to hexanucleotides were also mostly AT-rich, with some remarkable exceptions, i.e., (AAGGG)n and (ACGGGG)n in M. incognita, or (ACCGGT)n in C. elegans. Overall, the diversity of microsatellite motifs gave each of the five nematode species a unique pattern of repeat distribution, even in the case of the two related RKN species tested here, suggesting that these patterns could help differentiate these species.

Microsatellites in the protein-coding sequences of Meloidogyne incognita
A total of 1,094 perfect microsatellites were identified in the whole M. incognita CDS dataset, i.e., 3.9% of protein-coding sequences contain such repeats. The vast majority (>99%) of them are trinucleotide repeats. This result confirms a recent report that trinucleotide repeats were the most abundant microsatellites in coding ESTs from 16 species of plant-parasitic nematodes belonging to seven genera, including Meloidogyne [27]. Moreover, in agreement with these authors, we found that (AAC)n repeats were the most abundant in M. incognita CDS, while (ATG)n or (TTA)n, which could act as start or stop codons, respectively, were not detected. The proportion of trinucleotide repeats in M. incognita CDS decreased exponentially as the number of repeats increased, with 14 repeats as the upper limit observed. We hypothesize that longer repeats are eliminated by selection acting on the nematode genome, since long amino acid repeats may have detrimental effects on protein function [29,30]. In addition, our data showed that the hydrophilic amino acid Asn, and to a lesser extent Lys and Arg, are over-represented among the amino acid runs in M. incognita proteins, which is consistent with the observation that stretches of hydrophilic amino acids are better tolerated in proteins [29,30].
A broad range of functions has been ascribed to amino acid repeats in proteins, including roles in intracellular protein-protein interactions, binding to host-cell receptors and polymerisation of their associated, non-repeated domains [31]. In parasitic eukaryotes such as Trypanosoma brucei or Plasmodium falciparum, protein repeats are often implicated in antigenic recognition and evasion of the host immune response to infection [32]. Mining the predicted M. incognita proteome revealed a set of 801 proteins containing stretches of amino acid repeats, of which 497 had at least one InterPro domain assigned, suggesting that these peculiar structures may also be important in parasitic nematodes. In addition, GO annotation revealed that binding was the molecular function most frequently assigned to these proteins. Based on the hypothesis that amino acid repeats can mediate interactions between the parasite and its host, and thus could potentially have a role in pathogenicity [33,34], some of the proteins identified here in M. incognita may play a similar role. From this point of view, our analysis generated a large set of candidate proteins for further functional analysis in relation to pathogenicity in plant-parasitic nematodes.

Microsatellites as genetic markers in Meloidogyne incognita
Owing to the relatively small size of nematode genomes and the high impact of many parasitic species on animal and human health or agricultural production, one of the expected outputs of nematode genome projects is the development of reliable and informative molecular markers usable as genotyping tools in population studies. Among the nematode species included in this work, except for the model organism C. elegans, microsatellite markers have been very poorly developed, and their practical application has been limited. A significant output of the present survey is thus the identification of a wealth of microsatellite motifs in each of the nematode genomes analyzed. In plant-parasitic nematodes, population genetic studies based on polymorphic microsatellite markers are very scarce. They mainly concern the cyst nematodes Globodera pallida [35-37] and Heterodera schachtii [38], as well as the pinewood nematode Bursaphelenchus xylophilus [39], the reniform nematode Rotylenchulus reniformis [40] and the dagger nematode Xiphinema index [41]. Conversely, because of the lack of available microsatellite markers, no such study has been carried out on M. incognita, although this species has been considered 'the world's most damaging plant pathogen' [10]. Indeed, only one microsatellite locus has been characterized so far in RKN, in M. artiellia, a species assumed to reproduce both by amphimixis and by facultative meiotic parthenogenesis [42]. The present study, with a total of 2,245 di- to hexanucleotide loci identified in the genome of M. incognita, opens new perspectives for the development of a wide range of microsatellite markers in this nematode. Moreover, it provides useful information about possible physical linkage between microsatellite loci and identifies markers located in coding regions that may not be neutral.
More generally, the microsatellites identified in this study may be useful for linkage analysis in the context of genetic mapping. Although no genetic map is available for the parthenogenetic RKN M. incognita, genetic maps have been extensively developed in many nematode species, including M. hapla [14], C. elegans [43] and P. pacificus [44], and should benefit from the input of large batches of new markers. Additionally, in the latter two species, the location of microsatellites on sex chromosomes may further help kinship analysis. To our knowledge, only one study has reported the genome-wide chromosomal location of microsatellites in C. elegans, with no significant distortion in frequency and distribution on the X chromosome compared with autosomes [20]. Conversely, no such information can be provided for RKN, which lack sex chromosomes and in which sex determination is under environmental epigenetic control [45,46]. Nevertheless, microsatellite loci can be located on the 2,995 contigs resulting from the genome assembly of the nematode [13], and this information has been obtained for the 4,880 microsatellite loci identified here.
Because of its negligible cost, in silico mining of microsatellites is more convenient than bench screening of genomic libraries. Although further experimental work is needed to set up PCR protocols and select the polymorphic loci that will become usable as genetic markers, recent genetic mapping and DNA fingerprinting studies have demonstrated the success of this strategy [5]. For example, in the insect T. castaneum, 509 new polymorphic markers were experimentally validated from 12,160 loci identified by bioinformatic analysis [47]. Of the 2,245 di- to hexanucleotide microsatellite sequences discovered in M. incognita, 2,183 had sufficient flanking sequence to allow the design of primer pairs. These data, including primer sequences, may prove useful for developing variable markers in M. incognita.
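One simple prerequisite for primer design is enough sequence on both sides of the repeat within its contig. The sketch below filters loci by flank length; the `Locus` record and the 50-bp minimum are hypothetical illustrations, not the criteria Msatfinder or Primer3 actually applied to retain the 2,183 loci, and primers would still have to be designed afterwards.

```python
from typing import Dict, List, NamedTuple


class Locus(NamedTuple):
    contig: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    motif: str


# Hypothetical minimum flank length (bp) on each side; not the value used in the study.
MIN_FLANK = 50


def primer_ready(loci: List[Locus], contig_lengths: Dict[str, int], min_flank: int = MIN_FLANK) -> List[Locus]:
    """Keep loci with at least `min_flank` bp of sequence on both sides within their contig."""
    kept = []
    for locus in loci:
        left = locus.start
        right = contig_lengths[locus.contig] - locus.end
        if left >= min_flank and right >= min_flank:
            kept.append(locus)
    return kept


contigs = {"contig_1": 500}
loci = [
    Locus("contig_1", 10, 34, "AC"),     # too close to the contig start
    Locus("contig_1", 200, 224, "AC"),   # enough flanking sequence on both sides
    Locus("contig_1", 470, 488, "AAT"),  # too close to the contig end
]
print(primer_ready(loci, contigs))
```

Loci near contig ends are the ones most likely to be lost at this step, which is one reason why a fragmented assembly yields fewer primer-ready loci than detected loci.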

Conclusions
This analysis of microsatellites in completely sequenced nematode genomes provides a snapshot of the differential coverage and density of 1-6 bp repeats among the five species investigated. In particular, the two RKN species, M. hapla and M. incognita, both exhibit a twofold lower abundance of microsatellite loci than C. elegans, which was previously considered to have the lowest frequency of microsatellites among (model) eukaryotes. Quite surprisingly, dinucleotide motifs appeared underrepresented in the two RKN species compared with the other nematode genomes, while trinucleotides were overrepresented in both RKN species compared with C. elegans, P. pacificus and B. malayi. The focus on M. incognita led to the identification of 4,880 microsatellites, 2,183 of which are a priori suitable for designing markers for population genetics. Interestingly, 22.4% of the detected microsatellites were located in coding regions, almost all being short (at most 14 repeats) trinucleotide motifs. This result suggests that microsatellites may affect the evolution of protein structure and function in this species.
The nucleotide sequences of the microsatellite loci obtained in this work, together with their flanking regions and amplification primers, are available for the five nematode species upon request to the authors. This information should prove useful for developing large sets of markers, which should in turn allow linkage mapping studies and facilitate population genetic research on nematodes.

Sequence data
The M. incognita and M. hapla genome assemblies [13,14] were downloaded directly from the sequencing project websites at http://www.inra.fr/meloidogyne_incognita and http://www.hapla.org, respectively. The genome sequences of C. elegans, B. malayi and P. pacificus were downloaded from WormBase (release WS210) at http://www.wormbase.org/. The total number of bp searched and the (G + C) content of each genome are indicated in Table 1. For M. incognita, the whole set of 20,365 protein-coding sequences (including splice variants) predicted from the whole-genome sequence [13] was included in the analysis.

Sequence analyses
Genomic sequences were scanned for microsatellite content using the program Msatfinder v2.0.9 [16], downloaded from http://www.genomics.ceh.ac.uk/msatfinder/. Msatfinder is a Perl script designed to allow the identification and characterisation of microsatellites in a comparative genomic context. Its Regex search engine, which uses fast regular expressions to search the sequence in a single pass, was used in our analyses. Detection criteria were constrained to perfect repeat motifs of 1-6 bp and a minimum repeat number of 12, 8, 5, 5, 5 and 5 for mono-, di-, tri-, tetra-, penta- and hexanucleotide microsatellites, respectively. Primer pairs for the identified microsatellite loci were designed using the Primer3 software [48] implemented in Msatfinder, with default parameters.
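The principle of regular-expression detection of perfect repeats can be illustrated with a backreference pattern that captures a k-bp unit and requires it to recur at least n - 1 more times. The Python sketch below applies the thresholds stated above; it is not Msatfinder's Perl code, the function names are ours, and it does not merge or resolve overlapping hits found at different motif lengths.

```python
import re
from typing import Dict, List, Tuple

# Minimum number of repeats per motif length (bp), as stated in the text.
MIN_REPEATS: Dict[int, int] = {1: 12, 2: 8, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif: str) -> bool:
    """False if the motif is itself a tandem repeat of a shorter unit (e.g. 'AA' or 'ACAC')."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_perfect_microsatellites(
    seq: str, min_repeats: Dict[int, int] = MIN_REPEATS
) -> List[Tuple[int, int, str, int]]:
    """Return (start, end, motif, repeat_count) for perfect repeats; 0-based, half-open."""
    seq = seq.upper()
    hits = []
    for k, n_min in sorted(min_repeats.items()):
        # Capture one k-bp unit, then require n_min - 1 further exact copies of it.
        pattern = re.compile("([ACGT]{%d})\\1{%d,}" % (k, n_min - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if not is_primitive(motif):
                continue  # such runs belong to a shorter motif length
            hits.append((m.start(), m.end(), motif, (m.end() - m.start()) // k))
    return sorted(hits)


print(find_perfect_microsatellites("GG" + "AC" * 10 + "TT"))
```

Here the (AC)10 run is reported once, as a dinucleotide; the same stretch also matches the tetranucleotide pattern as (ACAC)5, but that hit is discarded because ACAC is not a primitive unit.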
Partial standardization of the microsatellite motifs was used to categorize and compare microsatellites: motifs that are rotations of one another on the same DNA strand were grouped, without including the complementary sequence [49]. For example, the AAC class contains the (AAC)n, (ACA)n and (CAA)n microsatellites. In the present study, a total of 4, 6, 20, 60, 204 and 670 theoretical classes were considered for mono-, di-, tri-, tetra-, penta- and hexanucleotides, respectively. To allow direct comparisons regardless of the size of the genomes analysed, the density (number of loci) and coverage (number of bp) of microsatellites were calculated per Mbp of the corresponding genomic sequence.
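The class counts quoted above follow from this grouping rule. The sketch below assumes, consistent with the AAC example, that a class is named after the lexicographically smallest rotation of its motifs, and that motifs which are themselves repeats of a shorter unit, such as (ACAC)n, are counted in the shorter length class; the helper names are ours.

```python
from itertools import product


def canonical(motif: str) -> str:
    """Class name under partial standardization: the smallest rotation on the same strand."""
    return min(motif[i:] + motif[:i] for i in range(len(motif)))


def is_primitive(motif: str) -> bool:
    """False if the motif is a tandem repeat of a shorter unit (e.g. 'ACAC' or 'AAA')."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def count_classes(k: int) -> int:
    """Number of distinct classes among all primitive k-bp DNA motifs."""
    motifs = ("".join(p) for p in product("ACGT", repeat=k))
    return len({canonical(m) for m in motifs if is_primitive(m)})


print([count_classes(k) for k in range(1, 7)])  # → [4, 6, 20, 60, 204, 670]
```

Because complements are not merged, (GTT)n remains a class distinct from (AAC)n, which is why both appear among the most frequent M. incognita motifs.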
The complete set of predicted protein-coding sequences resulting from the M. incognita genome project [13] was scanned for InterPro domain content [50]. Based on the domain annotations, Gene Ontology (GO) terms were assigned using the annotation tools available on the GO consortium website http://www.geneontology.org. GO annotations were then formatted for input into the GOSlim program, and the output was parsed to count the occurrences of each GO category.
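The final counting step can be sketched in a few lines of Python. We assume the GOSlim output has already been parsed into a mapping from CDS identifier to the set of slim terms in one GO category; that structure, the function name and the identifiers below are hypothetical. We also assume the denominator is the number of CDS with at least one term in that category. Since the percentages reported for each category in the Results sum to more than 100%, a CDS can evidently carry several terms, and the sketch counts each term once per CDS.

```python
from collections import Counter
from typing import Dict, Set


def go_term_percentages(annotations: Dict[str, Set[str]]) -> Dict[str, float]:
    """Percentage of annotated CDS carrying each GO slim term, most frequent first.

    CDS without any term are left out of the denominator. Because a CDS can carry
    several terms, the percentages can add up to more than 100%.
    """
    annotated = [terms for terms in annotations.values() if terms]
    if not annotated:
        return {}
    counts = Counter(term for terms in annotated for term in terms)
    return {term: 100 * n / len(annotated) for term, n in counts.most_common()}


# Hypothetical CDS identifiers and cellular-component slim terms.
example = {
    "cds_001": {"nucleus", "membrane"},
    "cds_002": {"nucleus"},
    "cds_003": {"cell"},
    "cds_004": set(),
}
for term, pct in go_term_percentages(example).items():
    print(f"{term}: {pct:.1f}%")
```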