Properties of zebra finch EST-SSRs
SSRs appear to be relatively abundant within zebra finch ESTs: 3.7% of unique sequences contain an SSR greater than 20 bp long and almost 1% contain an SSR greater than 30 bp long. These values are comparable to those reported for other species, most of which are cereals or other plants [36, 42, 43]. They are likely to be conservative estimates of the number of SSRs per gene, as many ESTs or assembled contigs do not span the entire length of the gene transcript. Estimates of the proportion of ESTs that contain EST-SSRs are generally not available for other vertebrates. However, our unpublished data indicate that approximately 3.8% of chicken unigenes contain EST-SSRs, while in mammals the proportion ranges from ~2.0% in sheep to ~15.6% in mouse. Note that intergenic microsatellites are thought to be much rarer in avian genomes than in mammalian genomes, but there is little indication that a similar pattern holds for EST-SSRs.
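As an illustration of how the length thresholds above operate, the following minimal sketch scans a sequence for perfect di- to pentanucleotide repeats of at least a given total length. It is not the pipeline used in this study: the function name `find_ssrs`, its parameters and the restriction to uninterrupted repeats are assumptions made purely for the example.

```python
import re

def find_ssrs(seq, min_len=20, max_motif=5):
    """Return (start, motif, length_bp) for perfect repeats of at least min_len bp.

    Hypothetical helper: only uninterrupted di- to pentanucleotide repeats
    are reported, which is a simplification chosen for this sketch.
    """
    seq = seq.upper()
    hits = []
    for k in range(2, max_motif + 1):
        min_copies = -(-min_len // k)  # ceiling division
        # One motif of k bases followed by enough exact copies to reach min_len.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. ATAT, AA).
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, m.end() - m.start()))
    return hits

# A 24 bp AT repeat embedded in short flanking sequence.
print(find_ssrs("GGC" + "AT" * 12 + "CCG"))
```

Raising `min_len` to 30 applies the stricter of the two thresholds discussed above.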
Among EST-SSRs ≥20 bp long, trinucleotides were the most abundant type of repeat motif, followed by dinucleotides. Among EST-SSRs ≥30 bp, the two repeat types were equally abundant (each ~35% of all SSRs). The proportion of dinucleotides in the zebra finch appears to be similar to that in other species for which comparable data are available, e.g. [36, 45]. The observation that dinucleotides are relatively more frequent among long EST-SSRs is also consistent with previous studies.
The most common motif among EST-SSRs ≥20 bp or ≥30 bp was the dinucleotide AT. Similar observations have been made in rice and several species of pine, but this pattern is by no means consistent across species [36, 43]. The relative frequency of different trinucleotide motifs was dependent on SSR length. Among SSRs ≥20 bp, AGG was the most common (12% of all EST-SSRs), but among SSRs ≥30 bp, AAT was the most common (13%). In other species there is no clear consensus as to which trinucleotides are most frequent [35, 36, 46].
Approximately one-third of EST-SSR loci could be assigned to known genes in the Ensembl chicken peptide database. It was possible to predict the within-gene location of these EST-SSRs, which revealed clear differences between the repeat types. Trinucleotides were more often located in the coding sequence (CDS) of genes (29.8% of cases) than other repeat types (0%, 10% and 3% for dinucleotides, tetranucleotides and pentanucleotides respectively). This observation is expected, as the loss or gain of a repeat unit in a trinucleotide repeat will not result in a frameshift mutation. CDS trinucleotide repeats are of particular interest, as in other organisms a number of pathologies and behaviours are associated with triplet repeat expansions [47–51]. Two tetranucleotides and two pentanucleotides were identified within the CDS of genes. However, all four loci were relatively short (4–5 repeat units long) and two of them were interrupted. Therefore, these loci may have relatively low mutation rates, minimising the probability of frameshifts arising. Among non-CDS EST-SSRs, trinucleotides and pentanucleotides were more likely to be in the 5' UTR than the 3' UTR, while the opposite was true for dinucleotides. A similar pattern is observed in pine species, although there are relatively fewer EST-SSRs in the CDS of zebra finch, regardless of repeat type. In practical terms this is useful, as non-CDS SSRs are the most likely to be polymorphic (see below).
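The frameshift argument above is simple arithmetic: a change of n repeat units in a motif of k bases alters the sequence length by n × k bp, and the reading frame is preserved only if that change is a multiple of three. A minimal sketch follows; the function name `preserves_reading_frame` is hypothetical and used only for illustration.

```python
def preserves_reading_frame(motif_length, units_changed=1):
    """True if gaining or losing `units_changed` repeat units keeps the codon frame."""
    return (motif_length * units_changed) % 3 == 0

repeat_types = [
    ("dinucleotide", 2),
    ("trinucleotide", 3),
    ("tetranucleotide", 4),
    ("pentanucleotide", 5),
]
for name, k in repeat_types:
    # Only motifs whose length is a multiple of three tolerate a single-unit change.
    print(name, preserves_reading_frame(k))
```

For a single-unit change only the trinucleotide passes. The other repeat types can preserve the frame only through changes of three units at once.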
The chromosomal location of two-thirds of EST-SSR loci was predicted by in silico mapping to the chicken genome. It should be noted that these predicted chromosomal locations can only be confirmed when the zebra finch genome sequence is assembled or a linkage map is constructed. Given that synteny is highly conserved between the chicken and passerine genomes [28, 30], it is likely that loci assigned to a particular chicken chromosome will prove to be linked in the zebra finch. On this basis, EST-SSRs appear to be dispersed approximately evenly across the zebra finch genome, although they are probably at a marginally higher density on microchromosomes than on macrochromosomes. This observation is consistent with reports that gene density in chickens is greatest on the microchromosomes. Once these EST-SSRs are assigned a map position in the zebra finch (and other species) they will provide insight into the extent of chromosomal rearrangements between different avian lineages. Note that the four linked loci that we mapped appeared to be in the same order as their linked homologs on chicken chromosome 7.
Applications of EST-SSRs
The SSRs identified in this study represent a useful resource for evolutionary genetic studies of birds. Most obviously, they can be used to build a zebra finch linkage map, possibly acting as framework loci, used in tandem with SNPs typed at a higher density. We have demonstrated that linkage map construction should be relatively straightforward, and will provide a useful complement to ongoing efforts to construct a physical map. Evolutionary quantitative genetic studies of zebra finches have estimated the heritability of traits such as stress response, sperm morphology, body condition, bill colour and digit ratio. The availability of a linkage map would facilitate the next stage of genetic studies (i.e. mapping the loci that determine additive genetic variance) of an important model organism in evolutionary and ecological research. A linkage map would also represent a useful tool to aid assembly of the zebra finch genome once shotgun sequencing is complete, because it can help identify which contigs reside on particular chromosomes. Assembly of the chicken genome sequence was partially reliant on the consensus chicken linkage map.
Use of the EST-SSRs described here need not be restricted to studies of the source species. Previous studies have estimated that less than 10% of passerine microsatellites are polymorphic in species from different taxonomic families [26, 57, 58]. Therefore, although >900 microsatellites have been isolated in passerines, they were derived from ~75 different species, and the majority are not informative in any one species. The location of these microsatellites relative to genes was unknown, although the majority were probably intergenic. There are several reasons to suspect that EST-SSRs will be much more widely applicable across species than intergenic microsatellites.
First, because EST-SSRs are located within exons, they are likely to be under greater functional constraint than intergenic microsatellites. Therefore, sequence that flanks the repeat motif of an EST-SSR is expected to diverge at a slower rate than is commonly observed with intergenic markers. Our data support this expectation. Using the BLASTn default settings, 68% of EST-SSRs showed significant (E < 1e-10) homology to the chicken genome, with an average sequence similarity of 91.6%. This figure compares favourably with a study of passerine intergenic microsatellites, where just 14.0% of markers showed sequence similarity to chicken at E < 1e-10 under the same settings.
Secondly, an encouraging proportion of the limited number of zebra finch EST-SSRs that we tested were polymorphic in another passerine species, the house sparrow. Estimating divergence times between passerine species is not straightforward, but zebra finches and house sparrows probably diverged between 20 MYA and 45 MYA and are in completely different families (Estrildidae and Passeridae). Therefore, polymorphic zebra finch EST-SSRs appear to be conserved across passerine families.
A third piece of support for the widespread applicability of zebra finch EST-SSRs is provided by the small proportion (6 out of 638) of loci identified in this study that have also been isolated in other passerines. Two of these markers, Ase49 (cloned in the Seychelles warbler Acrocephalus sechellensis and homologous to contig 26) and MSLP4 (isolated in the Japanese Marsh Warbler Locustella pryeri and homologous to DV951916), have been examined with respect to cross-species amplification success rate [61, 62]. Both loci are polymorphic in species from different sub-families to the source species, and in fact no other markers cloned in these species have greater cross-species amplification success.
Although there is substantial evidence that the zebra finch EST-SSRs are conserved across other passerine species, their use in many population genetic studies will be limited unless they are polymorphic (although monomorphic loci may still be useful in molecular evolution studies). Data presented here and elsewhere indicate that a reasonably large proportion of EST-SSRs will be polymorphic in other passerines. Among loci that produced a PCR product, 8/8 (100%) were polymorphic in the source species (the zebra finch) and 6/7 (86%) were polymorphic in a distantly related passerine, the house sparrow.
In silico analysis of contigs with three or more overlapping sequences indicated that more than 40% of loci were polymorphic within the zebra finch. This figure is likely to be an underestimate of the proportion of polymorphic markers for two reasons. First, the majority of loci were represented by only 3 or 4 sequences, which means polymorphism will go undetected at some variable loci. This point is illustrated by the fact that an analysis restricted to loci represented by four or more sequences resulted in a higher estimate of 51%, while a similar analysis of uninterrupted repeats yielded an estimate of 60% [see Additional file 2]. Second, because many of the ESTs come from the same libraries, it is inevitable that some contigs will include multiple sequences from the same individual; these sequences are not independent, which makes it impossible to detect polymorphism among them. More generally, there is already good support that EST-SSRs are often polymorphic within both the source species and other species [37–40, 42].
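The reasoning above can be sketched in a few lines. The first two functions flag a locus as putatively polymorphic when overlapping sequences differ in repeat-unit number. The third gives the chance that a truly variable locus looks monomorphic because all sampled sequences happen to carry the same allele. This is an illustration only, not the procedure used to produce Additional file 2: the function names are hypothetical, and the calculation assumes independently sampled sequences and known allele frequencies.

```python
import re

def longest_run(seq, motif):
    """Number of repeat units in the longest uninterrupted run of `motif`."""
    runs = re.findall(r"(?:%s)+" % re.escape(motif), seq.upper())
    return max((len(r) // len(motif) for r in runs), default=0)

def looks_polymorphic(sequences, motif):
    """True if the overlapping sequences differ in repeat-unit number."""
    return len({longest_run(s, motif) for s in sequences}) > 1

def prob_polymorphism_missed(allele_freqs, n_sequences):
    """Chance that n independent sequences all carry the same allele."""
    return sum(p ** n_sequences for p in allele_freqs)

# Three overlapping ESTs from one hypothetical contig; one carries two extra units.
contig = ["CC" + "AT" * 10 + "GG", "CC" + "AT" * 12 + "GG", "CC" + "AT" * 10 + "GG"]
print(looks_polymorphic(contig, "AT"))

# Two equally frequent alleles: a quarter of such loci are missed with 3 sequences.
print(prob_polymorphism_missed([0.5, 0.5], 3))
print(prob_polymorphism_missed([0.5, 0.5], 4))
```

With two equally frequent alleles, the probability of missing the polymorphism halves from 0.25 to 0.125 when a fourth independent sequence is added. Sequences derived from the same individual would make the true miss rate higher.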
There are several strategies that could be employed to ensure that future laboratory efforts focus on zebra finch EST-SSRs that are variable in the source species and in other species.
The first way in which polymorphic EST-SSRs could be identified is to concentrate laboratory efforts on dinucleotide repeats. Previous studies have shown dinucleotides to be more polymorphic than longer repeat types [38, 46, 63, 64], although an analysis of passerine intergenic microsatellites did not support this observation. Secondly, it is likely that EST-SSRs within non-coding regions are more variable than those found in coding regions, as they are less likely to be under functional constraint. This prediction has empirical support from studies of rice and bread wheat. Note that among EST-SSRs identified in this study, dinucleotides were the least likely to be in the coding region, which again supports concentrating laboratory efforts on dinucleotides. A third way to enhance the proportion of EST-SSRs that are polymorphic is to focus efforts on the longest and purest repeats. Among passerine intergenic microsatellites there is a significant positive relationship between repeat length and the probability of being polymorphic. Similarly, there is a positive relationship between repeat length and heterozygosity in a variety of taxa, including birds. This pattern seems to hold for EST-SSRs [38, 43]. There is also empirical support for the idea that uninterrupted (i.e. pure) repeats are more variable than those with interruptions [66, 67]. Finally, polymorphism can be detected in silico within overlapping EST-SSR sequences. In summary, the 51 dinucleotide EST-SSRs that are ≥30 bp long, and the putatively polymorphic loci reported in Additional file 2, are probably the most likely to be variable in zebra finches and other passerine species.
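The criteria above could be combined into a simple ranking of candidate loci, sketched below. The `Locus` record, its field names and the order in which the criteria are applied (dinucleotide first, then non-CDS, then pure, then longest) are assumptions made for this example and are not a tested prioritisation scheme. The loci in the usage lines are invented.

```python
from dataclasses import dataclass

@dataclass
class Locus:
    name: str           # hypothetical locus identifier
    motif_length: int   # 2 = dinucleotide, 3 = trinucleotide, ...
    repeat_bp: int      # total length of the repeat tract in bp
    interrupted: bool   # True if the repeat is not pure
    in_cds: bool        # True if predicted to lie in coding sequence

def priority_key(locus):
    # False sorts before True, so preferred loci come first:
    # dinucleotides, then non-CDS, then pure repeats, then longest tracts.
    return (locus.motif_length != 2, locus.in_cds, locus.interrupted, -locus.repeat_bp)

def rank_loci(loci):
    """Return loci ordered from most to least promising under these assumptions."""
    return sorted(loci, key=priority_key)

# Invented example loci, for illustration only.
candidates = [
    Locus("tri_cds", 3, 33, False, True),
    Locus("di_short", 2, 22, False, False),
    Locus("di_long", 2, 34, False, False),
]
print([locus.name for locus in rank_loci(candidates)])
```

A weighted score would be an equally reasonable design. The strict ordering used here is only the simplest way to express the stated preferences.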